Apr 18, 2018 | 1 min read

Load Parquet file from HDFS to Cassandra

Updated: Jun 4, 2018

Loading large Apache Parquet files from HDFS into Cassandra is straightforward with DataRow.io. The steps below walk you through building a simple two-step pipeline.

1. First, open Yumi Platform and create a new job. Then modify the job settings as shown below.

2. Locate the File Reader activity in the toolbar.

3. Drag the File Reader activity into the designer and click the settings icon.

4. Enter the Parquet file location using the hdfs://<namenode>/<path> template and select parquet as the format. Then press OK.

5. Locate the Cassandra Writer activity in the toolbar.

6. Drag the Cassandra Writer activity into the designer and click the settings icon.

7. Enter the Cassandra location and authentication details. Then press OK.

8. Run the job
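For readers who want to see what such a two-step pipeline amounts to in code, here is a rough standalone sketch in PySpark. It is not how DataRow.io implements the job; it only mirrors the same reader and writer steps. It assumes the open-source spark-cassandra-connector is available to Spark, and the function names (hdfs_uri, cassandra_conf, load_parquet_to_cassandra) as well as the host, keyspace, and table values in the usage note are hypothetical.

```python
def hdfs_uri(namenode: str, path: str) -> str:
    """Build a location in the hdfs://<namenode>/<path> form from step 4."""
    return f"hdfs://{namenode.strip('/')}/{path.lstrip('/')}"


def cassandra_conf(host: str, username: str, password: str) -> dict:
    """Collect the connection and authentication details from step 7
    as settings read by the spark-cassandra-connector."""
    return {
        "spark.cassandra.connection.host": host,
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
    }


def load_parquet_to_cassandra(source: str, keyspace: str, table: str, conf: dict) -> None:
    """Read a Parquet file and append its rows to an existing Cassandra table."""
    # Imported lazily so the helpers above work without a Spark installation.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("parquet-to-cassandra")
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Reader step: load the Parquet file from HDFS.
    df = spark.read.parquet(source)

    # Writer step: append to the table (assumes the schemas already match).
    (df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .mode("append")
        .save())

    spark.stop()
```

A call might look like load_parquet_to_cassandra(hdfs_uri("namenode:8020", "/data/events.parquet"), "analytics", "events", cassandra_conf("cassandra-host", "user", "secret")), with every one of those values replaced by your own.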

Note: this post assumes that the schema of your Parquet file matches the schema of the Cassandra table. If it does not, you can place transformation activities between the reader and the writer to modify and filter the contents of the Parquet file before writing to Cassandra.
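One way to check that assumption before running the job is to compare the two column lists up front. The sketch below is only an illustration: schema_diff and the PARQUET_TO_CQL mapping are hypothetical, the mapping covers just a handful of common types, and the column dictionaries are expected to be filled in by hand or by whatever tooling you already use to inspect the file and the table.

```python
# Illustrative (incomplete) mapping from Parquet-style type names to CQL types.
PARQUET_TO_CQL = {
    "boolean": "boolean",
    "int32": "int",
    "int64": "bigint",
    "float": "float",
    "double": "double",
    "string": "text",
}


def schema_diff(parquet_cols: dict, cassandra_cols: dict) -> dict:
    """Compare {column: type} dictionaries and report where they disagree."""
    parquet_names = set(parquet_cols)
    cassandra_names = set(cassandra_cols)

    type_mismatches = {}
    for name in sorted(parquet_names & cassandra_names):
        # Unknown Parquet types are compared as-is.
        expected = PARQUET_TO_CQL.get(parquet_cols[name], parquet_cols[name])
        if expected != cassandra_cols[name]:
            type_mismatches[name] = (expected, cassandra_cols[name])

    return {
        "missing_in_cassandra": sorted(parquet_names - cassandra_names),
        "missing_in_parquet": sorted(cassandra_names - parquet_names),
        "type_mismatches": type_mismatches,
    }
```

If all three entries in the result are empty, the file can be written as-is; otherwise the report points to the columns that a transformation activity would need to rename, drop, or cast.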

