Load Parquet file from HDFS to Cassandra
Updated: Jun 4, 2018
Loading large Apache Parquet files from HDFS to Cassandra is straightforward with DataRow.io. The steps below walk you through building a simple two-step pipeline.
1. First, open Yumi Platform and create a new job. Then modify the job settings as shown below
2. Locate the File Reader activity in the toolbar
3. Drag the File Reader activity into the designer and click on the settings icon
4. Enter the Parquet file location using the hdfs://<namenode>/<path> template and select parquet as the format. Then press OK
5. Locate the Cassandra Writer activity in the toolbar
6. Drag the Cassandra Writer activity into the designer and click on the settings icon
7. Enter the Cassandra location and authentication details. Then press OK
8. Run the job
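Step 4 expects the file location in the hdfs://<namenode>/<path> form. As a rough illustration, here is a minimal Python sketch (standard library only) of how you might sanity-check such a location before entering it in the File Reader settings. The function name `check_hdfs_location` and the example host and path are hypothetical; none of this is part of DataRow.io.

```python
from urllib.parse import urlparse


def check_hdfs_location(location: str) -> bool:
    """Return True if `location` looks like hdfs://<namenode>/<path>.

    Hypothetical helper for illustration only; it checks the shape of the
    string and does not contact HDFS.
    """
    parsed = urlparse(location)
    has_scheme = parsed.scheme == "hdfs"
    has_namenode = bool(parsed.netloc)   # host, optionally with :port
    has_path = len(parsed.path) > 1      # more than just "/"
    return has_scheme and has_namenode and has_path


if __name__ == "__main__":
    # Example host and path are made up for this sketch.
    print(check_hdfs_location("hdfs://namenode.example.com:8020/data/events.parquet"))
    print(check_hdfs_location("/data/events.parquet"))
```

A check like this only catches a malformed location early; whether the file exists and is readable is still determined when the job runs.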
Note: this post assumes that your Parquet file schema matches the schema of the target Cassandra table. If that is not the case, you can place transformation activities between the reader and the writer to modify and filter the contents of the Parquet file before writing to Cassandra.
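To see whether the two schemas line up before running the job, a comparison along the following lines can help. This is a minimal sketch, assuming you have already written down both schemas as plain column-name-to-type dictionaries using a shared type vocabulary; the function `schema_mismatches` and the example columns are hypothetical and are not a DataRow.io feature.

```python
def schema_mismatches(parquet_schema: dict, cassandra_schema: dict) -> dict:
    """Compare two {column_name: type_name} mappings.

    Assumes both sides already use the same type names (for example
    "int", "text"); mapping Parquet types to Cassandra types is out of
    scope for this sketch.
    """
    parquet_cols = set(parquet_schema)
    cassandra_cols = set(cassandra_schema)
    shared = parquet_cols & cassandra_cols
    return {
        # Columns in the Parquet file with no counterpart in the table.
        "missing_in_cassandra": sorted(parquet_cols - cassandra_cols),
        # Table columns that the Parquet file does not provide.
        "missing_in_parquet": sorted(cassandra_cols - parquet_cols),
        # Columns present on both sides but with different types.
        "type_conflicts": sorted(
            name for name in shared
            if parquet_schema[name] != cassandra_schema[name]
        ),
    }


if __name__ == "__main__":
    # Made-up schemas for illustration.
    parquet = {"id": "int", "name": "text", "signup_ts": "timestamp"}
    cassandra = {"id": "int", "name": "text", "country": "text"}
    print(schema_mismatches(parquet, cassandra))
```

If all three lists come back empty, the direct two-step pipeline should apply as described; otherwise the non-empty lists indicate which columns a transformation activity would need to rename, drop, add, or cast.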