Archive Avro-formatted Kafka messages to HDFS as Parquet files



DataRow.io includes a Kafka connector, so you can transform and ingest Kafka streams into multiple destinations in different formats within a single data pipeline. This post focuses on archiving Avro-formatted Kafka messages to HDFS in Parquet format. The steps below walk you through building this simple, two-step pipeline.
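The walkthrough assumes Avro-encoded messages are already flowing into a Kafka topic. For context, here is a minimal sketch of what producing such messages could look like. The broker address, topic name, and schema are illustrative assumptions, and fastavro plus kafka-python is just one way to do it:

```python
# Minimal sketch: produce schemaless Avro-encoded messages to a Kafka topic.
# The broker, topic, and schema below are assumptions, not part of DataRow.io.
import io

from fastavro import schemaless_writer
from kafka import KafkaProducer

# Hypothetical Avro schema for the message payload.
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "payload", "type": "string"},
    ],
}

producer = KafkaProducer(bootstrap_servers="localhost:9092")

for i in range(10):
    buf = io.BytesIO()
    # Schemaless encoding writes the raw Avro body with no container-file header.
    schemaless_writer(buf, EVENT_SCHEMA, {"id": i, "payload": f"event-{i}"})
    producer.send("events", value=buf.getvalue())

producer.flush()
```

With messages like these on the topic, the archive pipeline is built as follows.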


1. First, open the DataRow Platform and create a new job, then modify the job settings



2. Locate the Kafka Reader activity in the toolbar


3. Drag the Kafka Reader activity into the designer and click the settings icon


4. Enter the Kafka server/broker details, including the Kafka topics to read from



5. In the toolbar, locate the File Writer activity

6. Drag the File Writer activity into the designer as a sub-group of the Kafka Reader activity and click the settings icon



7. Enter the target HDFS path and the output format (Parquet). The path should follow the hdfs://<name-node>/<path> format (see the code sketch after these steps).


8. Run the job
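Under the hood, this kind of pipeline amounts to a streaming read from Kafka, an Avro decode, and a Parquet write. For intuition only, here is a minimal standalone sketch using Spark Structured Streaming. This is not DataRow.io's API; the broker addresses, topic name, schema, and paths are illustrative assumptions, and the spark-sql-kafka-0-10 and spark-avro packages must be on the classpath:

```python
# Minimal sketch of the equivalent standalone pipeline (illustrative only).
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro

spark = SparkSession.builder.appName("kafka-avro-to-parquet").getOrCreate()

# Avro schema of the message payload (assumed). Plain Avro bodies are expected;
# Confluent Schema Registry wire-format framing would need to be stripped first.
EVENT_SCHEMA = """
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "payload", "type": "string"}
  ]
}
"""

# 1. Stream records from the Kafka topic (steps 2-4 above).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "earliest")
    .load()
)

# 2. Decode the Avro-encoded value column into typed fields.
events = raw.select(from_avro(raw.value, EVENT_SCHEMA).alias("event")).select("event.*")

# 3. Archive to HDFS as Parquet (steps 5-7). The file sink needs a checkpoint
#    location for fault-tolerant output.
query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs://name-node/archive/events")
    .option("checkpointLocation", "hdfs://name-node/checkpoints/events")
    .start()
)
query.awaitTermination()
```

In DataRow.io, the Kafka Reader and File Writer activities you configured above express this same read-decode-write flow visually.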


Note that you can easily substitute S3 or Azure Blob Storage for HDFS.
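Only the sink URI changes; the rest of the pipeline stays the same. As an illustrative sketch (the bucket, container, and account names are assumptions, and the matching Hadoop connector must be on the classpath):

```python
# Drop-in replacements for the hdfs:// path used in the File Writer / sketch above.
S3_PATH = "s3a://my-bucket/archive/events"  # needs the hadoop-aws connector plus AWS credentials
AZURE_BLOB_PATH = "wasbs://container@myaccount.blob.core.windows.net/archive/events"  # needs hadoop-azure
```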


DataRow.io | Big Data Integration | Try it here.
