
Archive Avro-formatted Kafka messages to HDFS as Parquet files

  • Writer: Nodir Yuldashev
  • Jun 9, 2018
  • 1 min read


DataRow.io includes a Kafka connector, so you can transform and ingest Kafka streams into multiple destinations in different formats within a single data pipeline. This post focuses on archiving Avro-formatted Kafka messages to HDFS in Parquet format. The steps below walk you through building such a simple, two-step pipeline.


1. First, open the DataRow Platform and create a new job, then modify the job settings.


[Screenshot: job settings]

2. Locate the Kafka Reader activity, as shown below.

[Screenshot: locating the Kafka Reader activity]

3. Drag the Kafka Reader activity into the designer and click on the settings icon.

[Screenshot: Kafka Reader activity in the designer]

4. Enter the Kafka server/broker details, including the Kafka topics (the code sketch after step 8 shows the equivalent settings).

[Screenshots: Kafka Reader settings — servers/brokers and topics]

5. In the toolbar, locate the File Writer activity, as shown below.

[Screenshot: File Writer activity in the toolbar]

6. Drag the File Writer activity into the designer as a sub-group of the Kafka Reader activity, then click on the settings icon.

[Screenshot: File Writer activity nested under the Kafka Reader activity]


7. Enter the target HDFS path and format. The path should follow the hdfs://<name-node>/<path> pattern, for example hdfs://namenode:8020/archive/events.

[Screenshot: File Writer settings — HDFS path and format]

8. Run the job.
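
For readers who want to see what this two-step pipeline amounts to in code, below is a minimal sketch of an equivalent job written directly in Spark Structured Streaming (run with the spark-sql-kafka-0-10 and spark-avro packages on the classpath). The broker addresses, topic name, Avro schema, and HDFS paths are illustrative placeholders, and the sketch assumes the Kafka message values are plain Avro; in DataRow.io all of this is configured through the UI in steps 4 and 7 above.

    # Minimal sketch: read Avro-encoded Kafka messages and archive them to
    # HDFS as Parquet. All names below are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.avro.functions import from_avro

    spark = SparkSession.builder.appName("kafka-avro-to-parquet").getOrCreate()

    # Hypothetical Avro schema of the Kafka message value.
    value_schema = """
    {
      "type": "record", "name": "Event",
      "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": "long"}
      ]
    }
    """

    # Steps 2-4 equivalent: the Kafka reader.
    raw = (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
                .option("subscribe", "events")  # Kafka topic(s)
                .load())

    # Decode the Avro payload into columns.
    events = (raw.select(from_avro(raw.value, value_schema).alias("event"))
                 .select("event.*"))

    # Steps 5-8 equivalent: the file writer, targeting hdfs://<name-node>/<path>.
    query = (events.writeStream
                   .format("parquet")
                   .option("path", "hdfs://namenode:8020/archive/events")
                   .option("checkpointLocation", "hdfs://namenode:8020/checkpoints/events")
                   .start())

    query.awaitTermination()

If your producers use the Confluent Schema Registry wire format, each value carries a 5-byte header (magic byte plus schema ID) that must be stripped before from_avro can decode it.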


Note that you can easily substitute HDFS with S3 or Azure Blob Storage.
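
For example, in the hypothetical Spark sketch above, only the writer path would change; the bucket, container, and account names below are placeholders:

    # Same pipeline, different destination (illustrative paths):
    .option("path", "s3a://my-bucket/archive/events")                            # Amazon S3
    .option("path", "wasbs://container@myaccount.blob.core.windows.net/events")  # Azure Blob Storage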


DataRow.io | Big Data Integration | Try it here.


