Archive Avro-formatted Kafka messages to HDFS as Parquet files



DataRow.io includes a Kafka connector, so you can transform and ingest Kafka streams into multiple destinations in different formats within a single data pipeline. This post focuses on archiving Avro-formatted Kafka messages to HDFS in Parquet format. The steps below walk you through building this simple, two-step pipeline.
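The walkthrough assumes Avro-encoded messages are already flowing into a Kafka topic. For context, here is a minimal sketch of what producing such messages could look like. The broker address, topic name, and schema are illustrative assumptions, and fastavro plus kafka-python is just one way to do it:

```python
# Minimal sketch: produce schemaless Avro-encoded messages to a Kafka topic.
# The broker, topic, and schema below are assumptions, not part of DataRow.io.
import io

from fastavro import schemaless_writer
from kafka import KafkaProducer

# Hypothetical Avro schema for the message payload.
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "payload", "type": "string"},
    ],
}

producer = KafkaProducer(bootstrap_servers="localhost:9092")

for i in range(10):
    buf = io.BytesIO()
    # Schemaless encoding writes the raw Avro body with no container-file header.
    schemaless_writer(buf, EVENT_SCHEMA, {"id": i, "payload": f"event-{i}"})
    producer.send("events", value=buf.getvalue())

producer.flush()
```

With messages like these on the topic, the archive pipeline is built as follows.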


1. First, open the DataRow Platform and create a new job, then modify the job settings



2. Locate the Kafka Reader activity in the toolbar


3. Drag the Kafka Reader activity into the designer and click the settings icon


4. Enter the Kafka server/broker details, including the Kafka topics to read from



5. In the toolbar, locate the File Writer activity

6. Drag the File Writer activity into the designer as a sub-group of the Kafka Reader activity and click the settings icon



7. Enter the target HDFS path and the output format (Parquet). The path should follow the hdfs://<name-node>/<path> format (see the code sketch after these steps).


8. Run the job
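Under the hood, this kind of pipeline amounts to a streaming read from Kafka, an Avro decode, and a Parquet write. For intuition only, here is a minimal standalone sketch using Spark Structured Streaming. This is not DataRow.io's API; the broker addresses, topic name, schema, and paths are illustrative assumptions, and the spark-sql-kafka-0-10 and spark-avro packages must be on the classpath:

```python
# Minimal sketch of the equivalent standalone pipeline (illustrative only).
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro

spark = SparkSession.builder.appName("kafka-avro-to-parquet").getOrCreate()

# Avro schema of the message payload (assumed). Plain Avro bodies are expected;
# Confluent Schema Registry wire-format framing would need to be stripped first.
EVENT_SCHEMA = """
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "payload", "type": "string"}
  ]
}
"""

# 1. Stream records from the Kafka topic (steps 2-4 above).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "earliest")
    .load()
)

# 2. Decode the Avro-encoded value column into typed fields.
events = raw.select(from_avro(raw.value, EVENT_SCHEMA).alias("event")).select("event.*")

# 3. Archive to HDFS as Parquet (steps 5-7). The file sink needs a checkpoint
#    location for fault-tolerant output.
query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs://name-node/archive/events")
    .option("checkpointLocation", "hdfs://name-node/checkpoints/events")
    .start()
)
query.awaitTermination()
```

In DataRow.io, the Kafka Reader and File Writer activities you configured above express this same read-decode-write flow visually.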


Note that you can easily substitute S3 or Azure Blob Storage for HDFS.
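Only the sink URI changes; the rest of the pipeline stays the same. As an illustrative sketch (the bucket, container, and account names are assumptions, and the matching Hadoop connector must be on the classpath):

```python
# Drop-in replacements for the hdfs:// path used in the File Writer / sketch above.
S3_PATH = "s3a://my-bucket/archive/events"  # needs the hadoop-aws connector plus AWS credentials
AZURE_BLOB_PATH = "wasbs://container@myaccount.blob.core.windows.net/archive/events"  # needs hadoop-azure
```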


DataRow.io | Big Data Integration | Try it here.
