Write Avro formatted Kafka messages to Amazon S3 as Parquet file

Nodir Yuldashev
May 22, 2018

Updated: Jun 28, 2020

DataRow.io includes a powerful Kafka connector. You can transform Kafka streams and ingest them into multiple destinations, in different formats, within a single data pipeline. This post focuses on writing Avro-formatted Kafka messages to Amazon S3 in Parquet format. The steps below walk you through building a simple two-step pipeline that does this.

1. Open the DataRow.io Platform and create a new job. Then edit the job settings, including the AWS access key and secret key.

2. Locate the Kafka Reader activity.

3. Drag the Kafka Reader activity into the designer and click its settings icon.

4. Enter the Kafka server (broker) details and the Kafka topics to read from.

5. In the toolbar, locate the S3 Writer activity.

6. Drag the S3 Writer activity into the designer as a sub-group of the Kafka Reader activity, then click its settings icon.

7. Enter the target S3 path and the output format (Parquet in this case). The path should have the form s3[a]://<bucket-name>/<path>, that is, it starts with either s3:// or s3a://.

8. Run the job.
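
Before running the job, it can help to sanity-check the target path from step 7. The snippet below is a minimal sketch in Python, not part of DataRow.io: the function name parse_s3_path and the example bucket are hypothetical, and it only checks the s3[a]://<bucket-name>/<path> shape described above.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_path(path: str) -> Tuple[str, str, str]:
    """Split an s3:// or s3a:// path into (scheme, bucket, key prefix).

    Hypothetical helper for illustration; it is not a DataRow.io API.
    """
    parsed = urlparse(path)
    # Only the two schemes named in step 7 are accepted.
    if parsed.scheme not in ("s3", "s3a"):
        raise ValueError(f"expected an s3:// or s3a:// path, got {path!r}")
    # The part right after '//' is the bucket name.
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {path!r}")
    # Everything after the bucket is the key prefix, without the leading '/'.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # 'example-bucket' and 'kafka/events' are made-up values.
    print(parse_s3_path("s3a://example-bucket/kafka/events"))
```

A path that fails this check (for example, one missing the bucket name) is worth fixing before the job is started.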

Note that you can easily substitute HDFS or Azure Blob Storage for S3.
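
One way to picture this substitution is that only the scheme of the target path changes. The sketch below assumes Hadoop-style URI schemes (hdfs:// for HDFS, wasb:// and wasbs:// for Azure Blob Storage); the post does not state which path formats DataRow.io accepts for those targets, so the mapping and the storage_backend name are assumptions for illustration only.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to storage backend (Hadoop-style schemes).
BACKENDS = {
    "s3": "Amazon S3",
    "s3a": "Amazon S3",
    "hdfs": "HDFS",
    "wasb": "Azure Blob Storage",
    "wasbs": "Azure Blob Storage",
}


def storage_backend(target: str) -> str:
    """Return the backend name implied by the scheme of a target path."""
    scheme = urlparse(target).scheme.lower()
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported target scheme in {target!r}")
    return BACKENDS[scheme]


if __name__ == "__main__":
    # All hosts, containers, and paths here are made-up values.
    for target in (
        "s3a://example-bucket/kafka/events",
        "hdfs://namenode:8020/kafka/events",
        "wasbs://container@account.blob.core.windows.net/kafka/events",
    ):
        print(target, "->", storage_backend(target))
```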

—

DataRow.io | Big Data as a Service | Try it here.
