Write Avro-formatted Kafka messages to Amazon S3 as Parquet files
Updated: Jun 28, 2020
DataRow.io includes a powerful Kafka connector. You can transform Kafka streams and ingest them into multiple destinations, in different formats, within a single data pipeline. This post focuses on writing Avro-formatted Kafka messages to Amazon S3 in Parquet format. The steps below walk you through building a simple two-step pipeline that does this.
1. Open the DataRow.io Platform and create a new job. Then edit the job settings, including the AWS access key and secret key.
2. In the toolbar, locate the Kafka Reader activity.
3. Drag the Kafka Reader activity into the designer and click the settings icon.
4. Enter the Kafka server (broker) details and the Kafka topics to read.
5. In the toolbar, locate the S3 Writer activity.
6. Drag the S3 Writer activity into the designer as a sub-group of the Kafka Reader activity and click the settings icon.
7. Enter the target S3 path and the output format (Parquet, in this case). The path should have the form s3[a]://<bucket-name>/<path>, that is, it starts with either s3:// or s3a://.
8. Run the job.
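Before running the job, it can help to sanity-check the target path from step 7. The snippet below is a minimal sketch, not part of DataRow.io: the function name parse_s3_path is hypothetical, and the bucket-name rule (3 to 63 lowercase letters, digits, dots, or hyphens) is a simplified assumption rather than the full Amazon S3 naming rules.

```python
import re

# Matches s3://<bucket-name>/<path> and s3a://<bucket-name>/<path>.
# Assumption: simplified bucket-name rule (3-63 chars: lowercase letters,
# digits, dots, hyphens; must start and end with a letter or digit).
S3_PATH = re.compile(
    r"^s3a?://(?P<bucket>[a-z0-9][a-z0-9.-]{1,61}[a-z0-9])/(?P<path>.+)$"
)


def parse_s3_path(target):
    """Return (bucket, path) if target has the expected form, else raise ValueError."""
    match = S3_PATH.match(target)
    if match is None:
        raise ValueError(
            f"expected s3[a]://<bucket-name>/<path>, got {target!r}"
        )
    return match.group("bucket"), match.group("path")
```

For example, parse_s3_path("s3a://my-bucket/events/parquet") splits the path into its bucket and key prefix, while a value such as "s3a://my-bucket" (no path after the bucket) is rejected.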
Note that you can easily substitute HDFS or Azure Blob Storage for S3 as the destination.
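For readers who want to see what the two activities roughly correspond to in code, here is a sketch of a comparable pipeline in Spark Structured Streaming (PySpark). It is not DataRow.io's implementation, and it rests on several assumptions: the function names, the application name, and the "earliest" starting offset are illustrative; the messages are plain Avro binary (not the Confluent Schema Registry wire format); and the Spark session has the spark-sql-kafka and spark-avro packages available.

```python
import json


def kafka_source_options(brokers, topics):
    """Build options for Spark's Kafka source from broker and topic lists."""
    return {
        "kafka.bootstrap.servers": ",".join(brokers),
        "subscribe": ",".join(topics),
        # Assumption: read each topic from the beginning on the first run.
        "startingOffsets": "earliest",
    }


def run_pipeline(brokers, topics, avro_schema, target_path, checkpoint_path):
    """Read Avro messages from Kafka and write them to target_path as Parquet."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.avro.functions import from_avro

    spark = SparkSession.builder.appName("kafka-avro-to-parquet").getOrCreate()

    # Step 1 (reader): subscribe to the Kafka topics as a streaming source.
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(brokers, topics))
        .load()
    )

    # Decode the binary "value" column with the Avro schema (given as a dict)
    # and flatten the decoded struct into top-level columns.
    records = raw.select(
        from_avro(raw["value"], json.dumps(avro_schema)).alias("record")
    ).select("record.*")

    # Step 2 (writer): append Parquet files under the target path.
    return (
        records.writeStream.format("parquet")
        .option("path", target_path)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

In this sketch the destination is just the value of target_path, for example an s3a:// path, and the checkpoint location is required by Spark's file sink to track which Kafka offsets have already been written.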