Load Parquet file from Amazon S3 to Amazon Redshift

Updated: Jun 4, 2018

Loading large Apache Parquet files from Amazon S3 into Redshift is straightforward with DataRow.io. The steps below walk you through building a simple two-step pipeline.

1. First, open the DataRow Platform and create a new job. Then modify the job settings as shown below

2. Locate the S3 Reader activity as shown below

3. Drag the S3 Reader activity into the designer and click on the settings icon

4. Enter the Parquet file's location in the S3 bucket in s3://<bucket-name>/<path> format and select parquet as the format. Then press OK

5. In the toolbar, locate the Redshift Writer activity as shown below

6. Drag the Redshift Writer activity into the designer and click on the settings icon

7. Enter the Redshift location and authentication details. Note that a valid S3 path needs to be specified for storing intermediate data. The path should be in s3://<bucket-name>/<path> format.

8. Run the job
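
Steps 4 and 7 both expect a path in s3://<bucket-name>/<path> format, so it can help to check a path before pasting it into the activity settings. The snippet below is a minimal sketch in Python, not part of DataRow; the function name parse_s3_path and the example bucket and path are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_path(uri: str) -> Tuple[str, str]:
    """Split an s3://<bucket-name>/<path> string into (bucket, path).

    Hypothetical helper for illustration; raises ValueError when the
    string does not match the expected format.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// path, got: {uri!r}")
    bucket = parsed.netloc
    path = parsed.path.lstrip("/")
    if not bucket or not path:
        raise ValueError(f"bucket name or path is missing in: {uri!r}")
    return bucket, path


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    bucket, path = parse_s3_path("s3://my-bucket/data/events.parquet")
    print(bucket, path)
```

The same check applies to the intermediate-data path in step 7, since it uses the same format.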

Note that the S3 Reader supports other formats, such as ORC, as well. You can use any of the supported formats and get the same result.
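
For comparison, Redshift's native COPY command can also load Parquet and ORC data from S3 by switching the FORMAT AS clause. The sketch below only builds the SQL text and says nothing about how DataRow performs the load internally; the function name build_copy_statement, the table name, and the IAM role ARN are hypothetical placeholders.

```python
# Columnar formats covered by this sketch, mapped to Redshift COPY keywords.
SUPPORTED_FORMATS = {"parquet": "PARQUET", "orc": "ORC"}


def build_copy_statement(table: str, s3_path: str, iam_role: str,
                         fmt: str = "parquet") -> str:
    """Return a Redshift COPY statement for a Parquet or ORC source.

    Illustration only: inputs are interpolated directly into the SQL,
    so they are assumed to be trusted values, not user input.
    """
    key = fmt.lower()
    if key not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if not s3_path.startswith("s3://"):
        raise ValueError(f"expected an s3:// path, got: {s3_path!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS {SUPPORTED_FORMATS[key]};"
    )


if __name__ == "__main__":
    # Placeholder table, path, and role; replace with real values.
    print(build_copy_statement(
        "events",
        "s3://my-bucket/data/events/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        fmt="orc",
    ))
```

Only the format keyword changes between the two cases, which mirrors the point above that any supported format leads to the same result.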


