Copy Parquet file from Amazon S3 to Azure Blob Storage

Updated: Jun 4, 2018



DataRow.io is built for Big Data and the Cloud. When working with Big Data formats such as Apache Parquet, it's extremely useful for the system you are working with to be aware of the structure of the data format, for example partition files and partition columns. Yumi does exactly that, which enables it to run data transformations and re-partitioning along with data movement.
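To make the idea of format awareness a little more concrete: Parquet datasets are often stored in Hive-style directories, where folder names such as `year=2018` encode column values. The sketch below is only an illustration in plain Python, not DataRow code; the function name `partition_columns` and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def partition_columns(path: str) -> dict:
    """Extract Hive-style partition columns (key=value directories) from a file path."""
    columns = {}
    # Skip the last path component, which is the data file itself.
    for part in PurePosixPath(path).parts[:-1]:
        key, sep, value = part.partition("=")
        # Only directories of the form key=value count as partition columns.
        if sep and key:
            columns[key] = value
    return columns


# Hypothetical layout, used only for illustration.
example = "events/year=2018/month=06/part-00000.parquet"
print(partition_columns(example))
```

A tool that understands this layout can treat `year` and `month` as columns of the dataset instead of opaque folder names.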


In this example, we use only the data movement capability of Yumi: we copy an existing Parquet file from Amazon S3 to Azure Blob Storage.


1. First, open DataRow Platform and create a new job. Then modify the job settings as shown below.



2. Locate the S3 Reader activity as shown below.



3. Drag the S3 Reader activity into the designer and click on the settings icon.



4. Enter the location of the Parquet file in the S3 bucket, using the s3://<bucket-name>/<path> format, and select parquet as the format. Then press OK.


5. In the toolbar, locate the Blob Storage Writer activity as shown below.



6. Drag the Blob Storage Writer activity into the designer and click on the settings icon.



7. Enter the target Azure Blob Storage location, using the wasbs://<container>@<account>.blob.core.windows.net/<path> format, and select parquet as the format. Then press OK.


8. Run the job.
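The two location formats entered in steps 4 and 7 are easy to mistype, so it can help to assemble them from their parts. The following is a minimal Python sketch. The helper names `s3_uri` and `wasbs_uri` and the example bucket, container, and account names are hypothetical, and the validation is deliberately loose (it only rejects empty values and stray separators).

```python
def _require_plain(label: str, value: str) -> None:
    """Reject empty values and values containing URI separators."""
    if not value or any(ch in value for ch in "/@"):
        raise ValueError(f"{label} must be non-empty and contain no '/' or '@'")


def s3_uri(bucket: str, path: str) -> str:
    """Build a location in the s3://<bucket-name>/<path> format."""
    _require_plain("bucket", bucket)
    return f"s3://{bucket}/{path.lstrip('/')}"


def wasbs_uri(container: str, account: str, path: str) -> str:
    """Build a location in the wasbs://<container>@<account>.blob.core.windows.net/<path> format."""
    _require_plain("container", container)
    _require_plain("account", account)
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical names, used only for illustration.
source = s3_uri("my-bucket", "data/events.parquet")
target = wasbs_uri("mycontainer", "myaccount", "data/events.parquet")
print(source)
print(target)
```

The printed strings are what you would paste into the S3 Reader and Blob Storage Writer settings, respectively.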


Note that you can specify a different file format in the Blob Storage Writer settings, and DataRow will convert the format while copying the file.
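For readers who think in code, a copy with optional format conversion looks roughly like the sketch below when written against Apache Spark's DataFrame API. This is an assumption for illustration only: it does not describe how DataRow runs the job internally, the function name `copy_dataset` is hypothetical, and it presumes a `SparkSession` that already has the S3 and Azure connectors and credentials configured (which URI schemes are accepted depends on that setup).

```python
def copy_dataset(spark, source, target, source_format="parquet", target_format="parquet"):
    """Read `source` in one format and write it to `target`, possibly in another format.

    `spark` is expected to behave like a pyspark.sql.SparkSession.
    """
    # Read the source dataset using the reader for the given format.
    df = spark.read.format(source_format).load(source)
    # Write it out; choosing a different target_format converts the data on the way.
    df.write.format(target_format).mode("overwrite").save(target)


# Example call (requires a configured SparkSession named `spark`):
# copy_dataset(
#     spark,
#     "s3://<bucket-name>/<path>",
#     "wasbs://<container>@<account>.blob.core.windows.net/<path>",
#     target_format="csv",
# )
```

Keeping the source and target formats as separate parameters mirrors the walkthrough, where the format is selected independently in the reader and the writer settings.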






© 2019 by Bootstrap Intelligence LLC. All Rights Reserved.

Reston, Virginia  |  info@datarow.io  |   (202) 256 - 9439