
Copy Parquet file from Amazon S3 to Azure Blob Storage

  • Writer: Nodir Yuldashev
  • Apr 18, 2018
  • 1 min read

Updated: Jun 4, 2018



DataRow.io is built for Big Data and the Cloud. When working with Big Data formats such as Apache Parquet, it is extremely useful for the system you are working with to be aware of the structure of the data format, for example its partition files and partition columns. Yumi does exactly that, which enables it to run data transformations and re-partitioning along with data movement.
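
To illustrate what that format awareness means in practice, here is a minimal sketch in plain PySpark (not Yumi's or DataRow's own API, which is not shown in this post) that reads a partitioned Parquet dataset, prunes columns and partitions, and re-partitions it before writing. The bucket name, paths, and column names are placeholders.

    from pyspark.sql import SparkSession

    # Illustrative sketch only: bucket, paths, and column names are placeholders,
    # and this is plain PySpark rather than Yumi/DataRow's own API.
    spark = SparkSession.builder.appName("parquet-partition-demo").getOrCreate()

    # A format-aware engine can exploit the Parquet layout, reading only the
    # needed columns and only the partition files that match the filter.
    events = spark.read.parquet("s3a://<bucket-name>/events/")  # dataset partitioned by `dt`
    recent = events.select("dt", "user_id", "amount").where("dt >= '2018-01-01'")

    # Re-partitioning before the write controls how many Parquet part-files are produced.
    recent.repartition(8).write.mode("overwrite").parquet("s3a://<bucket-name>/events_recent/")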


In this example, we only use the data movement capability of Yumi to copy an existing Parquet file from Amazon S3 to Azure Blob Storage; a code-level sketch of the equivalent operation appears after the steps below.


1. First, open the DataRow Platform and create a new job. Then modify the job settings as shown below.



2. Locate the S3 Reader activity as shown below.



3. Drag the S3 Reader activity into the designer and click on the settings icon.



4. Enter the Parquet file location in the S3 bucket in the s3://<bucket-name>/<path> format and select Parquet as the format. Then press OK.



5. In the toolbar, locate the Blob Storage Writer activity as shown below.



6. Drag the Blob Storage Writer activity into the designer and click on the settings icon.



7. Enter the target Azure Blob Storage location in the wasbs://<container>@<account>.blob.core.windows.net/<path> format and select Parquet as the format. Then press OK.



8. Run the job.
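
For reference, here is roughly what this copy amounts to in plain PySpark. This is a sketch, not DataRow's or Yumi's own API; the bucket, container, and account names are the same placeholders as in steps 4 and 7, and the S3 and Azure credentials are assumed to be configured in the cluster's Hadoop settings.

    from pyspark.sql import SparkSession

    # Rough PySpark equivalent of the job above (not DataRow's own API).
    # Bucket, container, account, and paths are placeholders; s3a/wasbs credentials
    # are assumed to be configured through the usual Hadoop configuration keys.
    spark = SparkSession.builder.appName("s3-to-blob-copy").getOrCreate()

    source = "s3a://<bucket-name>/<path>"                                  # step 4 (Spark uses the s3a scheme)
    target = "wasbs://<container>@<account>.blob.core.windows.net/<path>"  # step 7

    df = spark.read.parquet(source)   # S3 Reader, format: parquet
    df.write.parquet(target)          # Blob Storage Writer, format: parquet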


Note that you could specify a different file format in the Blob Storage Writer settings, and DataRow would perform the format conversion as well while copying the file.
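
In PySpark terms, that kind of format-converting copy would look roughly like the sketch below, again with placeholder paths and plain PySpark rather than DataRow's own API; writing CSV instead of Parquet is just one possible target format.

    from pyspark.sql import SparkSession

    # Sketch of a format-converting copy: read Parquet from S3, write CSV to Blob Storage.
    # Placeholder paths; plain PySpark, not DataRow's own API.
    spark = SparkSession.builder.appName("s3-to-blob-convert").getOrCreate()

    df = spark.read.parquet("s3a://<bucket-name>/<path>")
    df.write.option("header", "true").csv(
        "wasbs://<container>@<account>.blob.core.windows.net/<converted-path>"
    )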



DataRow.io | Big Data as a Service | Try it here.



