Migrate DynamoDB Using TotalCloud

If you go looking, there are a few documented ways to migrate DynamoDB. One of the best sources is the AWS guide.

This guide can be used to:

a. Migrate DynamoDB from one region to another.

b. Import external data into DynamoDB from sources like S3.

c. Export DynamoDB data to S3.

All of these use cases, including migrating DynamoDB, are driven by two actions (see the sketch after this list):

  • Exporting DynamoDB data to AWS S3 using Data Pipeline.
  • Importing data from AWS S3 back into DynamoDB, again using Data Pipeline.
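
Under the hood, both actions come down to a handful of AWS Data Pipeline API calls. Here is a minimal boto3 sketch of that flow, with placeholder names and region; this is not TotalCloud's internal code, just the API sequence the workflows automate:

import boto3

# Assumed region and names; replace with your own.
dp = boto3.client("datapipeline", region_name="us-east-1")

# 1. Create an empty pipeline shell and note its ID.
pipeline = dp.create_pipeline(name="ddb-export", uniqueId="ddb-export-demo")
pipeline_id = pipeline["pipelineId"]

# 2. Attach a definition (objects and parameters like the JSON shown later).
# dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=[...],
#                            parameterObjects=[...], parameterValues=[...])

# 3. Activate it to start the EMR-backed export or import run.
# dp.activate_pipeline(pipelineId=pipeline_id)

The TotalCloud workflows below drive this same sequence through template nodes instead of hand-written scripts.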

The Amazon documentation provides a detailed description of how to leverage AWS Data Pipeline for these tasks. We decided to put TotalCloud to the test: recreate the Data Pipeline, activate it, and perform any of the above actions used to migrate DynamoDB.

Here are the templates you can view to understand how TotalCloud performs the same task you would otherwise have done manually.

First, we tried creating a Data Pipeline using the interface. Here’s how we did it.

As usual, it starts with a trigger node. Urgh! Scratch that. We created a template for you to get started.

Before we go there, there are some prerequisites to take care of.

Prerequisites

Roles

Two IAM roles are required to make this work.

Role Name 1: DataPipelineDefaultRole

Policy: AWSDataPipelineRole

Role Name 2: DataPipelineDefaultResourceRole

Policy: AmazonEC2RoleforDataPipelineRole

Amazon has documented the steps required to create these roles.
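
If you prefer to create the roles programmatically instead of through the console, a rough boto3 sketch along these lines should work. The trust policies shown are our assumption of what Data Pipeline and its EMR resources need; double-check them against the AWS docs.

import json
import boto3

iam = boto3.client("iam")

# Trust policy for the pipeline role (assumed by Data Pipeline and EMR).
pipeline_trust = {"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": ["datapipeline.amazonaws.com",
                              "elasticmapreduce.amazonaws.com"]},
    "Action": "sts:AssumeRole"}]}

# Trust policy for the resource role (assumed by the EC2 instances EMR launches).
ec2_trust = {"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"}]}

iam.create_role(RoleName="DataPipelineDefaultRole",
                AssumeRolePolicyDocument=json.dumps(pipeline_trust))
iam.attach_role_policy(
    RoleName="DataPipelineDefaultRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSDataPipelineRole")

iam.create_role(RoleName="DataPipelineDefaultResourceRole",
                AssumeRolePolicyDocument=json.dumps(ec2_trust))
iam.attach_role_policy(
    RoleName="DataPipelineDefaultResourceRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforDataPipelineRole")

# The resource role also needs an instance profile with the same name.
iam.create_instance_profile(InstanceProfileName="DataPipelineDefaultResourceRole")
iam.add_role_to_instance_profile(
    InstanceProfileName="DataPipelineDefaultResourceRole",
    RoleName="DataPipelineDefaultResourceRole")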

S3 Buckets

Create two buckets: one for the data to be exported/imported, and one for the Data Pipeline logs. We’ll be using these buckets while working with the Data Pipeline.
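
If you want to script this step as well, here is a quick boto3 sketch; the bucket names are illustrative placeholders, and bucket names must be globally unique.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# One bucket for the exported/imported data, one for Data Pipeline logs.
for bucket in ("my-ddb-export-data", "my-ddb-pipeline-logs"):
    s3.create_bucket(Bucket=bucket)
    # Outside us-east-1, also pass
    # CreateBucketConfiguration={"LocationConstraint": "<region>"}.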

Once you are set up, let’s create the workflow.

Select a template

When you create a new workflow, you will see an option to “Select a workflow template” or “Create workflow from scratch”.

Select the template option and click on “Next”. You should see a list of available templates. Go ahead and select “AWS DynamoDB to S3 exporter”.

This should give you a preset that you can use to quickly configure your Data Pipeline.

You can double-click any node to configure it. Start with:

Trigger

Double-click the trigger node to see its configuration and edit it as needed.

Action 1

Select the AWS account that you wish to use, along with the region in which you want to create the pipeline.

If you haven’t synced your AWS account, you can use the “Sync AWS Account” option and follow these instructions to do so.

Once done, click on “Save Node”.

Action 2

Again, select the AWS account and region. Then click on “Additional Parameters” to view the changes needed for this node. The node’s parameters look like this:

{

/*---------- required params ----------*/ 

    "pipelineId": "VALUE",


/*---------- optional params ----------*/

/*
 *   (Use the keyword MAP in place of a value if you want to
 *   autofill that value from previous data)
 */

 
    "pipelineObjects": [
        {
            "id": "Default",
            "name": "Default",		/* name can be autogenerated with timestamp */
            "fields": [
                {
                    "key": "failureAndRerunMode",
                    "stringValue": "CASCADE"
                },
                {
                    "key": "resourceRole",
                    "stringValue": "DataPipelineDefaultResourceRole"
                },
                {
                    "key": "role",
                    "stringValue": "DataPipelineDefaultRole"
                },
                {
                    "key": "pipelineLogUri",
                    "stringValue": ""     //S3 bucket for logs
                },
                {
                    "key": "scheduleType",
                    "stringValue": "ONDEMAND"
                }
            ]
        },
        {
            "id": "EmrClusterForBackup",
            "name": "EmrClusterForBackup",
            "fields": [
                {
                    "key": "role",
                    "stringValue": "DataPipelineDefaultRole"
                },
                {
                    "key": "coreInstanceCount",
                    "stringValue": "1"
                },
                {
                    "key": "coreInstanceType",
                    "stringValue": "m3.xlarge"
                },
                {
                    "key": "releaseLabel",
                    "stringValue": "emr-5.13.0"
                },
                {
                    "key": "masterInstanceType",
                    "stringValue": "m3.xlarge"
                },
                {
                    "key": "region",
                    "stringValue": ""    //pipeline region
                },
                {
                    "key": "type",
                    "stringValue": "EmrCluster"
                },
                {
                    "key": "terminateAfter",
                    "stringValue": "15 Minutes" // can be changed based on the size of the table. It takes 7 minutes to setup the EMR account that information as well while setting this time.
                }
            ]
        },
        {
            "id": "TableBackupActivity",
            "name": "TableBackupActivity",
            "fields": [
                {
                    "key": "output",
                    "refValue": "S3BackupLocation"
                },
                {
                    "key": "input",
                    "refValue": "DDBSourceTable"
                },
                {
                    "key": "maximumRetries",
                    "stringValue": "2"
                },
                {
                    "key": "step",
                    "stringValue": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"
                },
                {
                    "key": "runsOn",
                    "refValue": "EmrClusterForBackup"
                },
                {
                    "key": "type",
                    "stringValue": "EmrActivity"
                },
                {
                    "key": "resizeClusterBeforeRunning",
                    "stringValue": "true"
                }
            ]
        },
        {
            "id": "S3BackupLocation",
            "name": "S3BackupLocation",
            "fields": [
                {
                    "key": "directoryPath",
                    "stringValue": "/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"                       //S3 bucket for backup(before the /)
                },
                {
                    "key": "type",
                    "stringValue": "S3DataNode"
                }
            ]
        },
        {
            "id": "DDBSourceTable",
            "name": "DDBSourceTable",
            "fields": [
                {
                    "key": "readThroughputPercent",
                    "stringValue": "0.25"
                },
                {
                    "key": "type",
                    "stringValue": "DynamoDBDataNode"
                },
                {
                    "key": "tableName",
                    "stringValue": ""      //dynamoDB table name
                }
            ]
        }
    ],
    "parameterObjects": [
        {
            "id": "myDDBTableName",
            "attributes": [
                {
                    "key": "description",
                    "stringValue": "Source DynamoDB table name"
                },
                {
                    "key": "type",
                    "stringValue": "String"
                }
            ]
        },
        {
            "id": "myOutputS3Loc",
            "attributes": [
                {
                    "key": "description",
                    "stringValue": "Output S3 folder"
                },
                {
                    "key": "type",
                    "stringValue": "AWS::S3::ObjectKey"
                }
            ]
        },
        {
            "id": "myDDBReadThroughputRatio",
            "attributes": [
                {
                    "key": "default",
                    "stringValue": "0.25"
                },
                {
                    "key": "watermark",
                    "stringValue": "Enter value between 0.1-1.0"
                },
                {
                    "key": "description",
                    "stringValue": "DynamoDB read throughput ratio"
                },
                {
                    "key": "type",
                    "stringValue": "Double"
                }
            ]
        },
        {
            "id": "myDDBRegion",
            "attributes": [
                {
                    "key": "default",
                    "stringValue": ""  //pipeline region
                },
                {
                    "key": "watermark",
                    "stringValue": ""   //pipeline region
                },
                {
                    "key": "description",
                    "stringValue": "Region of the DynamoDB table"
                },
                {
                    "key": "type",
                    "stringValue": "String"
                }
            ]
        }
    ],
    "parameterValues": [
        {
            "id": "myDDBRegion",
            "stringValue": ""                                                           //pipeline region
        },
        {
            "id": "myDDBTableName",
            "stringValue": ""                                                          //dynamoDB table name
        },
        {
            "id": "myDDBReadThroughputRatio",
            "stringValue": "0.25"
        },
        {
            "id": "myOutputS3Loc",
            "stringValue": ""                                                  //S3 bucket for backup
        }
    ]
}

Add the appropriate values as mentioned in the prerequisites.

Click on “Apply Query” and then “Save Node”.
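
For reference, the parameters above follow the shape of Data Pipeline’s PutPipelineDefinition API. Here is a trimmed boto3 sketch of the equivalent call; the pipeline ID, bucket, and table values are placeholders.

import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

# Only a fragment of the definition is shown; the full pipelineObjects,
# parameterObjects and parameterValues mirror the JSON template above.
response = dp.put_pipeline_definition(
    pipelineId="df-EXAMPLE1234567",  # ID returned when the pipeline was created
    pipelineObjects=[
        {"id": "Default", "name": "Default", "fields": [
            {"key": "scheduleType", "stringValue": "ONDEMAND"},
            {"key": "pipelineLogUri", "stringValue": "s3://my-ddb-pipeline-logs/"},
        ]},
        # ... EmrClusterForBackup, TableBackupActivity, S3BackupLocation,
        # DDBSourceTable objects as in the template above ...
    ],
    parameterValues=[
        {"id": "myDDBRegion", "stringValue": "us-east-1"},
        {"id": "myDDBTableName", "stringValue": "my-table"},
        {"id": "myOutputS3Loc", "stringValue": "s3://my-ddb-export-data/"},
    ],
)

# The response flags any problems with the definition.
print(response.get("validationErrors"), response.get("validationWarnings"))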

Notification

Add the email address or Slack channel you want to be notified on once the workflow is complete.

Run the workflow

Click on “Save Node” and then on “Run Now”. This shows you how the workflow ran and what it did.

Next, save the workflow; this also checks whether you have the required permissions. Then enable the workflow using the toggle switch.

Just click on “Run Now” to run the workflow and create the pipeline.

Then deactivate the workflow so that it doesn’t run again.

Activate the data pipeline

Click on “Pick the template” from the editor menu.

Pick “Activate DynamoDB export pipeline” from the list and you will see the following template set up for you.

Trigger

You know the drill. Set it up as done earlier.

Resource

Select the same AWS account and region as done earlier.

Filter

This node filters for the Data Pipeline we created earlier. Leave it as it is.

Action

Select the same AWS account and region as done earlier.
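
This action corresponds to Data Pipeline’s ActivatePipeline API. A minimal boto3 sketch of that call is below; the pipeline ID is a placeholder, and in the workflow it would come from the pipeline selected by the filter node.

import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

# Activation kicks off the on-demand run: EMR cluster spin-up, then the export.
dp.activate_pipeline(pipelineId="df-EXAMPLE1234567")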

Notification

Set up an email or Slack notification as done earlier.

Run the workflow

Click on “Save Node” and then on “Run Now”. This shows you how the workflow ran and what it did.

Next, save the workflow; this also checks whether you have the required permissions. Then enable the workflow using the toggle switch.

Just click on “Run Now” to run the workflow and activate the pipeline we created earlier.

Then deactivate the workflow so that it doesn’t run again.

Now go to the AWS console to see the progress. It takes around 7 minutes for the EMR cluster to be set up, and then the transfer begins. Here is some more information on this from AWS.

AWS Console > Data Pipeline
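
If you would rather check progress from code than from the console, a boto3 sketch like this should work; the pipeline ID is a placeholder.

import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

desc = dp.describe_pipelines(pipelineIds=["df-EXAMPLE1234567"])
fields = desc["pipelineDescriptionList"][0]["fields"]

# "@pipelineState" reports the pipeline's current state (e.g. PENDING, FINISHED).
state = next(f["stringValue"] for f in fields if f["key"] == "@pipelineState")
print("Pipeline state:", state)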

Use cases for Import and Export DynamoDB

Templates are available for the following use cases:

Export DynamoDB to AWS S3

  1. Create, configure, and save the Data Pipeline (AWS DynamoDB to S3 exporter).
  2. Activate the pipeline (Activate DynamoDB export pipeline).

Import data from AWS S3 to DynamoDB

  1. Create, configure, and save the Data Pipeline (S3 to DynamoDB importer).
  2. Activate the pipeline (Activate DynamoDB Importer Pipeline).

TotalCloud is workflow-based cloud management for AWS. We are modularizing cloud management to make it accessible to everyone.

 

