Migrating Databases using DMS

This is a post about databases, Amazon Web Services (AWS), and most importantly, how to have full database migrations from Data Center Servers to Relational Database Service (RDS) ones.

The picture above describes what we will do in this post and which tools we will use. So, if your database, er, goldfish (let’s call him McMilian) needs a new home, read on to see how to seamlessly get him to fresh, new waters.

The Migration - A quick summary

First, let's define that a Database Migration Service (DMS) is a service that can be accessed from the AWS console to migrate data to and from widely used databases such as Amazon Aurora, Oracle, MySQL, and MSSQL, among others.

It can migrate data from homogeneous or heterogeneous services and either uses auto conversion or a manual conversion tool to format the data properly. It connects to the source database, reads the source data, formats the data for consumption by the target database, and loads the data into the target database.

The heavy lifting is done by the replication instance, which is just another AWS managed instance that you will need to have in order to do the migration. It needs to have sufficient storage and processing power to perform the tasks needed to migrate the information.

The next step is to specify the database endpoints in which both the source and target information can be either an EC2 instance, RDS DB instance, an on-premises database or a Docker container. After specifying the endpoints for the migration, the next step is to create a task which can specify which data to migrate, map data using target schema, or even create new data on the target database.

The final step is to monitor the tasks that are operating. AWS provides different services to monitor the information, and at a minimum, it provides a statistical table for the task running. This is a high-level overview of the basic steps.

The map for all the components needed by the AWS DMS is explained in detail in the picture below, where we will assume that McMilian the fish’s will to move to the next bowl by jumping out of them is the technology that AWS uses in the process of migrating the data from all the source and target databases.

If you prefer you can also use the image AWS provides, which is basically the same (but I think watching McMilians’s travels is a bit more fun):

Sounds easy right? Let's do it.

There must be a better way

I actually ended up creating this process because there was a “crazy” requirement from some developers and product owners that needed the dev and uat databases with almost real-time data, coming from multiple production databases. This was only possible by creating backups and then restoring them, which took around 2 - 3 hours.

In the meantime the live databases were overwhelmed by the backing up process. In the end, this process was done every Sunday morning in an automated way by using AWS Lambda and AWS CloudWatch Rules, and everyone was satisfied, but I knew there was a better way to do it.

There are not real difficulties in the process of doing this, it is really straightforward, you just need to pay attention to the parameters you set up on the endpoints and replication instances and the tasks themselves.

Some common issues you will find while working with DMS, is that you need to understand and have full control and visibility of the underlying infrastructure where your replication instance(s) and your source and target databases are.

By underlying infrastructure, I mean the network connectivity between the servers involved in the process. The connectivity between the source, the replication instances, and the target instances should be enabled by Security Groups or other types of firewalls you have in place on your datacenter.

Step by step migration

1 - STEP ONE: CREATE THE REPLICATION INSTANCE

I recommend to start with an dms.t2.micro instance type, which is included in the Free tier, you also need to set up the VPC where you will be launching the instance and the space you might need to have for this instance to work; Availability zone, KMS, and security groups can also be set. Once you have the parameters, create your instance and wait until the status changes to available.

2 - STEP TWO: CREATE THE SOURCE ENDPOINT

The source endpoint is the connection to the source. It is where the data you need to migrate, move, copy, etc., lives. Fill out the information and then click Test Connection.

Once the connection is successfully tested you are almost ready to go. The server name can be either an IP or DNS name. If it’s an RDS instance, please ensure that the source database has enabled the connectivity from the replication instance.

3 - STEP THREE: CREATE THE TARGET ENDPOINT

The target endpoint is the connection to the target OR, where the data you need to migrate will live. Fill out the information and then click Test Connection, once the connection is successfully tested, we are ready to set up the task that will do the job. The server name can be either an IP or DNS name. If it’s an RDS instance, please ensure that the target database has enabled the connectivity from the replication instance.

4 - STEP FOUR: CREATE THE TASK

This is the most important part of it all, as it is the one that triggers and runs the migration. You will need to specify a name, the replication instance you will use, the source, the target, and how you want to perform the migration.

You have three types of migrations which are actually very easy to understand once you read the type of migration. What is truly amazing is that you can have a real live replica of your database being copied in real time from the source db to the target db, with no extra steps or any other configuration. That’s the true power of technology and how this facilitates and simplifies the job we do – simply superb!

You just need to specify what type of migration you need, let me sum it up for you:

Migrating Existing Data – This process creates files or tables in the target database, automatically defines the metadata that is required at the target, and populates the tables with data from the source. The data from the tables is loaded in parallel for improved efficiency.

Migrating Existing Data and Replicating Ongoing Changes – This process captures changes to the source database that occur while the data is being migrated from the source to the target. When the migration of the originally requested data has completed, the change data capture (CDC) process then applies the captured changes to the target database. Changes are captured and applied as units of single committed transactions, and several different target tables can be updated as a single source commit. This approach guarantees transactional integrity in the target database.

Replicating Data Changes Only – This process reads the recovery log file of the source database management system (DBMS) and groups the entries for each transaction together. If AWS DMS can’t apply changes to the target within a reasonable time (for example, if the target is not accessible), AWS DMS buffers the changes on the replication server for as long as necessary. It doesn’t reread the source DBMS logs, which can take a long time.

So, just select the one you need and you are all set. In this case, we are using the first one as the target database is empty and we only need to have the data that is currently in the live database. We don’t need to have the live replication – for the moment.

The last part of the task is to actually describe how we will be doing the migration of the schemas and tables, the first drop down menu will show you the existing schemas on the source database and it will assume that you don’t need to change the names of it, or any other tables. If you do, you can simply add a transformation rule and specify where you need the changes. On here, you can also specify which tables you need to migrate. If you need everything, leave the “%” and the Action “Include”. That should be it. You hit create task and wait until the migration is complete.

Mission accomplished

Voilá! It’s been a long road to get here, but hopefully you now have a fully replicated database running on whatever you have your database hosted on.

You did this by using the power of AWS and the DMS service. When used wisely, you can use DMS for many other tasks and be sure that developers and ops will love you forever (at least make them smile).

Plus, you can sleep soundly knowing McMilian arrived in his new home!

More information can be found in here: AWS DMS Webpage