In this post, I will explain a demo that migrates an AWS Aurora database to TiDB Cloud with minimal manual work, using the AWS SDK and the TiDB Cloud OpenAPI. The demo covers AWS resource creation, full data migration, and incremental replication.
- AWS Aurora cluster setup with the Go SDK: Create the Aurora cluster according to the provided config file.
- EC2 node generation for the workstation and DM cluster: Generate EC2 nodes for the workstation and the DM cluster. The workstation runs TiUP to deploy the DM cluster, and is also used for table creation, data generation, and data comparison. The DM cluster replicates data from AWS Aurora to TiDB Cloud.
- TiDB Cloud setup: The TiDB Cloud cluster is created through the TiDB Cloud API. Refer to the TiDB Cloud OpenAPI documentation and the TiDB Cloud Go SDK for details.
- Private link setup between TiDB Cloud and the workstation/DM cluster (interactive operation): The private link between the workstation/DM cluster and TiDB Cloud has to be set up interactively, because TiDB Cloud does not yet provide an API for it.
- Test table creation and test data generation: Create one table and insert a few rows for testing.
- Binlog position capture from AWS Aurora: Record the binlog position before taking the snapshot; the DM replication task is created from this position.
- AWS Aurora snapshot: Take an Aurora snapshot so that its data can be exported to S3.
- Snapshot export to S3 as Parquet: Export the snapshot to S3 in Parquet format, which TiDB Cloud can import.
- Data import to TiDB Cloud: TiDB Cloud provides a data import API, which makes data integration easier.
- DM setup for replication: The TiDB Cloud API does not support DM yet, so to keep the demo simple, the DM cluster is deployed on AWS EC2 nodes.
- Data comparison between TiDB Cloud and Aurora: sync-diff-inspector compares the data in Aurora and TiDB Cloud to verify that the migration succeeded.
pi@local$ more /tmp/aurora2tidbcloud.yaml
keyname: key-name # public key name
keyfile: /home/pi/.ssh/private-key-name # private key file
s3backup_folder: s3://jay-data/aurora-export/ # s3 directory for data export
# debian os
imageid: ami-07d02ee1eeb0c996c # Default image id for EC2
keyname: jay-us-east-01 # Public key to access the EC2 instance
keyfile: /home/pi/.ssh/jay-us-east-01.pem # Private key to access the EC2 instance
cidr: 18.104.22.168/16 # The cidr for the VPC
instance_type: m5.2xlarge # Default instance type
tidb_version: v6.5.2 # TiDB version
excluded_az: # The AZ to be excluded for the subnets
instance_type: t2.small # Instance type for DM master
count: 1 # Number of DM master nodes to deploy
instance_type: t2.small # Instance type for DM worker
count: 1 # Number of DM worker nodes to deploy
tidbcloud_project_id: 1111113089206752222 # The ID of the TiDB Cloud project in which the TiDB cluster is created
description: Data migration from aurora to TiDB Test
pi@local$ ./bin/aws aurora2tidbcloud deploy aurora2tidbcloud /tmp/aurora2tidbcloud.yaml
Parallel Main step ... Echo: Create TransitGateway ... ...
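The config file drives every later step, so it is worth checking it before any resource is created. Below is a minimal Go sketch of such a check, assuming the YAML has already been decoded into a struct; the `DeployConfig` type, its field names, and `splitS3Folder` are hypothetical, and the values in `main` are modeled on the sample config (the CIDR is a placeholder). Splitting `s3backup_folder` into a bucket and a prefix is included because the RDS snapshot export API (`StartExportTask`) takes the S3 bucket name and prefix as separate parameters.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// DeployConfig mirrors a subset of aurora2tidbcloud.yaml.
// Field names are hypothetical; the demo's own struct may differ.
type DeployConfig struct {
	KeyName          string
	KeyFile          string
	S3BackupFolder   string // e.g. s3://jay-data/aurora-export/
	ImageID          string
	CIDR             string
	TiDBVersion      string
	DMMasterCount    int
	DMWorkerCount    int
	TiDBCloudProject string
}

// splitS3Folder turns s3://bucket/prefix/ into the bucket name and key prefix.
func splitS3Folder(folder string) (bucket, prefix string, err error) {
	u, err := url.Parse(folder)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "s3" || u.Host == "" {
		return "", "", fmt.Errorf("s3backup_folder %q is not an s3://bucket/... URL", folder)
	}
	return u.Host, strings.Trim(u.Path, "/"), nil
}

// Validate returns the first problem found, before anything is deployed.
func (c DeployConfig) Validate() error {
	if c.KeyName == "" || c.KeyFile == "" {
		return errors.New("keyname and keyfile are required")
	}
	if _, _, err := net.ParseCIDR(c.CIDR); err != nil {
		return fmt.Errorf("invalid cidr: %w", err)
	}
	if _, _, err := splitS3Folder(c.S3BackupFolder); err != nil {
		return err
	}
	if c.DMMasterCount < 1 || c.DMWorkerCount < 1 {
		return errors.New("at least one DM master and one DM worker are required")
	}
	if c.TiDBCloudProject == "" {
		return errors.New("tidbcloud_project_id is required")
	}
	return nil
}

func main() {
	cfg := DeployConfig{
		KeyName:          "jay-us-east-01",
		KeyFile:          "/home/pi/.ssh/jay-us-east-01.pem",
		S3BackupFolder:   "s3://jay-data/aurora-export/",
		ImageID:          "ami-07d02ee1eeb0c996c",
		CIDR:             "10.0.0.0/16", // placeholder
		TiDBVersion:      "v6.5.2",
		DMMasterCount:    1,
		DMWorkerCount:    1,
		TiDBCloudProject: "1111113089206752222",
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	bucket, prefix, _ := splitS3Folder(cfg.S3BackupFolder)
	fmt.Printf("export bucket=%s prefix=%s\n", bucket, prefix)
}
```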
Private Link Setup between TiDB Cloud and the Workstation
Once the workstation and TiDB Cloud are set up, the prompt asks for the endpoint service provided by TiDB Cloud. Switch to TiDB Cloud to get this information.
Go to the TiDB Cloud console to check that the TiDB cluster has been created, as shown in the picture below.
The CLI command has already been included in the script, so there is no need to run it yourself.
Repeat the same process to set up the private link between TiDB Cloud and the DM cluster VPC.
Enter the private link connection host.
The execution time of each step is shown after the demo is set up.
Confirm that the data has been copied to TiDB Cloud and that DM replication is working.
What can we use this script for?
First, it sets up the demo quickly: with little manual effort, the whole setup completes within 90 minutes.
Second, a tool to migrate Aurora to TiDB is planned on top of this demo. In the future, we aim to complete the whole data migration with a single command.
What do I expect from the OpenAPI? With the following APIs, the script could be improved much more easily:
- TiDB Cloud API to fetch private endpoint service
- TiDB Cloud API to accept the private endpoint connection
- TiDB Cloud API to fetch private endpoint service host name
- TiDB Cloud API to support DM
- TiDB Cloud API to support TiCDC