Zero-Downtime MySQL to Aurora Migration: An Enterprise-Grade AWS DMS Runbook



Modernizing databases is rarely about “just moving data.” It is about consistency, safety, rollback readiness, and zero downtime.

In this post, I walk through a production-proven MySQL → Amazon Aurora MySQL migration using AWS Database Migration Service (DMS) with Full Load + Change Data Capture (CDC).

This guide is based on real-world enterprise migrations and focuses on what actually breaks in production—and how to prevent it.

Components

    • Source: MySQL (on-premises or EC2)
    • Target: Amazon Aurora MySQL (private subnets)
    • Migration Engine: AWS DMS (Full Load + CDC)
    • Networking: VPC, private subnets, NAT Gateway
    • Access: Bastion host for controlled administration
    • Observability: CloudWatch logs and metrics

Why This Architecture Works

    • No application downtime during bulk load
    • Continuous replication via binlogs
    • Controlled, reversible cutover
    • Safe rollback at every stage

Migration Checklist for Full Load + CDC

1. Pre‑Migration Checks (SOURCE MySQL):
    1.1: Version & Binlog Validation
    1.2: GTID Validation
    1.3: Timeout Configuration
    1.4: Charset & Collation Validation
    1.5: Validate Unsupported and Partially Supported Data Types
    1.6: Replication User
    1.7: Schema Backup (No Deferred Objects)
2. Target Environment (Amazon Aurora MySQL):
    2.1: Create Subnet Group & Security Groups
    2.2: Create Custom Cluster Parameter Group
    2.3: Create Aurora Cluster
    2.4: Deploy a Bastion/Jump Host
    2.5: Target Schema Creation
    2.6: Remove Foreign Key Constraints
3. AWS DMS Configuration:
    3.1: Replication Instance
    3.2: Task Settings
    3.3: Table Mappings
    3.4: Task Creation
    3.5: Pre-Migration Assessment
4. Full Load + CDC Execution
5. Cutover Procedure:
    5.1: CDC Drain Verification
    5.2: Freeze Writes
    5.3: Row Count Validation
    5.4: Rollback Safety
    5.5: Cutover Point
6. Recreate Deferred Objects:
    6.1: Recreate Stored Procedures, Functions, Events, Triggers and Views
    6.2: Recreate Foreign Keys
7. Rollback Strategy
8. Start Application Traffic
9. Post‑Go‑Live Validation

An Enterprise-Grade, Zero-Downtime Migration Runbook

1.1: Version & Binlog Validation

SELECT VERSION();
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES WHERE Variable_name IN ('binlog_format','binlog_row_image','server_id');
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

Required values
log_bin            = ON
binlog_format      = ROW
binlog_row_image   = FULL
server_id          != 0

Ensure binlog retention is greater than the expected migration + CDC duration to avoid binlog purge during DMS replication.
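The retention math is easy to get wrong under pressure, so it can be scripted as a quick sanity check. A minimal bash sketch; the 7-day retention and 48-hour window below are illustrative values, not recommendations:

```shell
# Return success when binlog retention (in seconds) covers the planned
# full load + CDC window plus a safety margin (both in hours).
binlog_retention_ok() {
  local retention_s=$1 window_h=$2 margin_h=${3:-24}
  [ "$retention_s" -ge $(( (window_h + margin_h) * 3600 )) ]
}

# Example: 7-day retention vs a 48-hour migration window + 24h margin
if binlog_retention_ok 604800 48 24; then
  echo "retention OK"
else
  echo "increase binlog_expire_logs_seconds"
fi
```

Feed it the value of binlog_expire_logs_seconds from the query above.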

1.2: GTID Validation

SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';

GTID ON or OFF is supported, but must be consistent.
GTID mode must remain unchanged for the entire duration of Full Load + CDC.
Do NOT enable or disable GTID while DMS is running.

1.3: Timeout Configuration

SHOW VARIABLES WHERE Variable_name IN('net_read_timeout','net_write_timeout','wait_timeout');
SET GLOBAL net_read_timeout = 600;
SET GLOBAL net_write_timeout = 600;
SET GLOBAL wait_timeout = 600;   -- the MySQL default is also acceptable

1.4: Charset & Collation Validation

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME IN ('your_db1','your_db2');

Aurora MySQL 8.0 defaults differ from many MySQL 5.7 sources. Ensure Aurora parameter group matches (e.g. utf8mb4, not latin1).

SQL_MODE comparison (source vs target)
SELECT @@sql_mode;

Compare SQL_MODE between source and Aurora. Differences may affect triggers, functions, or strict inserts.

If sql_mode differs, explicitly align Aurora to source before loading data, especially flags like:
STRICT_TRANS_TABLES
NO_ZERO_DATE
ONLY_FULL_GROUP_BY
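A quick way to spot sql_mode drift is to diff the two comma-separated lists. A small bash sketch; the mode strings below are illustrative, paste in the real @@sql_mode values from each side:

```shell
# Print sql_mode flags that are present on only one side (source vs target).
sql_mode_diff() {
  comm -3 <(tr ',' '\n' <<<"$1" | sort) <(tr ',' '\n' <<<"$2" | sort)
}

SRC_MODE="STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ENGINE_SUBSTITUTION"
TGT_MODE="STRICT_TRANS_TABLES,ONLY_FULL_GROUP_BY,NO_ENGINE_SUBSTITUTION"
sql_mode_diff "$SRC_MODE" "$TGT_MODE"
```

An empty result means the two sides agree; any printed flag exists on one side only and needs a decision before load.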

1.5: Validate Unsupported and Partially Supported Data Types

AWS DMS pre-migration assessment may report certain MySQL data types as PARTIALLY SUPPORTED.

These do not automatically block migration, but they must be reviewed and validated to ensure application compatibility after cutover.

Data types commonly flagged by AWS DMS (MySQL → Aurora)

Data Type             Support Level             Notes
-----------------------------------------------------------------------------------------------
ENUM                  Partially supported       Stored as strings; app logic must be validated
SET                   Partially supported       Bitmask behavior differs
GEOMETRY / spatial    Partially supported       Requires spatial function validation
JSON                  Supported                 Version compatibility matters
BIT                   Supported                 Casting issues possible
YEAR                  Supported                 Formatting differences

Inspect source MySQL existing data types:

SELECT
  TABLE_SCHEMA,
  TABLE_NAME,
  COLUMN_NAME,
  DATA_TYPE,
  COLUMN_TYPE
FROM information_schema.COLUMNS
WHERE DATA_TYPE IN (
  'enum','set','geometry','point','linestring','polygon',
  'multipoint','multilinestring','multipolygon',
  'geometrycollection','json','bit','year'
)
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME;

1.6: Replication User

CREATE USER 'dms_user'@'%' IDENTIFIED BY 'StrongPassword123!';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT, SHOW VIEW ON *.* TO 'dms_user'@'%';
FLUSH PRIVILEGES;

1.7: Schema Backup (No Deferred Objects)

Take a DDL-only backup of the schemas being migrated, excluding Stored Procedures, Functions, Events, Triggers and Views.

mysqldump -u root -p db1 --no-data --skip-routines --skip-triggers --skip-events > /path/to/backup/db1_schema.sql
mysqldump -u root -p db2 --no-data --skip-routines --skip-triggers --skip-events > /path/to/backup/db2_schema.sql

2.1: Create Subnet Group & Security Groups

Create DB Subnet Group:

aws rds create-db-subnet-group \
  --db-subnet-group-name aurora-prod-subnet-group \
  --db-subnet-group-description "Aurora production subnet group" \
  --subnet-ids subnet-aaaaaaa subnet-bbbbbbb

Verify the Subnet Groups:

aws rds describe-db-subnet-groups \
  --db-subnet-group-name aurora-prod-subnet-group

Create Security Group: (if not exists already)

Here we require four Security Groups (recommended, least-privilege)

SG    Used by                     Purpose                                   Inbound rules
--------------------------------------------------------------------------------------------------------------------
SG-A  Bastion host                SSH access from your IP                   Port:22       Source:YOUR_PUBLIC_IP/32
SG-B  Aurora cluster              MySQL access from Bastion, DMS, App       Port:3306     Source:SG-A, SG-C, SG-D
SG-C  DMS replication instance    MySQL access to source + target           Not required
SG-D  Application (optional)      App → Aurora traffic                      Not required

Create Bastion SG (SG-A):

BASTION_SG=$(aws ec2 create-security-group \
  --group-name sgs-bastion-ssh \
  --description "Bastion SSH access" \
  --vpc-id vpc-xxxx \
  --query GroupId --output text)

aws ec2 authorize-security-group-ingress \
  --group-id $BASTION_SG \
  --protocol tcp \
  --port 22 \
  --cidr YOUR_PUBLIC_IP/32

Create Aurora SG (SG-B):

AURORA_SG=$(aws ec2 create-security-group \
  --group-name sgs-aurora-mysql \
  --description "Aurora MySQL access" \
  --vpc-id vpc-xxxx \
  --query GroupId --output text)

Allow Bastion → Aurora

aws ec2 authorize-security-group-ingress \
  --group-id $AURORA_SG \
  --protocol tcp \
  --port 3306 \
  --source-group $BASTION_SG

Create DMS SG (SG-C):

DMS_SG=$(aws ec2 create-security-group \
  --group-name sgs-dms-mysql \
  --description "DMS replication access" \
  --vpc-id vpc-xxxx \
  --query GroupId --output text)

Allow DMS → Aurora

aws ec2 authorize-security-group-ingress \
  --group-id $AURORA_SG \
  --protocol tcp \
  --port 3306 \
  --source-group $DMS_SG

(Optional) Create App SG (SG-D):

APP_SG=$(aws ec2 create-security-group \
  --group-name sgs-app \
  --description "Application outbound to Aurora" \
  --vpc-id vpc-xxxx \
  --query GroupId --output text)

aws ec2 authorize-security-group-ingress \
  --group-id $AURORA_SG \
  --protocol tcp \
  --port 3306 \
  --source-group $APP_SG

2.2: Create Custom Cluster Parameter Group

aws rds create-db-cluster-parameter-group \
 --db-cluster-parameter-group-name aurora-mysql-prod-pg \
 --db-parameter-group-family aurora-mysql8.0 \
 --description "Aurora MySQL prod parameters"

aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name aurora-mysql-prod-pg \
  --parameters \
    "ParameterName=character_set_server,ParameterValue=utf8mb4,ApplyMethod=pending-reboot" \
    "ParameterName=collation_server,ParameterValue=utf8mb4_0900_ai_ci,ApplyMethod=pending-reboot" \
    "ParameterName=time_zone,ParameterValue=UTC,ApplyMethod=pending-reboot" \
    "ParameterName=max_allowed_packet,ParameterValue=67108864,ApplyMethod=immediate" \
    "ParameterName=log_bin_trust_function_creators,ParameterValue=1,ApplyMethod=immediate" \
    "ParameterName=event_scheduler,ParameterValue=ON,ApplyMethod=immediate" \
    "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=pending-reboot"

Verify parameter values

aws rds describe-db-cluster-parameters \
  --db-cluster-parameter-group-name aurora-mysql-prod-pg \
  --query "Parameters[?
    ParameterName=='character_set_server' ||
    ParameterName=='collation_server' ||
    ParameterName=='time_zone' ||
    ParameterName=='max_allowed_packet' ||
    ParameterName=='binlog_format' ||
    ParameterName=='log_bin_trust_function_creators' ||
    ParameterName=='event_scheduler'
  ].[ParameterName,ParameterValue,ApplyMethod,IsModifiable]" \
  --output table

2.3: Create Aurora Cluster

Create Aurora cluster:

aws rds create-db-cluster \
  --db-cluster-identifier aurora-mysql-cluster \
  --engine aurora-mysql \
  --engine-version 8.0.mysql_aurora.3.04.0 \
  --db-cluster-parameter-group-name aurora-mysql-prod-pg \
  --master-username admin \
  --master-user-password 'StrongPassword123!' \
  --database-name appdb \
  --db-subnet-group-name aurora-prod-subnet-group \
  --vpc-security-group-ids $AURORA_SG \
  --backup-retention-period 7 \
  --preferred-backup-window 02:00-03:00 \
  --preferred-maintenance-window sun:04:00-sun:05:00 \
  --storage-encrypted \
  --deletion-protection \
  --enable-cloudwatch-logs-exports error general slowquery

Add Instances:

aws rds create-db-instance \
  --db-instance-identifier aurora-mysql-writer-1 \
  --db-instance-class db.t4g.medium \
  --engine aurora-mysql \
  --db-cluster-identifier aurora-mysql-cluster \
  --no-publicly-accessible

aws rds create-db-instance \
  --db-instance-identifier aurora-mysql-reader-1 \
  --db-instance-class db.t4g.medium \
  --engine aurora-mysql \
  --db-cluster-identifier aurora-mysql-cluster \
  --no-publicly-accessible

Get the Aurora cluster endpoint; this is the target server name DMS needs:

aws rds describe-db-clusters \
  --db-cluster-identifier aurora-mysql-cluster \
  --query 'DBClusters[0].Endpoint'

Reboot the Aurora instances so the parameter changes made earlier take effect.

aws rds reboot-db-instance \
  --db-instance-identifier aurora-mysql-writer-1

aws rds reboot-db-instance \
  --db-instance-identifier aurora-mysql-reader-1

2.4: Deploy a Bastion/Jump Host

Launch Bastion Host (Public Subnet):

aws ec2 run-instances \
  --image-id ami-00d8fc944fb171e29 \
  --instance-type t3.micro \
  --key-name ec2-key \
  --security-group-ids $BASTION_SG \
  --subnet-id subnet-xxxx \
  --associate-public-ip-address \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=bastion-host}]'

aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=bastion-host" \
  --query 'Reservations[*].Instances[*].PublicIpAddress' \
  --output text
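With the bastion's public IP in hand, desktop tools can reach the private Aurora endpoint over an SSH tunnel. A sketch that only prints the tunnel command; the IP, key file, and endpoint are placeholders:

```shell
# Build the SSH port-forward command: local port 3307 -> Aurora 3306 via the bastion.
BASTION_IP="203.0.113.10"   # placeholder: output of describe-instances above
AURORA_EP="aurora-mysql-cluster.cluster-xxxx.region.rds.amazonaws.com"  # placeholder

echo "ssh -i ec2-key.pem -N -L 3307:${AURORA_EP}:3306 ec2-user@${BASTION_IP}"
# once the tunnel is up, connect locally: mysql -h 127.0.0.1 -P 3307 -u admin -p
```

-N keeps the SSH session tunnel-only; point DBeaver or MySQL Workbench at 127.0.0.1:3307.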

2.5: Target Schema Creation

Connect to the Aurora cluster with a MySQL client (mysql CLI, DBeaver, or MySQL Workbench), tunneling through the bastion host.
Create each database and run the DDL backups taken earlier.

CREATE DATABASE db1;
USE db1;
SOURCE /path/to/backup/db1_schema.sql

2.6: Remove Foreign Key Constraints

Generate DROP FOREIGN KEY statements for all tables in the target Aurora schema:

mysql -uadmin -pPassword -h target-aurora

SELECT
  CONCAT(
    'ALTER TABLE `', kcu.TABLE_SCHEMA, '`.`', kcu.TABLE_NAME,
    '` DROP FOREIGN KEY `', kcu.CONSTRAINT_NAME, '`;'
  ) AS drop_fk_sql
FROM information_schema.KEY_COLUMN_USAGE kcu
WHERE kcu.REFERENCED_TABLE_NAME IS NOT NULL
  AND kcu.TABLE_SCHEMA = 'your_schema_name';

Execute the script output in SQL CLI on target Aurora.

3.1: Replication Instance

Create dms-vpc-role:

Trust policy:

cat > dms-vpc-trust.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

Create role:

aws iam create-role \
  --role-name dms-vpc-role \
  --assume-role-policy-document file://dms-vpc-trust.json

Attach AWS-managed policy:

aws iam attach-role-policy \
  --role-name dms-vpc-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole

Create dms-cloudwatch-logs-role:

Trust policy:

cat > dms-cw-trust.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

Create role:

aws iam create-role \
  --role-name dms-cloudwatch-logs-role \
  --assume-role-policy-document file://dms-cw-trust.json

Attach policy:

aws iam attach-role-policy \
  --role-name dms-cloudwatch-logs-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSCloudWatchLogsRole

Verify roles exist:

aws iam list-roles | grep dms

Create Replication Subnet Group:

aws dms create-replication-subnet-group \
  --replication-subnet-group-identifier dms-private-subnet-group \
  --replication-subnet-group-description "Private subnets for AWS DMS replication instance" \
  --subnet-ids subnet-aaaaaaa subnet-bbbbbbb

aws dms describe-replication-subnet-groups \
  --query 'ReplicationSubnetGroups[*].[ReplicationSubnetGroupIdentifier,SubnetGroupStatus,VpcId]' \
  --output table

Create Replication Instance (Private Subnet):

aws dms create-replication-instance \
  --replication-instance-identifier dms-mysql \
  --replication-instance-class dms.c5.large \
  --allocated-storage 200 \
  --replication-subnet-group-identifier dms-private-subnet-group \
  --vpc-security-group-ids $DMS_SG

aws dms describe-replication-instances \
  --query 'ReplicationInstances[*].[ReplicationInstanceIdentifier,ReplicationInstanceArn]' \
  --output table

Create Source Endpoint. For compliance environments, use verify-ca or verify-full.

aws dms create-endpoint \
  --endpoint-identifier src-mysql \
  --endpoint-type source \
  --engine-name mysql \
  --username dms_user \
  --password StrongPassword123! \
  --server-name EC2_IP/ON_PREM_IP \
  --port 3306 \
  --ssl-mode verify-ca

Create Aurora Endpoint. For compliance environments, use verify-ca or verify-full.

aws dms create-endpoint \
  --endpoint-identifier tgt-aurora \
  --endpoint-type target \
  --engine-name aurora \
  --username admin \
  --password StrongPassword123! \
  --server-name AURORA_ENDPOINT \
  --port 3306 \
  --ssl-mode verify-ca

Verify Endpoints:

aws dms describe-endpoints \
  --query 'Endpoints[*].[EndpointIdentifier,EndpointType,EngineName,ServerName]' \
  --output table

Test Connections(both src-mysql and tgt-aurora):

aws dms test-connection \
  --replication-instance-arn arn:aws:dms:region:acct:rep:xxxx \
  --endpoint-arn arn:aws:dms:region:acct:endpoint:yyyy

aws dms describe-connections \
  --query 'Connections[*].[EndpointArn,Status,LastFailureMessage]' \
  --output table

Note: If the source MySQL connection fails, check that an active NAT Gateway (or another route to the source) is configured, since the DMS replication instance runs in a private subnet.

aws ec2 describe-route-tables \
  --filters Name=association.subnet-id,Values=<DMS_PRIVATE_SUBNET_ID> \
  --query 'RouteTables[*].Routes'

3.2: Task Settings

Create the task settings file (task-settings.json). Since the target schema is pre-created in section 2.5, set "TargetTablePrepMode" to "DO_NOTHING"; use "TRUNCATE_BEFORE_LOAD" if you need to reload into existing tables. Tune the remaining settings to your data size and replication instance class; note that batch apply is not used in full LOB mode, so consider limited LOB mode with an appropriate LobMaxSize if batch apply matters.

cat > task-settings.json <<EOF
{
  "FullLoadSettings": {
    "TargetTablePrepMode": "DO_NOTHING",
    "CreatePkAfterFullLoad": false,
    "StopTaskCachedChangesApplied": false,
    "StopTaskCachedChangesNotApplied": false,
    "MaxFullLoadSubTasks": 8,
    "CommitRate": 10000
  },
  "TargetMetadata": {
    "SupportLobs": true,
    "FullLobMode": true,
    "LobMaxSize": 2000000,
    "BatchApplyEnabled": true,
    "BatchApplyPreserveTransaction": true
  },
  "ChangeProcessingDdlHandlingPolicy": {
    "HandleSourceTableDropped": false,
    "HandleSourceTableTruncated": false,
    "HandleSourceTableAltered": true,
    "SkipTableSuspensionForPartitionDdl": true
  },
  "ErrorBehavior": {
    "DataErrorPolicy": "LOG_ERROR",
    "DataTruncationErrorPolicy": "LOG_ERROR",
    "TableErrorEscalationPolicy": "STOP_TASK"
  },
  "Logging": {
    "EnableLogging": true
  }
}
EOF
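A malformed settings file fails only at task-creation time, so it is worth validating the JSON locally first. A minimal sketch that assumes python3 is on PATH; the demo file below is a throwaway, in practice run it against task-settings.json:

```shell
# Fail fast on malformed JSON before handing the file to DMS.
validate_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1 \
    && echo "$1: valid JSON" \
    || echo "$1: INVALID JSON"
}

# demo on a throwaway fragment; in practice: validate_json task-settings.json
printf '{ "Logging": { "EnableLogging": true } }' > /tmp/settings-demo.json
validate_json /tmp/settings-demo.json
```

The same check applies to table-mappings.json before task creation.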

3.3: Table Mappings

Create a file named table-mappings.json with selection rules that replicate ALL databases except the mysql and other system schemas.

cat > table-mappings.json <<EOF
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include_all",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "selection",
      "rule-id": "2",
      "rule-name": "exclude_mysql",
      "object-locator": {
        "schema-name": "mysql",
        "table-name": "%"
      },
      "rule-action": "exclude"
    },
    {
      "rule-type": "selection",
      "rule-id": "3",
      "rule-name": "exclude_sys",
      "object-locator": {
        "schema-name": "sys",
        "table-name": "%"
      },
      "rule-action": "exclude"
    },
    {
      "rule-type": "selection",
      "rule-id": "4",
      "rule-name": "exclude_performance_schema",
      "object-locator": {
        "schema-name": "performance_schema",
        "table-name": "%"
      },
      "rule-action": "exclude"
    },
    {
      "rule-type": "selection",
      "rule-id": "5",
      "rule-name": "exclude_information_schema",
      "object-locator": {
        "schema-name": "information_schema",
        "table-name": "%"
      },
      "rule-action": "exclude"
    }
  ]
}
EOF

3.4: Task Creation

aws dms create-replication-task \
  --replication-task-identifier mysql-migrate \
  --source-endpoint-arn arn:aws:dms:region:acct:endpoint:xxxxx \
  --target-endpoint-arn arn:aws:dms:region:acct:endpoint:xxxxx \
  --replication-instance-arn arn:aws:dms:region:acct:rep:xxxxx \
  --migration-type full-load-and-cdc \
  --table-mappings file://table-mappings.json \
  --replication-task-settings file://task-settings.json

aws dms describe-replication-tasks \
  --query 'ReplicationTasks[*].{ID:ReplicationTaskIdentifier,ARN:ReplicationTaskArn,Status:Status}'

aws dms describe-connections \
  --query 'Connections[*].{Endpoint:EndpointIdentifier,Status:Status,Failure:LastFailureMessage}'

Review DMS task settings:

aws dms describe-replication-tasks \
  --query 'ReplicationTasks[*].[ReplicationTaskIdentifier,ReplicationTaskArn,Status]' \
  --output table

aws dms describe-replication-tasks \
  --query 'ReplicationTasks[0].ReplicationTaskSettings' \
  --output text \
| jq '{FullLoadSettings,TargetMetadata,ChangeProcessingDdlHandlingPolicy,Logging}'


3.5: Pre‑Migration Assessment

All FAILED items must be fixed before proceeding.

The IAM user or role executing AWS CLI commands must have DMS control-plane permissions to start and review the assessment.

Create the policy file:

cat > dms-assessment-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dms:StartReplicationTaskAssessment",
        "dms:DescribeReplicationTaskAssessmentResults",
        "dms:DescribeReplicationTasks",
        "dms:DescribeEndpoints",
        "dms:DescribeReplicationInstances"
      ],
      "Resource": "*"
    }
  ]
}
EOF

Create and attach the policy:

aws iam create-policy \
  --policy-name DMSPreMigrationAssessmentPolicy \
  --policy-document file://dms-assessment-policy.json

aws iam attach-user-policy \
  --user-name YOUR_IAM_USER_NAME \
  --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/DMSPreMigrationAssessmentPolicy

AWS DMS requires a dedicated service role to write assessment results to an AWS-managed S3 bucket.

Create the trust policy:

cat > dms-access-trust.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

Create the role and attach the AWS-managed policy:

aws iam create-role \
  --role-name dms-access-for-tasks \
  --assume-role-policy-document file://dms-access-trust.json

aws iam attach-role-policy \
  --role-name dms-access-for-tasks \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSRedshiftS3Role

Verify the role exists:

aws iam get-role --role-name dms-access-for-tasks

Start the pre-migration assessment (CLI):

aws dms start-replication-task-assessment \
  --replication-task-arn arn:aws:dms:region:acct:task:xxxx

Check assessment status and results:

aws dms describe-replication-task-assessment-results \
  --replication-task-arn arn:aws:dms:region:acct:task:xxxx

View and parse the assessment report (CLI)

Extract and pretty-print the full JSON report:

aws dms describe-replication-task-assessment-results \
  --replication-task-arn arn:aws:dms:region:account-id:task:task-id \
  --query 'ReplicationTaskAssessmentResults[0].AssessmentResults' \
  --output text \
| jq .

Download the report (optional):

If a local copy is required, use the pre-signed URL returned in S3ObjectUrl:

curl -o assessment-report.json "<S3ObjectUrl>"

jq . assessment-report.json

4. Full Load + CDC Execution

Start Replication Task

aws dms start-replication-task \
  --replication-task-arn <TASK_ARN> \
  --start-replication-task-type start-replication

Monitor Replication Task in CloudWatch + CLI.

aws dms describe-replication-tasks \
  --query 'ReplicationTasks[*].{ID:ReplicationTaskIdentifier,ARN:ReplicationTaskArn,Status:Status}'

See table-level migration statistics:

aws dms describe-table-statistics \
  --replication-task-arn arn:aws:dms:region:account-id:task:task-id \
  --query 'TableStatistics[*].{
    Schema:SchemaName,
    Table:TableName,
    State:TableState,
    FullLoadRows:FullLoadRows,
    Inserts:Inserts,
    Updates:Updates,
    Deletes:Deletes
  }' \
  --output table

Show only tables NOT completed (use this to quickly spot problem tables):

aws dms describe-table-statistics \
  --replication-task-arn arn:aws:dms:region:account-id:task:task-id \
  --query "TableStatistics[?TableState!='Table completed'].{
    Schema:SchemaName,
    Table:TableName,
    State:TableState
  }" \
  --output table

Check single table (targeted debugging):

aws dms describe-table-statistics \
  --replication-task-arn arn:aws:dms:region:account-id:task:task-id \
  --query "TableStatistics[?TableName=='tbl_name'].{
    Schema:SchemaName,
    Table:TableName,
    State:TableState,
    FullLoadRows:FullLoadRows
  }" \
  --output table

Show error indicators for a failed table (the detailed error text lives in CloudWatch Logs, covered next):

aws dms describe-table-statistics \
  --replication-task-arn arn:aws:dms:region:account-id:task:task-id \
  --query "TableStatistics[?TableName=='tbl_name'].{
    Schema:SchemaName,
    Table:TableName,
    State:TableState,
    FullLoadErrorRows:FullLoadErrorRows
  }" \
  --output table

aws dms describe-replication-tasks \
  --query "ReplicationTasks[?ReplicationTaskArn=='arn:aws:dms:region:account-id:task:task-id'].ReplicationTaskStats"

Show the CloudWatch Error logs:

aws logs describe-log-groups \
  --log-group-name-prefix dms-tasks

aws logs describe-log-streams \
  --log-group-name <log-group-name> \
  --order-by LastEventTime \
  --descending

aws logs get-log-events \
  --log-group-name <log-group-name> \
  --log-stream-name <LOG_STREAM_NAME> \
  --limit 50

Use CloudWatch Logs filter and search by table name:

aws logs filter-log-events \
  --log-group-name <log-group-name> \
  --filter-pattern '"schema.tbl_name"' \
  --limit 50 \
  --query 'events[*].message' \
  --output text

Resume the DMS migration task (required when one or more tables errored during replication):

aws dms describe-replication-tasks \
  --query "ReplicationTasks[?ReplicationTaskArn=='arn:aws:dms:region:account-id:task:task-id'].Status"

aws dms start-replication-task \
  --replication-task-arn arn:aws:dms:region:account-id:task:task-id \
  --start-replication-task-type resume-processing

Show CDC activity only (cutover prep):

aws dms describe-replication-tasks \
  --query "ReplicationTasks[?ReplicationTaskArn=='arn:aws:dms:region:account-id:task:task-id'].{
    Status:Status,
    MigrationType:MigrationType
  }" \
  --output table

aws dms describe-table-statistics \
  --replication-task-arn arn:aws:dms:region:account-id:task:task-id \
  --query 'TableStatistics[*].{
    Schema:SchemaName,
    Table:TableName,
    Inserts:Inserts,
    Updates:Updates,
    Deletes:Deletes
  }' \
  --output table

5.1: CDC Drain Verification

Check both DMS-side latency metrics and task state, then freeze writes briefly to let CDC drain.

Verify that ALL three are true:
1. Source latency = 0
2. Target latency = 0
3. No new writes happening on the source

If any one is non-zero, do NOT cut over.

Latencies can also report NULL; in that case confirm that no writes are occurring on the source and continue with the steps below.
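The go/no-go decision can be encoded as a tiny guard so nobody eyeballs it differently at 2 a.m. A bash sketch; feed it the latency values read from the checks below:

```shell
# Succeed only when both CDC latencies are exactly zero.
# Any other value (including NULL or empty) means: do not cut over yet.
cutover_ready() {
  local source_lag=$1 target_lag=$2
  [ "$source_lag" = "0" ] && [ "$target_lag" = "0" ]
}

if cutover_ready 0 0; then
  echo "safe to cut over"
else
  echo "do NOT cut over"
fi
```

Because NULL fails the string comparison, the guard deliberately refuses to cut over until you have confirmed the no-writes condition another way (row counts, section 5.3).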

Check task progress from the AWS CLI:

aws dms describe-replication-tasks \
  --query "ReplicationTasks[?ReplicationTaskArn=='arn:aws:dms:region:acct:task:xxxx'].{
    FullLoadProgress:ReplicationTaskStats.FullLoadProgressPercent,
    TablesLoaded:ReplicationTaskStats.TablesLoaded,
    TablesLoading:ReplicationTaskStats.TablesLoading,
    TablesErrored:ReplicationTaskStats.TablesErrored
  }" \
  --output table

Key fields to watch:
FullLoadProgressPercent = 100
TablesLoading           = 0

(Per-table CDC insert/update/delete counters come from describe-table-statistics, as shown in section 4.)

CDC latency is published as CloudWatch metrics (CDCLatencySource and CDCLatencyTarget), not in the describe-replication-tasks output:

aws cloudwatch get-metric-statistics \
  --namespace AWS/DMS \
  --metric-name CDCLatencySource \
  --dimensions Name=ReplicationInstanceIdentifier,Value=dms-mysql \
               Name=ReplicationTaskIdentifier,Value=<TASK_RESOURCE_ID> \
  --start-time "$(date -u -d '15 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Average

Repeat with --metric-name CDCLatencyTarget.

Expected before cutover:
Status             = running
CDCLatencySource   = 0
CDCLatencyTarget   = 0
Confirm task is in CDC mode (not full load):

aws dms describe-replication-tasks \
  --query 'ReplicationTasks[*].{
    Task:ReplicationTaskIdentifier,
    Status:Status,
    FullLoad:ReplicationTaskStats.FullLoadProgressPercent
  }'

Must show:
FullLoadProgressPercent = 100
Status = running

5.2: Freeze Writes

Freeze writes on SOURCE (MANDATORY)

CDC lag can bounce between 0 and 2 seconds indefinitely if writes continue.

    • Put the application into maintenance mode
    • Disable writes at the application layer

Then make the source read-only:

SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;

Verify:

SELECT @@read_only, @@super_read_only;

Wait until CDC drains to zero.

5.3: Row Count Validation

Exact row comparison between source and target.

Run both Source MySQL and Target Aurora and compare:

SET SESSION group_concat_max_len = 1000000;

SELECT
  GROUP_CONCAT(
    CONCAT(
      'SELECT ''', table_name,
      ''' AS table_name, COUNT(*) AS row_count FROM `',
      table_schema, '`.`', table_name, '`'
    )
    SEPARATOR ' UNION ALL '
  ) INTO @sql
FROM information_schema.tables
WHERE table_schema = 'your_db'
  AND table_type = 'BASE TABLE';

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
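Saving each side's output to a file makes the comparison mechanical instead of visual. A bash sketch assuming one `table<TAB>count` line per table; the sample rows below are illustrative:

```shell
# Compare row-count files from source and target, order-insensitive.
compare_counts() {
  if diff <(sort "$1") <(sort "$2") >/dev/null; then
    echo "row counts match"
  else
    echo "MISMATCH:"
    diff <(sort "$1") <(sort "$2")
  fi
}

# illustrative data; in practice redirect the SQL output from each side
printf 'employees\t300024\ntitles\t443308\n' > /tmp/source_counts.txt
printf 'titles\t443308\nemployees\t300024\n' > /tmp/target_counts.txt
compare_counts /tmp/source_counts.txt /tmp/target_counts.txt
```

Any MISMATCH line names the exact tables to investigate before cutover.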

For partitioned tables, the total count may match while individual partitions do not.

Check per-partition counts (run on both the source MySQL and Aurora). Note that TABLE_ROWS is an InnoDB estimate; run ANALYZE TABLE first, or use SELECT COUNT(*) ... PARTITION (partition_name) for exact figures.

SELECT
  PARTITION_NAME,
  TABLE_ROWS
FROM information_schema.partitions
WHERE table_schema = 'your_db'
  AND table_name = 'partitioned_table';

5.4: Rollback Safety

This step is mandatory.

Before stopping DMS, capture a recovery position. On the source, with writes frozen:

SHOW MASTER STATUS;

Record File, Position and (if GTID is enabled) Executed_Gtid_Set.

Also capture the DMS task checkpoint:

aws dms describe-replication-tasks \
  --query 'ReplicationTasks[0].RecoveryCheckpoint'

5.5: Cutover Point

Re-run the CDC latency checks from 5.1 and wait until:

CDCLatencySource = 0
CDCLatencyTarget = 0

Hold this state for at least 30 to 60 seconds.

Sometimes the latencies report NULL. If so, perform the steps in 5.3: Row Count Validation.
If the row counts are consistent across source and target, you are good to cut over.

Validate that no changes are pending (extra safety check):

aws dms describe-table-statistics \
  --replication-task-arn arn:aws:dms:region:acct:task:xxxx \
  --query 'TableStatistics[*].{
    Table:TableName,
    State:TableState,
    Inserts:Inserts,
    Updates:Updates,
    Deletes:Deletes
  }'

If these counters stop increasing → CDC is drained.
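One way to make "counters stop increasing" objective: snapshot the table-statistics output twice, a short interval apart, and compare the files. A bash sketch; the snapshot contents below are placeholders for the CLI output above:

```shell
# Two identical snapshots of the CDC counters => no changes flowed in between.
counters_stable() {
  if cmp -s "$1" "$2"; then
    echo "CDC drained (counters stable)"
  else
    echo "changes still flowing"
  fi
}

printf 'orders 120 15 3\n' > /tmp/stats_t0.txt   # illustrative snapshot
printf 'orders 120 15 3\n' > /tmp/stats_t1.txt   # taken ~30s later
counters_stable /tmp/stats_t0.txt /tmp/stats_t1.txt
```

In practice, redirect two runs of describe-table-statistics (about 30 seconds apart) into the snapshot files.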

STOP the task (this is the CUTOVER point)

Once lag = 0 and writes are frozen:

aws dms stop-replication-task \
  --replication-task-arn arn:aws:dms:region:acct:task:xxxx

This guarantees:
No more CDC
Target is consistent

6.1: Recreate Stored Procedures, Functions, Events, Triggers and Views

Phase-1: Prepare the creation scripts.

Extract Stored procedures + functions from SOURCE:
mysqldump -h source-host -u admin -p your_db --routines --no-create-info --no-data --skip-triggers  > routines.sql

Extract Triggers from SOURCE:
mysqldump -h source-host -u admin -p your_db --triggers --no-create-info --no-data > triggers.sql

Extract Views from SOURCE (mysqldump has no views-only option; view definitions are emitted inside a --no-data dump, so keep the CREATE VIEW statements from it, or use the SQL alternative below):
mysqldump -h source-host -u admin -p your_db --no-data --skip-triggers > tables_and_views.sql


Alternatively, use SQL on the source MySQL:

SET SESSION group_concat_max_len = 1024000;

Extract procedures from SOURCE (note: ROUTINE_DEFINITION does not include the parameter list; for parameterized routines, prefer SHOW CREATE PROCEDURE):

SELECT
  CONCAT(
    'DELIMITER $$\n',
    'CREATE PROCEDURE `', ROUTINE_SCHEMA, '`.`', ROUTINE_NAME, '` ',
    ROUTINE_DEFINITION,
    '\n$$\nDELIMITER ;'
  ) AS create_procedure_sql
FROM information_schema.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE'
  AND ROUTINE_SCHEMA = 'your_db';

Extract functions from SOURCE (same caveat: ROUTINE_DEFINITION omits the parameter list and RETURNS clause, so prefer SHOW CREATE FUNCTION for parameterized functions):

SELECT
  CONCAT(
    'DELIMITER $$\n',
    'CREATE FUNCTION `', ROUTINE_SCHEMA, '`.`', ROUTINE_NAME, '` ',
    ROUTINE_DEFINITION,
    '\n$$\nDELIMITER ;'
  ) AS create_function_sql
FROM information_schema.ROUTINES
WHERE ROUTINE_TYPE = 'FUNCTION'
  AND ROUTINE_SCHEMA = 'your_db';

Extract triggers from SOURCE:

SELECT
  CONCAT(
    'DELIMITER $$\n',
    'CREATE TRIGGER `', TRIGGER_SCHEMA, '`.`', TRIGGER_NAME, '` ',
    ACTION_TIMING, ' ', EVENT_MANIPULATION,
    ' ON `', EVENT_OBJECT_SCHEMA, '`.`', EVENT_OBJECT_TABLE, '` ',
    'FOR EACH ROW ',
    ACTION_STATEMENT,
    '\n$$\nDELIMITER ;'
  ) AS create_trigger_sql
FROM information_schema.TRIGGERS
WHERE TRIGGER_SCHEMA = 'your_db';

Extract events from SOURCE (the schedule lives in EVENT_TYPE/EXECUTE_AT/INTERVAL_*, and the body is EVENT_DEFINITION):

SELECT
  CONCAT(
    'DELIMITER $$\n',
    'CREATE EVENT `', EVENT_SCHEMA, '`.`', EVENT_NAME, '` ',
    'ON SCHEDULE ',
    CASE EVENT_TYPE
      WHEN 'ONE TIME' THEN CONCAT('AT ''', EXECUTE_AT, '''')
      ELSE CONCAT('EVERY ', INTERVAL_VALUE, ' ', INTERVAL_FIELD)
    END,
    ' DO ', EVENT_DEFINITION,
    '\n$$\nDELIMITER ;'
  ) AS create_event_sql
FROM information_schema.EVENTS
WHERE EVENT_SCHEMA = 'your_db';

Extract views from SOURCE:

SELECT
  CONCAT(
    'CREATE OR REPLACE VIEW `', TABLE_SCHEMA, '`.`', TABLE_NAME, '` AS ',
    VIEW_DEFINITION,
    ';'
  ) AS create_view_sql
FROM information_schema.VIEWS
WHERE TABLE_SCHEMA = 'your_db';

DEFINER FIX (CRITICAL for Aurora): the extracted DDL carries DEFINER clauses that reference source-side users, which usually do not exist in Aurora. Leaving them in place causes object creation errors or silent runtime failures.

Replace the definers with the connecting Aurora user in each output file from the scripts above:

sed -i 's/DEFINER=`[^`]*`@`[^`]*`/DEFINER=CURRENT_USER/g' file_name.sql
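When several scripts were extracted, the same substitution can be applied in one pass with a follow-up check that nothing was missed. A sketch, assuming the file names produced by the mysqldump steps above:

```shell
# Strip hard-coded DEFINER clauses from every extracted script (keeping a
# .bak copy), then verify none remain before applying to Aurora.
fix_definers() {
  for f in "$@"; do
    sed -i.bak 's/DEFINER=`[^`]*`@`[^`]*`/DEFINER=CURRENT_USER/g' "$f"
    if grep -q 'DEFINER=`' "$f"; then
      echo "WARN: unconverted DEFINER clause remains in $f" >&2
      return 1
    fi
  done
}
# Usage: fix_definers routines.sql triggers.sql views.sql
```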

Phase-2: Apply the scripts on Aurora in the following order

1. Stored Procedures
2. Functions
3. Views
4. Triggers
5. Events
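The apply step can be scripted so a failure in one script stops the run before dependent objects are created. A sketch, assuming the extracted DDL was saved to one file per object type (the file names here are placeholders):

```shell
# Apply extracted object scripts to Aurora in dependency order; stop on the
# first failure. With DRY_RUN=1 the commands are printed, not executed.
apply_in_order() {
  host="$1"; db="$2"
  for f in procedures.sql functions.sql views.sql triggers.sql events.sql; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "mysql -h $host -u admin -p $db < $f"
    else
      mysql -h "$host" -u admin -p "$db" < "$f" || { echo "failed on $f" >&2; return 1; }
    fi
  done
}
# Example: DRY_RUN=1 apply_in_order aurora-endpoint your_db
```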

Phase-3: Validation after apply (REQUIRED)

On Aurora:

SHOW PROCEDURE STATUS WHERE Db='your_db';
SHOW FUNCTION STATUS WHERE Db='your_db';
SHOW TRIGGERS FROM your_db;
SHOW EVENTS FROM your_db;
SHOW FULL TABLES IN your_db WHERE Table_type='VIEW';
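Counting each object type on both sides catches anything the apply step silently skipped. A sketch with placeholder hosts; it assumes passwordless authentication is configured (e.g. via ~/.my.cnf):

```shell
# Count routines of one type in a schema on a given host.
count_routines() {  # $1 = host, $2 = schema, $3 = PROCEDURE|FUNCTION
  mysql -h "$1" -N -e "SELECT COUNT(*) FROM information_schema.ROUTINES
    WHERE ROUTINE_SCHEMA='$2' AND ROUTINE_TYPE='$3';"
}

# Report whether source and target counts agree.
compare_counts() {  # $1 = label, $2 = source count, $3 = target count
  if [ "$2" = "$3" ]; then
    echo "OK: $1 count matches ($2)"
  else
    echo "MISMATCH: $1 source=$2 aurora=$3"
  fi
}
# Usage:
#   compare_counts PROCEDURE \
#     "$(count_routines source-host your_db PROCEDURE)" \
#     "$(count_routines aurora-endpoint your_db PROCEDURE)"
```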

6.2: Recreate Foreign Keys

Phase 1: Extract FK Metadata from Source MySQL

Run this on SOURCE MySQL.

SELECT
    kcu.CONSTRAINT_SCHEMA,
    kcu.TABLE_NAME,
    kcu.CONSTRAINT_NAME,
    GROUP_CONCAT(kcu.COLUMN_NAME ORDER BY kcu.ORDINAL_POSITION) AS child_columns,
    kcu.REFERENCED_TABLE_NAME,
    GROUP_CONCAT(kcu.REFERENCED_COLUMN_NAME ORDER BY kcu.ORDINAL_POSITION) AS parent_columns,
    rc.UPDATE_RULE,
    rc.DELETE_RULE
FROM information_schema.KEY_COLUMN_USAGE kcu
JOIN information_schema.REFERENTIAL_CONSTRAINTS rc
  ON kcu.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
 AND kcu.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
WHERE kcu.REFERENCED_TABLE_NAME IS NOT NULL
  AND kcu.CONSTRAINT_SCHEMA = 'your_database'
GROUP BY
    kcu.CONSTRAINT_SCHEMA,
    kcu.TABLE_NAME,
    kcu.CONSTRAINT_NAME,
    kcu.REFERENCED_TABLE_NAME,
    rc.UPDATE_RULE,
    rc.DELETE_RULE
ORDER BY kcu.CONSTRAINT_SCHEMA, kcu.TABLE_NAME;

Phase 2: Generate FK Recreation SQL (AUTO-GENERATED)

This query outputs executable ALTER TABLE statements.

SELECT
CONCAT(
  'ALTER TABLE `', TABLE_NAME, '` ',
  'ADD CONSTRAINT `', CONSTRAINT_NAME, '` ',
  'FOREIGN KEY (`', REPLACE(child_columns, ',', '`,`'), '`) ',
  'REFERENCES `', REFERENCED_TABLE_NAME, '` (`',
  REPLACE(parent_columns, ',', '`,`'), '`) ',
  'ON DELETE ', DELETE_RULE, ' ',
  'ON UPDATE ', UPDATE_RULE, ';'
) AS fk_ddl
FROM (
    SELECT
        kcu.TABLE_NAME,
        kcu.CONSTRAINT_NAME,
        GROUP_CONCAT(kcu.COLUMN_NAME ORDER BY kcu.ORDINAL_POSITION) AS child_columns,
        kcu.REFERENCED_TABLE_NAME,
        GROUP_CONCAT(kcu.REFERENCED_COLUMN_NAME ORDER BY kcu.ORDINAL_POSITION) AS parent_columns,
        rc.UPDATE_RULE,
        rc.DELETE_RULE
    FROM information_schema.KEY_COLUMN_USAGE kcu
    JOIN information_schema.REFERENTIAL_CONSTRAINTS rc
      ON kcu.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
     AND kcu.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
    WHERE kcu.REFERENCED_TABLE_NAME IS NOT NULL
      AND kcu.CONSTRAINT_SCHEMA = 'your_database'
    GROUP BY
        kcu.TABLE_NAME,
        kcu.CONSTRAINT_NAME,
        kcu.REFERENCED_TABLE_NAME,
        rc.UPDATE_RULE,
        rc.DELETE_RULE
) x;

Save output as:

fk_recreate.sql
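Because this DDL is auto-generated, a quick structural check of fk_recreate.sql is cheap insurance before applying it: every non-blank line should be a complete ALTER TABLE statement ending in a semicolon (a truncated GROUP_CONCAT, for example, would break this). A small sketch:

```shell
# Warn if the FK script contains anything other than blank lines and
# complete "ALTER TABLE ... ;" statements.
check_fk_script() {
  bad=$(grep -c -v -e '^$' -e '^ALTER TABLE .*;$' "$1" || true)
  if [ "$bad" -eq 0 ]; then
    echo "OK: $1 looks complete"
  else
    echo "WARN: $bad suspicious line(s) in $1" >&2
    return 1
  fi
}
# Usage: check_fk_script fk_recreate.sql
```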

Phase 3: Apply Safely on Aurora MySQL

foreign_key_checks is a session-scoped variable, so disabling it in one client session has no effect on the separate mysql invocation that runs the script. Disable it in the same session that executes the DDL, e.g. with --init-command:

mysql -h aurora-endpoint -u admin -p \
  --init-command="SET SESSION foreign_key_checks=0" \
  your_database < fk_recreate.sql

(Alternatively, prepend SET foreign_key_checks = 0; to the top of fk_recreate.sql.) The override ends with the session, so nothing needs re-enabling; if you keep working interactively in the same session, restore the default with:

SET foreign_key_checks = 1;

Phase 4: Validation Queries (MANDATORY)

Validate FK Count

SELECT COUNT(*)
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA = 'your_database';

List FK Definitions

Spot-check that each recreated constraint references the expected parent table:

SELECT
  rc.CONSTRAINT_NAME,
  rc.TABLE_NAME,
  rc.REFERENCED_TABLE_NAME
FROM information_schema.REFERENTIAL_CONSTRAINTS rc
WHERE rc.CONSTRAINT_SCHEMA = 'your_database';

7. Rollback Strategy

Phase-1: Keep the source read‑only for N hours after cutover

On the source MySQL:

SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;

Phase-2: Take an Aurora snapshot before the app switch

aws rds create-db-cluster-snapshot \
  --db-cluster-identifier aurora-mysql-cluster \
  --db-cluster-snapshot-identifier aurora-pre-app-switch-$(date -u +%Y%m%d-%H%M%S)

Wait until the snapshot status is available before switching the app.

aws rds describe-db-cluster-snapshots \
  --db-cluster-identifier aurora-mysql-cluster \
  --query 'DBClusterSnapshots[*].[DBClusterSnapshotIdentifier,Status,SnapshotCreateTime]' \
  --output table
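Rather than polling describe-db-cluster-snapshots by hand, the AWS CLI ships a waiter that blocks until the snapshot is available. A small wrapper sketch:

```shell
# Block until the pre-cutover cluster snapshot is ready; the waiter polls
# on our behalf and exits non-zero on failure or timeout.
wait_for_cluster_snapshot() {
  aws rds wait db-cluster-snapshot-available \
    --db-cluster-snapshot-identifier "$1"
}
# Usage: wait_for_cluster_snapshot <snapshot-identifier>
```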

8. Start Application Traffic

Point the application at Aurora and perform smoke tests.

If you need to roll back to the source MySQL — or once the rollback window has expired after a successful cutover — re-enable writes on the source:

SET GLOBAL read_only = OFF;
SET GLOBAL super_read_only = OFF;

9. Post‑Go‑Live Validation

    • Compare query latency against the pre‑migration baseline
    • Enable the slow query log and review it regularly
    • Monitor application and database errors for 24–48 hours
    • Decommission the source MySQL only after the rollback window closes



