Amazon RDS does not allow allocated storage to be reduced. This makes sense to me, but I want to do it anyway. I’m working with an RDS Multi-AZ master and a read replica. Allocated storage and IOPS can quickly get expensive. Before downsizing, I also want to compress a few large tables without maintenance downtime.

Assumptions

  1. MySQL version 5.6 instances running under Amazon RDS
  2. Tables use the InnoDB storage engine. Setting default_storage_engine = InnoDB wouldn’t be a bad idea.
  3. innodb_file_format = Barracuda
  4. innodb_file_per_table = true
  5. Tables are using a primary key named id with AUTO_INCREMENT.
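
A quick sanity check for these settings on the instance in question:

SELECT VERSION();
SHOW VARIABLES LIKE 'innodb_file%';
SHOW VARIABLES LIKE 'default_storage_engine';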

Reduce Database Size by Compressing Large Tables

Look for tables that would be good candidates for compression. Compression trades CPU for storage, so which tables are worth it depends on your workload.

SELECT
  table_schema AS "Schema",
  table_name AS "Table",
  round(((data_length + index_length) / 1024 / 1024 / 1024), 2) GB,
  TABLE_ROWS,
  ROW_FORMAT FROM information_schema.TABLES
WHERE
  table_schema NOT IN ('information_schema', 'mysql', 'performance_schema')
ORDER BY
  table_schema,
  GB,
  TABLE_ROWS,
  table_name;
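
For reference, the change itself is a single ALTER; the table name and block size below are placeholders.

ALTER TABLE some_large_table ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;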

MySQL can’t change a table’s row format without locking the table and causing downtime. I wrote a script to do this in production, but it doesn’t handle triggers or foreign keys without manual intervention. Here is the basic pseudo-code and the relevant SQL.

  1. Check for previously interrupted session by looking for temporary tables ending in _0 or _1.

    SELECT table_name FROM information_schema.TABLES WHERE table_name REGEXP '^{table}_[01]$';
    
  2. Check current ROW_FORMAT to make sure the table isn’t already compressed.

    SELECT ROW_FORMAT FROM information_schema.TABLES WHERE table_name='{table}';
    
  3. Check for existing triggers. There should not be any triggers.

    SELECT EVENT_OBJECT_TABLE FROM information_schema.TRIGGERS WHERE EVENT_OBJECT_TABLE='{table}' LIMIT 1;
    
  4. Fetch table columns to identify the auto-increment field, if not id.

    DESCRIBE {table};
    
  5. Fetch current auto-increment value to use later as lastAutoIncrementValue.

    SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE table_name='{table}';
    
  6. Fetch table schema.

    SHOW CREATE TABLE {table};
    
  7. Update schema text to create a shadow table ending in _1. I’m using a regex replace function.

    sql = schema
    sql.replace("/^CREATE TABLE `{table}`/m"  , "CREATE TABLE `{table}_1`")
    sql.replace('/ ROW_FORMAT=[^[:space:]]+/m', '')
    sql.replace('/ ENGINE=([^[:space:]]+)/sm' , " ENGINE=$1 ROW_FORMAT=COMPRESSED")
    sql.replace('/(UNIQUE KEY|KEY|CONSTRAINT) `([^`]+)`/m', '$1 `$2_1`')
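
    As a rough example, for a hypothetical table named events with columns id/foo/bar, the rewritten schema comes out something like this:

    CREATE TABLE `events_1` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT,
      `foo` varchar(255) DEFAULT NULL,
      `bar` int(11) DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `idx_foo_1` (`foo`)
    ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED DEFAULT CHARSET=utf8;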
    
  8. Create shadow table {table}_1.

  9. Create trigger helpers. These will keep the shadow table up to date and assist in copying data over. I create these statements programmatically, but use the columns id/foo/bar here for example purposes.

    CREATE TRIGGER tg__{table}__after_insert
    AFTER INSERT ON {table} FOR EACH ROW
    BEGIN
        INSERT INTO {table}_1 VALUES (
          NEW.id,
          NEW.foo,
          NEW.bar
        ) ON DUPLICATE KEY UPDATE
          foo = NEW.foo,
          bar = NEW.bar;
    END;
    
    CREATE TRIGGER tg__{table}__before_update
    BEFORE UPDATE ON {table} FOR EACH ROW
    BEGIN
        IF NEW.id < {lastAutoIncrementValue} THEN
          INSERT INTO {table}_1 VALUES (
            NEW.id,
            NEW.foo,
            NEW.bar
          ) ON DUPLICATE KEY UPDATE
            foo = NEW.foo,
            bar = NEW.bar;
        ELSE
          UPDATE {table}_1
          SET
            foo = NEW.foo,
            bar = NEW.bar
          WHERE
            id = OLD.id;
        END IF;
    END;
    
    CREATE TRIGGER tg__{table}__after_delete
    AFTER DELETE ON {table} FOR EACH ROW
    BEGIN
       DELETE FROM {table}_1 WHERE id = OLD.id;
    END;
    

    Note: You don’t have to change the delimiter unless executing these queries in a console.
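
    If you are pasting into a console, it looks like this (shown for the delete trigger):

    DELIMITER $$
    CREATE TRIGGER tg__{table}__after_delete
    AFTER DELETE ON {table} FOR EACH ROW
    BEGIN
       DELETE FROM {table}_1 WHERE id = OLD.id;
    END$$
    DELIMITER ;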

  10. Iterate all rows of the existing table and SET id=id to force the update trigger to run. This will copy the data over to the shadow table.

    UPDATE {table} SET id=id WHERE id >= 10000 AND id < 20000;
    

    Note: I end up using batches of 100k rows.
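
    To bound the batches, grab the id range of the original table first:

    SELECT MIN(id) AS first_id, MAX(id) AS last_id FROM {table};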

  11. The tables should hold the same data now. Double check everything before you break something.
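
    A minimal sanity check is to compare row counts; writes are still flowing, so expect the numbers to move, and spot-check a few recently updated rows too:

    SELECT
      (SELECT COUNT(*) FROM {table})   AS original_rows,
      (SELECT COUNT(*) FROM {table}_1) AS shadow_rows;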

  12. If you went ahead despite the assumption above that you have no foreign key constraints, you need to perform an extra step: check for foreign key constraints in other tables and update them to reference the shadow table, {table}_1. Disable foreign key constraint checking while adding the new constraints to avoid huge delays.
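
    A sketch of that step, using a hypothetical child_table and constraint name:

    SELECT TABLE_NAME, CONSTRAINT_NAME
    FROM information_schema.KEY_COLUMN_USAGE
    WHERE REFERENCED_TABLE_NAME = '{table}';

    SET foreign_key_checks = 0;
    ALTER TABLE child_table
      DROP FOREIGN KEY fk_child_parent,
      ADD CONSTRAINT fk_child_parent_1
        FOREIGN KEY (parent_id) REFERENCES {table}_1 (id);
    SET foreign_key_checks = 1;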

  13. Swap the shadow table with the existing table.

    RENAME TABLE {table} TO {table}_0, {table}_1 TO {table};
    
  14. Drop the old table and remove the triggers. I will do this manually instead of scripting it.

    DROP TABLE {table}_0;
    DROP TRIGGER IF EXISTS tg__{table}__after_insert;
    DROP TRIGGER IF EXISTS tg__{table}__before_update;
    DROP TRIGGER IF EXISTS tg__{table}__after_delete;
    

And you’re done compressing the table. Repeat for other tables as desired.

Replicate to a Downgraded “Master” Instance

  1. Create the smaller database instance now so the transition process can begin. It will be the future master, but until the cutover we’ll manually configure it as a slave of the current master to copy data over.

  2. Crank up the binlog retention on the current master so it covers a full export and import, plus the time needed to replay the binlogs generated while that runs.

    CALL mysql.rds_show_configuration;
    CALL mysql.rds_set_configuration('binlog retention hours', 24*7*2); #2 weeks
    
  3. Dump database to file for importing.

    I am dumping from a slave server so as to avoid any unexpected behavior in production.

    echo "SHOW SLAVE STATUS\G" | mysql -h<slave_hostname> -u<username> -p<password> | egrep " (Master_Log_File|Read_Master_Log_Pos):" &&
    mysqldump --quick --single-transaction --hex-blob -h<slave_hostname> -u<username> -p<password> --databases <database_names> | gzip > backup.sql.gz
    
  4. Import database file to new master.

    cat backup.sql.gz | gzip -d | mysql -h<new_hostname> -u<username> -p<password>
    
  5. Create a replication user on the old master (or the slave, if you dumped from that) and on the new master.

    CREATE USER 'repl_aws'@'%' IDENTIFIED BY '<password>';
    GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_aws'@'%';
    
  6. Start replication on the new master (which acts as a slave of the old master until the cutover).

    CALL mysql.rds_set_external_master (host_name, host_port, replication_user_name, replication_user_password, mysql_binary_log_file_name, mysql_binary_log_file_location, ssl_encryption);
    CALL mysql.rds_start_replication;
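
    Filled in with hypothetical values (the log file and position come from the SHOW SLAVE STATUS output captured in step 3):

    CALL mysql.rds_set_external_master ('old-master.example.us-east-1.rds.amazonaws.com', 3306, 'repl_aws', '<password>', 'mysql-bin-changelog.012345', 120, 0);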
    
  7. Check for errors on the new master and skip them. Assuming there are just a handful at the beginning of the binlog, it is safe to skip them; on a write-heavy database especially, the log position may have drifted a bit between when we checked it and when the dump began.

    SHOW SLAVE STATUS\G
    CALL mysql.rds_skip_repl_error();
    
  8. Wait for Seconds_Behind_Master to drop to 0 in SHOW SLAVE STATUS\G.

  9. Put your current master in read-only mode; repoint your application at the new master, then stop replication and remove the replication config from the new master.
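
    On RDS, the read_only flag is typically changed through the DB parameter group (the master user lacks SUPER for SET GLOBAL). The replication teardown on the new master looks like this:

    CALL mysql.rds_stop_replication;
    CALL mysql.rds_reset_external_master;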

  10. Decommission the old master and slave servers, and set up new read replicas as needed. If you don’t decommission the old master immediately, remember to reset its binlog retention.

    CALL mysql.rds_set_configuration('binlog retention hours', NULL);