Handling Non-Transactional DDL in MySQL Migrations

The migration runner reports the ALTER TABLE failed, you reach for the rollback, and there is nothing to roll back to — MySQL already committed the partial change. InnoDB implicitly commits the open transaction before a DDL statement and commits again after it, so DDL is not transactional and cannot be reversed by aborting. A migration interrupted mid-flight leaves the table with a new column but no constraint, or a new index but no foreign key, and the runner’s automatic rollback is powerless. This page is the runbook for that situation: how to recognize the stall, reconcile the half-applied schema by hand, and move to a deployment pattern that does not depend on rollback at all. It is the operational deep dive under the transactional vs non-transactional databases section of the migration fundamentals guide.

Why MySQL DDL cannot roll back A timeline shows an implicit commit before the ALTER, the ALTER executing, and an implicit commit after, with the failure point landing between two commits so there is no transaction to abort. Implicit Commit Brackets Every DDL implicit COMMIT (before) ALTER TABLE runs / may fail implicit COMMIT (after) No open transaction spans the ALTER — there is nothing for ROLLBACK to undo.
Because MySQL brackets each DDL statement with implicit commits, a mid-statement failure leaves a committed, partial schema that no rollback can reverse.

Symptom / Error Signatures

When non-transactional DDL collides with live traffic, runners stall or fail with:

  • ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
  • ERROR 1062 (23000): Duplicate entry '...' for key '...' during index creation
  • ERROR 1091 (42000): Can't DROP 'column_name'; check that column/key exists
  • Migration runners (Flyway, Liquibase, ActiveRecord, Django) hanging at Waiting for table metadata lock
  • Partial schema post-interruption: new columns exist but constraints, indexes, or foreign keys are missing

Identify the stalled DDL and what blocks it:

-- MySQL · read-only diagnostic · run on the primary as a DBA/privileged user
SELECT
    p.ID, p.USER, p.HOST, p.DB, p.COMMAND, p.TIME, p.STATE,
    LEFT(p.INFO, 80) AS QUERY
FROM information_schema.processlist p
WHERE p.COMMAND != 'Sleep'
  AND p.TIME > 30
  AND (p.INFO LIKE '%ALTER TABLE%' OR p.INFO LIKE '%CREATE INDEX%' OR p.INFO LIKE '%DROP%')
ORDER BY p.TIME DESC;

Root Cause Analysis

InnoDB commits the current transaction before executing DDL and commits again immediately after, so the statement is never inside a transaction you can abort. Because DDL cannot be rolled back, any mid-execution failure leaves the schema partially altered. Concurrently, every SELECT, INSERT, and UPDATE against the target table holds a shared metadata lock (MDL); an ALTER TABLE requests an exclusive MDL, so if a long-running query or an uncommitted transaction still holds the shared lock, the DDL blocks until it times out or exhausts the connection pool waiting. This is the defining difference between MySQL and a fully transactional engine like PostgreSQL — the same ALTER that PostgreSQL would roll back cleanly leaves MySQL needing manual reconciliation. The broader contrast and what it means for tool choice is laid out in transactional vs non-transactional databases, and it is precisely why your scripts must follow idempotent script design so a retry can converge on a partial state.

Immediate Mitigation

When a migration stalls or fails in production, restore stability and reconcile state in order.

  1. Reduce lock contention before retrying:
-- MySQL · run in the same session before re-executing the DDL · fails fast instead of hanging
SET SESSION lock_wait_timeout = 5;
-- then re-run the exact ALTER/CREATE statement
  1. Terminate the blocking session (coordinate with app owners — only kill long-running SELECTs or idle transactions; killing active DML risks inconsistency):
-- MySQL · run as privileged user · use the process ID from the diagnostic query above
KILL <process_id>;
  1. Audit exactly what applied before the failure:
-- MySQL · read-only · map which DDL steps succeeded
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_DEFAULT
FROM information_schema.columns
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'your_table';

SELECT INDEX_NAME, NON_UNIQUE, SEQ_IN_INDEX, COLUMN_NAME
FROM information_schema.statistics
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'your_table';
  1. Reconcile by hand — do not trust the runner’s automatic rollback, because MySQL DDL cannot be rolled back:
-- MySQL · run as migration role · only revert steps that actually applied; MySQL has no DROP COLUMN IF EXISTS
ALTER TABLE your_table DROP COLUMN new_column;
DROP INDEX idx_partial ON your_table;
ALTER TABLE your_table DROP FOREIGN KEY fk_partial;

-- If the change is correct and should be kept, mark the migration applied so it is not re-run:
-- UPDATE flyway_schema_history SET success = 1 WHERE version = 'X.X';

Permanent Fix / Long-Term Pattern

Stop relying on rollback that the engine cannot provide. Make migrations additive-first: add columns, tables, or indexes in a single deploy, and make new columns nullable with sensible defaults so the change stays metadata-only (ALGORITHM=INSTANT/INPLACE) rather than forcing a full table rebuild. For large tables, replace native ALTER TABLE with an online schema-change tool — pt-online-schema-change or gh-ost — which builds a shadow table, syncs via triggers or binary-log parsing, and performs an atomic swap that bypasses MDL contention:

# Shell · run from a host with replica access during a low-write window · long-running, monitor lag
gh-ost \
  --user="$DB_USER" --password="$DB_PASS" \
  --host="$DB_HOST" \
  --database="$DB_NAME" --table="your_table" \
  --alter="ADD COLUMN new_col VARCHAR(255) DEFAULT 'default'" \
  --execute

Keep every script idempotent so a retry after a partial application converges rather than failing:

-- MySQL · run as migration role · no IF NOT EXISTS for ADD COLUMN, so guard via information_schema
SET @ddl := IF(
    (SELECT COUNT(*) FROM information_schema.columns
     WHERE table_schema = DATABASE()
       AND table_name = 'your_table' AND column_name = 'new_col') = 0,
    'ALTER TABLE your_table ADD COLUMN new_col VARCHAR(255) DEFAULT ''default''',
    'SELECT 1');
PREPARE stmt FROM @ddl; EXECUTE stmt; DEALLOCATE PREPARE stmt;

Finally, decouple schema from app deploys — ship the schema, deploy code that handles old and new shapes, drop deprecated columns later — and add a pre-flight gate that verifies ALGORITHM=INPLACE, LOCK=NONE compatibility and simulates lock-wait scenarios in CI/CD. The closely related trap of DDL silently committing an open transaction mid-script is covered in avoiding implicit commits in MySQL DDL migrations.

Verification Checklist

  • success = 1
  • %_pt_osc% triggers or _gho/_ghc/_del/%_old shadow tables)
  • CHECK TABLE your_table EXTENDED reports status = OK and all expected constraints exist in table_constraints
  • performance_schema.metadata_locks shows no EXCLUSIVE lock stuck in PENDING during peak traffic
  • ALGORITHM=INPLACE (or INSTANT) with LOCK=NONE, not a table rebuild
  • EXPLAIN plans show no regression from new indexes

Frequently Asked Questions

Can I wrap a MySQL ALTER TABLE in a transaction to make it rollback-safe? No. InnoDB issues an implicit commit immediately before and after the DDL, so the statement is never inside your transaction — a ROLLBACK has nothing to undo. The practical substitute is additive, idempotent migrations plus pre-written inverse DDL you can apply by hand if you must reverse.

When should I use gh-ost or pt-online-schema-change instead of native ALTER? Use them when the table is large enough that an exclusive metadata lock would stall live traffic, or when the change cannot run as ALGORITHM=INPLACE. They build a shadow table and swap atomically, avoiding the long MDL hold. For small tables a native INPLACE/INSTANT alter is simpler and lower-risk.

How do I clean up after an interrupted gh-ost run? gh-ost is triggerless but leaves _gho, _ghc, and _del shadow tables if interrupted; pt-online-schema-change leaves %_pt_osc% triggers and an %_old table. Confirm no swap is mid-flight, then drop the orphaned triggers and shadow tables, and verify the original table’s row count and constraints before resuming.