Handling Non-Transactional DDL in MySQL Migrations
The migration runner reports the ALTER TABLE failed, you reach for the rollback, and there is nothing to roll back to — MySQL already committed the partial change. InnoDB implicitly commits the open transaction before a DDL statement and commits again after it, so DDL is not transactional and cannot be reversed by aborting. A migration interrupted mid-flight leaves the table with a new column but no constraint, or a new index but no foreign key, and the runner’s automatic rollback is powerless. This page is the runbook for that situation: how to recognize the stall, reconcile the half-applied schema by hand, and move to a deployment pattern that does not depend on rollback at all. It is the operational deep dive under the transactional vs non-transactional databases section of the migration fundamentals guide.
Symptom / Error Signatures
When non-transactional DDL collides with live traffic, runners stall or fail with:
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transactionERROR 1062 (23000): Duplicate entry '...' for key '...'during index creationERROR 1091 (42000): Can't DROP 'column_name'; check that column/key exists- Migration runners (Flyway, Liquibase, ActiveRecord, Django) hanging at
Waiting for table metadata lock - Partial schema post-interruption: new columns exist but constraints, indexes, or foreign keys are missing
Identify the stalled DDL and what blocks it:
-- MySQL · read-only diagnostic · run on the primary as a DBA/privileged user
SELECT
p.ID, p.USER, p.HOST, p.DB, p.COMMAND, p.TIME, p.STATE,
LEFT(p.INFO, 80) AS QUERY
FROM information_schema.processlist p
WHERE p.COMMAND != 'Sleep'
AND p.TIME > 30
AND (p.INFO LIKE '%ALTER TABLE%' OR p.INFO LIKE '%CREATE INDEX%' OR p.INFO LIKE '%DROP%')
ORDER BY p.TIME DESC;
Root Cause Analysis
InnoDB commits the current transaction before executing DDL and commits again immediately after, so the statement is never inside a transaction you can abort. Because DDL cannot be rolled back, any mid-execution failure leaves the schema partially altered. Concurrently, every SELECT, INSERT, and UPDATE against the target table holds a shared metadata lock (MDL); an ALTER TABLE requests an exclusive MDL, so if a long-running query or an uncommitted transaction still holds the shared lock, the DDL blocks until it times out or exhausts the connection pool waiting. This is the defining difference between MySQL and a fully transactional engine like PostgreSQL — the same ALTER that PostgreSQL would roll back cleanly leaves MySQL needing manual reconciliation. The broader contrast and what it means for tool choice is laid out in transactional vs non-transactional databases, and it is precisely why your scripts must follow idempotent script design so a retry can converge on a partial state.
Immediate Mitigation
When a migration stalls or fails in production, restore stability and reconcile state in order.
- Reduce lock contention before retrying:
-- MySQL · run in the same session before re-executing the DDL · fails fast instead of hanging
SET SESSION lock_wait_timeout = 5;
-- then re-run the exact ALTER/CREATE statement
- Terminate the blocking session (coordinate with app owners — only kill long-running
SELECTs or idle transactions; killing active DML risks inconsistency):
-- MySQL · run as privileged user · use the process ID from the diagnostic query above
KILL <process_id>;
- Audit exactly what applied before the failure:
-- MySQL · read-only · map which DDL steps succeeded
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_DEFAULT
FROM information_schema.columns
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'your_table';
SELECT INDEX_NAME, NON_UNIQUE, SEQ_IN_INDEX, COLUMN_NAME
FROM information_schema.statistics
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'your_table';
- Reconcile by hand — do not trust the runner’s automatic rollback, because MySQL DDL cannot be rolled back:
-- MySQL · run as migration role · only revert steps that actually applied; MySQL has no DROP COLUMN IF EXISTS
ALTER TABLE your_table DROP COLUMN new_column;
DROP INDEX idx_partial ON your_table;
ALTER TABLE your_table DROP FOREIGN KEY fk_partial;
-- If the change is correct and should be kept, mark the migration applied so it is not re-run:
-- UPDATE flyway_schema_history SET success = 1 WHERE version = 'X.X';
Permanent Fix / Long-Term Pattern
Stop relying on rollback that the engine cannot provide. Make migrations additive-first: add columns, tables, or indexes in a single deploy, and make new columns nullable with sensible defaults so the change stays metadata-only (ALGORITHM=INSTANT/INPLACE) rather than forcing a full table rebuild. For large tables, replace native ALTER TABLE with an online schema-change tool — pt-online-schema-change or gh-ost — which builds a shadow table, syncs via triggers or binary-log parsing, and performs an atomic swap that bypasses MDL contention:
# Shell · run from a host with replica access during a low-write window · long-running, monitor lag
gh-ost \
--user="$DB_USER" --password="$DB_PASS" \
--host="$DB_HOST" \
--database="$DB_NAME" --table="your_table" \
--alter="ADD COLUMN new_col VARCHAR(255) DEFAULT 'default'" \
--execute
Keep every script idempotent so a retry after a partial application converges rather than failing:
-- MySQL · run as migration role · no IF NOT EXISTS for ADD COLUMN, so guard via information_schema
SET @ddl := IF(
(SELECT COUNT(*) FROM information_schema.columns
WHERE table_schema = DATABASE()
AND table_name = 'your_table' AND column_name = 'new_col') = 0,
'ALTER TABLE your_table ADD COLUMN new_col VARCHAR(255) DEFAULT ''default''',
'SELECT 1');
PREPARE stmt FROM @ddl; EXECUTE stmt; DEALLOCATE PREPARE stmt;
Finally, decouple schema from app deploys — ship the schema, deploy code that handles old and new shapes, drop deprecated columns later — and add a pre-flight gate that verifies ALGORITHM=INPLACE, LOCK=NONE compatibility and simulates lock-wait scenarios in CI/CD. The closely related trap of DDL silently committing an open transaction mid-script is covered in avoiding implicit commits in MySQL DDL migrations.
Verification Checklist
success = 1%_pt_osc%triggers or_gho/_ghc/_del/%_oldshadow tables)CHECK TABLE your_table EXTENDEDreportsstatus = OKand all expected constraints exist intable_constraintsperformance_schema.metadata_locksshows noEXCLUSIVElock stuck inPENDINGduring peak trafficALGORITHM=INPLACE(orINSTANT) withLOCK=NONE, not a table rebuildEXPLAINplans show no regression from new indexes
Frequently Asked Questions
Can I wrap a MySQL ALTER TABLE in a transaction to make it rollback-safe?
No. InnoDB issues an implicit commit immediately before and after the DDL, so the statement is never inside your transaction — a ROLLBACK has nothing to undo. The practical substitute is additive, idempotent migrations plus pre-written inverse DDL you can apply by hand if you must reverse.
When should I use gh-ost or pt-online-schema-change instead of native ALTER?
Use them when the table is large enough that an exclusive metadata lock would stall live traffic, or when the change cannot run as ALGORITHM=INPLACE. They build a shadow table and swap atomically, avoiding the long MDL hold. For small tables a native INPLACE/INSTANT alter is simpler and lower-risk.
How do I clean up after an interrupted gh-ost run?
gh-ost is triggerless but leaves _gho, _ghc, and _del shadow tables if interrupted; pt-online-schema-change leaves %_pt_osc% triggers and an %_old table. Confirm no swap is mid-flight, then drop the orphaned triggers and shadow tables, and verify the original table’s row count and constraints before resuming.