Safely Removing a NOT NULL Column With Expand-Contract

You want to drop users.legacy_status, a NOT NULL column replaced months ago by users.status. The naive migration is one line — ALTER TABLE users DROP COLUMN legacy_status; — and on a staging database with no traffic it works instantly. In production it causes an incident, because some still-running instance of the old application code executes an INSERT that includes legacy_status, and the moment the column is gone that insert fails with “column does not exist.” The NOT NULL constraint makes it worse: while the column exists, any new code path that omits it gets rejected too. Removing a NOT NULL column that old code still writes is not a single DROP; it is an ordered teardown — drop the NOT NULL constraint first, stop the writes, verify no readers, and only then drop the column — and the order is different enough on PostgreSQL and MySQL to matter.

This is the contract half of the additive pattern, applied to its hardest case: a column with a constraint that both old writers and the database itself depend on. The expand-side groundwork is covered in Implementing Expand-Contract for High-Traffic Tables; here we close the loop safely.

Symptom / Error Signatures

The failures cluster at two moments: when the constraint is still present, and when the column is dropped too early.

  • PostgreSQL, dropping too early: ERROR: column "legacy_status" of relation "users" does not exist from an old instance still inserting it.
  • MySQL, dropping too early: ERROR 1054 (42S22): Unknown column 'legacy_status' in 'field list'.
  • Constraint still present, new code omits the column: PostgreSQL ERROR: null value in column "legacy_status" violates not-null constraint; MySQL ERROR 1364 (HY000): Field 'legacy_status' doesn't have a default value.
  • A rejected backfill or constraint change because rows with NULL remain: PostgreSQL ERROR: column "legacy_status" contains null values when trying to re-assert NOT NULL in a panicked rollback.
  • Application error-rate spike confined to write endpoints touching the table, beginning exactly at the deploy that changed the column.

The two error families point in opposite directions — one says “the column is still required,” the other says “the column is already gone” — and the whole procedure exists to make sure neither code path and constraint ever disagree.

Root Cause Analysis

A NOT NULL column couples three independent actors: the old application code that writes it, the new application code that has stopped writing it, and the constraint that forces every writer to supply a value. During a rolling deploy, old and new code run simultaneously. If the constraint is still in place, new code that omits the column violates it. If the column is dropped, old code that includes it errors. There is no single instant at which a bare DROP COLUMN is safe while both versions are live and the constraint stands.

The teardown sequence exists to decouple these actors one at a time. First you remove the constraint, so the database no longer forces old code’s value to be present and tolerates new code’s omission — both versions can now run against the column. Then you remove the writes from the application (ship code that no longer references the column at all) and confirm every instance has rolled. Then you verify nothing reads it. Only when the column is referenced by no live code in any direction is DROP COLUMN safe.

The engines differ in how expensive each step is:

Step PostgreSQL MySQL 8.0 (InnoDB)
Drop NOT NULL ALTER TABLE ... ALTER COLUMN ... DROP NOT NULL — metadata-only, fast, brief lock ALTER TABLE ... MODIFY col ... NULLALGORITHM=INPLACE, but may copy; check
DROP COLUMN Metadata-only since PG 11 for the catalog entry; brief ACCESS EXCLUSIVE lock ALGORITHM=INSTANT (8.0.29+) for many drops; otherwise rebuilds the table
Transactional DDL Yes — steps roll back if they fail No — each DDL implicitly commits; no multi-step rollback
Lock-wait guard SET lock_timeout SET lock_wait_timeout

The MySQL “no transactional DDL” row is the trap: because each ALTER implicitly commits, you cannot wrap the constraint drop and the column drop in one transaction and roll the pair back together — a concern detailed in Handling Non-Transactional DDL in MySQL Migrations. Each step must be independently safe and re-runnable.

Ordered teardown of a NOT NULL column A left-to-right sequence: drop the NOT NULL constraint, stop writing the column, verify no readers remain, then drop the column. Safe Removal Order 1. Drop NOT NULL 2. Stop writes deploy + drain 3. Verify no readers 4. Drop column The constraint goes first so old and new code can both run; the column goes last.
Removing the constraint before stopping writes lets old and new code coexist; the column is dropped only once no code references it in any direction.

Immediate Mitigation

If a DROP COLUMN has already broken writes, restore service before redoing the teardown properly.

  1. Re-add the column as nullable so old code’s inserts succeed again. Do not re-add it NOT NULL — that would reject new code that omits it.
-- PostgreSQL · run on primary as migration role · additive, restores broken writes
-- Nullable on purpose: old code can write it, new code can omit it.
ALTER TABLE users ADD COLUMN IF NOT EXISTS legacy_status VARCHAR(32);
-- MySQL · run on primary as migration role · re-add nullable; implicit commit per ALTER
ALTER TABLE users ADD COLUMN legacy_status VARCHAR(32) NULL;
  1. Confirm the error rate on write endpoints returns to baseline before doing anything else.

  2. Then run the teardown in order. Drop the constraint (it is already absent if you just re-added nullable), ship code that stops writing the column, drain old instances, and verify no readers — covered next.

  3. Cap lock waits on every DDL step so a contended ALTER fails fast instead of queueing behind traffic.

-- PostgreSQL · run as migration role · fail fast rather than blocking the write path
SET lock_timeout = '3s';
-- MySQL · run as migration role · bound how long the metadata lock may wait
SET SESSION lock_wait_timeout = 3;

Permanent Fix / Long-Term Pattern

Run the four steps as separate, deployed changes, each verified before the next.

Step 1 — Drop the NOT NULL constraint. This is metadata-only and lets old and new code coexist immediately, because the database no longer forces the column to be present.

-- PostgreSQL · run on primary as migration role · metadata-only, brief lock
SET lock_timeout = '3s';
ALTER TABLE users ALTER COLUMN legacy_status DROP NOT NULL;
-- MySQL · run on primary · INPLACE where supported; verify it does not copy on huge tables
ALTER TABLE users MODIFY legacy_status VARCHAR(32) NULL, ALGORITHM=INPLACE, LOCK=NONE;

Step 2 — Stop writing the column. Ship application code that no longer references legacy_status in any INSERT or UPDATE, then wait for every old instance to drain. If the write is gated behind a flag, retire the flag first in the order described in Coupling Schema Changes to Feature Flags and Removing Both.

Step 3 — Verify no readers or writers remain. Prove it with evidence, not belief — grep the codebase, and watch the database’s own statistics to confirm the column is no longer touched.

-- PostgreSQL · read-only · column usage is not directly counted, so confirm via the
-- query log / pg_stat_statements that no statement references legacy_status.
SELECT query, calls
FROM   pg_stat_statements
WHERE  query ILIKE '%legacy_status%'
ORDER  BY calls DESC;

Step 4 — Drop the column. With no code referencing it in any direction, the drop is a clean contraction.

-- PostgreSQL · run on primary as migration role · brief ACCESS EXCLUSIVE lock
SET lock_timeout = '3s';
ALTER TABLE users DROP COLUMN IF EXISTS legacy_status;
-- MySQL · run on primary · INSTANT drop on 8.0.29+; older versions rebuild the table
ALTER TABLE users DROP COLUMN legacy_status, ALGORITHM=INSTANT;

On high-traffic tables, even the brief locks in steps 1 and 4 deserve the lock-acquisition care from Implementing Expand-Contract for High-Traffic Tables, and on MySQL the per-step implicit commits mean each ALTER must stand alone — there is no transaction to roll the set back. The governing rule from the parent Expand and Contract Methodology holds throughout: a contraction is safe only when no running code depends on the thing being removed.

Verification Checklist

  • NOT NULL constraint is dropped as its own deploy, before any code change, so old and new code coexist.
  • pg_stat_statements (or the MySQL query log / general log) shows zero statements referencing the column over a full traffic cycle.
  • INSERT, UPDATE, or SELECT references the column in any service.
  • lock_timeout / lock_wait_timeout so a contended ALTER fails fast instead of blocking writes.
  • ALTER is independently safe and re-runnable, since the implicit commit means the steps cannot be rolled back as a set.

Frequently Asked Questions

Why drop the NOT NULL constraint before stopping the writes? Because dropping the constraint is what lets old and new code run at the same time. While NOT NULL stands, new code that omits the column is rejected; once it is gone, the database tolerates both the old code that supplies the value and the new code that does not. Removing the constraint first creates the window in which you can safely roll the write-removal deploy.

Can’t I just drop the column in one statement during a maintenance window? Only if you can guarantee no instance of the old code is running and no constraint forces the value — which during a rolling deploy you cannot. Even in a maintenance window, a single in-flight old instance or a queued write can hit the missing column. The ordered teardown removes that risk regardless of deploy timing, which is why it is preferable to relying on a window.

Is dropping a column expensive on PostgreSQL or MySQL? On PostgreSQL the catalog drop is metadata-only and fast, holding only a brief ACCESS EXCLUSIVE lock. On MySQL 8.0.29+ many column drops use ALGORITHM=INSTANT and are similarly cheap; older versions rebuild the table and should run in a low-write window. Always set a lock-wait timeout so a contended statement fails fast rather than queueing behind traffic.

What if some rows still have data in the old column I need to keep? Copy or reconcile that data into the new column before step 4, while the old column still exists. Once DROP COLUMN runs the data is gone, so any backfill or reconciliation from the old column to the new one must complete and be verified before the drop, not after.