Backfill Optimization

A backfill is the step where a tidy schema migration meets reality: the new column exists, but ten or a hundred million existing rows still hold NULL, and you have to fill them while production keeps reading and writing the same table. Do it in one UPDATE and you bloat the write-ahead log, stall autovacuum, and push replica lag past the point where reads served from followers go stale. The job of a backfill is therefore not to be fast — it is to be invisible: to move data in batches small enough that the primary never notices and the slowest replica never falls behind its budget. This is the backfill phase of the broader Zero-Downtime Schema Evolution Patterns lifecycle, and it serves the engineer running the deploy and the DBA watching the lag graph.

A backfill only begins after the schema has been expanded additively and, where required, the application is already dual-writing into the new column through Dual-Write Synchronization — otherwise rows written during the backfill window arrive empty and you are chasing a moving target. Coordinated with the Expand and Contract Methodology, the backfill is the bridge between an expanded schema and a contracted one: it must finish, and be proven to have finished, before any old column is dropped.

The replication-lag feedback loop of a throttled backfill A backfill worker reads a keyset batch from the primary, applies an idempotent upsert, then samples the slowest replica's lag; if lag exceeds the budget it sleeps and re-samples before reading the next batch. Backfill Throttled by Replica Lag Read batch WHERE id > cursor Upsert idempotent write Sample lag slowest replica Advance cursor = last id lag > budget? sleep, then re-sample The loop self-regulates: it only reads the next batch when the slowest follower has caught up.
Every iteration measures the slowest replica before advancing — the backfill's rate is governed by lag, not by a fixed sleep guess.

Concept & Mechanism

A backfill is a long sequence of small write transactions, and on a live primary the cost of those writes is not the UPDATE itself but its downstream effects. Each batch generates WAL (PostgreSQL) or binlog and redo (MySQL InnoDB) that every replica must replay serially. If the primary commits batches faster than the slowest replica can apply them, lag grows without bound — and any read routed to that replica returns stale data. This is why batch size and batch spacing are not performance tuning; they are correctness controls. The detailed trade-off lives in Tuning Backfill Batch Size Against Replication Lag.

The second mechanism is lock footprint. A batched UPDATE ... WHERE id BETWEEN x AND y touches only the rows in that range, holding row locks for milliseconds. But LIMIT/OFFSET pagination forces the engine to scan and discard every row before the offset on each page, so page ten million reads ten million rows to return a thousand — turning a linear job into a quadratic one. Keyset (a.k.a. seek) pagination, WHERE id > :cursor ORDER BY id LIMIT n, reads only the rows it returns, every time. The full comparison is in Cursor-Based vs Keyset Pagination for Large Backfills.

The third mechanism is idempotency. Workers crash, deploys restart them, and a batch may be re-applied. Every write must be safe to repeat — ON CONFLICT DO NOTHING / DO UPDATE (PostgreSQL) or INSERT ... ON DUPLICATE KEY UPDATE (MySQL), and an UPDATE guarded by WHERE new_col IS NULL so a re-run touches nothing. A non-idempotent backfill cannot be resumed, which means it cannot be run safely at all.

Prerequisites & Decision Criteria

Run a backfill only when the schema is already expanded and the new column is being kept current by live writes. Confirm these before the first batch:

  • lock_timeout / innodb_lock_wait_timeout alerts are wired with explicit thresholds (lag < 2s, CPU < 70%)

The decision of how large each batch should be is a function of your slowest replica’s apply throughput, not a fixed number. Start small (500–1000 rows), measure lag under that rate, and only widen the batch if the slowest follower stays comfortably inside its budget.

Step-by-Step Procedure

1. Validate the batch plan on a replica. Confirm the engine uses an index range scan before you write anything to production.

# PostgreSQL · run against a STAGING replica, read-only · safe anytime.
# Context: EXPLAIN ANALYZE executes the SELECT but commits no writes.
psql "$STAGING_DATABASE_URL" -c "
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM orders
WHERE id > 0 AND tenant_id IS NULL
ORDER BY id ASC
LIMIT 1000;"

Verify before proceeding: the plan shows an Index Scan (or Index Only Scan) on the primary key, never a Seq Scan. If it is sequential, the chunking key is not usable for range scans — fix that first.

2. Run the throttled loop with idempotent writes. Page by keyset, upsert, then govern the rate by the slowest replica’s lag.

# PostgreSQL · run as a separate post-deploy worker, NOT inside the deploy step.
# Context: idempotent (ON CONFLICT + IS NULL guard); safe to crash and resume; pauses on lag.
import time
from sqlalchemy import text, create_engine

engine = create_engine(DB_URL, pool_size=2, max_overflow=0)
BATCH = 1000
MAX_LAG_SECONDS = 2.0
cursor = 0

def replica_lag_seconds(conn):
    # On a primary, measure the worst follower via pg_stat_replication.
    row = conn.execute(text(
        "SELECT COALESCE(MAX(EXTRACT(EPOCH FROM replay_lag)), 0) "
        "FROM pg_stat_replication")).scalar()
    return float(row or 0)

while True:
    with engine.begin() as conn:                       # explicit txn; commits on exit
        rows = conn.execute(text("""
            SELECT id FROM orders
            WHERE id > :cursor AND tenant_id IS NULL
            ORDER BY id ASC LIMIT :batch
        """), {"cursor": cursor, "batch": BATCH}).fetchall()
        if not rows:
            break
        ids = [r.id for r in rows]
        conn.execute(text("""
            UPDATE orders SET tenant_id = 'default'
            WHERE id = ANY(:ids) AND tenant_id IS NULL  -- guard makes re-runs no-ops
        """), {"ids": ids})
        cursor = ids[-1]

    with engine.connect() as conn:
        while replica_lag_seconds(conn) > MAX_LAG_SECONDS:
            time.sleep(1.0)                             # back off until the slowest replica catches up

Verify before proceeding: after the first few hundred batches, the lag graph should be flat. If it climbs, lower BATCH rather than raising the sleep.

3. Prove completion before contracting. The backfill is done only when zero rows remain unfilled — checked against the primary, not a possibly-lagging replica.

-- PostgreSQL · run against the PRIMARY · read-only, safe anytime.
-- Context: gate the contract step on this returning 0; never DROP before it does.
SELECT count(*) AS remaining FROM orders WHERE tenant_id IS NULL;

Verification & Observability

While the backfill runs, watch three signals. Replication lag is the hard limiter — on PostgreSQL sample pg_stat_replication.replay_lag; on MySQL read Seconds_Behind_Source from SHOW REPLICA STATUS.

-- MySQL 8.0 · run on each replica · read-only.
-- Context: Seconds_Behind_Source NULL means replication is broken, not caught up — alert on it.
SHOW REPLICA STATUS\G   -- inspect Seconds_Behind_Source and Replica_SQL_Running

Watch for long-held locks and blocked sessions on the primary; a backfill batch should never appear here for more than a moment.

-- PostgreSQL · run against the PRIMARY · read-only.
-- Context: rows here mean a batch is blocked — investigate before the pool fills.
SELECT pid, state, wait_event_type, wait_event, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle' AND query ILIKE '%orders%'
ORDER BY runtime DESC;

Finally, track progress against total: remaining / total should fall monotonically. A stalled cursor means the loop is stuck behind lag (expected, recoverable) or a batch is erroring (investigate). For the full script-level treatment of resumability and metrics, see Optimizing Backfill Scripts for Zero-Downtime Deploys.

Rollback Path

A backfill that only ever writes the value it would write again is safe to abandon — stopping it leaves the table partially filled but uncorrupted, and a re-run finishes the job. So the default rollback is simply halt and leave in place: flip the feature flag or send SIGTERM, and the IS NULL guard means a later resume picks up exactly where it stopped.

Destructive rollback — emptying the new column entirely — is only correct when the backfill wrote wrong values (a bad transform) and the column is not yet read by any code path.

-- PostgreSQL · run as migration role · ONLY safe while the new column is unread by all versions.
-- Context: this clears backfilled data; confirm no read path and no dual-write depends on it first.
UPDATE orders SET tenant_id = NULL WHERE tenant_id = 'default';  -- reset only the backfilled value

Never TRUNCATE or DROP the table to undo a backfill if it also holds live production rows — that destroys data the dual-write path is actively maintaining. Reset only the column, only when it is provably unread.

Common Errors & Fixes

Seq Scan on every batch (slow, climbing CPU). Root cause: the chunking column is unindexed or the planner cannot use it for a range scan. Fix: ensure the primary key or a dedicated index covers the WHERE id > :cursor predicate; rebuild stale statistics with ANALYZE orders.

Replica lag climbs and never recovers. Root cause: batches commit faster than the slowest replica applies them. Fix: lower the batch size and confirm the loop samples the slowest replica, not the average; details in Tuning Backfill Batch Size Against Replication Lag.

ERROR: deadlock detected (PostgreSQL) / ERROR 1213 (MySQL). Root cause: the backfill and live traffic update overlapping rows in opposite key order. Fix: always page in ascending key order so the backfill and application lock rows consistently; add bounded retry with backoff.

Duplicate or doubled values after a worker restart. Root cause: the write is not idempotent, so a re-applied batch added rather than guarded. Fix: add ON CONFLICT DO NOTHING for inserts and a WHERE new_col IS NULL guard for updates so any replay is a no-op.

Backfill “complete” but new rows still appear NULL. Root cause: the application was not dual-writing yet, so rows created during the window were never filled. Fix: enable dual-write before backfilling, then run one more pass; the durable solution is Dual-Write Synchronization.

Child Page Index

This section drills into the three decisions that make or break a large backfill. Optimizing Backfill Scripts for Zero-Downtime Deploys covers the script architecture — resumable cursors, idempotent writes, and crash recovery — that lets a job run for hours unattended. Tuning Backfill Batch Size Against Replication Lag covers how to derive the right batch size from your slowest follower’s apply throughput instead of guessing. Cursor-Based vs Keyset Pagination for Large Backfills covers why LIMIT/OFFSET degrades to a sequential scan and how keyset pagination keeps every page cheap.

Frequently Asked Questions

How big should each backfill batch be? Small enough that the slowest replica never falls behind its lag budget — usually 500 to 1000 rows to start. The correct number is not a constant; it is whatever rate keeps replication lag flat. Start conservative, watch the lag graph, and widen the batch only if the slowest follower stays comfortably inside budget.

Why not just run one big UPDATE and let it finish faster? Because one large transaction generates a wall of WAL or binlog that every replica must replay serially, spiking lag and stalling reads served from followers; it also holds row locks for the whole duration and bloats the table. Batching keeps each transaction small enough that production never notices the job is running.

Do I need dual-write before backfilling? If the table takes live writes, yes. Without dual-write, rows inserted or updated during the backfill window are written without the new column populated, so the job appears to finish with fresh gaps behind it. Enable dual-write first so the backfill only has to handle pre-existing rows.

What happens if the backfill worker crashes halfway? With an idempotent design — keyset cursor plus IS NULL write guards — nothing is lost. Restart the worker; it resumes from the last persisted cursor (or re-derives it from the first unfilled row) and re-applies any partial batch harmlessly because the guards make repeats no-ops.