Read/Write Splitting Tactics
Routing reads to replicas and writes to the primary is the standard way to scale a database, and during a schema migration it becomes a safety mechanism — but only if you get the routing exactly right. A migration changes the shape of the data on the primary, then waits for that change to flow down the replication stream to every replica. In the gap between, a replica is running the old schema while the primary is running the new one. Route a query that names a new column to a lagging replica and it fails; route a write to a replica and replication breaks. Read/write splitting during a migration is the discipline of classifying every statement correctly — DDL and writes to the primary, reads to replicas only when they have caught up — and watching replication lag as the gate that decides when a replica is safe to read.
This page is part of the Zero-Downtime Schema Evolution Patterns section and depends on the additive guarantees of the Expand and Contract Methodology: split routing is only backward-safe while both the old and new schema can satisfy a read. It serves backend engineers, DBAs tuning the proxy, and platform teams running the topology. Examples target PostgreSQL 11+ streaming replication and MySQL 8.0 with a proxy such as ProxySQL.
Concept & Mechanism
Replication is asynchronous by default on both PostgreSQL streaming replication and MySQL. The primary commits, writes the change to its WAL or binlog, and replicas pull and apply it some time later. That delay — replication lag — is normally a staleness problem for reads. During a migration it becomes a schema problem: a DDL statement is just another entry in the replication stream, so for a window after the primary runs ALTER TABLE orders ADD COLUMN order_status, the replicas have not yet applied it and their copy of orders has no such column. A read of order_status routed to that replica fails with column does not exist.
Two routing rules keep this safe. First, all DDL and all writes go to the primary — replicas are read-only and cannot accept either; sending a write to one breaks replication or errors outright. Second, reads go to a replica only while that replica’s schema can satisfy them, which during a migration means only after the DDL has replicated. The additive nature of the Expand and Contract Methodology is what makes the in-between window survivable: while a replica still runs the old schema, a backward-compatible read that names only old columns still succeeds against it. Reads that name the new column must wait — or be pinned to the primary — until lag clears.
The proxy classifies statements to enforce this. A connection-pooling proxy parses each statement: SELECT to the replica hostgroup, INSERT/UPDATE/DELETE/DDL and anything inside an explicit transaction to the primary hostgroup. Transactions route whole to the primary because a read-after-write inside one must see its own uncommitted change, which a replica cannot. Getting that DDL-versus-DML classification right under migration conditions is the focus of Routing DDL vs DML Traffic During Migrations.
Prerequisites & Decision Criteria
Split routing during a migration pays off when read volume is high enough that the primary cannot absorb it alone, or when you want replicas to stress-test a new index under real load. It adds risk when reads must be strictly current. Confirm the following first.
pg_last_xact_replay_timestamp()orSHOW REPLICA STATUS), not a guess.
| Read requirement | Route to | Rationale |
|---|---|---|
| Must reflect a write just made (read-after-write) | Primary | Replicas may not have the change yet |
| Names a column added by the in-flight migration | Primary until replicas catch up | Lagging replica lacks the column |
| Backward-compatible (old columns only), tolerates staleness | Replica | Old schema satisfies it; offloads the primary |
| Any write or DDL | Primary | Replicas are read-only |
| Inside an explicit transaction | Primary (whole transaction) | Isolation and read-after-write within the transaction |
Step-by-Step Procedure
The example routes MySQL 8.0 traffic through ProxySQL during an additive migration.
Step 1 — Define routing rules: writes and transactions to the primary, reads to replicas. Configure the rules on the ProxySQL admin interface.
-- ProxySQL admin (port 6032) · run as the admin user · changes the live routing table
-- Hostgroup 0 = primary, hostgroup 1 = replicas. Order matters: transactions before SELECT.
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES
(1, 1, '^(START TRANSACTION|BEGIN|INSERT|UPDATE|DELETE|ALTER|CREATE|DROP)', 0, 1),
(2, 1, '^SELECT ', 1, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
Verify before proceeding: confirm a SELECT resolves to hostgroup 1 and an ALTER to hostgroup 0 before applying any schema change.
Step 2 — Run the additive DDL on the primary. The migration runs only against the primary; it replicates from there. Pin the migration connection to the primary explicitly so the proxy never misroutes it.
-- PostgreSQL · migration role · runs on the PRIMARY only · CREATE INDEX CONCURRENTLY outside a transaction
-- Replicas receive this through the WAL stream; do not run DDL against a replica.
SET lock_timeout = '3s';
ALTER TABLE orders ADD COLUMN IF NOT EXISTS order_status VARCHAR(50);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_order_status ON orders (order_status);
Step 3 — Wait for replicas to apply the change before reading the new column. Until lag clears, pin reads that name order_status to the primary; backward-compatible reads can continue against replicas. Topology and replica setup details live in Configuring Read Replicas for Seamless Schema Updates.
Step 4 — Drain any replica that exceeds the lag threshold. A replica past the budget is removed from the read pool until it recovers, so no stale or missing-column read reaches it. Backfill jobs that drive lag should throttle per Backfill Optimization.
-- ProxySQL admin · admin user · pull a lagging replica out of the read pool
-- OFFLINE_SOFT lets in-flight queries finish, then stops new reads to that node.
UPDATE mysql_servers SET status = 'OFFLINE_SOFT'
WHERE hostgroup_id = 1 AND hostname = 'replica-02';
LOAD MYSQL SERVERS TO RUNTIME;
Verify before proceeding: confirm no new SELECT digests hit the drained replica before continuing.
Step 5 — Re-add the replica once caught up and shift reads onto the new column. When lag is back under threshold and the replica has the new column, return it to the pool and route new-column reads to replicas.
Verification & Observability
The decision that governs everything is per-replica lag. Measure it directly and watch routing actually follow it.
-- PostgreSQL · monitoring role · run ON THE REPLICA · read-only · seconds of replay lag
-- Above your threshold, the replica must be drained from the read pool.
SELECT now() - pg_last_xact_replay_timestamp() AS replica_lag;
-- MySQL 8.0 · run ON THE REPLICA · user with REPLICATION CLIENT · read-only
-- Seconds_Behind_Source is the lag; a non-NULL Last_SQL_Errno means apply has stalled.
SHOW REPLICA STATUS\G
-- ProxySQL admin · admin user · read-only · confirm reads land on replicas, writes on primary
-- digest_text starting with SELECT should show hostgroup 1; writes should show hostgroup 0.
SELECT hostgroup, digest_text, count_star FROM stats_mysql_query_digest
WHERE digest_text LIKE 'SELECT%' ORDER BY count_star DESC LIMIT 10;
The gates: routing is correct when every write digest shows the primary hostgroup and read digests show replica hostgroups; a replica is read-eligible only when its lag is under the threshold and it has applied the migration DDL; the migration is fully cut over when new-column reads route to replicas with no column does not exist errors in the logs.
Rollback Path
Rollback here is reverting routing, not reverting schema — the additive migration stays in place.
- Revert read routing to primary-only: the fastest mitigation when replicas misbehave is to send all reads to the primary. Safe whenever the primary can absorb the read load.
-- ProxySQL admin · admin user · removes the split, sending all traffic to the primary
-- Safe as an emergency stop; monitor primary load since it now serves reads too.
DELETE FROM mysql_query_rules WHERE rule_id IN (1, 2);
LOAD MYSQL QUERY RULES TO RUNTIME;
- Drain a specific bad replica: if only one node is stale or erroring,
OFFLINE_SOFTthat node (Step 4) rather than collapsing the whole split. - If a write reached a replica: replication is now broken on that node. Do not attempt to merge it — rebuild the replica from a fresh base backup or
pg_basebackupsnapshot. The safe condition for returning it to the pool is a clean resync, not a manual fix. - What rollback does NOT do: it never drops the migrated column. The schema change already replicated to the caught-up replicas and is backward compatible, so routing can revert independently of the schema.
Common Errors & Fixes
ERROR: column "order_status" does not exist on reads — a read naming the new column reached a replica that has not applied the DDL. Root cause: routed new-column reads to replicas before lag cleared. Fix: pin new-column reads to the primary until every replica reports the migration applied, then shift them; drain any replica still behind.
ERROR: cannot execute INSERT in a read-only transaction — a write was routed to a replica. Root cause: the proxy classified a write as a read, often because it sat inside a transaction the rule did not capture, or a stored-procedure call masked the write. Fix: route whole transactions and DDL keywords to the primary explicitly (Step 1), and route CALL to the primary as well.
Stale reads after a write (read-after-write failure) — a user writes, then immediately reads old data. Root cause: the read went to a lagging replica instead of the primary. Fix: route the read-after-write read to the primary, or keep the whole transaction on the primary so the read sees the write.
Replication stops with Last_SQL_Errno set (MySQL) / replica falls permanently behind — the apply thread errored, often on a duplicate key or a DDL conflict. Root cause: a write or a manual change hit the replica directly, diverging it from the primary. Fix: rebuild the replica from a fresh backup; never replay binlog by hand onto a diverged node.
Lag spikes during backfill drain every replica — all replicas cross the threshold at once and read traffic slams the primary. Root cause: an unthrottled backfill outran replica apply. Fix: throttle the backfill on lag per Backfill Optimization, and lower batch size until lag stays under budget.
Child Page Index
This section covers the two halves of split routing during a migration. Configuring Read Replicas for Seamless Schema Updates covers the topology: setting up streaming replicas, wiring the proxy hostgroups, and tuning the lag threshold that decides read eligibility. Routing DDL vs DML Traffic During Migrations covers statement classification: keeping every DDL and write on the primary, routing whole transactions correctly, and pinning new-column reads until replicas catch up. Both extend the parent Zero-Downtime Schema Evolution Patterns overview.
Frequently Asked Questions
Can a read replica ever run schema migrations? No. Replicas are read-only and receive every DDL statement through the replication stream from the primary. Running DDL directly on a replica either errors or diverges it from the primary and breaks replication. The migration runs once on the primary and flows down from there.
How do I stop reads from hitting a replica that lacks the new column? Gate read eligibility on replication lag and DDL application. Until a replica has applied the migration, pin any read that names the new column to the primary and keep only backward-compatible reads on replicas. Drain any replica past the lag threshold from the read pool until it catches up.
Why must whole transactions route to the primary?
A transaction often reads back its own uncommitted write, which a replica cannot see, and splitting statements across nodes would break isolation. Routing the entire transaction to the primary guarantees read-after-write consistency within it. Only stand-alone, tolerant SELECTs are safe on replicas.
A write accidentally hit a replica — can I just delete it there? No. The replica has now diverged from the primary and its replication will eventually error or silently drift. The only safe recovery is to rebuild that replica from a fresh base backup so it is a faithful copy of the primary again.