Applying migrations safely
A migration that passes Squawk's lint is not automatically safe to run.
To safely apply a migration you must set a lock_timeout
in Postgres. See below for more information.
safety requirements
- Use Squawk to lint your migrations. Follow the advice.
- Set a short
lock_timeout
(e.g. 1 second) within Postgres when running your migrations. If you hit thelock_timeout
, great, retry your migration until it succeeds. - Recommended: Set a
statement_timeout
(e.g. 5 seconds) to prevent runaway migrations from using too many resources & holding locks for too long.
lock_timeout
You must configure a lock_timeout
within Postgres while running your migrations. Without this your migration could block other writes while trying to grab a lock on the table.
You can set this timeout in a few ways:
- globally in Postgres (see below)
- for a specific database role
- via the options connection parameter of your client
- with the
PGOPTIONS
environment variable if your client supports it. Python'spyscogpg2
supports this option. (see below)
-- set the global Postgres lock timeout to 1 second
SET lock_timeout = '1s';
PGOPTIONS='-c lock_timeout=1s' ./migrate_db
With a short lock_timeout
of 1 second, queries will be blocked for up to 1 second. If your migration hits the lock timeout, it will be cancelled and error, allowing the waiting queries to proceed. You should retry a migration that hits the lock timeout until it succeeds.
statement_timeout
It's recommended to use a statement_timeout
whenever you're querying your database.
For migrations, you want to ensure there's an upper bound on how long your queries take. This prevents runaway migrations from using too many resources.
Some queries will take a while, like concurrently creating an index on a large table, and that's okay, so you may have to tweak this based on your database size:
set statement_timeout = '5s';
example
This statement requires an AccessExclusiveLock
lock on "accounts"
. Reads and writes to "accounts"
will be blocked while this statement is running.
ALTER TABLE "accounts" ADD CONSTRAINT "positive_balance" CHECK ("balance" >= 0) NOT VALID
If there is a long running query or open transaction, this ALTER TABLE
statement could be blocked while it waits for an AccessExclusiveLock
lock. While the statement waits for a lock, all other reads and writes sent after will wait for this statement too, bringing your application to a halt.
With a short lock_timeout
of 1 second, queries will be blocked for up to 1 second. If your migration hits the lock timeout, it will be cancelled and error, allowing the waiting queries to proceed. You should retry a migration that hits the lock timeout until it succeeds.
further reading
Benchling's "Move fast and migrate things: how we automated migrations in Postgres" and GoCardless's "Zero-downtime Postgres migrations - the hard parts" provide more background on lock_timeout
and statement_timeout
in a production environment.