Database migrations

We use alembic for our database migrations.

Create a new migration

This is explained on the API Github repository.

Browse migrations

They can be found on the API Github repository.

New column defaults

It’s important to be careful when adding new columns with default values. Though it’s tempting to do this all at once, setting the DEFAULT at ALTER TABLE time, this tends to cause locking problems and interruptions when working on a database in live use. The problem being that the operation has to take an exclusive lock on the table while each row is re-written with the new non-null default value. And this can take some time on larger tables, all the while regular users (even read-only ones!) can’t access the table in any way.

Further information:

The solution is to add the new column as a nullable, DEFAULT-less column initially and to set the default value in a second command. This will leave the existing rows as NULL but will use the default value for newly created rows. It is often suggested that the existing rows are left to be back-filled by setting each row’s value to its default whenever an old row needs to be updated and allowing the values to be populated “naturally” over the course of time, however I suspect in practise people would find this difficult to manage and annoying if they’re itching to be able to apply a NOT NULL constraint to the new column.

In such situations one possible approach may be to set rows to their defaults afterwards in batches using UPDATE commands. However, doing all of these in the same transaction wouldn’t solve the problem as UPDATE touching a row takes an exclusive lock on that row, and locks are only released at the end of a transaction. By the end of the transaction, all rows would be locked. To be effective, each of these batch updates would have to be done in their own transaction and probably therefore separate migrations. So the table-altering migration would be followed up by a number of (let’s say, 8?) similar migrations which simply performed one of the commands:

UPDATE some_table SET some_new_field = DEFAULT WHERE id % 8 = 0;

through:

UPDATE some_table SET some_new_field = DEFAULT WHERE id % 8 = 7;

This has the advantages that only 1/8 of the table ever gets locked at a time and that 1/8 only gets locked for 1/8 of the time. 8 is only chosen as an example here, but I don’t expect any developer will have the patience to create many more migrations than that.

These could now be safely followed up by a migration that adds a NOT NULL constraint if so needed.