docs(deployment): rewrite migration prose for goose adoption
Update the deployment book and glossary to reflect the goose-based schema migration flow shipped in 12b2f9d/0f7450a: - ch07: clarify startup probe assumes migrations ran out-of-band - ch08: drop AutoMigrate-with-advisory-lock prose; describe goose Job - ch12: pod startup checks goose_db_version, no longer runs migrations - ch14: document the Job→wait→roll deploy gate and how to debug failures - ch16: add "Migrate Job fails during deploy" + "Schema precondition failed" failure modes - ch17: new runbook entries §26 (run migrations manually), §27 (recover from failed/dirty migration), §28 (bootstrap goose on fresh clone) - ch19: postscript on §13 noting MigrateWithLock approach is superseded - ch20: mark "Migration Job for schema changes" task done - glossary: add `goose` and `goose_db_version`; flag AutoMigrate as tests-only - references: add goose links; flag AutoMigrate as tests-only
This commit is contained in:
@@ -397,6 +397,35 @@ should reflect reality, not be optimistic.
|
||||
**Moral**: Healthchecks should be realistic, not aspirational. Know
|
||||
what your app actually does at startup.
|
||||
|
||||
#### Postscript (2026-04-26): the whole `MigrateWithLock` shape was wrong
|
||||
|
||||
A few months after the Swarm migration, switching `DB_HOST` to Neon's
|
||||
`-pooler` endpoint for runtime perf wins broke this code completely:
|
||||
`pg_advisory_lock` is session-scoped, but PgBouncer transaction-mode
|
||||
multiplexes statements across backend Postgres sessions, so the lock
|
||||
appeared to be held but actually wasn't. Pods hung at
|
||||
"Acquiring migration advisory lock..." and the startup probe killed
|
||||
them in turn.
|
||||
|
||||
After a brief band-aid (route migrations through the direct endpoint;
|
||||
bump probe to 600s to absorb 5-minute AutoMigrate runs over the slow
|
||||
direct connection — both reverted), we abandoned the runtime-side
|
||||
migration story entirely and adopted [pressly/goose](https://github.com/pressly/goose)
|
||||
in commit `12b2f9d`:
|
||||
|
||||
- Migrations run as a one-shot Kubernetes Job before any api/worker
|
||||
pod rolls. No more in-replica migration, no more advisory lock,
|
||||
no more startup probe gymnastics.
|
||||
- `RequireSchemaApplied` checks `goose_db_version` at startup and
|
||||
refuses to boot on a stale schema — fail-fast for "operator
|
||||
forgot to run migrate," instead of mysterious runtime errors.
|
||||
- `failureThreshold` reverted to its pre-MigrateWithLock value.
|
||||
Pods boot in seconds again.
|
||||
|
||||
See [Chapter 8 §Schema management](./08-database.md) for the goose
|
||||
shape. This entire sub-section is preserved as historical context
|
||||
for why we walked the path we did.
|
||||
|
||||
## What we learned
|
||||
|
||||
### Docker Swarm is in a bad place in 2026
|
||||
|
||||
Reference in New Issue
Block a user