docs(deployment): rewrite migration prose for goose adoption

Update the deployment book and glossary to reflect the goose-based
schema migration flow shipped in 12b2f9d/0f7450a:

- ch07: clarify startup probe assumes migrations ran out-of-band
- ch08: drop AutoMigrate-with-advisory-lock prose; describe goose Job
- ch12: pod startup checks goose_db_version, no longer runs migrations
- ch14: document the Job→wait→roll deploy gate and how to debug failures
- ch16: add "Migrate Job fails during deploy" + "Schema precondition
  failed" failure modes
- ch17: new runbook entries §26 (run migrations manually), §27 (recover
  from failed/dirty migration), §28 (bootstrap goose on fresh clone)
- ch19: postscript on §13 noting MigrateWithLock approach is superseded
- ch20: mark "Migration Job for schema changes" task done
- glossary: add `goose` and `goose_db_version`; flag AutoMigrate as
  tests-only
- references: add goose links; flag AutoMigrate as tests-only
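
The two behaviours the ch08 and ch12 bullets describe (the Job connecting to
the direct database endpoint, and pods failing fast on a stale schema) can be
sketched as follows. This is a minimal illustration, not the code shipped in
12b2f9d/0f7450a: `directHost` and `checkSchema` are hypothetical names, the
hostname is made up, and the version numbers stand in for whatever a real
check would read from the `goose_db_version` table.

```go
package main

import (
	"fmt"
	"strings"
)

// directHost is a hypothetical helper: it drops the first "-pooler"
// from a pooled hostname so migrations use the direct endpoint.
func directHost(dbHost string) string {
	return strings.Replace(dbHost, "-pooler", "", 1)
}

// checkSchema is a hypothetical stand-in for a startup precondition.
// applied is the version a pod would read from goose_db_version;
// required is the version the binary expects. It never runs migrations.
func checkSchema(applied, required int64) error {
	if applied < required {
		return fmt.Errorf("schema precondition failed: goose_db_version is %d, need at least %d",
			applied, required)
	}
	return nil
}

func main() {
	// Assumed example hostname, not taken from the deployment.
	fmt.Println(directHost("ep-example-pooler.db.example.com"))

	// A stale schema yields an error; a current one yields nil.
	fmt.Println(checkSchema(3, 5))
	fmt.Println(checkSchema(5, 5))
}
```

The point of the split is that only the Job applies migrations, while api and
worker pods only compare versions and exit on a mismatch.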
Trey t
2026-04-26 23:01:32 -05:00
parent 0f7450ada9
commit 8d9ca2e6ed
10 changed files with 260 additions and 39 deletions
+13 -11
@@ -69,20 +69,22 @@ Flexible to Full (strict). Verified by:
- CF edge continues to serve its own Let's Encrypt cert to browsers
- both layers now TLS-encrypted
-### Migration Job for schema changes
+### ~~Migration Job for schema changes~~ — done (2026-04-26, commit 12b2f9d)
-**Why**: Currently every api pod runs `MigrateWithLock()` on startup,
-serializing on a Postgres advisory lock. Adds 90-240s to cold startup
-and caused bug #13 in Chapter 19.
+**What shipped**: pressly/goose as the migration tool, run as a one-shot
+Kubernetes Job from `deploy-k3s/manifests/migrate/job.yaml` before
+api/worker rollout. The Job uses the api image (goose CLI is baked in
+during the Dockerfile build), strips `-pooler` from `DB_HOST` for the
+direct-endpoint connection migrations need, and exits in seconds when
+there's nothing to apply. `RequireSchemaApplied` in the api/worker
+startup checks `goose_db_version` and fails fast on a stale schema.
-**How**: Create a Kubernetes `Job` resource that runs the api image
-with a `--migrate-only` flag. Job runs once per deploy, completes when
-schema is current. api pods get an initContainer that waits for the
-Job to complete.
+The Go-code-with-`--migrate-only` shape originally proposed here was
+rejected in favor of using the upstream goose binary directly — see
+[Chapter 8 §Schema management](./08-database.md) for the trade-offs.
-Requires Go code change to support `--migrate-only` flag.
-**Effort**: 3-4 hours (code + job manifest + testing).
+Pre-goose `MigrateWithLock` is gone; ch19 §13 has the historical
+postmortem context.
### Redis password