docs(deployment): rewrite migration prose for goose adoption

Update the deployment book and glossary to reflect the goose-based schema migration flow shipped in 12b2f9d/0f7450a: - ch07: clarify startup probe assumes migrations ran out-of-band - ch08: drop AutoMigrate-with-advisory-lock prose; describe goose Job - ch12: pod startup checks goose_db_version, no longer runs migrations - ch14: document the Job→wait→roll deploy gate and how to debug failures - ch16: add "Migrate Job fails during deploy" + "Schema precondition failed" failure modes - ch17: new runbook entries §26 (run migrations manually), §27 (recover from failed/dirty migration), §28 (bootstrap goose on fresh clone) - ch19: postscript on §13 noting MigrateWithLock approach is superseded - ch20: mark "Migration Job for schema changes" task done - glossary: add `goose` and `goose_db_version`; flag AutoMigrate as tests-only - references: add goose links; flag AutoMigrate as tests-only
2026-04-26 23:01:32 -05:00
parent 0f7450ada9
commit 8d9ca2e6ed
10 changed files with 260 additions and 39 deletions
@@ -317,10 +317,47 @@ Timeline (approximate, warm state):
 - t=60s: another old pod terminates
 - ...continues until all on new RS

-For cold-boot (e.g., first deploy on a rebuilt cluster), the
-MigrateWithLock advisory lock extends this to several minutes. But the
-rollout is serialized — only one pod starts per iteration, so the lock
-queue is small.
+Migrations run as a separate Kubernetes Job that completes before any
+api/worker pod is rolled. So the rollout above never includes migration
+work — pods that boot are guaranteed to find the schema already at the
+expected version. See §"Migrations are gated, not interleaved" below.
+
+## Migrations are gated, not interleaved
+
+`03-deploy.sh` runs `goose up` as a one-shot Job before applying any
+api/worker manifests:
+
+```
+1. kubectl delete job honeydue-migrate (idempotent, removes prior run)
+2. kubectl apply -f manifests/migrate/job.yaml (with current api image)
+3. kubectl wait --for=condition=complete --timeout=10m job/honeydue-migrate
+4. (only if Job succeeded) kubectl apply -f manifests/api/...
+```
+
+The Job uses the api image — `/usr/local/bin/goose` is baked in at
+Dockerfile build time. The Job script strips the `-pooler` segment
+from `DB_HOST` before connecting (goose's session-scoped advisory
+lock can't survive PgBouncer transaction-mode), runs `goose up`, exits.
+
+If the Job fails, the script aborts before any new app pod sees a
+stale schema. To debug:
+
+```bash
+kubectl -n honeydue logs job/honeydue-migrate --tail=200
+kubectl -n honeydue describe job honeydue-migrate
+```
+
+After investigating, fix the migration file and re-run `03-deploy.sh`.
+The Job is idempotent — successful migrations stay applied, only the
+new/failed file gets retried.
+
+api/worker pods run a `RequireSchemaApplied` check at startup that
+queries `goose_db_version` and refuses to boot if the table is missing
+or the latest row is `is_applied=false`. This is the fail-fast for
+"someone bypassed the deploy script and the schema isn't current."
+
+For full schema management background, see
+[Chapter 8 §Schema management](./08-database.md).

 ## Hotfix workflow