docs(deployment): rewrite migration prose for goose adoption
Update the deployment book and glossary to reflect the goose-based schema migration flow shipped in 12b2f9d/0f7450a: - ch07: clarify startup probe assumes migrations ran out-of-band - ch08: drop AutoMigrate-with-advisory-lock prose; describe goose Job - ch12: pod startup checks goose_db_version, no longer runs migrations - ch14: document the Job→wait→roll deploy gate and how to debug failures - ch16: add "Migrate Job fails during deploy" + "Schema precondition failed" failure modes - ch17: new runbook entries §26 (run migrations manually), §27 (recover from failed/dirty migration), §28 (bootstrap goose on fresh clone) - ch19: postscript on §13 noting MigrateWithLock approach is superseded - ch20: mark "Migration Job for schema changes" task done - glossary: add `goose` and `goose_db_version`; flag AutoMigrate as tests-only - references: add goose links; flag AutoMigrate as tests-only
This commit is contained in:
@@ -175,13 +175,15 @@ doesn't run as root.
|
||||
file writes to the image layer. Go binary doesn't need to write to `/`;
|
||||
only `/tmp` is mutable.
|
||||
|
||||
**`startupProbe.failureThreshold: 48`** (= 48 × 5s = 240s grace) — this
|
||||
was bumped up from the scaffold default of 12. Reason: on first boot,
|
||||
the Go app runs `MigrateWithLock()` which acquires a Postgres advisory
|
||||
lock and runs AutoMigrate. First replica takes ~90s; subsequent
|
||||
replicas wait on the lock. With 3 replicas all starting simultaneously
|
||||
and the lock serializing them, 240s is the right grace. See
|
||||
[Chapter 19](./19-postmortem-swarm.md) for the detailed story.
|
||||
**`startupProbe.failureThreshold: 48`** (= 48 × 5s = 240s grace) —
|
||||
historically bumped from the scaffold default of 12 to absorb in-replica
|
||||
migration time. Now that migrations run out-of-band as a Kubernetes
|
||||
Job ([Chapter 8 §Schema management](./08-database.md)), pods boot in
|
||||
seconds and only need a few probe failures of grace, but the budget
|
||||
stays at 240s because cold pods on a fresh Hetzner node still pay
|
||||
~10s for image pull + startup. See
|
||||
[Chapter 19 §13](./19-postmortem-swarm.md) for the historical
|
||||
context (the in-replica advisory-lock approach this replaced).
|
||||
|
||||
**`readinessProbe.initialDelaySeconds: 5`** — after the startupProbe
|
||||
passes, wait 5s before starting readiness checks. Prevents a racy
|
||||
|
||||
Reference in New Issue
Block a user