docs(deployment): rewrite migration prose for goose adoption
Backend CI / Test (push) Has been cancelled
Backend CI / Contract Tests (push) Has been cancelled
Backend CI / Build (push) Has been cancelled
Backend CI / Lint (push) Has been cancelled
Backend CI / Secret Scanning (push) Has been cancelled

Update the deployment book and glossary to reflect the goose-based
schema migration flow shipped in 12b2f9d/0f7450a:

- ch07: clarify startup probe assumes migrations ran out-of-band
- ch08: drop AutoMigrate-with-advisory-lock prose; describe goose Job
- ch12: pod startup checks goose_db_version, no longer runs migrations
- ch14: document the Job→wait→roll deploy gate and how to debug failures
- ch16: add "Migrate Job fails during deploy" + "Schema precondition
  failed" failure modes
- ch17: new runbook entries §26 (run migrations manually), §27 (recover
  from failed/dirty migration), §28 (bootstrap goose on fresh clone)
- ch19: postscript on §13 noting MigrateWithLock approach is superseded
- ch20: mark "Migration Job for schema changes" task done
- glossary: add `goose` and `goose_db_version`; flag AutoMigrate as
  tests-only
- references: add goose links; flag AutoMigrate as tests-only
This commit is contained in:
Trey t
2026-04-26 23:01:32 -05:00
parent 0f7450ada9
commit 8d9ca2e6ed
10 changed files with 260 additions and 39 deletions
+49
View File
@@ -327,6 +327,55 @@ KUBECONFIG=~/.kube/honeydue.yaml bash deploy-k3s/scripts/03-deploy.sh --skip-bui
Cold-handshake latency goes back up (~440ms first hit) but the API
keeps serving. Switch back when the pooler recovers.
#### Migrate Job fails during deploy
**Symptom**: `03-deploy.sh` aborts at the migrations step:
```
[deploy][error] migrations did not complete cleanly; aborting deploy
```
api/worker pods are NOT updated — they keep running the previous
revision. This is the intentional fail-fast.
**Recovery**:
```bash
# 1. See the failure
kubectl -n honeydue logs job/honeydue-migrate --tail=200
# 2. Common cause: a SQL error in the migration file. Fix the file
# locally, commit, retry the deploy. The Job is idempotent —
# successful prior versions stay applied; only the failed file
# re-runs.
git add migrations/000NNN_*.sql
git commit -m "Fix migration NNN"
git push gitea master
bash deploy-k3s/scripts/03-deploy.sh
# 3. Other cause: Neon down or auth changed. Test direct connection:
DB_PASS=$(kubectl -n honeydue get secret honeydue-secrets \
-o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d)
docker run --rm -e PGPASSWORD="$DB_PASS" postgres:17-alpine \
psql "host=ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
user=neondb_owner dbname=honeyDue sslmode=require" -c "SELECT 1;"
```
**Why no automatic retry**: `backoffLimit: 0` on the Job is deliberate.
A failing migration almost never gets unstuck by retrying — needs an
operator to look. See [Chapter 17 §27](./17-runbook.md) for recovery
playbook.
#### api refuses to start: "Schema precondition failed"
**Symptom**: api pods log `Schema precondition failed` and exit
immediately after DB connect.
**Cause**: `goose_db_version` table is missing or its latest row has
`is_applied=false`. Means the migrate Job either was never run or
ran and rolled back.
**Recovery**: run the migrate Job manually (see
[Chapter 17 §26](./17-runbook.md)). After it completes successfully,
delete the failing api pods so they restart with a fresh schema check:
```bash
kubectl -n honeydue rollout restart deploy/api
```
#### Backblaze B2 outage
**Symptom**: image uploads fail; image downloads fail unless cached by