docs(deployment): rewrite migration prose for goose adoption
Update the deployment book and glossary to reflect the goose-based schema migration flow shipped in 12b2f9d/0f7450a: - ch07: clarify startup probe assumes migrations ran out-of-band - ch08: drop AutoMigrate-with-advisory-lock prose; describe goose Job - ch12: pod startup checks goose_db_version, no longer runs migrations - ch14: document the Job→wait→roll deploy gate and how to debug failures - ch16: add "Migrate Job fails during deploy" + "Schema precondition failed" failure modes - ch17: new runbook entries §26 (run migrations manually), §27 (recover from failed/dirty migration), §28 (bootstrap goose on fresh clone) - ch19: postscript on §13 noting MigrateWithLock approach is superseded - ch20: mark "Migration Job for schema changes" task done - glossary: add `goose` and `goose_db_version`; flag AutoMigrate as tests-only - references: add goose links; flag AutoMigrate as tests-only
This commit is contained in:
@@ -327,6 +327,55 @@ KUBECONFIG=~/.kube/honeydue.yaml bash deploy-k3s/scripts/03-deploy.sh --skip-bui
|
||||
Cold-handshake latency goes back up (~440ms first hit) but the API
|
||||
keeps serving. Switch back when the pooler recovers.
|
||||
|
||||
#### Migrate Job fails during deploy
|
||||
|
||||
**Symptom**: `03-deploy.sh` aborts at the migrations step:
|
||||
```
|
||||
[deploy][error] migrations did not complete cleanly; aborting deploy
|
||||
```
|
||||
api/worker pods are NOT updated — they keep running the previous
|
||||
revision. This is the intentional fail-fast.
|
||||
|
||||
**Recovery**:
|
||||
```bash
|
||||
# 1. See the failure
|
||||
kubectl -n honeydue logs job/honeydue-migrate --tail=200
|
||||
|
||||
# 2. Common cause: a SQL error in the migration file. Fix the file
|
||||
# locally, commit, retry the deploy. The Job is idempotent —
|
||||
# successful prior versions stay applied; only the failed file
|
||||
# re-runs.
|
||||
git add migrations/000NNN_*.sql
|
||||
git commit -m "Fix migration NNN"
|
||||
git push gitea master
|
||||
bash deploy-k3s/scripts/03-deploy.sh
|
||||
|
||||
# 3. Other cause: Neon down or auth changed. Test direct connection:
|
||||
DB_PASS=$(kubectl -n honeydue get secret honeydue-secrets \
|
||||
-o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d)
|
||||
docker run --rm -e PGPASSWORD="$DB_PASS" postgres:17-alpine \
|
||||
psql "host=ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
|
||||
user=neondb_owner dbname=honeyDue sslmode=require" -c "SELECT 1;"
|
||||
```
|
||||
**Why no automatic retry**: `backoffLimit: 0` on the Job is deliberate.
|
||||
A failing migration almost never gets unstuck by retrying — needs an
|
||||
operator to look. See [Chapter 17 §27](./17-runbook.md) for recovery
|
||||
playbook.
|
||||
|
||||
#### api refuses to start: "Schema precondition failed"
|
||||
|
||||
**Symptom**: api pods log `Schema precondition failed` and exit
|
||||
immediately after DB connect.
|
||||
**Cause**: `goose_db_version` table is missing or its latest row has
|
||||
`is_applied=false`. Means the migrate Job either was never run or
|
||||
ran and rolled back.
|
||||
**Recovery**: run the migrate Job manually (see
|
||||
[Chapter 17 §26](./17-runbook.md)). After it completes successfully,
|
||||
delete the failing api pods so they restart with a fresh schema check:
|
||||
```bash
|
||||
kubectl -n honeydue rollout restart deploy/api
|
||||
```
|
||||
|
||||
#### Backblaze B2 outage
|
||||
|
||||
**Symptom**: image uploads fail; image downloads fail unless cached by
|
||||
|
||||
Reference in New Issue
Block a user