Adopt pressly/goose for schema migrations
Replaces the previous hand-rolled MigrateWithLock + GORM AutoMigrate path,
which had two compounding problems:
- AutoMigrate ran on every pod startup (~5 min over the transatlantic
link) even when no schema changes had landed
- pg_advisory_lock is session-scoped, which silently fails through
Neon's pgbouncer transaction-mode pooler — turns out this is a
known and documented limitation that bites golang-migrate too
Goose was chosen over golang-migrate (the other heavyweight) because:
- Goose wraps each migration file in a transaction by default, so a
failure rolls back cleanly instead of leaving a "dirty" version
state requiring manual force-reset (golang-migrate's known
weakness, per its own issue tracker — see #1001 + Atlas's writeup)
- Goose's locking is opt-in. We don't opt in: migrations run as a
single Kubernetes Job, which IS the singleton process. No advisory
lock needed at all.
Layout:
- migrations/000001_init.sql — schema-only pg_dump of the live Neon
DB at adoption, stripped of psql-only directives that block goose's
bookkeeping insert. Pre-goose hand-numbered migrations 002-022 had
their effects folded into this baseline; deleted from the live tree
but preserved in git history at 58e6997.
- Dockerfile installs `goose v3.22.1` at build time and copies the
binary into the api image. The migrate Job reuses the api image with
command=goose, so no separate image to build/push/version.
- deploy-k3s/manifests/migrate/job.yaml: a one-shot Job that strips
the -pooler segment from DB_HOST (advisory lock won't survive
pgbouncer transaction-mode), runs `goose up`, exits.
- deploy-k3s/scripts/03-deploy.sh: deletes any prior Job, applies the
fresh one, `kubectl wait --for=condition=complete --timeout=10m`,
then proceeds with api/worker rollout. Job failure aborts the deploy
before any new app pod sees a stale schema.
- internal/database/database.go::RequireSchemaApplied checks
goose_db_version on startup. api/worker refuse to boot if the
table is missing or its latest row has is_applied=false — the
fail-fast for "operator forgot to run migrate."
- Makefile: migrate-up / migrate-down / migrate-status / migrate-new
for local workflow.
Production DB was bootstrapped manually:
$ goose -dir migrations postgres "$DSN" version # creates table
$ psql ... -c "INSERT INTO goose_db_version (version_id, is_applied, tstamp) VALUES (1, true, NOW());"
Smoke test against fresh Postgres locally: 50 user tables created in
284ms via `goose up`, version_id=1 + is_applied=t recorded.
Verified the local goose CLI talks to prod successfully:
$ goose ... status
Applied At Migration
=======================================
Mon Apr 27 03:43:55 2026 -- 000001_init.sql
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,75 @@
|
||||
# One-shot migration Job. Runs goose against Neon's *direct* (non-pooler)
|
||||
# endpoint, applies any pending migrations from /app/migrations (baked into
|
||||
# the api image), exits.
|
||||
#
|
||||
# 03-deploy.sh deletes any prior Job, applies this one, waits for completion
|
||||
# with `kubectl wait --for=condition=complete`, and rolls api/worker only
|
||||
# after the Job succeeds. A Job failure aborts the whole deploy.
|
||||
#
|
||||
# We reuse the api image rather than build a separate one — the api Dockerfile
|
||||
# already installs the goose CLI to /usr/local/bin/goose and copies the
|
||||
# migrations directory to /app/migrations.
|
||||
apiVersion: batch/v1
|
||||
kind: Job
|
||||
metadata:
|
||||
name: honeydue-migrate
|
||||
namespace: honeydue
|
||||
labels:
|
||||
app.kubernetes.io/name: migrate
|
||||
app.kubernetes.io/part-of: honeydue
|
||||
spec:
|
||||
backoffLimit: 0 # fail fast — no silent retries on a bad migration
|
||||
ttlSecondsAfterFinished: 86400 # keep finished Job for 24h so logs are inspectable
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app.kubernetes.io/name: migrate
|
||||
app.kubernetes.io/part-of: honeydue
|
||||
spec:
|
||||
restartPolicy: Never
|
||||
imagePullSecrets:
|
||||
- name: ghcr-credentials
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
runAsGroup: 1000
|
||||
seccompProfile:
|
||||
type: RuntimeDefault
|
||||
containers:
|
||||
- name: goose
|
||||
image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh — same as api
|
||||
command: ["/bin/sh", "-c"]
|
||||
# DB_HOST in the ConfigMap points at the -pooler endpoint for runtime.
|
||||
# goose's session-scoped advisory lock can't survive PgBouncer
|
||||
# transaction-mode, so we strip the -pooler segment for migrations.
|
||||
# `set -e` so any sub-command failure exits non-zero.
|
||||
args:
|
||||
- |
|
||||
set -e
|
||||
DIRECT_HOST=$(echo "$DB_HOST" | sed 's/-pooler\.\(.*\)$/.\1/')
|
||||
echo "[migrate] running goose up against $DIRECT_HOST"
|
||||
exec /usr/local/bin/goose \
|
||||
-dir /app/migrations \
|
||||
postgres "host=$DIRECT_HOST port=$DB_PORT user=$POSTGRES_USER password=$POSTGRES_PASSWORD dbname=$POSTGRES_DB sslmode=$DB_SSLMODE" \
|
||||
up
|
||||
securityContext:
|
||||
allowPrivilegeEscalation: false
|
||||
readOnlyRootFilesystem: true
|
||||
capabilities:
|
||||
drop: ["ALL"]
|
||||
envFrom:
|
||||
- configMapRef:
|
||||
name: honeydue-config
|
||||
env:
|
||||
- name: POSTGRES_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: honeydue-secrets
|
||||
key: POSTGRES_PASSWORD
|
||||
resources:
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 64Mi
|
||||
limits:
|
||||
cpu: 500m
|
||||
memory: 256Mi
|
||||
Reference in New Issue
Block a user