14 — Deployment Process
Summary
A production deploy is: build new images, push them to Gitea, run any
pending schema migrations as a one-shot Job, then update each Deployment's
image field to the new SHA and let Kubernetes roll new pods in. There's no
downtime as long as the change is backward-compatible. Rollback is
kubectl rollout undo. This chapter walks through the full process,
plus alternate paths (config-only changes, manifest changes, hotfixes).
TL;DR using the unified deploy script
The recommended path. deploy-k3s/scripts/03-deploy.sh builds all four
images (api, worker, admin, web), pushes to Gitea, regenerates the
ConfigMap from config.yaml, applies every manifest under
deploy-k3s/manifests/ (including the observability vmagent, with the
api/worker manifests held back until the honeydue-migrate Job has run
goose up successfully), and waits for all rollouts.
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && git push gitea master
export KUBECONFIG=~/.kube/honeydue.yaml
bash deploy-k3s/scripts/03-deploy.sh # full build + push + rollout
# or, to redeploy without rebuilding:
bash deploy-k3s/scripts/03-deploy.sh --skip-build
# or, to pin a specific tag:
bash deploy-k3s/scripts/03-deploy.sh --tag d3708e6
What the script does, in order:
- Read registry creds from deploy-k3s/config.yaml, then docker login gitea.treytartt.com.
- Build all four images with --platform linux/amd64 (so arm64 Macs don't push images that crash on Hetzner amd64 nodes with "exec format error").
- Push to the Gitea registry, then also tag and push :latest.
- Generate the env file from config.yaml and apply it as ConfigMap honeydue-config (dry-run + apply, so re-running with an unchanged config is a no-op).
- Apply manifests/namespace.yaml, redis/, ingress/, api/{deployment,service,hpa}, worker/, admin/, web/. Before the api/ and worker/ manifests, run goose up as the one-shot honeydue-migrate Job and wait for it. If the Job fails, the script stops there.
- Apply manifests/observability/vmagent.yaml, substituting TOKEN_PLACEHOLDER with OBS_INGEST_TOKEN from deploy/prod.env (gitignored). This step is skipped with a warning if the token isn't present.
- Run kubectl rollout status for every Deployment, including vmagent.
~7–10 minutes for a full rebuild. ~1–2 minutes with --skip-build.
TL;DR for a single-service code change (manual)
# 1. Commit + get SHA
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)
# 2. Login to Gitea registry (creds in config.yaml)
docker login gitea.treytartt.com -u admin
# 3. Build + push amd64 image
docker build --platform linux/amd64 --target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" .
docker push "gitea.treytartt.com/admin/honeydue-api:${SHA}"
# 4. Roll it in
export KUBECONFIG=~/.kube/honeydue.yaml
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
# 5. Watch
kubectl rollout status -n honeydue deployment/api
# 6. Log out
docker logout gitea.treytartt.com
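~3–5 minutes end to end for api. This path skips the honeydue-migrate Job, so use it only for commits that add no migrations. Otherwise, deploy with 03-deploy.sh.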
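The manual path above never runs the honeydue-migrate Job, so it is only safe for commits that add no goose migrations. One rough pre-flight check is to scan the paths changed since the SHA currently running. The sketch below uses a hypothetical touches_migrations helper and assumes a migrations/ directory. Point MIGRATIONS_DIR at wherever the repo actually keeps its .sql files.

```bash
# Hypothetical helper: exits 0 if any path read on stdin is a goose .sql
# migration. MIGRATIONS_DIR is an assumption; set it to the repo's real
# migrations directory.
MIGRATIONS_DIR="${MIGRATIONS_DIR:-migrations}"

touches_migrations() {
  local path
  while IFS= read -r path; do
    case "$path" in
      "$MIGRATIONS_DIR"/*.sql) return 0 ;;
    esac
  done
  return 1
}

# Usage: compare the tag currently running on the api Deployment with HEAD.
#   running=$(kubectl get deployment api -n honeydue \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')
#   git diff --name-only "${running##*:}" HEAD | touches_migrations \
#     && echo "new migrations: use deploy-k3s/scripts/03-deploy.sh" >&2
```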
Gotcha: images tagged with anything other than :latest (such as a SHA) default to
imagePullPolicy: IfNotPresent. That means the kubelet won't re-fetch a tag it already has cached locally, even if the registry now has different bytes at that tag. Always change tags (use the SHA). If you need to overwrite a tag, temporarily flip imagePullPolicy: Always and run kubectl rollout restart.
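A reused tag may never be re-pulled, so it can help to refuse anything that doesn't look like a git SHA before calling kubectl set image. This is a minimal sketch. require_sha_tag is a hypothetical helper, and the "7 to 40 lowercase hex characters" rule is an assumption chosen to match what git rev-parse prints.

```bash
# Hypothetical pre-deploy guard: accept only tags that look like a git SHA
# (7 to 40 lowercase hex characters, as git rev-parse prints them), so a
# reusable tag such as "latest" never reaches kubectl set image.
require_sha_tag() {
  local tag="$1"
  if [[ "$tag" =~ ^[0-9a-f]{7,40}$ ]]; then
    return 0
  fi
  echo "refusing tag '$tag': deploy an immutable SHA tag instead" >&2
  return 1
}

# Usage:
#   require_sha_tag "$SHA" && kubectl set image deployment/api -n honeydue \
#     api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
```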
The build
Step 1 — Prepare
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git status # clean working tree?
git log -1 --oneline # this is the SHA that'll ship
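Step 1 amounts to a pre-flight check. The sketch below uses a hypothetical release_sha helper. It prints the short SHA only when git status --porcelain reports no uncommitted or untracked changes. Otherwise the image would be tagged with a commit it doesn't actually match.

```bash
# Hypothetical pre-flight helper: print the short SHA to tag images with,
# but only when the working tree has no uncommitted or untracked changes,
# so the image really corresponds to that commit.
release_sha() {
  local status
  status=$(git status --porcelain) || return
  if [[ -n "$status" ]]; then
    echo "working tree not clean; commit or stash before building" >&2
    return 1
  fi
  git rev-parse --short HEAD
}

# Usage:
#   SHA=$(release_sha) || exit 1
```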
Step 2 — Login to Gitea
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | \
docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
Note: passing the token with docker login -p puts it in shell history and
in the process list. Piping it through --password-stdin avoids both, so don't skip the printf trick.
Step 3 — Build + push
SHA=$(git rev-parse --short HEAD)
# For API
docker buildx build \
--platform linux/amd64 \
--target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
--push .
# For Worker
docker buildx build \
--platform linux/amd64 \
--target worker \
-t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" \
--push .
# For Admin (Next.js)
docker buildx build \
--platform linux/amd64 \
--target admin \
-t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" \
--push .
- --platform linux/amd64: build for the Hetzner nodes' amd64 from the operator's arm64 workstation
- --target X: select a stage from the multi-stage Dockerfile
- --push: push to the registry in one step instead of leaving the image in local Docker
First build is slow (~3–5 min cold). Subsequent builds hit BuildKit layer cache and complete in ~30–60s if only app code changed.
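The three near-identical invocations can be folded into one function. In the sketch below, build_and_push and the DRY_RUN switch are hypothetical names. The image path and flags are the ones this chapter uses. With DRY_RUN=1, the function only prints the command it would run.

```bash
# Hypothetical wrapper around the buildx invocation above. The command is
# built as an array so arguments stay intact; DRY_RUN=1 prints it instead.
build_and_push() {
  local svc="$1" sha="$2"
  local -a cmd=(
    docker buildx build
    --platform linux/amd64
    --target "$svc"
    -t "gitea.treytartt.com/admin/honeydue-${svc}:${sha}"
    --push .
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage:
#   for svc in api worker admin; do build_and_push "$svc" "$SHA" || break; done
```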
Build platform note
If docker buildx isn't configured:
docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap
This creates a BuildKit container that supports cross-platform builds.
The --bootstrap line spins it up immediately so errors surface now
instead of on first build.
The deploy
For a single service
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
This updates the Deployment's image field. Kubernetes:
- Creates a new ReplicaSet with the new image (its deployment.kubernetes.io/revision annotation records the revision number)
- Starts one new pod (per maxSurge: 1)
- Waits for the readinessProbe to pass on the new pod (up to 240s for a cold api boot)
- Once it's ready, removes a pod from the old ReplicaSet
- Repeats until all pods are on the new ReplicaSet
- Marks the rollout complete
Watching the rollout
kubectl rollout status -n honeydue deployment/api
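Outputs progress and returns when the rollout completes or fails. Its own --timeout defaults to 0 (wait indefinitely). However, it exits with an error once the Deployment exceeds progressDeadlineSeconds, which is 600s (10 minutes) by default.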
More detailed:
# Watch pods transition
kubectl get pods -n honeydue -l app.kubernetes.io/name=api -w
# Watch events
kubectl get events -n honeydue --sort-by=.lastTimestamp -w
For all three services
for svc in api worker admin; do
kubectl set image deployment/$svc -n honeydue \
$svc="gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done
# Watch all rollouts
for svc in api worker admin; do
kubectl rollout status -n honeydue deployment/$svc
done
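The plain loops above hide failures: a failed kubectl set image is ignored, and the second loop's exit status reflects only the last service. The sketch below is a stricter variant using a hypothetical deploy_services helper; the 10m timeout is an assumption. It waits on every rollout and returns non-zero, listing each failure. Like the loops, it does not run the honeydue-migrate Job.

```bash
# Hypothetical helper: point each Deployment at the SHA tag, then wait for
# every rollout and fail if any step failed. The 10m timeout is an
# assumption; match it to the readiness budget.
NS=honeydue
IMAGE_BASE="gitea.treytartt.com/admin/honeydue"

deploy_services() {
  local sha="$1"; shift
  local svc
  local -a failed=()
  for svc in "$@"; do
    kubectl set image "deployment/$svc" -n "$NS" \
      "$svc=${IMAGE_BASE}-${svc}:${sha}" || failed+=("$svc (set image)")
  done
  for svc in "$@"; do
    kubectl rollout status "deployment/$svc" -n "$NS" --timeout=10m \
      || failed+=("$svc (rollout)")
  done
  if (( ${#failed[@]} > 0 )); then
    printf 'failed: %s\n' "${failed[@]}" >&2
    return 1
  fi
}

# Usage:
#   deploy_services "$SHA" api worker admin
```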
Config-only changes (no new image)
When you change prod.env but code is unchanged:
# 1. Update prod.env locally
# 2. Regenerate ConfigMap
kubectl create configmap honeydue-config -n honeydue \
--from-env-file=deploy/prod.env \
--dry-run=client -o yaml | kubectl apply -f -
# 3. Pods do NOT auto-reload env vars. Restart them.
kubectl rollout restart -n honeydue deployment/api deployment/admin deployment/worker
rollout restart triggers a rolling update with the same image but
forces pod recreation. New pods pick up the updated ConfigMap.
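Restarting all three Deployments is wasted churn when the regenerated ConfigMap is identical to the live one. The sketch below uses a hypothetical apply_config_and_restart helper. It relies on kubectl diff's exit status (0 for no differences, 1 for differences, greater than 1 for an error) to apply and restart only when something actually changed.

```bash
# Hypothetical helper: render honeydue-config from an env file, compare it
# with the live object, and only apply + restart when they differ.
apply_config_and_restart() {
  local env_file="${1:-deploy/prod.env}" manifest rc=0
  manifest=$(kubectl create configmap honeydue-config -n honeydue \
    --from-env-file="$env_file" --dry-run=client -o yaml) || return
  printf '%s\n' "$manifest" | kubectl diff -f - >/dev/null || rc=$?
  case "$rc" in
    0) echo "ConfigMap unchanged; no restart"; return 0 ;;
    1) ;;  # differences found: continue to apply + restart
    *) echo "kubectl diff failed (exit $rc)" >&2; return "$rc" ;;
  esac
  printf '%s\n' "$manifest" | kubectl apply -f - &&
    kubectl rollout restart -n honeydue \
      deployment/api deployment/admin deployment/worker
}
```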
Why not auto-reload?
Kubernetes has no built-in mechanism to restart pods on ConfigMap change.
There's no envFromWatch equivalent. Third-party operators like
Reloader can do it, but we don't run one.
For sensitive config (like SECRET_KEY in honeydue-secrets), this is arguably a
feature: pods don't cycle unexpectedly when someone tweaks the ConfigMap or the Secret.
Secret changes
Same flow as config:
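# Rotate a value (read -s keeps it off the screen and out of shell history;
# tr strips the line wraps GNU base64 inserts into long values)
read -rs NEW_SECRET_KEY
kubectl patch secret honeydue-secrets -n honeydue \
--type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(printf '%s' "$NEW_SECRET_KEY" | base64 | tr -d '\n')\"}}"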
# Restart pods
kubectl rollout restart -n honeydue deployment/api deployment/worker
Manifest changes
When you add/modify a deployment YAML:
kubectl apply -f deploy-k3s/manifests/api/deployment.yaml
If the change touches the pod template (spec.template: resource limits, env, volumes, image), pods roll. If it only touches fields outside the template, such as replicas, there's no rollout. Pods are just added or removed to match the new count.
Rollback
Last-known-good rollback
kubectl rollout undo deployment/api -n honeydue
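Reverts to the previous ReplicaSet (the one with the previous image). Takes ~30s to stabilize. It only reverts the pod template. Migrations the honeydue-migrate Job already applied stay applied, so the older image must still work against the newer schema.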
Rollback to a specific revision
# See revision history
kubectl rollout history deployment/api -n honeydue
# Revert to specific revision number
kubectl rollout undo deployment/api -n honeydue --to-revision=3
Kubernetes keeps up to 10 ReplicaSet revisions by default
(spec.revisionHistoryLimit).
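Unless a CHANGE-CAUSE is recorded, rollout history lists revision numbers but not which SHA each revision ran. The ReplicaSets carry both: the deployment.kubernetes.io/revision annotation and the pod template's image. The sketch below uses a hypothetical revision_for_sha helper to map a SHA to the revision to pass to --to-revision. It assumes the api pod template has a single container.

```bash
# Hypothetical helper: reads "<revision> <image>" lines on stdin and prints
# the newest revision whose image tag equals the given SHA (empty if none).
revision_for_sha() {
  local sha="$1"
  awk -v sha="$sha" '$2 ~ (":" sha "$") { print $1 }' | sort -n | tail -n 1
}

# Usage (assumes one container in the api pod template):
#   kubectl get rs -n honeydue -l app.kubernetes.io/name=api -o jsonpath='{range .items[*]}{.metadata.annotations.deployment\.kubernetes\.io/revision}{" "}{.spec.template.spec.containers[0].image}{"\n"}{end}' | revision_for_sha d3708e6
```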
Hard rollback (deploy an older image)
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:<older-sha>"
Useful when you want to go back further than the revision history, or to a specific known-good SHA.
Rolling update semantics
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
For api (3 replicas):
- maxUnavailable: 0 means no pod is removed until its replacement is ready
- maxSurge: 1 means up to 4 pods (3 + 1) exist simultaneously during the rollout
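Those two numbers follow from a general rule. The sketch below shows how Kubernetes resolves maxSurge and maxUnavailable into pod-count bounds, including how percentages round; the helper names are hypothetical.

```bash
# Hypothetical helpers illustrating how rolling-update bounds are derived.
# Percentages: maxSurge rounds up, maxUnavailable rounds down; plain
# integers are used as-is.
resolve_surge() {        # usage: resolve_surge <int|pct%> <replicas>
  local v="$1" n="$2"
  if [[ "$v" == *% ]]; then
    local pct="${v%?}"                 # strip the trailing %
    echo $(( (pct * n + 99) / 100 ))   # ceiling of pct% of n
  else
    echo "$v"
  fi
}

resolve_unavailable() {  # usage: resolve_unavailable <int|pct%> <replicas>
  local v="$1" n="$2"
  if [[ "$v" == *% ]]; then
    local pct="${v%?}"                 # strip the trailing %
    echo $(( pct * n / 100 ))          # floor of pct% of n
  else
    echo "$v"
  fi
}

rollout_bounds() {       # usage: rollout_bounds <replicas> <maxSurge> <maxUnavailable>
  local n="$1" surge unavail
  surge=$(resolve_surge "$2" "$n")
  unavail=$(resolve_unavailable "$3" "$n")
  # If both resolve to 0, Kubernetes bumps maxUnavailable to 1 so the
  # rollout can still make progress.
  if (( surge == 0 && unavail == 0 )); then unavail=1; fi
  echo "max_pods=$(( n + surge )) min_available=$(( n - unavail ))"
}
```

With the api settings, rollout_bounds 3 1 0 prints max_pods=4 min_available=3. For 3 replicas, the Kubernetes defaults of 25%/25% resolve to the same bounds.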
Timeline (approximate, warm state):
- t=0: kubectl set image
- t=0: k8s creates new RS with 1 pod
- t=30s (or so): new pod readiness probe passes
- t=30s: k8s terminates 1 old pod
- t=60s: next new pod ready
- t=60s: another old pod terminates
- ...continues until all on new RS
Migrations run as a separate Kubernetes Job that completes before any api/worker pod is rolled, so the rollout above never includes migration work. When you deploy via 03-deploy.sh, pods that boot find the schema already at the version the new image expects. See §"Migrations are gated, not interleaved" below.
Migrations are gated, not interleaved
03-deploy.sh runs goose up as a one-shot Job before applying any
api/worker manifests:
1. kubectl delete job honeydue-migrate (idempotent, removes prior run)
2. kubectl apply -f manifests/migrate/job.yaml (with current api image)
3. kubectl wait --for=condition=complete --timeout=10m job/honeydue-migrate
4. (only if Job succeeded) kubectl apply -f manifests/api/...
The Job uses the api image; /usr/local/bin/goose is baked in at
Dockerfile build time. The Job script strips the -pooler segment
from DB_HOST before connecting. This is needed because goose's session-scoped advisory
lock can't survive PgBouncer transaction-mode pooling, where consecutive
transactions can land on different server connections. It then runs goose up and exits.
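Here is a sketch of what that connect-and-migrate step could look like. Only three things come from the description above: the -pooler stripping, the goose path and goose up. Everything else is an assumption, and the real Job script may differ. That includes the helper names, the DB_PORT/DB_USER/DB_PASSWORD/DB_NAME/MIGRATIONS_DIR variables, the /migrations default and the assumption that goose's Postgres driver reads PGPASSWORD.

```bash
# Hypothetical sketch of the Job's connect-and-migrate step. Env var names
# other than DB_HOST are assumptions for illustration.
direct_db_host() {
  # Drop the first "-pooler" segment, e.g. db-pooler.internal -> db.internal
  printf '%s\n' "${1/-pooler/}"
}

run_goose_up() {
  local host dsn
  host=$(direct_db_host "$DB_HOST")
  dsn="host=${host} port=${DB_PORT:-5432} user=${DB_USER} dbname=${DB_NAME}"
  # Passing the password via PGPASSWORD (assumed to be honored by the
  # driver) keeps it out of the process arguments. goose's CLI form is:
  #   goose [flags] DRIVER DBSTRING COMMAND
  PGPASSWORD="$DB_PASSWORD" /usr/local/bin/goose \
    -dir "${MIGRATIONS_DIR:-/migrations}" postgres "$dsn" up
}
```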
If the Job fails, the script aborts before any new app pod sees a stale schema. To debug:
kubectl -n honeydue logs job/honeydue-migrate --tail=200
kubectl -n honeydue describe job honeydue-migrate
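After investigating, fix the migration file and re-run 03-deploy.sh.
The Job is idempotent: successful migrations stay applied, and only the
new/failed file gets retried. That relies on goose wrapping each migration in a transaction by default, so a failed file leaves nothing half-applied. A file annotated -- +goose NO TRANSACTION gets no such protection.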
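One caveat with step 3 is that kubectl wait --for=condition=complete does not return early when the Job fails. It keeps waiting until the 10m timeout. The sketch below polls until either terminal condition appears, assuming the Job is named honeydue-migrate in the honeydue namespace. wait_for_migrate_job is a hypothetical helper, not necessarily what 03-deploy.sh does.

```bash
# Hypothetical helper: poll the migrate Job until it reports Complete or
# Failed (or the timeout passes). The jsonpath prints the types of all
# conditions whose status is "True", space-separated.
wait_for_migrate_job() {
  local timeout_s="${1:-600}" interval_s="${2:-5}"
  local deadline=$(( SECONDS + timeout_s )) types
  while (( SECONDS < deadline )); do
    types=$(kubectl get job honeydue-migrate -n honeydue \
      -o jsonpath='{.status.conditions[?(@.status=="True")].type}') || return 2
    case " $types " in
      *" Complete "*) echo "migrate job complete"; return 0 ;;
      *" Failed "*) echo "migrate job failed" >&2; return 1 ;;
    esac
    sleep "$interval_s"
  done
  echo "migrate job still running after ${timeout_s}s" >&2
  return 1
}

# Usage in a deploy script:
#   wait_for_migrate_job 600 || {
#     kubectl -n honeydue logs job/honeydue-migrate --tail=200; exit 1; }
```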
api/worker pods run a RequireSchemaApplied check at startup that
queries goose_db_version and refuses to boot if the table is missing
or the latest row is is_applied=false. This is the fail-fast for
"someone bypassed the deploy script and the schema isn't current."
For full schema management background, see Chapter 8 §Schema management.
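When a pod refuses to boot on the RequireSchemaApplied check, you can inspect the same condition by hand. The sketch below uses a hypothetical schema_applied helper and assumes psql is available and the DSN points at the honeydue database. It mirrors the description above (goose_db_version present, newest row applied), not the actual Go code.

```bash
# Hypothetical manual version of the startup precondition: read the newest
# goose_db_version row and check its is_applied flag.
# Exit 0 = OK, 1 = latest row not applied, 2 = table/connection unreadable.
schema_applied() {
  local dsn="$1" latest
  latest=$(psql "$dsn" -X -tAc \
    "SELECT is_applied FROM goose_db_version ORDER BY id DESC LIMIT 1") || {
    echo "cannot read goose_db_version (missing table or connection error)" >&2
    return 2
  }
  if [[ "$latest" == t ]]; then
    echo "schema OK"
    return 0
  fi
  echo "latest goose_db_version row is not applied (got '${latest}')" >&2
  return 1
}
```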
Hotfix workflow
When we need to ship a fix fast and skip the usual steps:
- Fix in code
- Build + push
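- kubectl set image on the affected service only
- Monitor with kubectl logs -f

This skips the honeydue-migrate Job, so a hotfix that needs a schema change must go through 03-deploy.sh. Don't skip CI/tests in a real org; for a solo operator, this is the tradeoff.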
Integration with Gitea
Currently there's no CI/CD: the operator builds on the workstation and pushes manually. Future:
- Gitea Actions (GitHub Actions-compatible CI) could trigger on push to main
- The build + push step could run in a GitHub Actions-compatible workflow
- Auto-deploy on tag push, manual promote to prod
TODO (Chapter 20).
What the old Swarm deploy script did
Contrast: deploy/scripts/deploy_prod.sh (Swarm-era) did:
- Validate every config file (placeholder detection, APNS key format, B2 all-or-none)
- Buildx to amd64
- Push to Gitea (we retrofitted this from GHCR)
- SCP bundle to manager node
- docker secret create + docker config create with versioned names
- docker stack deploy --with-registry-auth
- Poll stack services until convergence (420s timeout)
- Prune old secret/config versions
- Healthcheck the final URL; auto-rollback on failure
- Log out of registries
The current k3s replacement, deploy-k3s/scripts/03-deploy.sh, covers
the same ground in fewer steps because Kubernetes does the
versioning/rollout/health bookkeeping natively. See the TL;DR section
at the top of this chapter.
Common deploy failures
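| Symptom | Likely cause |
|---|---|
| ImagePullBackOff | Image not in registry, or pull secret expired |
| CrashLoopBackOff with "exec format error" | Image built for arm64; rebuild with --platform linux/amd64 |
| honeydue-migrate Job fails, deploy aborts | Migration error; check kubectl -n honeydue logs job/honeydue-migrate |
| Stuck at "Progressing" | Readiness probe not passing; check pod logs |
| CrashLoopBackOff immediately | App won't start; check pod logs for panic/exit reason |
| CrashLoopBackOff on the schema precondition | RequireSchemaApplied failed: goose_db_version is missing or its latest row has is_applied=false |
| CrashLoopBackOff after startup checks pass | Cache service, Redis connection, or post-init code issue |
| Old pods never terminate | New pods not ready; rollout doesn't progress |
| Rollout succeeds but app is broken | Readiness probe is too lenient; passes on broken app |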
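For quick triage, you can map a pod's waiting reason to a row of the table. The sketch below uses a hypothetical hint_for_reason helper. It only covers reasons Kubernetes reports directly (ImagePullBackOff, ErrImagePull, CrashLoopBackOff), not the symptoms that need logs to tell apart.

```bash
# Hypothetical lookup encoding a few rows of the table above, keyed by the
# container's waiting reason as Kubernetes reports it.
hint_for_reason() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "image not in registry, or pull secret expired" ;;
    CrashLoopBackOff)
      echo "app won't start; check kubectl logs --previous for the panic/exit reason" ;;
    "")
      echo "not waiting; if the rollout is stuck, check the readiness probe and pod logs" ;;
    *)
      echo "no hint for '$1'; see kubectl describe pod" ;;
  esac
}

# Usage:
#   kubectl get pods -n honeydue -l app.kubernetes.io/name=api \
#     -o jsonpath='{range .items[*]}{.status.containerStatuses[0].state.waiting.reason}{"\n"}{end}' \
#     | while IFS= read -r reason; do hint_for_reason "$reason"; done
```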
Debugging commands
# Describe the deployment (shows events, conditions)
kubectl describe deployment api -n honeydue
# Describe the api pods (every pod matching the label, not just the latest)
kubectl describe pod -n honeydue -l app.kubernetes.io/name=api
# Logs from currently-running pods
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100 --prefix
# Logs from the last-terminated pod
kubectl logs -n honeydue <pod> --previous
# Events in the namespace (newest first)
kubectl get events -n honeydue --sort-by=.lastTimestamp
# Pause a rollout (stops new pods from being created)
kubectl rollout pause deployment/api -n honeydue
# Resume
kubectl rollout resume deployment/api -n honeydue
Zero-downtime considerations
For zero-downtime deploys, the new image must be:
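- Paired with a schema change the old image can live with. Migrations run before the rollout, so old pods keep serving against the new schema until they're replaced, and again after a rollout undo.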
- Backward-compatible with in-flight API requests (don't remove endpoints mid-deploy; deprecate first)
- Backward-compatible with Redis data structures (don't change cache key formats abruptly)
For breaking changes:
- Deploy intermediate version that handles both old and new
- Once rolled out everywhere, deploy breaking-change version
- Two deploys, same day or different days
We don't have this discipline yet; our API has too few clients to worry about. As mobile clients proliferate, this becomes more important.
Blue-green / canary (not yet)
Kubernetes supports advanced rollout strategies:
- Canary: route 5% of traffic to new version, scale up gradually
- Blue-green: run new version alongside old, flip traffic all at once
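Weighted canary routing needs Traefik's TraefikService CRD (weighted round robin) or a service mesh. Blue-green can also be done without either, by switching a Service's selector from the old Deployment's pods to the new one's. TODO if traffic scale justifies.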
Cleanup: the old Swarm config
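deploy/ contains the Swarm-era config. Swarm no longer uses it, but the
k3s flow in this chapter still reads deploy/prod.env (OBS_INGEST_TOKEN, config-only
ConfigMap updates) and deploy/registry.env (registry login). After we're confident in
k3s (a few weeks? a month?), move those files into deploy-k3s/, update the references
to them, then remove the rest:
rm -rf deploy/
Keep only deploy-k3s/.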
Operator cheat sheet
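# Full build + deploy (manual: skips the honeydue-migrate Job and the web
# image, so prefer 03-deploy.sh when the commit includes migrations)
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin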
docker buildx build --platform linux/amd64 --target api -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
docker buildx build --platform linux/amd64 --target worker -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
docker buildx build --platform linux/amd64 --target admin -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
docker logout gitea.treytartt.com
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
for svc in api worker admin; do
kubectl set image deployment/$svc -n honeydue "$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done
for svc in api worker admin; do
kubectl rollout status -n honeydue deployment/$svc
done