admin/honeyDueAPI

Trey t 8d9ca2e6ed

docs(deployment): rewrite migration prose for goose adoption

Update the deployment book and glossary to reflect the goose-based
schema migration flow shipped in 12b2f9d/0f7450a:

- ch07: clarify startup probe assumes migrations ran out-of-band
- ch08: drop AutoMigrate-with-advisory-lock prose; describe goose Job
- ch12: pod startup checks goose_db_version, no longer runs migrations
- ch14: document the Job→wait→roll deploy gate and how to debug failures
- ch16: add "Migrate Job fails during deploy" + "Schema precondition
  failed" failure modes
- ch17: new runbook entries §26 (run migrations manually), §27 (recover
  from failed/dirty migration), §28 (bootstrap goose on fresh clone)
- ch19: postscript on §13 noting MigrateWithLock approach is superseded
- ch20: mark "Migration Job for schema changes" task done
- glossary: add `goose` and `goose_db_version`; flag AutoMigrate as
  tests-only
- references: add goose links; flag AutoMigrate as tests-only

2026-04-26 23:01:32 -05:00

14 — Deployment Process

Summary

A production deploy builds a new image, pushes it to the Gitea registry, and updates the Deployment's image field to the new SHA tag; Kubernetes then rolls new pods in. There is no downtime as long as the change is backward-compatible. Rollback is kubectl rollout undo. This chapter walks through the full process, plus the alternate paths (config-only changes, manifest changes, hotfixes).

TL;DR using the unified deploy script

The recommended path. deploy-k3s/scripts/03-deploy.sh builds all four images (api, worker, admin, web), pushes to Gitea, regenerates the ConfigMap from config.yaml, applies every manifest under deploy-k3s/manifests/ (including the observability vmagent), and waits for all rollouts.

cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && git push gitea master

export KUBECONFIG=~/.kube/honeydue.yaml
bash deploy-k3s/scripts/03-deploy.sh         # full build + push + rollout
# or, to redeploy without rebuilding:
bash deploy-k3s/scripts/03-deploy.sh --skip-build
# or, to pin a specific tag:
bash deploy-k3s/scripts/03-deploy.sh --tag d3708e6

What the script does, in order:

1. Read registry creds from deploy-k3s/config.yaml.
2. docker login gitea.treytartt.com.
3. Build all four images with --platform linux/amd64 (so arm64 Macs don't push images that crash on Hetzner amd64 nodes with "exec format error").
4. Push to the Gitea registry, then tag and push :latest.
5. Generate the env file from config.yaml and apply it as ConfigMap honeydue-config (dry-run piped into apply, so re-runs are idempotent).
6. Apply manifests/namespace.yaml, redis/, ingress/, api/{deployment,service,hpa}, worker/, admin/, web/. The api/worker manifests are applied only after the goose migration Job completes (see "Migrations are gated, not interleaved").
7. Apply manifests/observability/vmagent.yaml, substituting TOKEN_PLACEHOLDER with OBS_INGEST_TOKEN from deploy/prod.env (gitignored). Skipped with a warning if the token isn't present.
8. kubectl rollout status for every Deployment, including vmagent.

~7–10 minutes for a full rebuild. ~1–2 minutes with --skip-build.

TL;DR for a single-service code change (manual)

# 1. Commit + get SHA
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)

# 2. Login to Gitea registry (creds in config.yaml)
docker login gitea.treytartt.com -u admin

# 3. Build + push amd64 image
docker build --platform linux/amd64 --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" .
docker push "gitea.treytartt.com/admin/honeydue-api:${SHA}"

# 4. Roll it in
export KUBECONFIG=~/.kube/honeydue.yaml
kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

# 5. Watch
kubectl rollout status -n honeydue deployment/api

# 6. Log out
docker logout gitea.treytartt.com

~3–5 minutes end to end for api.

Gotcha: for any tag other than :latest, imagePullPolicy defaults to IfNotPresent, so the kubelet won't re-fetch a tag it already has cached on the node, even if the registry now holds different bytes at that tag. Always change tags (use the SHA). If you must overwrite a tag, temporarily set imagePullPolicy: Always and run kubectl rollout restart.
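
One way to enforce the "always change tags" rule is a small pre-flight check that rejects anything that doesn't look like a git SHA. This is a minimal sketch: the helper name is_sha_tag and the 7 to 40 hex-character rule are assumptions for illustration, not part of 03-deploy.sh.

```shell
# Hypothetical pre-flight helper (not part of 03-deploy.sh).
# Succeeds only if the tag looks like a git SHA: 7 to 40 lowercase hex chars.
is_sha_tag() {
  case "$1" in
    ''|*[!0-9a-f]*) return 1 ;;   # empty, or contains a non-hex character
  esac
  [ "${#1}" -ge 7 ] && [ "${#1}" -le 40 ]
}

# Usage sketch: bail out before `kubectl set image` when the tag is mutable.
# is_sha_tag "$SHA" || { echo "refusing mutable tag: $SHA" >&2; exit 1; }
```

A tag such as latest or v2 fails the check, so a cached-image surprise is caught before the rollout rather than after it.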

The build

Step 1 — Prepare

cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git status                        # clean working tree?
git log -1 --oneline              # this is the SHA that'll ship

Step 2 — Login to Gitea

set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | \
  docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin

Note: passing the token as a command-line argument (-p/--password) leaves it in shell history and in the process list. Pipe it through --password-stdin as shown; don't skip the printf step.

Step 3 — Build + push

SHA=$(git rev-parse --short HEAD)

# For API
docker buildx build \
  --platform linux/amd64 \
  --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
  --push .

# For Worker
docker buildx build \
  --platform linux/amd64 \
  --target worker \
  -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" \
  --push .

# For Admin (Next.js)
docker buildx build \
  --platform linux/amd64 \
  --target admin \
  -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" \
  --push .

--platform linux/amd64 — cross-compile from operator's arm64 to Hetzner nodes' amd64
--target X — select a stage from the multi-stage Dockerfile
--push — push to registry in one step; don't leave image in local Docker

First build is slow (~3–5 min cold). Subsequent builds hit BuildKit layer cache and complete in ~30–60s if only app code changed.
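
The three builds above differ only in the target name, so they can be folded into a loop. The following is a sketch under assumptions: image_ref and build_all are hypothetical helper names, and the registry path is copied from the commands above.

```shell
# Sketch only: image_ref and build_all are hypothetical helper names.
REGISTRY_PATH="gitea.treytartt.com/admin"   # same registry path as the commands above

# Print the full image reference for a service and tag.
image_ref() {
  printf '%s/honeydue-%s:%s\n' "$REGISTRY_PATH" "$1" "$2"
}

# Build and push each target for linux/amd64; stop at the first failure.
build_all() {
  sha="$1"
  for svc in api worker admin; do
    docker buildx build \
      --platform linux/amd64 \
      --target "$svc" \
      -t "$(image_ref "$svc" "$sha")" \
      --push . || return 1
  done
}

# Usage: build_all "$(git rev-parse --short HEAD)"
```

Stopping at the first failed build avoids pushing a partial set of images under the same SHA.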

Build platform note

If docker buildx isn't configured:

docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap

This creates a BuildKit container that supports cross-platform builds. The --bootstrap line spins it up immediately so errors surface now instead of on first build.

The deploy

For a single service

export KUBECONFIG=~/.kube/honeydue-k3s.yaml

kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

This updates the Deployment's image field. Kubernetes then:

1. Creates a new ReplicaSet with the new image (an annotation records the revision number)
2. Starts a new pod (per maxSurge: 1)
3. Waits for readinessProbe to pass on the new pod (up to 240s for cold api boot)
4. Once ready, removes a pod from the old ReplicaSet
5. Repeats until all pods are on the new ReplicaSet
6. Marks the rollout complete

Watching the rollout

kubectl rollout status -n honeydue deployment/api

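Prints progress and returns when the rollout completes or fails. kubectl itself never times out by default (--timeout=0); the command fails once the Deployment exceeds spec.progressDeadlineSeconds, which defaults to 600s (10 minutes).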

More detailed:

# Watch pods transition
kubectl get pods -n honeydue -l app.kubernetes.io/name=api -w

# Watch events
kubectl get events -n honeydue --sort-by=.lastTimestamp -w
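
For unattended use, the set-image and watch steps can be wrapped so that a failed rollout reverts itself. This is a sketch, not something 03-deploy.sh is known to do: roll_or_undo is a hypothetical name and the 5m default is an arbitrary choice.

```shell
# Sketch only: roll_or_undo is a hypothetical wrapper, not part of 03-deploy.sh.
# Sets the image, waits a bounded time, and reverts if the rollout fails.
roll_or_undo() {
  svc="$1"; image="$2"; timeout="${3:-5m}"
  kubectl set image "deployment/$svc" -n honeydue "$svc=$image" || return 1
  if kubectl rollout status "deployment/$svc" -n honeydue --timeout="$timeout"; then
    echo "rollout of $svc complete"
  else
    echo "rollout of $svc failed; reverting" >&2
    kubectl rollout undo "deployment/$svc" -n honeydue
    return 1
  fi
}

# Usage: roll_or_undo api "gitea.treytartt.com/admin/honeydue-api:${SHA}"
```

The function returns non-zero after reverting, so a calling script still sees the deploy as failed.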

For all three services

for svc in api worker admin; do
  kubectl set image deployment/$svc -n honeydue \
    $svc="gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done

# Watch all rollouts
for svc in api worker admin; do
  kubectl rollout status -n honeydue deployment/$svc
done

Config-only changes (no new image)

When you change prod.env but code is unchanged:

# 1. Update prod.env locally
# 2. Regenerate ConfigMap
kubectl create configmap honeydue-config -n honeydue \
  --from-env-file=deploy/prod.env \
  --dry-run=client -o yaml | kubectl apply -f -

# 3. Pods do NOT auto-reload env vars. Restart them.
kubectl rollout restart -n honeydue deployment/api deployment/admin deployment/worker

rollout restart triggers a rolling update with the same image but forces pod recreation. New pods pick up the updated ConfigMap.

Why not auto-reload?

Kubernetes has no built-in mechanism to restart pods on ConfigMap change. There's no envFromWatch equivalent. Third-party operators like Reloader can do it, but we don't run one.

For sensitive config (like the SECRET_KEY), this is actually good — pods don't cycle unexpectedly when someone tweaks the ConfigMap.
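
If restarts should happen only when the config actually changed, one common pattern is to stamp a checksum of the env file onto the pod template. This is a sketch of that alternative, not what the deploy script does: config_checksum, stamp_config, and the checksum/config annotation key are all assumptions.

```shell
# Sketch only: config_checksum and stamp_config are hypothetical helpers,
# and the "checksum/config" annotation key is an assumed convention.

# Print the SHA-256 of a file (sha256sum on Linux, shasum on macOS).
config_checksum() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1"
  else
    shasum -a 256 "$1"
  fi | cut -d' ' -f1
}

# Write the checksum into the pod template. A changed value alters the
# template and starts a rolling update; an unchanged value is a no-op.
stamp_config() {
  svc="$1"
  sum=$(config_checksum "$2")
  kubectl patch deployment "$svc" -n honeydue -p \
    "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"checksum/config\":\"$sum\"}}}}}"
}

# Usage: for svc in api admin worker; do stamp_config "$svc" deploy/prod.env; done
```

This keeps the restart explicit and operator-driven, while skipping the pod churn when nothing in the file changed.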

Secret changes

Same flow as config:

# Rotate a value
kubectl patch secret honeydue-secrets -n honeydue \
  --type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(printf '%s' 'newvalue' | base64 | tr -d '\n')\"}}"

# Restart pods
kubectl rollout restart -n honeydue deployment/api deployment/worker
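
The escaped JSON in the patch is easy to get wrong, and GNU base64 wraps long values across lines, which corrupts the payload. A small helper can build the body instead. This is a sketch: secret_patch_json is a hypothetical name, and new-secret.txt in the usage line is a made-up file.

```shell
# Sketch only: secret_patch_json is a hypothetical helper.
# Prints a merge-patch body with the value base64-encoded on a single line.
secret_patch_json() {
  key="$1"; value="$2"
  encoded=$(printf '%s' "$value" | base64 | tr -d '\n')
  printf '{"data":{"%s":"%s"}}\n' "$key" "$encoded"
}

secret_patch_json SECRET_KEY newvalue
# prints: {"data":{"SECRET_KEY":"bmV3dmFsdWU="}}

# Usage (reads the new value from a file so it stays out of shell history):
# kubectl patch secret honeydue-secrets -n honeydue --type=merge \
#   -p "$(secret_patch_json SECRET_KEY "$(cat new-secret.txt)")"
```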

Manifest changes

When you add/modify a deployment YAML:

kubectl apply -f deploy-k3s/manifests/api/deployment.yaml

If the change touches the pod template (anything under spec.template, e.g. resource limits, env, volumes), Kubernetes creates a new ReplicaSet and pods roll. If it only touches a field outside the template, such as replicas, existing pods are left alone and pods are simply added or removed.

Rollback

Last-known-good rollback

kubectl rollout undo deployment/api -n honeydue

Reverts to the previous ReplicaSet (the one with the previous image). Takes ~30s to stabilize.

Rollback to a specific revision

# See revision history
kubectl rollout history deployment/api -n honeydue

# Revert to specific revision number
kubectl rollout undo deployment/api -n honeydue --to-revision=3

Kubernetes keeps up to 10 ReplicaSet revisions by default (spec.revisionHistoryLimit).

Hard rollback (deploy an older image)

kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:<older-sha>"

Useful when you want to go back further than the revision history, or to a specific known-good SHA.

Rolling update semantics

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1

For api (3 replicas):

maxUnavailable: 0 — no pod is removed until replacement is ready
maxSurge: 1 — up to 4 pods exist simultaneously during rollout

Timeline (approximate, warm state):

t=0: kubectl set image
t=0: k8s creates new RS with 1 pod
t=30s (or so): new pod readiness probe passes
t=30s: k8s terminates 1 old pod
t=60s: next new pod ready
t=60s: another old pod terminates
...continues until all on new RS
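
The pod-count bounds follow directly from the two strategy fields. As a quick worked check, assuming absolute values rather than percentages (max_pods and min_ready are hypothetical helper names):

```shell
# Pod-count bounds during a RollingUpdate, for absolute (non-percentage) values.
# max_pods:  most pods that can exist at once = replicas + maxSurge
# min_ready: fewest ready pods guaranteed     = replicas - maxUnavailable
max_pods()  { echo $(( $1 + $2 )); }
min_ready() { echo $(( $1 - $2 )); }

max_pods 3 1    # api: 3 replicas, maxSurge 1       -> prints 4
min_ready 3 0   # api: 3 replicas, maxUnavailable 0 -> prints 3
```

With maxUnavailable: 0, serving capacity never drops below the configured replica count; the cost is one extra pod's worth of resources during the rollout.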

Migrations run as a separate Kubernetes Job that completes before any api/worker pod is rolled. So the rollout above never includes migration work — pods that boot are guaranteed to find the schema already at the expected version. See §"Migrations are gated, not interleaved" below.

Migrations are gated, not interleaved

03-deploy.sh runs goose up as a one-shot Job before applying any api/worker manifests:

1. kubectl delete job honeydue-migrate (idempotent, removes prior run)
2. kubectl apply -f manifests/migrate/job.yaml (with current api image)
3. kubectl wait --for=condition=complete --timeout=10m job/honeydue-migrate
4. (only if Job succeeded) kubectl apply -f manifests/api/...
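
The four steps above can be sketched as a single function. This is an approximation under assumptions, not the script's actual code: migrate_then_apply is a hypothetical name, the manifest paths are assumed to live under deploy-k3s/manifests/, and the image substitution the real script performs on the Job manifest is omitted.

```shell
# Sketch only: migrate_then_apply is a hypothetical name; the real logic lives
# in deploy-k3s/scripts/03-deploy.sh and may differ.
migrate_then_apply() {
  # 1. Remove any previous run; --ignore-not-found keeps this safe on a first deploy.
  kubectl delete job honeydue-migrate -n honeydue --ignore-not-found || return 1
  # 2. Create the Job (image substitution omitted in this sketch).
  kubectl apply -f deploy-k3s/manifests/migrate/job.yaml || return 1
  # 3. Block until it completes; a failure or timeout stops the deploy here.
  if ! kubectl wait --for=condition=complete --timeout=10m \
        job/honeydue-migrate -n honeydue; then
    echo "migrate Job did not complete; app manifests not applied" >&2
    return 1
  fi
  # 4. Only now roll the app.
  kubectl apply -f deploy-k3s/manifests/api/
}
```

The early return in step 3 is the gate: no app manifest is applied unless the wait succeeds.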

The Job uses the api image — /usr/local/bin/goose is baked in at Dockerfile build time. The Job script strips the -pooler segment from DB_HOST before connecting (goose's session-scoped advisory lock can't survive PgBouncer transaction-mode), runs goose up, exits.
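
The hostname rewrite is a plain string substitution. A minimal sketch, assuming the pooled hostname contains a literal -pooler segment as described; direct_db_host is a hypothetical name and the example hostname is made up:

```shell
# Sketch only: direct_db_host is a hypothetical helper; the hostname is made up.
# Removes the first "-pooler" segment so the connection bypasses the pooler.
direct_db_host() {
  printf '%s\n' "$1" | sed 's/-pooler//'
}

direct_db_host "db-example-pooler.region.example.com"
# prints: db-example.region.example.com
```

A hostname without the segment passes through unchanged, so the same code works against a direct connection string.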

If the Job fails, the script aborts before any new app pod sees a stale schema. To debug:

kubectl -n honeydue logs job/honeydue-migrate --tail=200
kubectl -n honeydue describe job honeydue-migrate

After investigating, fix the migration file and re-run 03-deploy.sh. The Job is idempotent — successful migrations stay applied, only the new/failed file gets retried.

api/worker pods run a RequireSchemaApplied check at startup that queries goose_db_version and refuses to boot if the table is missing or the latest row is is_applied=false. This is the fail-fast for "someone bypassed the deploy script and the schema isn't current."
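
The same condition can be checked by hand from a shell. This sketch only approximates the startup check: it assumes psql is installed and that DATABASE_URL (a hypothetical variable) points at the direct, non-pooler database; the actual RequireSchemaApplied lives in the api code and may query differently.

```shell
# Sketch only: approximates the startup check from the command line.
# Assumptions: psql is installed and DATABASE_URL points at the direct
# (non-pooler) database; schema_applied is a hypothetical helper name.
schema_applied() {
  latest=$(psql "$DATABASE_URL" -tAc \
    "SELECT is_applied FROM goose_db_version ORDER BY id DESC LIMIT 1") || return 1
  # psql prints booleans as t/f; an empty result means the table has no rows.
  [ "$latest" = "t" ]
}

# Usage: schema_applied && echo "schema current" || echo "schema NOT current"
```

A missing table makes psql exit non-zero, so the function fails in that case as well as when the latest row is not applied.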

For full schema management background, see Chapter 8 §Schema management.

Hotfix workflow

When we need to ship a fix fast and skip the usual steps:

Fix in code
Build + push
kubectl set image on the affected service only
Monitor with kubectl logs -f

Don't skip CI/tests in a real org; for a solo operator, this is the tradeoff.

Integration with Gitea

Currently no CI/CD. The operator builds from the workstation and pushes manually. Future:

Gitea Actions (Drone-like CI) could trigger on push to main
Build + push step could run in a GitHub Actions-compatible workflow
Auto-deploy on tag push, manual promote to prod

TODO (Chapter 20).

What the old Swarm deploy script did

Contrast: deploy/scripts/deploy_prod.sh (Swarm-era) did:

Validate every config file (placeholder detection, APNS key format, B2 all-or-none)
Buildx to amd64
Push to Gitea (we retrofitted this from GHCR)
SCP bundle to manager node
docker secret create + docker config create with versioned names
docker stack deploy --with-registry-auth
Poll stack services until convergence (420s timeout)
Prune old secret/config versions
Healthcheck the final URL; auto-rollback on failure
Log out of registries

The current k3s replacement, deploy-k3s/scripts/03-deploy.sh, covers the same ground in fewer steps because Kubernetes does the versioning/rollout/health bookkeeping natively. See the TL;DR section at the top of this chapter.

Common deploy failures

| Symptom | Likely cause |
| --- | --- |
| `ImagePullBackOff` | Image not in registry, or pull secret expired |
| Stuck at "Progressing" | Readiness probe not passing; check pod logs |
| `CrashLoopBackOff` immediately | App won't start; check pod logs for panic/exit reason |
| `CrashLoopBackOff` after migration | Cache service, Redis connection, or post-init code issue |
| Old pods never terminate | New pods not ready; rollout doesn't progress |
| Rollout succeeds but app is broken | Readiness probe is too lenient; passes on broken app |

Debugging commands

# Describe the deployment (shows events, conditions)
kubectl describe deployment api -n honeydue

# Describe the latest pod
kubectl describe pod -n honeydue -l app.kubernetes.io/name=api

# Logs from currently-running pods
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100 --prefix

# Logs from the last-terminated pod
kubectl logs -n honeydue <pod> --previous

# Events in the namespace (sorted oldest first, so the newest are at the bottom)
kubectl get events -n honeydue --sort-by=.lastTimestamp

# Pause a rollout (stops new pods from being created)
kubectl rollout pause deployment/api -n honeydue

# Resume
kubectl rollout resume deployment/api -n honeydue

Zero-downtime considerations

For zero-downtime deploys, the new image must be:

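Backward-compatible with the current database schema (schema migrations run before new code, so old pods briefly serve against the migrated schema and the migration itself must not break them)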
Backward-compatible with in-flight API requests (don't remove endpoints mid-deploy; deprecate first)
Backward-compatible with Redis data structures (don't change cache key formats abruptly)

For breaking changes:

Deploy intermediate version that handles both old and new
Once rolled out everywhere, deploy breaking-change version
Two deploys, same day or different days

We don't have this discipline yet; our API has too few clients to worry about. As mobile clients proliferate, this becomes more important.

Blue-green / canary (not yet)

Kubernetes supports advanced rollout strategies:

Canary: route 5% of traffic to new version, scale up gradually
Blue-green: run new version alongside old, flip traffic all at once

These require Traefik's TraefikService CRD with weighted routing, or a service mesh. TODO if traffic scale justifies.

Cleanup: the old Swarm config

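The deploy/ directory contains the Swarm-era config. Most of it is unused, but commands in this chapter still read deploy/prod.env and deploy/registry.env, so move those files (and update the references) before deleting anything. Once we're confident in k3s (a few weeks to a month), remove the directory: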

rm -rf deploy/

Keep the useful files in deploy-k3s/ only.
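
Before deleting, it may be worth confirming that nothing under deploy-k3s/ still points at the old directory. A minimal sketch, with refs_to_old_deploy as a hypothetical helper name:

```shell
# Sketch only: refs_to_old_deploy is a hypothetical helper.
# Lists lines under the given directory that mention paths in deploy/
# (but not deploy-k3s/). Exit status 0 means references were found.
refs_to_old_deploy() {
  grep -rnE '(^|[^A-Za-z0-9_-])deploy/' "$1"
}

# Usage:
# refs_to_old_deploy deploy-k3s && echo "still referenced; do not delete yet"
```

The pattern requires deploy/ to start a path component, so mentions of deploy-k3s/ on their own do not count as hits.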

Operator cheat sheet

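# Full build + deploy (manual path: does not run the migrate Job, so use 03-deploy.sh when the change includes a migration)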
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u admin --password-stdin
docker buildx build --platform linux/amd64 --target api -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
docker buildx build --platform linux/amd64 --target worker -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
docker buildx build --platform linux/amd64 --target admin -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
docker logout gitea.treytartt.com

export KUBECONFIG=~/.kube/honeydue-k3s.yaml
for svc in api worker admin; do
  kubectl set image deployment/$svc -n honeydue "$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done

for svc in api worker admin; do
  kubectl rollout status -n honeydue deployment/$svc
done
