honeyDueAPI/docs/deployment/14-deployment-process.md
Trey t 6f303dbbaa
Migrate prod deploy from Swarm to K3s; add full deployment book
Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after
  Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
  + allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
  --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
  images runnable on x86_64 Hetzner nodes; fix array expansion under
  set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '\${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
  without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook
    (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to
  the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 07:20:54 -05:00


14 — Deployment Process

Summary

A production deploy has four steps: build a new image, push it to Gitea, update the Deployment's image field to the new SHA tag, and let Kubernetes roll the new pods in. There is no downtime if the change is backward-compatible. Rollback is kubectl rollout undo. This chapter walks through the full process, plus alternate paths (config-only changes, manifest changes, hotfixes).

TL;DR for a code change

# 1. Commit + get SHA
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)

# 2. Login to Gitea registry
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin

# 3. Build + push amd64 image
docker buildx build --platform linux/amd64 --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .

# 4. Roll it in
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

# 5. Watch
kubectl rollout status -n honeydue deployment/api

# 6. Log out
docker logout "$REGISTRY"

~3-5 minutes end to end for api.

The build

Step 1 — Prepare

cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git status                        # clean working tree?
git log -1 --oneline              # this is the SHA that'll ship

Step 2 — Login to Gitea

set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | \
  docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin

Note: passing the token on the command line with -p/--password records it in shell history and exposes it in the process list. Don't skip the printf trick.

Step 3 — Build + push

SHA=$(git rev-parse --short HEAD)

# For API
docker buildx build \
  --platform linux/amd64 \
  --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
  --push .

# For Worker
docker buildx build \
  --platform linux/amd64 \
  --target worker \
  -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" \
  --push .

# For Admin (Next.js)
docker buildx build \
  --platform linux/amd64 \
  --target admin \
  -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" \
  --push .
  • --platform linux/amd64 — cross-compile from operator's arm64 to Hetzner nodes' amd64
  • --target X — select a stage from the multi-stage Dockerfile
  • --push — push to registry in one step; don't leave image in local Docker

First build is slow (~3-5 min cold). Subsequent builds hit the BuildKit layer cache and complete in ~30-60s if only app code changed.
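
The three builds differ only in the target name, so they can be driven from one loop. The following is a minimal sketch, assuming the registry path used above; image_ref and build_all are hypothetical helper names, and DRY_RUN=1 prints the docker commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helpers; the registry path is the one used in this chapter.
REGISTRY_PATH="gitea.treytartt.com/admin"

# Print the full image reference for a service name and a SHA.
image_ref() {
  printf '%s/honeydue-%s:%s\n' "$REGISTRY_PATH" "$1" "$2"
}

# Build and push every target for one SHA.
# With DRY_RUN=1, print the docker commands instead of executing them.
build_all() (
  sha="$1"
  for target in api worker admin; do
    ref="$(image_ref "$target" "$sha")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker buildx build --platform linux/amd64 --target $target -t $ref --push ."
    else
      docker buildx build --platform linux/amd64 --target "$target" \
        -t "$ref" --push . || exit 1
    fi
  done
)
```

Running `DRY_RUN=1 build_all "$(git rev-parse --short HEAD)"` first is a cheap way to confirm the tags before anything is pushed.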

Build platform note

If docker buildx isn't configured:

docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap

This creates a BuildKit container that supports cross-platform builds. The --bootstrap line spins it up immediately so errors surface now instead of on first build.
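
Running the create command a second time fails because the builder name is already taken. A small guard makes the setup safe to repeat; this is a sketch, and ensure_builder is a hypothetical helper name.

```shell
#!/bin/sh
# Create the builder only if it does not exist yet; otherwise select it.
# The default name matches the builder created above.
ensure_builder() {
  name="${1:-honeydue-builder}"
  if docker buildx inspect "$name" >/dev/null 2>&1; then
    docker buildx use "$name"
  else
    docker buildx create --name "$name" --use
  fi &&
  # Start the builder now so errors surface before the first build.
  docker buildx inspect --bootstrap >/dev/null
}
```

Calling `ensure_builder` at the top of a build script removes the need to remember whether the workstation was set up before.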

The deploy

For a single service

export KUBECONFIG=~/.kube/honeydue-k3s.yaml

kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

This updates the Deployment's image field. Kubernetes:

  1. Creates a new ReplicaSet with the new image (annotation records rev)
  2. Starts a new pod (per maxSurge: 1)
  3. Waits for the new pod's startupProbe, then its readinessProbe, to pass (up to 240s for a cold api boot)
  4. Once ready, removes a pod from the old ReplicaSet
  5. Repeats until all pods are on the new ReplicaSet
  6. Marks rollout complete

Watching the rollout

kubectl rollout status -n honeydue deployment/api

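Outputs progress and returns when the rollout completes or fails. Without a --timeout flag, kubectl itself waits indefinitely; the effective limit is the Deployment's progressDeadlineSeconds, which defaults to 600s (10 minutes).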
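
In a script it helps to make the deadline explicit and to act on the exit code. A minimal sketch, where wait_rollout is a hypothetical helper and the 300s default is an assumption chosen to exceed the 240s cold-boot figure:

```shell
#!/bin/sh
# Wait for a rollout with an explicit deadline and report the outcome.
# Arguments: deployment name, timeout (a kubectl duration such as 300s or 5m).
wait_rollout() {
  if kubectl rollout status -n honeydue "deployment/$1" --timeout="${2:-300s}"; then
    echo "rollout of $1 complete"
  else
    echo "rollout of $1 did not finish within ${2:-300s}" >&2
    return 1
  fi
}
```

Usage: `wait_rollout api 5m || echo "investigate before continuing"`.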

More detailed:

# Watch pods transition
kubectl get pods -n honeydue -l app.kubernetes.io/name=api -w

# Watch events
kubectl get events -n honeydue --sort-by=.lastTimestamp -w

For all three services

for svc in api worker admin; do
  kubectl set image deployment/$svc -n honeydue \
    $svc="gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done

# Watch all rollouts
for svc in api worker admin; do
  kubectl rollout status -n honeydue deployment/$svc
done

Config-only changes (no new image)

When you change prod.env but code is unchanged:

# 1. Update prod.env locally
# 2. Regenerate ConfigMap
kubectl create configmap honeydue-config -n honeydue \
  --from-env-file=deploy/prod.env \
  --dry-run=client -o yaml | kubectl apply -f -

# 3. Pods do NOT auto-reload env vars. Restart them.
kubectl rollout restart -n honeydue deployment/api deployment/admin deployment/worker

rollout restart triggers a rolling update with the same image but forces pod recreation. New pods pick up the updated ConfigMap.
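
Because a bad prod.env only shows up after the pods restart, a quick check before regenerating the ConfigMap can save a rollout. The sketch below is deliberately stricter than kubectl: it assumes every key follows shell variable naming and flags anything else. lint_env_file is a hypothetical helper name.

```shell
#!/bin/sh
# Report every line that is not blank, a # comment, or KEY=VALUE with a
# shell-style key. Returns non-zero if the file is unreadable or has such lines.
lint_env_file() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 2
  fi
  bad="$(grep -n -v -E '^[[:space:]]*(#.*)?$|^[A-Za-z_][A-Za-z0-9_]*=' "$1" || true)"
  if [ -n "$bad" ]; then
    printf 'suspicious lines in %s:\n%s\n' "$1" "$bad" >&2
    return 1
  fi
}
```

Usage: `lint_env_file deploy/prod.env && kubectl create configmap ...` so the ConfigMap is only regenerated from a file that passed the check.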

Why not auto-reload?

Kubernetes has no built-in mechanism to restart pods on ConfigMap change. There's no envFromWatch equivalent. Third-party operators like Reloader can do it, but we don't run one.

For sensitive config (like the SECRET_KEY), this is actually good — pods don't cycle unexpectedly when someone tweaks the ConfigMap.

Secret changes

Same flow as config:

# Rotate a value
kubectl patch secret honeydue-secrets -n honeydue \
  --type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(echo -n 'newvalue' | base64)\"}}"

# Restart pods
kubectl rollout restart -n honeydue deployment/api deployment/worker
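
Building the patch by hand is easy to get wrong: some base64 implementations wrap long output across lines, which would break the JSON. A small helper keeps the encoding in one place; this is a sketch, and secret_patch_json is a hypothetical name.

```shell
#!/bin/sh
# Build the JSON merge patch for one Secret key.
# printf avoids adding a trailing newline to the value, and tr removes any
# line wrapping that base64 applies to long values.
# Arguments: key name, plaintext value.
secret_patch_json() {
  encoded="$(printf '%s' "$2" | base64 | tr -d '\n')"
  printf '{"data":{"%s":"%s"}}\n' "$1" "$encoded"
}
```

Usage: `kubectl patch secret honeydue-secrets -n honeydue --type=merge -p "$(secret_patch_json SECRET_KEY "$NEW_VALUE")"`, where NEW_VALUE is assumed to be set in the environment rather than typed on the command line.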

Manifest changes

When you add/modify a deployment YAML:

kubectl apply -f deploy-k3s/manifests/api/deployment.yaml

If the change touches a field inside the pod template (spec.template: e.g., resource limits, env, volumes), Kubernetes treats it as a new template and pods roll. If the change is outside the template, such as replicas, existing pods are left alone and pods are only added or removed.
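
To see which case applies before touching the cluster, kubectl diff prints what an apply would change. A minimal sketch that applies only when there is a difference; apply_if_changed is a hypothetical helper name.

```shell
#!/bin/sh
# Show what a manifest would change, then apply it only if there is a diff.
# kubectl diff exits 0 for no differences, 1 for differences, >1 on error.
apply_if_changed() {
  rc=0
  kubectl diff -f "$1" || rc=$?
  case "$rc" in
    0) echo "no changes in $1" ;;
    1) kubectl apply -f "$1" ;;
    *) echo "kubectl diff failed for $1" >&2; return 2 ;;
  esac
}
```

Usage: `apply_if_changed deploy-k3s/manifests/api/deployment.yaml`. A diff under spec.template means pods will roll.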

Rollback

Last-known-good rollback

kubectl rollout undo deployment/api -n honeydue

Reverts to the previous ReplicaSet (the one with the previous image). Takes ~30s to stabilize.

Rollback to a specific revision

# See revision history
kubectl rollout history deployment/api -n honeydue

# Revert to specific revision number
kubectl rollout undo deployment/api -n honeydue --to-revision=3

Kubernetes keeps up to 10 ReplicaSet revisions by default (spec.revisionHistoryLimit).

Hard rollback (deploy an older image)

kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:<older-sha>"

Useful when you want to go back further than the revision history, or to a specific known-good SHA.

Rolling update semantics

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1

For api (3 replicas):

  • maxUnavailable: 0 — no pod is removed until replacement is ready
  • maxSurge: 1 — up to 4 pods exist simultaneously during rollout

Timeline (approximate, warm state):

  • t=0: kubectl set image
  • t=0: k8s creates new RS with 1 pod
  • t=30s (or so): new pod readiness probe passes
  • t=30s: k8s terminates 1 old pod
  • t=60s: next new pod ready
  • t=60s: another old pod terminates
  • ...continues until all on new RS

For cold-boot (e.g., first deploy on a rebuilt cluster), the MigrateWithLock advisory lock extends this to several minutes. But the rollout is serialized — only one pod starts per iteration, so the lock queue is small.
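
The pod counts in the timeline follow from the two strategy fields alone. The sketch below models only that arithmetic (no probes, no timing) and treats every new pod as ready as soon as it is created; simulate_rollout is a hypothetical helper name.

```shell
#!/bin/sh
# Simulate pod counts during a rolling update.
# Arguments: replicas, maxSurge, maxUnavailable.
simulate_rollout() (
  replicas="$1"; surge="$2"; unavailable="$3"
  if [ "$surge" -eq 0 ] && [ "$unavailable" -eq 0 ]; then
    echo "maxSurge and maxUnavailable cannot both be 0" >&2
    exit 1
  fi
  old="$replicas"; new=0
  echo "old=$old new=$new total=$((old + new))"
  while [ "$old" -gt 0 ] || [ "$new" -lt "$replicas" ]; do
    # Scale up: total pods may not exceed replicas + maxSurge.
    room=$((replicas + surge - old - new))
    need=$((replicas - new))
    if [ "$room" -gt "$need" ]; then room="$need"; fi
    if [ "$room" -gt 0 ]; then
      new=$((new + room))
      echo "old=$old new=$new total=$((old + new))"
    fi
    # Scale down: pods may not drop below replicas - maxUnavailable.
    removable=$((old + new - (replicas - unavailable)))
    if [ "$removable" -gt "$old" ]; then removable="$old"; fi
    if [ "$removable" -gt 0 ]; then
      old=$((old - removable))
      echo "old=$old new=$new total=$((old + new))"
    fi
  done
)
```

`simulate_rollout 3 1 0` prints a total that alternates between 3 and 4 and never drops below 3, which is the property that keeps api serving throughout the rollout.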

Hotfix workflow

When we need to ship a fix fast and skip the usual steps:

  1. Fix in code
  2. Build + push
  3. kubectl set image on the affected service only
  4. Monitor with kubectl logs -f

In a real org, don't skip CI and tests. For a solo operator, skipping them is the accepted tradeoff for speed.

Integration with Gitea

Currently no CI/CD. The operator builds from the workstation and pushes manually. Future:

  • Gitea Actions (Drone-like CI) could trigger on push to main
  • Build + push step could run in a GitHub Actions-compatible workflow
  • Auto-deploy on tag push, manual promote to prod

TODO (Chapter 20).

What the old Swarm deploy script did

Contrast: deploy/scripts/deploy_prod.sh (Swarm-era) did:

  1. Validate every config file (placeholder detection, APNS key format, B2 all-or-none)
  2. Buildx to amd64
  3. Push to Gitea (we retrofitted this from GHCR)
  4. SCP bundle to manager node
  5. docker secret create + docker config create with versioned names
  6. docker stack deploy --with-registry-auth
  7. Poll stack services until convergence (420s timeout)
  8. Prune old secret/config versions
  9. Healthcheck the final URL; auto-rollback on failure
  10. Log out of registries

Our current k3s deploy is more manual but simpler. We'd write a similar script for k3s if deploys become frequent:

# deploy-k3s/scripts/04-deploy.sh (not yet updated for Gitea)

See the scaffold in deploy-k3s/scripts/.
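
As an illustration of what the k3s equivalent of steps 6 to 9 might look like, the sketch below sets the image, waits for the rollout, and reverts on failure. It is not the contents of 04-deploy.sh; deploy_service and ROLLOUT_TIMEOUT are hypothetical names, and the 300s default is an assumption.

```shell
#!/bin/sh
# Hypothetical k3s deploy step with automatic rollback.
# Arguments: service name (api, worker, or admin), image SHA.
deploy_service() (
  svc="$1"; sha="$2"
  image="gitea.treytartt.com/admin/honeydue-${svc}:${sha}"
  kubectl set image "deployment/$svc" -n honeydue "$svc=$image" || exit 1
  if kubectl rollout status -n honeydue "deployment/$svc" \
       --timeout="${ROLLOUT_TIMEOUT:-300s}"; then
    echo "deployed $image"
  else
    echo "rollout of $svc failed; reverting to previous revision" >&2
    kubectl rollout undo "deployment/$svc" -n honeydue
    exit 1
  fi
)
```

Usage: `for svc in api worker admin; do deploy_service "$svc" "$SHA" || break; done`. Stopping at the first failure leaves the remaining services on their current image.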

Common deploy failures

| Symptom | Likely cause |
|---|---|
| ImagePullBackOff | Image not in registry, or pull secret expired |
| Stuck at "Progressing" | Readiness probe not passing; check pod logs |
| CrashLoopBackOff immediately | App won't start; check pod logs for panic/exit reason |
| CrashLoopBackOff after migration | Cache service, Redis connection, or post-init code issue |
| Old pods never terminate | New pods not ready; rollout doesn't progress |
| Rollout succeeds but app is broken | Readiness probe is too lenient; passes on broken app |
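
The symptoms above each have an obvious first command. A small lookup, sketched here with the hypothetical name first_check, keeps that mapping at hand; `<pod>` is a placeholder for the affected pod's name.

```shell
#!/bin/sh
# Map a pod status or rollout symptom to the first command worth running.
first_check() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "kubectl describe pod <pod> -n honeydue" ;;
    CrashLoopBackOff)
      echo "kubectl logs <pod> -n honeydue --previous" ;;
    Progressing)
      echo "kubectl describe deployment api -n honeydue" ;;
    *)
      echo "kubectl get events -n honeydue --sort-by=.lastTimestamp" ;;
  esac
}
```

Usage: `first_check CrashLoopBackOff` prints the logs command for the last-terminated container, which is where a panic or exit reason will appear.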

Debugging commands

# Describe the deployment (shows events, conditions)
kubectl describe deployment api -n honeydue

# Describe the latest pod
kubectl describe pod -n honeydue -l app.kubernetes.io/name=api

# Logs from currently-running pods
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100 --prefix

# Logs from the last-terminated pod
kubectl logs -n honeydue <pod> --previous

# Events in the namespace (newest first)
kubectl get events -n honeydue --sort-by=.lastTimestamp

# Pause a rollout (stops new pods from being created)
kubectl rollout pause deployment/api -n honeydue

# Resume
kubectl rollout resume deployment/api -n honeydue

Zero-downtime considerations

For zero-downtime deploys, the new image must be:

  1. Backward-compatible with the current database schema (schema migrations run before new code)
  2. Backward-compatible with in-flight API requests (don't remove endpoints mid-deploy; deprecate first)
  3. Backward-compatible with Redis data structures (don't change cache key formats abruptly)

For breaking changes:

  1. Deploy intermediate version that handles both old and new
  2. Once rolled out everywhere, deploy breaking-change version
  3. Two deploys, same day or different days

We don't have this discipline yet; our API has too few clients to worry about. As mobile clients proliferate, this becomes more important.

Blue-green / canary (not yet)

Kubernetes supports advanced rollout strategies:

  • Canary: route 5% of traffic to new version, scale up gradually
  • Blue-green: run new version alongside old, flip traffic all at once

These require Traefik's TraefikService CRD with weighted routing, or a service mesh. TODO if traffic scale justifies.

Cleanup: the old Swarm config

The deploy/ directory contains the Swarm-era config. It's still there but unused by the cluster. After we're confident in k3s (a few weeks? month?), remove it:

rm -rf deploy/

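Before deleting, move any files that are still in use into deploy-k3s/. The commands in this chapter still source deploy/registry.env and read deploy/prod.env.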

Operator cheat sheet

# Full build + deploy
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
docker buildx build --platform linux/amd64 --target api -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
docker buildx build --platform linux/amd64 --target worker -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
docker buildx build --platform linux/amd64 --target admin -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
docker logout gitea.treytartt.com

export KUBECONFIG=~/.kube/honeydue-k3s.yaml
for svc in api worker admin; do
  kubectl set image deployment/$svc -n honeydue "$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done

for svc in api worker admin; do
  kubectl rollout status -n honeydue deployment/$svc
done

References