# 14 — Deployment Process

## Summary

A production deploy is: build a new image, push it to Gitea, update the
Deployment's image field with the new SHA, and let Kubernetes roll the new
pods in. With `03-deploy.sh`, schema migrations run first as a separate
Job; the pods themselves never migrate the database. There is no downtime
if the change is backward-compatible. Rollback is `kubectl rollout undo`.
This chapter walks through the full process, plus alternate paths
(config-only changes, manifest changes, hotfixes).

## TL;DR using the unified deploy script

This is the recommended path. `deploy-k3s/scripts/03-deploy.sh` builds all
four images (api, worker, admin, web), pushes them to Gitea, regenerates the
ConfigMap from `config.yaml`, applies every manifest under
`deploy-k3s/manifests/` (including the observability vmagent), and waits
for all rollouts.

```bash
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && git push gitea master

export KUBECONFIG=~/.kube/honeydue.yaml
bash deploy-k3s/scripts/03-deploy.sh                  # full build + push + rollout
# or, to redeploy without rebuilding:
bash deploy-k3s/scripts/03-deploy.sh --skip-build
# or, to pin a specific tag:
bash deploy-k3s/scripts/03-deploy.sh --tag d3708e6
```

What the script does, in order:

1. Read registry creds from `deploy-k3s/config.yaml`.
2. `docker login gitea.treytartt.com`.
3. Build all four images with `--platform linux/amd64` (so arm64 Macs
   don't push images that crash on Hetzner amd64 nodes with
   "exec format error").
4. Push to the Gitea registry, plus tag and push `:latest`.
5. Generate the env file from `config.yaml` and apply it as ConfigMap
   `honeydue-config` (dry-run + apply, so re-running it is idempotent and
   produces no spurious diff).
6. Apply `manifests/namespace.yaml`, `redis/`, `ingress/`,
   `api/{deployment,service,hpa}`, `worker/`, `admin/`, `web/`. The api and
   worker manifests are applied only after the `honeydue-migrate` Job has
   run `goose up` successfully (see "Migrations are gated, not
   interleaved" below).
7. Apply `manifests/observability/vmagent.yaml`, substituting
   `TOKEN_PLACEHOLDER` with `OBS_INGEST_TOKEN` from `deploy/prod.env`
   (gitignored). Skipped with a warning if the token isn't present.
8. `kubectl rollout status` for every Deployment, including vmagent.

~7–10 minutes for a full rebuild; ~1–2 minutes with `--skip-build`.
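
Step 7's placeholder substitution is the least obvious part of the script. A minimal sketch of how it could work, assuming `deploy/prod.env` holds unquoted `KEY=value` lines; `render_vmagent` is a hypothetical name, and the real `03-deploy.sh` may do this differently:

```bash
# Hypothetical sketch of step 7, not the actual 03-deploy.sh code.
# Prints the vmagent manifest with TOKEN_PLACEHOLDER replaced by
# OBS_INGEST_TOKEN from an env file of unquoted KEY=value lines.
# Returns 1 (with a warning on stderr) when the token is missing.
render_vmagent() {
  local env_file="$1" manifest="$2" token escaped
  token="$(grep -E '^OBS_INGEST_TOKEN=' "$env_file" 2>/dev/null | tail -n 1 | cut -d= -f2-)"
  if [[ -z "$token" ]]; then
    echo "warning: OBS_INGEST_TOKEN not found in $env_file; skipping vmagent" >&2
    return 1
  fi
  # Escape \, & and the | delimiter so sed inserts the token literally.
  escaped="$(printf '%s' "$token" | sed -e 's/[\\&|]/\\&/g')"
  sed -e "s|TOKEN_PLACEHOLDER|${escaped}|g" "$manifest"
}

# Usage:
#   if manifest="$(render_vmagent deploy/prod.env \
#        deploy-k3s/manifests/observability/vmagent.yaml)"; then
#     printf '%s\n' "$manifest" | kubectl apply -f -
#   fi
```

Capturing the output before applying means a missing token skips the apply entirely instead of piping an empty manifest into `kubectl apply`.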

## TL;DR for a single-service code change (manual)

```bash
# 1. Commit + get SHA
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)

# 2. Log in to the Gitea registry (password prompted; creds in config.yaml)
docker login gitea.treytartt.com -u admin

# 3. Build + push amd64 image
docker build --platform linux/amd64 --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" .
docker push "gitea.treytartt.com/admin/honeydue-api:${SHA}"

# 4. Roll it in
export KUBECONFIG=~/.kube/honeydue.yaml
kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

# 5. Watch
kubectl rollout status -n honeydue deployment/api

# 6. Log out
docker logout gitea.treytartt.com
```

~3–5 minutes end to end for api. This path does not run the
`honeydue-migrate` Job, so use it only for changes that add no migrations;
otherwise deploy with `03-deploy.sh`.

> **Gotcha:** For any tag other than `:latest`, `imagePullPolicy` defaults
> to `IfNotPresent`, so the kubelet won't re-fetch an image whose tag it
> already has cached locally, even if the registry now has different bytes
> at that tag. Always use a new tag (the SHA), or temporarily set
> `imagePullPolicy: Always` and run `kubectl rollout restart` if you need
> to overwrite a tag.
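
One way to make the gotcha hard to trip over is to refuse anything but a commit SHA as the tag. A minimal sketch; `is_sha_tag` and `image_ref` are hypothetical helpers, and the registry path is the one used throughout this chapter:

```bash
# Hypothetical helpers, not part of the repo.
REGISTRY_PREFIX="${REGISTRY_PREFIX:-gitea.treytartt.com/admin}"

# Succeed only for a git SHA (7 to 40 lowercase hex characters).
is_sha_tag() {
  [[ "$1" =~ ^[0-9a-f]{7,40}$ ]]
}

# Print the full image reference for a service, refusing mutable-looking
# tags such as "latest".
image_ref() {
  local svc="$1" tag="$2"
  if ! is_sha_tag "$tag"; then
    echo "refusing tag '$tag': use a commit SHA" >&2
    return 1
  fi
  printf '%s/honeydue-%s:%s\n' "$REGISTRY_PREFIX" "$svc" "$tag"
}

# Usage:
#   ref="$(image_ref api "$SHA")" &&
#     kubectl set image deployment/api -n honeydue api="$ref"
```

Chaining with `&&` keeps a rejected tag from ever reaching `kubectl set image`.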

## The build

### Step 1 — Prepare

```bash
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git status              # clean working tree?
git log -1 --oneline    # this is the SHA that'll ship
```

### Step 2 — Login to Gitea

```bash
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | \
  docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
```

**Note**: passing the token with `docker login -p` leaves it in your shell
history (and visible in the process list). Piping it through `printf` into
`--password-stdin` avoids both, so don't skip the `printf` trick.
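
Because login and logout are separate commands, a build that fails partway makes it easy to forget the `docker logout`. A minimal sketch of a wrapper that always logs out, assuming `REGISTRY`, `REGISTRY_USERNAME`, and `REGISTRY_TOKEN` come from `deploy/registry.env` as above; `with_registry_login` is a hypothetical name:

```bash
# Hypothetical wrapper: log in, run the given command, and log out again
# whether or not the command succeeded. Returns the command's exit status.
with_registry_login() {
  if [[ -z "${REGISTRY:-}" || -z "${REGISTRY_USERNAME:-}" || -z "${REGISTRY_TOKEN:-}" ]]; then
    echo "REGISTRY, REGISTRY_USERNAME and REGISTRY_TOKEN must be set" >&2
    return 1
  fi
  printf '%s' "$REGISTRY_TOKEN" |
    docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin || return 1
  local rc=0
  "$@" || rc=$?
  docker logout "$REGISTRY" >/dev/null
  return "$rc"
}

# Usage:
#   set -a; source deploy/registry.env; set +a
#   with_registry_login docker buildx build --platform linux/amd64 --target api \
#     -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
```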

### Step 3 — Build + push

```bash
SHA=$(git rev-parse --short HEAD)

# For API
docker buildx build \
  --platform linux/amd64 \
  --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
  --push .

# For Worker
docker buildx build \
  --platform linux/amd64 \
  --target worker \
  -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" \
  --push .

# For Admin (Next.js)
docker buildx build \
  --platform linux/amd64 \
  --target admin \
  -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" \
  --push .
```

- `--platform linux/amd64` — build for the Hetzner nodes' amd64
  architecture from the operator's arm64 machine
- `--target X` — select a stage from the multi-stage Dockerfile
- `--push` — push to the registry in one step instead of leaving the image
  in the local Docker image store

The first build is slow (~3–5 min cold). Subsequent builds hit the BuildKit
layer cache and complete in ~30–60s if only app code changed.
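
The three build commands differ only in the target and image name, so they fold naturally into a loop. A sketch with a hypothetical `DRY_RUN=1` switch that prints the commands instead of running them; `run` and `build_and_push_all` are illustrative names, not existing tooling:

```bash
# Hypothetical helpers, not part of the repo.
REGISTRY_PREFIX="${REGISTRY_PREFIX:-gitea.treytartt.com/admin}"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build and push api, worker and admin for the given SHA, stopping at the
# first failure instead of carrying on with the remaining builds.
build_and_push_all() {
  local sha="$1" svc
  for svc in api worker admin; do
    run docker buildx build --platform linux/amd64 --target "$svc" \
      -t "${REGISTRY_PREFIX}/honeydue-${svc}:${sha}" --push . || return 1
  done
}

# Usage:
#   DRY_RUN=1 build_and_push_all "$(git rev-parse --short HEAD)"
```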

### Build platform note

If `docker buildx` isn't configured:

```bash
docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap
```

This creates a BuildKit builder container that supports cross-platform
builds. `inspect --bootstrap` starts it immediately, so errors surface now
instead of on the first build.

## The deploy

### For a single service

```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml

kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
```

This updates the Deployment's image field. Kubernetes then:

1. Creates a new ReplicaSet with the new image (its
   `deployment.kubernetes.io/revision` annotation records the revision
   number)
2. Starts a new pod (per `maxSurge: 1`)
3. Waits for the readinessProbe to pass on the new pod (up to 240s for a
   cold api boot)
4. Once it's ready, removes a pod from the old ReplicaSet
5. Repeats until all pods are on the new ReplicaSet
6. Marks the rollout complete

### Watching the rollout

```bash
kubectl rollout status -n honeydue deployment/api
```

This prints progress and returns when the rollout completes or fails.
`kubectl rollout status` has no timeout of its own by default
(`--timeout=0s` means wait forever); it exits non-zero once the Deployment
exceeds its `progressDeadlineSeconds` (600s, i.e. 10 minutes, by default).

More detailed:

```bash
# Watch pods transition
kubectl get pods -n honeydue -l app.kubernetes.io/name=api -w

# Watch events
kubectl get events -n honeydue --sort-by=.lastTimestamp -w
```

### For all three services

```bash
for svc in api worker admin; do
  kubectl set image "deployment/$svc" -n honeydue \
    "$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done

# Watch all rollouts
for svc in api worker admin; do
  kubectl rollout status -n honeydue "deployment/$svc"
done
```
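
The watch loop above has no timeout of its own and doesn't summarize which services failed. A minimal sketch that bounds each wait and reports every Deployment that didn't finish; `wait_rollouts` and the 5-minute timeout in the usage line are assumptions, not part of the existing tooling:

```bash
# Hypothetical helper (not part of 03-deploy.sh): wait for each Deployment's
# rollout with a bounded timeout and report every one that didn't finish.
NS="${NS:-honeydue}"

wait_rollouts() {
  local timeout="$1"; shift
  local svc failed=()
  for svc in "$@"; do
    if ! kubectl rollout status -n "$NS" "deployment/$svc" --timeout="$timeout"; then
      failed+=("$svc")
    fi
  done
  if (( ${#failed[@]} > 0 )); then
    echo "rollout did not complete: ${failed[*]}" >&2
    return 1
  fi
}

# Usage:
#   wait_rollouts 5m api worker admin ||
#     kubectl get events -n honeydue --sort-by=.lastTimestamp | tail -n 20
```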

## Config-only changes (no new image)

When you change `prod.env` but the code is unchanged:

```bash
# 1. Update prod.env locally
# 2. Regenerate the ConfigMap
kubectl create configmap honeydue-config -n honeydue \
  --from-env-file=deploy/prod.env \
  --dry-run=client -o yaml | kubectl apply -f -

# 3. Pods do NOT auto-reload env vars. Restart them.
kubectl rollout restart -n honeydue deployment/api deployment/admin deployment/worker
```

`rollout restart` triggers a rolling update with the *same* image but
forces pod recreation (it stamps a `kubectl.kubernetes.io/restartedAt`
annotation on the pod template). New pods pick up the updated ConfigMap.
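
Restarting unconditionally is harmless but churns every pod even when nothing changed. A minimal sketch that restarts only when the rendered ConfigMap differs from the live one; it relies on `kubectl diff` exiting 0 when there are no differences, 1 when there are, and above 1 on error. `apply_config_and_restart` is a hypothetical name; the ConfigMap and namespace are the ones used above:

```bash
# Hypothetical helper: re-apply honeydue-config from an env file and roll
# the given Deployments only if the ConfigMap content changed.
NS="${NS:-honeydue}"

apply_config_and_restart() {
  local env_file="$1"; shift
  local manifest rc=0 d
  manifest="$(kubectl create configmap honeydue-config -n "$NS" \
    --from-env-file="$env_file" --dry-run=client -o yaml)" || return 1

  # kubectl diff: exit 0 = identical, 1 = differences, >1 = error.
  printf '%s\n' "$manifest" | kubectl diff -f - >/dev/null || rc=$?
  case "$rc" in
    0) echo "honeydue-config unchanged; no restart needed"; return 0 ;;
    1) ;;
    *) echo "kubectl diff failed (exit $rc)" >&2; return "$rc" ;;
  esac

  printf '%s\n' "$manifest" | kubectl apply -f - || return 1
  for d in "$@"; do
    kubectl rollout restart -n "$NS" "deployment/$d" || return 1
  done
}

# Usage:
#   apply_config_and_restart deploy/prod.env api admin worker
```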

### Why not auto-reload?

Kubernetes has no built-in mechanism to restart pods when a ConfigMap
changes: environment variables sourced from a ConfigMap are read only when
a container starts, and there's no `envFromWatch` equivalent. Third-party
controllers such as Reloader can do it, but we don't run one.

This is arguably a feature, especially for sensitive values like
`SECRET_KEY`: pods don't cycle unexpectedly when someone tweaks the
ConfigMap or a Secret.

## Secret changes

Same flow as config:

```bash
# Rotate a value (tr strips the line wrapping GNU base64 adds to long
# values, which would otherwise break the JSON)
kubectl patch secret honeydue-secrets -n honeydue --type=merge \
  -p "{\"data\":{\"SECRET_KEY\":\"$(printf '%s' 'newvalue' | base64 | tr -d '\n')\"}}"

# Restart pods
kubectl rollout restart -n honeydue deployment/api deployment/worker
```
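
Building the patch inline is easy to get wrong (quoting, line-wrapped base64) and puts the plaintext in shell history. A sketch of two hypothetical helpers, `b64_nowrap` and `secret_patch_json`, that read the value from stdin instead:

```bash
# Hypothetical helpers, not part of the repo.

# Base64-encode stdin without any line wrapping.
b64_nowrap() {
  base64 | tr -d '\n'
}

# Read a value from stdin and print a merge-patch body that sets one key
# of a Secret's .data to its base64 encoding.
secret_patch_json() {
  local key="$1" encoded
  encoded="$(b64_nowrap)"
  printf '{"data":{"%s":"%s"}}\n' "$key" "$encoded"
}

# Usage (read -rs prompts without echoing the value):
#   read -rs NEW_VALUE
#   kubectl patch secret honeydue-secrets -n honeydue --type=merge \
#     -p "$(printf '%s' "$NEW_VALUE" | secret_patch_json SECRET_KEY)"
```

This keeps the plaintext out of shell history; the encoded value still appears briefly in `kubectl`'s process arguments.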

## Manifest changes

When you add or modify a deployment YAML:

```bash
kubectl apply -f deploy-k3s/manifests/api/deployment.yaml
```

If the change touches the pod template (`spec.template`, e.g. resource
limits, env, volumes), Kubernetes treats it as a new revision and the pods
roll. If it only touches Deployment-level fields such as `replicas`,
there's no rollout: pods are simply added or removed.

## Rollback

### Last-known-good rollback

```bash
kubectl rollout undo deployment/api -n honeydue
```

Reverts to the previous ReplicaSet (the one with the previous image).
Takes ~30s to stabilize. Rollback only touches the Deployment: migrations
the `honeydue-migrate` Job already applied stay applied, so the previous
image must still work against the current schema.

### Rollback to a specific revision

```bash
# See revision history
kubectl rollout history deployment/api -n honeydue

# Revert to a specific revision number
kubectl rollout undo deployment/api -n honeydue --to-revision=3
```

Kubernetes keeps up to 10 ReplicaSet revisions by default
(`spec.revisionHistoryLimit`).

### Hard rollback (deploy an older image)

```bash
kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:<older-sha>"
```

Useful when you want to go back further than the revision history, or to
a specific known-good SHA.
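
A hard rollback is easiest when you saved the running image before changing it. A minimal sketch that prints a Deployment's current image, assuming the container is named after the Deployment (as in the `api=` examples above); `current_image` is a hypothetical name:

```bash
# Hypothetical helper: print the image a Deployment's container currently
# runs, so it can be saved before a deploy and passed to `kubectl set image`
# later. Assumes the container is named after the Deployment (api, worker, ...).
NS="${NS:-honeydue}"

current_image() {
  local svc="$1"
  kubectl get "deployment/$svc" -n "$NS" \
    -o jsonpath="{.spec.template.spec.containers[?(@.name==\"$svc\")].image}"
}

# Usage:
#   PREV="$(current_image api)"   # before kubectl set image
#   # ...deploy, find a problem...
#   kubectl set image deployment/api -n honeydue api="$PREV"
```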

## Rolling update semantics

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
```

For api (3 replicas):

- `maxUnavailable: 0` — no pod is removed until its replacement is ready
- `maxSurge: 1` — up to 4 pods exist simultaneously during a rollout

Timeline (approximate, warm state):

- t=0: `kubectl set image`
- t=0: k8s creates a new RS with 1 pod
- t=30s (or so): the new pod's readiness probe passes
- t=30s: k8s terminates 1 old pod
- t=60s: the next new pod is ready
- t=60s: another old pod terminates
- ...continues until all pods are on the new RS

When you deploy through `03-deploy.sh`, migrations run as a separate
Kubernetes Job that completes before any api/worker pod is rolled. The
rollout above therefore never includes migration work, and the pods that
boot should find the schema already at the expected version. See
"Migrations are gated, not interleaved" below.
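
The pod-count bounds and the rough duration follow directly from the strategy fields. As a worked example for api, with $r = 3$ replicas and the ~30s per-pod readiness time from the warm-state timeline above (an approximation, not a measured figure):

```math
\begin{aligned}
\text{max pods during rollout} &= r + \text{maxSurge} = 3 + 1 = 4 \\
\text{min ready pods during rollout} &= r - \text{maxUnavailable} = 3 - 0 = 3 \\
\text{rollout duration} &\approx r \times t_{\text{ready}} = 3 \times 30\,\text{s} = 90\,\text{s}
\end{aligned}
```

With `maxSurge: 1` and `maxUnavailable: 0`, pods are replaced one at a time, which is why the duration scales with the replica count. If either field is given as a percentage, Kubernetes rounds `maxSurge` up and `maxUnavailable` down.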

## Migrations are gated, not interleaved

`03-deploy.sh` runs `goose up` as a one-shot Job before applying any
api/worker manifests:

```
1. kubectl delete job honeydue-migrate              (idempotent, removes prior run)
2. kubectl apply -f manifests/migrate/job.yaml      (with current api image)
3. kubectl wait --for=condition=complete --timeout=10m job/honeydue-migrate
4. (only if Job succeeded) kubectl apply -f manifests/api/...
```

The Job uses the api image: `/usr/local/bin/goose` is baked in at
Dockerfile build time. The Job script strips the `-pooler` segment from
`DB_HOST` before connecting, because goose's session-scoped advisory lock
doesn't survive PgBouncer transaction mode (consecutive statements can land
on different server connections). It then runs `goose up` and exits.

If the Job fails, the script aborts before any new api/worker pod can start
against a stale schema. To debug:

```bash
kubectl -n honeydue logs job/honeydue-migrate --tail=200
kubectl -n honeydue describe job honeydue-migrate
```

After investigating, fix the migration file and re-run `03-deploy.sh`. The
Job is idempotent: successful migrations stay applied, and only the
new/failed file gets retried. That holds for migrations that run inside a
transaction (goose's default); a migration marked
`-- +goose NO TRANSACTION` can fail partway and leave partial changes
behind that need manual cleanup.

api/worker pods run a `RequireSchemaApplied` check at startup that queries
`goose_db_version` and refuses to boot if the table is missing or the
latest row is `is_applied=false`. This is the fail-fast for "someone
bypassed the deploy script and the schema isn't current."

For full schema management background, see
[Chapter 8 §Schema management](./08-database.md).
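
Step 3 of the outline waits only for `condition=complete`, so a Job that has already failed keeps the script waiting until the 10-minute timeout. A minimal sketch of a gate that polls for either terminal condition; it is not the actual `03-deploy.sh`, and the function names, the 5-second poll interval, and the manifest path default are assumptions:

```bash
# Hypothetical sketch of the migration gate, NOT the actual 03-deploy.sh.
# Assumes the Job manifest already references the new api image.
NS="${NS:-honeydue}"
JOB="${JOB:-honeydue-migrate}"
JOB_MANIFEST="${JOB_MANIFEST:-deploy-k3s/manifests/migrate/job.yaml}"

# Print the status ("True", "False", or empty) of one Job condition type.
job_condition() {
  kubectl -n "$NS" get job "$JOB" \
    -o jsonpath="{.status.conditions[?(@.type==\"$1\")].status}"
}

# Poll every 5s: return 0 on Complete, 1 on Failed, 2 after $1 seconds.
wait_for_job() {
  local timeout="$1" waited=0
  while (( waited < timeout )); do
    [[ "$(job_condition Complete)" == "True" ]] && return 0
    [[ "$(job_condition Failed)" == "True" ]] && return 1
    sleep 5
    waited=$(( waited + 5 ))
  done
  return 2
}

# Remove any previous run, start a fresh Job, and block until it finishes.
# On failure, print the Job's logs and return non-zero so the caller can
# stop before applying the api/worker manifests.
run_migrations() {
  kubectl -n "$NS" delete job "$JOB" --ignore-not-found
  kubectl -n "$NS" apply -f "$JOB_MANIFEST" || return 1
  if ! wait_for_job 600; then
    kubectl -n "$NS" logs "job/$JOB" --tail=200
    return 1
  fi
}

# Usage (inside a deploy script):
#   run_migrations || { echo "migration failed; not rolling api/worker" >&2; exit 1; }
#   # ...only now apply the api/worker manifests...
```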

## Hotfix workflow

When we need to ship a fix fast and skip the usual steps:

1. Fix in code
2. Build + push
3. `kubectl set image` on the affected service only
4. Monitor with `kubectl logs -f`

This skips the migration Job, so it only works for fixes that don't add a
migration. In a larger org you wouldn't skip CI/tests; for a solo operator
it's an acceptable tradeoff.

## Integration with Gitea

There's currently no CI/CD: the operator builds on the workstation and
pushes manually. Future options:

- Gitea Actions (GitHub Actions-compatible CI) could trigger a workflow on
  push to `master`
- That workflow could run the build + push step
- Auto-deploy on tag push, with a manual promote to prod

**TODO** (Chapter 20).

## What the old Swarm deploy script did

Contrast: `deploy/scripts/deploy_prod.sh` (Swarm-era) did:

1. Validate every config file (placeholder detection, APNS key format,
   B2 all-or-none)
2. Buildx to amd64
3. Push to Gitea (we retrofitted this from GHCR)
4. SCP bundle to manager node
5. `docker secret create` + `docker config create` with versioned names
6. `docker stack deploy --with-registry-auth`
7. Poll stack services until convergence (420s timeout)
8. Prune old secret/config versions
9. Healthcheck the final URL; auto-rollback on failure
10. Log out of registries

The current k3s replacement, `deploy-k3s/scripts/03-deploy.sh`, covers the
same ground in fewer steps because Kubernetes does the
versioning/rollout/health bookkeeping natively. See the TL;DR section at
the top of this chapter.

## Common deploy failures

| Symptom | Likely cause |
|---|---|
| `03-deploy.sh` stops at the migration step | `honeydue-migrate` Job failed; check `kubectl -n honeydue logs job/honeydue-migrate` |
| `ImagePullBackOff` | Image not in registry, or pull secret expired |
| Stuck at "Progressing" | Readiness probe not passing; check pod logs |
| `CrashLoopBackOff` immediately | App won't start; check pod logs for panic/exit reason |
| `CrashLoopBackOff` at the startup schema check | `goose_db_version` missing or its latest row not applied; the migration Job didn't run or failed |
| `CrashLoopBackOff` after the schema check passes | Cache service, Redis connection, or post-init code issue |
| Old pods never terminate | New pods not ready, so the rollout can't progress |
| Rollout succeeds but app is broken | Readiness probe is too lenient; passes on broken app |

### Debugging commands

```bash
# Describe the deployment (shows events, conditions)
kubectl describe deployment api -n honeydue

# Describe the api pods (every pod matching the label)
kubectl describe pod -n honeydue -l app.kubernetes.io/name=api

# Logs from currently-running pods
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100 --prefix

# Logs from the previous (crashed) container instance of a pod
kubectl logs -n honeydue <pod> --previous

# Events in the namespace (sorted oldest to newest; newest at the bottom)
kubectl get events -n honeydue --sort-by=.lastTimestamp

# Pause a rollout (the Deployment stops progressing until resumed)
kubectl rollout pause deployment/api -n honeydue

# Resume
kubectl rollout resume deployment/api -n honeydue
```
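
When several pods are unhealthy at once, it can help to list each pod's waiting reason and map it to the failures table above. A sketch, assuming the `app.kubernetes.io/name` label used throughout this chapter; `pod_reasons` and `hint_for` are hypothetical names, and the hints only restate the table:

```bash
# Hypothetical triage helpers (not part of the repo).
NS="${NS:-honeydue}"

# Print "<pod> <waiting reason>" for each pod of a service; pods whose
# containers aren't waiting print an empty reason.
pod_reasons() {
  local svc="$1"
  kubectl get pods -n "$NS" -l "app.kubernetes.io/name=$svc" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
}

# Map a waiting reason to the likely cause from the failures table.
hint_for() {
  case "$1" in
    ImagePullBackOff|ErrImagePull) echo "image not in registry, or pull secret expired" ;;
    CrashLoopBackOff) echo "app exits at startup; check kubectl logs --previous" ;;
    "") echo "not waiting; if the pod isn't Ready, check the readiness probe" ;;
    *) echo "no hint; run kubectl describe pod" ;;
  esac
}

# Usage:
#   pod_reasons api | while read -r pod reason; do
#     printf '%s: %s\n' "$pod" "$(hint_for "$reason")"
#   done
```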

## Zero-downtime considerations

For zero-downtime deploys, each release must be:

1. **Backward-compatible** at the database level: the migration Job
   applies the new schema before any new pod starts, so the *old* image,
   which keeps serving traffic until the rollout finishes, must still work
   against the *new* schema
2. **Backward-compatible** with in-flight API requests (don't remove
   endpoints mid-deploy; deprecate first)
3. **Backward-compatible** with Redis data structures (don't change cache
   key formats abruptly)

For breaking changes, split the work into two deploys (same day or
different days):

1. Deploy an intermediate version that handles both old and new
2. Once that's rolled out everywhere, deploy the breaking-change version

We don't have this discipline yet; our API has few enough clients that it
hasn't mattered. As mobile clients proliferate, it becomes more important.

## Blue-green / canary (not yet)

Deployments themselves only support the `RollingUpdate` and `Recreate`
strategies; more advanced rollouts need extra tooling:

- **Canary**: route 5% of traffic to the new version, scale up gradually
- **Blue-green**: run the new version alongside the old, flip traffic all
  at once

These require Traefik's TraefikService CRD with weighted routing, or a
service mesh. **TODO** if traffic scale justifies it.
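
## Cleanup: the old Swarm config

The `deploy/` directory contains the Swarm-era config. It's mostly unused,
but not entirely: `03-deploy.sh` still reads `OBS_INGEST_TOKEN` from
`deploy/prod.env`, and the manual steps in this chapter source
`deploy/registry.env` and build the ConfigMap from `deploy/prod.env`. Once
we're confident in k3s (a few weeks? a month?), move the files that are
still needed into `deploy-k3s/`, update those references, and then remove
the directory:

```bash
rm -rf deploy/
```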

## Operator cheat sheet

```bash
# Full manual build + deploy. This bypasses the honeydue-migrate Job that
# 03-deploy.sh runs first, so only use it when the change adds no migrations.
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u admin --password-stdin
docker buildx build --platform linux/amd64 --target api -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
docker buildx build --platform linux/amd64 --target worker -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
docker buildx build --platform linux/amd64 --target admin -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
docker logout gitea.treytartt.com

export KUBECONFIG=~/.kube/honeydue-k3s.yaml
for svc in api worker admin; do
  kubectl set image "deployment/$svc" -n honeydue "$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done

for svc in api worker admin; do
  kubectl rollout status -n honeydue "deployment/$svc"
done
```

## References

- [Kubernetes Deployment rolling update][rolling]
- [kubectl rollout][rollout]
- [Docker buildx][buildx]

[rolling]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
[rollout]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#rollout
[buildx]: https://docs.docker.com/build/buildx/