# 14 — Deployment Process

## Summary

A production deploy has four steps: build a new image, push it to the
Gitea registry, update the Deployment's image field to the new SHA tag,
and let Kubernetes roll new pods in. There is no downtime if the change
is backward-compatible. Rollback is `kubectl rollout undo`. This chapter
walks through the full process, plus alternate paths (config-only
changes, manifest changes, hotfixes).

## TL;DR using the unified deploy script

The recommended path. `deploy-k3s/scripts/03-deploy.sh` builds all four
images (api, worker, admin, web), pushes them to Gitea, regenerates the
ConfigMap from `config.yaml`, applies every manifest under
`deploy-k3s/manifests/` (including the observability vmagent), and
waits for all rollouts.

```bash
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && git push gitea master

export KUBECONFIG=~/.kube/honeydue.yaml
bash deploy-k3s/scripts/03-deploy.sh        # full build + push + rollout
# or, to redeploy without rebuilding:
bash deploy-k3s/scripts/03-deploy.sh --skip-build
# or, to pin a specific tag:
bash deploy-k3s/scripts/03-deploy.sh --tag d3708e6
```

What the script does, in order:

1. Read registry creds from `deploy-k3s/config.yaml`.
2. `docker login gitea.treytartt.com`.
3. Build all four images with `--platform linux/amd64` (so arm64 Macs
   don't push images that crash on Hetzner amd64 nodes with
   "exec format error").
4. Push to the Gitea registry, plus tag and push `:latest`.
5. Generate the env file from `config.yaml` and apply it as ConfigMap
   `honeydue-config` (rendered with a dry run and piped into
   `kubectl apply`, so re-runs are idempotent).
6. Apply `manifests/namespace.yaml`, `redis/`, `ingress/`,
   `api/{deployment,service,hpa}`, `worker/`, `admin/`, `web/`.
7. Apply `manifests/observability/vmagent.yaml`, substituting
   `TOKEN_PLACEHOLDER` with `OBS_INGEST_TOKEN` from `deploy/prod.env`
   (gitignored). Skipped with a warning if the token isn't present.
8. `kubectl rollout status` for every Deployment, including vmagent.

The migration Job is applied and awaited before the api and worker
manifests in step 6; see "Migrations are gated, not interleaved".

A full rebuild takes ~7–10 minutes; `--skip-build` takes ~1–2 minutes.

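Step 7 (token substitution, skipped with a warning when the token is missing) can be sketched as below. This is a minimal illustration, not the script's actual code: the function name `render_vmagent` is hypothetical, and it assumes the token contains none of the characters `|`, `&`, or `\`, which are special in a `sed` replacement.

```bash
# Hypothetical helper (not taken from 03-deploy.sh).
# Reads a manifest on stdin and writes it to stdout with TOKEN_PLACEHOLDER
# replaced by $OBS_INGEST_TOKEN. Returns 1, after a warning on stderr,
# when the token is unset or empty so the caller can skip the apply.
render_vmagent() {
  if [ -z "${OBS_INGEST_TOKEN:-}" ]; then
    echo "WARN: OBS_INGEST_TOKEN not set; skipping vmagent" >&2
    return 1
  fi
  # Assumes the token has no '|', '&' or '\' characters.
  sed "s|TOKEN_PLACEHOLDER|${OBS_INGEST_TOKEN}|g"
}

# Example use (commented out so the sketch has no side effects):
#   if rendered=$(render_vmagent < deploy-k3s/manifests/observability/vmagent.yaml); then
#     printf '%s\n' "$rendered" | kubectl apply -f -
#   fi
```

Rendering to a variable first means `kubectl apply` is never run against empty input when the token is missing.
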
## TL;DR for a single-service code change (manual)

```bash
# 1. Commit + get SHA
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)

# 2. Login to Gitea registry (creds in config.yaml)
docker login gitea.treytartt.com -u admin

# 3. Build + push amd64 image
docker build --platform linux/amd64 --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" .
docker push "gitea.treytartt.com/admin/honeydue-api:${SHA}"

# 4. Roll it in
export KUBECONFIG=~/.kube/honeydue.yaml
kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

# 5. Watch
kubectl rollout status -n honeydue deployment/api

# 6. Log out
docker logout gitea.treytartt.com
```

~3–5 minutes end to end for api.

> **Gotcha:** For any tag other than `:latest`, Kubernetes defaults to
> `imagePullPolicy: IfNotPresent`, which means kubelet won't re-fetch an
> image with a tag it already has cached locally, even if the registry
> now has different bytes at that tag. Always change tags (use the SHA),
> or temporarily set `imagePullPolicy: Always` and run
> `kubectl rollout restart` if you need to overwrite a tag.

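One way to enforce the "always use the SHA" rule from the gotcha is a small guard in front of `kubectl set image`. The sketch below is an illustration only: `is_sha_tag` is a hypothetical name, and it assumes tags come from `git rev-parse --short HEAD` with the default abbreviation length (7 or more lowercase hex characters).

```bash
# Hypothetical guard: succeed only for tags that look like a git SHA
# (7 to 40 lowercase hex characters). Mutable tags such as "latest" fail.
is_sha_tag() {
  case "$1" in
    ""|*[!0-9a-f]*) return 1 ;;   # empty, or contains a non-hex character
  esac
  [ "${#1}" -ge 7 ] && [ "${#1}" -le 40 ]
}

# Example use (commented out):
#   is_sha_tag "$SHA" || { echo "refusing mutable tag: $SHA" >&2; exit 1; }
```
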
## The build

### Step 1: Prepare

```bash
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git status            # clean working tree?
git log -1 --oneline  # this is the SHA that'll ship
```

### Step 2: Login to Gitea

```bash
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | \
  docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
```

**Note**: Passing the token on the command line (`docker login -p ...`)
leaves it in shell history and in the process list. Pipe it in with
`--password-stdin` as shown; don't skip the `printf` trick.

### Step 3: Build + push

```bash
SHA=$(git rev-parse --short HEAD)

# For API
docker buildx build \
  --platform linux/amd64 \
  --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
  --push .

# For Worker
docker buildx build \
  --platform linux/amd64 \
  --target worker \
  -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" \
  --push .

# For Admin (Next.js)
docker buildx build \
  --platform linux/amd64 \
  --target admin \
  -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" \
  --push .
```

- `--platform linux/amd64`: cross-compile from the operator's arm64
  machine to the Hetzner nodes' amd64
- `--target X`: select a stage from the multi-stage Dockerfile
- `--push`: push to the registry in one step instead of leaving the
  image in the local Docker store

The first build is slow (~3–5 min cold). Subsequent builds hit the
BuildKit layer cache and complete in ~30–60s if only app code changed.

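The three near-identical build commands in Step 3 can be collapsed into a loop. The sketch below is one possible shape, not part of the repository: `image_ref`, `run`, `build_all`, and the `DRY_RUN` switch are hypothetical names, and with `DRY_RUN=1` (the default here) the commands are only printed.

```bash
REGISTRY_PREFIX="gitea.treytartt.com/admin"   # registry path used in this chapter

# Print the image reference for a service and tag.
image_ref() {
  printf '%s/honeydue-%s:%s\n' "$REGISTRY_PREFIX" "$1" "$2"
}

# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build and push the api, worker and admin targets for one tag.
build_all() {
  sha=$1
  for svc in api worker admin; do
    run docker buildx build --platform linux/amd64 --target "$svc" \
      -t "$(image_ref "$svc" "$sha")" --push . || return 1
  done
}

# Example use (commented out):
#   DRY_RUN=0 build_all "$(git rev-parse --short HEAD)"
```

Printing the commands first makes it easy to check the tag and target names before anything is pushed.
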
### Build platform note

If `docker buildx` isn't configured:

```bash
docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap
```

This creates a BuildKit container that supports cross-platform builds.
The `--bootstrap` line spins it up immediately so errors surface now
instead of on first build.

## The deploy

### For a single service

```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml

kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
```

This updates the Deployment's image field. Kubernetes then:

1. Creates a new ReplicaSet with the new image (the
   `deployment.kubernetes.io/revision` annotation records the revision)
2. Starts a new pod (per `maxSurge: 1`)
3. Waits for readinessProbe to pass on the new pod (up to 240s for
   cold api boot)
4. Once ready, removes a pod from the old ReplicaSet
5. Repeats until all pods are on the new ReplicaSet
6. Marks rollout complete

### Watching the rollout

```bash
kubectl rollout status -n honeydue deployment/api
```

Prints progress and returns when the rollout completes or fails.
`kubectl rollout status` has no timeout of its own by default
(`--timeout=0`), but it exits with an error once the Deployment exceeds
its `progressDeadlineSeconds` (default 600s, i.e. 10 minutes without
progress).

More detailed:

```bash
# Watch pods transition
kubectl get pods -n honeydue -l app.kubernetes.io/name=api -w

# Watch events
kubectl get events -n honeydue --sort-by=.lastTimestamp -w
```

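If a stalled rollout should be reverted rather than left half-done, `kubectl rollout status` can be given an explicit `--timeout` and combined with `kubectl rollout undo`. The sketch below is one way to wire that up, not something the deploy script is known to do: `rollout_or_undo` and the overridable `KUBECTL` variable are hypothetical, and the `5m` default is an arbitrary choice.

```bash
KUBECTL=${KUBECTL:-kubectl}   # overridable so the function can be exercised with a stub

# Wait for a rollout in the honeydue namespace; if it does not finish
# within the timeout, undo it and return non-zero.
rollout_or_undo() {
  deploy=$1
  timeout=${2:-5m}
  if "$KUBECTL" rollout status -n honeydue "deployment/$deploy" --timeout="$timeout"; then
    return 0
  fi
  echo "rollout of $deploy did not complete; undoing" >&2
  "$KUBECTL" rollout undo -n honeydue "deployment/$deploy"
  return 1
}

# Example use (commented out):
#   rollout_or_undo api 5m
```
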
### For all three services

```bash
for svc in api worker admin; do
  kubectl set image deployment/$svc -n honeydue \
    $svc="gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done

# Watch all rollouts
for svc in api worker admin; do
  kubectl rollout status -n honeydue deployment/$svc
done
```

## Config-only changes (no new image)

When you change `prod.env` but code is unchanged:

```bash
# 1. Update prod.env locally
# 2. Regenerate ConfigMap
kubectl create configmap honeydue-config -n honeydue \
  --from-env-file=deploy/prod.env \
  --dry-run=client -o yaml | kubectl apply -f -

# 3. Pods do NOT auto-reload env vars. Restart them.
kubectl rollout restart -n honeydue deployment/api deployment/admin deployment/worker
```

`rollout restart` triggers a rolling update with the *same* image but
forces pod recreation. New pods pick up the updated ConfigMap.

### Why not auto-reload?

Kubernetes has no built-in mechanism to restart pods on ConfigMap change.
There's no `envFromWatch` equivalent. Third-party operators like
Reloader can do it, but we don't run one.

For sensitive config (like the `SECRET_KEY`), this is actually good:
pods don't cycle unexpectedly when someone tweaks the ConfigMap.

## Secret changes

Same flow as config: change the value, then restart the pods.

```bash
# Rotate a value
kubectl patch secret honeydue-secrets -n honeydue \
  --type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(echo -n 'newvalue' | base64)\"}}"

# Restart pods
kubectl rollout restart -n honeydue deployment/api deployment/worker
```

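The inline `echo -n ... | base64` works for short values on the operator's shell, but `echo -n` is not portable and GNU `base64` wraps long output at 76 columns, which would corrupt the JSON. A slightly more defensive way to build the patch is sketched below; `b64` and `secret_patch` are hypothetical helper names, and the key is assumed to need no JSON escaping.

```bash
# Base64-encode a value with no trailing newline on input and no line
# wrapping on output.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Print a JSON merge patch that sets one key of a Secret.
# Assumes the key name consists of letters, digits and underscores only.
secret_patch() {
  printf '{"data":{"%s":"%s"}}\n' "$1" "$(b64 "$2")"
}

# Example use (commented out):
#   kubectl patch secret honeydue-secrets -n honeydue \
#     --type=merge -p "$(secret_patch SECRET_KEY "$NEW_VALUE")"
```
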
## One-time B2 bucket lifecycle (manual)

The `pending_uploads` cleanup cron (`30 * * * *` on the worker) handles
the common case of reaping orphaned uploads. The B2 bucket lifecycle
rule on the `uploads/` prefix is the **backstop** if the worker is
offline for >24 hours. It's configured once via the Backblaze web
console because B2's S3 lifecycle API isn't fully implemented, so the
rule can't live in the deploy script.

One-time setup:

1. Open https://secure.backblaze.com/b2_buckets.htm → bucket
   `honeyDueProd` → **Lifecycle Settings** → **Custom**
2. Add rule:
   - File name prefix: `uploads/`
   - Hide files older than: **7 days**
   - Delete hidden files older than: **1 day**

With this rule, an orphaned object lives for at most about 8 days after
upload (7 days until it is hidden, plus 1 day until the hidden version
is deleted). The worker normally reaps within an hour, so the rule
should almost never trigger.

Verify:

```bash
# Requires the b2 CLI: brew install b2-tools
b2 bucket get-info honeyDueProd | jq '.lifecycleRules'
```

See `deploy-k3s/manifests/b2-lifecycle.md` for the canonical rule
definition and a curl-based fallback if the b2 CLI isn't available.

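Instead of eyeballing the `jq` output, the verification can be turned into a pass/fail check. The sketch below is an assumption-laden illustration: `has_uploads_rule` is a hypothetical name, and it assumes the bucket JSON uses the B2 native API field names `fileNamePrefix`, `daysFromUploadingToHiding`, and `daysFromHidingToDeleting` under `lifecycleRules`. Treat `deploy-k3s/manifests/b2-lifecycle.md` as the source of truth for the rule itself.

```bash
# Hypothetical check: reads bucket-info JSON on stdin and succeeds only if
# a lifecycle rule for the uploads/ prefix has the 7-day hide and 1-day
# delete values described above. Requires jq.
has_uploads_rule() {
  jq -e '
    [ .lifecycleRules[]?
      | select(.fileNamePrefix == "uploads/"
               and .daysFromUploadingToHiding == 7
               and .daysFromHidingToDeleting == 1) ]
    | length > 0' >/dev/null
}

# Example use (commented out):
#   b2 bucket get-info honeyDueProd | has_uploads_rule \
#     || echo "uploads/ lifecycle rule missing or different" >&2
```
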
## Manifest changes

When you add/modify a deployment YAML:

```bash
kubectl apply -f deploy-k3s/manifests/api/deployment.yaml
```

If the change touches the pod template (`spec.template`: resource
limits, env, volumes, and so on), Kubernetes treats it as a new revision
and pods roll. If it touches a field outside the template, such as
`replicas`, existing pods are left alone and pods are only added or
removed.

## Rollback

### Last-known-good rollback

```bash
kubectl rollout undo deployment/api -n honeydue
```

Reverts to the previous ReplicaSet (the one with the previous image).
Takes ~30s to stabilize.

### Rollback to a specific revision

```bash
# See revision history
kubectl rollout history deployment/api -n honeydue

# Revert to specific revision number
kubectl rollout undo deployment/api -n honeydue --to-revision=3
```

Kubernetes keeps up to 10 ReplicaSet revisions by default
(`spec.revisionHistoryLimit`).

### Hard rollback (deploy an older image)

```bash
kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:<older-sha>"
```

Useful when you want to go back further than the revision history, or
to a specific known-good SHA.

## Rolling update semantics

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
```

For api (3 replicas):

- `maxUnavailable: 0`: no pod is removed until its replacement is ready
- `maxSurge: 1`: up to 4 pods exist simultaneously during rollout

Timeline (approximate, warm state):

- t=0: kubectl set image
- t=0: k8s creates new RS with 1 pod
- t=30s (or so): new pod readiness probe passes
- t=30s: k8s terminates 1 old pod
- t=60s: next new pod ready
- t=60s: another old pod terminates
- ...continues until all on new RS

Migrations run as a separate Kubernetes Job that completes before any
api/worker pod is rolled. So the rollout above never includes migration
work: pods that boot are guaranteed to find the schema already at the
expected version. See §"Migrations are gated, not interleaved" below.

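The pod counts and the rough timeline in this section follow from simple arithmetic on the strategy values. The sketch below only illustrates that arithmetic; the function names are hypothetical, it handles integer settings only (not percentages), and the 30s-per-pod figure is the approximate warm-state readiness time quoted above.

```bash
# Pod-count bounds during a rolling update, for integer settings.
# Arguments: replicas, maxSurge, maxUnavailable.
rollout_bounds() {
  replicas=$1
  max_surge=$2
  max_unavailable=$3
  echo "max_pods=$((replicas + max_surge)) min_ready=$((replicas - max_unavailable))"
}

# Rough duration when pods are replaced one at a time (maxSurge 1,
# maxUnavailable 0): replicas times seconds until a new pod is ready.
rollout_seconds() {
  echo $(( $1 * $2 ))
}

rollout_bounds 3 1 0    # prints: max_pods=4 min_ready=3
rollout_seconds 3 30    # prints: 90
```
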
## Migrations are gated, not interleaved

`03-deploy.sh` runs `goose up` as a one-shot Job before applying any
api/worker manifests:

```
1. kubectl delete job honeydue-migrate (idempotent, removes prior run)
2. kubectl apply -f manifests/migrate/job.yaml (with current api image)
3. kubectl wait --for=condition=complete --timeout=10m job/honeydue-migrate
4. (only if Job succeeded) kubectl apply -f manifests/api/...
```

The Job uses the api image; `/usr/local/bin/goose` is baked in at
Dockerfile build time. The Job script strips the `-pooler` segment
from `DB_HOST` before connecting (goose's session-scoped advisory
lock can't survive PgBouncer's transaction mode), runs `goose up`, and
exits.

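The `-pooler` stripping is a plain string edit on the hostname. The sketch below shows one way to do it; it is not the Job's actual script, `direct_db_host` is a hypothetical name, and the hostname in the example is made up.

```bash
# Hypothetical helper: remove the first "-pooler" segment from a hostname
# so the connection bypasses the pooler. Hosts without it pass through.
direct_db_host() {
  printf '%s\n' "$1" | sed 's/-pooler//'
}

# Made-up hostname, for illustration only.
direct_db_host "db-example-pooler.example.com"   # prints: db-example.example.com
```
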
If the Job fails, the script aborts before any new app pod sees a
stale schema. To debug:

```bash
kubectl -n honeydue logs job/honeydue-migrate --tail=200
kubectl -n honeydue describe job honeydue-migrate
```

After investigating, fix the migration file and re-run `03-deploy.sh`.
The Job is idempotent: successful migrations stay applied, and only the
new/failed file gets retried.

api/worker pods run a `RequireSchemaApplied` check at startup that
queries `goose_db_version` and refuses to boot if the table is missing
or the latest row is `is_applied=false`. This is the fail-fast for
"someone bypassed the deploy script and the schema isn't current."

For full schema management background, see
[Chapter 8 §Schema management](./08-database.md).

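The gating sequence described in this section can be sketched as a single function. This is an illustration under assumptions, not an excerpt from `03-deploy.sh`: `migrate_then_apply`, `KUBECTL`, and `MANIFESTS` are hypothetical names, the manifest paths assume the `deploy-k3s/manifests/` layout, `--ignore-not-found` is added so the delete is safe on a first run, and the image substitution the real script performs on the Job manifest is omitted.

```bash
KUBECTL=${KUBECTL:-kubectl}                  # overridable for testing with a stub
MANIFESTS=${MANIFESTS:-deploy-k3s/manifests} # assumed repository layout

# Run the migration Job to completion, and only then apply api/worker.
migrate_then_apply() {
  # Remove the Job from a previous run; succeeds even if none exists.
  "$KUBECTL" -n honeydue delete job honeydue-migrate --ignore-not-found || return 1
  "$KUBECTL" -n honeydue apply -f "$MANIFESTS/migrate/job.yaml" || return 1
  if ! "$KUBECTL" -n honeydue wait --for=condition=complete --timeout=10m \
      job/honeydue-migrate; then
    echo "migration Job did not complete; not rolling api/worker" >&2
    return 1
  fi
  "$KUBECTL" -n honeydue apply -f "$MANIFESTS/api/" &&
    "$KUBECTL" -n honeydue apply -f "$MANIFESTS/worker/"
}
```

Returning early on a failed wait is what keeps new app pods from ever starting against a schema that has not been migrated.
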
## Hotfix workflow

When we need to ship a fix fast and skip the usual steps:

1. Fix in code
2. Build + push
3. `kubectl set image` on the affected service only
4. Monitor with `kubectl logs -f`

Don't skip CI/tests in a real org; for a solo operator, this is the
tradeoff.

## Integration with Gitea

Currently no CI/CD. The operator builds from the workstation and pushes
manually. Future:

- Gitea Actions (Drone-like CI) could trigger on push to `main`
- Build + push step could run in a GitHub Actions-compatible workflow
- Auto-deploy on tag push, manual promote to prod

**TODO** (Chapter 20).

## What the old Swarm deploy script did

Contrast: `deploy/scripts/deploy_prod.sh` (Swarm-era) did:

1. Validate every config file (placeholder detection, APNS key format,
   B2 all-or-none)
2. Buildx to amd64
3. Push to Gitea (we retrofitted this from GHCR)
4. SCP bundle to manager node
5. `docker secret create` + `docker config create` with versioned names
6. `docker stack deploy --with-registry-auth`
7. Poll stack services until convergence (420s timeout)
8. Prune old secret/config versions
9. Healthcheck the final URL; auto-rollback on failure
10. Log out of registries

The current k3s replacement, `deploy-k3s/scripts/03-deploy.sh`, covers
the same ground in fewer steps because Kubernetes does the
versioning/rollout/health bookkeeping natively. See the TL;DR section
at the top of this chapter.

## Common deploy failures

| Symptom | Likely cause |
|---|---|
| `ImagePullBackOff` | Image not in registry, or pull secret expired |
| Stuck at "Progressing" | Readiness probe not passing; check pod logs |
| `CrashLoopBackOff` immediately | App won't start; check pod logs for panic/exit reason |
| `CrashLoopBackOff` after migration | Cache service, Redis connection, or post-init code issue |
| Old pods never terminate | New pods not ready; rollout doesn't progress |
| Rollout succeeds but app is broken | Readiness probe is too lenient; passes on broken app |

### Debugging commands

```bash
# Describe the deployment (shows events, conditions)
kubectl describe deployment api -n honeydue

# Describe the api pods
kubectl describe pod -n honeydue -l app.kubernetes.io/name=api

# Logs from currently-running pods
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100 --prefix

# Logs from a pod's previous (terminated) container
kubectl logs -n honeydue <pod> --previous

# Events in the namespace (sorted by time, newest last)
kubectl get events -n honeydue --sort-by=.lastTimestamp

# Pause a rollout (stops new pods from being created)
kubectl rollout pause deployment/api -n honeydue

# Resume
kubectl rollout resume deployment/api -n honeydue
```

## Zero-downtime considerations

For zero-downtime deploys, the new image must be:

1. **Backward-compatible** with the current database schema (schema
   migrations run before new code)
2. **Backward-compatible** with in-flight API requests (don't remove
   endpoints mid-deploy; deprecate first)
3. **Backward-compatible** with Redis data structures (don't change
   cache key formats abruptly)

For breaking changes:

1. Deploy intermediate version that handles both old and new
2. Once rolled out everywhere, deploy breaking-change version
3. Two deploys, same day or different days

We don't have this discipline yet; our API has too few clients to
worry about. As mobile clients proliferate, this becomes more important.

## Blue-green / canary (not yet)

Kubernetes supports advanced rollout strategies:

- **Canary**: route 5% of traffic to new version, scale up gradually
- **Blue-green**: run new version alongside old, flip traffic all at
  once

These require Traefik's TraefikService CRD with weighted routing, or
a service mesh. **TODO** if traffic scale justifies.

## Cleanup: the old Swarm config

The `deploy/` directory contains the Swarm-era config. It's still there
but mostly unused. After we're confident in k3s (a few weeks? month?),
remove it:

```bash
rm -rf deploy/
```

Keep the useful files in `deploy-k3s/` only. Before deleting, move
anything this chapter still reads from `deploy/` (`deploy/prod.env`,
`deploy/registry.env`) into `deploy-k3s/` and update the commands that
reference it.

## Operator cheat sheet

```bash
# Full build + deploy
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u admin --password-stdin
docker buildx build --platform linux/amd64 --target api -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
docker buildx build --platform linux/amd64 --target worker -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
docker buildx build --platform linux/amd64 --target admin -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
docker logout gitea.treytartt.com

export KUBECONFIG=~/.kube/honeydue-k3s.yaml
for svc in api worker admin; do
  kubectl set image deployment/$svc -n honeydue "$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done

for svc in api worker admin; do
  kubectl rollout status -n honeydue deployment/$svc
done
```

## References

- [Kubernetes Deployment rolling update][rolling]
- [kubectl rollout][rollout]
- [Docker buildx][buildx]

[rolling]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
[rollout]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#rollout
[buildx]: https://docs.docker.com/build/buildx/