docs: rewrite ch15 observability + cross-refs for the live obs stack

ch15 is now an account of what's actually running, not a roadmap for
what we'd add: VictoriaMetrics + Jaeger + Grafana on 88oakappsUpdate
fronted by Cloudflare and bearer-gated nginx, vmagent in-cluster, the
internal/prom histogram set, the rollout's NetworkPolicy footprint,
the obs.88oakapps.com endpoint shape, the ~$0/700MB resource budget,
and a token-rotation runbook. The "what we still don't have" section
keeps log aggregation, alerting, and full distributed tracing as the
honest gap list.

Other touched docs:
- 00-overview: "deliberately absent" no longer claims we have no
  metrics — calls out the cross-cluster shape instead.
- 14-deployment-process: TL;DR now points at deploy-k3s/scripts/03-deploy.sh
  (full build + push + apply + obs vmagent), with the manual
  kubectl-set-image flow kept as the single-service path. Notes the
  IfNotPresent gotcha that bit us during the rollout.
- 16-failure-modes: adds vmagent-can't-reach-obs and Grafana-no-data.
- 18-cost: $0 line item for the obs stack on 88oakappsUpdate, with the
  CX32 migration trigger.
- 17/18 README + appendix b: link the new ch15, add the obs cheat
  sheet block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-04-25 15:05:06 -05:00
parent d3708e6c72
commit 77cfcc0b27
8 changed files with 414 additions and 187 deletions
+58 -16
@@ -8,23 +8,62 @@ No downtime if the change is backward-compatible. Rollback is
`kubectl rollout undo`. This chapter walks through the full process,
plus alternate paths (config-only changes, manifest changes, hotfixes).
## TL;DR for a code change
## TL;DR using the unified deploy script
The recommended path. `deploy-k3s/scripts/03-deploy.sh` builds all four
images (api, worker, admin, web), pushes to Gitea, regenerates the
ConfigMap from `config.yaml`, applies every manifest under
`deploy-k3s/manifests/` (including the observability vmagent), and
waits for all rollouts.
```bash
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && git push gitea master
export KUBECONFIG=~/.kube/honeydue.yaml
bash deploy-k3s/scripts/03-deploy.sh # full build + push + rollout
# or, to redeploy without rebuilding:
bash deploy-k3s/scripts/03-deploy.sh --skip-build
# or, to pin a specific tag:
bash deploy-k3s/scripts/03-deploy.sh --tag d3708e6
```
What the script does, in order:
1. Read registry creds from `deploy-k3s/config.yaml`.
2. `docker login gitea.treytartt.com`.
3. Build all four images with `--platform linux/amd64` (so arm64 Macs
don't push images that crash on Hetzner amd64 nodes with
"exec format error").
4. Push to the gitea registry, plus tag and push `:latest`.
5. Generate the env file from `config.yaml` and apply as ConfigMap
`honeydue-config` (uses dry-run + apply for diff-free idempotence).
6. Apply `manifests/namespace.yaml`, `redis/`, `ingress/`,
`api/{deployment,service,hpa}`, `worker/`, `admin/`, `web/`.
7. Apply `manifests/observability/vmagent.yaml`, substituting
`TOKEN_PLACEHOLDER` with `OBS_INGEST_TOKEN` from `deploy/prod.env`
(gitignored). Skipped with a warning if the token isn't present.
8. `kubectl rollout status` for every Deployment, including vmagent.
~7-10 minutes for a full rebuild. ~1-2 minutes with `--skip-build`.
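The token substitution in step 7 can be sketched as a small shell function. This is an illustration only, not the contents of `03-deploy.sh`: the helper name `render_vmagent` is hypothetical, the use of `sed` is an assumption, and the sketch assumes the token contains no `|`, `&`, or backslash characters.

```bash
# Hypothetical helper (not taken from 03-deploy.sh): print a manifest with
# TOKEN_PLACEHOLDER replaced by the given token. Returns 1 with a warning
# when the token is empty, mirroring the "skipped with a warning" behaviour.
render_vmagent() {
  manifest="$1"
  token="$2"
  if [ -z "$token" ]; then
    echo "warning: OBS_INGEST_TOKEN not set, skipping vmagent" >&2
    return 1
  fi
  # Assumption: the token has no '|', '&', or backslash, which sed would
  # interpret specially in this expression.
  sed "s|TOKEN_PLACEHOLDER|${token}|g" "$manifest"
}
```

A caller could pipe the result straight to the cluster, for example `render_vmagent deploy-k3s/manifests/observability/vmagent.yaml "$OBS_INGEST_TOKEN" | kubectl apply -f -`, so the rendered manifest with the real token is never written to disk.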
## TL;DR for a single-service code change (manual)
```bash
# 1. Commit + get SHA
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)
# 2. Login to Gitea registry
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
# 2. Login to Gitea registry (creds in config.yaml)
docker login gitea.treytartt.com -u admin
# 3. Build + push amd64 image
docker buildx build --platform linux/amd64 --target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
docker build --platform linux/amd64 --target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" .
docker push "gitea.treytartt.com/admin/honeydue-api:${SHA}"
# 4. Roll it in
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
export KUBECONFIG=~/.kube/honeydue.yaml
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
@@ -32,11 +71,18 @@ kubectl set image deployment/api -n honeydue \
kubectl rollout status -n honeydue deployment/api
# 6. Log out
docker logout "$REGISTRY"
docker logout gitea.treytartt.com
```
~3-5 minutes end to end for api.
> **Gotcha:** Deployments default to `imagePullPolicy: IfNotPresent`,
> which means kubelet won't re-fetch an image with a tag it already
> has cached locally — even if the registry now has different bytes
> at that tag. Always change tags (use the SHA), or temporarily flip
> `imagePullPolicy: Always` and `kubectl rollout restart` if you need
> to overwrite a tag.
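
The overwrite path described in the gotcha could look like the sketch below. The helper name `pull_policy_patch` is hypothetical; the container name `api` is taken from the `kubectl set image` command in this section, and the output is an ordinary strategic merge patch, which matches containers by name.

```bash
# Hypothetical helper: print a strategic merge patch that sets
# imagePullPolicy on one named container of a Deployment.
pull_policy_patch() {
  container="$1"
  policy="$2"
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","imagePullPolicy":"%s"}]}}}}\n' \
    "$container" "$policy"
}

# Usage against the cluster (commented out so the sketch has no side effects):
#   kubectl patch deployment/api -n honeydue -p "$(pull_policy_patch api Always)"
#   kubectl rollout restart -n honeydue deployment/api
#   kubectl rollout status -n honeydue deployment/api
#   kubectl patch deployment/api -n honeydue -p "$(pull_policy_patch api IfNotPresent)"
```

Changing the pod template already triggers a rollout, so the explicit `kubectl rollout restart` matters mainly when the same tag is pushed again while the policy is still `Always`. Reverting to `IfNotPresent` triggers one more rollout.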
## The build
### Step 1 — Prepare
@@ -314,14 +360,10 @@ Contrast: `deploy/scripts/deploy_prod.sh` (Swarm-era) did:
9. Healthcheck the final URL; auto-rollback on failure
10. Log out of registries
Our current k3s deploy is more manual but simpler. We'd write a similar
script for k3s if deploys become frequent:
```bash
# deploy-k3s/scripts/04-deploy.sh (not yet updated for Gitea)
```
See the scaffold in `deploy-k3s/scripts/`.
The current k3s replacement, `deploy-k3s/scripts/03-deploy.sh`, covers
the same ground in fewer steps because Kubernetes does the
versioning/rollout/health bookkeeping natively. See the TL;DR section
at the top of this chapter.
## Common deploy failures