docs: rewrite ch15 observability + cross-refs for the live obs stack

ch15 is now an account of what's actually running, not a roadmap for what we'd add: VictoriaMetrics + Jaeger + Grafana on 88oakappsUpdate fronted by Cloudflare and bearer-gated nginx, vmagent in-cluster, the internal/prom histogram set, the rollout's NetworkPolicy footprint, the obs.88oakapps.com endpoint shape, the ~$0/700MB resource budget, and a token-rotation runbook. The "what we still don't have" section keeps log aggregation, alerting, and full distributed tracing as the honest gap list.

Other touched docs:

- 00-overview: "deliberately absent" no longer claims we have no metrics; it calls out the cross-cluster shape instead.
- 14-deployment-process: TL;DR now points at deploy-k3s/scripts/03-deploy.sh (full build + push + apply + obs vmagent), with the manual kubectl-set-image flow kept as the single-service path. Notes the IfNotPresent gotcha that bit us during the rollout.
- 16-failure-modes: adds vmagent-can't-reach-obs and Grafana-no-data.
- 18-cost: $0 line item for the obs stack on 88oakappsUpdate, with the CX32 migration trigger.
- 17/18 README + appendix b: link the new ch15, add the obs cheat sheet block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -8,23 +8,62 @@ No downtime if the change is backward-compatible. Rollback is
 `kubectl rollout undo`. This chapter walks through the full process,
 plus alternate paths (config-only changes, manifest changes, hotfixes).

-## TL;DR for a code change
+## TL;DR using the unified deploy script
+
+The recommended path. `deploy-k3s/scripts/03-deploy.sh` builds all four
+images (api, worker, admin, web), pushes to Gitea, regenerates the
+ConfigMap from `config.yaml`, applies every manifest under
+`deploy-k3s/manifests/` (including the observability vmagent), and
+waits for all rollouts.
+
+```bash
+cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
+git add . && git commit -m "..." && git push gitea master
+
+export KUBECONFIG=~/.kube/honeydue.yaml
+bash deploy-k3s/scripts/03-deploy.sh # full build + push + rollout
+# or, to redeploy without rebuilding:
+bash deploy-k3s/scripts/03-deploy.sh --skip-build
+# or, to pin a specific tag:
+bash deploy-k3s/scripts/03-deploy.sh --tag d3708e6
+```
+
+What the script does, in order:
+
+1. Read registry creds from `deploy-k3s/config.yaml`.
+2. `docker login gitea.treytartt.com`.
+3. Build all four images with `--platform linux/amd64` (so arm64 Macs
+   don't push images that crash on Hetzner amd64 nodes with
+   "exec format error").
+4. Push to the gitea registry, plus tag and push `:latest`.
+5. Generate the env file from `config.yaml` and apply as ConfigMap
+   `honeydue-config` (uses dry-run + apply for diff-free idempotence).
+6. Apply `manifests/namespace.yaml`, `redis/`, `ingress/`,
+   `api/{deployment,service,hpa}`, `worker/`, `admin/`, `web/`.
+7. Apply `manifests/observability/vmagent.yaml`, substituting
+   `TOKEN_PLACEHOLDER` with `OBS_INGEST_TOKEN` from `deploy/prod.env`
+   (gitignored). Skipped with a warning if the token isn't present.
+8. `kubectl rollout status` for every Deployment, including vmagent.
+
+~7–10 minutes for a full rebuild. ~1–2 minutes with `--skip-build`.
+
+## TL;DR for a single-service code change (manual)

 ```bash
 # 1. Commit + get SHA
 cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
 git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)

-# 2. Login to Gitea registry
-set -a; source deploy/registry.env; set +a
-printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
+# 2. Login to Gitea registry (creds in config.yaml)
+docker login gitea.treytartt.com -u admin

 # 3. Build + push amd64 image
-docker buildx build --platform linux/amd64 --target api \
-  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
+docker build --platform linux/amd64 --target api \
+  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" .
+docker push "gitea.treytartt.com/admin/honeydue-api:${SHA}"

 # 4. Roll it in
-export KUBECONFIG=~/.kube/honeydue-k3s.yaml
+export KUBECONFIG=~/.kube/honeydue.yaml
 kubectl set image deployment/api -n honeydue \
   api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

@@ -32,11 +71,18 @@ kubectl set image deployment/api -n honeydue \
 kubectl rollout status -n honeydue deployment/api

 # 6. Log out
-docker logout "$REGISTRY"
+docker logout gitea.treytartt.com
 ```

 ~3–5 minutes end to end for api.

+> **Gotcha:** Deployments default to `imagePullPolicy: IfNotPresent`,
+> which means kubelet won't re-fetch an image with a tag it already
+> has cached locally — even if the registry now has different bytes
+> at that tag. Always change tags (use the SHA), or temporarily flip
+> `imagePullPolicy: Always` and `kubectl rollout restart` if you need
+> to overwrite a tag.
+
 ## The build

 ### Step 1 — Prepare
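The Gotcha added in the hunk above mentions temporarily flipping `imagePullPolicy: Always` and restarting the rollout when a tag has to be overwritten. A minimal sketch of that fallback follows; it is not part of the commit. The function name `force_repull`, the `KUBECTL` override, and the assumption that the target container sits at index 0 of the pod template are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: force a re-pull of a reused tag (hypothetical helper, not from the repo).
# Assumes the container to re-pull is the first one in the Deployment's pod template.
set -euo pipefail

KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to preview the commands

force_repull() {
  local ns="$1" deploy="$2"
  # JSON patch that switches the first container's pull policy to Always.
  local patch='[{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}]'
  $KUBECTL patch "deployment/${deploy}" -n "$ns" --type=json -p "$patch"
  # Recreate the pods so kubelet pulls the tag again, then wait for the rollout.
  $KUBECTL rollout restart "deployment/${deploy}" -n "$ns"
  $KUBECTL rollout status "deployment/${deploy}" -n "$ns"
}
```

Called as `force_repull honeydue api`, this would patch and restart the `api` Deployment in the `honeydue` namespace. Re-applying the manifest afterwards should restore the original pull policy, assuming the manifest sets it explicitly. Using a fresh SHA tag remains the simpler route.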
@@ -314,14 +360,10 @@ Contrast: `deploy/scripts/deploy_prod.sh` (Swarm-era) did:
 9. Healthcheck the final URL; auto-rollback on failure
 10. Log out of registries

-Our current k3s deploy is more manual but simpler. We'd write a similar
-script for k3s if deploys become frequent:
-
-```bash
-# deploy-k3s/scripts/04-deploy.sh (not yet updated for Gitea)
-```
-
-See the scaffold in `deploy-k3s/scripts/`.
+The current k3s replacement, `deploy-k3s/scripts/03-deploy.sh`, covers
+the same ground in fewer steps because Kubernetes does the
+versioning/rollout/health bookkeeping natively. See the TL;DR section
+at the top of this chapter.

 ## Common deploy failures

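Step 7 of the script description in the first hunk says the vmagent manifest is applied with `TOKEN_PLACEHOLDER` replaced by `OBS_INGEST_TOKEN`, and skipped with a warning when the token is missing. The diff does not show how `03-deploy.sh` implements this, so the following is only a sketch of one way to do it with `sed`. The function name `render_vmagent` is hypothetical, and the token is assumed to be in the environment already.

```bash
#!/usr/bin/env bash
# Sketch only: render the vmagent manifest with the ingest token (hypothetical helper).
# The real 03-deploy.sh may do this differently.
set -euo pipefail

render_vmagent() {
  local manifest="$1" token="${2:-}"
  if [ -z "$token" ]; then
    # Mirror the documented behaviour: warn and skip rather than fail the deploy.
    echo "WARN: OBS_INGEST_TOKEN not set; skipping vmagent" >&2
    return 0
  fi
  # Escape characters that are special in a sed replacement: \, &, and the | delimiter.
  local escaped
  escaped="$(printf '%s' "$token" | sed -e 's/[\\&|]/\\&/g')"
  # Print the rendered manifest on stdout; the file on disk is left untouched.
  sed -e "s|TOKEN_PLACEHOLDER|${escaped}|g" "$manifest"
}

# Possible usage (kept as a comment so the sketch has no side effects):
#   rendered="$(render_vmagent deploy-k3s/manifests/observability/vmagent.yaml "${OBS_INGEST_TOKEN:-}")"
#   [ -n "$rendered" ] && printf '%s\n' "$rendered" | kubectl apply -f -
```

Rendering to stdout keeps the token out of the working tree, which fits the note that `deploy/prod.env` is gitignored.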