docs: rewrite ch15 observability + cross-refs for the live obs stack
ch15 is now an account of what's actually running, not a roadmap for what we'd add: VictoriaMetrics + Jaeger + Grafana on 88oakappsUpdate fronted by Cloudflare and bearer-gated nginx, vmagent in-cluster, the internal/prom histogram set, the rollout's NetworkPolicy footprint, the obs.88oakapps.com endpoint shape, the ~$0/700MB resource budget, and a token-rotation runbook. The "what we still don't have" section keeps log aggregation, alerting, and full distributed tracing as the honest gap list. Other touched docs: - 00-overview: \"deliberately absent\" no longer claims we have no metrics — calls out the cross-cluster shape instead. - 14-deployment-process: TL;DR now points at deploy-k3s/scripts/03-deploy.sh (full build + push + apply + obs vmagent), with the manual kubectl-set-image flow kept as the single-service path. Notes the IfNotPresent gotcha that bit us during the rollout. - 16-failure-modes: adds vmagent-can't-reach-obs and Grafana-no-data. - 18-cost: $0 line item for the obs stack on 88oakappsUpdate, with the CX32 migration trigger. - 17/18 README + appendix b: link the new ch15, add the obs cheat sheet block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -349,7 +349,11 @@ All protected endpoints require an `Authorization: Token <token>` header.
|
||||
|
||||
Production runs on a **3-node K3s HA cluster** on Hetzner Cloud, fronted
|
||||
by Cloudflare, with Neon Postgres, Backblaze B2, and a self-hosted Gitea
|
||||
container registry. See the full deployment book for every detail:
|
||||
container registry. Live observability (VictoriaMetrics + Jaeger +
|
||||
Grafana) runs on a separate Linode VPS at
|
||||
[`grafana.88oakapps.com`](https://grafana.88oakapps.com) and is fed by a
|
||||
`vmagent` sidecar in-cluster. See the full deployment book for every
|
||||
detail:
|
||||
|
||||
**→ [docs/deployment/](./docs/deployment/README.md) — The Deployment Book**
|
||||
|
||||
@@ -371,7 +375,9 @@ Quick links:
|
||||
|
||||
- **Runbook** — [docs/deployment/17-runbook.md](./docs/deployment/17-runbook.md) — 22 common ops procedures
|
||||
- **kubectl cheat sheet** — [docs/deployment/appendices/b-commands.md](./docs/deployment/appendices/b-commands.md)
|
||||
- **Deploy process** — [docs/deployment/14-deployment-process.md](./docs/deployment/14-deployment-process.md) — build → push → rollout
|
||||
- **Deploy process** — [docs/deployment/14-deployment-process.md](./docs/deployment/14-deployment-process.md) — `bash deploy-k3s/scripts/03-deploy.sh` builds → pushes → rolls out
|
||||
- **Observability** — [docs/deployment/15-observability.md](./docs/deployment/15-observability.md) — VictoriaMetrics + Jaeger + Grafana on `obs.88oakapps.com`
|
||||
- **Observability plan** — [docs/observability-plan.md](./docs/observability-plan.md) — design doc and rollout phases
|
||||
- **Failure modes** — [docs/deployment/16-failure-modes.md](./docs/deployment/16-failure-modes.md) — what happens when X dies
|
||||
- **Swarm postmortem** — [docs/deployment/19-postmortem-swarm.md](./docs/deployment/19-postmortem-swarm.md) — why we migrated
|
||||
|
||||
|
||||
Reference in New Issue
Block a user