bc3da007db
Step 1 — OTel SDK: cmd/api and cmd/worker initialize a tracer provider that exports OTLP/HTTP to obs.88oakapps.com (Jaeger all-in-one). Sampling is AlwaysSample in dev (DEBUG=true) and TraceIDRatioBased(0.1) in prod, overridable via OTEL_TRACES_SAMPLER_ARG. Service names are honeydue-api and honeydue-worker. otelecho.Middleware opens a span per HTTP request.

Step 2 — Manual spans: storage_service.Upload now takes ctx and emits storage.upload + b2.PutObject spans (size_bytes, key, mime_type, bucket, result attrs). APNs Send/SendWithCategory and FCM sendOne emit per-token spans with topic, status_code, reason. Asynq middleware emits asynq.handle:<task_type> per job with retry/payload attrs and records asynq_job_duration_seconds.

Step 3 — Database: otelgorm plugin registered in database.Connect, so any SQL emitted via db.WithContext(ctx) attaches to the request span. Every repository now exposes WithContext(ctx) *XRepository as the migration helper. TaskService.ListTasks and GetTasksByResidence are migrated end-to-end (ctx threaded through handler → service → repo); remaining services adopt the same pattern incrementally — pre-migration methods still emit untraced SQL via the unchanged db field.

OBS_TRACES_URL and OBS_INGEST_TOKEN flow from deploy/prod.env → honeydue-secrets → api+worker Deployments via secretKeyRef (optional). 02-setup-secrets.sh sources them from prod.env on next run; manifests mark both env vars optional so the deployment rolls without traces if the secret is absent.

ch15 observability doc now lists what produces spans today vs the remaining migration work, with the explicit per-method pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
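The sampling rule in Step 1 can be sketched in Go as follows. This is a minimal illustration, not the repository's actual code: the function name samplingRatio, the injected getenv parameter, and the fallback for malformed values are all hypothetical. It also assumes that OTEL_TRACES_SAMPLER_ARG is only consulted when DEBUG is not true, which the commit message does not state explicitly. In real code the returned fraction would presumably be handed to the OpenTelemetry SDK's TraceIDRatioBased sampler.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultProdRatio mirrors the TraceIDRatioBased(0.1) production default
// described in the commit message.
const defaultProdRatio = 0.1

// samplingRatio returns the fraction of traces to sample (hypothetical helper).
// getenv is injected so the rule can be exercised without touching the real
// environment; pass os.Getenv in production.
func samplingRatio(getenv func(string) string) float64 {
	// Dev mode: DEBUG=true samples every trace (a ratio of 1.0).
	if getenv("DEBUG") == "true" {
		return 1.0
	}
	// Prod: start from the default and let OTEL_TRACES_SAMPLER_ARG override it.
	raw := getenv("OTEL_TRACES_SAMPLER_ARG")
	if raw == "" {
		return defaultProdRatio
	}
	ratio, err := strconv.ParseFloat(raw, 64)
	// Assumption: malformed, NaN, or out-of-range values fall back to the default.
	if err != nil || !(ratio >= 0 && ratio <= 1) {
		return defaultProdRatio
	}
	return ratio
}

func main() {
	fmt.Printf("sampling ratio: %.2f\n", samplingRatio(os.Getenv))
}
```

Injecting getenv keeps the decision logic a pure function, so the dev, prod-default, and override cases can each be checked with a small map instead of mutating process state.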
honeyDue Production Deployment — The Book
This is the complete reference for the honeyDue production deployment as it exists on 2026-04-24. It serves two audiences:
- A new engineer learning the system for the first time. Start at Chapter 0 (Overview) and read in order. Concepts are built up; nothing is assumed beyond "you've deployed web apps before."
- The operator (future-you) needing a specific fact fast. Every chapter opens with a one-paragraph summary and has an operator runbook at its end. The appendices are a cheat sheet.
The deployment is non-trivial. It's a 3-node HA Kubernetes cluster running a Go API, a Next.js admin panel, a background worker, Redis, and Traefik — all fronted by Cloudflare, integrated with Neon Postgres, Backblaze B2, and a self-hosted Gitea registry. This book explains why each of those pieces was chosen (often over two or three alternatives we tried first), what they do, and how to operate them.
Table of Contents
Part I — The System
- 00 — Overview — what's running, at a glance
- 01 — Infrastructure — Hetzner nodes, specs, cost, region
- 02 — Orchestrator Choice — why k3s (and not Swarm, full k8s, or Nomad)
Part II — Networking
- 03 — Networking — flannel, CoreDNS, kube-proxy, the overlay story
- 04 — Firewall — every UFW rule on every node, rationale
- 13 — Cloudflare — DNS, SSL modes, round-robin origin pool
Part III — Security
- 05 — Security — RBAC, Pod Security, secrets, TLS chain
- 06 — Traefik Ingress — host-network DaemonSet, cert plan
Part IV — Workloads
- 07 — Services — api, admin, worker, redis per-service deep dive
- 08 — Database — Neon Postgres, advisory-lock migrations
- 09 — Storage — Backblaze B2, minio-go client details
- 10 — Secrets & Config — ConfigMap, Secret, env mapping
- 11 — Registry — Gitea container registry, multi-arch builds
Part V — Operation
- 12 — Data Flow — end-to-end request lifecycle
- 14 — Deployment Process — how to roll new code
- 15 — Observability — VictoriaMetrics + Jaeger + Grafana on obs.88oakapps.com, vmagent in-cluster, Prometheus histograms in the Go API
- 16 — Failure Modes — what happens when X dies
- 17 — Runbook — common ops tasks
Part VI — Context
- 18 — Cost — what this costs to run, per service
- 19 — Swarm Postmortem — the story of why we migrated from Docker Swarm
- 20 — Roadmap — known TODOs and scaling triggers
Appendices
Quick Facts
| Field | Value |
|---|---|
| Orchestrator | K3s v1.34.6+k3s1 (3 nodes, HA control plane) |
| Ingress | Traefik v3 (DaemonSet, hostNetwork) |
| Nodes | 3× Hetzner Cloud CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) in nbg1 (Nuremberg) |
| DNS & Edge | Cloudflare (Free plan), SSL=Flexible, round-robin 3 node A records |
| Database | Neon Postgres, ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech |
| Cache + Queue | Redis 7-alpine, in-cluster, 1 replica, PVC-backed, pinned to nbg1-2 |
| Object Storage | Backblaze B2, honeyDueProd bucket, us-east-005 region |
| Image Registry | Self-hosted Gitea v1.25.5 at gitea.treytartt.com |
| Transactional Email | Fastmail SMTP (smtp.fastmail.com:587) |
| Domains | api.myhoneydue.com, admin.myhoneydue.com, myhoneydue.com |
| Monthly Cost (current) | ~$30–40 (3× Hetzner + Neon Launch + B2 + Cloudflare Free + Gitea free) |
| kubeconfig | ~/.kube/honeydue-k3s.yaml on operator workstation |
| Repo | honeyDueAPI-go/deploy-k3s/ holds the manifests; deploy/ is the legacy Swarm config |
How to Read This Book
- "Why did we…?" answers are in the chapter covering that component. Every major design choice has an explicit rejection of 1–3 alternatives.
- Historical bugs are in Chapter 19. The rest of the book describes the current (fixed) state; Chapter 19 is the forensic record of what was broken and how we figured it out.
- Operator commands you'll run regularly are in Appendix B. Chapter 17 has longer procedures (cert rotation, DB migration, etc.).
- Citations throughout use footnote-style links to the canonical source (k3s docs, moby issues, Cloudflare docs, etc.). Appendix D collects them.
Conventions
- Kubernetes namespace for the app is honeydue.
- SSH aliases are hetzner1, hetzner2, hetzner3 in your ~/.ssh/config.
- Node hostnames in the cluster are ubuntu-8gb-nbg1-{1,2,3} (Hetzner-assigned).
- The mapping is non-obvious because the Hetzner hostname suffix order does not match SSH alias order:
| SSH alias | Public IP | Hostname in k3s |
|---|---|---|
| hetzner1 | 178.104.247.152 | ubuntu-8gb-nbg1-2 |
| hetzner2 | 178.105.32.198 | ubuntu-8gb-nbg1-1 |
| hetzner3 | 178.104.249.189 | ubuntu-8gb-nbg1-3 |
When a chapter refers to "hetzner1" it means the box at 178.104.247.152 / nbg1-2.
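Because the alias and hostname suffixes do not line up, tooling that jumps between SSH and kubectl may find a lookup table handy. The Go sketch below encodes the table above. The type node, the map nodesByAlias, and the function resolve are hypothetical names for illustration and are not part of the repository.

```go
package main

import "fmt"

// node pairs a machine's public IP with its hostname as seen by k3s.
type node struct {
	PublicIP string
	Hostname string
}

// nodesByAlias mirrors the alias table above (hypothetical helper data).
var nodesByAlias = map[string]node{
	"hetzner1": {PublicIP: "178.104.247.152", Hostname: "ubuntu-8gb-nbg1-2"},
	"hetzner2": {PublicIP: "178.105.32.198", Hostname: "ubuntu-8gb-nbg1-1"},
	"hetzner3": {PublicIP: "178.104.249.189", Hostname: "ubuntu-8gb-nbg1-3"},
}

// resolve returns the node for an SSH alias, or an error for unknown aliases.
func resolve(alias string) (node, error) {
	n, ok := nodesByAlias[alias]
	if !ok {
		return node{}, fmt.Errorf("unknown SSH alias %q", alias)
	}
	return n, nil
}

func main() {
	// Print the mapping in alias order.
	for _, alias := range []string{"hetzner1", "hetzner2", "hetzner3"} {
		n, err := resolve(alias)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s (%s)\n", alias, n.Hostname, n.PublicIP)
	}
}
```

Returning an error for unknown aliases, rather than a zero-value node, avoids silently targeting an empty hostname in a later command.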