admin/honeyDueAPI

Trey t c9ac273dbd

docs: capture latency optimizations + new caching invariants

Shipping commit 88fb175 changed the trace shape and added a new caching
layer with required invalidation rules. Updating the operator-facing
docs so they match the running system.

ch08 (database):
- DB_HOST is the -pooler Neon endpoint, not direct compute
- Connection pool: MaxIdleConns 20 (was 10), MaxLifetime 30m (was 10m),
  MaxIdleTime 0 (never close idle)
- New "Pool warm-up at boot" section documenting the 20-parallel-ping
  warm-up in database.Connect
- Replaced the "Neon regions" section: explicit RTT numbers, the
  optimization stack that minimizes round-trips, when this still matters

ch15 (observability):
- Replaced the 2,473ms/5-span sample trace with the new 229ms/2-span
  post-optimization trace; kept the old one underneath for diff context

ch16 (failure modes):
- Added: stale residence-IDs cache (data freshness bug + recovery)
- Added: Redis at maxmemory limit (verify allkeys-lru policy)
- Added: Neon pooler unreachable but direct endpoint up — emergency
  switchover procedure

ch17 (runbook):
- §23 Invalidate residence-IDs cache for a user (DEL key + grep for
  missing invalidation in new code)
- §24 Verify DB pool warm-up is working (log pattern + impact test)
- §25 Switch DB host between pooler and direct endpoints
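
For the pooler/direct switch in §25, the two hostnames differ only in the first DNS label. The sketch below assumes Neon's usual naming, where the pooled endpoint appends `-pooler` to the endpoint ID; the helper names `poolerHost` and `directHost` are hypothetical and are not taken from the runbook.

```go
package main

import "strings"

// poolerSuffix is the assumed marker Neon adds to the endpoint ID
// for the pooled (PgBouncer) hostname.
const poolerSuffix = "-pooler"

// poolerHost converts a direct compute hostname to its pooled form.
// Hosts that are already pooled, or that contain no dot, are returned unchanged.
func poolerHost(host string) string {
	label, rest, found := strings.Cut(host, ".")
	if !found || strings.HasSuffix(label, poolerSuffix) {
		return host
	}
	return label + poolerSuffix + "." + rest
}

// directHost converts a pooled hostname back to the direct compute form.
// Hosts that are already direct, or that contain no dot, are returned unchanged.
func directHost(host string) string {
	label, rest, found := strings.Cut(host, ".")
	if !found {
		return host
	}
	return strings.TrimSuffix(label, poolerSuffix) + "." + rest
}
```

Deriving one name from the other, instead of storing both, avoids the two values drifting apart during an emergency switchover.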

observability-plan.md status flipped from "plan only" to shipped
with the latency-cut summary.

README links to the new ch08 latency section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-25 17:36:36 -05:00

| File | Last commit | Date |
|---|---|---|
| appendices | docs: rewrite ch15 observability + cross-refs for the live obs stack | 2026-04-25 15:05:06 -05:00 |
| 00-overview.md | docs: rewrite ch15 observability + cross-refs for the live obs stack | 2026-04-25 15:05:06 -05:00 |
| 01-infrastructure.md | Migrate prod deploy from Swarm to K3s; add full deployment book | 2026-04-24 07:20:54 -05:00 |
| 02-orchestrator-choice.md | Migrate prod deploy from Swarm to K3s; add full deployment book | 2026-04-24 07:20:54 -05:00 |
| 03-networking.md | Migrate prod deploy from Swarm to K3s; add full deployment book | 2026-04-24 07:20:54 -05:00 |
| 04-firewall.md | docs/deployment: record security hardening pass + webapp + APNs | 2026-04-24 15:50:59 -05:00 |
| 05-security.md | Migrate prod deploy from Swarm to K3s; add full deployment book | 2026-04-24 07:20:54 -05:00 |
| 06-traefik-ingress.md | docs/deployment: record security hardening pass + webapp + APNs | 2026-04-24 15:50:59 -05:00 |
| 07-services.md | docs/deployment: record security hardening pass + webapp + APNs | 2026-04-24 15:50:59 -05:00 |
| 08-database.md | docs: capture latency optimizations + new caching invariants | 2026-04-25 17:36:36 -05:00 |
| 09-storage.md | Migrate prod deploy from Swarm to K3s; add full deployment book | 2026-04-24 07:20:54 -05:00 |
| 10-secrets-config.md | Fix Apple Sign In: update bundle IDs from old `com.tt.honeyDue.*` to `com.myhoneydue.*` | 2026-04-24 23:58:44 -05:00 |
| 11-registry.md | Migrate prod deploy from Swarm to K3s; add full deployment book | 2026-04-24 07:20:54 -05:00 |
| 12-data-flow.md | Migrate prod deploy from Swarm to K3s; add full deployment book | 2026-04-24 07:20:54 -05:00 |
| 13-cloudflare.md | docs/deployment: record security hardening pass + webapp + APNs | 2026-04-24 15:50:59 -05:00 |
| 14-deployment-process.md | docs: rewrite ch15 observability + cross-refs for the live obs stack | 2026-04-25 15:05:06 -05:00 |
| 15-observability.md | docs: capture latency optimizations + new caching invariants | 2026-04-25 17:36:36 -05:00 |
| 16-failure-modes.md | docs: capture latency optimizations + new caching invariants | 2026-04-25 17:36:36 -05:00 |
| 17-runbook.md | docs: capture latency optimizations + new caching invariants | 2026-04-25 17:36:36 -05:00 |
| 18-cost.md | docs: rewrite ch15 observability + cross-refs for the live obs stack | 2026-04-25 15:05:06 -05:00 |
| 19-postmortem-swarm.md | Migrate prod deploy from Swarm to K3s; add full deployment book | 2026-04-24 07:20:54 -05:00 |
| 20-roadmap.md | docs/deployment: record security hardening pass + webapp + APNs | 2026-04-24 15:50:59 -05:00 |
| README.md | docs: rewrite ch15 observability + cross-refs for the live obs stack | 2026-04-25 15:05:06 -05:00 |

README.md

honeyDue Production Deployment — The Book

This is the complete reference for the honeyDue production deployment as it exists on 2026-04-24. It serves two audiences:

- A new engineer learning the system for the first time. Start at Chapter 0 (Overview) and read in order. Concepts are built up; nothing is assumed beyond "you've deployed web apps before."
- The operator (future-you) needing a specific fact fast. Every chapter opens with a one-paragraph summary and has an operator runbook at its end. The appendices are a cheat sheet.

The deployment is non-trivial. It's a 3-node HA Kubernetes cluster running a Go API, a Next.js admin panel, a background worker, Redis, and Traefik — all fronted by Cloudflare, integrated with Neon Postgres, Backblaze B2, and a self-hosted Gitea registry. This book explains why each of those pieces was chosen (often over two or three alternatives we tried first), what they do, and how to operate them.

Part I — The System

00 — Overview — what's running, at a glance
01 — Infrastructure — Hetzner nodes, specs, cost, region
02 — Orchestrator Choice — why k3s (and not Swarm, full k8s, or Nomad)

Part II — Networking

03 — Networking — flannel, CoreDNS, kube-proxy, the overlay story
04 — Firewall — every UFW rule on every node, rationale
13 — Cloudflare — DNS, SSL modes, round-robin origin pool

Part III — Security

05 — Security — RBAC, Pod Security, secrets, TLS chain
06 — Traefik Ingress — host-network DaemonSet, cert plan

Part IV — Workloads

07 — Services — api, admin, worker, redis per-service deep dive
08 — Database — Neon Postgres, advisory-lock migrations
09 — Storage — Backblaze B2, minio-go client details
10 — Secrets & Config — ConfigMap, Secret, env mapping
11 — Registry — Gitea container registry, multi-arch builds

Part V — Operation

12 — Data Flow — end-to-end request lifecycle
14 — Deployment Process — how to roll new code
15 — Observability — VictoriaMetrics + Jaeger + Grafana on obs.88oakapps.com, vmagent in-cluster, Prometheus histograms in the Go API
16 — Failure Modes — what happens when X dies
17 — Runbook — common ops tasks

Part VI — Context

18 — Cost — what this costs to run, per service
19 — Swarm Postmortem — the story of why we migrated from Docker Swarm
20 — Roadmap — known TODOs and scaling triggers

Appendices

Quick Facts

| Field | Value |
|---|---|
| Orchestrator | K3s v1.34.6+k3s1 (3 nodes, HA control plane) |
| Ingress | Traefik v3 (DaemonSet, hostNetwork) |
| Nodes | 3× Hetzner Cloud CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) in `nbg1` (Nuremberg) |
| DNS & Edge | Cloudflare (Free plan), SSL=Flexible, round-robin across 3 node A records |
| Database | Neon Postgres, `ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech` |
| Cache + Queue | Redis 7-alpine, in-cluster, 1 replica, PVC-backed, pinned to `nbg1-2` |
| Object Storage | Backblaze B2, `honeyDueProd` bucket, `us-east-005` region |
| Image Registry | Self-hosted Gitea v1.25.5 at `gitea.treytartt.com` |
| Transactional Email | Fastmail SMTP (`smtp.fastmail.com:587`) |
| Domains | `api.myhoneydue.com`, `admin.myhoneydue.com`, `myhoneydue.com` |
| Monthly Cost (current) | ~$30–40 (3× Hetzner + Neon Launch + B2 + Cloudflare Free + Gitea free) |
| kubeconfig | `~/.kube/honeydue-k3s.yaml` on operator workstation |
| Repo | `honeyDueAPI-go/deploy-k3s/` for manifests; `deploy/` is the legacy Swarm config |

How to Read This Book

"Why did we…?" answers are in the chapter covering that component. Every major design choice has an explicit rejection of 1–3 alternatives.
Historical bugs are in Chapter 19. The rest of the book describes the current (fixed) state; 19 is the forensic record of what was broken and how we figured it out.
Operator commands you'll run regularly are in Appendix B. Chapter 17 has longer procedures (cert rotation, DB migration, etc.).
Citations throughout use footnote-style links to the canonical source (k3s docs, moby issues, Cloudflare docs, etc.). Appendix D collects them.

Conventions

- Kubernetes namespace for the app is `honeydue`.
- SSH aliases are `hetzner1`, `hetzner2`, `hetzner3` in your `~/.ssh/config`.
- Node hostnames in the cluster are `ubuntu-8gb-nbg1-{1,2,3}` (Hetzner-assigned).
The mapping is non-obvious because the Hetzner hostname suffix order does not match SSH alias order:

| SSH alias | Public IP | Hostname in k3s |
|---|---|---|
| hetzner1 | 178.104.247.152 | `ubuntu-8gb-nbg1-2` |
| hetzner2 | 178.105.32.198 | `ubuntu-8gb-nbg1-1` |
| hetzner3 | 178.104.249.189 | `ubuntu-8gb-nbg1-3` |

When a chapter refers to "hetzner1" it means the box at 178.104.247.152 / nbg1-2.
