# 00 — Overview

## Summary

honeyDue runs on a three-node Kubernetes cluster managed by K3s, fronted by Cloudflare, and backed by a managed Postgres (Neon), S3-compatible object storage (Backblaze B2), and a self-hosted container registry (Gitea). The application consists of a Go REST API, a Next.js admin panel, and a background worker process using Redis-backed queues. Traefik handles HTTP ingress and path-based routing. The whole stack fits in about 1 GB of RAM across the three nodes with plenty of headroom.

This chapter is the map. Everything here is expanded in a later chapter.

## Architecture at a glance

```mermaid
flowchart TB
    subgraph Internet
        Browser[End-user browser / mobile client]
    end

    subgraph CF[Cloudflare]
        CFEdge[Edge POP<br/>TLS terminates here]
    end

    Browser -- HTTPS :443 --> CFEdge

    subgraph Hetzner[Hetzner Cloud — Nuremberg nbg1]
        direction LR
        subgraph H1[hetzner1<br/>178.104.247.152]
            T1[Traefik<br/>:80/:443 hostNet]
            A1[api pod]
            W1[worker pod]
        end
        subgraph H2[hetzner2<br/>178.105.32.198]
            T2[Traefik<br/>:80/:443 hostNet]
            A2[api pod]
            R1[redis pod<br/>PVC]
        end
        subgraph H3[hetzner3<br/>178.104.249.189]
            T3[Traefik<br/>:80/:443 hostNet]
            A3[api pod]
            AD1[admin pod]
        end
    end

    CFEdge -- HTTP :80<br/>DNS round-robin --> T1
    CFEdge -- HTTP :80 --> T2
    CFEdge -- HTTP :80 --> T3

    T1 & T2 & T3 -.Ingress routes by<br/>Host header.-> A1
    T1 & T2 & T3 -.-> AD1
    A1 & A2 & A3 -.-> R1

    subgraph External[Managed services]
        Neon[(Neon Postgres<br/>AWS us-east-1)]
        B2[(Backblaze B2<br/>us-east-005)]
        FM[Fastmail SMTP]
        Gitea[Gitea Registry<br/>gitea.treytartt.com]
    end

    A1 & A2 & A3 -- SSL --> Neon
    W1 -- SSL --> Neon
    A1 & A2 & A3 -- HTTPS --> B2
    W1 -- SMTP :587 --> FM
    H1 & H2 & H3 -. image pull .-> Gitea
```

### ASCII fallback

```
                  ┌─────────────────────┐
                  │      End user       │
                  └──────────┬──────────┘
                             │ HTTPS :443
                             ▼
                  ┌─────────────────────┐
                  │   Cloudflare edge   │  TLS terminates here
                  │  (SSL = Flexible)   │
                  └──────────┬──────────┘
                   HTTP :80 round-robin
               ┌─────────────┼─────────────┐
               ▼             ▼             ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ hetzner1        │ │ hetzner2        │ │ hetzner3        │
│ 178.104.247.152 │ │ 178.105.32.198  │ │ 178.104.249.189 │
│ Traefik :80/443 │ │ Traefik :80/443 │ │ Traefik :80/443 │
│ api worker      │ │ api redis       │ │ api admin       │
└─────────┬───────┘ └─────────┬───────┘ └─────────┬───────┘
          │                   │                   │
          └──────── Kubernetes overlay ───────────┘
                              │
┌─────────────────────────────┴──────────────────────────────┐
│                                                            │
     ▼              ▼               ▼                ▼
┌─────────┐  ┌─────────────┐   ┌──────────┐  ┌───────────────┐
│ Neon    │  │ Backblaze B2│   │ Fastmail │  │ Gitea Registry│
│Postgres │  │ uploads     │   │ SMTP     │  │ image pull    │
└─────────┘  └─────────────┘   └──────────┘  └───────────────┘
```

## The stack, one layer at a time

### Layer 0 — Hardware

Three Hetzner Cloud CX33 instances (4 vCPU, 8 GB RAM, 80 GB NVMe SSD) in Hetzner's Nuremberg (nbg1) datacenter. Each node is $7.99/mo (April 2026 pricing), totaling ~$24/mo. See [Chapter 1](./01-infrastructure.md).

### Layer 1 — Operating system

Ubuntu 24.04.3 LTS. Each node has:

- SSH on port 22, key-only auth, `deploy` user with NOPASSWD sudo
- `ufw` firewall with strict default-deny-incoming; specific ports allowed per Chapter 4
- Sysctl override `net.ipv4.ip_unprivileged_port_start=0` so non-root containers can bind privileged ports (needed for Traefik to serve :80/:443)

### Layer 2 — Container runtime

`containerd` v2.2.2 (bundled with K3s). Docker was previously installed from the Swarm era but is now disabled. containerd is Kubernetes' reference runtime and has a smaller footprint than Docker's full stack.
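The Layer 1 sysctl override can be checked with a few lines of Go. The sketch below is illustrative only and is not part of the honeyDue codebase; the helper names `parsePortStart` and `canBindUnprivileged` are hypothetical. It encodes the kernel rule that ports below `ip_unprivileged_port_start` require `CAP_NET_BIND_SERVICE`, so a value of `0` leaves no privileged ports at all.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePortStart parses the text found in
// /proc/sys/net/ipv4/ip_unprivileged_port_start (for example "0\n" or "1024\n").
// Hypothetical helper for illustration.
func parsePortStart(raw string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, fmt.Errorf("parse ip_unprivileged_port_start: %w", err)
	}
	if n < 0 || n > 65535 {
		return 0, fmt.Errorf("ip_unprivileged_port_start out of range: %d", n)
	}
	return n, nil
}

// canBindUnprivileged reports whether a process without CAP_NET_BIND_SERVICE
// may bind the given port: ports at or above the threshold are unprivileged.
func canBindUnprivileged(port, start int) bool {
	return port >= start
}

func main() {
	// Compare the common kernel default (1024) with the override used here (0).
	for _, raw := range []string{"1024\n", "0\n"} {
		start, err := parsePortStart(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("start=%d port80=%t port443=%t\n",
			start, canBindUnprivileged(80, start), canBindUnprivileged(443, start))
	}
}
```

On a node, the same value can be read from `/proc/sys/net/ipv4/ip_unprivileged_port_start` and passed to `parsePortStart`; with the override in place, both checks report `true`.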
### Layer 3 — Orchestrator

K3s v1.34.6 in HA mode. All 3 nodes are `control-plane,etcd` (a three-member etcd cluster using Raft; quorum is 2, so it can tolerate one node failure). K3s is a minimal Kubernetes distribution from Rancher Labs (now SUSE): single-binary, embedded etcd instead of a separate etcd cluster, sane defaults for small installations. See [Chapter 2](./02-orchestrator-choice.md) for why K3s over full Kubernetes or Docker Swarm.

### Layer 4 — Cluster networking

- **Flannel VXLAN** for pod-to-pod overlay (default on K3s). VXLAN tunnels pod traffic over UDP port 8472 between nodes.
- **CoreDNS** for service discovery (how pods resolve names such as `api` or `redis` to reach each other).
- **kube-proxy** in IPVS mode for ClusterIP → pod routing.

[Chapter 3](./03-networking.md) walks through a single request to show every hop.

### Layer 5 — Ingress

**Traefik v3** as a DaemonSet with `hostNetwork: true`. Each node has a Traefik pod that binds directly to the node's public :80 and :443. There is no `servicelb` and no Hetzner Load Balancer: Cloudflare round-robins the three node IPs in DNS and any node can serve any request. See [Chapter 6](./06-traefik-ingress.md).

### Layer 6 — Edge / CDN

Cloudflare Free plan. Proxied A records for `api.myhoneydue.com`, `admin.myhoneydue.com`, and `myhoneydue.com` each point at all three node IPs. The edge handles TLS termination (SSL=Flexible), DDoS protection, caching for static assets, and traffic failover if a node becomes unreachable. See [Chapter 13](./13-cloudflare.md).

### Layer 7 — Application services

| Service | Type | Replicas | Image |
|---|---|---|---|
| `api` | Go (Echo, GORM) | 3 | `gitea.treytartt.com/admin/honeydue-api:` |
| `admin` | Next.js 16 | 1 | `gitea.treytartt.com/admin/honeydue-admin:` |
| `worker` | Go (Asynq) | 1 | `gitea.treytartt.com/admin/honeydue-worker:` |
| `redis` | redis:7-alpine | 1 | Docker Hub |

See [Chapter 7](./07-services.md).
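The image references in the Layer 7 table end at the tag separator; per the deployment pipeline section of this chapter, manifests fill that slot with a git short SHA. As a rough sketch of how such a reference is composed, assuming the `honeydue-<service>` naming shown in the table, the helper below validates a SHA and builds the full reference. The function `imageRef` and the SHA `a1b2c3d` are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// registry is the Gitea registry path shown in the services table.
const registry = "gitea.treytartt.com/admin"

// shaPattern accepts an abbreviated or full lowercase hex git SHA.
var shaPattern = regexp.MustCompile(`^[0-9a-f]{7,40}$`)

// imageRef builds a reference such as
// gitea.treytartt.com/admin/honeydue-api:<sha>.
// Hypothetical helper; the real manifests may compose this differently.
func imageRef(service, sha string) (string, error) {
	if service == "" {
		return "", fmt.Errorf("service name is empty")
	}
	if !shaPattern.MatchString(sha) {
		return "", fmt.Errorf("not a git SHA: %q", sha)
	}
	return fmt.Sprintf("%s/honeydue-%s:%s", registry, service, sha), nil
}

func main() {
	// "a1b2c3d" is a made-up short SHA, not a real honeyDue release.
	for _, svc := range []string{"api", "admin", "worker"} {
		ref, err := imageRef(svc, "a1b2c3d")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ref)
	}
}
```

Pinning each Deployment to an immutable SHA tag rather than a moving tag such as `latest` means a manifest always names exactly one build, which keeps rollbacks a matter of re-applying an older manifest.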
### Layer 8 — External dependencies

- **Neon Postgres** (Launch plan): `honeyDue` database
- **Backblaze B2**: `honeyDueProd` bucket for user uploads
- **Fastmail SMTP**: transactional email
- **Gitea** (self-hosted at `gitea.treytartt.com`): container registry
- **Cloudflare**: DNS, TLS, CDN

See [Chapter 8](./08-database.md), [9](./09-storage.md), and [11](./11-registry.md).

## What's deliberately absent

- **TLS at origin.** Cloudflare terminates TLS at the edge and talks HTTP on port 80 to the nodes. This is "Flexible SSL" in Cloudflare terminology. It's the simplest setup; we have a TODO to upgrade to "Full (strict)" with Cloudflare Origin CA certs ([Chapter 13](./13-cloudflare.md), §Future).
- **Hetzner Load Balancer.** We save the $8.49/mo by having Cloudflare round-robin across node IPs directly. If any node is unresponsive, Cloudflare's own origin health checks will route around it within 30s.
- **Push notifications.** APNs (iOS) and FCM (Android) are *configured off* until we have Apple Developer / Google Play accounts. The env vars are set to sentinel values that let the Go app boot; `FEATURE_PUSH_ENABLED=false` gates all call sites.
- **In-cluster Prometheus / Grafana.** Self-hosted Prometheus-compatible metrics, tracing, and dashboards live **outside** the k3s cluster on `88oakappsUpdate` (the same Linode VPS that hosts PostHog), reached via `https://obs.88oakapps.com` (Cloudflare-fronted, bearer-gated). A `vmagent` sidecar in the honeydue namespace scrapes the api Pods and remote-writes out. This frees ~700 MB of cluster RAM and means observability survives a k3s control-plane incident. See [Chapter 15](./15-observability.md).
- **Alerting.** No PagerDuty, Slack hooks, or pages-on-error are wired up yet. Histograms are flowing into Grafana; alert rules on top of them are the next addition. See [Chapter 15 — Future](./15-observability.md).
- **Automated backups of Redis state.** Redis is configured with AOF (append-only file) persistence, but the PVC is only on one node. Redis holds only cache and Asynq queue state; if it is lost, it repopulates on the first request or the next cron tick. Not critical.
- **Admin panel basic auth (Traefik middleware).** In-app admin login is enabled; the extra Traefik-layer basic auth the scaffold supports is not currently attached.

## The deployment pipeline in one paragraph

Changes to application code are built on your workstation by `docker buildx build --platform linux/amd64 --push`, which cross-compiles from arm64 (Apple Silicon) to amd64 (Hetzner nodes) and pushes directly to `gitea.treytartt.com`. Manifests live in `deploy-k3s/manifests/`; they reference image tags by git short SHA. `kubectl apply -f` rolls the new image in with `maxUnavailable: 0, maxSurge: 1`, so one new pod starts at a time and the old pod stays up until the new one is healthy. Service discovery by Kubernetes DNS means the `api` and `admin` hostnames always resolve to live backing pods; traffic shifts the moment a new pod passes its readiness probe. [Chapter 14](./14-deployment-process.md) walks through a complete deploy.

## What we *used* to have (the short version)

Up until 2026-04-24 this stack ran on **Docker Swarm** on the same three Hetzner boxes. It worked, but the Docker libnetwork service-discovery layer has a bug in the 29.x line ([moby/moby#52265][moby-52265]) that leaves stale DNS A-records behind when tasks migrate between nodes. We hit it: the admin panel returned 502s for ~50% of requests through Cloudflare because Caddy (our previous reverse proxy) was dialing a ghost IP that had since been recycled to the Dozzle log viewer. We spent four hours trying increasingly clever workarounds (dnsrr vs VIP, `dynamic a` DNS refresh, global mode, host-mode ports, host.docker.internal, hardcoded node IPs) before concluding that libnetwork state corruption survives every non-nuclear fix.
The full autopsy is in [Chapter 19 — Swarm Postmortem](./19-postmortem-swarm.md). K3s uses CoreDNS and has no libnetwork history; the bug class doesn't exist there.

[moby-52265]: https://github.com/moby/moby/issues/52265