Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do() callback
  (was causing 'unlock of unlocked mutex' fatal after Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self' +
  allowlist fonts.googleapis.com so the marketing landing page CSS actually
  loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with --platform linux/amd64
  so arm64 (Apple Silicon) dev machines produce images runnable on x86_64
  Hetzner nodes; fix array expansion under set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use top-level
  aliases (the '\${X_SECRET}' form never actually resolved); dozzle ports:
  long-form host_ip is rejected by Swarm, switched to short-form (bound to
  0.0.0.0 with UFW-based loopback restriction); worker replicas 2 -> 1
  (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot; real
  startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet + hostNetwork
  override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress without TLS
  (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/: full deployment book, 26 files, ~42k words
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook
    (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations, consolidated
    citations
- README.md: Production Deployment section replaced with pointer to the book;
  Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 07:20:21 -05:00
parent 4ec4bbbfe8
commit 6f303dbbaa
46 changed files with 9785 additions and 93 deletions
@@ -0,0 +1,240 @@
+# 00 — Overview
+
+## Summary
+
+honeyDue runs on a three-node Kubernetes cluster managed by K3s, fronted by
+Cloudflare, and backed by a managed Postgres (Neon), S3-compatible object
+storage (Backblaze B2), and a self-hosted container registry (Gitea). The
+application consists of a Go REST API, a Next.js admin panel, and a
+background worker process using Redis-backed queues. Traefik handles HTTP
+ingress and path-based routing. The whole stack uses about 1 GB of RAM in
+total, a small fraction of the 24 GB (3 × 8 GB) available across the three
+nodes.
+
+This chapter is the map. Everything here is expanded in a later chapter.
+
+## Architecture at a glance
+
+```mermaid
+flowchart TB
+    subgraph Internet
+        Browser[End-user browser / mobile client]
+    end
+
+    subgraph CF[Cloudflare]
+        CFEdge[Edge POP<br/>TLS terminates here]
+    end
+
+    Browser -- HTTPS :443 --> CFEdge
+
+    subgraph Hetzner[Hetzner Cloud — Nuremberg nbg1]
+        direction LR
+        subgraph H1[hetzner1<br/>178.104.247.152]
+            T1[Traefik<br/>:80/:443 hostNet]
+            A1[api pod]
+            W1[worker pod]
+        end
+        subgraph H2[hetzner2<br/>178.105.32.198]
+            T2[Traefik<br/>:80/:443 hostNet]
+            A2[api pod]
+            R1[redis pod<br/>PVC]
+        end
+        subgraph H3[hetzner3<br/>178.104.249.189]
+            T3[Traefik<br/>:80/:443 hostNet]
+            A3[api pod]
+            AD1[admin pod]
+        end
+    end
+
+    CFEdge -- HTTP :80<br/>DNS round-robin --> T1
+    CFEdge -- HTTP :80 --> T2
+    CFEdge -- HTTP :80 --> T3
+
+    T1 & T2 & T3 -.Ingress routes by<br/>Host header.-> A1
+    T1 & T2 & T3 -.-> AD1
+    A1 & A2 & A3 -.-> R1
+
+    subgraph External[Managed services]
+        Neon[(Neon Postgres<br/>AWS us-east-1)]
+        B2[(Backblaze B2<br/>us-east-005)]
+        FM[Fastmail SMTP]
+        Gitea[Gitea Registry<br/>gitea.treytartt.com]
+    end
+
+    A1 & A2 & A3 -- SSL --> Neon
+    W1 -- SSL --> Neon
+    A1 & A2 & A3 -- HTTPS --> B2
+    W1 -- SMTP :587 --> FM
+    H1 & H2 & H3 -. image pull .-> Gitea
+```
+
+### ASCII fallback
+
+```
+                         ┌─────────────────────┐
+                         │     End user        │
+                         └──────────┬──────────┘
+                                    │ HTTPS :443
+                                    ▼
+                         ┌─────────────────────┐
+                         │  Cloudflare edge    │ TLS terminates here
+                         │  (SSL = Flexible)   │
+                         └──────────┬──────────┘
+                       HTTP :80 round-robin
+                  ┌─────────────┼─────────────┐
+                  ▼             ▼             ▼
+     ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
+     │   hetzner1      │ │   hetzner2      │ │   hetzner3      │
+     │ 178.104.247.152 │ │ 178.105.32.198  │ │ 178.104.249.189 │
+     │ Traefik :80/443 │ │ Traefik :80/443 │ │ Traefik :80/443 │
+     │  api   worker   │ │  api   redis    │ │  api   admin    │
+     └─────────┬───────┘ └─────────┬───────┘ └─────────┬───────┘
+               │                   │                   │
+               └──────── Kubernetes overlay ───────────┘
+                                   │
+     ┌─────────────────────────────┴──────────────────────────────┐
+     │                                                             │
+     ▼                      ▼                    ▼                 ▼
+┌─────────┐          ┌─────────────┐     ┌──────────┐    ┌───────────────┐
+│  Neon   │          │ Backblaze B2│     │ Fastmail │    │ Gitea Registry│
+│Postgres │          │   uploads   │     │   SMTP   │    │   image pull  │
+└─────────┘          └─────────────┘     └──────────┘    └───────────────┘
+```
+
+## The stack, one layer at a time
+
+### Layer 0 — Hardware
+
+Three Hetzner Cloud CX33 instances (4 vCPU, 8 GB RAM, 80 GB NVMe SSD) in
+Hetzner's Nuremberg (nbg1) datacenter. Each node is $7.99/mo (April 2026
+pricing), totaling ~$24/mo. See [Chapter 1](./01-infrastructure.md).
+
+### Layer 1 — Operating system
+
+Ubuntu 24.04.3 LTS. Each node has:
+- SSH on port 22, key-only auth, `deploy` user with NOPASSWD sudo
+- `ufw` firewall with strict default-deny-incoming; specific ports allowed
+  per Chapter 4
+- Sysctl override `net.ipv4.ip_unprivileged_port_start=0` so non-root
+  containers can bind privileged ports (needed for Traefik to serve :80/:443)
+
+### Layer 2 — Container runtime
+
+`containerd` v2.2.2 (bundled with K3s). Docker remains installed from the
+Swarm era but is disabled. containerd is the runtime K3s ships by default,
+and it has a smaller footprint than Docker's full stack.
+
+### Layer 3 — Orchestrator
+
+K3s v1.34.6 in HA mode. All 3 nodes are `control-plane,etcd`; a
+three-member etcd cluster needs a Raft quorum of 2, so it tolerates one
+node failure. K3s is a minimal Kubernetes distribution from Rancher Labs
+(now part of SUSE): single-binary, embedded etcd instead of a separate
+etcd cluster, sane defaults for small installations.
+See [Chapter 2](./02-orchestrator-choice.md) for why K3s over full Kubernetes
+or Docker Swarm.
+
+### Layer 4 — Cluster networking
+
+- **Flannel VXLAN** for pod-to-pod overlay (default on K3s). VXLAN tunnels
+  pod traffic over UDP port 8472 between nodes.
+- **CoreDNS** for service discovery (pods resolve Service names such as
+  `api` or `redis` to reach each other).
+- **kube-proxy** in IPVS mode for ClusterIP → pod routing.
+
+[Chapter 3](./03-networking.md) walks through a single request to show
+every hop.
+
+### Layer 5 — Ingress
+
+**Traefik v3** as a DaemonSet with `hostNetwork: true`. Each node has a
+Traefik pod that binds directly to the node's public :80 and :443. No
+`servicelb`, no Hetzner Load Balancer — Cloudflare round-robins the three
+node IPs in DNS and any node can serve any request. See
+[Chapter 6](./06-traefik-ingress.md).
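+
+As a rough illustration of this layer, an override for the K3s-bundled
+Traefik chart might look like the sketch below. It is not a copy of
+`deploy-k3s/manifests/traefik-helmchartconfig.yaml`; the values shown are
+assumptions based on the upstream Traefik Helm chart (`deployment.kind`,
+`hostNetwork`) and may differ from what the real file sets.
+
+```yaml
+# Hypothetical sketch, not the repository's actual file.
+apiVersion: helm.cattle.io/v1
+kind: HelmChartConfig
+metadata:
+  name: traefik            # must match the name of the bundled chart
+  namespace: kube-system   # K3s installs its bundled charts here
+spec:
+  valuesContent: |-
+    deployment:
+      kind: DaemonSet      # one Traefik pod per node
+    hostNetwork: true      # share the node's network namespace
+```
+
+A real override would likely also need to set the entrypoint ports to 80
+and 443, which is where the `ip_unprivileged_port_start` sysctl from Layer 1
+comes in.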
+
+### Layer 6 — Edge / CDN
+
+Cloudflare Free plan. Proxied A records for `api.myhoneydue.com`,
+`admin.myhoneydue.com`, and `myhoneydue.com` each point at all three node
+IPs. Edge handles TLS termination (SSL=Flexible), DDoS protection, caching
+for static assets, and traffic failover if a node becomes unreachable.
+See [Chapter 13](./13-cloudflare.md).
+
+### Layer 7 — Application services
+
+| Service | Type | Replicas | Image |
+|---|---|---|---|
+| `api` | Go (Echo, GORM) | 3 | `gitea.treytartt.com/admin/honeydue-api:<sha>` |
+| `admin` | Next.js 16 | 1 | `gitea.treytartt.com/admin/honeydue-admin:<sha>` |
+| `worker` | Go (Asynq) | 1 | `gitea.treytartt.com/admin/honeydue-worker:<sha>` |
+| `redis` | redis:7-alpine | 1 | Docker Hub |
+
+See [Chapter 7](./07-services.md).
+
+### Layer 8 — External dependencies
+
+- **Neon Postgres** (Launch plan) — `honeyDue` database
+- **Backblaze B2** — `honeyDueProd` bucket for user uploads
+- **Fastmail SMTP** — transactional email
+- **Gitea** (self-hosted at `gitea.treytartt.com`) — container registry
+- **Cloudflare** — DNS, TLS, CDN
+
+See [Chapter 8](./08-database.md), [9](./09-storage.md), and
+[11](./11-registry.md).
+
+## What's deliberately absent
+
+- **TLS at origin.** Cloudflare terminates TLS at the edge and talks HTTP
+  on port 80 to the nodes. This is "Flexible SSL" in Cloudflare terminology.
+  It's the simplest setup; we have a TODO to upgrade to "Full (strict)" with
+  Cloudflare Origin CA certs ([Chapter 13](./13-cloudflare.md), §Future).
+- **Hetzner Load Balancer.** We save the $8.49/mo by having Cloudflare
+  round-robin across node IPs directly. If any node is unresponsive,
+  Cloudflare's own origin health checks will route around it within 30s.
+- **Push notifications.** APNs (iOS) and FCM (Android) are *configured off*
+  until we have Apple Developer / Google Play accounts. The env vars are
+  set to sentinel values that let the Go app boot; `FEATURE_PUSH_ENABLED=false`
+  gates all call sites.
+- **External metrics/monitoring (Prometheus, Grafana, Betterstack).**
+  Right now we rely on `kubectl logs`, `kubectl top`, and Cloudflare's own
+  analytics. See [Chapter 15](./15-observability.md) for what's there and
+  what we'd add.
+- **Automated backups of Redis state.** Redis is configured with AOF
+  (append-only file) persistence, but the PVC lives on a single node. Redis
+  holds only cache and Asynq queue state; if it is lost, the cache
+  repopulates on the first request and the queue on the next cron tick.
+  Not critical.
+- **Admin panel basic auth (Traefik middleware).** In-app admin login is
+  enabled; the extra Traefik-layer basic auth the scaffold supports is not
+  currently attached.
+
+## The deployment pipeline in one paragraph
+
+Changes to application code are built on your workstation by
+`docker buildx build --platform linux/amd64 --push`, which cross-compiles
+from arm64 (Apple Silicon) to amd64 (Hetzner nodes) and pushes directly to
+`gitea.treytartt.com`. Manifests live in `deploy-k3s/manifests/`; they
+reference image tags by git short SHA. `kubectl apply -f` rolls the new
+image in with `maxUnavailable: 0, maxSurge: 1`: one new pod at a time, and
+the old pod stays up until the new one is healthy. The `api` and `admin`
+Service names resolve through cluster DNS to stable ClusterIPs, and only
+pods that pass their readiness probe receive traffic, so traffic shifts
+the moment a new pod becomes ready.
+[Chapter 14](./14-deployment-process.md) walks through a complete deploy.
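+
+The rolling-update settings described above correspond to a Deployment
+fragment along these lines. Only `maxUnavailable: 0`, `maxSurge: 1`, the
+three `api` replicas, and the image naming come from this chapter; the
+labels, container port, and probe path are placeholders and are not taken
+from `deploy-k3s/manifests/api/deployment.yaml`.
+
+```yaml
+# Illustrative fragment; assumptions are marked inline.
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: api
+spec:
+  replicas: 3
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxUnavailable: 0    # never remove an old pod before its replacement is ready
+      maxSurge: 1          # add at most one extra pod at a time
+  selector:
+    matchLabels:
+      app: api             # assumed label
+  template:
+    metadata:
+      labels:
+        app: api           # assumed label
+    spec:
+      containers:
+        - name: api
+          image: gitea.treytartt.com/admin/honeydue-api:<sha>   # git short SHA tag
+          readinessProbe:
+            httpGet:
+              path: /healthz   # hypothetical path
+              port: 8000       # hypothetical port
+```
+
+With three replicas and these settings, a rollout briefly runs four `api`
+pods: the surge pod must pass its readiness probe before an old pod is
+terminated.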
+
+## What we *used* to have (the short version)
+
+Up until 2026-04-24 this stack ran on **Docker Swarm** on the same three
+Hetzner boxes. It worked, but the Docker libnetwork service-discovery
+layer has a bug in the 29.x line ([moby/moby#52265][moby-52265]) that
+leaves stale DNS A-records behind when tasks migrate between nodes. We
+hit it: the admin panel returned 502s for ~50% of requests through
+Cloudflare because Caddy (our previous reverse proxy) was dialing a ghost
+IP that had since been recycled to the Dozzle log viewer. We spent four
+hours trying increasingly clever workarounds (dnsrr vs VIP,
+`dynamic a` DNS refresh, global mode, host-mode ports, host.docker.internal,
+hardcoded node IPs) before concluding that libnetwork state corruption
+survives every non-nuclear fix.
+
+The full autopsy is in [Chapter 19 — Swarm Postmortem](./19-postmortem-swarm.md).
+K3s uses CoreDNS and has no libnetwork history; the bug class doesn't
+exist there.
+
+[moby-52265]: https://github.com/moby/moby/issues/52265