Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do() callback (was causing 'unlock of unlocked mutex' fatal after Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self' + allowlist fonts.googleapis.com so the marketing landing page CSS actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce images runnable on x86_64 Hetzner nodes; fix array expansion under set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use top-level aliases (the '\${X_SECRET}' form never actually resolved); dozzle ports: long-form host_ip is rejected by Swarm, switched to short-form (bound to 0.0.0.0 with UFW-based loopback restriction); worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/' (Next.js serves at root; /admin/ returned 404 and killed pods); startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable 1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold 12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot; real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/ and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet + hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/: full deployment book (26 files, ~42k words):
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations, consolidated citations
- README.md: Production Deployment section replaced with pointer to the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 07:20:21 -05:00
parent 4ec4bbbfe8
commit 6f303dbbaa
46 changed files with 9785 additions and 93 deletions
@@ -0,0 +1,419 @@
+# 06 — Traefik Ingress
+
+## Summary
+
+Traefik is the reverse proxy that routes external HTTP requests to the
+right application pod based on the `Host:` header. We run Traefik v3 as a
+Kubernetes DaemonSet with `hostNetwork: true` — each of the three nodes
+has its own Traefik pod listening directly on the node's `:80`/`:443`.
+Cloudflare round-robins DNS across the three node IPs, so any node can
+serve any request. No external load balancer.
+
+## Why Traefik
+
+K3s bundles Traefik by default. The alternatives:
+
+| Option | Pros | Cons |
+|---|---|---|
+| **Traefik v3 (bundled)** | Zero install, excellent k8s integration, middleware system, active development | Helm-driven config is indirect |
+| NGINX Ingress | Most popular, battle-tested | Another thing to install, more config surface |
+| HAProxy Ingress | Extremely performant | More hands-on, older docs |
+| Caddy | Simple config, auto-HTTPS | `caddy-docker-proxy` / Ingress integration is less mature |
+| Envoy / Istio | Most featureful | Massive overkill at our scale |
+
+Traefik came "free" with K3s, does the job, and its
+[Swarm provider][traefik-swarm] is what we would have used if we'd
+fixed our Swarm architecture. Using it on k3s keeps the mental model
+consistent.
+
+## Deployment model
+
+```mermaid
+flowchart TB
+    subgraph CF[Cloudflare edge]
+        DNS[DNS A records:<br/>api.myhoneydue.com → 3 node IPs<br/>admin.myhoneydue.com → 3 node IPs]
+    end
+
+    subgraph N1[hetzner1]
+        T1[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
+        kernel1[Linux kernel<br/>net.ipv4.ip_unprivileged_port_start=0]
+    end
+    subgraph N2[hetzner2]
+        T2[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
+        kernel2[Linux kernel]
+    end
+    subgraph N3[hetzner3]
+        T3[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
+        kernel3[Linux kernel]
+    end
+
+    subgraph Cluster[k3s cluster services]
+        APISvc[api Service :8000]
+        AdminSvc[admin Service :3000]
+    end
+
+    DNS -. HTTP :80 .-> T1 & T2 & T3
+    T1 & T2 & T3 -- reverse_proxy --> APISvc & AdminSvc
+```
+
+### ASCII fallback
+
+```
+                  Cloudflare DNS
+                  ┌───────────────────┐
+                  │  api  → 3 IPs     │
+                  │  admin→ 3 IPs     │
+                  └─────────┬─────────┘
+                            │ HTTP :80
+        ┌───────────────────┼───────────────────┐
+        ▼                   ▼                   ▼
+  ┌──────────┐        ┌──────────┐        ┌──────────┐
+  │ hetzner1 │        │ hetzner2 │        │ hetzner3 │
+  │ Traefik  │        │ Traefik  │        │ Traefik  │
+  │  :80/443 │        │  :80/443 │        │  :80/443 │
+  │(hostNet) │        │(hostNet) │        │(hostNet) │
+  └────┬─────┘        └────┬─────┘        └────┬─────┘
+       │                   │                   │
+       └── ClusterIP ──────┼── ClusterIP ──────┘
+                           ▼
+              ┌────────────────────────┐
+              │ api Service   :8000    │
+              │ admin Service :3000    │
+              └────────────────────────┘
+```
+
+## Why DaemonSet + hostNetwork
+
+**What we're trying to achieve**: Any public-facing node should answer
+:80/:443. Cloudflare round-robins DNS; whichever node it picks, that
+node must serve.
+
+**The default k3s Traefik deployment** is a single-replica Deployment
+exposed via a LoadBalancer Service. That requires either:
+- Hetzner Load Balancer (+ $8.49/mo, another thing to manage), **or**
+- K3s' built-in `servicelb` (klipper-lb) which binds node ports
+  dynamically to proxy to the Service
+
+Neither was quite what we wanted. With three replicas of the stock Traefik
+behind klipper-lb, each Traefik pod is reachable but there's an extra hop
+through klipper's proxy daemon.
+
+**DaemonSet + hostNetwork** is cleaner: each Traefik pod *is* the host's
+:80/:443. No proxy daemon, no LB Service, no VIP. Cloudflare DNS →
+node IP → kernel → Traefik, one hop.
+
+### Trade-offs of hostNetwork
+
+**Pro:**
+- One fewer layer of indirection; lower latency
+- No Service needed; no kube-proxy in the ingress path
+- Standard Cloudflare round-robin DNS is the failover mechanism
+
+**Con:**
+- Traefik is in the host netns; it sees the node's interfaces, not
+  the cluster overlay
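+- Cluster DNS is not automatic: with the default `ClusterFirst` policy a
+  `hostNetwork` pod falls back to the node's resolver, so Service names
+  like `api` resolve only with `dnsPolicy: ClusterFirstWithHostNet`.
+  Traefik finds backends through the Kubernetes API, so routing does not
+  depend on this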
+- Port conflicts possible if anything else wants :80/:443 on the node
+  (nothing else does in our setup)
+
+### Trade-offs of DaemonSet
+
+**Pro:**
+- One Traefik per node; matches our Cloudflare 3-IP round-robin
+  exactly
+- Any node down = Cloudflare's origin health checks route around it
+
+**Con:**
+- Updates require `maxUnavailable > 0` (host ports conflict during
+  surge) → brief moment where one node is down during rollout
+- 3× the memory usage vs. 1-replica Deployment (but Traefik is tiny
+  — ~128 MB total across all three)
+
+## Our Traefik configuration
+
+We reconfigure the bundled K3s Traefik via a `HelmChartConfig`. K3s
+uses the `helm-controller` to manage bundled addons; `HelmChartConfig`
+lets us override values without disabling-and-replacing the chart.
+
+Full config at
+`deploy-k3s/manifests/traefik-helmchartconfig.yaml`. Key settings:
+
+```yaml
+apiVersion: helm.cattle.io/v1
+kind: HelmChartConfig
+metadata:
+  name: traefik
+  namespace: kube-system
+spec:
+  valuesContent: |-
+    deployment:
+      kind: DaemonSet      # was Deployment
+    hostNetwork: true
+    service:
+      enabled: false        # no LoadBalancer Service
+    ports:
+      web:
+        port: 80
+        hostPort: 80
+      websecure:
+        port: 443
+        hostPort: 443
+    updateStrategy:
+      type: RollingUpdate
+      rollingUpdate:
+        maxUnavailable: 1
+        maxSurge: 0
+    securityContext:
+      capabilities:
+        drop: [ALL]
+        add: [NET_BIND_SERVICE]
+      readOnlyRootFilesystem: true
+      runAsGroup: 65532
+      runAsNonRoot: true
+      runAsUser: 65532
+    additionalArguments:
+      - "--entrypoints.web.forwardedHeaders.trustedIPs=<CF ranges>"
+```
+
+### Why each setting
+
+- **`kind: DaemonSet`** — one Traefik per node. Default is a Deployment
+  with 1 replica.
+- **`hostNetwork: true`** — Traefik runs in the host's network namespace
+  so it can bind real :80/:443 on the node.
+- **`service.enabled: false`** — no LoadBalancer Service is created.
+  With `hostNetwork`, we don't need one.
+- **`ports.*.hostPort`** — explicit host port binding. Matches the
+  container port. A DaemonSet already runs one pod per node, and the
+  `hostPort` claim stops the scheduler from placing a second Traefik on
+  the same node.
+- **`updateStrategy.maxUnavailable: 1, maxSurge: 0`** — we accept one
+  node being down during a Traefik update (host port can't be shared).
+  The Traefik Helm chart rejects a `hostNetwork` DaemonSet with
+  `maxSurge > 0`, which is why this is the second config iteration.
+- **Security context** — non-root (UID 65532), read-only root filesystem,
+  only `NET_BIND_SERVICE` capability. See Chapter 5.
+- **`forwardedHeaders.trustedIPs`** — Cloudflare's IP ranges. Traefik
+  trusts `X-Forwarded-Proto` et al. only from these ranges, so a
+  bypassing client can't spoof the proto header.
+
+### Forwarded-headers trustedIPs
+
+The full list of trusted CF ranges is in our `additionalArguments`. It's
+the union of CF's published IPv4 and IPv6 ranges. When Cloudflare passes
+a request to origin, it adds `X-Forwarded-For` and `X-Forwarded-Proto`
+headers; Traefik only honors these if the request came from one of these
+IPs. Every other client's headers are ignored.
+
+If CF publishes new IP ranges (rare but possible), the
+`trustedIPs` list needs updating. It's a raw string in our
+HelmChartConfig, so we'd need to edit it, re-apply the manifest, and
+re-run the `helm-install-traefik` job.
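+
+A minimal sketch of rebuilding that string, assuming the list is taken
+from Cloudflare's plain-text range files (`ips-v4` and `ips-v6` under
+the Cloudflare IP ranges URL in the references). The function names
+`join_ranges` and `trusted_ips_arg` are illustrative and not part of our
+scripts:
+
+```bash
+# Join newline-separated CIDR ranges from stdin into one comma-separated
+# line, dropping blank lines.
+join_ranges() {
+  grep -v '^[[:space:]]*$' | paste -sd, -
+}
+
+# Wrap the joined ranges in the entrypoint argument used in
+# additionalArguments.
+trusted_ips_arg() {
+  printf -- '--entrypoints.web.forwardedHeaders.trustedIPs=%s\n' "$(join_ranges)"
+}
+
+# Usage (needs network access):
+#   { curl -fsS https://www.cloudflare.com/ips-v4; echo
+#     curl -fsS https://www.cloudflare.com/ips-v6; } | trusted_ips_arg
+```
+
+The output is meant to be pasted over the `<CF ranges>` argument by hand;
+nothing here applies it to the cluster.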
+
+## Traefik v3 vs v2
+
+K3s ships Traefik v3 (currently `3.6.10`). The v2 → v3 migration
+changed a few things:
+- `swarmMode` removed (replaced by a `swarm` provider, but we don't
+  use Swarm anyway)
+- Encoded-character handling changed (v3 warns about RFC 3986 handling;
+  we ignore the warning)
+- Middleware CRD API version is `traefik.io/v1alpha1` (was
+  `traefik.containo.us/v1alpha1`)
+
+Our deployment handles all of this automatically via the bundled
+chart.
+
+## Ingress resources
+
+We define two standard k8s `Ingress` resources in
+`deploy-k3s/manifests/ingress/ingress-simple.yaml`:
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: honeydue-api
+  namespace: honeydue
+spec:
+  ingressClassName: traefik
+  rules:
+    - host: api.myhoneydue.com
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service: {name: api, port: {number: 8000}}
+    - host: myhoneydue.com
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service: {name: api, port: {number: 8000}}
+---
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: honeydue-admin
+  namespace: honeydue
+spec:
+  ingressClassName: traefik
+  rules:
+    - host: admin.myhoneydue.com
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service: {name: admin, port: {number: 3000}}
+```
+
+Traefik watches for Ingress resources with `ingressClassName: traefik`
+and programs its router table accordingly. Changes are applied within
+seconds — no restart needed.
+
+### What pathType: Prefix means
+
+Every request path starting with `/` matches, which is every request.
+The alternatives are `Exact` (matches only the literal path) and
+`ImplementationSpecific` (controller-defined). `pathType` is a required
+field in `networking.k8s.io/v1`; `Prefix` is the common choice and
+matches how users think about URL routing.
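+
+The Kubernetes Ingress specification defines `Prefix` matching per path
+element rather than per character. A small sketch of that rule as a
+shell function; `prefix_matches` is a hypothetical helper, and
+individual controllers may differ at the edges:
+
+```bash
+# Return 0 if request path $2 matches Ingress Prefix rule $1, else 1.
+# Matching is per path element: /api matches /api/users but not /apiv2.
+prefix_matches() {
+  local rule="${1%/}" path="$2"
+  # A rule of "/" is empty after trimming and matches every path.
+  [ -z "$rule" ] && return 0
+  [ "$path" = "$rule" ] || [[ "$path" == "$rule"/* ]]
+}
+
+# Usage:
+#   prefix_matches / /api/tasks && echo "routed"
+#   prefix_matches /api /apiv2  || echo "not routed"
+```
+
+Our Ingresses only use `/`, so the element rule never changes the
+outcome here; it matters once a narrower prefix is added.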
+
+## How requests flow
+
+1. **Cloudflare DNS** resolves `api.myhoneydue.com` to one of three IPs
+   (round-robin). Say it picks `178.105.32.198` (hetzner2).
+2. **Cloudflare edge** establishes TCP to `178.105.32.198:80` (plain HTTP,
+   SSL=Flexible). Original HTTPS terminated at CF.
+3. **UFW on hetzner2** accepts the SYN (80/tcp open from anywhere).
+4. **Linux kernel** sees a listener on 0.0.0.0:80 (the Traefik pod).
+   Hands off the SYN.
+5. **Traefik accepts** the connection. Reads the HTTP request.
+6. **Traefik matches** the `Host:` header against its router table.
+   `Host: api.myhoneydue.com` → `honeydue-api` Ingress → `api` Service.
+7. **Traefik dials** `10.43.167.83:8000` (api Service ClusterIP). No DNS
+   lookup is involved because Traefik learns the address from the
+   Kubernetes API; the connection then passes through kube-proxy (IPVS).
+8. **kube-proxy IPVS** rewrites the destination to a live api pod endpoint
+   — say `10.42.2.6:8000` (api pod on hetzner3).
+9. **Flannel VXLAN** encapsulates the packet and sends to hetzner3
+   (UDP :8472 between node IPs).
+10. **hetzner3's kernel** decapsulates, delivers to the api pod.
+11. **api pod** processes, returns response.
+12. **Response flows back** the reverse path.
+
+By default Cloudflare caches only static assets (matched by file
+extension) at the edge. HTML and JSON responses are not cached unless we
+explicitly make them eligible, for example with a cache rule. So a
+repeat request for a static asset might not reach the origin at all.
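+
+Steps 2 to 6 can be exercised without Cloudflare by sending a request
+straight to a node with the right `Host:` header. A dry-run sketch that
+only prints the `curl` commands; `origin_checks` is a hypothetical
+helper, and the IP list is whatever the operator passes in:
+
+```bash
+# Print one curl command per origin IP so each node's Traefik can be
+# checked directly. $1 is the Host header; remaining args are node IPs.
+origin_checks() {
+  local host="$1"; shift
+  local ip
+  for ip in "$@"; do
+    printf 'curl -sS -o /dev/null -w "%%{http_code}\\n" -H "Host: %s" http://%s/\n' "$host" "$ip"
+  done
+}
+
+# Usage:
+#   origin_checks api.myhoneydue.com 178.105.32.198
+```
+
+Piping the output to `sh` runs the checks; each line then prints the
+HTTP status that one node returned.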
+
+## Middleware (mostly unused)
+
+Traefik supports middleware — small functions run before/after the proxy.
+The `deploy-k3s/manifests/ingress/middleware.yaml` scaffold defines:
+
+- **`rate-limit`** — 100 req/min average, 200 burst
+- **`security-headers`** — HSTS, X-Frame-Options, CSP, etc.
+- **`cloudflare-only`** — IP allowlist restricting origin to CF ranges
+- **`admin-auth`** — HTTP basic auth for admin panel
+
+**None of these are currently attached to our Ingresses.** To enable,
+add the `traefik.ingress.kubernetes.io/router.middlewares` annotation to
+the Ingress:
+
+```yaml
+metadata:
+  annotations:
+    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
+```
+
+We left them off to minimize surface area for the first week of the new
+cluster. Enabling is TODO in Chapter 20.
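+
+Each entry in that annotation follows the pattern
+`<namespace>-<name>@kubernetescrd`. A short sketch that builds the value
+from a namespace and Middleware names; `middleware_refs` is a
+hypothetical helper, and it assumes the Middleware objects live in the
+`honeydue` namespace:
+
+```bash
+# Build the router.middlewares annotation value for Middleware CRDs.
+# $1 is the namespace; remaining args are Middleware names.
+middleware_refs() {
+  local ns="$1"; shift
+  local refs=() name
+  for name in "$@"; do
+    refs+=("${ns}-${name}@kubernetescrd")
+  done
+  local IFS=,
+  printf '%s\n' "${refs[*]}"
+}
+
+# Usage:
+#   kubectl annotate ingress honeydue-api -n honeydue \
+#     "traefik.ingress.kubernetes.io/router.middlewares=$(middleware_refs honeydue security-headers rate-limit)"
+```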
+
+## Traefik dashboard
+
+Disabled. The Traefik dashboard (`/dashboard/` and `/api/`) exposes
+runtime state and can leak information about routes and backends. The
+bundled k3s Traefik leaves it disabled by default, and we haven't
+enabled it.
+
+If needed for debugging:
+
+```bash
+# Port-forward to a Traefik pod
+kubectl port-forward -n kube-system daemonset/traefik 9000:9000
+# (the chart exposes the dashboard on :9000 when enabled)
+# Then visit http://localhost:9000/dashboard/
+```
+
+This requires kubectl access and isn't exposed publicly.
+
+## Version pinning
+
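+We take whatever Traefik version is bundled with K3s (currently 3.6.10).
+Each K3s release pins a specific Traefik chart version, listed in its
+release notes, so a K3s upgrade can change the Traefik version. If that
+ever breaks something, note that `HelmChartConfig` only overrides chart
+values and has no `version` field. Pinning the chart means disabling the
+bundled addon (`--disable=traefik`) and shipping our own `HelmChart`,
+whose spec does accept `version`:
+
+```yaml
+# In a HelmChart resource, not the HelmChartConfig
+spec:
+  version: 39.0.501+up39.0.5  # specific chart version
+```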
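+
+To see which Traefik version a K3s upgrade actually delivered, the
+running image tag can be read from the DaemonSet. A sketch, assuming the
+DaemonSet is named `traefik` in `kube-system` as in the commands
+elsewhere in this chapter; `image_tag` is a hypothetical helper:
+
+```bash
+# Extract the tag from an image reference such as repo/name:tag.
+image_tag() {
+  local ref="${1%%@*}"      # drop any @sha256 digest
+  local last="${ref##*/}"   # last path component, so registry:port is ignored
+  case "$last" in
+    *:*) printf '%s\n' "${last##*:}" ;;
+    *)   printf 'latest\n' ;;
+  esac
+}
+
+# Usage:
+#   image_tag "$(kubectl get daemonset traefik -n kube-system \
+#     -o jsonpath='{.spec.template.spec.containers[0].image}')"
+```
+
+Recording the tag before and after a K3s upgrade makes an unexpected
+Traefik bump easy to spot.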
+
+## Limitations we accept
+
+- **No sticky sessions.** Every request to `api.myhoneydue.com` can go
+  to a different pod. Our Go API is stateless — this is fine.
+- **No canary deployments** (yet). Traefik supports weighted routing
+  via its CRDs (`TraefikService`) but we don't use them. TODO if/when
+  we do gradual rollouts.
+- **No mTLS.** Traefik supports mutual TLS client auth for sensitive
+  endpoints. We don't use it.
+- **Single ingress class.** Everything goes through the same Traefik.
+  For multi-tenant setups we'd want separate ingress classes with
+  separate policies.
+
+## Troubleshooting
+
+| Symptom | Likely cause | Fix |
+|---|---|---|
+| 404 from Traefik | Ingress doesn't match `Host:` | Check Ingress host field, DNS |
+| 502 from Traefik | Backend Service has no endpoints | `kubectl get endpoints -n honeydue` |
+| 503 from Traefik | Circuit breaker / backend unhealthy | Check pod logs, readiness probe |
+| 504 from Traefik | Backend slow | Check pod CPU/memory, DB connections |
+| Connection refused at 80 | Traefik pod not running or kernel not listening | `kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik`; `ssh deploy@node 'ss -lntp \| grep :80'` |
+| Mixed content error in browser | `X-Forwarded-Proto` not honored by app | Check `trustedIPs` includes CF; check app reads the header |
+
+## Operator cheat sheet
+
+```bash
+# Traefik pods per node
+kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide
+
+# Traefik logs (all pods)
+kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=50 --prefix
+
+# Ingress status
+kubectl get ingress -n honeydue
+
+# Check Traefik's own health (listing routers requires the dashboard or API)
+kubectl exec -n kube-system daemonset/traefik -- traefik healthcheck
+
+# Re-apply config
+kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
+kubectl delete job -n kube-system helm-install-traefik  # triggers reinstall
+
+# Restart all Traefik pods
+kubectl rollout restart daemonset/traefik -n kube-system
+```
+
+## References
+
+- [Traefik v3 docs][traefik]
+- [Traefik Swarm provider][traefik-swarm]
+- [K3s Traefik customization][k3s-traefik]
+- [HelmChartConfig docs][k3s-helm]
+- [Cloudflare IP ranges][cf-ips]
+
+[traefik]: https://doc.traefik.io/traefik/v3.6/
+[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
+[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
+[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
+[cf-ips]: https://www.cloudflare.com/ips/