Migrate prod deploy from Swarm to K3s; add full deployment book
Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
temporarily for reference
Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
callback (was causing 'unlock of unlocked mutex' fatal after
Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
+ allowlist fonts.googleapis.com so the marketing landing page CSS
actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
--platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
images runnable on x86_64 Hetzner nodes; fix array expansion under
set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
top-level aliases (the '\${X_SECRET}' form never actually resolved);
dozzle ports: long-form host_ip is rejected by Swarm, switched to
short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
(Next.js serves at root; /admin/ returned 404 and killed pods);
startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
and admin/src/app/api/*, hiding legitimate files)
New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log
Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
- Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
- Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
- Part III Security, Traefik ingress (Ch 5-6)
- Part IV Services, DB, storage, secrets, registry (Ch 7-11)
- Part V Data flow, deploy process, observability, failures, runbook
(Ch 12, 14-17)
- Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
- Appendices: glossary, kubectl cheat sheet, file locations,
consolidated citations
- README.md: Production Deployment section replaced with pointer to
the book; Go version bumped to 1.25
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,419 @@
# 06 — Traefik Ingress

## Summary

Traefik is the reverse proxy that routes external HTTP requests to the
right application pod based on the `Host:` header. We run Traefik v3 as a
Kubernetes DaemonSet with `hostNetwork: true`, so each of the three nodes
has its own Traefik pod listening directly on the node's `:80`/`:443`.
Cloudflare round-robins DNS across the three node IPs, so any node can
serve any request. There is no external load balancer.

## Why Traefik

K3s bundles Traefik by default. The alternatives:

| Option | Pros | Cons |
|---|---|---|
| **Traefik v3 (bundled)** | Zero install, excellent k8s integration, middleware system, active development | Helm-driven config is indirect |
| NGINX Ingress | Most popular, battle-tested | Another thing to install, more config surface |
| HAProxy Ingress | Extremely performant | More hands-on, older docs |
| Caddy | Simple config, auto-HTTPS | `caddy-docker-proxy` / Ingress integration is less mature |
| Envoy / Istio | Most featureful | Massive overkill at our scale |

Traefik came "free" with K3s, does the job, and its
[Swarm provider][traefik-swarm] is what we would have used if we'd
fixed our Swarm architecture. Using it on k3s keeps the mental model
consistent.

## Deployment model

```mermaid
flowchart TB
    subgraph CF[Cloudflare edge]
        DNS[DNS A records:<br/>api.myhoneydue.com → 3 node IPs<br/>admin.myhoneydue.com → 3 node IPs]
    end

    subgraph N1[hetzner1]
        T1[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel1[Linux kernel<br/>net.ipv4.ip_unprivileged_port_start=0]
    end
    subgraph N2[hetzner2]
        T2[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel2[Linux kernel]
    end
    subgraph N3[hetzner3]
        T3[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel3[Linux kernel]
    end

    subgraph Cluster[k3s cluster services]
        APISvc[api Service :8000]
        AdminSvc[admin Service :3000]
    end

    DNS -. HTTP :80 .-> T1 & T2 & T3
    T1 & T2 & T3 -- reverse_proxy --> APISvc & AdminSvc
```

### ASCII fallback

```
                  Cloudflare DNS
               ┌───────────────────┐
               │ api  → 3 IPs      │
               │ admin→ 3 IPs      │
               └─────────┬─────────┘
                         │ HTTP :80
     ┌───────────────────┼───────────────────┐
     ▼                   ▼                   ▼
┌──────────┐        ┌──────────┐        ┌──────────┐
│ hetzner1 │        │ hetzner2 │        │ hetzner3 │
│ Traefik  │        │ Traefik  │        │ Traefik  │
│ :80/443  │        │ :80/443  │        │ :80/443  │
│(hostNet) │        │(hostNet) │        │(hostNet) │
└────┬─────┘        └────┬─────┘        └────┬─────┘
     │                   │                   │
     └── ClusterIP ──────┼── ClusterIP ──────┘
                         ▼
            ┌────────────────────────┐
            │ api Service   :8000    │
            │ admin Service :3000    │
            └────────────────────────┘
```

## Why DaemonSet + hostNetwork

**What we're trying to achieve**: Any public-facing node should answer
:80/:443. Cloudflare round-robins DNS; whichever node it picks, that
node must serve.

**The default k3s Traefik deployment** is a single-replica Deployment
exposed via a LoadBalancer Service. That requires either:
- Hetzner Load Balancer (+ $8.49/mo, another thing to manage), **or**
- K3s' built-in `servicelb` (klipper-lb) which binds node ports
  dynamically to proxy to the Service

Neither was quite what we wanted. With three replicas of the stock Traefik
behind klipper-lb, each Traefik pod is reachable but there's an extra hop
through klipper's proxy daemon.

**DaemonSet + hostNetwork** is cleaner: each Traefik pod *is* the host's
:80/:443. No proxy daemon, no LB Service, no VIP. Cloudflare DNS →
node IP → kernel → Traefik, one hop.

### Trade-offs of hostNetwork

**Pro:**
- One fewer layer of indirection; lower latency
- No Service in front of Traefik; no kube-proxy hop on the way into Traefik
- Standard Cloudflare round-robin DNS is the failover mechanism

**Con:**
- Traefik is in the host netns; it sees the node's interfaces, not
  the cluster overlay
- Traefik can still resolve cluster Service names like `api`, but only
  when the pod's `dnsPolicy` is `ClusterFirstWithHostNet`; with the
  Kubernetes default, a `hostNetwork` pod falls back to the node's resolver
- Port conflicts possible if anything else wants :80/:443 on the node
  (nothing else does in our setup)
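
Because a host-network Traefik competes for the node's real ports, it can help to confirm what is actually listening before and after a rollout. The following is a minimal sketch, not something from the repository: `has_listener` is a hypothetical helper that scans `ss -lnt` output (fed on stdin) for a listening socket on a given port.

```bash
# has_listener PORT: succeed if stdin (output of `ss -lnt`) shows a
# LISTEN socket whose local address ends in :PORT.
# Hypothetical helper for illustration; not part of the repo.
has_listener() {
  local port="$1"
  awk -v port="$port" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example, run on a node:
#   ss -lnt | has_listener 80  && echo ":80 is bound"
#   ss -lnt | has_listener 443 && echo ":443 is bound"
```

The match is anchored on `:PORT` at the end of the local-address column, so a listener on `:8080` is not mistaken for one on `:80`.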

### Trade-offs of DaemonSet

**Pro:**
- One Traefik per node; matches our Cloudflare 3-IP round-robin
  exactly
- If any node goes down, Cloudflare's origin health checks route around it

**Con:**
- Updates require `maxUnavailable > 0` (host ports conflict during
  surge) → brief moment where one node is down during rollout
- 3× the memory usage vs. a 1-replica Deployment (but Traefik is tiny:
  ~128 MB total across all three)

## Our Traefik configuration

We reconfigure the bundled K3s Traefik via a `HelmChartConfig`. K3s
uses the `helm-controller` to manage bundled addons; `HelmChartConfig`
lets us override values without disabling-and-replacing the chart.

Full config at
`deploy-k3s/manifests/traefik-helmchartconfig.yaml`. Key settings:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet        # was Deployment
    hostNetwork: true
    service:
      enabled: false         # no LoadBalancer Service
    ports:
      web:
        port: 80
        hostPort: 80
      websecure:
        port: 443
        hostPort: 443
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
        maxSurge: 0
    securityContext:
      capabilities:
        drop: [ALL]
        add: [NET_BIND_SERVICE]
      readOnlyRootFilesystem: true
      runAsGroup: 65532
      runAsNonRoot: true
      runAsUser: 65532
    additionalArguments:
      - "--entrypoints.web.forwardedHeaders.trustedIPs=<CF ranges>"
```

### Why each setting

- **`kind: DaemonSet`**: one Traefik per node. The default is a Deployment
  with 1 replica.
- **`hostNetwork: true`**: Traefik runs in the host's network namespace
  so it can bind the real :80/:443 on the node.
- **`service.enabled: false`**: no LoadBalancer Service is created.
  With `hostNetwork`, we don't need one.
- **`ports.*.hostPort`**: explicit host port binding that matches the
  container port. The DaemonSet already runs one Traefik per node, and
  `hostPort: 80` means the scheduler cannot place a second one on the
  same node.
- **`updateStrategy.maxUnavailable: 1, maxSurge: 0`**: we accept one
  node being down during a Traefik update (the old and new pod can't
  share a host port). The Traefik Helm chart rejects this combination
  when `maxSurge > 0`; this was our second config iteration.
- **Security context**: non-root (UID 65532), read-only root filesystem,
  only the `NET_BIND_SERVICE` capability. See Chapter 5.
- **`forwardedHeaders.trustedIPs`**: Cloudflare's IP ranges. Traefik
  trusts `X-Forwarded-Proto` et al. only from these ranges, so a
  client that bypasses Cloudflare can't spoof the proto header.

### Forwarded-headers trustedIPs

The full list of trusted CF ranges is in our `additionalArguments`. It's
the union of CF's published IPv4 and IPv6 ranges. When Cloudflare passes
a request to origin, it adds `X-Forwarded-For` and `X-Forwarded-Proto`
headers; Traefik only honors these if the request came from one of these
IPs. The same headers from any other client are ignored.

If CF publishes new IP ranges (rare but possible), the
`trustedIPs` list needs updating. It's a raw string in our
HelmChartConfig, so we'd need to edit it, apply it, and re-run the Helm
install job (see the operator cheat sheet).
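
Since the list is one long comma-separated string, regenerating it by hand is error-prone. As a hedged sketch (this is not how the repository builds the value), the helper below joins CIDR ranges read from stdin into the argument form used in `additionalArguments`. `build_trusted_ips_arg` is a hypothetical name; the commented example assumes network access to Cloudflare's plain-text range lists.

```bash
# build_trusted_ips_arg ENTRYPOINT: read CIDR ranges from stdin (one per
# line, blank lines skipped) and print the Traefik CLI argument.
# Hypothetical helper for illustration; not part of the repo.
build_trusted_ips_arg() {
  local entrypoint="$1" ranges
  ranges="$(awk 'NF { print $1 }' | paste -sd, -)"
  printf -- '--entrypoints.%s.forwardedHeaders.trustedIPs=%s\n' \
    "$entrypoint" "$ranges"
}

# Example (needs network access; the extra echo guards against a
# missing trailing newline between the two lists):
#   { curl -fsS https://www.cloudflare.com/ips-v4; echo
#     curl -fsS https://www.cloudflare.com/ips-v6; } | build_trusted_ips_arg web
```

The output can be compared with the string currently in `traefik-helmchartconfig.yaml` to see whether the published ranges have drifted.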

## Traefik v3 vs v2

K3s ships Traefik v3 (currently `3.6.10`). The v2 → v3 migration
changed a few things:

- `swarmMode` removed (replaced by a `swarm` provider, but we don't
  use Swarm anyway)
- Encoded-character handling changed (v3 warns about RFC 3986 handling;
  we ignore the warning)
- Middleware CRD group is `traefik.io/v1alpha1` (was
  `traefik.containo.us/v1alpha1`)

Our deployment handles all of this automatically via the bundled
chart.

## Ingress resources

We define two standard k8s `Ingress` resources in
`deploy-k3s/manifests/ingress/ingress-simple.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-api
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: api.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
    - host: myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-admin
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: admin.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: admin, port: {number: 3000}}
```

Traefik watches for Ingress resources with `ingressClassName: traefik`
and programs its router table accordingly. Changes are applied within
seconds; no restart is needed.

### What pathType: Prefix means

Every request path starting with `/` matches, which is every request. The
alternatives are `Exact` (matches only the literal path) and
`ImplementationSpecific` (left to the controller). `Prefix` is the usual
choice and matches how users think about URL routing.
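
Kubernetes evaluates `Prefix` element by element on the `/`-separated path, not as a raw string prefix, which matters once a rule is more specific than `/`. The function below is an illustrative sketch of that rule; `prefix_matches` is a hypothetical helper and is not how Traefik implements matching.

```bash
# prefix_matches RULE PATH: succeed if PATH matches RULE under Ingress
# pathType: Prefix semantics (whole path elements, trailing slash on the
# rule ignored). Hypothetical helper for illustration only.
prefix_matches() {
  local rule="${1%/}" path="$2"
  [ -z "$rule" ] && return 0            # rule "/" matches every path
  [ "$path" = "$rule" ] || [[ "$path" == "$rule"/* ]]
}

# Examples:
#   prefix_matches /     /anything   # matches
#   prefix_matches /foo  /foo/bar    # matches
#   prefix_matches /foo  /foobar     # does not match: "foobar" is a different element
```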

## How requests flow

1. **Cloudflare DNS** resolves `api.myhoneydue.com` to one of three IPs
   (round-robin). Say it picks `178.105.32.198` (hetzner2).
2. **Cloudflare edge** establishes TCP to `178.105.32.198:80` (plain HTTP,
   SSL=Flexible). The original HTTPS connection is terminated at CF.
3. **UFW on hetzner2** accepts the SYN (80/tcp open from anywhere).
4. **Linux kernel** sees a listener on 0.0.0.0:80 (the Traefik pod) and
   hands off the SYN.
5. **Traefik accepts** the connection and reads the HTTP request.
6. **Traefik matches** the `Host:` header against its router table.
   `Host: api.myhoneydue.com` → `honeydue-api` Ingress → `api` Service.
7. **Traefik dials** `10.43.167.83:8000` (api Service ClusterIP). Traefik
   learns this address from the Kubernetes API; the connection then
   passes through kube-proxy (IPVS).
8. **kube-proxy IPVS** rewrites the destination to a live api pod endpoint,
   say `10.42.2.6:8000` (api pod on hetzner3).
9. **Flannel VXLAN** encapsulates the packet and sends it to hetzner3
   (UDP :8472 between node IPs).
10. **hetzner3's kernel** decapsulates and delivers to the api pod.
11. **api pod** processes the request and returns a response.
12. **Response flows back** along the reverse path.

Cloudflare caches eligible 200 responses at the edge. The default TTL
varies, and HTML/JSON is usually not cached at all unless we opt in (for
example with `Cache-Control` headers). A repeat request for a cached URL
might therefore never reach the origin.
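
Steps 3 to 6 can be exercised per node without going through Cloudflare, by sending a plain-HTTP request straight to a node IP with the right `Host:` header. The sketch below is a hypothetical helper (`probe_nodes`, along with its `CURL` override for dry runs, is not part of the repository); only `178.105.32.198` comes from the walkthrough, and the other node IPs would need to be supplied.

```bash
# probe_nodes HOST IP...: send one plain-HTTP request per node IP with the
# given Host header and print "IP<TAB>HOST<TAB>STATUS".
# Hypothetical helper; set CURL to substitute another command for a dry run.
probe_nodes() {
  local host="$1" ip code
  shift
  for ip in "$@"; do
    # -o /dev/null drops the body; -w prints only the status code.
    # curl reports 000 when it cannot connect at all.
    code="$("${CURL:-curl}" -sS -m 5 -o /dev/null -w '%{http_code}' \
      -H "Host: ${host}" "http://${ip}:80/" 2>/dev/null || true)"
    printf '%s\t%s\t%s\n' "$ip" "$host" "$code"
  done
}

# Example: 178.105.32.198 is hetzner2 from the walkthrough; add the other
# two node IPs as further arguments.
#   probe_nodes api.myhoneydue.com 178.105.32.198
```

A node that answers for one hostname but returns 404 for another points at the Ingress `host` rules rather than at the network path.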

## Middleware (mostly unused)

Traefik supports middleware: small functions that run before or after the
request is proxied. The `deploy-k3s/manifests/ingress/middleware.yaml`
scaffold defines:

- **`rate-limit`**: 100 req/min average, 200 burst
- **`security-headers`**: HSTS, X-Frame-Options, CSP, etc.
- **`cloudflare-only`**: IP allowlist restricting origin to CF ranges
- **`admin-auth`**: HTTP basic auth for admin panel

**None of these are currently attached to our Ingresses.** To enable them,
add the `traefik.ingress.kubernetes.io/router.middlewares` annotation to
the Ingress:

```yaml
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
```

We left them off to minimize surface area for the first week of the new
cluster. Enabling them is a TODO tracked in Chapter 20.
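
Each entry in that annotation has the form `<namespace>-<name>@kubernetescrd`, joined by commas. A small sketch of building the value, assuming the Middleware objects live in the `honeydue` namespace as the annotation above implies; `middleware_refs` is a hypothetical helper, not part of the repository.

```bash
# middleware_refs NAMESPACE NAME...: print the comma-separated value for the
# router.middlewares annotation, one <namespace>-<name>@kubernetescrd
# reference per Middleware. Hypothetical helper; not part of the repo.
middleware_refs() {
  local ns="$1" name refs=""
  shift
  for name in "$@"; do
    refs="${refs:+${refs},}${ns}-${name}@kubernetescrd"
  done
  printf '%s\n' "$refs"
}

# Example:
#   kubectl annotate ingress honeydue-api -n honeydue --overwrite \
#     "traefik.ingress.kubernetes.io/router.middlewares=$(middleware_refs honeydue security-headers rate-limit)"
```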

## Traefik dashboard

Disabled. The Traefik dashboard (`/dashboard/` and `/api/`) exposes
runtime state and can leak information. The bundled k3s
Traefik disables it by default, and we haven't re-enabled it.

If needed for debugging:

```bash
# Port-forward to a Traefik pod
kubectl port-forward -n kube-system daemonset/traefik 9000:9000
# (the chart exposes the dashboard on :9000 when enabled)
# Then visit http://localhost:9000/dashboard/
```

This requires kubectl access and isn't exposed publicly.

## Version pinning

We take whatever Traefik version is bundled with K3s (currently 3.6.10).
Each K3s release pins a specific bundled chart version, listed in its
release notes, so the Traefik version can change when we upgrade K3s. If
that ever breaks something, we can pin a specific version via the
HelmChartConfig's `version` field:

```yaml
spec:
  version: 39.0.501+up39.0.5   # specific chart version
```
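
Before and after a K3s upgrade it is useful to record which Traefik version is actually running. A minimal sketch: `image_tag` is a hypothetical helper that extracts the tag from an image reference, and the commented example assumes the DaemonSet is named `traefik` in `kube-system`, as elsewhere in this chapter.

```bash
# image_tag IMAGE: print the tag portion of a container image reference
# (the text after the last colon). Assumes the reference has a tag and no
# @sha256 digest. Hypothetical helper; not part of the repo.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Example (reads the image from the live DaemonSet):
#   image_tag "$(kubectl get daemonset traefik -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')"
```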

## Limitations we accept

- **No sticky sessions.** Every request to `api.myhoneydue.com` can go
  to a different pod. Our Go API is stateless, so this is fine.
- **No canary deployments** (yet). Traefik supports weighted routing
  via its CRDs (`TraefikService`) but we don't use them. TODO if/when
  we do gradual rollouts.
- **No mTLS.** Traefik supports mutual TLS client auth for sensitive
  endpoints. We don't use it.
- **Single ingress class.** Everything goes through the same Traefik.
  For multi-tenant setups we'd want separate ingress classes with
  separate policies.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| 404 from Traefik | Ingress doesn't match `Host:` | Check Ingress host field, DNS |
| 502 from Traefik | Backend refused or dropped the connection | Check pod logs; confirm the container listens on the Service's target port |
| 503 from Traefik | Backend Service has no ready endpoints, or circuit breaker open | `kubectl get endpoints -n honeydue`; check readiness probe |
| 504 from Traefik | Backend slow | Check pod CPU/memory, DB connections |
| Connection refused at 80 | Traefik pod not running or kernel not listening | `kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik`; `ssh deploy@node 'ss -lntp \| grep :80'` |
| Mixed content error in browser | `X-Forwarded-Proto` not honored by app | Check `trustedIPs` includes CF; check app reads the header |

## Operator cheat sheet

```bash
# Traefik pods per node
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide

# Traefik logs (all pods)
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=50 --prefix

# Ingress status
kubectl get ingress -n honeydue

# Traefik self-check via its ping endpoint (this does not list routers;
# listing routers requires the dashboard or API)
kubectl exec -n kube-system daemonset/traefik -- traefik healthcheck

# Re-apply config
kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
kubectl delete job -n kube-system helm-install-traefik  # triggers reinstall

# Restart all Traefik pods
kubectl rollout restart daemonset/traefik -n kube-system
```

## References

- [Traefik v3 docs][traefik]
- [Traefik Swarm provider][traefik-swarm]
- [K3s Traefik customization][k3s-traefik]
- [HelmChartConfig docs][k3s-helm]
- [Cloudflare IP ranges][cf-ips]

[traefik]: https://doc.traefik.io/traefik/v3.6/
[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
[cf-ips]: https://www.cloudflare.com/ips/