# 06 — Traefik Ingress

## Summary

Traefik is the reverse proxy that routes external HTTP requests to the right application pod based on the `Host:` header. We run Traefik v3 as a Kubernetes DaemonSet with `hostNetwork: true` — each of the three nodes has its own Traefik pod listening directly on the node's `:80`/`:443`. Cloudflare round-robins DNS across the three node IPs, so any node can serve any request. No external load balancer.

## Why Traefik

K3s bundles Traefik by default. The alternatives:

| Option | Pros | Cons |
|---|---|---|
| **Traefik v3 (bundled)** | Zero install, excellent k8s integration, middleware system, active development | Helm-driven config is indirect |
| NGINX Ingress | Most popular, battle-tested | Another thing to install, more config surface |
| HAProxy Ingress | Extremely performant | More hands-on, older docs |
| Caddy | Simple config, auto-HTTPS | `caddy-docker-proxy` / Ingress integration is less mature |
| Envoy / Istio | Most featureful | Massive overkill at our scale |

Traefik came "free" with K3s, does the job, and its [Swarm provider][traefik-swarm] is what we would have used if we'd fixed our Swarm architecture. Using it on k3s keeps the mental model consistent.

## Deployment model

```mermaid
flowchart TB
    subgraph CF[Cloudflare edge]
        DNS[DNS A records:<br/>api.myhoneydue.com → 3 node IPs<br/>admin.myhoneydue.com → 3 node IPs]
    end

    subgraph N1[hetzner1]
        T1[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel1[Linux kernel<br/>net.ipv4.ip_unprivileged_port_start=0]
    end

    subgraph N2[hetzner2]
        T2[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel2[Linux kernel]
    end

    subgraph N3[hetzner3]
        T3[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel3[Linux kernel]
    end

    subgraph Cluster[k3s cluster services]
        APISvc[api Service :8000]
        AdminSvc[admin Service :3000]
    end

    DNS -. HTTP :80 .-> T1 & T2 & T3
    T1 & T2 & T3 -- reverse_proxy --> APISvc & AdminSvc
```

### ASCII fallback

```
                  Cloudflare DNS
               ┌───────────────────┐
               │ api  → 3 IPs      │
               │ admin→ 3 IPs      │
               └─────────┬─────────┘
                         │ HTTP :80
     ┌───────────────────┼───────────────────┐
     ▼                   ▼                   ▼
┌──────────┐        ┌──────────┐        ┌──────────┐
│ hetzner1 │        │ hetzner2 │        │ hetzner3 │
│ Traefik  │        │ Traefik  │        │ Traefik  │
│ :80/443  │        │ :80/443  │        │ :80/443  │
│(hostNet) │        │(hostNet) │        │(hostNet) │
└────┬─────┘        └────┬─────┘        └────┬─────┘
     │                   │                   │
     └── ClusterIP ──────┼── ClusterIP ──────┘
                         ▼
            ┌────────────────────────┐
            │ api Service :8000      │
            │ admin Service :3000    │
            └────────────────────────┘
```

## Why DaemonSet + hostNetwork

**What we're trying to achieve**: Any public-facing node should answer :80/:443. Cloudflare round-robins DNS; whichever node it picks, that node must serve.

**The default k3s Traefik deployment** is a single-replica Deployment exposed via a LoadBalancer Service. That requires either:

- Hetzner Load Balancer (+ $8.49/mo, another thing to manage), **or**
- K3s' built-in `servicelb` (klipper-lb) which binds node ports dynamically to proxy to the Service

Neither was quite what we wanted. With three replicas of the stock Traefik behind klipper-lb, each Traefik pod is reachable but there's an extra hop through klipper's proxy daemon.

**DaemonSet + hostNetwork** is cleaner: each Traefik pod *is* the host's :80/:443. No proxy daemon, no LB Service, no VIP. Cloudflare DNS → node IP → kernel → Traefik, one hop.

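
Because every node answers on its own `:80`, the one-hop path can be checked node by node without going through Cloudflare. The following is a minimal sketch, assuming plain HTTP to the node IPs is reachable from where it runs; `probe_node` and `probe_all_nodes` are hypothetical helper names, and the IPs in the usage comment are placeholders rather than our real node addresses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ask one node's Traefik for a hostname, bypassing Cloudflare.
# Prints the HTTP status code, or 000 if no HTTP response was received.
probe_node() {
  local node_ip="$1" host="$2"
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -H "Host: ${host}" "http://${node_ip}/"
}

# Probe every node IP passed after the hostname; prints "ip status" per line.
probe_all_nodes() {
  local host="$1"; shift
  local ip
  for ip in "$@"; do
    printf '%s %s\n' "$ip" "$(probe_node "$ip" "$host")"
  done
}

# Usage (placeholder IPs, substitute the real node addresses):
#   probe_all_nodes api.myhoneydue.com 203.0.113.10 203.0.113.11 203.0.113.12
```

A `404` here usually means Traefik answered but no router matched the `Host:` header, while `000` suggests nothing accepted the connection on that node.
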
### Trade-offs of hostNetwork

**Pro:**

- One fewer layer of indirection; lower latency
- No Service needed; no kube-proxy in the ingress path
- Standard Cloudflare round-robin DNS is the failover mechanism

**Con:**

- Traefik is in the host netns; it sees the node's interfaces, not the cluster overlay
- Cluster DNS is not automatic: with `hostNetwork: true`, the default `ClusterFirst` DNS policy falls back to the node's resolver, so resolving Service names like `api` would need `dnsPolicy: ClusterFirstWithHostNet`. Routing does not depend on this, because Traefik discovers backends through the Kubernetes API rather than by name
- Port conflicts possible if anything else wants :80/:443 on the node (nothing else does in our setup)

### Trade-offs of DaemonSet

**Pro:**

- One Traefik per node; matches our Cloudflare 3-IP round-robin exactly
- Any node down = Cloudflare's origin health checks route around it

**Con:**

- Updates require `maxUnavailable > 0` (host ports conflict during surge) → brief moment where one node is down during rollout
- 3× the memory usage vs. 1-replica Deployment (but Traefik is tiny — ~128 MB total across all three)

## Our Traefik configuration

We reconfigure the bundled K3s Traefik via a `HelmChartConfig`. K3s uses the `helm-controller` to manage bundled addons; `HelmChartConfig` lets us override values without disabling-and-replacing the chart.

Full config at `deploy-k3s/manifests/traefik-helmchartconfig.yaml`. Key settings:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet          # was Deployment
    hostNetwork: true
    service:
      enabled: false           # no LoadBalancer Service
    ports:
      web:
        port: 80
        hostPort: 80
      websecure:
        port: 443
        hostPort: 443
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
        maxSurge: 0
    securityContext:
      capabilities:
        drop: [ALL]
        add: [NET_BIND_SERVICE]
      readOnlyRootFilesystem: true
      runAsGroup: 65532
      runAsNonRoot: true
      runAsUser: 65532
    additionalArguments:
      - "--entrypoints.web.forwardedHeaders.trustedIPs="
```

### Why each setting

- **`kind: DaemonSet`** — one Traefik per node. Default is a Deployment with 1 replica.
- **`hostNetwork: true`** — Traefik runs in the host's network namespace so it can bind real :80/:443 on the node.
- **`service.enabled: false`** — no LoadBalancer Service is created. With `hostNetwork`, we don't need one.
- **`ports.*.hostPort`** — explicit host port binding that matches the container port. The DaemonSet already runs one Traefik per node, and the `hostPort` claim keeps the scheduler from placing a second pod that wants the same port on that node.
- **`updateStrategy.maxUnavailable: 1, maxSurge: 0`** — we accept one node being down during a Traefik update (host port can't be shared). The Traefik Helm chart rejects this combination when `maxSurge > 0`; this was our second config iteration.
- **Security context** — non-root (UID 65532), read-only root filesystem, only `NET_BIND_SERVICE` capability. See Chapter 5.
- **`forwardedHeaders.trustedIPs`** — Cloudflare's IP ranges. Traefik trusts `X-Forwarded-Proto` et al. only from these ranges, so a client that bypasses Cloudflare can't spoof the proto header.

### Forwarded-headers trustedIPs

The full list of trusted CF ranges is in our `additionalArguments`. It's the union of CF's published IPv4 and IPv6 ranges. When Cloudflare passes a request to origin, it adds `X-Forwarded-For` and `X-Forwarded-Proto` headers; Traefik only honors these if the request came from one of these IPs. Every other client's headers are ignored.

If CF publishes new IP ranges (rare but possible), the `trustedIPs` list needs updating. It's a raw string in our HelmChartConfig, so we'd need to edit, apply, and bump the helm job.

## Traefik v3 vs v2

K3s ships Traefik v3 (currently `3.6.10`). The v2 → v3 migration changed a few things:

- `swarmMode` removed (replaced by a `swarm` provider, but we don't use Swarm anyway)
- Encoded-character handling changed (v3 warns about RFC 3986 handling; we ignore the warning)
- Middleware CRD API version is `traefik.io/v1alpha1` (was `traefik.containo.us/v1alpha1`)

Our deployment handles all of this automatically via the bundled chart.

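
For the `trustedIPs` maintenance described under "Forwarded-headers trustedIPs", the raw string can be regenerated rather than edited by hand. This is a minimal sketch, not our actual tooling: `build_trusted_ips_arg` is a hypothetical helper that joins CIDR ranges read from stdin into the argument format shown in the HelmChartConfig. The usage comment assumes Cloudflare's plain-text lists at `https://www.cloudflare.com/ips-v4` and `https://www.cloudflare.com/ips-v6`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read CIDR ranges (one per line) on stdin and print the
# matching Traefik argument for the given entrypoint.
build_trusted_ips_arg() {
  local entrypoint="$1" joined
  # Drop blank lines, then join the remaining ranges with commas.
  joined="$(grep -v '^[[:space:]]*$' | paste -sd, -)"
  printf -- '--entrypoints.%s.forwardedHeaders.trustedIPs=%s\n' "$entrypoint" "$joined"
}

# Usage (needs network access; the extra "echo" guards against a missing
# trailing newline between the two lists):
#   { curl -fsS https://www.cloudflare.com/ips-v4; echo; \
#     curl -fsS https://www.cloudflare.com/ips-v6; echo; } | build_trusted_ips_arg web
```

The printed line can then be pasted into `additionalArguments` before re-applying the HelmChartConfig.
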
## Ingress resources

We define two standard k8s `Ingress` resources in `deploy-k3s/manifests/ingress/ingress-simple.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-api
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: api.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
    - host: myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-admin
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: admin.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: admin, port: {number: 3000}}
```

Traefik watches for Ingress resources with `ingressClassName: traefik` and programs its router table accordingly. Changes are applied within seconds — no restart needed.

### What pathType: Prefix means

Every request whose path starts with `/` matches (which is everything). The alternatives are `Exact` (matches only the literal path) and `ImplementationSpecific` (left to the controller). `pathType` is a required field in `networking.k8s.io/v1`; `Prefix` is the usual choice and matches how users think about URL routing.

## How requests flow

1. **Cloudflare** receives the client's HTTPS request for `api.myhoneydue.com` at its edge and picks one of the three origin IPs (round-robin). Say it picks `178.105.32.198` (hetzner2).
2. **Cloudflare edge** establishes TCP to `178.105.32.198:80` (plain HTTP, SSL=Flexible). Original HTTPS terminated at CF.
3. **UFW on hetzner2** accepts the SYN (80/tcp open from anywhere).
4. **Linux kernel** sees a listener on 0.0.0.0:80 (the Traefik pod). Hands off the SYN.
5. **Traefik accepts** the connection. Reads the HTTP request.
6. **Traefik matches** the `Host:` header against its router table. `Host: api.myhoneydue.com` → `honeydue-api` Ingress → `api` Service.
7. **Traefik dials** `10.43.167.83:8000` (api Service ClusterIP). Dialing an IP involves no CoreDNS lookup; the packet goes straight to kube-proxy (IPVS). This step assumes Traefik is sending to the ClusterIP. By default Traefik's Kubernetes provider load-balances across the pod endpoint IPs itself, in which case steps 7 and 8 collapse into a direct dial to a pod IP.
8. **kube-proxy IPVS** rewrites the destination to a live api pod endpoint — say `10.42.2.6:8000` (api pod on hetzner3).
9. **Flannel VXLAN** encapsulates the packet and sends to hetzner3 (UDP :8472 between node IPs).
10. **hetzner3's kernel** decapsulates, delivers to the api pod.
11. **api pod** processes, returns response.
12. **Response flows back** the reverse path.

Cloudflare caches only responses it considers cacheable. By default that means static file types; HTML and JSON are not cached unless we opt in with a cache rule and `Cache-Control` headers. So a repeat request for a cached static asset might not reach the origin at all, while API calls normally do.

## Middleware (mostly unused)

Traefik supports middleware — small functions run before/after the proxy. The `deploy-k3s/manifests/ingress/middleware.yaml` scaffold defines:

- **`rate-limit`** — 100 req/min average, 200 burst
- **`security-headers`** — HSTS, X-Frame-Options, CSP, etc.
- **`cloudflare-only`** — IP allowlist restricting origin to CF ranges
- **`admin-auth`** — HTTP basic auth for admin panel

**None of these are currently attached to our Ingresses.** To enable, add the `traefik.ingress.kubernetes.io/router.middlewares` annotation to the Ingress:

```yaml
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
```

We left them off to minimize surface area for the first week of the new cluster. Enabling is TODO in Chapter 20.

## Traefik dashboard

Disabled. The Traefik dashboard (`/dashboard/` and `/api/`) exposes runtime state and can leak information. The bundled k3s Traefik disables it by default, and we haven't re-enabled it.

If needed for debugging:

```bash
# Port-forward to a Traefik pod
kubectl port-forward -n kube-system daemonset/traefik 9000:9000
# (the chart exposes the dashboard on :9000 when enabled)
# Then visit http://localhost:9000/dashboard/
```

This requires kubectl access and isn't exposed publicly.

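
With the port-forward above in place, the same port can also answer questions from the command line. A minimal sketch, assuming the Traefik API has been enabled and is reachable on the forwarded port; `list_router_names` is a hypothetical helper, and `/api/http/routers` is the Traefik API endpoint that returns the HTTP routers as JSON.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of the HTTP routers Traefik has loaded.
# Assumes the API is enabled and reachable at the given base URL.
list_router_names() {
  local base_url="${1:-http://localhost:9000}"
  # Crude extraction that relies on the API returning compact JSON;
  # prefer a real JSON parser such as jq where it is installed.
  curl -fsS "${base_url}/api/http/routers" \
    | grep -o '"name":"[^"]*"' \
    | cut -d'"' -f4
}

# Usage, in a second terminal while the port-forward is running:
#   list_router_names
```

If an Ingress is missing from the output, Traefik has not picked it up, which usually points at the `ingressClassName` or the namespace.
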
## Version pinning

We take whatever Traefik version is bundled with K3s (currently 3.6.10). The bundled chart is pinned to a specific version in K3s' release notes; when we upgrade K3s the Traefik version can change.

If that ever breaks something, note that `HelmChartConfig` only overrides chart values and has no `version` field of its own. Pinning the chart means disabling the bundled addon and managing our own `HelmChart` resource, whose spec does accept a version:

```yaml
# In a HelmChart resource (not the HelmChartConfig)
spec:
  version: 39.0.501+up39.0.5 # specific chart version
```

## Limitations we accept

- **No sticky sessions.** Every request to `api.myhoneydue.com` can go to a different pod. Our Go API is stateless, so this is fine.
- **No canary deployments** (yet). Traefik supports weighted routing via its CRDs (`TraefikService`) but we don't use them. TODO if/when we do gradual rollouts.
- **No mTLS.** Traefik supports mutual TLS client auth for sensitive endpoints. We don't use it.
- **Single ingress class.** Everything goes through the same Traefik. For multi-tenant setups we'd want separate ingress classes with separate policies.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| 404 from Traefik | Ingress doesn't match `Host:` | Check Ingress host field, DNS |
| 502 from Traefik | Backend refused or reset the connection (pod crashing, wrong port) | Check pod logs and the Service port mapping |
| 503 from Traefik | Backend Service has no ready endpoints, or a circuit breaker is open | `kubectl get endpoints -n honeydue`; check readiness probe |
| 504 from Traefik | Backend slow | Check pod CPU/memory, DB connections |
| Connection refused at 80 | Traefik pod not running or kernel not listening | `kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik`; `ssh deploy@node 'ss -lntp \| grep :80'` |
| Mixed content error in browser | `X-Forwarded-Proto` not honored by app | Check `trustedIPs` includes CF; check app reads the header |

## Operator cheat sheet

```bash
# Traefik pods per node
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide

# Traefik logs (all pods)
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=50 --prefix

# Ingress status
kubectl get ingress -n honeydue

# Health check via the ping endpoint (listing routers needs the dashboard or API)
kubectl exec -n kube-system daemonset/traefik -- traefik healthcheck

# Re-apply config
kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
kubectl delete job -n kube-system helm-install-traefik  # triggers reinstall

# Restart all Traefik pods
kubectl rollout restart daemonset/traefik -n kube-system
```

## References

- [Traefik v3 docs][traefik]
- [Traefik Swarm provider][traefik-swarm]
- [K3s Traefik customization][k3s-traefik]
- [HelmChartConfig docs][k3s-helm]
- [Cloudflare IP ranges][cf-ips]

[traefik]: https://doc.traefik.io/traefik/v3.6/
[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
[cf-ips]: https://www.cloudflare.com/ips/