# 06 — Traefik Ingress

## Summary

Traefik is the reverse proxy that routes external HTTP requests to the
right application pod based on the `Host:` header. We run Traefik v3 as a
Kubernetes DaemonSet with `hostNetwork: true` — each of the three nodes
has its own Traefik pod listening directly on the node's `:80`/`:443`.
Cloudflare round-robins DNS across the three node IPs, so any node can
serve any request. No external load balancer.

## Why Traefik

K3s bundles Traefik by default. The alternatives:

| Option | Pros | Cons |
|---|---|---|
| **Traefik v3 (bundled)** | Zero install, excellent k8s integration, middleware system, active development | Helm-driven config is indirect |
| NGINX Ingress | Most popular, battle-tested | Another thing to install, more config surface |
| HAProxy Ingress | Extremely performant | More hands-on, older docs |
| Caddy | Simple config, auto-HTTPS | `caddy-docker-proxy` / Ingress integration is less mature |
| Envoy / Istio | Most featureful | Massive overkill at our scale |

Traefik came "free" with K3s, does the job, and its
[Swarm provider][traefik-swarm] is what we would have used if we'd
fixed our Swarm architecture. Using it on k3s keeps the mental model
consistent.

## Deployment model

```mermaid
flowchart TB
    subgraph CF[Cloudflare edge]
        DNS[DNS A records:<br/>api.myhoneydue.com → 3 node IPs<br/>admin.myhoneydue.com → 3 node IPs]
    end

    subgraph N1[hetzner1]
        T1[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel1[Linux kernel<br/>net.ipv4.ip_unprivileged_port_start=0]
    end
    subgraph N2[hetzner2]
        T2[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel2[Linux kernel]
    end
    subgraph N3[hetzner3]
        T3[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel3[Linux kernel]
    end

    subgraph Cluster[k3s cluster services]
        APISvc[api Service :8000]
        AdminSvc[admin Service :3000]
    end

    DNS -. HTTP :80 .-> T1 & T2 & T3
    T1 & T2 & T3 -- reverse_proxy --> APISvc & AdminSvc
```

### ASCII fallback

```
                  Cloudflare DNS
                  ┌───────────────────┐
                  │  api  → 3 IPs     │
                  │  admin→ 3 IPs     │
                  └─────────┬─────────┘
                            │ HTTP :80
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
  ┌──────────┐        ┌──────────┐        ┌──────────┐
  │ hetzner1 │        │ hetzner2 │        │ hetzner3 │
  │ Traefik  │        │ Traefik  │        │ Traefik  │
  │  :80/443 │        │  :80/443 │        │  :80/443 │
  │(hostNet) │        │(hostNet) │        │(hostNet) │
  └────┬─────┘        └────┬─────┘        └────┬─────┘
       │                   │                   │
       └── ClusterIP ──────┼── ClusterIP ──────┘
                           ▼
              ┌────────────────────────┐
              │ api Service   :8000    │
              │ admin Service :3000    │
              └────────────────────────┘
```

## Why DaemonSet + hostNetwork

**What we're trying to achieve**: Any public-facing node should answer
:80/:443. Cloudflare round-robins DNS; whichever node it picks, that
node must serve.

**The default k3s Traefik deployment** is a single-replica Deployment
exposed via a LoadBalancer Service. That requires either:
- Hetzner Load Balancer (+ $8.49/mo, another thing to manage), **or**
- K3s' built-in `servicelb` (klipper-lb) which binds node ports
  dynamically to proxy to the Service

Neither was quite what we wanted. With three replicas of the stock Traefik
behind klipper-lb, each Traefik pod is reachable but there's an extra hop
through klipper's proxy daemon.

**DaemonSet + hostNetwork** is cleaner: each Traefik pod *is* the host's
:80/:443. No proxy daemon, no LB Service, no VIP. Cloudflare DNS →
node IP → kernel → Traefik, one hop.

### Trade-offs of hostNetwork

**Pro:**
- One fewer layer of indirection; lower latency
- No Service needed; no kube-proxy in the ingress path
- Standard Cloudflare round-robin DNS is the failover mechanism

**Con:**
- Traefik is in the host netns; it sees the node's interfaces, not
  the cluster overlay
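- Cluster DNS is not automatic: with `hostNetwork: true` and the default
  `ClusterFirst` DNS policy, the pod falls back to the node's resolver
  unless `dnsPolicy: ClusterFirstWithHostNet` is set. Routing is
  unaffected, because Traefik discovers backends through the Kubernetes
  API rather than by resolving Service names like `api`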
- Port conflicts possible if anything else wants :80/:443 on the node
  (nothing else does in our setup)

### Trade-offs of DaemonSet

**Pro:**
- One Traefik per node; matches our Cloudflare 3-IP round-robin
  exactly
- Any node down = Cloudflare's origin health checks route around it

**Con:**
- Updates require `maxUnavailable > 0` (host ports conflict during
  surge) → brief moment where one node is down during rollout
- 3× the memory usage vs. 1-replica Deployment (but Traefik is tiny
  — ~128 MB total across all three)
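
To confirm on a node that Traefik really is the host's `:80`/`:443` listener, we can inspect the listening sockets in the host network namespace. The helper below is a minimal sketch: `has_listener` is a hypothetical name, and it assumes the usual `ss -lnt` column layout with the local address in the fourth column.

```bash
# has_listener PORT: reads `ss -lnt` output on stdin and succeeds if a
# LISTEN socket is bound to PORT on any address.
# (Hypothetical helper; assumes the local address is column 4.)
has_listener() {
  awk -v suffix=":$1" '
    $1 == "LISTEN" &&
      substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit !found }
  '
}

# Usage, from a workstation with SSH access to a node:
#   ssh deploy@node 'ss -lnt' | has_listener 443 && echo "443 is bound"
```

Comparing the address suffix rather than grepping for `:80` avoids false positives from ports such as `:8080`.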

## Our Traefik configuration

We reconfigure the bundled K3s Traefik via a `HelmChartConfig`. K3s
uses the `helm-controller` to manage bundled addons; `HelmChartConfig`
lets us override values without disabling-and-replacing the chart.

Full config at
`deploy-k3s/manifests/traefik-helmchartconfig.yaml`. Key settings:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet      # was Deployment
    hostNetwork: true
    service:
      enabled: false        # no LoadBalancer Service
    ports:
      web:
        port: 80
        hostPort: 80
      websecure:
        port: 443
        hostPort: 443
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
        maxSurge: 0
    securityContext:
      capabilities:
        drop: [ALL]
        add: [NET_BIND_SERVICE]
      readOnlyRootFilesystem: true
      runAsGroup: 65532
      runAsNonRoot: true
      runAsUser: 65532
    additionalArguments:
      - "--entrypoints.web.forwardedHeaders.trustedIPs=<CF ranges>"
```

### Why each setting

- **`kind: DaemonSet`** — one Traefik per node. Default is a Deployment
  with 1 replica.
- **`hostNetwork: true`** — Traefik runs in the host's network namespace
  so it can bind real :80/:443 on the node.
- **`service.enabled: false`** — no LoadBalancer Service is created.
  With `hostNetwork`, we don't need one.
- **`ports.*.hostPort`** — explicit host port binding. Matches the
  container port. A DaemonSet already runs one pod per node, and the
  `hostPort` claim keeps the scheduler from placing a second Traefik pod
  on a node where the port is already taken.
- **`updateStrategy.maxUnavailable: 1, maxSurge: 0`** — we accept one
  node being down during a Traefik update (host port can't be shared).
  The Traefik Helm chart rejects `maxSurge > 0` in this `hostNetwork`
  setup; the values above are the result of our second config iteration.
- **Security context** — non-root (UID 65532), read-only root filesystem,
  only `NET_BIND_SERVICE` capability. See Chapter 5.
- **`forwardedHeaders.trustedIPs`** — Cloudflare's IP ranges. Traefik
  trusts `X-Forwarded-Proto` et al. only from these ranges, so a
  bypassing client can't spoof the proto header.
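
One way to check that the values above actually reached the rendered DaemonSet is to read them back from the API. This is a sketch under assumptions: the function names are hypothetical, and it assumes the DaemonSet is named `traefik` in `kube-system`, as in the commands elsewhere in this chapter.

```bash
# traefik_rollout_settings: prints "<hostNetwork> <maxUnavailable> <maxSurge>"
# for the Traefik DaemonSet. With the config above we would expect "true 1 0".
traefik_rollout_settings() {
  kubectl get daemonset traefik -n kube-system -o \
    jsonpath='{.spec.template.spec.hostNetwork} {.spec.updateStrategy.rollingUpdate.maxUnavailable} {.spec.updateStrategy.rollingUpdate.maxSurge}{"\n"}'
}

# settings_match EXPECTED: succeeds if the live values equal EXPECTED.
settings_match() {
  [ "$(traefik_rollout_settings)" = "$1" ]
}

# Usage:
#   settings_match "true 1 0" || echo "Traefik DaemonSet differs from the HelmChartConfig"
```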

### Forwarded-headers trustedIPs

The full list of trusted CF ranges is in our `additionalArguments`. It's
the union of CF's published IPv4 and IPv6 ranges. When Cloudflare passes
a request to origin, it adds `X-Forwarded-For` and `X-Forwarded-Proto`
headers; Traefik only honors these if the request came from one of these
IPs. Every other client's headers are ignored.

If CF publishes new IP ranges (rare but possible), the
`trustedIPs` list needs updating. It's a raw string in our
HelmChartConfig — we'd need to edit, apply, and bump the helm job.
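
Rebuilding the raw string by hand is error-prone, so a small helper can generate it from Cloudflare's published plain-text lists. This is a sketch, not what our deploy scripts do today: `join_cidrs` and `trusted_ips_arg` are hypothetical names, and the result still has to be pasted into the HelmChartConfig manually.

```bash
# join_cidrs: reads one CIDR per line on stdin, drops blank lines, and
# prints them as a single comma-separated string.
join_cidrs() {
  grep -v '^[[:space:]]*$' | paste -sd, -
}

# trusted_ips_arg ENTRYPOINT CIDRS: prints the Traefik CLI flag for one entrypoint.
trusted_ips_arg() {
  printf -- '--entrypoints.%s.forwardedHeaders.trustedIPs=%s\n' "$1" "$2"
}

# Usage (the extra `echo` guards against a missing trailing newline):
#   cidrs="$( { curl -fsSL https://www.cloudflare.com/ips-v4; echo;
#               curl -fsSL https://www.cloudflare.com/ips-v6; } | join_cidrs )"
#   trusted_ips_arg web "$cidrs"
```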

## Traefik v3 vs v2

K3s ships Traefik v3 (currently `3.6.10`). The v2 → v3 migration
changed a few things:
- `swarmMode` removed (replaced by a `swarm` provider, but we don't
  use Swarm anyway)
- Encoded-character handling changed (v3 warns about RFC 3986 handling;
  we ignore the warning)
- Middleware CRD API version is `traefik.io/v1alpha1` (was
  `traefik.containo.us/v1alpha1`)

Our deployment handles all of this automatically via the bundled
chart.
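
The bundled chart does not rewrite our own manifests, though, so it can be worth scanning them for the old CRD group. A minimal sketch, assuming the manifests live under `deploy-k3s/manifests` as referenced in this chapter; `find_legacy_crd_refs` is a hypothetical helper name.

```bash
# find_legacy_crd_refs DIR: lists files under DIR that still reference the
# Traefik v2 CRD API group. Prints nothing if none are found.
find_legacy_crd_refs() {
  grep -rl 'traefik\.containo\.us' "$1" || true
}

# Usage:
#   find_legacy_crd_refs deploy-k3s/manifests
```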

## Ingress resources

We define two standard k8s `Ingress` resources in
`deploy-k3s/manifests/ingress/ingress-simple.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-api
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: api.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
    - host: myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-admin
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: admin.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: admin, port: {number: 3000}}
```

Traefik watches for Ingress resources with `ingressClassName: traefik`
and programs its router table accordingly. Changes are applied within
seconds — no restart needed.
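
To see the host-based routing in action, we can probe a hostname and look at the status code Traefik returns. The sketch below makes two assumptions: `probe_host` is a hypothetical helper, and the direct-to-origin form is run on a node against `127.0.0.1`, assuming loopback traffic is not blocked by the firewall.

```bash
# probe_host HOST [ORIGIN_IP]: prints the HTTP status returned for HOST.
# Without ORIGIN_IP the request goes through Cloudflare like a normal client.
# With ORIGIN_IP, --resolve pins the connection to that origin while keeping
# the Host header and SNI; -k skips verification because the origin presents
# a Cloudflare Origin CA certificate that curl does not trust.
probe_host() {
  local host="$1" origin="${2:-}"
  if [ -n "$origin" ]; then
    curl -sk -o /dev/null -w '%{http_code}\n' \
      --resolve "${host}:443:${origin}" "https://${host}/"
  else
    curl -s -o /dev/null -w '%{http_code}\n' "https://${host}/"
  fi
}

# Usage:
#   probe_host api.myhoneydue.com               # via Cloudflare
#   probe_host admin.myhoneydue.com 127.0.0.1   # on a node, straight to local Traefik
```

A `404` for a hostname that has no Ingress rule, next to a normal response for the three configured hosts, is a quick sign the router table matches the manifests.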

### What pathType: Prefix means

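Every request path starting with `/` matches, which is everything. The
alternatives are `Exact` (matches only the literal path) and
`ImplementationSpecific` (matching is left to the controller). `pathType`
is a required field in `networking.k8s.io/v1`, so there is no default;
`Prefix` is the usual choice and matches how users think about URL routing.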
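
`Prefix` matching works on whole path elements rather than raw string prefixes, which matters once a rule uses something other than `/`. The function below is an illustrative model of the Kubernetes `Prefix` semantics, not Traefik's actual implementation; `prefix_matches` is a hypothetical name.

```bash
# prefix_matches PREFIX PATH: succeeds if PATH matches PREFIX element by
# element, in the way the Ingress spec describes for pathType: Prefix.
prefix_matches() {
  local prefix="${1%/}" path="$2"
  # A prefix of "/" (empty once the trailing slash is trimmed) matches everything.
  [[ -z "$prefix" ]] && return 0
  [[ "$path" == "$prefix" || "$path" == "$prefix"/* ]]
}

# Usage:
#   prefix_matches /     /anything   # matches
#   prefix_matches /foo  /foo/bar    # matches
#   prefix_matches /foo  /foobar     # does not match: "foobar" is a different element
```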

## How requests flow

1. **Cloudflare DNS** resolves `api.myhoneydue.com` to a CF edge IP
   (client never sees the three origin IPs — CF proxies).
2. **Cloudflare edge** terminates TLS from the browser, then opens a
   fresh TCP to one of the origin IPs on `:443` (SSL=Full (strict)).
   Say it picks `178.105.32.198` (hetzner2).
3. **UFW on hetzner2** accepts the SYN — the source IP is in one of
   the 15 CF IPv4 CIDRs allowed on `:443`. (Any non-CF source IP is
   dropped at the kernel.)
4. **Linux kernel** sees a listener on `0.0.0.0:443` (the Traefik pod,
   hostNetwork). Hands off the SYN.
5. **Traefik accepts** the connection, completes the TLS handshake
   using the `cloudflare-origin-cert` secret (CF Origin CA — CF
   verifies this chain on its side). Reads the plaintext HTTP request.
6. **Traefik matches** the `Host:` header against its router table.
   `Host: api.myhoneydue.com` → `honeydue-api` Ingress → `api` Service.
   Middlewares would run at this point, but none are attached today
   (see the Middleware section).
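7. **Traefik dials** the backend for the `api` Service on `:8000`. It
   learns backend addresses from the Kubernetes API, so no DNS lookup
   (CoreDNS) happens on the request path. If native Service load
   balancing is enabled, it dials the ClusterIP (`10.43.167.83:8000`)
   and kube-proxy picks the pod; by default, Traefik's Kubernetes
   providers dial pod endpoint IPs directly, which skips step 8.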
8. **kube-proxy IPVS** rewrites the destination to a live api pod endpoint
   — say `10.42.2.6:8000` (api pod on hetzner3).
9. **Flannel VXLAN** encapsulates the packet and sends to hetzner3
   (UDP :8472 between node IPs).
10. **hetzner3's kernel** decapsulates, delivers to the api pod.
11. **api pod** processes, returns response.
12. **Response flows back** the reverse path.
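
The addresses in steps 7 and 8 can be looked up for any Service. A minimal sketch, assuming kubectl access to the cluster; `service_backends` is a hypothetical helper name.

```bash
# service_backends NAMESPACE SERVICE: prints the Service ClusterIP on the
# first line and the ready endpoint (pod) IPs on the second.
service_backends() {
  local ns="$1" svc="$2"
  kubectl get service "$svc" -n "$ns" -o jsonpath='{.spec.clusterIP}{"\n"}'
  kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}{"\n"}'
}

# Usage:
#   service_backends honeydue api
```

An empty second line means the Service has no ready pods behind it, which is worth ruling out before debugging anything further down the path.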

Cloudflare can cache responses at the edge. By default it caches only
static asset file types; HTML and JSON are not cached unless we opt in
with cache rules and `Cache-Control` headers. For a cacheable URL, a
repeat request might not reach the origin at all.
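
Whether a given response was served from the edge can be read from Cloudflare's `CF-Cache-Status` response header. The helper below is a small sketch for pulling it out of `curl -I` output; `cache_status` is a hypothetical name.

```bash
# cache_status: reads HTTP response headers on stdin and prints the value of
# the CF-Cache-Status header (for example HIT, MISS, or DYNAMIC).
cache_status() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "cf-cache-status" { print $2 }'
}

# Usage:
#   curl -sI https://api.myhoneydue.com/ | cache_status
```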

## Middleware (mostly unused)

Traefik supports middleware — small functions run before/after the proxy.
The `deploy-k3s/manifests/ingress/middleware.yaml` scaffold defines:

- **`rate-limit`** — 100 req/min average, 200 burst
- **`security-headers`** — HSTS, X-Frame-Options, CSP, etc.
- **`cloudflare-only`** — IP allowlist restricting origin to CF ranges
- **`admin-auth`** — HTTP basic auth for admin panel

**None of these are currently attached to our Ingresses.** To enable,
add the `traefik.ingress.kubernetes.io/router.middlewares` annotation to
the Ingress:

```yaml
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
```

We left them off to minimize surface area for the first week of the new
cluster. Enabling is TODO in Chapter 20.
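
The annotation value follows the pattern `<namespace>-<name>@kubernetescrd`, comma-separated. When the time comes, a helper like the one below could build it; this is a sketch, `middleware_refs` is a hypothetical name, and it assumes the Middleware objects live in the `honeydue` namespace as in the example above.

```bash
# middleware_refs NAMESPACE NAME...: prints the comma-separated value for the
# traefik.ingress.kubernetes.io/router.middlewares annotation.
middleware_refs() {
  local ns="$1"; shift
  local refs=() name
  for name in "$@"; do
    refs+=("${ns}-${name}@kubernetescrd")
  done
  local IFS=,
  printf '%s\n' "${refs[*]}"
}

# Usage:
#   kubectl annotate ingress honeydue-api -n honeydue --overwrite \
#     "traefik.ingress.kubernetes.io/router.middlewares=$(middleware_refs honeydue security-headers rate-limit)"
```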

## Traefik dashboard

Disabled. The Traefik dashboard (`/dashboard/` and `/api/`) exposes
runtime state and can leak information about the cluster. The bundled
k3s Traefik disables it by default, and we haven't re-enabled it.

If needed for debugging:

```bash
# Port-forward to a Traefik pod
kubectl port-forward -n kube-system daemonset/traefik 9000:9000
# (the chart exposes the dashboard on :9000 when enabled)
# Then visit http://localhost:9000/dashboard/
```

This requires kubectl access and isn't exposed publicly.

## Version pinning

We take whatever Traefik version is bundled with K3s (currently 3.6.10).
The bundled chart is pinned to a specific version in K3s' release notes;
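when we upgrade K3s the Traefik version can change. If that ever breaks
something, we can pin a specific chart version. Note that `version` is a
field of the `HelmChart` resource, not of `HelmChartConfig` (which only
carries value overrides), so pinning means managing the Traefik
`HelmChart` ourselves: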

```yaml
spec:
  version: 39.0.501+up39.0.5  # specific chart version
```
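
Before and after a K3s upgrade it helps to record which Traefik version is actually running. A minimal sketch, assuming the DaemonSet is named `traefik` in `kube-system` and that the image reference ends in a plain `:tag` (no digest); both function names are hypothetical.

```bash
# image_tag IMAGE: prints the tag portion of a container image reference.
# (Assumes a trailing ":tag"; digest-pinned references are not handled.)
image_tag() {
  printf '%s\n' "${1##*:}"
}

# traefik_version: prints the image tag of the first container in the
# Traefik DaemonSet.
traefik_version() {
  image_tag "$(kubectl get daemonset traefik -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}')"
}

# Usage:
#   traefik_version
```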

## Limitations we accept

- **No sticky sessions.** Every request to `api.myhoneydue.com` can go
  to a different pod. Our Go API is stateless — this is fine.
- **No canary deployments** (yet). Traefik supports weighted routing
  via its CRDs (`TraefikService`) but we don't use them. TODO if/when
  we do gradual rollouts.
- **No mTLS.** Traefik supports mutual TLS client auth for sensitive
  endpoints. We don't use it.
- **Single ingress class.** Everything goes through the same Traefik.
  For multi-tenant setups we'd want separate ingress classes with
  separate policies.

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| 404 from Traefik | Ingress doesn't match `Host:` | Check Ingress host field, DNS |
| 502 from Traefik | Backend refused or reset the connection (pod crashing, wrong port) | Check pod logs, Service `targetPort` |
| 503 from Traefik | Backend Service has no ready endpoints | `kubectl get endpoints -n honeydue`; check readiness probe |
| 504 from Traefik | Backend slow | Check pod CPU/memory, DB connections |
| Connection refused at 80 | Traefik pod not running or kernel not listening | `kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik`; `ssh deploy@node 'ss -lntp \| grep :80'` |
| Mixed content error in browser | `X-Forwarded-Proto` not honored by app | Check `trustedIPs` includes CF; check app reads the header |

## Operator cheat sheet

```bash
# Traefik pods per node
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide

# Traefik logs (all pods)
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=50 --prefix

# Ingress status
kubectl get ingress -n honeydue

# Health check via the ping endpoint (listing routers needs the dashboard or API)
kubectl exec -n kube-system daemonset/traefik -- traefik healthcheck

# Re-apply config
kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
kubectl delete job -n kube-system helm-install-traefik  # triggers reinstall

# Restart all Traefik pods
kubectl rollout restart daemonset/traefik -n kube-system
```

## References

- [Traefik v3 docs][traefik]
- [Traefik Swarm provider][traefik-swarm]
- [K3s Traefik customization][k3s-traefik]
- [HelmChartConfig docs][k3s-helm]
- [Cloudflare IP ranges][cf-ips]

[traefik]: https://doc.traefik.io/traefik/v3.6/
[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
[cf-ips]: https://www.cloudflare.com/ips/