06 — Traefik Ingress
Updated 2026-05-15 (security remediation): the Traefik middleware set changed. cloudflare-only + admin-auth are now attached to the admin ingress, a strict auth-rate-limit middleware fronts the auth endpoints (via a dedicated honeydue-api-auth Ingress), and security-headers gained COOP/CORP + a 2-year preload HSTS and dropped the deprecated X-XSS-Protection. deploy-k3s/SECURITY.md is the authoritative current-state record.
Summary
Traefik is the reverse proxy that routes external HTTP requests to the
right application pod based on the Host: header. We run Traefik v3 as a
Kubernetes DaemonSet with hostNetwork: true — each of the three nodes
has its own Traefik pod listening directly on the node's :80/:443.
Cloudflare round-robins DNS across the three node IPs, so any node can
serve any request. No external load balancer.
Why Traefik
K3s bundles Traefik by default. The alternatives:
| Option | Pros | Cons |
|---|---|---|
| Traefik v3 (bundled) | Zero install, excellent k8s integration, middleware system, active development | Helm-driven config is indirect |
| NGINX Ingress | Most popular, battle-tested | Another thing to install, more config surface |
| HAProxy Ingress | Extremely performant | More hands-on, older docs |
| Caddy | Simple config, auto-HTTPS | caddy-docker-proxy / Ingress integration is less mature |
| Envoy / Istio | Most featureful | Massive overkill at our scale |
Traefik came "free" with K3s, does the job, and its Swarm provider is what we would have used if we'd fixed our Swarm architecture. Using it on k3s keeps the mental model consistent.
Deployment model
flowchart TB
subgraph CF[Cloudflare edge]
DNS[DNS A records:<br/>api.myhoneydue.com → 3 node IPs<br/>admin.myhoneydue.com → 3 node IPs]
end
subgraph N1[hetzner1]
T1[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
kernel1[Linux kernel<br/>net.ipv4.ip_unprivileged_port_start=0]
end
subgraph N2[hetzner2]
T2[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
kernel2[Linux kernel]
end
subgraph N3[hetzner3]
T3[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
kernel3[Linux kernel]
end
subgraph Cluster[k3s cluster services]
APISvc[api Service :8000]
AdminSvc[admin Service :3000]
end
DNS -. HTTP :80 .-> T1 & T2 & T3
T1 & T2 & T3 -- reverse_proxy --> APISvc & AdminSvc
ASCII fallback
Cloudflare DNS
┌───────────────────┐
│ api → 3 IPs │
│ admin→ 3 IPs │
└─────────┬─────────┘
│ HTTP :80
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ hetzner1 │ │ hetzner2 │ │ hetzner3 │
│ Traefik │ │ Traefik │ │ Traefik │
│ :80/443 │ │ :80/443 │ │ :80/443 │
│(hostNet) │ │(hostNet) │ │(hostNet) │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└── ClusterIP ──────┼── ClusterIP ──────┘
▼
┌────────────────────────┐
│ api Service :8000 │
│ admin Service :3000 │
└────────────────────────┘
Why DaemonSet + hostNetwork
What we're trying to achieve: Any public-facing node should answer :80/:443. Cloudflare round-robins DNS; whichever node it picks, that node must serve.
The default k3s Traefik deployment is a single-replica Deployment exposed via a LoadBalancer Service. That requires either:
- Hetzner Load Balancer (+ $8.49/mo, another thing to manage), or
- K3s' built-in servicelb (klipper-lb), which binds node ports dynamically to proxy to the Service
Neither was quite what we wanted. With three replicas of the stock Traefik behind klipper-lb, each Traefik pod is reachable but there's an extra hop through klipper's proxy daemon.
DaemonSet + hostNetwork is cleaner: each Traefik pod is the host's :80/:443. No proxy daemon, no LB Service, no VIP. Cloudflare DNS → node IP → kernel → Traefik, one hop.
Trade-offs of hostNetwork
Pro:
- One fewer layer of indirection; lower latency
- No Service needed; no kube-proxy in the ingress path
- Standard Cloudflare round-robin DNS is the failover mechanism
Con:
- Traefik is in the host netns; it sees the node's interfaces, not the cluster overlay
- Traefik still joins the cluster-DNS resolution (via hostNetwork's default DNS policy) so it can resolve Service names like api
- Port conflicts possible if anything else wants :80/:443 on the node (nothing else does in our setup)
Trade-offs of DaemonSet
Pro:
- One Traefik per node; matches our Cloudflare 3-IP round-robin exactly
- Any node down = Cloudflare's origin health checks route around it
Con:
- Updates require maxUnavailable > 0 (host ports conflict during surge) → brief moment where one node is down during rollout
- 3× the memory usage vs. 1-replica Deployment (but Traefik is tiny, ~128 MB total across all three)
Our Traefik configuration
We reconfigure the bundled K3s Traefik via a HelmChartConfig. K3s
uses the helm-controller to manage bundled addons; HelmChartConfig
lets us override values without disabling-and-replacing the chart.
Full config at
deploy-k3s/manifests/traefik-helmchartconfig.yaml. Key settings:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet         # was Deployment
    hostNetwork: true
    service:
      enabled: false          # no LoadBalancer Service
    ports:
      web:
        port: 80
        hostPort: 80
      websecure:
        port: 443
        hostPort: 443
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
        maxSurge: 0
    securityContext:
      capabilities:
        drop: [ALL]
        add: [NET_BIND_SERVICE]
      readOnlyRootFilesystem: true
      runAsGroup: 65532
      runAsNonRoot: true
      runAsUser: 65532
    additionalArguments:
      - "--entrypoints.web.forwardedHeaders.trustedIPs=<CF ranges>"
Why each setting
- kind: DaemonSet - one Traefik per node. Default is a Deployment with 1 replica.
- hostNetwork: true - Traefik runs in the host's network namespace so it can bind real :80/:443 on the node.
- service.enabled: false - no LoadBalancer Service is created. With hostNetwork, we don't need one.
- ports.*.hostPort - explicit host port binding. Matches the container port (DaemonSet semantics with hostPort: 80 ensure the kubelet schedules at most one Traefik per node).
- updateStrategy.maxUnavailable: 1, maxSurge: 0 - we accept one node being down during a Traefik update (host port can't be shared). The Traefik Helm chart rejects this config combination with maxSurge > 0; this was the second config iteration.
- Security context - non-root (UID 65532), read-only root filesystem, only NET_BIND_SERVICE capability. See Chapter 5.
- forwardedHeaders.trustedIPs - Cloudflare's IP ranges. Traefik trusts X-Forwarded-Proto et al. only from these ranges, so a bypassing client can't spoof the proto header.
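The maxUnavailable: 1, maxSurge: 0 trade-off can be illustrated with a small simulation. This is only a sketch of the counting argument, not how the DaemonSet controller is implemented; the function name min_serving_during_rollout is hypothetical.

```python
def min_serving_during_rollout(nodes, max_unavailable):
    """Return the lowest number of serving nodes seen during a surge-free rollout.

    With host ports, the old pod must stop before the new one can bind
    :80/:443 on the same node, so each batch is briefly out of service.
    """
    if max_unavailable < 1:
        # With maxSurge 0 and maxUnavailable 0 the rollout could never progress.
        raise ValueError("max_unavailable must be at least 1 when surge is 0")
    serving = set(nodes)
    pending = list(nodes)
    min_serving = len(serving)
    while pending:
        batch, pending = pending[:max_unavailable], pending[max_unavailable:]
        serving -= set(batch)   # old pods stop, host ports are released
        min_serving = min(min_serving, len(serving))
        serving |= set(batch)   # new pods bind the ports and become ready
    return min_serving


nodes = ["hetzner1", "hetzner2", "hetzner3"]
print(min_serving_during_rollout(nodes, 1))
```

With three nodes and maxUnavailable: 1, two nodes keep serving at every step, which is the "one node down during rollout" cost described above.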
Forwarded-headers trustedIPs
The full list of trusted CF ranges is in our additionalArguments. It's
the union of CF's published IPv4 and IPv6 ranges. When Cloudflare passes
a request to origin, it adds X-Forwarded-For and X-Forwarded-Proto
headers; Traefik only honors these if the request came from one of these
IPs. Every other client's headers are ignored.
If CF publishes new IP ranges (rare but possible), the
trustedIPs list needs updating. It's a raw string in our
HelmChartConfig — we'd need to edit, apply, and bump the helm job.
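The trust decision described above can be sketched in a few lines of Python. This is a simplified illustration, not Traefik's implementation: the names TRUSTED_PROXIES, is_trusted, and effective_scheme are hypothetical, and the CIDR list is a small illustrative subset of Cloudflare's published ranges rather than our full trustedIPs value.

```python
import ipaddress

# Illustrative subset only; the real trustedIPs list is the full union
# of Cloudflare's published IPv4 and IPv6 ranges.
TRUSTED_PROXIES = [
    ipaddress.ip_network("173.245.48.0/20"),
    ipaddress.ip_network("103.21.244.0/22"),
    ipaddress.ip_network("2400:cb00::/32"),
]


def is_trusted(peer_ip: str) -> bool:
    """True if the TCP peer address falls inside a trusted proxy range."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in TRUSTED_PROXIES)


def effective_scheme(peer_ip: str, headers: dict, connection_scheme: str) -> str:
    """Honor X-Forwarded-Proto only when the peer is a trusted proxy."""
    if is_trusted(peer_ip):
        return headers.get("X-Forwarded-Proto", connection_scheme)
    return connection_scheme


# A request relayed by a trusted range keeps its forwarded scheme;
# a direct client claiming the same header is ignored.
print(effective_scheme("173.245.48.10", {"X-Forwarded-Proto": "https"}, "http"))
print(effective_scheme("203.0.113.7", {"X-Forwarded-Proto": "https"}, "http"))
```

The key point is that the check uses the TCP peer address, which a client cannot forge, rather than anything in the request headers.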
Traefik v3 vs v2
K3s ships Traefik v3 (currently 3.6.10). The v2 → v3 migration
changed a few things:
- swarmMode removed (replaced by a swarm provider, but we don't use Swarm anyway)
- Encoded-character handling changed (v3 warns about RFC 3986 handling; we ignore the warning)
- Middleware CRD group is traefik.io/v1alpha1 (was traefik.containo.us/v1alpha1)
Our deployment handles all of this automatically via the bundled chart.
Ingress resources
We define two standard k8s Ingress resources in
deploy-k3s/manifests/ingress/ingress-simple.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-api
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: api.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
    - host: myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-admin
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: admin.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: admin, port: {number: 3000}}
Traefik watches for Ingress resources with ingressClassName: traefik
and programs its router table accordingly. Changes are applied within
seconds — no restart needed.
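Conceptually, the router table built from these two Ingresses is a lookup from Host: header to backend Service. The sketch below is a deliberately simplified model (Traefik's real routers also carry paths, priorities, and middlewares); ROUTES and route are hypothetical names.

```python
# Host -> (Service name, Service port), mirroring the Ingress rules above.
ROUTES = {
    "api.myhoneydue.com": ("api", 8000),
    "myhoneydue.com": ("api", 8000),
    "admin.myhoneydue.com": ("admin", 3000),
}


def route(host_header: str):
    """Return the backend for a Host: header, or None if no rule matches."""
    # Drop an optional :port suffix and normalize case before the lookup.
    host = host_header.split(":", 1)[0].strip().lower()
    return ROUTES.get(host)


print(route("api.myhoneydue.com"))
print(route("admin.myhoneydue.com:443"))
print(route("unknown.example.com"))  # no match: the proxy answers 404
```

A None result corresponds to the "404 from Traefik" row in the troubleshooting table: the request reached Traefik but no Ingress host matched.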
What pathType: Prefix means
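Every request whose path starts with / matches (which is everything). The
alternatives are Exact (matches only the literal path) and
ImplementationSpecific (matching is left to the controller). pathType is a
required field in networking.k8s.io/v1; Prefix is the usual choice because
it matches how users think about URL routing.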
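For paths other than /, Prefix matching in the Kubernetes Ingress spec works on whole path elements, not raw string prefixes. The sketch below illustrates that rule under a simplifying assumption (empty segments are discarded, so repeated slashes collapse); prefix_matches is a hypothetical helper, not a Kubernetes or Traefik API.

```python
def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Element-wise prefix match on /-separated path segments."""
    rule = [seg for seg in rule_path.split("/") if seg]
    request = [seg for seg in request_path.split("/") if seg]
    # The rule matches when its segments are a leading slice of the request's.
    return request[:len(rule)] == rule


print(prefix_matches("/", "/v1/users"))     # "/" has no segments: matches all
print(prefix_matches("/foo", "/foo/bar"))   # /foo is a whole leading element
print(prefix_matches("/foo", "/foobar"))    # "foobar" is a different element
```

Since our Ingresses only use path: /, every request for a matching host is routed; the element-wise rule only matters if we later split traffic by path.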
How requests flow
- Cloudflare DNS resolves api.myhoneydue.com to a CF edge IP (client never sees the three origin IPs; CF proxies).
- Cloudflare edge terminates TLS from the browser, then opens a fresh TCP to one of the origin IPs on :443 (SSL=Full (strict)). Say it picks 178.105.32.198 (hetzner2).
- UFW on hetzner2 accepts the SYN: the source IP is in one of the 15 CF IPv4 CIDRs allowed on :443. (Any non-CF source IP is dropped at the kernel.)
- Linux kernel sees a listener on 0.0.0.0:443 (the Traefik pod, hostNetwork). Hands off the SYN.
- Traefik accepts the connection, completes the TLS handshake using the cloudflare-origin-cert secret (CF Origin CA; CF verifies this chain on its side). Reads the plaintext HTTP request.
- Traefik matches the Host: header against its router table. Host: api.myhoneydue.com → honeydue-api Ingress → api Service. Attached middlewares (security-headers, rate-limit) run here.
- Traefik dials 10.43.167.83:8000 (api Service ClusterIP). This goes through the cluster DNS (CoreDNS) and kube-proxy (IPVS).
- kube-proxy IPVS rewrites the destination to a live api pod endpoint, say 10.42.2.6:8000 (api pod on hetzner3).
- Flannel VXLAN encapsulates the packet and sends to hetzner3 (UDP :8472 between node IPs).
- hetzner3's kernel decapsulates, delivers to the api pod.
- api pod processes, returns response.
- Response flows back the reverse path.
Cloudflare can cache 200 responses at the edge. By default it caches
static file types only; HTML and JSON are not cached unless we set
Cache-Control headers or a cache rule. So a repeat request for a cacheable
URL might not reach the origin at all.
Middleware (mostly unused)
Traefik supports middleware — small functions run before/after the proxy.
The deploy-k3s/manifests/ingress/middleware.yaml scaffold defines:
- rate-limit - 100 req/min average, 200 burst
- security-headers - HSTS, X-Frame-Options, CSP, etc.
- cloudflare-only - IP allowlist restricting origin to CF ranges
- admin-auth - HTTP basic auth for admin panel
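When this chapter was first written, none of these were attached to our
Ingresses. As the 2026-05-15 update at the top of this chapter notes,
cloudflare-only and admin-auth are now attached to the admin ingress;
deploy-k3s/SECURITY.md records the current state. To attach a middleware,
add the traefik.ingress.kubernetes.io/router.middlewares annotation to
the Ingress:
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
We initially left them off to minimize surface area for the first week of the new cluster; enabling them was tracked as a TODO in Chapter 20.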
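The rate-limit figures (100 req/min average, 200 burst) describe a token bucket: tokens refill at the average rate, and the bucket holds at most the burst size. The sketch below is a minimal model of that idea, assuming a single bucket and a caller-supplied clock; TokenBucket is a hypothetical class, not Traefik's code.

```python
class TokenBucket:
    """Minimal token bucket: refill at rate_per_sec, capped at burst."""

    def __init__(self, rate_per_sec: float, burst: int, now: float = 0.0):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a proxy would answer 429 Too Many Requests here


# 100 requests per minute on average, bursts of up to 200.
bucket = TokenBucket(rate_per_sec=100 / 60, burst=200)
allowed = sum(bucket.allow(now=0.0) for _ in range(250))
print(allowed)                # 200: the burst is spent, the other 50 are rejected
print(bucket.allow(now=6.0))  # True: about 10 tokens have refilled after 6 s
```

The burst value sets how much a quiet client can send at once; the average sets the sustained ceiling once the bucket is drained.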
Traefik dashboard
Disabled. The Traefik dashboard (/dashboard/ and /api/) exposes
runtime state and can leak information. The bundled k3s
Traefik disables it by default, and we haven't re-enabled it.
If needed for debugging:
# Port-forward to a Traefik pod
kubectl port-forward -n kube-system daemonset/traefik 9000:9000
# (the chart exposes the dashboard on :9000 when enabled)
# Then visit http://localhost:9000/dashboard/
This requires kubectl access and isn't exposed publicly.
Version pinning
We take whatever Traefik version is bundled with K3s (currently 3.6.10).
The bundled chart is pinned to a specific version in K3s' release notes;
when we upgrade K3s the Traefik version can change. If that ever breaks
something, we can pin a specific version via the HelmChartConfig's
version field:
spec:
  version: 39.0.501+up39.0.5     # specific chart version
Limitations we accept
- No sticky sessions. Every request to api.myhoneydue.com can go to a different pod. Our Go API is stateless, so this is fine.
- No canary deployments (yet). Traefik supports weighted routing via its CRDs (TraefikService) but we don't use them. TODO if/when we do gradual rollouts.
- No mTLS. Traefik supports mutual TLS client auth for sensitive endpoints. We don't use it.
- Single ingress class. Everything goes through the same Traefik. For multi-tenant setups we'd want separate ingress classes with separate policies.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| 404 from Traefik | Ingress doesn't match Host: | Check Ingress host field, DNS |
| 502 from Traefik | Backend Service has no endpoints | kubectl get endpoints -n honeydue |
| 503 from Traefik | Circuit breaker / backend unhealthy | Check pod logs, readiness probe |
| 504 from Traefik | Backend slow | Check pod CPU/memory, DB connections |
| Connection refused at 80 | Traefik pod not running or kernel not listening | kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik; `ssh deploy@node 'ss -lntp |
| Mixed content error in browser | X-Forwarded-Proto not honored by app | Check trustedIPs includes CF; check app reads the header |
Operator cheat sheet
# Traefik pods per node
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide
# Traefik logs (all pods)
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=50 --prefix
# Ingress status
kubectl get ingress -n honeydue
# Health check against Traefik's ping endpoint (listing routers requires the dashboard or API)
kubectl exec -n kube-system daemonset/traefik -- traefik healthcheck
# Re-apply config
kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
kubectl delete job -n kube-system helm-install-traefik # triggers reinstall
# Restart all Traefik pods
kubectl rollout restart daemonset/traefik -n kube-system