# K3s Migration Notes — 2026-04-24

honeyDue is running on a 3-node K3s HA cluster on the existing Hetzner nodes
(hetzner1/2/3), replacing the previous Docker Swarm deployment.

## Why we migrated

Docker Swarm's libnetwork has a known stale-DNS bug on 29.x
([moby/moby#52265](https://github.com/moby/moby/issues/52265)) that leaves
ghost A-records when tasks migrate between nodes. Single-replica services
(like the admin panel) resolved to a ghost IP ~50% of the time, which
produced connection refused and then a 502. A full stack recreate cleared
it, but the bug recurs on every node-to-node task migration.

K3s uses CoreDNS and containerd, with no libnetwork in the path, so this bug
class doesn't exist there. See `docs/SWARM_POSTMORTEM.md` if it exists, or
the research summary in the earlier deploy session.

## Differences from the original `deploy-k3s/` scaffold

The original scaffold assumes a greenfield provision via `hetzner-k3s`,
GHCR for images, Cloudflare origin certs, and a Hetzner Load Balancer.
We reused existing nodes and kept Cloudflare Flexible SSL:

| Setting | Scaffold default | What we did |
|---|---|---|
| Provisioning | `hetzner-k3s` tool creates boxes | Manual k3s install on existing Hetzner boxes |
| Registry | GHCR (`ghcr-credentials`) | Gitea (`gitea-credentials`) via `kubectl create secret docker-registry` |
| Ingress TLS | `cloudflare-origin-cert` Secret | No TLS at origin (CF Flexible) |
| Load balancer | Hetzner LB → nodes | Cloudflare round-robin across 3 node IPs |
| Admin basic auth | `admin-auth` Traefik middleware | Not applied — in-app auth only |
| CF-only IP allowlist | `cloudflare-only` middleware | Not applied — UFW restricts some ports, 80/443 open to anyone who knows node IPs |
| Traefik | LoadBalancer via servicelb | DaemonSet w/ hostNetwork (servicelb disabled); see `traefik-config.yaml` below |
| Worker replicas | 2 | 1 (Asynq scheduler is singleton) |
| API startup grace (`startupProbe`) | 12×5s = 60s | 48×5s = 240s (covers migrate + lock queue on first boot) |
| Admin probe path | `/admin/` | `/` (Next.js serves at root) |
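
The registry row amounts to a single `kubectl create secret docker-registry`
call. A minimal sketch, assuming a hypothetical registry host
`gitea.example.com`, a hypothetical account `deploy-bot`, and a token passed
in through a `REGISTRY_TOKEN` environment variable (none of these values come
from the actual deployment):

```bash
# Hypothetical placeholders: substitute the real Gitea registry host and account.
REGISTRY_HOST="${REGISTRY_HOST:-gitea.example.com}"
REGISTRY_USER="${REGISTRY_USER:-deploy-bot}"

create_pull_secret() {
  # Aborts if REGISTRY_TOKEN is unset, so the token never lives in the script.
  kubectl create secret docker-registry gitea-credentials \
    --namespace honeydue \
    --docker-server="$REGISTRY_HOST" \
    --docker-username="$REGISTRY_USER" \
    --docker-password="${REGISTRY_TOKEN:?set REGISTRY_TOKEN first}"
}

# Run it explicitly once the variables are set:
#   REGISTRY_TOKEN=... create_pull_secret
```

The Deployments would then reference the Secret by name under
`imagePullSecrets`.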

## Manifest fixes applied in-repo (already committed)

- `manifests/api/deployment.yaml` — `startupProbe.failureThreshold: 12 → 48`
- `manifests/admin/deployment.yaml` — probe path `/admin/ → /`, threshold `12 → 24`
- `manifests/worker/deployment.yaml` — `replicas: 2 → 1`
- `manifests/pod-disruption-budgets.yaml` — worker `minAvailable: 1 → 0`
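
The probe thresholds translate into a startup grace window of roughly
`failureThreshold × periodSeconds`. A small arithmetic sketch, assuming the
admin probe uses the same 5s period as the API probe (these notes only state
the period for the API):

```bash
# Startup grace window in seconds = failureThreshold * periodSeconds.
probe_budget() {
  echo $(( $1 * $2 ))
}

probe_budget 12 5   # scaffold default            -> 60
probe_budget 48 5   # api after the fix           -> 240
probe_budget 24 5   # admin (5s period assumed)   -> 120
```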

## Traefik override (applied as HelmChartConfig)

K3s ships Traefik as a single-replica Deployment with a LoadBalancer service.
With servicelb disabled (to avoid binding a random port), we reconfigure it
to a DaemonSet binding directly on each node's public :80/:443 via
`hostNetwork: true`. The HelmChartConfig:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet
    hostNetwork: true
    service:
      enabled: false
    ports:
      web:
        port: 80
        hostPort: 80
      websecure:
        port: 443
        hostPort: 443
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
        maxSurge: 0
    securityContext:
      capabilities:
        drop: [ALL]
        add: [NET_BIND_SERVICE]
      readOnlyRootFilesystem: true
      runAsGroup: 65532
      runAsNonRoot: true
      runAsUser: 65532
    additionalArguments:
      - "--entrypoints.web.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,103.22.200.0/22,103.31.4.0/22,141.101.64.0/18,108.162.192.0/18,190.93.240.0/20,188.114.96.0/20,197.234.240.0/22,198.41.128.0/17,162.158.0.0/15,104.16.0.0/13,104.24.0.0/14,172.64.0.0/13,131.0.72.0/22"
```
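
Cloudflare's address ranges change occasionally, so the `trustedIPs` value is
worth regenerating rather than hand-editing. A sketch that joins a
newline-separated CIDR list into the comma-separated form used in the
argument; it assumes the list comes from Cloudflare's published
`https://www.cloudflare.com/ips-v4` endpoint, and `join_cidrs` and
`trusted_ips_arg` are helper names invented here:

```bash
# Join newline-separated CIDRs from stdin into one comma-separated line,
# skipping any empty lines.
join_cidrs() {
  grep -v '^$' | paste -sd, -
}

# Build the full Traefik argument from a CIDR list on stdin.
trusted_ips_arg() {
  printf '%s%s\n' '--entrypoints.web.forwardedHeaders.trustedIPs=' "$(join_cidrs)"
}

# Usage (needs network access):
#   curl -fsS https://www.cloudflare.com/ips-v4 | trusted_ips_arg
```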

Apply with `kubectl apply -f traefik-config.yaml`, then delete the Helm
install job (`kubectl delete job -n kube-system helm-install-traefik`) to
trigger a reinstall.
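
Once the reinstall has run, the result can be checked roughly as follows.
This is a sketch that assumes the chart names the DaemonSet `traefik` and
labels its pods `app.kubernetes.io/name=traefik`; confirm both against the
live cluster before relying on it:

```bash
verify_traefik() {
  # Blocks until every node runs an updated Traefik pod, or times out.
  kubectl rollout status daemonset/traefik -n kube-system --timeout=180s
  # With three nodes, expect three pods, one per node.
  kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide
}

# Usage:
#   verify_traefik
```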

## Required node-level sysctl

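The Traefik pod runs as non-root (UID 65532) with `hostNetwork: true`, and on
modern containerd it does not get an effective CAP_NET_BIND_SERVICE in the
host netns, so it cannot bind :80/:443. Lower the unprivileged port floor on
each node (this lets any local process bind ports below 1024):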

```bash
echo 'net.ipv4.ip_unprivileged_port_start=0' | sudo tee /etc/sysctl.d/99-unprivileged-ports.conf
sudo sysctl --system
```
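
To confirm the setting took effect, compare the ports Traefik binds with the
kernel's unprivileged port floor. A minimal sketch; `needs_privilege` and
`check_node` are helper names invented for this example:

```bash
# Exit 0 when binding PORT ($1) requires CAP_NET_BIND_SERVICE given FLOOR ($2),
# the value of net.ipv4.ip_unprivileged_port_start.
needs_privilege() {
  [ "$1" -lt "$2" ]
}

check_node() {
  floor="$(sysctl -n net.ipv4.ip_unprivileged_port_start)"
  for port in 80 443; do
    if needs_privilege "$port" "$floor"; then
      echo "port $port still privileged (floor=$floor)"
    else
      echo "port $port bindable by non-root (floor=$floor)"
    fi
  done
}

# Run on each node after `sudo sysctl --system`:
#   check_node
```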

## UFW rules added for k3s (per node)

Allow the following between the three node IPs (178.104.247.152, 178.105.32.198, 178.104.249.189):

- `6443/tcp` — kube API
- `2379/tcp`, `2380/tcp` — embedded etcd client + peer
- `10250/tcp` — kubelet
- `8472/udp` — flannel VXLAN overlay

Also allow `6443/tcp` from your workstation IP to each node so `kubectl` can
reach the API.
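
These rules could be generated per node along the following lines. The script
only prints `ufw` commands for review instead of running them, and
`203.0.113.10` is a documentation placeholder standing in for the workstation
IP, not a real address from this deployment:

```bash
NODES="178.104.247.152 178.105.32.198 178.104.249.189"
WORKSTATION_IP="${WORKSTATION_IP:-203.0.113.10}"   # placeholder

print_ufw_rules() {
  for peer in $NODES; do
    for rule in 6443/tcp 2379/tcp 2380/tcp 10250/tcp 8472/udp; do
      port="${rule%/*}"    # part before the slash
      proto="${rule#*/}"   # part after the slash
      echo "ufw allow from $peer to any port $port proto $proto"
    done
  done
  # kubectl access from the operator workstation.
  echo "ufw allow from $WORKSTATION_IP to any port 6443 proto tcp"
}

print_ufw_rules
```

The list includes the node's own IP, which is harmless. After reviewing the
output, it can be piped into `sudo sh` on each node.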

## Ingress

Minimal hostname-only routing. At deploy time this lived in
`/tmp/honeydue-ingress.yaml`; move it into `deploy-k3s/manifests/ingress/`
in a follow-up:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-api
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: api.myhoneydue.com
      http:
        paths:
          - {path: /, pathType: Prefix, backend: {service: {name: api, port: {number: 8000}}}}
    - host: myhoneydue.com
      http:
        paths:
          - {path: /, pathType: Prefix, backend: {service: {name: api, port: {number: 8000}}}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-admin
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: admin.myhoneydue.com
      http:
        paths:
          - {path: /, pathType: Prefix, backend: {service: {name: admin, port: {number: 3000}}}}
```
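
Host-based routing can be checked per node, bypassing Cloudflare, by sending
the `Host` header over plain HTTP (the origin serves port 80 under CF
Flexible). A sketch: `000` means the connection failed, while a `404` may
mean either that no Ingress rule matched or that the app has nothing at `/`,
so treat the codes as a hint rather than a verdict:

```bash
NODES="178.104.247.152 178.105.32.198 178.104.249.189"
HOSTS="api.myhoneydue.com myhoneydue.com admin.myhoneydue.com"

check_ingress() {
  for node in $NODES; do
    for host in $HOSTS; do
      # -w prints only the HTTP status; curl reports 000 if the connection fails.
      code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
        -H "Host: $host" "http://$node/" || true)"
      echo "$node $host $code"
    done
  done
}

# Usage:
#   check_ingress
```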

## Operator access

Kubeconfig lives at `~/.kube/honeydue-k3s.yaml`.

```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl get pods -n honeydue
```
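
These notes do not record how that file was produced. One common approach,
sketched here as an assumption, is to copy K3s's default kubeconfig
(`/etc/rancher/k3s/k3s.yaml` on a server node) and point its `server:` entry,
which defaults to `https://127.0.0.1:6443`, at a node's public IP.
`rewrite_server` is a hypothetical helper:

```bash
# Replace the loopback API address in a kubeconfig read from stdin
# with the node IP given as $1.
rewrite_server() {
  sed "s#https://127\.0\.0\.1:6443#https://$1:6443#"
}

# Usage (assumes root SSH access to the node, which may not match your setup):
#   ssh root@178.104.247.152 cat /etc/rancher/k3s/k3s.yaml \
#     | rewrite_server 178.104.247.152 > ~/.kube/honeydue-k3s.yaml
#   chmod 600 ~/.kube/honeydue-k3s.yaml
```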

## Remaining TODOs (not blocking)

- Apply `manifests/ingress/middleware.yaml` for security headers + rate limiting
  (CF-only allowlist + basic auth deliberately skipped until you want them)
- Apply `manifests/network-policies.yaml` for default-deny + explicit allows
- Apply `manifests/api/hpa.yaml` if you want autoscaling (metrics-server is
  already running, so just `kubectl apply` it)
- Upgrade to CF Full (strict) SSL: generate origin cert, create
  `cloudflare-origin-cert` Secret, add `tls:` block back to Ingress
- Set up a proper migration Job so `api` replicas don't each run `MigrateWithLock`
  on startup — lets you drop the 240s startupProbe grace
- Remove `deploy/` (the Swarm-era config) once you're confident in k3s
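
For the CF Full (strict) item, the Secret could be created as sketched below
once an origin certificate has been issued. `origin.pem` and `origin.key` are
placeholder file names:

```bash
create_origin_cert_secret() {
  # $1 = certificate file, $2 = private key file (both PEM).
  kubectl create secret tls cloudflare-origin-cert \
    --namespace honeydue \
    --cert="$1" \
    --key="$2"
}

# Usage:
#   create_origin_cert_secret origin.pem origin.key
```

The Secret goes in the `honeydue` namespace because an Ingress can only
reference TLS Secrets from its own namespace.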