Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after Redis Ping
  failure)
- router.go: relax CSP from 'default-src none' to 'default-src self' +
  allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with --platform
  linux/amd64 so arm64 (Apple Silicon) dev machines produce images
  runnable on x86_64 Hetzner nodes; fix array expansion under set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '\${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching
  deploy-k3s/manifests/api/ and admin/src/app/api/*, hiding legitimate
  files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress without
  TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/ (full deployment book, 26 files, ~42k words):
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook
    (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to the
  book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 07:20:21 -05:00
parent 4ec4bbbfe8
commit 6f303dbbaa
46 changed files with 9785 additions and 93 deletions
@@ -0,0 +1,526 @@
+# 05 — Security
+
+## Summary
+
+Security on this deployment is layered: Cloudflare at the edge, UFW at
+the node, k3s RBAC + Pod Security at the orchestrator, TLS between
+long-haul components, and dedicated service accounts with dropped
+capabilities inside containers. This chapter documents each layer, the
+rationale, and what's currently missing (and why).
+
+## Threat model
+
+Who we're defending against, in rough order of likelihood:
+
+1. **Opportunistic scanners** — bots scanning random IPv4 ranges for
+   known vulnerabilities. Mitigated by the firewall.
+2. **Credential stuffing / brute-force** — especially against SSH and
+   admin login. Mitigated by key-only SSH, strong passwords, rate limits.
+3. **Compromised external service** — if Neon, Backblaze, or Cloudflare
+   were breached, attacker would have access to whatever we store there.
+   Mitigated by scoped credentials, least-privilege API keys.
+4. **Compromised container image** — if Gitea or our build pipeline
+   were compromised, malicious code could reach prod. Mitigated by
+   (a) Gitea is behind authentication, (b) image pull secrets scoped,
+   (c) containers run non-root with minimal capabilities.
+5. **Insider threat** — not really a threat for a solo operator.
+6. **State actor** — not in threat model. At our scale this is
+   effectively unaddressable without becoming a security company.
+
+Explicitly **not** in threat model:
+- DDoS at a scale that saturates Cloudflare. We pay $0 for CF; their
+  DDoS mitigation is included but not unlimited. If we got hit with a
+  large attack, we'd move to a paid plan.
+- Physical access to Hetzner datacenters. That's their problem.
+
+## Layer 1 — Cloudflare edge
+
+Cloudflare sits in front of every public request.
+
+### What Cloudflare does for us
+
+| Protection | How it works |
+|---|---|
+| TLS termination | CF presents a cert for `*.myhoneydue.com`; clients encrypt to CF |
+| DDoS mitigation | Automatic on all plans including Free |
+| Bot filtering | "Under Attack" mode + bot score based blocking |
+| IP concealment | Origin IPs not in DNS; attackers can't directly scan |
+| WAF rules | CF Free includes managed ruleset for common exploits |
+| Rate limiting | Free tier: 10k requests/10min; more on paid plans |
+
+### What Cloudflare does **not** do
+
+- **Authenticate users** — that's the app's job
+- **Authorize requests** — that's the app's job
+- **Protect origin if origin IP leaks** — once someone knows a node IP
+  they can bypass CF. Mitigation: keep origin firewall strict (Chapter 4).
+- **Encrypt between CF and origin** — we're on SSL=Flexible, so CF↔origin
+  is HTTP. This is in our TODO (Chapter 20, upgrade to Full-strict).
+
+### The proxy-IP problem
+
+Cloudflare publishes its IP ranges
+([cloudflare.com/ips](https://www.cloudflare.com/ips/)), so an origin can
+check whether a request came through CF by comparing the remote address
+against them. Our Traefik trusts `X-Forwarded-Proto` (so the Go API sees
+`https` even though the origin received HTTP) only from CF IP ranges:
+
+```yaml
+# deploy-k3s/manifests/traefik-helmchartconfig.yaml
+additionalArguments:
+  - "--entrypoints.web.forwardedHeaders.trustedIPs=173.245.48.0/20,..."
+```
+
+This means a malicious request that bypasses CF (by hitting the node IP
+directly) can't spoof headers — Traefik ignores `X-Forwarded-*` unless
+the source IP is in CF's ranges.
+
+**TODO** (Chapter 20): Enforce at UFW level — allow 80/tcp only from
+CF IP ranges. Today any IP can reach the origin on port 80.
+
+## Layer 2 — Node (OS, SSH, firewall)
+
+Each node runs Ubuntu 24.04.3 LTS with:
+
+### SSH hardening
+
+`/etc/ssh/sshd_config` on each node:
+
+```
+Port 22
+PermitRootLogin no
+PasswordAuthentication no
+PubkeyAuthentication yes
+AllowUsers deploy
+```
+
+Result:
+- Only the `deploy` user can log in
+- Only with a public key (no password)
+- Root cannot log in remotely
+
+The public key authorized for `deploy`:
+
+```
+ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBU9xTTBD78tYUqHijgyU9PDqtmS4NuM/6uy8XgDzva+ hetzner2@myhoneydue.com
+```
+
+(Note: the comment field says "hetzner2" but it's the key for all three
+nodes — the comment is the key's identifier, not a restriction.)
+
+Private key is at `~/.ssh/hetzner` on the operator workstation.
+
+### Sudo
+
+The `deploy` user has unrestricted sudo with no password
+(`/etc/sudoers.d/deploy`):
+
+```
+deploy ALL=(ALL) NOPASSWD: ALL
+```
+
+This is convenient but broad. A compromise of the `deploy` SSH key =
+root on the node. Mitigations:
+- Key is stored only on the operator workstation, not checked into git
+- Operator workstation has disk encryption (macOS FileVault)
+- The key is passphrase-protected (the passphrase is cached by ssh-agent)
+
+Future hardening: scope sudo to specific commands that deploy workflows
+need (e.g., `/usr/sbin/ufw`, `/usr/bin/systemctl`), but this requires
+enumerating every command we might run, which breaks ad-hoc debugging.
+
+### fail2ban
+
+**Not installed.** fail2ban would ban IPs that fail SSH auth repeatedly.
+Because password auth is disabled entirely, the attack surface is tiny:
+only an attacker holding the private key gets in, and failed public-key
+attempts are log noise and wasted connections rather than credential
+stuffing. Installing fail2ban is still on the TODO list because it
+rate-limits that SSH bot noise.
+
+### unattended-upgrades
+
+**Not installed.** Security patches require manual `apt upgrade`. This is
+a gap. Install and configure for security-only updates as soon as time
+permits.
+
+### UFW firewall
+
+See [Chapter 4](./04-firewall.md) for the complete ruleset. Summary:
+default-deny incoming, specific allows for SSH (22), HTTP (80), HTTPS
+(443), k3s API from operator IP (6443), and inter-node cluster ports.
+
+## Layer 3 — Kubernetes RBAC
+
+K3s inherits full Kubernetes RBAC. Every component that talks to the API
+server has a ServiceAccount with only the permissions it needs.
+
+### System accounts
+
+K3s creates these by default:
+- `kube-system:admin` — cluster admin, used by `kubectl`
+- `kube-system:coredns` — for CoreDNS
+- `kube-system:traefik` — for Traefik ingress controller
+- `kube-system:helm-install-traefik` — for the Helm chart installer
+
+We don't touch these.
+
+### Application service accounts
+
+Our `rbac.yaml` creates four ServiceAccounts in the `honeydue` namespace:
+
+```yaml
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: api
+  namespace: honeydue
+automountServiceAccountToken: false   # ← important
+```
+
+Same for `admin`, `worker`, `redis`.
+
+**`automountServiceAccountToken: false`** means pods don't get a k8s
+API token mounted in `/var/run/secrets/kubernetes.io/serviceaccount/`.
+Without that token, a compromised pod cannot authenticate to the
+Kubernetes API even if the default service account has broad permissions.
+
+### What the app pods CAN'T do
+
+Our app service accounts have **no RoleBindings or ClusterRoleBindings**.
+They cannot:
+- List, get, create, update, delete any Kubernetes resource
+- Read other namespaces' secrets
+- Schedule workloads
+- View cluster state
+
+If the api container were fully compromised (RCE), the attacker would
+have:
+- Network access to other pods in the `honeydue` namespace (Chapter 16)
+- Read access to our ConfigMap + Secrets (mounted into the container)
+- No ability to pivot to other parts of the cluster via the k8s API
+
+## Layer 4 — Pod Security
+
+Every pod runs with restrictive security context:
+
+```yaml
+securityContext:
+  runAsNonRoot: true
+  runAsUser: 1000        # api; different per service
+  runAsGroup: 1000
+  fsGroup: 1000
+  seccompProfile:
+    type: RuntimeDefault
+
+containers:
+  - securityContext:
+      allowPrivilegeEscalation: false
+      readOnlyRootFilesystem: true
+      capabilities:
+        drop: ["ALL"]
+```
+
+### What each setting does
+
+| Setting | Effect |
+|---|---|
+| `runAsNonRoot: true` | Pod refuses to start if the image's default user is root |
+| `runAsUser: 1000` | Override to UID 1000 (app user) |
+| `allowPrivilegeEscalation: false` | Sets `no_new_privs`; a process cannot gain more privileges than its parent (e.g., via setuid binaries or file capabilities) |
+| `readOnlyRootFilesystem: true` | `/` is read-only; writes require explicit volumes |
+| `capabilities: drop: [ALL]` | No Linux capabilities (NET_ADMIN, SYS_TIME, etc.) |
+| `seccompProfile: RuntimeDefault` | Restrict syscalls to containerd's default seccomp allowlist |
+
+Read-only root means our app images must declare writable volumes for
+anything mutable:
+
+```yaml
+volumeMounts:
+  - name: tmp
+    mountPath: /tmp
+volumes:
+  - name: tmp
+    emptyDir:
+      sizeLimit: 64Mi
+```
+
+If the app needs to write somewhere else (e.g., Next.js cache), we mount
+an emptyDir there explicitly.
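+
+A minimal sketch of that pattern for the admin pod. The volume name
+`next-cache`, the mount path `/app/.next/cache`, and the second size
+limit are assumptions for illustration; the real path depends on the
+image's working directory:
+
+```yaml
+volumeMounts:
+  - name: tmp
+    mountPath: /tmp
+  - name: next-cache              # hypothetical volume name
+    mountPath: /app/.next/cache   # assumed Next.js cache location
+volumes:
+  - name: tmp
+    emptyDir:
+      sizeLimit: 64Mi
+  - name: next-cache
+    emptyDir:
+      sizeLimit: 128Mi            # assumed limit, tune per workload
+```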
+
+### Traefik exception
+
+Traefik needs `CAP_NET_BIND_SERVICE` to bind ports 80/443 on the host
+network. Its security context adds just that one capability back:
+
+```yaml
+securityContext:
+  capabilities:
+    drop: [ALL]
+    add: [NET_BIND_SERVICE]
+  readOnlyRootFilesystem: true
+  runAsGroup: 65532
+  runAsNonRoot: true
+  runAsUser: 65532
+```
+
+The `net.ipv4.ip_unprivileged_port_start=0` sysctl on the nodes
+complements this — on older kernels NET_BIND_SERVICE alone isn't enough
+in the host netns.
+
+### Pod Security Admission (PSA)
+
+Kubernetes has a built-in admission controller for enforcing Pod Security
+Standards at the namespace level:
+
+```yaml
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: honeydue
+  labels:
+    pod-security.kubernetes.io/enforce: restricted
+    pod-security.kubernetes.io/enforce-version: latest
+```
+
+We **don't currently set this**. We get the equivalent effect from
+the explicit securityContext on each pod, but namespace-level enforcement
+would catch new workloads that forget to set it. **TODO** (Chapter 20).
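+
+One possible way to stage that change is to start with the `warn` and
+`audit` modes, which report violations without rejecting pods, and add
+`enforce` only once nothing is flagged. A sketch under that assumption
+(this is not the current namespace manifest):
+
+```yaml
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: honeydue
+  labels:
+    # Report, but do not block, pods that violate the restricted profile
+    pod-security.kubernetes.io/warn: restricted
+    pod-security.kubernetes.io/warn-version: latest
+    pod-security.kubernetes.io/audit: restricted
+    pod-security.kubernetes.io/audit-version: latest
+```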
+
+## Layer 5 — Network Policies
+
+The `deploy-k3s/manifests/network-policies.yaml` scaffold defines:
+
+- **default-deny-all** — deny all ingress and egress by default in the
+  `honeydue` namespace
+- **allow-dns** — allow egress UDP/TCP 53 to CoreDNS
+- **allow-ingress-to-api** — allow Traefik (`kube-system` namespace) to
+  reach api pods on port 8000
+- **allow-ingress-to-admin** — same, for admin:3000
+
+**These are not currently applied.** Without them, our pods can freely
+talk to anything — including, theoretically, malicious destinations if
+an attacker gets RCE inside a pod.
+
+**TODO** (Chapter 20): Apply network policies. The scaffold is there; we
+just need to `kubectl apply -f deploy-k3s/manifests/network-policies.yaml`
+and test that nothing breaks.
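+
+For orientation, the first two policies described above might look
+roughly like this. It is a sketch, not a copy of the scaffold file: the
+CoreDNS selector (`k8s-app: kube-dns` in `kube-system`) is an assumption
+about how the cluster labels its DNS pods.
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: default-deny-all
+  namespace: honeydue
+spec:
+  podSelector: {}          # every pod in the namespace
+  policyTypes:
+    - Ingress
+    - Egress               # no rules listed, so all traffic is denied
+---
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-dns
+  namespace: honeydue
+spec:
+  podSelector: {}
+  policyTypes:
+    - Egress
+  egress:
+    - to:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: kube-system
+          podSelector:
+            matchLabels:
+              k8s-app: kube-dns   # assumed CoreDNS label
+      ports:
+        - protocol: UDP
+          port: 53
+        - protocol: TCP
+          port: 53
+```
+
+With default-deny egress in place, anything the pods legitimately call
+(Redis, Neon, Backblaze, SMTP) also needs an explicit egress rule, which
+is the main thing to verify when testing.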
+
+### What network policies would prevent
+
+| Attack scenario | Blocked by NetworkPolicy? |
+|---|---|
+| Pod A compromised, attacker moves laterally to pod B | Yes (explicit allow needed) |
+| Pod RCE → scan internal networks | Yes (default deny egress) |
+| Pod RCE → exfil to attacker's C2 | Yes (outbound to internet needs egress rule) |
+
+Without policies, all of these work.
+
+## TLS and encryption
+
+### CF ↔ user
+
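+Always TLS. The minimum protocol version is whatever the zone's
+"Minimum TLS Version" setting allows, so TLS 1.2+ holds only if that
+setting is 1.2 or higher. CF presents an automatically renewed
+Let's Encrypt or CF-managed cert for `*.myhoneydue.com`.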
+
+### CF ↔ origin
+
+**Plaintext HTTP** (SSL = Flexible). An attacker with access to the
+Cloudflare-to-Hetzner path could read traffic. In practice that path is
+limited to Cloudflare, Hetzner, and any transit networks between them.
+
+**TODO** (Chapter 20): Upgrade to SSL = Full (strict) with a Cloudflare
+Origin CA certificate. This encrypts CF ↔ origin and verifies that
+origin's cert is the CF-issued one (prevents MitM if DNS is compromised).
+
+### API ↔ Neon Postgres
+
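+**TLS 1.3** via `DB_SSLMODE=require`. The Go app's postgres driver (pgx)
+negotiates TLS, and the connection fails if TLS can't be established.
+Note that `require` encrypts the connection but does not verify Neon's
+certificate; that needs `verify-full`, which checks the cert chain and
+hostname.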
+
+### API ↔ Backblaze B2
+
+**HTTPS** (B2 doesn't support HTTP). `B2_USE_SSL=true` in our ConfigMap
+(though actually the app reads `STORAGE_USE_SSL` — see Chapter 9 for this
+vestigial variable's story).
+
+### Worker ↔ Fastmail SMTP
+
+**STARTTLS** on port 587. The Go `wneessen/go-mail` library uses
+`TLSOpportunistic` mode, which means it connects in plaintext, upgrades
+via STARTTLS when the server offers it, and otherwise continues
+unencrypted. Fastmail always supports STARTTLS, so in practice every
+connection is encrypted; `TLSMandatory` would turn that into a guarantee.
+
+### API/worker ↔ Redis
+
+**Plaintext** inside the cluster. Redis 7 supports TLS (redis-tls.conf,
+`redis-server --tls-port`), but we haven't enabled it because Redis is
+on the overlay network, not exposed externally, and only holds cache +
+queue state.
+
+### Pod-to-pod (Flannel overlay)
+
+**Plaintext VXLAN** over Hetzner's public network. See
+[Chapter 3 §Layer 3](./03-networking.md#layer-3--pod-overlay-flannel-vxlan).
+TODO to switch to WireGuard backend.
+
+## Secrets management
+
+### Kubernetes Secrets
+
+Our k8s Secrets are stored in etcd. etcd-at-rest encryption is **not
+currently enabled** — a compromise of the etcd data directory would
+expose Secret values. Given:
+- Nodes have disk encryption at the Hetzner hypervisor layer
+- Attacker needs root on the node to read etcd
+- Our operator access is already root-via-sudo
+
+This is an accepted risk. **TODO** (Chapter 20): enable encryption at rest
+for etcd. K3s supports it via `--secrets-encryption` flag on the server.
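+
+K3s also reads server flags from its config file, so the flag could be
+set there instead of on the command line. A sketch, assuming the servers
+use the default config path; check the K3s secrets-encryption docs for
+the procedure on an existing HA cluster before applying it:
+
+```yaml
+# /etc/rancher/k3s/config.yaml on each server node (assumed location)
+# Equivalent to passing --secrets-encryption to `k3s server`
+secrets-encryption: true
+```
+
+Secrets written before encryption is enabled stay in plaintext in the
+datastore until they are rewritten.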
+
+### What Secrets we have
+
+```
+$ kubectl get secrets -n honeydue
+NAME                TYPE                             DATA   AGE
+gitea-credentials   kubernetes.io/dockerconfigjson   1      ...
+honeydue-apns-key   Opaque                           1      ...
+honeydue-secrets    Opaque                           9      ...
+```
+
+Contents:
+
+| Secret | Key | Contents |
+|---|---|---|
+| `gitea-credentials` | `.dockerconfigjson` | PAT for Gitea registry (image pulls) |
+| `honeydue-apns-key` | `apns_auth_key.p8` | Placeholder p8 file (push off) |
+| `honeydue-secrets` | `POSTGRES_PASSWORD` | Neon DB password |
+| `honeydue-secrets` | `SECRET_KEY` | 64-char random, app signing key |
+| `honeydue-secrets` | `EMAIL_HOST_PASSWORD` | Fastmail app password |
+| `honeydue-secrets` | `FCM_SERVER_KEY` | "disabled-no-push-accounts-yet" placeholder |
+| `honeydue-secrets` | `REDIS_PASSWORD` | Empty (no auth on internal Redis) |
+| `honeydue-secrets` | `B2_KEY_ID` | B2 app key ID |
+| `honeydue-secrets` | `B2_APP_KEY` | B2 app key secret |
+| `honeydue-secrets` | `ADMIN_EMAIL` | `admin@myhoneydue.com` |
+| `honeydue-secrets` | `ADMIN_PASSWORD` | Generated 24-char initial admin password |
+
+### Source of truth
+
+The Secret values came from:
+- `deploy/secrets/*.txt` files on the operator workstation (gitignored)
+- `deploy/prod.env` (gitignored)
+- `deploy/registry.env` (gitignored)
+
+These Swarm-era files are still the canonical source. If you need to
+recreate Secrets in a new cluster:
+
+```bash
+cd honeyDueAPI-go
+kubectl create secret generic honeydue-secrets -n honeydue \
+  --from-literal=POSTGRES_PASSWORD="$(cat deploy/secrets/postgres_password.txt)" \
+  --from-literal=SECRET_KEY="$(cat deploy/secrets/secret_key.txt)" \
+  --from-literal=EMAIL_HOST_PASSWORD="$(cat deploy/secrets/email_host_password.txt)" \
+  ...
+```
+
+The full recreation script is in Chapter 17 (Runbook).
+
+### Secret rotation
+
+Not automated. To rotate (e.g., after a compromise):
+
+1. Generate new value: `openssl rand -base64 32`
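+2. Update only that key in the secret (a merge patch leaves the other
+   keys untouched; re-creating the secret from a single
+   `--from-literal` risks dropping them):
+   ```bash
+   kubectl patch secret honeydue-secrets -n honeydue --type merge \
+     -p '{"stringData":{"SECRET_KEY":"new-value"}}'
+   ```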
+3. Restart dependent pods:
+   ```bash
+   kubectl rollout restart -n honeydue deploy/api deploy/worker
+   ```
+4. Update `deploy/secrets/secret_key.txt` to match
+5. Revoke the old credential at the source (Neon, Fastmail, etc.)
+
+## Container image provenance
+
+Images come from `gitea.treytartt.com/admin/*`. We have **no image
+signing or verification** (cosign/sigstore) in place. A compromise of
+the Gitea registry = the ability to push malicious images that would be
+pulled into prod on the next rollout.
+
+Mitigations:
+- Gitea itself is behind login; PAT is scoped to read:packages +
+  write:packages only
+- Gitea runs on the operator's infrastructure (same operator account)
+- Image tags are git-commit SHAs (`:237c6b8`), not `:latest`. Tags are
+  still mutable in the registry, so an attacker replacing a tagged image
+  is only caught if we compare image digests
+
+**TODO** (Chapter 20): Add cosign signing at build time, verify at pull
+time.
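+
+Short of signing, referencing images by digest is one way to make a pull
+immutable, because a digest cannot be repointed the way a tag can. A
+sketch of what that could look like in a Deployment; `<image>` and
+`<digest>` are placeholders, not values from this repo:
+
+```yaml
+containers:
+  - name: api
+    # Pull by content digest instead of a mutable tag
+    image: gitea.treytartt.com/admin/<image>@sha256:<digest>
+```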
+
+## Operator workstation security
+
+The operator workstation has:
+- macOS with FileVault (full disk encryption)
+- Login password required
+- Private keys in `~/.ssh/` (mode 0600)
+- Kubeconfig at `~/.kube/honeydue-k3s.yaml` (mode 0600), which contains
+  cluster-admin credentials for the cluster
+
+**Losing the laptop would require immediate credential rotation:**
+- New SSH key, redeploy public part on all 3 nodes
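+- New kubeconfig: re-copying `/etc/rancher/k3s/k3s.yaml` from hetzner1
+  returns the same admin credentials, so they must be invalidated on the
+  cluster first (the `k3s certificate` subcommands are the place to
+  start); then run `sudo cat /etc/rancher/k3s/k3s.yaml`, copy it to the
+  workstation, and update the `KUBECONFIG` env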
+- Rotate operator-access PATs on Gitea, Neon, Cloudflare, Backblaze
+
+## Compliance notes
+
+This stack is **not currently certified** for:
+- HIPAA — we transit and store health-related data but haven't signed a
+  BAA with any provider
+- SOC 2 — no auditing, no documented controls beyond this document
+- PCI-DSS — we don't handle card data; Apple/Google IAP handles payments
+- GDPR — we follow GDPR best practices (data minimization, user deletion)
+  but haven't had a formal assessment
+
+If honeyDue ever needs any of these, the infrastructure is compatible
+but the operational processes around it would need formal work.
+
+## Operator cheat sheet
+
+```bash
+# See all RBAC-related resources in a namespace
+kubectl get sa,role,rolebinding -n honeydue
+
+# Check what a ServiceAccount can do
+kubectl auth can-i --list --as=system:serviceaccount:honeydue:api -n honeydue
+
+# Verify pod is running with expected security context
+kubectl get pod <pod> -n honeydue -o jsonpath='{.spec.securityContext}'
+kubectl get pod <pod> -n honeydue -o jsonpath='{.spec.containers[0].securityContext}'
+
+# List all Secrets (without revealing content)
+kubectl get secret -n honeydue
+kubectl describe secret honeydue-secrets -n honeydue  # shows keys, not values
+
+# Decode a secret (CAREFUL: prints plaintext)
+kubectl get secret honeydue-secrets -n honeydue -o jsonpath='{.data.SECRET_KEY}' | base64 -d
+```
+
+## References
+
+- [Kubernetes Pod Security Standards][psa]
+- [Kubernetes RBAC][rbac]
+- [Kubernetes NetworkPolicy][netpol]
+- [Cloudflare IP ranges][cf-ips]
+- [K3s secrets encryption][k3s-secrets]
+- [SSH hardening guide][ssh-guide]
+
+[psa]: https://kubernetes.io/docs/concepts/security/pod-security-standards/
+[rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
+[netpol]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
+[cf-ips]: https://www.cloudflare.com/ips/
+[k3s-secrets]: https://docs.k3s.io/security/secrets-encryption
+[ssh-guide]: https://linux-audit.com/audit-and-harden-your-ssh-configuration/