docs/deployment: record security hardening pass + webapp + APNs

Mark roadmap items done (network policies, Traefik middleware, CF Full
strict, CF IP UFW restriction, webapp deploy, APNs wired up, admin
URL-baking fix, admin probe bug). Update Chapter 4 (firewall rule
inventory now shows CF-only :443, no :80), Chapter 6 (request flow
walks through TLS on :443 and middleware hops), Chapter 13 (CF SSL
mode is Full strict, not Flexible; documents the origin cert
install), Chapter 7 (adds the web service section — proxy pattern,
3 replicas, PostHog build-args), and Appendix C (web manifests, CF
origin cert paths on disk, APNs .p8 path, updated network-policies
applied status).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-04-24 15:50:59 -05:00
parent ace03d2340
commit 7e77e3bbab
6 changed files with 198 additions and 124 deletions
+48 -53
@@ -19,69 +19,55 @@ minute, with Slack/email alerts on failure.
 **Effort**: ~30 min for Uptime Kuma deploy, ~10 min for Better Stack
 signup.
-### Cloudflare origin IP restriction
-**Why**: UFW allows :80 from anywhere. If node IPs leak, direct-connect
-attackers bypass CF's WAF/DDoS protection.
-**How**: Replace the anywhere-80 UFW rule with 15 IPv4 + 7 IPv6 CF
-ranges. See [Chapter 13 §CF IP ranges](./13-cloudflare.md#cloudflare-ip-ranges-used-in-traefik-trustedips).
-Automation: a small script that refreshes the CF IP list monthly and
-re-applies UFW rules.
-**Effort**: 1 hour.
+### ~~Cloudflare origin IP restriction~~ ✓ DONE (2026-04-24)
+Both `:80` and `:443` `Anywhere` rules removed on all 3 nodes. Only
+CF's 15 IPv4 + 7 IPv6 ranges allowed on `:443`. Direct-connect attempts
+from non-CF IPs time out.
+**Still TODO**: monthly automated refresh of the CF IP list. Ranges
+change rarely; manual re-run of `scripts/ufw-cf-refresh.sh` (not yet
+written) on cadence is acceptable for now.
-### Enable network policies in k3s
-**Why**: Currently pods can freely egress anywhere. A compromised pod
-could exfiltrate data or attack lateral services.
-**How**: `kubectl apply -f deploy-k3s/manifests/network-policies.yaml`.
-The scaffold defines default-deny + explicit allows for:
-- DNS egress for all pods
-- Traefik → api (port 8000)
-- Traefik → admin (port 3000)
-- api/worker → Redis
-- api/worker → external services (Postgres, B2, Fastmail)
-Then test that nothing breaks (might need to adjust allow rules).
-**Effort**: 1-2 hours including testing.
+### ~~Enable network policies in k3s~~ ✓ DONE (2026-04-24)
+Applied with one scaffold correction: Traefik runs as a DaemonSet with
+`hostNetwork: true`, so traffic from it arrives with the **node IP** as
+source rather than a pod IP. The original scaffold used
+`namespaceSelector: kube-system` which doesn't match hostNetwork
+traffic. Fixed by using an `ipBlock` list of the three node IPs plus
+the cluster pod CIDR `10.42.0.0/16`.
+Also added policies for `web` (missing from the original scaffold).
-### Apply Traefik security middleware
-**Why**: Our current Ingress has no rate limiting or security headers
-beyond what Traefik adds by default.
-**How**: Apply `deploy-k3s/manifests/ingress/middleware.yaml`, annotate
-Ingresses to use them:
-```yaml
-metadata:
-  annotations:
-    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
-```
-**Effort**: 15 min.
+### ~~Apply Traefik security middleware~~ ✓ DONE (2026-04-24)
+`security-headers` + `rate-limit` attached to all three ingresses
+(api, admin, web). `admin-auth` is defined but not attached (needs an
+`admin-basic-auth` secret we haven't created). `cloudflare-only` IP
+allowlist exists but is redundant with the UFW-level CF restriction —
+keep for defense in depth if we ever expose another layer.
+One scaffold correction: the `Content-Security-Policy` header in
+`security-headers.customResponseHeaders` was stripped. The Go API sets
+its own CSP in `internal/router/router.go`, and two CSP headers combine
+via intersection (most restrictive wins), which would break the Google
+Fonts on the marketing landing page. Next.js apps set their own via
+middleware.
 ## Medium priority
-### Upgrade to CF Full (strict) SSL
-**Why**: Currently CF↔origin is plain HTTP. An attacker between CF and
-Hetzner could read traffic. Full (strict) mode encrypts this leg with
-a CF-issued origin cert.
-**How**:
-1. Generate Origin CA cert in CF dashboard → SSL/TLS → Origin Server
-2. Create `cloudflare-origin-cert` Secret in k8s
-3. Add `tls:` block to Ingresses
-4. Switch CF SSL mode to Full (strict)
-**Effort**: 30 min.
-**Citations**: [Cloudflare Origin CA docs][cf-origin-ca]
+### ~~Upgrade to CF Full (strict) SSL~~ ✓ DONE (2026-04-24)
+Origin CA cert (`*.myhoneydue.com` + `myhoneydue.com`, 15-year
+validity) stored as `cloudflare-origin-cert` TLS secret. All three
+ingresses reference it via `tls:` blocks. CF mode flipped from
+Flexible to Full (strict). Verified by:
+- direct-connect to origin on `:443` serves the Origin cert (subject
+  `CN=CloudFlare Origin Certificate`)
+- CF edge continues to serve its own Let's Encrypt cert to browsers
+- both layers now TLS-encrypted
 ### Migration Job for schema changes
@@ -312,7 +298,16 @@ k3s server on each node with the new backend.
 As items are done, mark them here. Think of this as a running changelog.
 - [x] k3s migration from Swarm (2026-04-24)
-- [x] Traefik DaemonSet + hostNetwork
-- [x] Admin seed via ADMIN_EMAIL + ADMIN_PASSWORD
-- [x] Documentation book (this doc set)
+- [x] Traefik DaemonSet + hostNetwork (2026-04-24)
+- [x] Admin seed via ADMIN_EMAIL + ADMIN_PASSWORD (2026-04-24)
+- [x] Documentation book (this doc set) (2026-04-24)
+- [x] Web client deployed at `app.myhoneydue.com` (2026-04-24) — Next.js 16 standalone, 3 replicas with PDB, proxy pattern to api, see Chapter 7.
+- [x] Admin URL-baking fix (2026-04-24) — Dockerfile `ARG NEXT_PUBLIC_API_URL`, `.dockerignore` hardening for `admin/.env.*`.
+- [x] Auto-seed initial data on first API boot (2026-04-24) — `20260414_seed_initial_data` migration populates lookups, admin user, task templates. See commit `4ec4bbb`.
+- [x] APNs wired up (2026-04-24) — Key ID `5L5BVF5G48`, Team ID `X86BR9WTLD`, sandbox mode. Secret `honeydue-apns-key`, `FEATURE_PUSH_ENABLED=true`.
+- [x] Traefik middleware: `security-headers` + `rate-limit` attached to all three ingresses (2026-04-24). CSP is stripped from the middleware because the Go API sets its own.
+- [x] Admin liveness probe path fix (2026-04-24) — was hitting `/admin/` (404) and crashlooping every ~90s for 6 hours before the bug was caught. Fixed to `/`.
+- [x] Network policies applied (2026-04-24) — default-deny + explicit allows. Traefik hostNetwork is matched via node IP `ipBlock`s, not namespaceSelector. See Chapter 5.
+- [x] Cloudflare Full (strict) SSL (2026-04-24) — Origin CA cert installed as `cloudflare-origin-cert` secret, ingresses have `tls:` blocks, CF mode flipped from Flexible. Both user↔CF and CF↔origin now TLS.
+- [x] UFW CF-IP allowlist on all 3 nodes (2026-04-24) — 15 IPv4 + 7 IPv6 CF ranges allow `:443`; `Anywhere` rules for `:80` and `:443` deleted. Direct-connect from non-CF IPs times out.
 - [ ] All other items above
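The manifest below sketches the network-policies entry: default-deny, DNS egress, and Traefik matched by node-IP `ipBlock`s. It is not the contents of `deploy-k3s/manifests/network-policies.yaml`. It assumes the following:

- the `honeydue` namespace implied by the middleware annotation
- a hypothetical `app: api` pod label
- RFC 5737 placeholder addresses (`203.0.113.11` through `203.0.113.13`) in place of the real node IPs

Only the pod CIDR `10.42.0.0/16` and the api port 8000 come from the roadmap.

```yaml
# Sketch only, not the repo's network-policies.yaml.
# Assumptions: namespace "honeydue", pod label app=api, placeholder node IPs.

# Default-deny: selects every pod in the namespace and allows nothing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: honeydue
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# DNS egress for all pods to CoreDNS in kube-system. Relies on the
# kubernetes.io/metadata.name label that Kubernetes sets on every namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: honeydue
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
# Traefik -> api. Traefik uses hostNetwork, so its packets carry a node IP
# as source; a namespaceSelector on kube-system would never match them.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-traefik-to-api
  namespace: honeydue
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 203.0.113.11/32   # placeholder for node 1
        - ipBlock:
            cidr: 203.0.113.12/32   # placeholder for node 2
        - ipBlock:
            cidr: 203.0.113.13/32   # placeholder for node 3
        - ipBlock:
            cidr: 10.42.0.0/16      # cluster pod CIDR, per the roadmap
      ports:
        - protocol: TCP
          port: 8000
```

Entries in a `from` list are OR-ed, so a packet matching any one `ipBlock` is admitted on port 8000.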
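For the Traefik middleware entry, the block below sketches how `security-headers` and `rate-limit` could look as Traefik `Middleware` resources. The following are illustrative assumptions, not the values in `deploy-k3s/manifests/ingress/middleware.yaml`:

- the header set
- the rate numbers
- the `CF-Connecting-IP` source criterion

The `traefik.io/v1alpha1` API group assumes Traefik v2.10 or later; older k3s bundles use `traefik.containo.us/v1alpha1`. One detail does come from the roadmap: no `Content-Security-Policy` is set here.

```yaml
# Sketch only; values are illustrative. Names match the
# honeydue-security-headers@kubernetescrd and honeydue-rate-limit@kubernetescrd
# references used in the Ingress annotation.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: honeydue
spec:
  headers:
    stsSeconds: 31536000          # HSTS max-age of one year (assumed value)
    stsIncludeSubdomains: true
    contentTypeNosniff: true      # X-Content-Type-Options: nosniff
    frameDeny: true               # X-Frame-Options: DENY
    referrerPolicy: strict-origin-when-cross-origin
    # Deliberately no Content-Security-Policy: with two CSP headers the
    # browser enforces both, so content must satisfy each of them.
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: honeydue
spec:
  rateLimit:
    average: 100                  # requests per period (assumed value)
    burst: 200                    # assumed value
    period: 1s
    sourceCriterion:
      # Behind Cloudflare the TCP peer is a CF edge IP; keying on the header
      # CF sets gives one bucket per end user instead of one per edge node.
      requestHeaderName: CF-Connecting-IP
```

Keying on a request header is only reasonable because the UFW allowlist means every request on `:443` has passed through Cloudflare. On a directly reachable origin, a client could set that header itself.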
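The Cloudflare Full (strict) entry comes down to two pieces: the origin cert Secret and a `tls:` block on each Ingress. Below is a minimal sketch for the web ingress. It assumes the `honeydue` namespace and a hypothetical `web` Service on port 3000; the real service name and port are not stated here. The secret name, host, and middleware annotation come from the roadmap.

```yaml
# Sketch of the web Ingress with the Cloudflare Origin CA cert attached.
# Assumed: namespace "honeydue", Service "web" listening on port 3000.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: honeydue
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
spec:
  tls:
    - hosts:
        - app.myhoneydue.com
      # kubernetes.io/tls Secret holding the Origin CA cert and key
      # (*.myhoneydue.com + myhoneydue.com).
      secretName: cloudflare-origin-cert
  rules:
    - host: app.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 3000
```

The Secret must live in the same namespace as each Ingress that references it. In Full (strict) mode the CF edge accepts a Cloudflare Origin CA cert as valid, which is why a cert browsers would not trust is fine on this leg.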
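The admin liveness probe fix is a one-line path change, shown below as a container-level fragment. Port 3000 is the admin port listed in the network-policy scaffold. The timing fields are assumed values, not the ones in the admin Deployment.

```yaml
# Fragment of the admin container spec (sketch; timings are assumed).
livenessProbe:
  httpGet:
    path: /          # was /admin/, which returned 404 and failed every check
    port: 3000
  initialDelaySeconds: 15
  periodSeconds: 30
  failureThreshold: 3   # restart after three consecutive failures
```

An `httpGet` probe counts only 2xx/3xx responses as success, and `failureThreshold` consecutive failures trigger a restart. As a result, a path that always returns 404 produces a restart loop rather than a one-off error.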
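The web client entry mentions 3 replicas with a PDB. Below is a sketch of such a budget. The `app: web` selector, the `honeydue` namespace, and `minAvailable: 2` are assumptions.

```yaml
# Sketch: keep at least 2 of the 3 web replicas running during voluntary
# disruptions such as `kubectl drain`. Selector label is assumed.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
  namespace: honeydue
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```

A PDB only limits voluntary evictions (drains and other Eviction API calls). It does not protect against a node crashing.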
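For the APNs entry, one way to hand the `.p8` key to the API is to mount the `honeydue-apns-key` Secret as a read-only file, alongside `FEATURE_PUSH_ENABLED=true`. The fragment below assumes that approach.

- The container name, volume name, and mount path are hypothetical.
- The environment variables the API reads for the key path, Key ID, and Team ID are not shown, because their names are not given here.

```yaml
# Fragment of the api Deployment (sketch). Names and paths are hypothetical.
spec:
  template:
    spec:
      containers:
        - name: api
          env:
            - name: FEATURE_PUSH_ENABLED
              value: "true"          # env values must be strings, hence the quotes
          volumeMounts:
            - name: apns-key
              mountPath: /etc/apns   # hypothetical; each Secret key becomes a file here
              readOnly: true
      volumes:
        - name: apns-key
          secret:
            secretName: honeydue-apns-key
```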