docs/deployment: record security hardening pass + webapp + APNs

Mark roadmap items done (network policies, Traefik middleware, CF Full strict, CF IP UFW restriction, webapp deploy, APNs wired up, admin URL-baking fix, admin probe bug).

Update Chapter 4 (firewall rule inventory now shows CF-only :443, no :80), Chapter 6 (request flow walks through TLS on :443 and middleware hops), Chapter 13 (CF SSL mode is Full strict, not Flexible; documents the origin cert install), Chapter 7 (adds the web service section — proxy pattern, 3 replicas, PostHog build-args), and Appendix C (web manifests, CF origin cert paths on disk, APNs .p8 path, updated network-policies applied status).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -19,69 +19,55 @@ minute, with Slack/email alerts on failure.
 **Effort**: ~30 min for Uptime Kuma deploy, ~10 min for Better Stack
 signup.
 
-### Cloudflare origin IP restriction
+### ~~Cloudflare origin IP restriction~~ ✓ DONE (2026-04-24)
 
-**Why**: UFW allows :80 from anywhere. If node IPs leak, direct-connect
-attackers bypass CF's WAF/DDoS protection.
+Both `:80` and `:443` `Anywhere` rules removed on all 3 nodes. Only
+CF's 15 IPv4 + 7 IPv6 ranges allowed on `:443`. Direct-connect attempts
+from non-CF IPs time out.
 
-**How**: Replace the anywhere-80 UFW rule with 15 IPv4 + 7 IPv6 CF
-ranges. See [Chapter 13 §CF IP ranges](./13-cloudflare.md#cloudflare-ip-ranges-used-in-traefik-trustedips).
+**Still TODO**: monthly automated refresh of the CF IP list. Ranges
+change rarely; manual re-run of `scripts/ufw-cf-refresh.sh` (not yet
+written) on cadence is acceptable for now.
 
-Automation: a small script that refreshes the CF IP list monthly and
-re-applies UFW rules.
+### ~~Enable network policies in k3s~~ ✓ DONE (2026-04-24)
 
-**Effort**: 1 hour.
+Applied with one scaffold correction: Traefik runs as a DaemonSet with
+`hostNetwork: true`, so traffic from it arrives with the **node IP** as
+source rather than a pod IP. The original scaffold used
+`namespaceSelector: kube-system` which doesn't match hostNetwork
+traffic. Fixed by using an `ipBlock` list of the three node IPs plus
+the cluster pod CIDR `10.42.0.0/16`.
 
-### Enable network policies in k3s
+Also added policies for `web` (missing from the original scaffold).
 
-**Why**: Currently pods can freely egress anywhere. A compromised pod
-could exfiltrate data or attack lateral services.
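Review note: the `ipBlock` correction described for the network policies might look roughly like the sketch below. This is not the contents of `deploy-k3s/manifests/network-policies.yaml`. The `honeydue` namespace is inferred from the `honeydue-security-headers@kubernetescrd` middleware reference, and the policy name, the `app: api` pod label, and the three node addresses (documentation-range placeholders) are assumptions. Only the pod CIDR `10.42.0.0/16` and api port 8000 come from the diff.

```yaml
# Sketch only: allow ingress to the api pods from hostNetwork Traefik.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-traefik-to-api        # hypothetical name
  namespace: honeydue               # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: api                      # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        # hostNetwork traffic arrives with the node IP as its source,
        # so the nodes are matched by address rather than namespaceSelector.
        - ipBlock:
            cidr: 203.0.113.10/32   # placeholder for node 1
        - ipBlock:
            cidr: 203.0.113.11/32   # placeholder for node 2
        - ipBlock:
            cidr: 203.0.113.12/32   # placeholder for node 3
        - ipBlock:
            cidr: 10.42.0.0/16      # cluster pod CIDR from the diff
      ports:
        - protocol: TCP
          port: 8000
```

Listing each node as a `/32` keeps the allowlist as narrow as the three-node cluster permits; a wider CIDR would also admit any other host in that range.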
+### ~~Apply Traefik security middleware~~ ✓ DONE (2026-04-24)
 
-**How**: `kubectl apply -f deploy-k3s/manifests/network-policies.yaml`.
-The scaffold defines default-deny + explicit allows for:
-- DNS egress for all pods
-- Traefik → api (port 8000)
-- Traefik → admin (port 3000)
-- api/worker → Redis
-- api/worker → external services (Postgres, B2, Fastmail)
+`security-headers` + `rate-limit` attached to all three ingresses
+(api, admin, web). `admin-auth` is defined but not attached (needs an
+`admin-basic-auth` secret we haven't created). `cloudflare-only` IP
+allowlist exists but is redundant with the UFW-level CF restriction —
+keep for defense in depth if we ever expose another layer.
 
-Then test that nothing breaks (might need to adjust allow rules).
-
-**Effort**: 1-2 hours including testing.
-
-### Apply Traefik security middleware
-
-**Why**: Our current Ingress has no rate limiting or security headers
-beyond what Traefik adds by default.
-
-**How**: Apply `deploy-k3s/manifests/ingress/middleware.yaml`, annotate
-Ingresses to use them:
-
-```yaml
-metadata:
-  annotations:
-    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
-```
-
-**Effort**: 15 min.
+One scaffold correction: the `Content-Security-Policy` header in
+`security-headers.customResponseHeaders` was stripped. The Go API sets
+its own CSP in `internal/router/router.go`, and two CSP headers combine
+via intersection (most restrictive wins), which would break the Google
+Fonts on the marketing landing page. Next.js apps set their own via
+middleware.
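Review note: a minimal sketch of what the two attached middlewares could look like once the CSP entry is removed. This is not the contents of `deploy-k3s/manifests/ingress/middleware.yaml`. The `honeydue` namespace is inferred from the `honeydue-security-headers@kubernetescrd` reference; the API group depends on the installed Traefik version, and every header value and rate-limit number below is an assumption.

```yaml
# Sketch only: security headers with no Content-Security-Policy entry.
apiVersion: traefik.io/v1alpha1     # older Traefik releases use traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: honeydue               # assumed namespace
spec:
  headers:
    contentTypeNosniff: true        # assumed setting
    frameDeny: true                 # assumed setting
    stsSeconds: 31536000            # assumed setting
    customResponseHeaders:
      # Deliberately no Content-Security-Policy: the upstream apps set their own.
      Referrer-Policy: strict-origin-when-cross-origin   # assumed header
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: honeydue               # assumed namespace
spec:
  rateLimit:
    average: 100                    # assumed requests per second
    burst: 200                      # assumed burst size
```

Leaving CSP to the upstream services means each response carries a single policy, so the browser has nothing to intersect.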
 
 ## Medium priority
 
-### Upgrade to CF Full (strict) SSL
+### ~~Upgrade to CF Full (strict) SSL~~ ✓ DONE (2026-04-24)
 
-**Why**: Currently CF↔origin is plain HTTP. An attacker between CF and
-Hetzner could read traffic. Full (strict) mode encrypts this leg with
-a CF-issued origin cert.
+Origin CA cert (`*.myhoneydue.com` + `myhoneydue.com`, 15-year
+validity) stored as `cloudflare-origin-cert` TLS secret. All three
+ingresses reference it via `tls:` blocks. CF mode flipped from
+Flexible to Full (strict). Verified by:
 
-**How**:
-1. Generate Origin CA cert in CF dashboard → SSL/TLS → Origin Server
-2. Create `cloudflare-origin-cert` Secret in k8s
-3. Add `tls:` block to Ingresses
-4. Switch CF SSL mode to Full (strict)
-
-**Effort**: 30 min.
-
-**Citations**: [Cloudflare Origin CA docs][cf-origin-ca]
+- direct-connect to origin on `:443` serves the Origin cert (subject
+  `CN=CloudFlare Origin Certificate`)
+- CF edge continues to serve its own Let's Encrypt cert to browsers
+- both layers now TLS-encrypted
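Review note: the `tls:` block the ingresses reference might look roughly like this for the web client. The host `app.myhoneydue.com` and the secret name `cloudflare-origin-cert` come from the diff; the Ingress name, the `honeydue` namespace, and the backend Service name and port are assumptions for illustration.

```yaml
# Sketch only: Ingress terminating TLS with the Origin CA secret.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web                         # hypothetical name
  namespace: honeydue               # assumed namespace
spec:
  tls:
    - hosts:
        - app.myhoneydue.com
      secretName: cloudflare-origin-cert
  rules:
    - host: app.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web           # assumed Service name
                port:
                  number: 3000      # assumed port
```

The referenced secret has to be of type `kubernetes.io/tls` (keys `tls.crt` and `tls.key`) and live in the same namespace as the Ingress.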
 
 ### Migration Job for schema changes
 
@@ -312,7 +298,16 @@ k3s server on each node with the new backend.
 As items are done, mark them here. Think of this as a running changelog.
 
 - [x] k3s migration from Swarm (2026-04-24)
-- [x] Traefik DaemonSet + hostNetwork
-- [x] Admin seed via ADMIN_EMAIL + ADMIN_PASSWORD
-- [x] Documentation book (this doc set)
+- [x] Traefik DaemonSet + hostNetwork (2026-04-24)
+- [x] Admin seed via ADMIN_EMAIL + ADMIN_PASSWORD (2026-04-24)
+- [x] Documentation book (this doc set) (2026-04-24)
+- [x] Web client deployed at `app.myhoneydue.com` (2026-04-24) — Next.js 16 standalone, 3 replicas with PDB, proxy pattern to api, see Chapter 7.
+- [x] Admin URL-baking fix (2026-04-24) — Dockerfile `ARG NEXT_PUBLIC_API_URL`, `.dockerignore` hardening for `admin/.env.*`.
+- [x] Auto-seed initial data on first API boot (2026-04-24) — `20260414_seed_initial_data` migration populates lookups, admin user, task templates. See commit `4ec4bbb`.
+- [x] APNs wired up (2026-04-24) — Key ID `5L5BVF5G48`, Team ID `X86BR9WTLD`, sandbox mode. Secret `honeydue-apns-key`, `FEATURE_PUSH_ENABLED=true`.
+- [x] Traefik middleware: `security-headers` + `rate-limit` attached to all three ingresses (2026-04-24). CSP is stripped from the middleware because the Go API sets its own.
+- [x] Admin liveness probe path fix (2026-04-24) — was hitting `/admin/` (404) and crashlooping every ~90s for 6 hours before the bug was caught. Fixed to `/`.
+- [x] Network policies applied (2026-04-24) — default-deny + explicit allows. Traefik hostNetwork is matched via node IP `ipBlock`s, not namespaceSelector. See Chapter 5.
+- [x] Cloudflare Full (strict) SSL (2026-04-24) — Origin CA cert installed as `cloudflare-origin-cert` secret, ingresses have `tls:` blocks, CF mode flipped from Flexible. Both user↔CF and CF↔origin now TLS.
+- [x] UFW CF-IP allowlist on all 3 nodes (2026-04-24) — 15 IPv4 + 7 IPv6 CF ranges allow `:443`; `Anywhere` rules for `:80` and `:443` deleted. Direct-connect from non-CF IPs times out.
 - [ ] All other items above