fix(security): remediate 2026-05-12 audit findings (Stages 2–5)

Remediation of the 2026-05-12/13 audits (78 findings + cluster gaps),
tracked in deploy-k3s/SECURITY.md, plus fixes from two independent
post-remediation reviews.

Auth & sessions:
- SHA-256 hashed auth-token storage (C1); prior-token cache eviction on
  re-login (MEDIUM-1)
- local Google JWKS verification, iss/aud/exp checks (C2/C3)
- constant-time login + generic errors (L1/LIVE-L11/LIVE-L13)
- per-account login lockout keyed on distinct source IPs (M5/MEDIUM-3)
- verified-email gating, login rate limiting (LIVE-L19, H1-H3)

IAP & webhooks:
- Apple/Google cross-account replay protection (C5/C6/C10/C13, H5/H6)
- migrations 000003-000006 (token hashing, IAP replay, audit_log +
  webhook_event_log table creation, append-only audit log)

Authorization & races:
- file-ownership owner-OR-member fix (C7), atomic share-code join
  (C9/H9), device-token reassignment (C8/LOW-3)

Secrets & deploy:
- secrets file-mounted at /etc/honeydue/secrets, not env (F8); Redis
  password out of the ConfigMap (HIGH-1); B2 keys reconciled
- digest-pinned images, admin ingress hardening, CSP/HSTS, /metrics
  lockdown; kubeconfig 0600, etcd secrets-encryption, fail2ban +
  unattended-upgrades at provision; secret-rotation runbook

Build, vet, and the full test suite (incl. -race) pass; the goose
migration chain is verified against PostgreSQL 16.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-05-16 22:28:33 -05:00
parent 2004f9c5b2
commit c77ff07ce9
59 changed files with 2819 additions and 1245 deletions
@@ -0,0 +1,146 @@
# Runbook — Secret Rotation
Closes audit finding `K3S-F12` (secrets unrotated since cluster bootstrap,
no rotation cadence). See `deploy-k3s/SECURITY.md` Stage 2.
**Cadence:** rotate every secret at least **annually**. Rotate
**immediately** on suspected exposure, on an operator-device loss, or when
anyone who has seen a secret leaves the project.
**Record keeping:** after each rotation, annotate the secret so the age is
visible:
```bash
kubectl -n honeydue annotate secret <name> \
honeydue.dev/last-rotated="$(date -u +%Y-%m-%d)" --overwrite
```
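To read the annotation back and compare it against the annual cadence, something like the following could work. This is a sketch, not part of the deploy scripts: `days_between` and `secret_age_days` are hypothetical helper names, and the date arithmetic assumes GNU `date` (the `-d` flag).

```bash
# days_between START END: whole days between two YYYY-MM-DD dates.
# Assumes GNU date (-d); BSD/macOS date needs different flags.
days_between() {
  start_s=$(date -u -d "$1" +%s) || return 1
  end_s=$(date -u -d "$2" +%s) || return 1
  echo $(( (end_s - start_s) / 86400 ))
}

# secret_age_days NAME: days since the honeydue.dev/last-rotated annotation.
# Hypothetical helper; fails if the secret has never been annotated.
secret_age_days() {
  rotated=$(kubectl -n honeydue get secret "$1" \
    -o jsonpath='{.metadata.annotations.honeydue\.dev/last-rotated}') || return 1
  [ -n "$rotated" ] || { echo "no last-rotated annotation on $1" >&2; return 1; }
  days_between "$rotated" "$(date -u +%Y-%m-%d)"
}
```

`secret_age_days honeydue-secrets` prints a day count; anything above 365 is past the cadence stated above.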
---
## How rotation works
Every secret has a **source of truth** on the operator workstation. The
deploy scripts read those sources and (re)create the Kubernetes Secrets.
Rotation is always: **update the source → re-run `02-setup-secrets.sh` →
restart the pods that consume it → revoke the old credential at its
provider.**
`02-setup-secrets.sh` renders each Secret with `--dry-run=client -o yaml`
and feeds the result to `kubectl apply`, so re-running it is idempotent:
Secrets whose sources have not changed are left as they are.
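The idempotency comes from the common render-then-apply pattern. The script itself is not reproduced in this runbook, so the following is only a sketch of that general pattern; `apply_secret_from_file` is a hypothetical name, not a function from `02-setup-secrets.sh`.

```bash
# apply_secret_from_file SECRET KEY FILE
# Renders a Secret manifest locally (--dry-run=client sends nothing to the
# cluster), then pipes it to apply. apply creates the Secret if it is absent
# and updates it if it exists, so repeated runs converge on the same state.
apply_secret_from_file() {
  kubectl -n honeydue create secret generic "$1" \
    --from-file="$2=$3" \
    --dry-run=client -o yaml \
  | kubectl -n honeydue apply -f -
}
```

A Secret with several keys, such as `honeydue-secrets`, would need one `--from-file` (or `--from-literal`) argument per key in a single `create` call; this sketch handles one key only.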
| Kubernetes Secret | Source of truth | Consumed by |
|---|---|---|
| `honeydue-secrets` → `POSTGRES_PASSWORD` | `deploy-k3s/secrets/postgres_password.txt` | api, worker |
| `honeydue-secrets` → `SECRET_KEY` | `deploy-k3s/secrets/secret_key.txt` | api, worker |
| `honeydue-secrets` → `EMAIL_HOST_PASSWORD` | `deploy-k3s/secrets/email_host_password.txt` | api, worker |
| `honeydue-secrets` → `FCM_SERVER_KEY` | `deploy-k3s/secrets/fcm_server_key.txt` | api, worker |
| `honeydue-secrets` → `REDIS_PASSWORD` | `config.yaml` key `redis.password` | api, worker, redis |
| `honeydue-secrets` → `OBS_INGEST_TOKEN` | `deploy/prod.env` | api, worker |
| `honeydue-apns-key` → `apns_auth_key.p8` | `deploy-k3s/secrets/apns_auth_key.p8` | api, worker |
| `cloudflare-origin-cert` | `deploy-k3s/secrets/cloudflare-origin.{crt,key}` | Traefik ingress |
| `ghcr-credentials` | `config.yaml` block `registry.*` | image pulls (all pods) |
| `admin-basic-auth` | `config.yaml` keys `admin.basic_auth_user` / `..._password` | Traefik `admin-auth` middleware |
The `deploy-k3s/secrets/` directory and `config.yaml` are **gitignored**;
never commit them.
---
## Standard rotation procedure
```bash
cd honeyDueAPI-go
export KUBECONFIG="$(pwd)/deploy-k3s/kubeconfig"
# 1. Update the source (file under deploy-k3s/secrets/ or a config.yaml key)
# 2. Recreate the Kubernetes Secrets from sources
./deploy-k3s/scripts/02-setup-secrets.sh
# 3. Restart the consumers (see per-secret notes below for which)
kubectl -n honeydue rollout restart deploy/api deploy/worker
# 4. Confirm health
kubectl -n honeydue rollout status deploy/api
kubectl -n honeydue rollout status deploy/worker
# 5. Revoke the OLD credential at its provider (see per-secret notes)
# 6. Annotate the rotated secret with today's date
```
---
## Per-secret notes
### `POSTGRES_PASSWORD`
1. Rotate the role password in the Neon dashboard.
2. Write the new value to `deploy-k3s/secrets/postgres_password.txt`.
3. `02-setup-secrets.sh`, then `rollout restart deploy/api deploy/worker`.
4. Watch logs for connection errors; the old password stops working the
moment Neon applies the change, so do steps 2–3 promptly.
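For step 4, a small filter can make failed logins easier to spot than reading raw logs. This is a sketch: `count_auth_errors` is a hypothetical helper, and the match patterns are assumptions about what the api and worker log (PostgreSQL's own message text is `password authentication failed`).

```bash
# count_auth_errors: reads log lines on stdin and prints how many mention a
# failed database login or a refused connection. The patterns are
# assumptions; adjust them to what the services actually log.
count_auth_errors() {
  # grep -c exits 1 when the count is 0; "|| true" keeps the helper usable
  # under "set -e" while still printing the 0.
  grep -ciE 'password authentication failed|connection refused' || true
}

# Example:
#   kubectl -n honeydue logs deploy/api --since=5m | count_auth_errors
```

A non-zero count shortly after the restart suggests the pods are still running with the old password.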
### `SECRET_KEY` ⚠️ user-visible
This signs auth tokens. **Rotating it logs every user out** — all existing
tokens become invalid and every client must re-authenticate.
1. Generate: `openssl rand -hex 32`.
2. Write to `deploy-k3s/secrets/secret_key.txt` (must be ≥32 chars — the
script enforces this; the app refuses to start in production without it).
3. `02-setup-secrets.sh`, then `rollout restart deploy/api deploy/worker`.
- Only rotate on a schedule or on suspected compromise — not casually.
- A future improvement (overlap window via a key-id header) would let old
tokens validate during the transition; not implemented today.
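Before running the script, the new key file can be checked against the 32-character minimum locally. A minimal sketch, assuming the check counts characters and ignores the trailing newline; `key_length_ok` is a hypothetical helper, and the script's own enforcement may differ in detail.

```bash
# key_length_ok FILE: succeeds if FILE holds at least 32 characters,
# not counting newlines.
key_length_ok() {
  # $(( ... )) strips the padding some wc implementations add.
  len=$(( $(tr -d '\n' < "$1" | wc -c) ))
  [ "$len" -ge 32 ]
}

# Example (openssl rand -hex 32 emits 64 hex characters):
#   openssl rand -hex 32 > deploy-k3s/secrets/secret_key.txt
#   key_length_ok deploy-k3s/secrets/secret_key.txt && echo "length ok"
```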
### `EMAIL_HOST_PASSWORD`
1. Generate a new app password in Fastmail; keep the old one alive briefly.
2. Write to `deploy-k3s/secrets/email_host_password.txt`.
3. `02-setup-secrets.sh`, `rollout restart deploy/api deploy/worker`.
4. Delete the old Fastmail app password.
### `FCM_SERVER_KEY`
1. Rotate the key in the Firebase console.
2. Write to `deploy-k3s/secrets/fcm_server_key.txt`.
3. `02-setup-secrets.sh`, `rollout restart deploy/api deploy/worker`.
### `REDIS_PASSWORD`
Source is `config.yaml` key `redis.password` (hex only — it is embedded in
the `REDIS_URL`, so non-hex characters would break URL parsing).
1. Generate: `openssl rand -hex 32`.
2. Set `redis.password` in `config.yaml`.
3. `02-setup-secrets.sh`.
4. Restart **redis as well as** api/worker so the new `--requirepass` and
the new `REDIS_URL` land together:
`kubectl -n honeydue rollout restart deploy/redis deploy/api deploy/worker`.
Expect a few seconds where api/worker reconnect.
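The hex-only rule can be checked before the value goes into `config.yaml`. This is a sketch with a hypothetical helper name, `is_lower_hex`; it is stricter than URL syntax requires, since it accepts only the lowercase hex that `openssl rand -hex` produces. A password containing `@` or `/`, for example, would be ambiguous inside a URL of the assumed form `redis://:<password>@<host>:<port>`.

```bash
# is_lower_hex VALUE: succeeds only for a non-empty string made up
# entirely of the characters 0-9 and a-f.
is_lower_hex() {
  case "$1" in
    '' | *[!0123456789abcdef]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example:
#   pw=$(openssl rand -hex 32)
#   is_lower_hex "$pw" && echo "safe to embed in REDIS_URL"
```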
### `apns_auth_key.p8`
1. Generate a new `.p8` key in the Apple Developer console.
2. Replace `deploy-k3s/secrets/apns_auth_key.p8`.
3. `02-setup-secrets.sh`, `rollout restart deploy/api deploy/worker`.
4. If the Key ID changed, update `push.apns_key_id` in `config.yaml` too.
5. Revoke the old key in the Apple Developer console once pushes succeed
with the new one.
### `cloudflare-origin-cert`
1. Generate a new Origin CA certificate in the Cloudflare dashboard.
2. Replace `deploy-k3s/secrets/cloudflare-origin.crt` and `.key`.
3. `02-setup-secrets.sh`. Traefik picks up the new TLS secret; no app
restart needed. Verify the served cert with `openssl s_client`.
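One way to do the `openssl s_client` check is to compare SHA-256 fingerprints of the served and on-disk certificates. This is a sketch: both function names are hypothetical, port 443 is an assumption, and the connection has to go to the origin address directly, because the public hostname behind Cloudflare's proxy presents Cloudflare's edge certificate rather than the origin certificate.

```bash
# served_fingerprint ORIGIN_ADDR SNI_NAME
# Fingerprint of the certificate the origin serves for SNI_NAME.
served_fingerprint() {
  openssl s_client -connect "$1:443" -servername "$2" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha256
}

# local_fingerprint FILE: fingerprint of a PEM certificate on disk.
local_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256
}

# Example (placeholder address and hostname, not real values):
#   [ "$(served_fingerprint 192.0.2.10 api.example.test)" = \
#     "$(local_fingerprint deploy-k3s/secrets/cloudflare-origin.crt)" ] \
#     && echo "Traefik is serving the new certificate"
```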
### `ghcr-credentials` (Gitea registry)
1. Generate a new PAT in Gitea (scope: `read:packages`).
2. Update the `registry.token` value in `config.yaml`.
3. `02-setup-secrets.sh`. No restart needed unless a pull is pending.
4. Revoke the old PAT in Gitea.
### `admin-basic-auth`
Source is `config.yaml` keys `admin.basic_auth_user` / `basic_auth_password`.
1. Set a new password (e.g. `openssl rand -hex 24`).
2. `02-setup-secrets.sh` regenerates the bcrypt htpasswd secret.
3. No app restart needed — Traefik reloads the `admin-auth` middleware.
4. Distribute the new credential to whoever uses the admin panel.
---
## After any rotation
- Run `./deploy-k3s/scripts/04-verify.sh` and confirm no `✗` lines.
- Annotate the rotated secret (see "Record keeping" above).
- If the rotation was due to a compromise, also follow the relevant
playbook in `deploy-k3s/SECURITY.md` → Appendix (Incident response).
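The `✗` check in the first bullet can be turned into an exit status for use in a wrapper script. A minimal sketch; `no_failures` is a hypothetical helper and relies only on the `✗` marker mentioned above.

```bash
# no_failures: reads verify output on stdin; succeeds only if no line
# contains the failure marker.
no_failures() {
  ! grep -q '✗'
}

# Example:
#   ./deploy-k3s/scripts/04-verify.sh | no_failures \
#     || echo "verify reported failures" >&2
```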