Trey t c77ff07ce9
fix(security): remediate 2026-05-12 audit findings (Stages 2–5)
Remediation of the 2026-05-12/13 audits (78 findings + cluster gaps),
tracked in deploy-k3s/SECURITY.md, plus fixes from two independent
post-remediation reviews.

Auth & sessions:
- SHA-256 hashed auth-token storage (C1); prior-token cache eviction on
  re-login (MEDIUM-1)
- local Google JWKS verification, iss/aud/exp checks (C2/C3)
- constant-time login + generic errors (L1/LIVE-L11/LIVE-L13)
- per-account login lockout keyed on distinct source IPs (M5/MEDIUM-3)
- verified-email gating, login rate limiting (LIVE-L19, H1-H3)

IAP & webhooks:
- Apple/Google cross-account replay protection (C5/C6/C10/C13, H5/H6)
- migrations 000003-000006 (token hashing, IAP replay, audit_log +
  webhook_event_log table creation, append-only audit log)

Authorization & races:
- file-ownership owner-OR-member fix (C7), atomic share-code join
  (C9/H9), device-token reassignment (C8/LOW-3)

Secrets & deploy:
- secrets file-mounted at /etc/honeydue/secrets, not env (F8); Redis
  password out of the ConfigMap (HIGH-1); B2 keys reconciled
- digest-pinned images, admin ingress hardening, CSP/HSTS, /metrics
  lockdown; kubeconfig 0600, etcd secrets-encryption, fail2ban +
  unattended-upgrades at provision; secret-rotation runbook

Build, vet, and the full test suite (incl. -race) pass; the goose
migration chain is verified against PostgreSQL 16.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:28:33 -05:00

# Runbook — Secret Rotation
Closes audit finding `K3S-F12` (secrets unrotated since cluster bootstrap,
no rotation cadence). See `deploy-k3s/SECURITY.md` Stage 2.

**Cadence:** rotate every secret at least **annually**. Rotate
**immediately** on suspected exposure, on loss of an operator device, or when
anyone who has seen a secret leaves the project.

**Record keeping:** after each rotation, annotate the secret so its age is
visible:

```bash
kubectl -n honeydue annotate secret <name> \
  honeydue.dev/last-rotated="$(date -u +%Y-%m-%d)" --overwrite
```
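
The annotation can also be read back to see which Secrets are past the annual cadence. The sketch below is an illustration, not a script that exists in `deploy-k3s/`: `is_overdue`, `report_rotation_age` and `ROTATED_SECRETS` are hypothetical names, and the secret list is copied from the source-of-truth table in this runbook. A Secret with no annotation (or one that cannot be read) is reported as overdue. Source the file and run `report_rotation_age`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of deploy-k3s/): report Secrets whose
# honeydue.dev/last-rotated annotation is missing or more than a year old.

NAMESPACE="${NAMESPACE:-honeydue}"

# Secret names taken from the source-of-truth table in this runbook.
ROTATED_SECRETS=(honeydue-secrets honeydue-apns-key cloudflare-origin-cert
                 ghcr-credentials admin-basic-auth)

# is_overdue LAST TODAY: succeeds if LAST (YYYY-MM-DD) is empty or earlier than
# the same calendar day one year before TODAY. Fixed-width ISO dates sort
# correctly as plain strings, so no GNU/BSD-specific `date -d` math is needed.
is_overdue() {
  local last="$1" today="$2"
  local cutoff="$(( ${today:0:4} - 1 ))${today:4}"
  [[ -z "$last" || "$last" < "$cutoff" ]]
}

report_rotation_age() {
  local today secret last
  today="$(date -u +%Y-%m-%d)"
  for secret in "${ROTATED_SECRETS[@]}"; do
    # Dots inside the annotation key must be escaped in kubectl JSONPath.
    last="$(kubectl -n "$NAMESPACE" get secret "$secret" \
      -o jsonpath='{.metadata.annotations.honeydue\.dev/last-rotated}' 2>/dev/null || true)"
    if is_overdue "$last" "$today"; then
      printf 'OVERDUE  %-24s last-rotated=%s\n' "$secret" "${last:-never}"
    else
      printf 'ok       %-24s last-rotated=%s\n' "$secret" "$last"
    fi
  done
}
```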
---
## How rotation works
Every secret has a **source of truth** on the operator workstation. The
deploy scripts read those sources and (re)create the Kubernetes Secrets.
Rotation is always: **update the source → re-run `02-setup-secrets.sh` →
restart the pods that consume it → revoke the old credential at its
provider.**

`02-setup-secrets.sh` uses `kubectl apply` (via `--dry-run=client -o yaml`),
so re-running it is idempotent and only changes what you changed.

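
The create-then-apply idiom is idempotent because `kubectl create ... --dry-run=client -o yaml` only renders a manifest locally, and `kubectl apply` then creates the object or updates it in place. A minimal sketch of that idiom for one file-backed Secret follows; it assumes nothing about how `02-setup-secrets.sh` is actually written, and `apply_secret_from_files` is a hypothetical name.

```bash
# Hypothetical helper illustrating the create-then-apply idiom; the real
# 02-setup-secrets.sh may differ. Each argument after the name is KEY=path.
# --from-file stores the file's bytes verbatim, including a trailing newline,
# so write source files with `printf '%s'` rather than `echo`.
# List every key the Secret should hold: kubectl apply's three-way merge
# removes keys that were applied before but are missing now.
apply_secret_from_files() {
  local namespace="$1" name="$2"; shift 2
  local args=() pair
  for pair in "$@"; do
    args+=("--from-file=${pair}")
  done
  kubectl -n "$namespace" create secret generic "$name" "${args[@]}" \
    --dry-run=client -o yaml \
    | kubectl -n "$namespace" apply -f -
}

# Example, using the single-key APNs Secret from the table in this runbook:
#   apply_secret_from_files honeydue honeydue-apns-key \
#     apns_auth_key.p8=deploy-k3s/secrets/apns_auth_key.p8
```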
| Kubernetes Secret → key | Source of truth | Consumed by |
|---|---|---|
| `honeydue-secrets` → `POSTGRES_PASSWORD` | `deploy-k3s/secrets/postgres_password.txt` | api, worker |
| `honeydue-secrets` → `SECRET_KEY` | `deploy-k3s/secrets/secret_key.txt` | api, worker |
| `honeydue-secrets` → `EMAIL_HOST_PASSWORD` | `deploy-k3s/secrets/email_host_password.txt` | api, worker |
| `honeydue-secrets` → `FCM_SERVER_KEY` | `deploy-k3s/secrets/fcm_server_key.txt` | api, worker |
| `honeydue-secrets` → `REDIS_PASSWORD` | `config.yaml` key `redis.password` | api, worker, redis |
| `honeydue-secrets` → `OBS_INGEST_TOKEN` | `deploy/prod.env` | api, worker |
| `honeydue-apns-key` → `apns_auth_key.p8` | `deploy-k3s/secrets/apns_auth_key.p8` | api, worker |
| `cloudflare-origin-cert` | `deploy-k3s/secrets/cloudflare-origin.{crt,key}` | Traefik ingress |
| `ghcr-credentials` | `config.yaml` block `registry.*` | image pulls (all pods) |
| `admin-basic-auth` | `config.yaml` keys `admin.basic_auth_user` / `..._password` | Traefik `admin-auth` middleware |

The `deploy-k3s/secrets/` directory and `config.yaml` are **gitignored**;
never commit them.

---
## Standard rotation procedure
```bash
cd honeyDueAPI-go
export KUBECONFIG="$(pwd)/deploy-k3s/kubeconfig"
# 1. Update the source (file under deploy-k3s/secrets/ or a config.yaml key)
# 2. Recreate the Kubernetes Secrets from sources
./deploy-k3s/scripts/02-setup-secrets.sh
# 3. Restart the consumers (see per-secret notes below for which)
kubectl -n honeydue rollout restart deploy/api deploy/worker
# 4. Confirm health
kubectl -n honeydue rollout status deploy/api
kubectl -n honeydue rollout status deploy/worker
# 5. Revoke the OLD credential at its provider (see per-secret notes)
# 6. Annotate the rotated secret with today's date
```
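
Steps 2–4 and 6 can be chained so that the annotation is only written after every consumer has rolled out cleanly. This is a hedged sketch, not a repo script: `rotate_secret`, the `SETUP_SECRETS` override and the 180-second rollout timeout are illustrative choices. Step 5 (revoking at the provider) deliberately stays manual. Usage is shown in the trailing comment.

```bash
# Hypothetical wrapper for steps 2-4 and 6 of the standard procedure.
# SETUP_SECRETS can be overridden (e.g. for a dry run); the default is the repo script.
SETUP_SECRETS="${SETUP_SECRETS:-./deploy-k3s/scripts/02-setup-secrets.sh}"

rotate_secret() {
  local secret="$1"; shift
  local deploy
  "$SETUP_SECRETS" || return 1                    # step 2: recreate Secrets
  for deploy in "$@"; do                          # step 3: restart consumers
    kubectl -n honeydue rollout restart "deploy/${deploy}" || return 1
  done
  for deploy in "$@"; do                          # step 4: wait for health
    kubectl -n honeydue rollout status "deploy/${deploy}" --timeout=180s || return 1
  done
  # Step 6: only reached if every rollout above succeeded.
  kubectl -n honeydue annotate secret "$secret" \
    honeydue.dev/last-rotated="$(date -u +%Y-%m-%d)" --overwrite
}

# Example: rotate_secret honeydue-secrets api worker
```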
---
## Per-secret notes
### `POSTGRES_PASSWORD`
1. Rotate the role password in the Neon dashboard.
2. Write the new value to `deploy-k3s/secrets/postgres_password.txt`.
3. `02-setup-secrets.sh`, then `rollout restart deploy/api deploy/worker`.
4. Watch logs for connection errors; the old password stops working the
   moment Neon applies the change, so do steps 2–3 promptly.
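
For step 4, one way to watch for the failure that matters is to grep recent logs for PostgreSQL's `password authentication failed` server error. This is a sketch: `check_pg_auth_errors` is a hypothetical name, the 10-minute window is arbitrary, and `kubectl logs deploy/<name>` reads from only one pod of each Deployment, so with several replicas it can miss errors.

```bash
# Hypothetical check: returns non-zero if api or worker logged a PostgreSQL
# authentication failure in the last 10 minutes.
check_pg_auth_errors() {
  local deploy found=0
  for deploy in api worker; do
    if kubectl -n honeydue logs "deploy/${deploy}" --since=10m 2>/dev/null \
        | grep -qi 'password authentication failed'; then
      echo "auth failures in deploy/${deploy}: check postgres_password.txt" >&2
      found=1
    fi
  done
  return "$found"
}
```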
### `SECRET_KEY` ⚠️ user-visible
This signs auth tokens. **Rotating it logs every user out** — all existing
tokens become invalid and every client must re-authenticate.
1. Generate: `openssl rand -hex 32`.
2. Write to `deploy-k3s/secrets/secret_key.txt` (must be ≥32 chars — the
script enforces this; the app refuses to start in production without it).
3. `02-setup-secrets.sh`, then `rollout restart deploy/api deploy/worker`.

- Only rotate on a schedule or on suspected compromise, not casually.
- A future improvement (an overlap window via a key-id header) would let
  old tokens keep validating during the transition; it is not implemented
  today.
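
Before re-running `02-setup-secrets.sh` after editing any file under `deploy-k3s/secrets/`, a quick pre-flight check can catch an empty file, a short value or an accidental trailing newline. The sketch below is hypothetical: `check_secret_file` is not a repo function, the 32-character default mirrors the `SECRET_KEY` rule above, and whether a trailing newline actually causes harm depends on how the script reads the file, so it only warns.

```bash
# Hypothetical pre-flight check for a secret source file.
# Usage: check_secret_file <path> [min_length]   (min_length defaults to 32)
check_secret_file() {
  local file="$1" min_len="${2:-32}" value
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # $(...) strips trailing newlines, so an empty result means the last byte is "\n".
  if [ -z "$(tail -c 1 "$file")" ]; then
    echo "warning: $file ends with a newline" >&2
  fi
  value="$(<"$file")"
  if [ "${#value}" -lt "$min_len" ]; then
    echo "too short (${#value} < ${min_len} chars): $file" >&2
    return 1
  fi
}

# Example: check_secret_file deploy-k3s/secrets/secret_key.txt
```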
### `EMAIL_HOST_PASSWORD`
1. Generate a new app password in Fastmail; keep the old one alive briefly.
2. Write to `deploy-k3s/secrets/email_host_password.txt`.
3. `02-setup-secrets.sh`, `rollout restart deploy/api deploy/worker`.
4. Delete the old Fastmail app password.
### `FCM_SERVER_KEY`
1. Rotate the key in the Firebase console.
2. Write to `deploy-k3s/secrets/fcm_server_key.txt`.
3. `02-setup-secrets.sh`, `rollout restart deploy/api deploy/worker`.
### `REDIS_PASSWORD`
Source is `config.yaml` key `redis.password` (hex only — it is embedded in
the `REDIS_URL`, so non-hex characters would break URL parsing).
1. Generate: `openssl rand -hex 32`.
2. Set `redis.password` in `config.yaml`.
3. `02-setup-secrets.sh`.
4. Restart **redis as well as** api/worker so the new `--requirepass` and
   the new `REDIS_URL` land together:
   `kubectl -n honeydue rollout restart deploy/redis deploy/api deploy/worker`.
   Expect api/worker to spend a few seconds reconnecting.
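
The hex-only rule exists because the password sits inside `REDIS_URL`, where characters such as `/`, `?`, `#` or `%` would need percent-encoding or break parsing. The sketch below checks a candidate value and, after the restart, confirms the new password from inside the Redis pod. Assumptions: `is_hex` and `redis_auth_ok` are hypothetical names, `redis-cli` is present in the `redis` image, and the Deployment is `deploy/redis` as in the command above.

```bash
# Hypothetical helpers for the REDIS_PASSWORD rotation.

# is_hex VALUE: succeeds only for a non-empty string of hex digits,
# which is what `openssl rand -hex 32` produces.
is_hex() {
  [[ "$1" =~ ^[0-9a-fA-F]+$ ]]
}

# redis_auth_ok PASSWORD: succeeds if Redis answers PONG to an authenticated
# PING. The password is briefly visible in the pod's process arguments.
redis_auth_ok() {
  local password="$1" reply
  reply="$(kubectl -n honeydue exec deploy/redis -- \
    redis-cli --no-auth-warning -a "$password" ping 2>/dev/null)"
  [ "$reply" = "PONG" ]
}

# Example:
#   is_hex "$new_password" || echo "not hex; regenerate with openssl rand -hex 32"
#   redis_auth_ok "$new_password" && ! redis_auth_ok "$old_password"
```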
### `apns_auth_key.p8`
1. Revoke the key in the Apple Developer console, generate a new `.p8`.
2. Replace `deploy-k3s/secrets/apns_auth_key.p8`.
3. `02-setup-secrets.sh`, `rollout restart deploy/api deploy/worker`.
4. If the Key ID changed, update `push.apns_key_id` in `config.yaml` too.
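
Keys downloaded from the Apple Developer console are named `AuthKey_<KeyID>.p8`, so the Key ID needed for step 4 can be read from the file name while installing the key. A sketch under that assumption; `install_apns_key` and the `APNS_DEST` override are hypothetical, and the 10-character Key ID pattern is a sanity check rather than a documented guarantee.

```bash
# Hypothetical helper: validate a downloaded APNs key, install it with
# owner-only permissions, and print its Key ID for push.apns_key_id.
# Usage: install_apns_key ~/Downloads/AuthKey_ABC123DEFG.p8
install_apns_key() {
  local downloaded="$1" dest="${APNS_DEST:-deploy-k3s/secrets/apns_auth_key.p8}"
  local key_id
  key_id="$(basename "$downloaded" .p8)"
  key_id="${key_id#AuthKey_}"
  # Apple Key IDs appear as 10 uppercase alphanumeric characters.
  if [[ ! "$key_id" =~ ^[A-Z0-9]{10}$ ]]; then
    echo "unexpected file name (want AuthKey_<KeyID>.p8): $downloaded" >&2
    return 1
  fi
  if ! grep -q -- '-----BEGIN PRIVATE KEY-----' "$downloaded"; then
    echo "not a PEM private key: $downloaded" >&2
    return 1
  fi
  install -m 600 "$downloaded" "$dest"
  echo "$key_id"
}
```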
### `cloudflare-origin-cert`
1. Generate a new Origin CA certificate in the Cloudflare dashboard.
2. Replace `deploy-k3s/secrets/cloudflare-origin.crt` and `.key`.
3. `02-setup-secrets.sh`. Traefik picks up the new TLS secret; no app
restart needed. Verify the served cert with `openssl s_client`.
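
When verifying with `openssl s_client`, connect to the origin IP rather than the public hostname: with Cloudflare proxying in front, the public hostname returns Cloudflare's edge certificate, not the Origin CA certificate Traefik serves. A sketch; `origin_cert_info` is a hypothetical name, and if the origin firewall only admits Cloudflare's IP ranges, run it from a host that is allowed through (or on the node itself).

```bash
# Hypothetical helper: print subject, issuer and expiry of the certificate
# the origin serves for HOSTNAME, connecting to the origin IP directly.
# Usage: origin_cert_info <origin-ip> <hostname>
origin_cert_info() {
  local origin_ip="$1" hostname="$2"
  openssl s_client -connect "${origin_ip}:443" -servername "$hostname" \
      </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -enddate
}

# Example (documentation-range IP, placeholder hostname):
#   origin_cert_info 203.0.113.10 api.example.com
```

The issuer should name Cloudflare's Origin SSL certificate authority, and `notAfter` should match the certificate you just generated.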
### `ghcr-credentials` (Gitea registry)
1. Generate a new PAT in Gitea (scope: `read:packages`).
2. Update the `registry.token` value in `config.yaml`.
3. `02-setup-secrets.sh`. No restart needed unless a pull is pending.
4. Revoke the old PAT in Gitea.
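
Before step 4, it is worth confirming the Secret now carries the new token, so revoking the old PAT cannot break image pulls. Image-pull Secrets keep their config under the `.dockerconfigjson` data key, whose leading dot must be escaped in JSONPath. A sketch, assuming `ghcr-credentials` is a `kubernetes.io/dockerconfigjson` Secret with a plain `password` field (as `kubectl create secret docker-registry` generates); `registry_secret_has_token` is a hypothetical name and never prints the token.

```bash
# Hypothetical check: succeeds if the pull Secret contains TOKEN.
# Usage: registry_secret_has_token "$new_token"
registry_secret_has_token() {
  local token="$1"
  kubectl -n honeydue get secret ghcr-credentials \
      -o jsonpath='{.data.\.dockerconfigjson}' \
    | base64 -d \
    | grep -qF -- "$token"
}
```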
### `admin-basic-auth`
Source is `config.yaml` keys `admin.basic_auth_user` / `basic_auth_password`.
1. Set a new password (e.g. `openssl rand -hex 24`).
2. `02-setup-secrets.sh` regenerates the bcrypt htpasswd secret.
3. No app restart needed — Traefik reloads the `admin-auth` middleware.
4. Distribute the new credential to whoever uses the admin panel.
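
To confirm after Traefik reloads that the new credential is live and the old one is rejected, compare HTTP status codes with `curl`. A sketch; `verify_admin_rotation` is a hypothetical name and the admin URL in the usage comment is a placeholder. A status of `000` means curl could not connect at all. Usage: `verify_admin_rotation https://admin.example.com admin "$new_pw" "$old_pw"`.

```bash
# Hypothetical check: succeeds if NEW credentials are accepted (not 401,
# and the connection worked) while OLD credentials get 401 from admin-auth.
# Usage: verify_admin_rotation <admin-url> <user> <new-password> <old-password>
verify_admin_rotation() {
  local url="$1" user="$2" new_pw="$3" old_pw="$4" new_code old_code
  new_code="$(curl -s -o /dev/null -w '%{http_code}' -u "${user}:${new_pw}" "$url")"
  old_code="$(curl -s -o /dev/null -w '%{http_code}' -u "${user}:${old_pw}" "$url")"
  echo "new=${new_code} old=${old_code}"
  [ "$new_code" != "401" ] && [ "$new_code" != "000" ] && [ "$old_code" = "401" ]
}
```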
---
## After any rotation
- Run `./deploy-k3s/scripts/04-verify.sh` and confirm no `✗` lines.
- Annotate the rotated secret (see "Record keeping" above).
- If the rotation was due to a compromise, also follow the relevant
playbook in `deploy-k3s/SECURITY.md` → Appendix (Incident response).
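
A `✗` line is easy to miss in long output, so a small wrapper can fail loudly instead. Sketch only: it assumes `04-verify.sh` marks failures with `✗` as described above, and `verify_after_rotation` and the `VERIFY_SCRIPT` override are hypothetical.

```bash
# Hypothetical wrapper: run the verify script, echo its output, and fail if
# the script exits non-zero or prints any line containing ✗.
VERIFY_SCRIPT="${VERIFY_SCRIPT:-./deploy-k3s/scripts/04-verify.sh}"

verify_after_rotation() {
  local out status=0
  out="$("$VERIFY_SCRIPT" 2>&1)" || status=$?
  printf '%s\n' "$out"
  if [ "$status" -ne 0 ] || grep -qF '✗' <<<"$out"; then
    echo "verification failed (exit ${status})" >&2
    return 1
  fi
}
```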