Trey t c77ff07ce9
fix(security): remediate 2026-05-12 audit findings (Stages 2–5)
Remediation of the 2026-05-12/13 audits (78 findings + cluster gaps),
tracked in deploy-k3s/SECURITY.md, plus fixes from two independent
post-remediation reviews.

Auth & sessions:
- SHA-256 hashed auth-token storage (C1); prior-token cache eviction on
  re-login (MEDIUM-1)
- local Google JWKS verification, iss/aud/exp checks (C2/C3)
- constant-time login + generic errors (L1/LIVE-L11/LIVE-L13)
- per-account login lockout keyed on distinct source IPs (M5/MEDIUM-3)
- verified-email gating, login rate limiting (LIVE-L19, H1-H3)

IAP & webhooks:
- Apple/Google cross-account replay protection (C5/C6/C10/C13, H5/H6)
- migrations 000003-000006 (token hashing, IAP replay, audit_log +
  webhook_event_log table creation, append-only audit log)

Authorization & races:
- file-ownership owner-OR-member fix (C7), atomic share-code join
  (C9/H9), device-token reassignment (C8/LOW-3)

Secrets & deploy:
- secrets file-mounted at /etc/honeydue/secrets, not env (F8); Redis
  password out of the ConfigMap (HIGH-1); B2 keys reconciled
- digest-pinned images, admin ingress hardening, CSP/HSTS, /metrics
  lockdown; kubeconfig 0600, etcd secrets-encryption, fail2ban +
  unattended-upgrades at provision; secret-rotation runbook

Build, vet, and the full test suite (incl. -race) pass; the goose
migration chain is verified against PostgreSQL 16.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:28:33 -05:00


Runbook — Secret Rotation

Closes audit finding K3S-F12 (secrets unrotated since cluster bootstrap, no rotation cadence). See deploy-k3s/SECURITY.md Stage 2.

Cadence: rotate every secret at least annually. Rotate immediately on suspected exposure, when an operator device is lost, or when anyone who has seen a secret leaves the project.

Record keeping: after each rotation, annotate the secret so the age is visible:

kubectl -n honeydue annotate secret <name> \
  honeydue.dev/last-rotated="$(date -u +%Y-%m-%d)" --overwrite
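
To check ages against the annual cadence, the annotation can be read back and compared with today's date. The helpers below are a sketch, not part of the deploy scripts: the function names are hypothetical, and the date arithmetic assumes GNU date (the -d flag), so it would need adjusting on macOS.

```shell
# Print the last-rotated annotation of a secret (empty if never annotated).
# Dots inside the annotation key must be escaped in kubectl's jsonpath.
last_rotated() {
  kubectl -n honeydue get secret "$1" \
    -o jsonpath='{.metadata.annotations.honeydue\.dev/last-rotated}'
}

# Whole days between a YYYY-MM-DD rotation date and today (or a given date).
# Assumes GNU date; both dates are interpreted at midnight UTC.
rotation_age_days() {
  local rotated="$1"
  local today="${2:-$(date -u +%Y-%m-%d)}"
  local then_s now_s
  then_s=$(date -u -d "$rotated" +%s) || return 1
  now_s=$(date -u -d "$today" +%s) || return 1
  echo $(( (now_s - then_s) / 86400 ))
}

# Example:
#   rotation_age_days "$(last_rotated honeydue-secrets)"
```

A result above 365 means the secret is past the annual cadence.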

How rotation works

Every secret has a source of truth on the operator workstation. The deploy scripts read those sources and (re)create the Kubernetes Secrets. Rotation is always: update the source → re-run 02-setup-secrets.sh → restart the pods that consume it → revoke the old credential at its provider.

02-setup-secrets.sh renders each Secret with --dry-run=client -o yaml and feeds the result to kubectl apply, so re-running it is idempotent: only the values you changed are updated.
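
The script itself is not reproduced in this runbook. As an illustration of the render-then-apply pattern described above, a single-key Secret could be handled roughly like this; the function name and arguments are hypothetical, and the real script may build its manifests differently.

```shell
# Render a Secret manifest client-side, then apply it.
# Unlike a bare "kubectl create", this succeeds whether or not the
# Secret already exists, which is what makes re-runs safe.
apply_secret_from_file() {
  local name="$1" key="$2" path="$3"
  kubectl -n honeydue create secret generic "$name" \
    --from-file="$key=$path" \
    --dry-run=client -o yaml |
    kubectl -n honeydue apply -f -
}

# Example:
#   apply_secret_from_file honeydue-apns-key apns_auth_key.p8 \
#     deploy-k3s/secrets/apns_auth_key.p8
```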

| Kubernetes Secret / key | Source of truth | Consumed by |
|---|---|---|
| honeydue-secrets / POSTGRES_PASSWORD | deploy-k3s/secrets/postgres_password.txt | api, worker |
| honeydue-secrets / SECRET_KEY | deploy-k3s/secrets/secret_key.txt | api, worker |
| honeydue-secrets / EMAIL_HOST_PASSWORD | deploy-k3s/secrets/email_host_password.txt | api, worker |
| honeydue-secrets / FCM_SERVER_KEY | deploy-k3s/secrets/fcm_server_key.txt | api, worker |
| honeydue-secrets / REDIS_PASSWORD | config.yaml key redis.password | api, worker, redis |
| honeydue-secrets / OBS_INGEST_TOKEN | deploy/prod.env | api, worker |
| honeydue-apns-key / apns_auth_key.p8 | deploy-k3s/secrets/apns_auth_key.p8 | api, worker |
| cloudflare-origin-cert | deploy-k3s/secrets/cloudflare-origin.{crt,key} | Traefik ingress |
| ghcr-credentials | config.yaml block registry.* | image pulls (all pods) |
| admin-basic-auth | config.yaml keys admin.basic_auth_user / ..._password | Traefik admin-auth middleware |

The deploy-k3s/secrets/ directory and config.yaml are gitignored — never commit them.
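
Before a rotation it can be worth confirming that the source files really are ignored. A minimal sketch using git check-ignore follows; the function name is hypothetical and the config.yaml path in the example is an assumption about where that file lives.

```shell
# Return non-zero if any given path is NOT covered by a gitignore rule.
# git check-ignore exits 0 for ignored paths and 1 otherwise; tracked
# files count as "not ignored", which is the case we want to catch.
assert_ignored() {
  local p rc=0
  for p in "$@"; do
    # -q is only valid with a single path, hence the loop
    if ! git check-ignore -q -- "$p"; then
      echo "NOT ignored: $p" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (run from the repository root; config.yaml location assumed):
#   assert_ignored deploy-k3s/secrets/postgres_password.txt deploy-k3s/config.yaml
```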


Standard rotation procedure

cd honeyDueAPI-go
export KUBECONFIG="$(pwd)/deploy-k3s/kubeconfig"

# 1. Update the source (file under deploy-k3s/secrets/ or a config.yaml key)
# 2. Recreate the Kubernetes Secrets from sources
./deploy-k3s/scripts/02-setup-secrets.sh

# 3. Restart the consumers (see per-secret notes below for which)
kubectl -n honeydue rollout restart deploy/api deploy/worker

# 4. Confirm health
kubectl -n honeydue rollout status deploy/api
kubectl -n honeydue rollout status deploy/worker

# 5. Revoke the OLD credential at its provider (see per-secret notes)
# 6. Annotate the rotated secret with today's date
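
Steps 3 and 4 can be wrapped so that a rollout that fails to converge stops the procedure before the old credential is revoked. This is a sketch under assumptions: the function name and the 180-second timeout are illustrative, not values taken from the deploy scripts.

```shell
# Restart the given workloads and wait for each to become healthy.
# Returns non-zero as soon as a restart or a rollout check fails.
restart_and_wait() {
  local ns="honeydue" d
  kubectl -n "$ns" rollout restart "$@" || return 1
  for d in "$@"; do
    kubectl -n "$ns" rollout status "$d" --timeout=180s || return 1
  done
}

# Example:
#   ./deploy-k3s/scripts/02-setup-secrets.sh && restart_and_wait deploy/api deploy/worker
```

Only move on to step 5 once the function returns successfully; revoking the old credential while pods still hold it would cause an outage.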

Per-secret notes

POSTGRES_PASSWORD

  1. Rotate the role password in the Neon dashboard.
  2. Write the new value to deploy-k3s/secrets/postgres_password.txt.
  3. 02-setup-secrets.sh, then rollout restart deploy/api deploy/worker.
  4. Watch logs for connection errors; the old password stops working the moment Neon applies the change, so do steps 2 and 3 promptly.
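
Step 2, like the equivalent step for the other file-backed secrets, comes down to writing one value into a file under deploy-k3s/secrets/. The helper below is a sketch with a hypothetical name: it keeps the file owner-only and writes no trailing newline. Whether 02-setup-secrets.sh tolerates a trailing newline is not stated here, so treat that part as a precaution.

```shell
# Write a single-line secret from stdin to a file, mode 0600, no newline.
write_secret_file() {
  local path="$1" value=""
  # Read one line; tolerate input that lacks a final newline.
  IFS= read -r value || true
  if [ -z "$value" ]; then
    echo "refusing to write an empty secret to $path" >&2
    return 1
  fi
  # Create the file owner-only, and force 0600 if it already existed.
  ( umask 077 && printf '%s' "$value" > "$path" ) && chmod 600 "$path"
}

# Example (paste the new value, then press Enter):
#   write_secret_file deploy-k3s/secrets/postgres_password.txt
```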

SECRET_KEY ⚠️ user-visible

This signs auth tokens. Rotating it logs every user out — all existing tokens become invalid and every client must re-authenticate.

  1. Generate: openssl rand -hex 32.
  2. Write to deploy-k3s/secrets/secret_key.txt (must be ≥32 chars — the script enforces this; the app refuses to start in production without it).
  3. 02-setup-secrets.sh, then rollout restart deploy/api deploy/worker.
  • Only rotate on a schedule or on suspected compromise — not casually.
  • A future improvement (overlap window via a key-id header) would let old tokens validate during the transition; not implemented today.
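
Steps 1 and 2 can be combined with a local length check, so a bad value is caught before the script or the app rejects it. The function names are hypothetical; the 32-character minimum is the one stated in step 2.

```shell
# 32 random bytes, printed as 64 hex characters.
generate_secret_key() {
  openssl rand -hex 32
}

# Succeeds only if the candidate key is at least 32 characters long.
secret_key_ok() {
  [ "${#1}" -ge 32 ]
}

# Example:
#   key=$(generate_secret_key) && secret_key_ok "$key" &&
#     printf '%s' "$key" > deploy-k3s/secrets/secret_key.txt
```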

EMAIL_HOST_PASSWORD

  1. Generate a new app password in Fastmail; keep the old one alive briefly.
  2. Write to deploy-k3s/secrets/email_host_password.txt.
  3. 02-setup-secrets.sh, rollout restart deploy/api deploy/worker.
  4. Delete the old Fastmail app password.

FCM_SERVER_KEY

  1. Rotate the key in the Firebase console.
  2. Write to deploy-k3s/secrets/fcm_server_key.txt.
  3. 02-setup-secrets.sh, rollout restart deploy/api deploy/worker.

REDIS_PASSWORD

Source is config.yaml key redis.password (hex only — it is embedded in the REDIS_URL, so non-hex characters would break URL parsing).

  1. Generate: openssl rand -hex 32.
  2. Set redis.password in config.yaml.
  3. 02-setup-secrets.sh.
  4. Restart redis as well as api/worker so the new --requirepass and the new REDIS_URL land together: kubectl -n honeydue rollout restart deploy/redis deploy/api deploy/worker. Expect a few seconds where api/worker reconnect.
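
The hex-only rule can be checked before the value goes into config.yaml. In a URL of the usual redis://:password@host form (an assumption about how REDIS_URL is assembled), characters such as @, /, : or # inside the password would be read as URL delimiters. The function name below is hypothetical.

```shell
# Succeeds only for a non-empty string made solely of hex digits.
is_hex() {
  case "$1" in
    ''|*[!0-9a-fA-F]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example:
#   pw=$(openssl rand -hex 32); is_hex "$pw" || echo "not URL-safe" >&2
```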

apns_auth_key.p8

  1. Revoke the key in the Apple Developer console, generate a new .p8.
  2. Replace deploy-k3s/secrets/apns_auth_key.p8.
  3. 02-setup-secrets.sh, rollout restart deploy/api deploy/worker.
  4. If the Key ID changed, update push.apns_key_id in config.yaml too.

cloudflare-origin-cert

  1. Generate a new Origin CA certificate in the Cloudflare dashboard.
  2. Replace deploy-k3s/secrets/cloudflare-origin.crt and .key.
  3. 02-setup-secrets.sh. Traefik picks up the new TLS secret; no app restart needed. Verify the served cert with openssl s_client.
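
One way to do the openssl s_client check in step 3 is to compare the serial number and expiry of the served certificate with the file just installed. This is a sketch: the function names are hypothetical, port 443 is assumed, and the origin address and hostname are placeholders. If the hostname is proxied through Cloudflare, connecting to it shows Cloudflare's edge certificate, so connect to the origin address directly and pass the hostname for SNI.

```shell
# Serial and expiry of the certificate served at addr:443 for the given SNI host.
served_cert_info() {
  local addr="$1" host="$2"
  openssl s_client -connect "$addr:443" -servername "$host" </dev/null 2>/dev/null |
    openssl x509 -noout -serial -enddate
}

# Serial and expiry of a local PEM certificate file.
local_cert_info() {
  openssl x509 -in "$1" -noout -serial -enddate
}

# Example (placeholders for the origin address and hostname):
#   served_cert_info <origin-ip> <hostname>
#   local_cert_info deploy-k3s/secrets/cloudflare-origin.crt
```

Matching output from the two commands indicates Traefik is serving the new certificate.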

ghcr-credentials (Gitea registry)

  1. Generate a new PAT in Gitea (scope: read:packages).
  2. Update the registry.token value in config.yaml.
  3. 02-setup-secrets.sh. No restart needed unless a pull is pending.
  4. Revoke the old PAT in Gitea.

admin-basic-auth

Source is config.yaml keys admin.basic_auth_user / basic_auth_password.

  1. Set a new password (e.g. openssl rand -hex 24).
  2. 02-setup-secrets.sh regenerates the bcrypt htpasswd secret.
  3. No app restart needed — Traefik reloads the admin-auth middleware.
  4. Distribute the new credential to whoever uses the admin panel.

After any rotation

  • Run ./deploy-k3s/scripts/04-verify.sh and confirm it reports no failing checks.
  • Annotate the rotated secret (see "Record keeping" above).
  • If the rotation was due to a compromise, also follow the relevant playbook in deploy-k3s/SECURITY.md → Appendix (Incident response).