e448ec66dc8659cb0b49b190c211118cefa0f217
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c77ff07ce9 |
fix(security): remediate 2026-05-12 audit findings (Stages 2–5)
Remediation of the 2026-05-12/13 audits (78 findings + cluster gaps), tracked in deploy-k3s/SECURITY.md, plus fixes from two independent post-remediation reviews. Auth & sessions: - SHA-256 hashed auth-token storage (C1); prior-token cache eviction on re-login (MEDIUM-1) - local Google JWKS verification, iss/aud/exp checks (C2/C3) - constant-time login + generic errors (L1/LIVE-L11/LIVE-L13) - per-account login lockout keyed on distinct source IPs (M5/MEDIUM-3) - verified-email gating, login rate limiting (LIVE-L19, H1-H3) IAP & webhooks: - Apple/Google cross-account replay protection (C5/C6/C10/C13, H5/H6) - migrations 000003-000006 (token hashing, IAP replay, audit_log + webhook_event_log table creation, append-only audit log) Authorization & races: - file-ownership owner-OR-member fix (C7), atomic share-code join (C9/H9), device-token reassignment (C8/LOW-3) Secrets & deploy: - secrets file-mounted at /etc/honeydue/secrets, not env (F8); Redis password out of the ConfigMap (HIGH-1); B2 keys reconciled - digest-pinned images, admin ingress hardening, CSP/HSTS, /metrics lockdown; kubeconfig 0600, etcd secrets-encryption, fail2ban + unattended-upgrades at provision; secret-rotation runbook Build, vet, and the full test suite (incl. -race) pass; the goose migration chain is verified against PostgreSQL 16. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
289a23f7e6 |
deploy(ingress): drop obsolete scaffold ingress.yaml
The directory had two ingress manifests that both define
honeydue-api and honeydue-admin:
- ingress.yaml (Mar 28, scaffold from `deploy-k3s/`
greenfield template)
- ingress-simple.yaml (Apr 24, corrected for our actual cluster
shape per MIGRATION_NOTES.md)
`kubectl apply -f manifests/ingress/` applies both, and ingress.yaml
happens to apply last alphabetically (`-` < `.` so `ingress-simple`
sorts before `ingress.yaml`), clobbering the corrected manifest.
That left the live cluster with two regressions:
1. honeydue-admin had `admin-auth` Traefik middleware in its chain,
referencing the `admin-basic-auth` secret. Per MIGRATION_NOTES
basic auth is intentionally not applied on this cluster (admin
uses in-app auth), so the secret was never created. Traefik
logs `secret 'honeydue/admin-basic-auth' not found` on every
reconcile and refuses to materialize the admin router → 404.
2. honeydue-api lost the apex `myhoneydue.com` rule that
ingress-simple.yaml adds for the marketing landing page →
apex 404.
`kubectl apply -f ingress-simple.yaml` against the live cluster
restored both routes (admin/apex back to 200). Removing the stale
file from the repo prevents the next deploy from regressing.
Refs: deploy-k3s/MIGRATION_NOTES.md ("Admin basic auth | Not applied
— in-app auth only").
|
||
|
|
ace03d2340 |
Security hardening: TLS at origin, security headers, network policies, admin probe fix
Four related hardening changes made on the live cluster during this session. Each manifest captures the final working state so a fresh `kubectl apply` of the repo reproduces it. 1. Cloudflare Full (strict) TLS — ingresses now carry `tls:` blocks pointing at `cloudflare-origin-cert` secret (installed imperatively from the CF Origin CA PEM). CF SSL mode flipped from Flexible to Full (strict). CF↔origin is now HTTPS; origin serves a CF-issued cert that only CF can validate. 2. Traefik middleware attached to all three ingresses — `rate-limit` (100/min avg, 200 burst) and `security-headers` (frame-deny, nosniff, HSTS, referrer policy, permissions policy). `admin-auth` middleware was also defined in middleware.yaml but is not attached (needs an unset basic-auth secret) and was deleted at runtime. 3. `security-headers` middleware: stripped the Content-Security-Policy entry. The Go API sets its own CSP in internal/router/router.go that permits Google Fonts for the landing page. Two CSP headers combine via intersection (most restrictive wins), which would break the landing page. Next.js apps set their own CSP via middleware. Header kept documentation comments explain this. 4. NetworkPolicies — default-deny + explicit allows, applied. Added missing policies for `web`. Corrected the Traefik ingress rule: the scaffold used `namespaceSelector: kube-system`, but our Traefik runs as a DaemonSet with `hostNetwork: true`, so traffic arrives with the NODE IP as source. Fixed to an `ipBlock` list of the three node IPs plus the cluster pod CIDR (10.42.0.0/16). 5. admin livenessProbe path fix: was hitting /admin/ (404) which caused a 6-hour crashloop cycle (87 restarts) before the bug was caught. Fixed to / — matches the startupProbe and readinessProbe paths that were corrected earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
15359401fa |
Deploy honeyDueAPI-Web to k3s at app.myhoneydue.com
The Next.js 16 webapp in sibling repo honeyDueAPI-Web now runs alongside api/worker/admin on the cluster. Uses a server-side proxy pattern: browser hits app.myhoneydue.com, Next.js route handlers forward to the Go API with an httpOnly cookie, so no CORS entry or Allowed-Hosts change is needed on the API side. Availability mirrors api (3 replicas, PDB minAvailable:2, topologySpreadConstraints across nodes). Changes: - deploy-k3s/manifests/web/deployment.yaml: 3 replicas, readOnly root FS, drops all caps, mounts emptyDir for /app/.next/cache and /tmp, reads API_URL from honeydue-config. - deploy-k3s/manifests/web/service.yaml: ClusterIP :3000. - deploy-k3s/manifests/rbac.yaml: ServiceAccount web with automountServiceAccountToken: false. - deploy-k3s/manifests/pod-disruption-budgets.yaml: web-pdb minAvailable: 2. - deploy-k3s/manifests/ingress/ingress-simple.yaml: route app.myhoneydue.com → web:3000. - deploy-k3s/scripts/_config.sh: emit API_URL into the ConfigMap. - deploy-k3s/scripts/03-deploy.sh: build + push + apply the web image alongside api/worker/admin. Reads NEXT_PUBLIC_POSTHOG_KEY and NEXT_PUBLIC_POSTHOG_HOST from the operator shell env (not committed). Also adds the --build-arg NEXT_PUBLIC_API_URL wiring for the admin image that was previously only done manually. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6f303dbbaa |
Migrate prod deploy from Swarm to K3s; add full deployment book
Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
temporarily for reference
Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
callback (was causing 'unlock of unlocked mutex' fatal after
Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
+ allowlist fonts.googleapis.com so the marketing landing page CSS
actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
--platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
images runnable on x86_64 Hetzner nodes; fix array expansion under
set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
top-level aliases (the '\${X_SECRET}' form never actually resolved);
dozzle ports: long-form host_ip is rejected by Swarm, switched to
short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
(Next.js serves at root; /admin/ returned 404 and killed pods);
startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
and admin/src/app/api/*, hiding legitimate files)
New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log
Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
- Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
- Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
- Part III Security, Traefik ingress (Ch 5-6)
- Part IV Services, DB, storage, secrets, registry (Ch 7-11)
- Part V Data flow, deploy process, observability, failures, runbook
(Ch 12, 14-17)
- Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
- Appendices: glossary, kubectl cheat sheet, file locations,
consolidated citations
- README.md: Production Deployment section replaced with pointer to
the book; Go version bumped to 1.25
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
34553f3bec |
Add K3s dev deployment setup for single-node VPS
Mirrors the prod deploy-k3s/ setup but runs all services in-cluster on a single node: PostgreSQL (replaces Neon), MinIO S3-compatible storage (replaces B2), Redis, API, worker, and admin. Includes fully automated setup scripts (00-init through 04-verify), server hardening (SSH, fail2ban, ufw), Let's Encrypt TLS via Traefik, network policies, RBAC, and security contexts matching prod. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |