Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as servers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after
  Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
  + allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
  --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
  images runnable on x86_64 Hetzner nodes; fix array expansion under
  set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '\${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
  without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook
    (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to
  the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-04-24 07:20:21 -05:00
parent 4ec4bbbfe8
commit 6f303dbbaa
46 changed files with 9785 additions and 93 deletions
@@ -0,0 +1,243 @@
# 18 — Cost
## Summary
Current monthly infrastructure cost is ~$30-40. External SaaS (Fastmail,
Apple Developer, Google Play) adds ~$8-17/mo depending on push-enable
status. This chapter itemizes every line, projects costs at scale
(10k, 100k, 1M users), and shows what dials to turn when we need to
save or spend.
## Current monthly cost
### Compute (Hetzner)
| Item | Unit cost | Count | Monthly |
|---|---:|---|---:|
| CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) | $7.99 | 3 | **$23.97** |
| Traffic | $0 (20 TB/mo included per node, well below) | — | $0 |
| Hetzner Cloud Firewall | $0 | — | $0 |
| IPv4 public address | $0 (included) | 3 | $0 |
| **Subtotal** | | | **$23.97** |
### Database (Neon)
Neon Launch plan: $0.106/CU-hour + $0.35/GB-month storage, $5 minimum.
At current usage (low traffic, small schema):
- ~10 CU-hours/month × $0.106 ≈ $1
- ~1 GB storage × $0.35 ≈ $0.35
- Hits the $5 minimum
| Item | Monthly |
|---|---:|
| Neon Launch ($5 min + usage) | **~$5** |
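The Neon arithmetic above can be sketched as a small function. This is an illustrative estimate only: it uses the rates quoted in this chapter and assumes the $5 minimum acts as a floor on metered usage, which is an interpretation and not something confirmed against Neon's billing rules. The function name is hypothetical.

```python
# Rates quoted in this chapter for the Neon Launch plan (USD).
CU_HOUR_RATE = 0.106     # per compute-unit hour
STORAGE_RATE = 0.35      # per GB-month
MONTHLY_MINIMUM = 5.00   # plan minimum; ASSUMED to be a floor on usage


def neon_launch_bill(cu_hours: float, storage_gb: float) -> float:
    """Estimate the monthly bill: metered usage, floored at the minimum."""
    usage = cu_hours * CU_HOUR_RATE + storage_gb * STORAGE_RATE
    return round(max(usage, MONTHLY_MINIMUM), 2)


print(neon_launch_bill(10, 1))    # current usage: 1.41 metered, so 5.0
print(neon_launch_bill(150, 10))  # hypothetical busier month: 19.4
```

Under this reading, metered usage only starts to matter once it exceeds $5, which is roughly 47 CU-hours at zero storage.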
### Object storage (Backblaze B2)
At current usage (~50 GB stored):
| Item | Monthly |
|---|---:|
| Storage ($0.006/GB × 50 GB) | $0.30 |
| Egress (effectively $0 — mostly served through CF) | $0 |
| **Subtotal** | **~$0.30** |
### Edge (Cloudflare)
| Item | Monthly |
|---|---:|
| Cloudflare Free plan (DNS, TLS, CDN, basic DDoS) | **$0** |
### Registry (Gitea)
Self-hosted on the operator's existing Gitea VPS. Not charged to
honeyDue.
| Item | Monthly |
|---|---:|
| Gitea container registry | **$0** |
### Total infrastructure
| Category | Monthly |
|---|---:|
| Compute | $23.97 |
| Database | ~$5 |
| Storage | ~$0.30 |
| Edge | $0 |
| Registry | $0 |
| **Total** | **~$30** |
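As a quick check, the total above can be reproduced from the per-category figures in this chapter. The snippet is a sketch for re-running the sum when a price changes; the variable names are illustrative.

```python
# Monthly line items taken from the tables in this chapter (USD).
CX33_MONTHLY = 7.99
NODE_COUNT = 3

line_items = {
    "compute": CX33_MONTHLY * NODE_COUNT,  # 3 x CX33
    "database": 5.00,                      # Neon Launch minimum
    "storage": 0.006 * 50,                 # B2: $0.006/GB x 50 GB
    "edge": 0.00,                          # Cloudflare Free
    "registry": 0.00,                      # self-hosted Gitea
}

total = round(sum(line_items.values()), 2)
print(total)  # 29.27, reported as ~$30
```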
## External SaaS
Things not part of the deploy but required for the product:
| Item | Cost | Notes |
|---|---:|---|
| Fastmail (SMTP for transactional email) | Part of operator's existing plan | — |
| Apple Developer Program | $99/year = $8.25/mo | Required for iOS app + APNs |
| Google Play Developer | $25 one-time + $0/mo ongoing | — |
| Hetzner Cloud Firewall | $0 | Free; we use UFW instead |
With push notifications enabled, total monthly run rate is **~$38-42**.
## Hidden / untracked costs
- **Operator time**: The biggest cost for a bootstrapped project.
Treating ops time at $100/hr, a 4-hour incident = $400.
- **Electricity for operator workstation during builds**: trivial.
- **Domain registration (myhoneydue.com)**: ~$12/year = $1/mo.
## Cost drivers
### 1. Compute (scales with traffic)
If api CPU utilization exceeds 70%, the HPA scales api from 3 up to 6
replicas. Memory at 3 replicas × 512Mi limit = 1.5 GiB; nodes have 8 GB
each. Plenty of room before needing more nodes.
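The scaling behaviour can be approximated with the replica formula from the Kubernetes HPA documentation, `ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the configured bounds. The sketch below assumes a minimum of 3 and a maximum of 6 replicas (read from the "3 to 6" range above) and ignores the HPA's tolerance band and stabilization windows, so treat it as a rough model rather than the controller's exact behaviour.

```python
import math

MIN_REPLICAS = 3          # assumed HPA minReplicas
MAX_REPLICAS = 6          # assumed HPA maxReplicas
TARGET_CPU_PERCENT = 70   # utilization target from this chapter
MEMORY_LIMIT_MI = 512     # per-replica memory limit from this chapter


def desired_replicas(current_replicas: int, current_cpu_percent: float) -> int:
    """Simplified HPA rule: scale proportionally to load, then clamp."""
    raw = math.ceil(current_replicas * current_cpu_percent / TARGET_CPU_PERCENT)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, raw))


def memory_limit_gib(replicas: int) -> float:
    """Total memory limit across all api replicas, in GiB."""
    return replicas * MEMORY_LIMIT_MI / 1024


print(desired_replicas(3, 90))   # ceil(3.86) = 4
print(desired_replicas(3, 200))  # ceil(8.57) = 9, clamped to 6
print(memory_limit_gib(6))       # 3.0
```

Even at the 6-replica ceiling, the combined memory limit is 3 GiB spread over three 8 GB nodes.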
Tipping points:
- Sustained need for >6 api replicas = move to bigger CX43 nodes
(8 vCPU, 16 GB, ~$16/mo each) or add more CX33s
- Heavy worker throughput = need Asynq PeriodicTaskManager (code
change, not infra)
### 2. Database (scales with query volume + data)
Neon Launch: pay per CU-hour of compute. If idle time ≫ active time,
we stay near $5 min. If the app is busy, CU-hours grow.
Tipping points:
- Consistently >$30/mo at Launch → evaluate Neon Scale plan
- DB storage >50 GB → $17.50+/mo just for storage (50 GB × $0.35)
- Active query load → consider read replicas (paid feature)
### 3. Storage (scales with user uploads)
B2 at $0.006/GB is cheap. 1 TB = $6/mo.
Tipping points:
- >5 TB stored = consider R2 (free egress) if egress becomes a factor
- Very high egress = evaluate moving B2 behind CF Workers
### 4. Edge
Cloudflare Free is generous. We move to Pro ($20/mo) if:
- We need custom WAF rules beyond 5
- We need Image Resizing for user uploads
- We need custom Page Rules beyond 3
## Projections
### 10,000 daily active users
Assume 50 API requests per user per day = 500k req/day = ~6 req/s avg.
Peaks maybe 3-5× = ~25 req/s.
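The request-rate estimate is plain arithmetic and can be rerun for other user counts. In this sketch the 50 requests per user per day and the 3-5× peak multiplier are the chapter's assumptions, not measurements; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def request_rates(daily_active_users, requests_per_user=50, peak_multipliers=(3, 5)):
    """Return (requests/day, average req/s, low peak req/s, high peak req/s)."""
    daily = daily_active_users * requests_per_user
    avg = daily / SECONDS_PER_DAY
    low_peak, high_peak = (avg * m for m in peak_multipliers)
    return daily, avg, low_peak, high_peak


daily, avg, low_peak, high_peak = request_rates(10_000)
print(daily)                                  # 500000
print(round(avg, 1))                          # 5.8
print(round(low_peak, 1), round(high_peak, 1))  # 17.4 28.9
```

The ~25 req/s figure sits inside the 17-29 req/s band this produces.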
Bottleneck: probably Neon CU-hours on the Launch plan. At 25 req/s with
DB calls, we'd burn through CU-hours fast. Neon bill: $15-30/mo.
Compute: 3 CX33s still handle this comfortably.
| Category | Projected monthly |
|---|---:|
| Compute | $24 |
| Neon | ~$20 |
| Storage | ~$2 |
| Cloudflare | $0 |
| **Total** | **~$46** |
### 100,000 daily active users
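At the same 50 requests per user per day: 5M req/day = ~58 req/s avg,
with 3-5× peaks of ~175-290 req/s. That means multi-node api scaling;
the HPA kicks in.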
| Category | Projected monthly |
|---|---:|
| Compute (3x CX33) | $24 |
| Plus Hetzner LB | $8.49 |
| Neon Scale (pay-as-you-go, higher baseline) | $40-60 |
| B2 (200 GB stored, some egress) | $2 |
| Cloudflare Pro | $20 |
| **Total** | **~$95-115** |
At this scale, operator time becomes the bigger cost. Adding paid
monitoring (Betterstack ~$15/mo) and uptime (Betterstack Uptime $5/mo)
becomes reasonable.
### 1,000,000 daily active users
Bigger question. We'd be re-evaluating:
- More Hetzner nodes or bigger instances
- Neon at scale vs. self-hosted Postgres
- Maybe Cloudflare Workers to offload traffic
Ballpark: $300-500/mo. At this scale, the company has revenue to
justify an ops hire, and this chapter's assumptions break down.
## Dials to save money
### Immediate (reduce $)
| Lever | Savings | Trade-off |
|---|---|---|
| Switch 3 CX33 → 3 Netcup VPS1000G11 | ~$4/mo | Less polished provider, slightly worse UX |
| Disable Neon Launch, use Supabase free tier | ~$5/mo | Supabase free tier limits |
| 2 nodes instead of 3 | ~$8/mo | Lose HA; a two-node Raft quorum needs both nodes up, so it is worse than one node |
| 1 CX23 (2 vCPU, 4 GB) for admin + worker; 2 CX33 for api | ~$5/mo | Complexity; node roles |
None of these are compelling. Current cost is in the "don't optimize"
zone.
### Dials to spend when it becomes worth it
| Spend | Return |
|---|---|
| Upgrade Neon to Scale ($20+) | More CU-hours, connection count room |
| Add Hetzner LB ($8.49) | Real active health checks, sub-second failover |
| Add monitoring (Betterstack $15) | Proactive detection of issues |
| Add uptime monitoring ($5) | Alerts when site is down |
| CF Pro ($20) | Better WAF, Image Resizing |
| CF Load Balancing ($5) | Multi-region failover, active checks on origins |
Cumulatively, about **$73/mo** buys a fully monitored, fully alerted
setup with multi-region failover. At 100k users, worth it.
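The cumulative figure can be checked by summing the listed prices. This sketch treats Neon Scale's "$20+" as exactly $20, so the result is a lower bound; the dictionary keys are illustrative labels.

```python
# Monthly prices from the "dials to spend" table (USD).
spend_dials = {
    "neon_scale": 20.00,         # listed as "$20+", so a lower bound
    "hetzner_lb": 8.49,
    "monitoring": 15.00,
    "uptime_monitoring": 5.00,
    "cloudflare_pro": 20.00,
    "cloudflare_load_balancing": 5.00,
}

total_extra = round(sum(spend_dials.values()), 2)
print(total_extra)  # 73.49
```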
## Historical spend
**April 2026 MTD**: ~$35 (Hetzner + Neon prorated).
**April 2026 (projected)**: $30-40.
**March 2026**: Pre-launch; no user traffic yet. Just node rentals.
~$25.
## Hetzner April 2026 price adjustment
CX33 went from ~$6.59 → $7.99/mo on 2026-04-01. Our monthly compute
cost rose by $4.20 overnight. This is on our budget radar but isn't a
forcing function to switch providers.
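The size of the adjustment follows directly from the two prices. The snippet below is a small worked check using only the figures quoted above.

```python
OLD_PRICE = 6.59   # approximate CX33 price before 2026-04-01 (USD/mo)
NEW_PRICE = 7.99   # CX33 price from 2026-04-01 (USD/mo)
NODE_COUNT = 3

monthly_delta = round((NEW_PRICE - OLD_PRICE) * NODE_COUNT, 2)
percent_increase = round((NEW_PRICE - OLD_PRICE) / OLD_PRICE * 100, 1)

print(monthly_delta)     # 4.2
print(percent_increase)  # 21.2
```

A rise of roughly 21% per node is large in relative terms but small in absolute terms at three nodes.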
If Hetzner keeps raising prices (which they've historically resisted;
the 2026 adjustment was their first in several years), reconsider.
## Budget alerts
- **B2**: hard-capped via B2 console at $20/mo. If we breach, something
is wrong and B2 rejects further writes.
- **Neon**: soft limits via Neon alerts. Set the threshold at $20 to
get an email as spend approaches it.
- **Hetzner**: no variable cost at our scale, no alerts needed.
- **Cloudflare**: Free plan has hard quotas; no surprise bills possible.
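For a manual monthly review, the thresholds above could be checked with a few lines of code. This is a hypothetical helper, not an existing script in the repository: the function name, the 80% warning ratio, and the input format are all assumptions, and the spend figures would have to be copied in by hand from each provider's console.

```python
# Thresholds stated in this chapter (USD per month).
THRESHOLDS = {"b2": 20.00, "neon": 20.00}


def over_budget(month_to_date: dict[str, float], warn_ratio: float = 0.8) -> list[str]:
    """Return providers whose spend has reached warn_ratio of their threshold."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if month_to_date.get(name, 0.0) >= warn_ratio * limit
    )


print(over_budget({"b2": 0.30, "neon": 5.00}))   # []
print(over_budget({"b2": 0.30, "neon": 17.50}))  # ['neon']
```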
## References
- [Hetzner Cloud pricing][hetzner-cloud]
- [Neon pricing][neon-pricing]
- [Backblaze B2 pricing][b2-pricing]
- [Cloudflare Free plan][cf-free]
[hetzner-cloud]: https://www.hetzner.com/cloud/
[neon-pricing]: https://neon.com/pricing
[b2-pricing]: https://www.backblaze.com/cloud-storage/pricing
[cf-free]: https://www.cloudflare.com/plans/free/