6f303dbbaa
Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
temporarily for reference
Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
callback (was causing 'unlock of unlocked mutex' fatal after
Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
+ allowlist fonts.googleapis.com so the marketing landing page CSS
actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
--platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
images runnable on x86_64 Hetzner nodes; fix array expansion under
set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
top-level aliases (the '\${X_SECRET}' form never actually resolved);
dozzle ports: long-form host_ip is rejected by Swarm, switched to
short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
(Next.js serves at root; /admin/ returned 404 and killed pods);
startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
and admin/src/app/api/*, hiding legitimate files)
New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log
Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
- Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
- Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
- Part III Security, Traefik ingress (Ch 5-6)
- Part IV Services, DB, storage, secrets, registry (Ch 7-11)
- Part V Data flow, deploy process, observability, failures, runbook
(Ch 12, 14-17)
- Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
- Appendices: glossary, kubectl cheat sheet, file locations,
consolidated citations
- README.md: Production Deployment section replaced with pointer to
the book; Go version bumped to 1.25
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
244 lines
7.2 KiB
Markdown
# 18 — Cost

## Summary

Current monthly infrastructure cost is ~$30-40. External SaaS (Fastmail,
Apple Developer, Google Play) adds ~$8-17/mo depending on whether push
notifications are enabled. This chapter itemizes every line item,
projects costs at scale (10k, 100k, 1M users), and shows which dials to
turn when we need to save or spend.

## Current monthly cost

### Compute (Hetzner)

| Item | Unit cost | Count | Monthly |
|---|---:|---|---:|
| CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) | $7.99 | 3 | **$23.97** |
| Traffic | $0 (20 TB/mo included per node; usage is well below) | n/a | $0 |
| Hetzner Cloud Firewall | $0 | n/a | $0 |
| IPv4 public address | $0 (included) | 3 | $0 |
| **Subtotal** | | | **$23.97** |

### Database (Neon)

Neon Launch plan: $0.106/CU-hour + $0.35/GB-month storage, $5 minimum.

At current usage (low traffic, small schema):
- ~10 CU-hours/month × $0.106 ≈ $1
- ~1 GB storage × $0.35 ≈ $0.35
- Metered usage (~$1.40) is below the $5 minimum, so we pay the minimum

| Item | Monthly |
|---|---:|
| Neon Launch ($5 min + usage) | **~$5** |

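The billing rule above (metered usage with a $5 floor) can be sketched in a few lines of Python. This is only an illustration: the rates are the ones quoted in this chapter, while the function name and the assumption that the bill is simply the larger of metered usage and the plan minimum are ours, not taken from Neon's documentation.

```python
CU_HOUR_RATE = 0.106   # USD per CU-hour (Launch plan rate quoted above)
STORAGE_RATE = 0.35    # USD per GB-month (quoted above)
PLAN_MINIMUM = 5.00    # USD monthly minimum (quoted above)


def neon_monthly_cost(cu_hours: float, storage_gb: float) -> float:
    """Estimate the monthly bill: metered usage, floored at the plan minimum.

    Assumption: the minimum acts as a simple floor on the metered total.
    """
    usage = cu_hours * CU_HOUR_RATE + storage_gb * STORAGE_RATE
    return round(max(usage, PLAN_MINIMUM), 2)


# Current usage: about 1.41 metered, so the floor applies.
print(neon_monthly_cost(10, 1))  # 5.0
```

Under this model the bill only starts to move once metered usage passes $5, which at the current ~1 GB of storage takes roughly 44 CU-hours per month.
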
### Object storage (Backblaze B2)

At current usage (~50 GB stored):

| Item | Monthly |
|---|---:|
| Storage ($0.006/GB × 50 GB) | $0.30 |
| Egress (effectively $0; mostly served through CF) | $0 |
| **Subtotal** | **~$0.30** |

### Edge (Cloudflare)

| Item | Monthly |
|---|---:|
| Cloudflare Free plan (DNS, TLS, CDN, basic DDoS) | **$0** |

### Registry (Gitea)

Self-hosted on the operator's existing Gitea VPS. Not charged to
honeyDue.

| Item | Monthly |
|---|---:|
| Gitea container registry | **$0** |

### Total infrastructure

| Category | Monthly |
|---|---:|
| Compute | $23.97 |
| Database | ~$5 |
| Storage | ~$0.30 |
| Edge | $0 |
| Registry | $0 |
| **Total** | **~$30** |

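As a cross-check on the total, the line items can be summed directly. The sketch below uses only figures from the tables in this chapter; the dictionary keys are illustrative names, not identifiers from the codebase.

```python
# Monthly infrastructure line items in USD, copied from the tables above.
line_items = {
    "compute": 3 * 7.99,     # three CX33 nodes
    "database": 5.00,        # Neon Launch minimum
    "storage": 50 * 0.006,   # ~50 GB stored on B2
    "edge": 0.00,            # Cloudflare Free
    "registry": 0.00,        # self-hosted Gitea
}

total = round(sum(line_items.values()), 2)
print(total)  # 29.27
```

The exact sum is $29.27, which the table rounds to ~$30.
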
## External SaaS

Services that are not part of the deployment but are required for the
product:

| Item | Cost | Notes |
|---|---:|---|
| Fastmail (SMTP for transactional email) | Part of operator's existing plan | |
| Apple Developer Program | $99/year = $8.25/mo | Required for iOS app + APNs |
| Google Play Developer | $25 one-time + $0/mo ongoing | |
| Hetzner Cloud Firewall | $0 | Free; we use UFW instead |

With push notifications enabled, the total monthly run rate is **~$38-42**.

## Hidden / untracked costs

- **Operator time**: The biggest cost for a bootstrapped project.
  Valuing ops time at $100/hr, a 4-hour incident costs $400.
- **Electricity for operator workstation during builds**: trivial.
- **Domain registration (myhoneydue.com)**: ~$12/year = $1/mo.

## Cost drivers

### 1. Compute (scales with traffic)

If api CPU utilization exceeds 70%, the HPA scales api from 3 up to 6
replicas. Memory is not the constraint: 3 replicas × 512Mi limit =
1.5 GiB, and each node has 8 GB. There is plenty of room before we need
more nodes.

Tipping points:
- A sustained need for >6 api replicas = bigger CX43 nodes (8 vCPU,
  16 GB, ~$16/mo each) or more CX33s
- Heavy worker throughput = need Asynq PeriodicTaskManager (code
  change, not infra)

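To make the scaling behavior concrete, here is a minimal Python sketch of the replica calculation the Kubernetes HPA documentation describes: desired = ceil(current × observed / target), clamped to the configured bounds. The 70% target and the 3-6 replica range come from this chapter; the function and parameter names are illustrative, and the real controller also applies a tolerance band and stabilization windows that this sketch omits.

```python
import math


def desired_replicas(current: int, cpu_utilization: float,
                     target: float = 70.0,
                     min_replicas: int = 3, max_replicas: int = 6) -> int:
    """Replica count from the HPA ratio rule, clamped to the configured bounds.

    Simplification: ignores the controller's tolerance and stabilization.
    """
    raw = math.ceil(current * cpu_utilization / target)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(3, 90.0))   # 4
print(desired_replicas(3, 200.0))  # 6 (capped at max_replicas)
print(desired_replicas(3, 35.0))   # 3 (held at min_replicas)
```

The second case is the tipping point described above: once the ratio asks for more than 6 replicas, the HPA cannot help and the answer becomes more or larger nodes.
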
### 2. Database (scales with query volume + data)

Neon Launch bills per CU-hour of compute. If idle time ≫ active time,
we stay near the $5 minimum. If the app is busy, CU-hours grow.

Tipping points:
- Consistently >$30/mo at Launch → evaluate Neon Scale plan
- DB storage >50 GB → $17.50+/mo just for storage (at $0.35/GB-month)
- Active query load → consider read replicas (paid feature)

### 3. Storage (scales with user uploads)

B2 at $0.006/GB is cheap. 1 TB = $6/mo.

Tipping points:
- >5 TB stored = consider R2 (free egress) if egress becomes a factor
- Very high egress = evaluate moving B2 behind CF Workers

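The storage figures above follow from a single rate, which the short Python sketch below tabulates. It assumes decimal units (1 TB = 1000 GB) and covers storage only; egress and transaction fees are left out, and the function name is illustrative.

```python
B2_RATE_PER_GB = 0.006  # USD per GB-month, as quoted above


def b2_storage_cost(stored_gb: float) -> float:
    """Monthly B2 storage cost in USD (storage only, no egress or API fees)."""
    return round(stored_gb * B2_RATE_PER_GB, 2)


for gb in (50, 1000, 5000):
    print(gb, b2_storage_cost(gb))
```

This gives $0.30 for the current 50 GB, $6 for 1 TB, and $30 at the 5 TB tipping point, so storage stays a minor line item well past the point where egress becomes the real question.
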
### 4. Edge

Cloudflare Free is generous. We move to Pro ($20/mo) if:
- We need custom WAF rules beyond 5
- We need Image Resizing for user uploads
- We need custom Page Rules beyond 3

## Projections

### 10,000 daily active users

Assume 50 API requests per user per day = 500k req/day = ~6 req/s avg.
Peaks of perhaps 3-5× the average come to roughly 17-29 req/s; we plan
for ~25 req/s.

Bottleneck: probably Neon CU-hours. At 25 req/s with DB calls, we'd
burn through CU-hours fast. Neon bill: $15-30/mo.

Compute: 3 CX33s still handle this comfortably.

| Category | Projected monthly |
|---|---:|
| Compute | $24 |
| Neon | ~$20 |
| Storage | ~$2 |
| Cloudflare | $0 |
| **Total** | **~$46** |

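The request-rate arithmetic behind these projections is easy to reproduce. The Python sketch below uses the chapter's assumption of 50 requests per user per day; the 4× peak factor is an illustrative midpoint of the 3-5× range, and the function name is ours.

```python
SECONDS_PER_DAY = 86_400


def request_rates(dau: int, req_per_user: int = 50,
                  peak_factor: float = 4.0) -> tuple[float, float]:
    """Return (average, peak) requests per second for a given DAU count."""
    avg = dau * req_per_user / SECONDS_PER_DAY
    return avg, avg * peak_factor


avg, peak = request_rates(10_000)
print(f"{avg:.1f} req/s avg, {peak:.0f} req/s peak")  # 5.8 req/s avg, 23 req/s peak
```

Because the model is linear, each 10× increase in daily active users multiplies both figures by 10: 100,000 DAU works out to about 58 req/s on average.
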
### 100,000 daily active users

At 50 requests per user per day, this is 5M req/day, or ~58 req/s
average, with 3-5× peaks of roughly 170-290 req/s. That means
multi-node api scaling; the HPA kicks in.

| Category | Projected monthly |
|---|---:|
| Compute (3x CX33) | $24 |
| Plus Hetzner LB | $8.49 |
| Neon Scale (pay-as-you-go, higher baseline) | $40-60 |
| B2 (200 GB stored, some egress) | $2 |
| Cloudflare Pro | $20 |
| **Total** | **~$95-115** |

At this scale, operator time becomes the bigger cost. Adding paid
monitoring (Betterstack ~$15/mo) and uptime checks (Betterstack Uptime
$5/mo) becomes reasonable.

### 1,000,000 daily active users

This is a bigger question. We'd be re-evaluating:
- More Hetzner nodes or bigger instances
- Neon at scale vs. self-hosted Postgres
- Maybe Cloudflare Workers to offload traffic

Ballpark: $300-500/mo. At this scale, the company has revenue to
justify an ops hire, and this chapter's assumptions break down.

## Dials to save money

### Immediate (reduce $)

| Lever | Savings | Trade-off |
|---|---|---|
| Switch 3 CX33 → 3 Netcup VPS1000G11 | ~$4/mo | Less polished provider, slightly worse UX |
| Replace Neon Launch with Supabase free tier | ~$5/mo | Supabase free tier limits |
| 2 nodes instead of 3 | ~$8/mo | Lose HA; a two-node Raft quorum tolerates no failures, which is worse than a single node |
| 1 CX23 (2 vCPU, 4 GB) for admin + worker; 2 CX33 for api | ~$5/mo | Complexity; node roles |

None of these are compelling. Current cost is in the "don't optimize"
zone.

### Dials to spend when it becomes worth it

| Spend | Return |
|---|---|
| Upgrade Neon to Scale ($20+) | More CU-hours, connection count room |
| Add Hetzner LB ($8.49) | Real active health checks, sub-second failover |
| Add monitoring (Betterstack $15) | Proactive detection of issues |
| Add uptime monitoring ($5) | Alerts when site is down |
| CF Pro ($20) | Better WAF, Image Resizing |
| CF Load Balancing ($5) | Multi-region failover, active checks on origins |

Cumulatively, **~$73/mo** takes us to a fully monitored, fully alerted
setup with multi-region failover. At 100k users, worth it.

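The cumulative figure can be checked by adding up the table. The Python sketch below uses the prices listed above and takes the Neon Scale entry at its $20 floor, so the result is a lower bound; the dictionary keys are illustrative labels.

```python
# Optional upgrades in USD/month, from the table above.
# Neon Scale is listed as "$20+", so 20.00 is a floor, not an exact price.
upgrades = {
    "neon_scale": 20.00,
    "hetzner_lb": 8.49,
    "monitoring": 15.00,
    "uptime_monitoring": 5.00,
    "cloudflare_pro": 20.00,
    "cloudflare_load_balancing": 5.00,
}

extra_per_month = round(sum(upgrades.values()), 2)
print(extra_per_month)  # 73.49
```
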
## Historical spend

**April 2026 MTD**: ~$35 (Hetzner + Neon prorated).

**April 2026 (projected)**: $30-40.

**March 2026**: Pre-launch; no user traffic yet. Just node rentals.
~$25.

## Hetzner April 2026 price adjustment

The CX33 price rose from ~$6.59 to $7.99/mo on 2026-04-01. Across 3
nodes, our monthly compute cost rose by $4.20 overnight. This is on our
budget radar but isn't a forcing function to switch providers.

If Hetzner keeps raising prices (which they've historically resisted;
the 2026 adjustment was their first in several years), we will
reconsider.

## Budget alerts

- **B2**: hard-capped via B2 console at $20/mo. If we hit the cap,
  something is wrong and B2 rejects further writes.
- **Neon**: soft limits via Neon alerts. Set the threshold at $20 to
  get an email when spend approaches it.
- **Hetzner**: no variable cost at our scale, no alerts needed.
- **Cloudflare**: Free plan has hard quotas; no surprise bills possible.

## References

- [Hetzner Cloud pricing][hetzner-cloud]
- [Neon pricing][neon-pricing]
- [Backblaze B2 pricing][b2-pricing]
- [Cloudflare Free plan][cf-free]

[hetzner-cloud]: https://www.hetzner.com/cloud/
[neon-pricing]: https://neon.com/pricing
[b2-pricing]: https://www.backblaze.com/cloud-storage/pricing
[cf-free]: https://www.cloudflare.com/plans/free/