Trey t 77cfcc0b27
docs: rewrite ch15 observability + cross-refs for the live obs stack
ch15 is now an account of what's actually running, not a roadmap for
what we'd add: VictoriaMetrics + Jaeger + Grafana on 88oakappsUpdate
fronted by Cloudflare and bearer-gated nginx, vmagent in-cluster, the
internal/prom histogram set, the rollout's NetworkPolicy footprint,
the obs.88oakapps.com endpoint shape, the ~$0/700MB resource budget,
and a token-rotation runbook. The "what we still don't have" section
keeps log aggregation, alerting, and full distributed tracing as the
honest gap list.

Other touched docs:
- 00-overview: "deliberately absent" no longer claims we have no
  metrics — calls out the cross-cluster shape instead.
- 14-deployment-process: TL;DR now points at deploy-k3s/scripts/03-deploy.sh
  (full build + push + apply + obs vmagent), with the manual
  kubectl-set-image flow kept as the single-service path. Notes the
  IfNotPresent gotcha that bit us during the rollout.
- 16-failure-modes: adds vmagent-can't-reach-obs and Grafana-no-data.
- 18-cost: $0 line item for the obs stack on 88oakappsUpdate, with the
  CX32 migration trigger.
- 17/18 README + appendix b: link the new ch15, add the obs cheat
  sheet block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 15:05:06 -05:00

# 18 — Cost
## Summary
Current monthly infrastructure cost is ~$30-40. External SaaS (Fastmail,
Apple Developer, Google Play) adds ~$8-17/mo depending on push-enable
status. This chapter itemizes every line, projects costs at scale
(10k, 100k, 1M users), and shows what dials to turn when we need to
save or spend.
## Current monthly cost
### Compute (Hetzner)
| Item | Unit cost | Count | Monthly |
|---|---:|---|---:|
| CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) | $7.99 | 3 | **$23.97** |
| Traffic | $0 (20 TB/mo included per node, well below) | — | $0 |
| Hetzner Cloud Firewall | $0 | — | $0 |
| IPv4 public address | $0 (included) | 3 | $0 |
| **Subtotal** | | | **$23.97** |
### Database (Neon)
Neon Launch plan: $0.106/CU-hour + $0.35/GB-month storage, $5 minimum.
At current usage (low traffic, small schema):
- ~10 CU-hours/month × $0.106 ≈ $1
- ~1 GB storage × $0.35 ≈ $0.35
- Hits the $5 minimum
| Item | Monthly |
|---|---:|
| Neon Launch ($5 min + usage) | **~$5** |
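
The billing arithmetic above can be sketched as follows. This is a minimal sketch that assumes the $5 minimum acts as a floor on the metered total, which is how this chapter reads the Launch plan. The function name `neon_monthly_bill` is hypothetical, and the rates are the ones quoted in this section.

```python
# Rates quoted in this chapter for the Neon Launch plan.
CU_HOUR_RATE = 0.106     # USD per compute-unit hour
STORAGE_RATE = 0.35      # USD per GB-month
MONTHLY_MINIMUM = 5.00   # USD, assumed to be a floor on the metered total


def neon_monthly_bill(cu_hours: float, storage_gb: float) -> float:
    """Estimate the monthly bill: metered usage, floored at the minimum."""
    usage = cu_hours * CU_HOUR_RATE + storage_gb * STORAGE_RATE
    return round(max(usage, MONTHLY_MINIMUM), 2)


# Current usage: about $1.41 metered, so the minimum applies.
print(neon_monthly_bill(cu_hours=10, storage_gb=1))
# A busier month where metered usage exceeds the minimum.
print(neon_monthly_bill(cu_hours=100, storage_gb=10))
```

Under these rates the bill stays at the minimum until metered usage passes $5, for example around 44 CU-hours at 1 GB stored.
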
### Object storage (Backblaze B2)
At current usage (~50 GB stored):
| Item | Monthly |
|---|---:|
| Storage ($0.006/GB × 50 GB) | $0.30 |
| Egress (effectively $0 — mostly served through CF) | $0 |
| **Subtotal** | **~$0.30** |
### Edge (Cloudflare)
| Item | Monthly |
|---|---:|
| Cloudflare Free plan (DNS, TLS, CDN, basic DDoS) | **$0** |
### Registry (Gitea)
Self-hosted on the operator's existing Gitea VPS. Not charged to
honeyDue.
| Item | Monthly |
|---|---:|
| Gitea container registry | **$0** |
### Observability (88oakappsUpdate)
VictoriaMetrics + Jaeger + Grafana co-tenant on the existing Linode
VPS that hosts PostHog. ~700 MB RAM, 21 GB disk — fits inside the
existing instance. Not charged to honeyDue.
| Item | Monthly |
|---|---:|
| Self-hosted obs stack on `88oakappsUpdate` | **$0** |
Migration trigger: when the obs stack starts pressuring PostHog or
needs hard isolation, move to a dedicated Hetzner CX32 (~$8/mo).
See [Chapter 15 — When to move off](./15-observability.md).
### Total infrastructure
| Category | Monthly |
|---|---:|
| Compute | $23.97 |
| Database | ~$5 |
| Storage | ~$0.30 |
| Edge | $0 |
| Registry | $0 |
| Observability | $0 |
| **Total** | **~$30** |
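
As a cross-check on the table above, the roll-up can be reproduced from the per-category figures in this chapter. This is an illustrative sketch; the dictionary keys and the variable name `line_items` are hypothetical, and the Neon figure is taken as the $5 plan minimum.

```python
# Per-category monthly figures as stated in this chapter (USD).
line_items = {
    "Compute": 3 * 7.99,     # three CX33 nodes
    "Database": 5.00,        # Neon Launch at the plan minimum
    "Storage": 50 * 0.006,   # ~50 GB on B2
    "Edge": 0.00,            # Cloudflare Free
    "Registry": 0.00,        # self-hosted Gitea
    "Observability": 0.00,   # co-tenant on an existing VPS
}

total = round(sum(line_items.values()), 2)

for name, cost in line_items.items():
    print(f"{name:<14} ${cost:>6.2f}")
print(f"{'Total':<14} ${total:>6.2f}")
```

The exact sum is $29.27, which the table rounds to ~$30.
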
## External SaaS
Things not part of the deploy but required for the product:
| Item | Cost | Notes |
|---|---:|---|
| Fastmail (SMTP for transactional email) | Part of operator's existing plan | — |
| Apple Developer Program | $99/year = $8.25/mo | Required for iOS app + APNs |
| Google Play Developer | $25 one-time + $0/mo ongoing | — |
| Hetzner Cloud Firewall | $0 | Free; we use UFW instead |
At push-enabled state, total monthly run rate is **~$38-42**.
## Hidden / untracked costs
- **Operator time**: The biggest cost for a bootstrapped project.
Treating ops time at $100/hr, a 4-hour incident = $400.
- **Electricity for operator workstation during builds**: trivial.
- **Domain registration (myhoneydue.com)**: ~$12/year = $1/mo.
## Cost drivers
### 1. Compute (scales with traffic)
If api gets >70% CPU utilization, HPA will scale from 3 to 6 replicas.
Memory at 3 replicas × 512Mi limit = 1.5 GB; nodes have 8 GB each.
Plenty of room before needing more nodes.
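
The scaling behavior above can be sketched with the replica formula from the Kubernetes HPA documentation, `ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the 3 to 6 range. This is a simplified sketch: it ignores the HPA tolerance band and stabilization windows, and the function names are hypothetical. The 70% target, the replica bounds, and the 512Mi limit come from this section.

```python
import math

MEMORY_LIMIT_MI = 512  # per-replica memory limit from this section


def desired_replicas(current: int, cpu_pct: float, target_pct: float = 70.0,
                     min_replicas: int = 3, max_replicas: int = 6) -> int:
    """Simplified HPA calculation, clamped to the configured bounds."""
    raw = math.ceil(current * cpu_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


def memory_ceiling_mi(replicas: int) -> int:
    """Worst-case memory if every replica sits at its limit."""
    return replicas * MEMORY_LIMIT_MI


for cpu in (50, 90, 200):
    n = desired_replicas(current=3, cpu_pct=cpu)
    print(f"cpu={cpu}% -> {n} replicas, up to {memory_ceiling_mi(n)} Mi")
```
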
Tipping points:
- >6 api replicas needed on a sustained basis = bigger CX43 nodes (8 vCPU,
16 GB, ~$16/mo each) or more CX33s
- Heavy worker throughput = need Asynq PeriodicTaskManager (code
change, not infra)
### 2. Database (scales with query volume + data)
Neon Launch: pay per CU-hour of compute. If idle time ≫ active time,
we stay near $5 min. If the app is busy, CU-hours grow.
Tipping points:
- Consistently >$30/mo at Launch → evaluate Neon Scale plan
- DB storage >50 GB → $17.50+/mo just for storage (50 GB × $0.35)
- Active query load → consider read replicas (paid feature)
### 3. Storage (scales with user uploads)
B2 at $0.006/GB is cheap. 1 TB = $6/mo.
Tipping points:
- >5 TB stored = consider R2 (free egress) if egress becomes a factor
- Very high egress = evaluate moving B2 behind CF Workers
### 4. Edge
Cloudflare Free is generous. We move to Pro ($20/mo) if:
- We need custom WAF rules beyond 5
- We need Image Resizing for user uploads
- We need custom Page Rules beyond 3
## Projections
### 10,000 daily active users
Assume 50 API requests per user per day = 500k req/day = ~6 req/s avg.
Peaks maybe 3-5× = ~25 req/s.
Bottleneck: probably Neon CU-hours. At 25 req/s with DB calls, usage
would climb well past the Launch plan's $5 minimum. Neon bill: $15-30/mo.
Compute: 3 CX33s still handle this comfortably.
| Category | Projected monthly |
|---|---:|
| Compute | $24 |
| Neon | ~$20 |
| Storage | ~$2 |
| Cloudflare | $0 |
| **Total** | **~$46** |
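
The traffic estimate behind this projection can be reproduced with a short calculation. This is a sketch under the chapter's stated assumptions (50 requests per user per day, peaks at 3-5× the average); the function name `request_rates` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def request_rates(dau: int, req_per_user: int = 50,
                  peak_factors: tuple[float, float] = (3.0, 5.0)):
    """Return (average, low peak, high peak) request rates in req/s."""
    avg = dau * req_per_user / SECONDS_PER_DAY
    return avg, avg * peak_factors[0], avg * peak_factors[1]


avg, peak_lo, peak_hi = request_rates(10_000)
print(f"avg {avg:.1f} req/s, peaks {peak_lo:.0f}-{peak_hi:.0f} req/s")
```

The same function can be reused for other user counts by changing the `dau` argument.
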
### 100,000 daily active users
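Same assumption of 50 requests per user per day = 5M req/day = ~58 req/s
avg. Peaks at 3-5× = ~175-290 req/s, which means multi-node api scaling.
HPA kicks in.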
| Category | Projected monthly |
|---|---:|
| Compute (3x CX33) | $24 |
| Plus Hetzner LB | $8.49 |
| Neon Scale (pay-as-you-go, higher baseline) | $40-60 |
| B2 (200 GB stored, some egress) | $2 |
| Cloudflare Pro | $20 |
| **Total** | **~$95-115** |
At this scale, operator time becomes the bigger cost. Adding paid
monitoring (Betterstack ~$15/mo) and uptime (Betterstack Uptime $5/mo)
becomes reasonable.
### 1,000,000 daily active users
Bigger question. We'd be re-evaluating:
- More Hetzner nodes or bigger instances
- Neon at scale vs. self-hosted Postgres
- Maybe Cloudflare Workers to offload traffic
Ballpark: $300-500/mo. At this scale, the company has revenue to
justify an ops hire, and this chapter's assumptions break down.
## Dials to save money
### Immediate (reduce $)
| Lever | Savings | Trade-off |
|---|---|---|
| Switch 3 CX33 → 3 Netcup VPS1000G11 | ~$4/mo | Less polished provider, slightly worse UX |
| Disable Neon Launch, use Supabase free tier | ~$5/mo | Supabase free tier limits |
| 2 nodes instead of 3 | ~$8/mo | Lose HA, two-node Raft is worse than one |
| 1 CX23 (2 vCPU, 4 GB) for admin + worker; 2 CX33 for api | ~$5/mo | Complexity; node roles |
None of these are compelling. Current cost is in the "don't optimize"
zone.
### Dials to spend when it becomes worth it
| Spend | Return |
|---|---|
| Upgrade Neon to Scale ($20+) | More CU-hours, connection count room |
| Add Hetzner LB ($8.49) | Real active health checks, sub-second failover |
| Add monitoring (Betterstack $15) | Proactive detection of issues |
| Add uptime monitoring ($5) | Alerts when site is down |
| CF Pro ($20) | Better WAF, Image Resizing |
| CF Load Balancing ($5) | Multi-region failover, active checks on origins |
Cumulatively **~$73/mo** takes us to a fully monitored, fully alerted
setup with multi-region failover. At 100k users, worth it.
## Historical spend
**April 2026 MTD**: ~$35 (Hetzner + Neon prorated).
**April 2026 (projected)**: $30-40.
**March 2026**: Pre-launch; no user traffic yet. Just node rentals.
~$25.
## Hetzner April 2026 price adjustment
CX33 went from ~$6.59 → $7.99/mo on 2026-04-01. Our monthly compute
cost rose by $4.20 overnight. This is on our budget radar but isn't a
forcing function to switch providers.
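
The arithmetic behind that figure, as a small sketch. The old price is the approximate ~$6.59 quoted above, so the percentage is only indicative; the variable names are hypothetical.

```python
OLD_PRICE = 6.59   # USD per CX33 per month (approximate, per this chapter)
NEW_PRICE = 7.99   # USD per CX33 per month from 2026-04-01
NODE_COUNT = 3

monthly_increase = round((NEW_PRICE - OLD_PRICE) * NODE_COUNT, 2)
percent_increase = round((NEW_PRICE - OLD_PRICE) / OLD_PRICE * 100, 1)

print(f"+${monthly_increase:.2f}/mo across {NODE_COUNT} nodes "
      f"({percent_increase}% per node)")
```
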
If Hetzner keeps raising prices (which they've historically resisted;
the 2026 adjustment was their first in several years), reconsider.
## Budget alerts
- **B2**: hard-capped via B2 console at $20/mo. If we breach, something
is wrong and B2 rejects further writes.
- **Neon**: soft limits via Neon alerts. Set threshold at $20 to get
email when approaching.
- **Hetzner**: no variable cost at our scale, no alerts needed.
- **Cloudflare**: Free plan has hard quotas; no surprise bills possible.
## References
- [Hetzner Cloud pricing][hetzner-cloud]
- [Neon pricing][neon-pricing]
- [Backblaze B2 pricing][b2-pricing]
- [Cloudflare Free plan][cf-free]
[hetzner-cloud]: https://www.hetzner.com/cloud/
[neon-pricing]: https://neon.com/pricing
[b2-pricing]: https://www.backblaze.com/cloud-storage/pricing
[cf-free]: https://www.cloudflare.com/plans/free/