# Casera Infrastructure Plan — February 2026
## Architecture Overview
```
                 ┌─────────────┐
                 │ Cloudflare  │
                 │  (CDN/DNS)  │
                 └──────┬──────┘
                        │ HTTPS
                 ┌──────┴──────┐
                 │ Hetzner LB  │
                 │   ($5.99)   │
                 └──────┬──────┘
       ┌────────────────┼────────────────┐
       │                │                │
┌──────┴──────┐  ┌──────┴──────┐  ┌──────┴──────┐
│   CX33 #1   │  │   CX33 #2   │  │   CX33 #3   │
│  (manager)  │  │  (manager)  │  │  (manager)  │
│             │  │             │  │             │
│  api (x2)   │  │  api (x2)   │  │  api (x1)   │
│  admin      │  │  worker     │  │  worker     │
│  redis      │  │  dozzle     │  │             │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       │  Docker Swarm Overlay (IPsec)   │
       └────────────────┼────────────────┘
           ┌────────────┼────────────────┐
           │                             │
    ┌──────┴──────┐              ┌───────┴──────┐
    │    Neon     │              │  Backblaze   │
    │  (Postgres) │              │      B2      │
    │   Launch    │              │   (media)    │
    └─────────────┘              └──────────────┘
```
## Swarm Nodes — Hetzner CX33
All 3 nodes are manager+worker. Raft needs a majority of managers to stay reachable, so 3 managers is the smallest setup that tolerates a failure: 1 node can go down and the cluster stays operational.
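
The fault-tolerance figure follows from Raft's majority rule: a cluster of N managers needs floor(N/2)+1 of them reachable, so it tolerates floor((N-1)/2) failures. A small Go sketch of that arithmetic (the function names are illustrative, not part of any Docker API) shows why two managers tolerate no failures while three tolerate one:

```go
package main

import "fmt"

// quorum returns how many managers must be reachable for a Raft
// cluster of n managers to keep making decisions.
func quorum(n int) int { return n/2 + 1 }

// tolerated returns how many managers can fail before quorum is lost.
func tolerated(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("managers=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), tolerated(n))
	}
}
```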
| Spec | Value |
|------|-------|
| Plan | CX33 (Shared Regular Performance) |
| vCPU | 4 |
| RAM | 8 GB |
| Disk | 80 GB SSD |
| Traffic | 20 TB/mo included |
| Price | $6.59/mo per node |
| Region | Pick closest to users (US: Ashburn or Hillsboro, EU: Nuremberg/Falkenstein/Helsinki) |
**Why CX33 over CX23:** 8 GB RAM gives headroom for Redis, multiple API replicas, and the admin panel without pressure. The $2.50/mo difference per node isn't worth optimizing away.
### Container Distribution
| Container | Replicas | Notes |
|-----------|----------|-------|
| api | 3-6 | Spread across all nodes by Swarm |
| worker | 2-3 | Asynq workers pull jobs from Redis concurrently |
| admin | 1 | Next.js admin panel |
| redis | 1 | Pinned to one node with its volume |
| dozzle | 1 | Pinned to a manager node (needs Docker socket) |
### Scaling Path
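- Need more capacity? Add another CX33 with `docker swarm join`. Swarm schedules new tasks onto the new node, but existing tasks stay where they are until a service is updated or scaled (for example `docker service update --force <service>`).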
- Need more API throughput? Bump replicas in the compose file. No infra change.
- Only infrastructure addition needed at scale: the Hetzner Load Balancer ($5.99/mo).
## Load Balancer — Hetzner LB
| Spec | Value |
|------|-------|
| Price | $5.99/mo |
| Purpose | Distribute traffic across Swarm nodes, TLS termination |
| When to add | When you need redundant ingress (not required day 1 if using Cloudflare to proxy to a single node) |
## Database — Neon Postgres (Launch Plan)
| Spec | Value |
|------|-------|
| Plan | Launch (usage-based, no monthly minimum) |
| Compute | $0.106/CU-hr, up to 16 CU (64 GB RAM) |
| Storage | $0.35/GB-month |
| Connections | Up to 10,000 via built-in PgBouncer |
| Typical cost | ~$5-15/mo for light load, ~$20-40/mo at 100k users |
| Free tier | Available for dev/staging (100 CU-hrs/mo, 0.5 GB) |
### Connection Pooling
Neon includes built-in PgBouncer on all plans. Enable by adding `-pooler` to the hostname:
```
# Direct connection
ep-cool-darkness-123456.us-east-2.aws.neon.tech
# Pooled connection (use this in production)
ep-cool-darkness-123456-pooler.us-east-2.aws.neon.tech
```
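The pooler runs in transaction mode, which works with GORM for ordinary queries. Session-scoped features (session-level `SET`, `LISTEN`/`NOTIFY`, session advisory locks) do not persist across transactions, so use the direct hostname for anything that depends on them.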
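
As a minimal sketch of the hostname rule above, the following Go helper appends `-pooler` to the first DNS label (the endpoint ID). `pooledHost` is a hypothetical name used only for illustration; in practice the pooled hostname can simply be copied from the Neon dashboard.

```go
package main

import (
	"fmt"
	"strings"
)

// pooledHost converts a direct Neon hostname into its pooled form by
// appending "-pooler" to the first DNS label (the endpoint ID).
// Hosts that are already pooled, or that contain no dot, are returned unchanged.
func pooledHost(host string) string {
	id, rest, found := strings.Cut(host, ".")
	if !found || strings.HasSuffix(id, "-pooler") {
		return host
	}
	return id + "-pooler." + rest
}

func main() {
	fmt.Println(pooledHost("ep-cool-darkness-123456.us-east-2.aws.neon.tech"))
}
```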
### Configuration
```env
DB_HOST=ep-xxxxx-pooler.us-east-2.aws.neon.tech
DB_PORT=5432
DB_SSLMODE=require
POSTGRES_USER=<from neon dashboard>
POSTGRES_PASSWORD=<from neon dashboard>
POSTGRES_DB=casera
```
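
One way the application might turn these variables into a connection string is sketched below. It assumes the driver accepts a `postgres://` URL (pgx-based drivers do); `buildDSN` is a hypothetical helper, not existing project code. Building the URL with `net/url` keeps special characters in the password from breaking the string.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"os"
)

// buildDSN assembles a postgres:// URL from the variables listed above.
// getenv is injected so the function can be exercised without touching
// the real process environment.
func buildDSN(getenv func(string) string) string {
	u := url.URL{
		Scheme: "postgres",
		// UserPassword percent-encodes characters such as '@' in the password.
		User: url.UserPassword(getenv("POSTGRES_USER"), getenv("POSTGRES_PASSWORD")),
		Host: net.JoinHostPort(getenv("DB_HOST"), getenv("DB_PORT")),
		Path: "/" + getenv("POSTGRES_DB"),
	}
	q := url.Values{}
	q.Set("sslmode", getenv("DB_SSLMODE"))
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	// Prints a URL built from whatever is set in the current environment.
	fmt.Println(buildDSN(os.Getenv))
}
```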
## Object Storage — Backblaze B2
| Spec | Value |
|------|-------|
| Storage | $6/TB/mo ($0.006/GB) |
| Egress | $0.01/GB (first 3x stored amount is free) |
| Free tier | 10 GB storage always free |
| API calls | Class A free, Class B/C free first 2,500/day |
| Spending cap | Built-in data caps with alerts at 75% and 100% |
### Bucket Setup
| Bucket | Visibility | Key Permissions | Contents |
|--------|------------|-----------------|----------|
| `casera-uploads` | Private | Read/Write (API containers) | User-uploaded photos, documents |
| `casera-certs` | Private | Read-only (API + worker) | APNs push certificates |
Serve files through the API using signed URLs — never expose buckets publicly.
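
The plan does not specify how the signed URLs are produced. One possible approach, sketched here under that assumption, is for the API to issue short-lived links to its own download endpoint, signed with HMAC-SHA256. The `/files/` path, the `exp` and `sig` parameter names, and all function names are hypothetical; B2's S3-compatible presigned URLs would be an alternative.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// sign returns a hex HMAC-SHA256 over the object key and expiry (Unix seconds).
func sign(secret []byte, key string, exp int64) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(key + "\n" + strconv.FormatInt(exp, 10)))
	return hex.EncodeToString(mac.Sum(nil))
}

// signedPath builds a hypothetical download path valid for ttl seconds after now.
func signedPath(secret []byte, key string, now, ttl int64) string {
	exp := now + ttl
	return fmt.Sprintf("/files/%s?exp=%d&sig=%s", key, exp, sign(secret, key, exp))
}

// verify reports whether sig matches the key and expiry and the link is still valid.
func verify(secret []byte, key string, exp int64, sig string, now int64) bool {
	if now > exp {
		return false
	}
	// hmac.Equal compares in constant time to avoid timing leaks.
	return hmac.Equal([]byte(sign(secret, key, exp)), []byte(sig))
}

func main() {
	secret := []byte("example-secret") // placeholder; load from a Docker secret in practice
	fmt.Println(signedPath(secret, "uploads/42/photo.jpg", 1700000000, 900))
}
```

The handler behind such a path would call `verify` before streaming the object from the private bucket, so the bucket itself never needs public access.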
### Why B2 Over Others
- **Spending cap**: only S3-compatible provider with built-in hard caps and alerts. No surprise bills.
- **Cheapest storage**: $6/TB vs Cloudflare R2 at $15/TB vs Tigris at $20/TB.
- **Free egress partner CDNs**: Cloudflare, Fastly, bunny.net — zero egress when behind Cloudflare.
## CDN — Cloudflare (Free Tier)
| Spec | Value |
|------|-------|
| Price | $0 |
| Purpose | DNS, CDN caching, DDoS protection, TLS termination |
| Setup | Point DNS to Cloudflare, proxy traffic to Hetzner LB (or directly to a Swarm node) |
Add this on day 1. No reason not to.
## Logging — Dozzle
| Spec | Value |
|------|-------|
| Price | $0 (open source) |
| Port | 9999 (internal only — do not expose publicly) |
| Features | Real-time log viewer, webhook support for alerts |
Runs as a container in the Swarm. Needs Docker socket access, so it's pinned to a manager node.
For 100k+ users, consider adding Prometheus + Grafana (self-hosted, free) or Betterstack (~$10/mo) for metrics and alerting beyond log viewing.
## Security
### Swarm Node Firewall (Hetzner Cloud Firewall — free)
| Port | Protocol | Source | Purpose |
|------|----------|--------|---------|
| Custom (e.g. 2222) | TCP | Your IP only | SSH |
| 80, 443 | TCP | Anywhere | Public traffic |
| 2377 | TCP | Swarm nodes only | Cluster management |
| 7946 | TCP/UDP | Swarm nodes only | Node discovery |
| 4789 | UDP | Swarm nodes only | Overlay network (VXLAN) |
| n/a | ESP (IP protocol 50) | Swarm nodes only | Encrypted overlay traffic (IPsec) |
| Everything else | — | — | Blocked |
Set up once in Hetzner dashboard, apply to all 3 nodes.
### SSH Hardening
```
# /etc/ssh/sshd_config
# Comments sit on their own lines: sshd_config documents only lines starting with '#' as comments.
# Non-default port
Port 2222
# No root SSH
PermitRootLogin no
# Key-only auth
PasswordAuthentication no
PubkeyAuthentication yes
# Only your deploy user
AllowUsers deploy
```
### Swarm ↔ Neon (Postgres)
| Layer | Method |
|-------|--------|
| Encryption | TLS enforced by Neon (`DB_SSLMODE=require`) |
| Authentication | Strong password stored as Docker secret |
| Access control | IP allowlist in Neon dashboard — restrict to 3 Swarm node IPs |
### Swarm ↔ B2 (Object Storage)
| Layer | Method |
|-------|--------|
| Encryption | HTTPS always (enforced by B2 API) |
| Authentication | Scoped application keys (not master key) |
| Access control | Per-bucket key permissions (read-only where possible) |
### Swarm Internal
| Layer | Method |
|-------|--------|
| Overlay encryption | `driver_opts: encrypted: "true"` on overlay network (IPsec between nodes) |
| Secrets | Use `docker secret create` for DB password, SECRET_KEY, B2 keys, APNs keys. Mounted at `/run/secrets/`, encrypted in Swarm raft log. |
| Container isolation | Non-root users in all containers (already configured in Dockerfile) |
### Docker Secrets Migration
Current setup uses environment variables for secrets. Migrate to Docker secrets for production:
```bash
# Create secrets (printf avoids storing the trailing newline that echo appends)
printf '%s' "your-db-password" | docker secret create postgres_password -
printf '%s' "your-secret-key" | docker secret create secret_key -
printf '%s' "your-b2-app-key" | docker secret create b2_app_key -

# Reference in compose file (YAML)
services:
  api:
    secrets:
      - postgres_password
      - secret_key
      - b2_app_key
secrets:
  postgres_password:
    external: true
  secret_key:
    external: true
  b2_app_key:
    external: true
```
Application code reads from `/run/secrets/<name>` instead of env vars.
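
A minimal sketch of that read path in Go, assuming the application wants to keep an environment-variable fallback for local development outside Swarm. `readSecret` and its parameters are hypothetical and not taken from the existing codebase.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// readSecret returns the secret stored at <dir>/<name>, trimming
// surrounding whitespace such as a trailing newline. If the file does
// not exist it falls back to the environment variable envKey.
func readSecret(dir, name, envKey string, getenv func(string) string) (string, error) {
	data, err := os.ReadFile(filepath.Join(dir, name))
	if err == nil {
		return strings.TrimSpace(string(data)), nil
	}
	if !os.IsNotExist(err) {
		// The file exists but could not be read (e.g. permissions).
		return "", fmt.Errorf("read secret %q: %w", name, err)
	}
	if v := getenv(envKey); v != "" {
		return v, nil
	}
	return "", fmt.Errorf("secret %q not found in %s or $%s", name, dir, envKey)
}

func main() {
	pw, err := readSecret("/run/secrets", "postgres_password", "POSTGRES_PASSWORD", os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("loaded secret of length", len(pw))
}
```

Failing loudly when neither source is set surfaces a misconfigured deployment at startup rather than at the first database call.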
## Redis (In-Cluster)
Redis stays inside the Swarm — no need to externalize.
| Purpose | Details |
|---------|---------|
| Asynq job queue | Background jobs: push notifications, digests, reminders, onboarding emails |
| Static data cache | Cached lookup tables with ETag support |
| Resource usage | ~20-50 MB RAM, negligible CPU |
At 100k users, Redis handles job queuing for nightly digests (100k enqueue + dequeue operations) without issue. A single Redis instance typically sustains on the order of 100,000 operations per second, far above this load.
Asynq coordinates multiple worker replicas automatically: each job is dequeued atomically by a single worker. Delivery is at-least-once (a task can be retried after a failure or worker crash), so handlers should be idempotent.
## Performance Estimates
| Metric | Value |
|--------|-------|
| Single CX33 API throughput | ~1,000-2,000 req/s (blended, with Neon latency) |
| 3-node cluster throughput | ~3,000-6,000 req/s |
| Avg requests per user per day | ~50 |
| Estimated user capacity (3 nodes) | ~200k-500k registered users |
| Bottleneck at scale | Neon compute tier, not Go or Swarm |
These are napkin estimates. Load test before launch.
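
To make the napkin math explicit, the sketch below converts the per-user figure from the table into an average request rate and compares it with the lower bound of the 3-node estimate. The averaging over a full day is an assumption; the source does not state a peak-to-average ratio.

```go
package main

import "fmt"

const secondsPerDay = 86400

// avgRPS converts a daily per-user request count into average requests per second.
func avgRPS(users, reqPerUserPerDay float64) float64 {
	return users * reqPerUserPerDay / secondsPerDay
}

func main() {
	const clusterLowRPS = 3000.0 // lower bound of the 3-node estimate above
	for _, users := range []float64{100_000, 200_000, 500_000} {
		avg := avgRPS(users, 50)
		fmt.Printf("%.0f users: avg %.0f req/s, headroom %.0fx at %.0f req/s\n",
			users, avg, clusterLowRPS/avg, clusterLowRPS)
	}
}
```

At 500k users the average works out to roughly 290 req/s, so the 3,000 req/s lower bound leaves about a 10x margin for traffic peaks.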
## Monthly Cost Summary
### Starting Out
| Component | Provider | Cost |
|-----------|----------|------|
| 3x Swarm nodes | Hetzner CX33 | $19.77/mo |
| Postgres | Neon Launch | ~$5-15/mo |
| Object storage | Backblaze B2 | <$1/mo |
| CDN | Cloudflare Free | $0 |
| Logging | Dozzle (self-hosted) | $0 |
| **Total** | | **~$25-35/mo** |
### At Scale (100k users)
| Component | Provider | Cost |
|-----------|----------|------|
| 3x Swarm nodes | Hetzner CX33 | $19.77/mo |
| Load balancer | Hetzner LB | $5.99/mo |
| Postgres | Neon Launch | ~$20-40/mo |
| Object storage | Backblaze B2 | ~$1-3/mo |
| CDN | Cloudflare Free | $0 |
| Monitoring | Betterstack or self-hosted | ~$0-10/mo |
| **Total** | | **~$47-79/mo** |
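
The totals can be re-derived from the line items. This sketch sums the low and high end of each range in cents to avoid floating-point drift; treating the "<$1/mo" B2 line as $0 to $1 is an assumption.

```go
package main

import "fmt"

// item is one line of the cost tables, in US cents per month.
type item struct {
	name      string
	low, high int
}

// total sums the low and high ends of each range.
func total(items []item) (low, high int) {
	for _, it := range items {
		low += it.low
		high += it.high
	}
	return low, high
}

func main() {
	nodes := item{"3x CX33", 3 * 659, 3 * 659}
	start := []item{nodes, {"Neon", 500, 1500}, {"B2", 0, 100}}
	scale := []item{nodes, {"Hetzner LB", 599, 599}, {"Neon", 2000, 4000},
		{"B2", 100, 300}, {"Monitoring", 0, 1000}}

	lo, hi := total(start)
	fmt.Printf("starting out: $%.2f-$%.2f\n", float64(lo)/100, float64(hi)/100)
	lo, hi = total(scale)
	fmt.Printf("at scale:     $%.2f-$%.2f\n", float64(lo)/100, float64(hi)/100)
}
```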
## TODO
- [ ] Set up 3x Hetzner CX33 instances
- [ ] Initialize Docker Swarm (`docker swarm init` on first node, then `docker swarm join` on the others using the manager token from `docker swarm join-token manager`)
- [ ] Configure Hetzner Cloud Firewall
- [ ] Harden SSH on all nodes
- [ ] Create Neon project (Launch plan), configure IP allowlist
- [ ] Create Backblaze B2 buckets with scoped application keys
- [ ] Set up Cloudflare DNS proxying
- [ ] Update prod compose file: remove `db` service, add overlay encryption, add Docker secrets
- [ ] Add B2 SDK integration for file uploads (code change)
- [ ] Update config to read from `/run/secrets/` for Docker secrets
- [ ] Set B2 spending cap and alerts
- [ ] Load test the deployed stack
- [ ] Add Hetzner LB when needed