# Casera Infrastructure Plan — February 2026

## Architecture Overview

```
                 ┌─────────────┐
                 │ Cloudflare  │
                 │  (CDN/DNS)  │
                 └──────┬──────┘
                        │ HTTPS
                 ┌──────┴──────┐
                 │ Hetzner LB  │
                 │   ($5.99)   │
                 └──────┬──────┘
                        │
       ┌────────────────┼────────────────┐
       │                │                │
┌──────┴──────┐  ┌──────┴──────┐  ┌──────┴──────┐
│  CX33 #1    │  │  CX33 #2    │  │  CX33 #3    │
│  (manager)  │  │  (manager)  │  │  (manager)  │
│             │  │             │  │             │
│  api (x2)   │  │  api (x2)   │  │  api (x1)   │
│  admin      │  │  worker     │  │  worker     │
│  redis      │  │  dozzle     │  │             │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       │  Docker Swarm Overlay (IPsec)   │
       └────────────────┼────────────────┘
                        │
           ┌────────────┼────────────────┐
           │                             │
    ┌──────┴──────┐              ┌───────┴──────┐
    │    Neon     │              │  Backblaze   │
    │ (Postgres)  │              │      B2      │
    │   Launch    │              │   (media)    │
    └─────────────┘              └──────────────┘
```

## Swarm Nodes — Hetzner CX33

All 3 nodes are manager+worker. Swarm managers use Raft consensus, which needs a majority (quorum) of managers to agree: with 3 managers the quorum is 2, so 1 node can go down and the cluster stays operational.
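
The quorum rule can be made concrete with a short Go sketch (`raftQuorum` is an illustrative helper, not existing code). Raft requires a strict majority of managers, so the number that can fail is the manager count minus the quorum:

```go
package main

// raftQuorum returns the number of managers that must be reachable for the
// Swarm's Raft store to accept changes, and how many managers can fail
// before that quorum is lost.
func raftQuorum(managers int) (quorum, tolerated int) {
	quorum = managers/2 + 1       // strict majority
	tolerated = managers - quorum // managers that can be lost
	return quorum, tolerated
}
```

With 3 managers this gives a quorum of 2 and tolerates 1 failure. With 4, the quorum rises to 3 but still only 1 failure is tolerated, which is why Swarm manager counts are normally kept odd.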

| Spec | Value |
|------|-------|
| Plan | CX33 (Shared Regular Performance) |
| vCPU | 4 |
| RAM | 8 GB |
| Disk | 80 GB SSD |
| Traffic | 20 TB/mo included |
| Price | $6.59/mo per node |
| Region | Pick closest to users (US: Ashburn or Hillsboro, EU: Nuremberg/Falkenstein/Helsinki) |

**Why CX33 over CX23:** 8 GB RAM gives headroom for Redis, multiple API replicas, and the admin panel without pressure. The $2.50/mo difference per node isn't worth optimizing away.

### Container Distribution

| Container | Replicas | Notes |
|-----------|----------|-------|
| api | 3-6 | Spread across all nodes by Swarm |
| worker | 2-3 | Asynq workers pull jobs from Redis concurrently |
| admin | 1 | Next.js admin panel |
| redis | 1 | Pinned to one node with its volume |
| dozzle | 1 | Pinned to a manager node (needs Docker socket) |
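
### Scaling Path

- Need more capacity? Add another CX33 with `docker swarm join`, as a worker rather than a 4th manager (an even manager count adds no fault tolerance). Swarm schedules new tasks onto the new node but does not move running ones; run `docker service update --force <service>` to rebalance.
- Need more API throughput? Bump replicas in the compose file. No infra change.
- Only infrastructure addition needed at scale: the Hetzner Load Balancer ($5.99/mo).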

## Load Balancer — Hetzner LB

| Spec | Value |
|------|-------|
| Price | $5.99/mo |
| Purpose | Distribute traffic across Swarm nodes, TLS termination |
| When to add | When you need redundant ingress (not required day 1 if using Cloudflare to proxy to a single node) |

## Database — Neon Postgres (Launch Plan)

| Spec | Value |
|------|-------|
| Plan | Launch (usage-based, no monthly minimum) |
| Compute | $0.106/CU-hr, up to 16 CU (64 GB RAM) |
| Storage | $0.35/GB-month |
| Connections | Up to 10,000 via built-in PgBouncer |
| Typical cost | ~$5-15/mo for light load, ~$20-40/mo at 100k users |
| Free tier | Available for dev/staging (100 CU-hrs/mo, 0.5 GB) |
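
To sanity-check the "Typical cost" row, a small estimator helps. This is a sketch that hard-codes the Launch rates from the table (re-check them before relying on the numbers) and assumes the bill is just compute CU-hours plus stored GB-months, ignoring any other line items. The function name is illustrative.

```go
package main

// Neon Launch-plan rates in USD, copied from the table above.
const (
	computePerCUHour  = 0.106 // per CU-hour
	storagePerGBMonth = 0.35  // per GB-month
)

// neonMonthlyCost estimates one month's bill from the average compute size
// (in CU), the hours the compute is active, and the data stored (in GB).
func neonMonthlyCost(avgCU, activeHours, storageGB float64) float64 {
	return avgCU*activeHours*computePerCUHour + storageGB*storagePerGBMonth
}
```

For example, a 0.25 CU compute (assumed here to be the smallest size) active for all 730 hours of a month with 1 GB stored comes to about $19.70, while the same setup active for 200 hours is about $5.65. The ~$5-15 light-load figure therefore assumes the compute scales to zero for a good part of the month.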

### Connection Pooling

Neon includes built-in PgBouncer on all plans. Enable it by adding `-pooler` to the endpoint ID in the hostname:

```
# Direct connection
ep-cool-darkness-123456.us-east-2.aws.neon.tech

# Pooled connection (use this in production)
ep-cool-darkness-123456-pooler.us-east-2.aws.neon.tech
```

The pooler runs in transaction mode, which is compatible with GORM out of the box. Session-level state (`SET`, `LISTEN`/`NOTIFY`, session advisory locks, temporary tables) does not carry over between transactions on pooled connections, so don't rely on it.
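
If a direct endpoint ever ends up in configuration, the pooled hostname can be derived mechanically. A minimal sketch, assuming the endpoint ID is always the first DNS label as in the examples above (`poolerHost` is a hypothetical helper):

```go
package main

import "strings"

// poolerHost turns a direct Neon hostname into its pooled form by appending
// "-pooler" to the first DNS label (the endpoint ID). Hostnames that are
// already pooled, or that contain no dot, are returned unchanged.
func poolerHost(direct string) string {
	endpoint, rest, found := strings.Cut(direct, ".")
	if !found || strings.HasSuffix(endpoint, "-pooler") {
		return direct
	}
	return endpoint + "-pooler." + rest
}
```

Returning already-pooled names unchanged makes the helper safe to apply to any configured host.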

### Configuration

```env
DB_HOST=ep-xxxxx-pooler.us-east-2.aws.neon.tech
DB_PORT=5432
DB_SSLMODE=require
POSTGRES_USER=<from neon dashboard>
POSTGRES_PASSWORD=<from neon dashboard>
POSTGRES_DB=casera
```
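
GORM's Postgres driver accepts a URL-style DSN, so these variables can be assembled with `net/url`, which also percent-escapes characters such as `@` or `/` in the password. A sketch only: `buildDSN` is hypothetical, and in production the password would come from the Docker secret rather than an env var.

```go
package main

import "net/url"

// buildDSN assembles a postgres:// connection URL from the DB_* and
// POSTGRES_* settings. url.UserPassword escapes characters that would
// otherwise break the URL (for example '@', '/', ':' or '?').
func buildDSN(host, port, user, password, dbname, sslmode string) string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password),
		Host:     host + ":" + port,
		Path:     "/" + dbname,
		RawQuery: url.Values{"sslmode": {sslmode}}.Encode(),
	}
	return u.String()
}
```

The resulting string can be passed to `gorm.Open(postgres.Open(dsn), &gorm.Config{})` from `gorm.io/driver/postgres`.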

## Object Storage — Backblaze B2

| Spec | Value |
|------|-------|
| Storage | $6/TB/mo ($0.006/GB) |
| Egress | $0.01/GB (first 3x stored amount is free) |
| Free tier | 10 GB storage always free |
| API calls | Class A free, Class B/C free first 2,500/day |
| Spending cap | Built-in data caps with alerts at 75% and 100% |
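
The storage and egress rows combine as follows. This sketch simplifies the free allowances (10 GB of storage free, egress free up to 3x the data stored) and ignores API-call charges. The function and its inputs are illustrative only.

```go
package main

// B2 rates in USD, copied from the table above.
const (
	storagePerGB = 0.006 // per GB-month
	egressPerGB  = 0.01  // per GB beyond the free allowance
	freeStorage  = 10.0  // GB always free
	freeEgressX  = 3.0   // egress is free up to 3x the stored amount
)

// b2MonthlyCost estimates one month of storage plus egress charges.
func b2MonthlyCost(storedGB, egressGB float64) float64 {
	billableStorage := storedGB - freeStorage
	if billableStorage < 0 {
		billableStorage = 0
	}
	billableEgress := egressGB - freeEgressX*storedGB
	if billableEgress < 0 {
		billableEgress = 0
	}
	return billableStorage*storagePerGB + billableEgress*egressPerGB
}
```

With 100 GB stored and 200 GB downloaded directly from B2, this gives $0.54/month. The same storage with 500 GB downloaded gives $2.54.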

### Bucket Setup

| Bucket | Visibility | Key Permissions | Contents |
|--------|------------|-----------------|----------|
| `casera-uploads` | Private | Read/Write (API containers) | User-uploaded photos, documents |
| `casera-certs` | Private | Read-only (API + worker) | APNs push certificates |

Serve files through the API using signed URLs — never expose buckets publicly.

### Why B2 Over Others

- **Spending cap**: built-in hard caps with alerts, so no surprise bills.
- **Cheapest storage**: $6/TB vs Cloudflare R2 at $15/TB vs Tigris at $20/TB.
- **Free egress to partner CDNs**: Cloudflare, Fastly, bunny.net. Files fetched through Cloudflare incur no B2 egress fees.

## CDN — Cloudflare (Free Tier)

| Spec | Value |
|------|-------|
| Price | $0 |
| Purpose | DNS, CDN caching, DDoS protection, TLS termination |
| Setup | Point DNS to Cloudflare, proxy traffic to Hetzner LB (or directly to a Swarm node) |

Add this on day 1. No reason not to.

## Logging — Dozzle

| Spec | Value |
|------|-------|
| Price | $0 (open source) |
| Port | 9999 (internal only — do not expose publicly) |
| Features | Real-time log viewer, webhook support for alerts |

Runs as a container in the Swarm. Needs Docker socket access, so it's pinned to a manager node.

For 100k+ users, consider adding Prometheus + Grafana (self-hosted, free) or Betterstack (~$10/mo) for metrics and alerting beyond log viewing.

## Security

### Swarm Node Firewall (Hetzner Cloud Firewall — free)

| Port | Protocol | Source | Purpose |
|------|----------|--------|---------|
| Custom (e.g. 2222) | TCP | Your IP only | SSH |
| 80, 443 | TCP | Anywhere | Public traffic |
| 2377 | TCP | Swarm nodes only | Cluster management |
| 7946 | TCP/UDP | Swarm nodes only | Node discovery |
| 4789 | UDP | Swarm nodes only | Overlay network (VXLAN) |
| n/a | ESP (IP protocol 50) | Swarm nodes only | Encrypted overlay traffic (IPsec) |
| Everything else | — | — | Blocked |

Set up once in the Hetzner dashboard and apply to all 3 nodes.

### SSH Hardening

```
# /etc/ssh/sshd_config
# sshd_config documents only whole-line comments, so each note sits on its own line.
# Validate with `sshd -t` before restarting sshd.
# Non-default port (must match the firewall rule)
Port 2222
# No root SSH
PermitRootLogin no
# Key-only auth
PasswordAuthentication no
PubkeyAuthentication yes
# Only your deploy user
AllowUsers deploy
```

### Swarm ↔ Neon (Postgres)

| Layer | Method |
|-------|--------|
| Encryption | TLS enforced by Neon (`DB_SSLMODE=require`; `verify-full` additionally checks the server certificate and hostname) |
| Authentication | Strong password stored as Docker secret |
| Access control | IP allowlist in Neon dashboard, restricted to the 3 Swarm node IPs |

### Swarm ↔ B2 (Object Storage)

| Layer | Method |
|-------|--------|
| Encryption | HTTPS always (enforced by B2 API) |
| Authentication | Scoped application keys (not master key) |
| Access control | Per-bucket key permissions (read-only where possible) |

### Swarm Internal

| Layer | Method |
|-------|--------|
| Overlay encryption | `driver_opts: encrypted: "true"` on overlay network (IPsec between nodes) |
| Secrets | Use `docker secret create` for DB password, SECRET_KEY, B2 keys, APNs keys. Mounted at `/run/secrets/`, encrypted in Swarm raft log. |
| Container isolation | Non-root users in all containers (already configured in Dockerfile) |

### Docker Secrets Migration

Current setup uses environment variables for secrets. Migrate to Docker secrets for production:

```bash
# Create secrets (printf avoids the trailing newline that echo would store in the secret)
printf '%s' "your-db-password" | docker secret create postgres_password -
printf '%s' "your-secret-key" | docker secret create secret_key -
printf '%s' "your-b2-app-key" | docker secret create b2_app_key -
```

```yaml
# Reference in compose file
services:
  api:
    secrets:
      - postgres_password
      - secret_key
      - b2_app_key

secrets:
  postgres_password:
    external: true
  secret_key:
    external: true
  b2_app_key:
    external: true
```

Application code reads from `/run/secrets/<name>` instead of env vars.

## Redis (In-Cluster)

Redis stays inside the Swarm; there's no need to externalize it.

| Aspect | Details |
|---------|---------|
| Asynq job queue | Background jobs: push notifications, digests, reminders, onboarding emails |
| Static data cache | Cached lookup tables with ETag support |
| Resource usage | ~20-50 MB RAM, negligible CPU |

At 100k users, a nightly digest run is roughly 100k enqueues plus 100k dequeues. A single Redis instance handles on the order of 100k simple operations per second without pipelining (far more with it), so this is a light load.

Asynq coordinates multiple worker replicas automatically: each task is dequeued atomically, so only one worker holds it at a time. Asynq guarantees at-least-once execution, however, so a task can run again if its worker crashes or the handler times out. Handlers with side effects should be idempotent.
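
Asynq documents at-least-once execution, so a digest or push task can be delivered more than once. A handler with external side effects can guard against that with an idempotency key. This is a minimal sketch under stated assumptions: `DedupStore`, the key format, and `sendDigestOnce` are hypothetical, the in-memory store only works within one process, and Asynq's own API is not shown.

```go
package main

import "sync"

// DedupStore remembers which task keys have already been handled. With several
// worker replicas it must be shared (for example a Redis SET with NX and a TTL,
// or a unique constraint in Postgres); memStore below is for illustration only.
type DedupStore interface {
	MarkIfNew(key string) bool // true only the first time key is seen
	Forget(key string)         // undo a mark so a retry can run again
}

type memStore struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newMemStore() *memStore { return &memStore{seen: map[string]bool{}} }

func (s *memStore) MarkIfNew(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen[key] {
		return false
	}
	s.seen[key] = true
	return true
}

func (s *memStore) Forget(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.seen, key)
}

// sendDigestOnce skips send when an earlier delivery of the same user/day
// digest already ran it. If send fails, the mark is removed so the task's
// retry can try again.
func sendDigestOnce(store DedupStore, userID, day string, send func() error) error {
	key := "digest:" + userID + ":" + day
	if !store.MarkIfNew(key) {
		return nil // already handled by an earlier delivery
	}
	if err := send(); err != nil {
		store.Forget(key)
		return err
	}
	return nil
}
```

Marking before sending favors never sending a duplicate over never missing one: a crash between `MarkIfNew` and `send` skips that digest. Swap the order if a duplicate digest is cheaper than a missed one.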

## Performance Estimates

| Metric | Value |
|--------|-------|
| Single CX33 API throughput | ~1,000-2,000 req/s (blended, with Neon latency) |
| 3-node cluster throughput | ~3,000-6,000 req/s |
| Avg requests per user per day | ~50 |
| Estimated user capacity (3 nodes) | ~200k-500k registered users |
| Bottleneck at scale | Neon compute tier, not Go or Swarm |

These are napkin estimates. Load test before launch.
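
The user-capacity row can be cross-checked with simple arithmetic. The sketch below uses illustrative helpers and the table's 50 requests per user per day. It converts daily volume into an average request rate and compares that with cluster throughput:

```go
package main

const secondsPerDay = 86_400

// avgRPS is the average request rate over a full day.
func avgRPS(users, reqPerUserPerDay float64) float64 {
	return users * reqPerUserPerDay / secondsPerDay
}

// headroom is how many times the daily-average load the cluster can absorb,
// i.e. the peak-to-average ratio it can tolerate.
func headroom(clusterRPS, users, reqPerUserPerDay float64) float64 {
	return clusterRPS / avgRPS(users, reqPerUserPerDay)
}
```

100k users at 50 requests/day average about 58 req/s. At the low end (200k users on 3,000 req/s) the cluster has about 26x headroom over the daily average; at the high end (500k users on 6,000 req/s) about 21x. The capacity range therefore budgets for peaks of roughly 20x the average, which the load test should confirm.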

## Monthly Cost Summary

### Starting Out

| Component | Provider | Cost |
|-----------|----------|------|
| 3x Swarm nodes | Hetzner CX33 | $19.77/mo |
| Postgres | Neon Launch | ~$5-15/mo |
| Object storage | Backblaze B2 | <$1/mo |
| CDN | Cloudflare Free | $0 |
| Logging | Dozzle (self-hosted) | $0 |
| **Total** | | **~$25-36/mo** |

### At Scale (100k users)

| Component | Provider | Cost |
|-----------|----------|------|
| 3x Swarm nodes | Hetzner CX33 | $19.77/mo |
| Load balancer | Hetzner LB | $5.99/mo |
| Postgres | Neon Launch | ~$20-40/mo |
| Object storage | Backblaze B2 | ~$1-3/mo |
| CDN | Cloudflare Free | $0 |
| Monitoring | Betterstack or self-hosted | ~$0-10/mo |
| **Total** | | **~$47-79/mo** |
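
The totals can be reproduced by summing the low and high ends of each row separately. Below is a small sketch in integer cents (to avoid floating-point rounding), treating `<$1` as $0-1 and the other ranges at face value. The type and function are illustrative.

```go
package main

// costRange is a monthly cost estimate in US cents.
type costRange struct{ lo, hi int }

// total adds up the low ends and the high ends of the line items separately.
func total(items []costRange) costRange {
	var t costRange
	for _, c := range items {
		t.lo += c.lo
		t.hi += c.hi
	}
	return t
}
```

Starting out, the rows sum to $24.77-$35.77 per month. At scale, adding the $5.99 load balancer and the higher Neon, B2, and monitoring ranges, they sum to $46.76-$78.76.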

## TODO

- [ ] Set up 3x Hetzner CX33 instances
- [ ] Initialize Docker Swarm (`docker swarm init` on first node, `docker swarm join` on others)
- [ ] Configure Hetzner Cloud Firewall
- [ ] Harden SSH on all nodes
- [ ] Create Neon project (Launch plan), configure IP allowlist
- [ ] Create Backblaze B2 buckets with scoped application keys
- [ ] Set up Cloudflare DNS proxying
- [ ] Update prod compose file: remove `db` service, add overlay encryption, add Docker secrets
- [ ] Add B2 SDK integration for file uploads (code change)
- [ ] Update config to read from `/run/secrets/` for Docker secrets
- [ ] Set B2 spending cap and alerts
- [ ] Load test the deployed stack
- [ ] Add Hetzner LB when needed