# Casera Infrastructure Plan: February 2026

## Architecture Overview

```
                    ┌─────────────┐
                    │ Cloudflare  │
                    │  (CDN/DNS)  │
                    └──────┬──────┘
                           │ HTTPS
                    ┌──────┴──────┐
                    │ Hetzner LB  │
                    │   ($5.99)   │
                    └──────┬──────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
   ┌──────┴──────┐  ┌──────┴──────┐  ┌──────┴──────┐
   │  CX33 #1    │  │  CX33 #2    │  │  CX33 #3    │
   │  (manager)  │  │  (manager)  │  │  (manager)  │
   │             │  │             │  │             │
   │  api (x2)   │  │  api (x2)   │  │  api (x1)   │
   │  admin      │  │  worker     │  │  worker     │
   │  redis      │  │  dozzle     │  │             │
   └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
          │                │                │
          │  Docker Swarm Overlay (IPsec)   │
          └────────────────┼────────────────┘
                           │
              ┌────────────┼────────────────┐
              │                             │
       ┌──────┴──────┐              ┌───────┴──────┐
       │    Neon     │              │  Backblaze   │
       │ (Postgres)  │              │      B2      │
       │   Launch    │              │   (media)    │
       └─────────────┘              └──────────────┘
```

## Swarm Nodes: Hetzner CX33

All 3 nodes are manager+worker. Raft needs a majority of managers to stay available, so 3 managers tolerate the loss of 1 node while the cluster stays operational.

| Spec | Value |
|------|-------|
| Plan | CX33 (Shared Regular Performance) |
| vCPU | 4 |
| RAM | 8 GB |
| Disk | 80 GB SSD |
| Traffic | 20 TB/mo included |
| Price | $6.59/mo per node |
| Region | Pick closest to users (US: Ashburn or Hillsboro, EU: Nuremberg/Falkenstein/Helsinki) |

**Why CX33 over CX23:** 8 GB RAM gives headroom for Redis, multiple API replicas, and the admin panel without memory pressure. The $2.50/mo difference per node isn't worth optimizing away.

### Container Distribution

| Container | Replicas | Notes |
|-----------|----------|-------|
| api | 3-6 | Spread across all nodes by Swarm |
| worker | 2-3 | Asynq workers pull jobs from Redis concurrently |
| admin | 1 | Next.js admin panel |
| redis | 1 | Pinned to one node with its volume |
| dozzle | 1 | Pinned to a manager node (needs Docker socket) |

### Scaling Path

- Need more capacity? Add another CX33 with `docker swarm join` (as a worker, to keep the manager count odd). Swarm schedules new tasks onto the new node but does not move running tasks automatically; run `docker service update --force <service>` to rebalance.
- Need more API throughput? Bump replicas in the compose file. No infra change.
- Only infrastructure addition needed at scale: the Hetzner Load Balancer ($5.99/mo).

## Load Balancer: Hetzner LB

| Spec | Value |
|------|-------|
| Price | $5.99/mo |
| Purpose | Distribute traffic across Swarm nodes, TLS termination |
| When to add | When you need redundant ingress (not required day 1 if using Cloudflare to proxy to a single node) |

## Database: Neon Postgres (Launch Plan)

| Spec | Value |
|------|-------|
| Plan | Launch (usage-based, no monthly minimum) |
| Compute | $0.106/CU-hr, up to 16 CU (64 GB RAM) |
| Storage | $0.35/GB-month |
| Connections | Up to 10,000 via built-in PgBouncer |
| Typical cost | ~$5-15/mo for light load, ~$20-40/mo at 100k users |
| Free tier | Available for dev/staging (100 CU-hrs/mo, 0.5 GB) |

### Connection Pooling

Neon includes built-in PgBouncer on all plans. Enable it by adding `-pooler` to the endpoint ID in the hostname:

```
# Direct connection
ep-cool-darkness-123456.us-east-2.aws.neon.tech

# Pooled connection (use this in production)
ep-cool-darkness-123456-pooler.us-east-2.aws.neon.tech
```

The pooler runs in transaction mode, which works with GORM for ordinary queries. Session-level features (`SET`, advisory locks, `LISTEN`/`NOTIFY`) do not persist across transactions in this mode, so use the direct connection for anything that depends on them.

### Configuration

```env
DB_HOST=ep-xxxxx-pooler.us-east-2.aws.neon.tech
DB_PORT=5432
DB_SSLMODE=require
POSTGRES_USER=
POSTGRES_PASSWORD=
POSTGRES_DB=casera
```

## Object Storage: Backblaze B2

| Spec | Value |
|------|-------|
| Storage | $6/TB/mo ($0.006/GB) |
| Egress | $0.01/GB (first 3x stored amount is free) |
| Free tier | 10 GB storage always free |
| API calls | Class A free, Class B/C free first 2,500/day |
| Spending cap | Built-in data caps with alerts at 75% and 100% |

### Bucket Setup

| Bucket | Visibility | Key Permissions | Contents |
|--------|------------|-----------------|----------|
| `casera-uploads` | Private | Read/Write (API containers) | User-uploaded photos, documents |
| `casera-certs` | Private | Read-only (API + worker) | APNs push certificates |

Serve files through the API using signed URLs. Never expose buckets publicly.
### Why B2 Over Others

- **Spending cap**: only S3-compatible provider with built-in hard caps and alerts. No surprise bills.
- **Cheapest storage**: $6/TB vs Cloudflare R2 at $15/TB vs Tigris at $20/TB.
- **Free egress partner CDNs**: Cloudflare, Fastly, bunny.net. Egress is free when files are served through Cloudflare.

## CDN: Cloudflare (Free Tier)

| Spec | Value |
|------|-------|
| Price | $0 |
| Purpose | DNS, CDN caching, DDoS protection, TLS termination |
| Setup | Point DNS to Cloudflare, proxy traffic to Hetzner LB (or directly to a Swarm node) |

Add this on day 1. No reason not to.

## Logging: Dozzle

| Spec | Value |
|------|-------|
| Price | $0 (open source) |
| Port | 9999 (internal only; do not expose publicly) |
| Features | Real-time log viewer, webhook support for alerts |

Runs as a container in the Swarm. Needs Docker socket access, so it's pinned to a manager node.

For 100k+ users, consider adding Prometheus + Grafana (self-hosted, free) or Betterstack (~$10/mo) for metrics and alerting beyond log viewing.

## Security

### Swarm Node Firewall (Hetzner Cloud Firewall, free)

| Port | Protocol | Source | Purpose |
|------|----------|--------|---------|
| Custom (e.g. 2222) | TCP | Your IP only | SSH |
| 80, 443 | TCP | Anywhere | Public traffic |
| 2377 | TCP | Swarm nodes only | Cluster management |
| 7946 | TCP/UDP | Swarm nodes only | Node discovery |
| 4789 | UDP | Swarm nodes only | Overlay network (VXLAN) |
| - | ESP (IP protocol 50) | Swarm nodes only | Encrypted overlay traffic (IPsec) |
| Everything else | - | - | Blocked |

The ESP rule is needed because the overlay network is encrypted; without it, encrypted overlay traffic between nodes is dropped.

Set up once in the Hetzner dashboard and apply to all 3 nodes.
### SSH Hardening

```
# /etc/ssh/sshd_config
# Non-default port
Port 2222
# No root SSH
PermitRootLogin no
# Key-only auth
PasswordAuthentication no
PubkeyAuthentication yes
# Only your deploy user
AllowUsers deploy
```

### Swarm ↔ Neon (Postgres)

| Layer | Method |
|-------|--------|
| Encryption | TLS enforced by Neon (`DB_SSLMODE=require`) |
| Authentication | Strong password stored as Docker secret |
| Access control | IP allowlist in Neon dashboard, restricted to the 3 Swarm node IPs (confirm IP Allow is available on the chosen plan) |

### Swarm ↔ B2 (Object Storage)

| Layer | Method |
|-------|--------|
| Encryption | HTTPS always (enforced by B2 API) |
| Authentication | Scoped application keys (not master key) |
| Access control | Per-bucket key permissions (read-only where possible) |

### Swarm Internal

| Layer | Method |
|-------|--------|
| Overlay encryption | `driver_opts: encrypted: "true"` on overlay network (IPsec between nodes) |
| Secrets | Use `docker secret create` for DB password, SECRET_KEY, B2 keys, APNs keys. Mounted at `/run/secrets/`, encrypted in Swarm raft log. |
| Container isolation | Non-root users in all containers (already configured in Dockerfile) |

### Docker Secrets Migration

The current setup uses environment variables for secrets. Migrate to Docker secrets for production:

```bash
# Create secrets (printf avoids the trailing newline that echo would store)
printf '%s' "your-db-password" | docker secret create postgres_password -
printf '%s' "your-secret-key" | docker secret create secret_key -
printf '%s' "your-b2-app-key" | docker secret create b2_app_key -
```

Reference them in the compose file:

```yaml
services:
  api:
    secrets:
      - postgres_password
      - secret_key
      - b2_app_key

secrets:
  postgres_password:
    external: true
  secret_key:
    external: true
  b2_app_key:
    external: true
```

Application code reads from `/run/secrets/` instead of env vars.

## Redis (In-Cluster)

Redis stays inside the Swarm. There is no need to externalize it.

| Purpose | Details |
|---------|---------|
| Asynq job queue | Background jobs: push notifications, digests, reminders, onboarding emails |
| Static data cache | Cached lookup tables with ETag support |
| Resource usage | ~20-50 MB RAM, negligible CPU |

At 100k users, Redis handles job queuing for nightly digests (100k enqueue + dequeue operations) without issue. A single Redis instance typically sustains on the order of 100,000 operations per second, far more than this workload needs.

Asynq coordinates multiple worker replicas automatically: each job is dequeued atomically, so only one worker processes it at a time. Asynq guarantees at-least-once execution, so a job can run again if a worker crashes mid-task; keep handlers idempotent.

## Performance Estimates

| Metric | Value |
|--------|-------|
| Single CX33 API throughput | ~1,000-2,000 req/s (blended, with Neon latency) |
| 3-node cluster throughput | ~3,000-6,000 req/s |
| Avg requests per user per day | ~50 |
| Estimated user capacity (3 nodes) | ~200k-500k registered users |
| Bottleneck at scale | Neon compute tier, not Go or Swarm |

If every registered user were active, 500k users at ~50 requests/day would average about 290 req/s (25M requests / 86,400 s), roughly a tenth of the low-end cluster estimate; the remaining headroom absorbs peak-hour bursts.

These are napkin estimates. Load test before launch.

## Monthly Cost Summary

### Starting Out

| Component | Provider | Cost |
|-----------|----------|------|
| 3x Swarm nodes | Hetzner CX33 | $19.77/mo |
| Postgres | Neon Launch | ~$5-15/mo |
| Object storage | Backblaze B2 | <$1/mo |
| CDN | Cloudflare Free | $0 |
| Logging | Dozzle (self-hosted) | $0 |
| **Total** | | **~$25-35/mo** |

### At Scale (100k users)

| Component | Provider | Cost |
|-----------|----------|------|
| 3x Swarm nodes | Hetzner CX33 | $19.77/mo |
| Load balancer | Hetzner LB | $5.99/mo |
| Postgres | Neon Launch | ~$20-40/mo |
| Object storage | Backblaze B2 | ~$1-3/mo |
| CDN | Cloudflare Free | $0 |
| Monitoring | Betterstack or self-hosted | ~$0-10/mo |
| **Total** | | **~$47-79/mo** |

## TODO

- [ ] Set up 3x Hetzner CX33 instances
- [ ] Initialize Docker Swarm (`docker swarm init` on first node, `docker swarm join` on others)
- [ ] Configure Hetzner Cloud Firewall
- [ ] Harden SSH on all nodes
- [ ] Create Neon project (Launch plan), configure IP allowlist
- [ ] Create Backblaze B2 buckets with scoped application keys
- [ ] Set up Cloudflare DNS proxying
- [ ] Update prod compose file: remove `db` service, add overlay encryption, add Docker secrets
- [ ] Add B2 SDK integration for file uploads (code change)
- [ ] Update config to read from `/run/secrets/` for Docker secrets
- [ ] Set B2 spending cap and alerts
- [ ] Load test the deployed stack
- [ ] Add Hetzner LB when needed