# 01 — Infrastructure

## Summary

Three Hetzner Cloud CX33 virtual machines in the Nuremberg (nbg1) datacenter form the compute foundation. Each is a 4 vCPU / 8 GB RAM / 80 GB NVMe SSD instance on Hetzner's shared-CPU "Cloud" line. Total compute cost is $23.97/mo (3 × $7.99). This chapter explains the node spec in detail, why we picked Hetzner and this tier specifically, and the rejected alternatives.

## Node specifications

All three nodes are identical. Specs per node:

| Spec | Value |
|---|---|
| Provider | Hetzner Cloud (`www.hetzner.com/cloud`) |
| Instance type | CX33 (shared-CPU line) |
| vCPU | 4 |
| RAM | 8 GB |
| Disk | 80 GB NVMe SSD |
| Network | 20 TB/mo outbound included |
| IPv4 address | Public dedicated |
| IPv6 address | /64 subnet |
| Region | `nbg1` (Nuremberg, Germany) |
| OS | Ubuntu 24.04.3 LTS (HWE kernel 6.8.0-90-generic) |
| Price | **$7.99/mo** (April 2026) ⁽¹⁾ |

⁽¹⁾ Hetzner applied a price adjustment on 2026-04-01: CX33 went from ~$6.59 to $7.99. See [Hetzner price adjustment announcement][hetzner-prices].

### The three nodes

| SSH alias | Public IPv4 | IPv6 | k3s hostname |
|---|---|---|---|
| `hetzner1` | 178.104.247.152 | `2a01:4f8:1c18:79c7::1` | `ubuntu-8gb-nbg1-2` |
| `hetzner2` | 178.105.32.198 | `2a01:4f8:1c18:5ecf::1` | `ubuntu-8gb-nbg1-1` |
| `hetzner3` | 178.104.249.189 | `2a01:4f8:1c18:241a::1` | `ubuntu-8gb-nbg1-3` |

**Naming quirk.** The SSH-alias numbers and the Hetzner-assigned hostname numbers do not match (`hetzner1` is `nbg1-2`, `hetzner2` is `nbg1-1`). The Hetzner hostnames were assigned in server-creation order; the SSH aliases were set up later, in the order we wanted to refer to the nodes. We chose not to rename the hosts: changing the hostname of a Kubernetes node after it joins the cluster causes problems (node certificates, etcd identity, and so on are tied to the hostname). Living with the quirk is easier than rebuilding. See the mapping table in [the README](./README.md).
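To make the alias-to-hostname mapping concrete, the SSH aliases could be defined in the operator's `~/.ssh/config` roughly as follows. This is a sketch, not the actual file: the `User` and `IdentityFile` lines are assumptions (the cheat sheet in this chapter passes the user and key explicitly), and only the aliases, addresses, and k3s hostnames come from the table above.

```
# ~/.ssh/config (illustrative sketch; User and IdentityFile are assumptions)

# k3s hostname: ubuntu-8gb-nbg1-2
Host hetzner1
    HostName 178.104.247.152
    User deploy
    IdentityFile ~/.ssh/hetzner

# k3s hostname: ubuntu-8gb-nbg1-1
Host hetzner2
    HostName 178.105.32.198
    User deploy
    IdentityFile ~/.ssh/hetzner

# k3s hostname: ubuntu-8gb-nbg1-3
Host hetzner3
    HostName 178.104.249.189
    User deploy
    IdentityFile ~/.ssh/hetzner
```

Keeping the k3s hostname in a comment next to each alias puts the mismatch in front of the operator whenever the file is edited.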
## Why Hetzner

### Decision matrix

Compared at the time of purchase (~2026-04-23):

| Provider | Instance | vCPU / RAM / SSD | Price/mo | Traffic/mo |
|---|---|---|---:|---|
| **Hetzner** | **CX33** | **4 / 8 GB / 80 GB** | **$7.99** | **20 TB** |
| DigitalOcean | General-purpose | 2 / 8 GB / 25 GB | $63 | 4 TB |
| DigitalOcean | Basic | 4 / 8 GB / 160 GB | $48 | 5 TB |
| Vultr | High Perf | 4 / 8 GB / 180 GB | $48 | 5 TB |
| Linode (Akamai) | Shared | 4 / 8 GB / 160 GB | $48 | 5 TB |
| OVHcloud | VPS 2026 4vC | 4 / 8 GB / 75 GB | ~$13 | unlimited |
| Contabo | Cloud VPS 2 | 4 / 8 GB / 200 GB | $8 | 32 TB |
| Netcup | VPS 1000 G11 | 4 / 8 GB / 256 GB | ~$6 | unlimited |
| Oracle Always Free | ARM Ampere (*availability lottery*) | up to 4 / 24 GB / 200 GB | $0 | 10 TB |

**Why Hetzner won:**

1. **Price/performance at this tier is best-in-class among mainstream hosts.** Similar specs at DigitalOcean/Vultr/Linode cost 6× as much. You're paying the "American managed cloud" premium there for UX polish we don't need.
2. **Dedicated IPv4 + /64 IPv6 + 20 TB traffic included.** No overage anxiety at this scale; 20 TB is multiple months of anticipated traffic for a bootstrapped app.
3. **European datacenter, GDPR-native.** honeyDue serves users in multiple regions; if EU users dominate, Nuremberg is fast. US users see roughly 100 ms of extra latency compared with a US-East host, which is tolerable for most app traffic because Cloudflare serves cached assets from its edge.
4. **Mature API + `hcloud` CLI** for automation if we ever need it.
5. **Hetzner Cloud Firewall is free** and rule-for-rule equivalent to AWS Security Groups / DO Cloud Firewall. We use UFW on the nodes instead (Chapter 4) because our rule set evolved ad hoc, and moving it to the provider's firewall remains a small cleanup project.
**Why not the cheaper options:**

- **Netcup** is ~$2/mo cheaper per node with more disk, but its API is barebones, the account/billing UX is more fiddly, and its network routing to the US (where the operator is based) has more hops than Hetzner's.
- **Contabo** costs about the same ($8/mo) and includes more disk and traffic, but the company has a reputation for oversubscribed nodes. For a production service, unpredictable CPU steal and disk I/O variance are not worth it, especially when the saving is $0/node. Contabo is fine for non-critical workloads; it's a poor fit for prod.
- **Oracle Cloud Always Free** is genuinely free (4 ARM cores + 24 GB RAM) but:
  - It requires ARM64 builds (we already build on ARM, so this would avoid cross-compiling, but see Chapter 11 for why amd64 matters)
  - Capacity for free accounts is a lottery; instance creation fails with "out of capacity" more often than it succeeds
  - Oracle has reclaimed idle free-tier instances in the past

### Why not the premium options

DigitalOcean, Vultr, and Linode are excellent products with better UX than Hetzner. They were rejected because at honeyDue's current scale the 6–8× price multiplier doesn't buy anything we'd use:

- We don't need managed databases, object storage, or load balancers from the same provider; those are Neon, Backblaze, and Cloudflare
- We don't need their monitoring dashboards; Cloudflare Analytics + `kubectl top` + future Prometheus cover it
- The UI polish matters mostly for day-1 setup; ongoing operations are `kubectl` and `ssh`

When honeyDue has enough revenue that an engineer's time is worth more than the ~$40/mo per-node premium, we'd consider moving for the better tooling. Not yet.

## Why Nuremberg (`nbg1`)

Hetzner has datacenters in Nuremberg (nbg1), Falkenstein (fsn1), Helsinki (hel1), Ashburn (ash), and Hillsboro (hil).
Nuremberg was picked because:

- The operator's primary user base is expected to be mixed US/EU
- Within the EU, Nuremberg is the most central from a peering perspective (well-connected to DE-CIX, Europe's largest internet exchange)
- Falkenstein is Hetzner's main datacenter and tends to have longer provisioning queues during capacity crunches; Nuremberg is smaller and more available

For a US-only userbase, Ashburn (ash) or Hillsboro (hil) would be better picks: US users would see ~20 ms instead of ~120 ms. Cloudflare's edge caches most assets, so the origin location matters mostly for first-request / uncached / POST traffic.

## Why three nodes

**Raft quorum and fault tolerance.** K3s in HA mode uses Raft consensus (via embedded etcd) for cluster state. Raft requires a majority of server nodes to agree on every write. For n servers, quorum is ⌊n/2⌋ + 1, and the cluster tolerates n minus quorum failures:

| Total servers | Quorum | Max failures tolerated |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |

Three is the smallest cluster size that tolerates a failure, and it is where the price/resilience trade-off is sweetest. Five nodes don't help until you need to tolerate *two* simultaneous failures, a scale concern that doesn't apply at our traffic volume.

Two nodes are worse than one: you still cannot tolerate a single failure (one down = no quorum), but you've doubled your cost and failure surface. Avoid even-node clusters for consensus systems.

## Node hardening

Each node was bootstrapped with:

1. **Docker installed** from `download.docker.com` using the stable repo. This dates from the original Swarm setup; Docker is still installed but disabled, because k3s bundles its own containerd.
2. **`deploy` user created** with:
   - Home directory
   - Bash as login shell
   - Member of `docker` group (historical, from when Swarm was the orchestrator)
   - Member of `sudo` group with `NOPASSWD: ALL` in `/etc/sudoers.d/deploy`
3. **SSH key installed** at `/home/deploy/.ssh/authorized_keys`
   - The key is the public half of `~/.ssh/hetzner` on the operator workstation (`ssh-ed25519`, 256 bits)
4. **`/opt/honeydue/deploy`** directory created, owned by `deploy` (originally the drop zone for the Swarm deploy bundle; unused now)
5. **Sysctl** `net.ipv4.ip_unprivileged_port_start=0` persisted to `/etc/sysctl.d/99-unprivileged-ports.conf`. Required so Traefik (running as UID 65532) can bind `:80` and `:443` in the host network namespace.

The full bootstrap script is at `/tmp/honeydue_bootstrap.sh` on the operator workstation (used during the initial Swarm setup; see [Chapter 19](./19-postmortem-swarm.md) for context).

## Cost breakdown

```
3 × Hetzner CX33          $23.97/mo
Hetzner network traffic   $0          (20 TB/mo included per node, nowhere near it)
Neon Postgres (Launch)    $5-15/mo    (usage-based, ~$5 min)
Backblaze B2              <$1/mo      (tiny upload volume currently)
Cloudflare Free           $0
Gitea (self-hosted)       $0          (the operator's existing Gitea)
─────────────────────────────────
Total infra               ~$30-40/mo
```

See [Chapter 18 — Cost](./18-cost.md) for a full breakdown including external SaaS (Fastmail, Apple Developer, etc.) and at-scale projections.

## Provisioning workflow

Nodes were provisioned manually through Hetzner Cloud Console. This is fine for a three-node cluster; for larger clusters we'd switch to the [`hetzner-k3s`][hetzner-k3s] tool that the `deploy-k3s/` scaffold expects.

The manual steps were:

1. Create project in Hetzner Cloud Console.
2. Upload SSH key (`hetzner.pub`).
3. Create 3× CX33 servers in `nbg1` with Ubuntu 24.04.
4. SSH in as `root` and run the bootstrap script to create the `deploy` user and install Docker (and, later, k3s).
5. Optionally, apply Hetzner Cloud Firewall rules at the network edge (we use UFW per Chapter 4 instead).

A future greenfield deployment would run `deploy-k3s/scripts/01-provision-cluster.sh`, which does all of this in one shot via the `hetzner-k3s` CLI.
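For reference, manual step 3 could also be scripted with the `hcloud` CLI. The sketch below is not how the nodes were actually created (that was done in the console). It assumes `hcloud` is installed and authenticated, that the key was uploaded under the hypothetical name `hetzner`, and that the server type is spelled `cx33` in the API. The `run` helper is illustrative. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
# Sketch: create the three nodes with the hcloud CLI instead of the console.
# Assumptions: the key name "hetzner" and type name "cx33" are illustrative.
set -euo pipefail

LOCATION="nbg1"
SERVER_TYPE="cx33"
IMAGE="ubuntu-24.04"
SSH_KEY_NAME="hetzner"

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One server per iteration, named after the pattern used by the existing nodes.
for i in 1 2 3; do
  run hcloud server create \
    --name "ubuntu-8gb-${LOCATION}-${i}" \
    --type "$SERVER_TYPE" \
    --image "$IMAGE" \
    --location "$LOCATION" \
    --ssh-key "$SSH_KEY_NAME"
done
```

Defaulting to a dry run lets the operator review the exact commands before anything billable is created.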
## Upgrade / replacement plan

**Node failure.** If a node becomes unreachable, the other two retain Raft quorum and the cluster continues accepting writes. Pods from the failed node get rescheduled to the survivors (so long as the survivors have spare capacity; see Chapter 16). To replace the dead node:

1. Delete it from the cluster: `kubectl delete node <node-name>`
2. Create a replacement CX33 in Hetzner console
3. Install k3s on it with `--server=https://<existing-server-ip>:6443`
4. Verify `kubectl get nodes` shows it as Ready

**Scaling up.** To add a fourth node, follow the same procedure without deleting anything. Decide whether it should be a server (joins the Raft quorum, so the server count should stay odd) or an agent (worker-only). K3s agents join with `INSTALL_K3S_EXEC=agent` instead of `server`.

**Upgrading K3s.** K3s has a minor release roughly every 4 months, following upstream Kubernetes. Upgrade by running the install script with the new version on each node, one at a time, verifying cluster health between nodes. See [Chapter 17](./17-runbook.md) for the detailed procedure.

**Upgrading the OS.** Ubuntu 24.04 LTS is supported until 2029. `unattended-upgrades` is *not* currently installed, so OS patches require a manual `apt upgrade`. Install `unattended-upgrades` when time permits: security patches are important, and automation reduces the risk of falling behind.

## Physical location & regulatory

- **Sovereignty**: Hetzner is headquartered in Gunzenhausen, Germany. All data at rest in `nbg1` is subject to German law and the GDPR.
- **User data**: Most user data actually lives in **Neon Postgres (AWS us-east-1, Virginia)** and **Backblaze B2 (us-east-005, Virginia)**, both US-hosted. EU users' data therefore *exits* the EU in the API path. If strict EU data residency is ever a requirement, Neon has an EU region (Frankfurt) and Backblaze has EU endpoints; switching is a configuration change, not an architectural one.
- **Encryption at rest**: Hetzner encrypts node-local disks at the hypervisor layer.
  Neon encrypts at the AWS EBS layer. B2 encrypts objects server-side. Our application code and config hold no secrets at rest outside Kubernetes Secrets. Those are stored in etcd, which k3s leaves unencrypted on disk by default; see Chapter 5 for hardening.

## Operator cheat sheet

```bash
# SSH to any node
ssh -i ~/.ssh/hetzner deploy@hetzner1

# Check node health
kubectl get nodes -o wide

# Per-node resource usage
kubectl top nodes

# See what's on each node (column 8 of the wide output is NODE)
kubectl get pods -A -o wide | sort -k 8

# Hetzner console (in browser)
# https://console.hetzner.cloud/
```

## References

- [Hetzner Cloud product page][hetzner-cloud]
- [Hetzner price adjustment April 2026][hetzner-prices]
- [hetzner-k3s tool][hetzner-k3s]
- [K3s architecture docs][k3s-arch]

[hetzner-cloud]: https://www.hetzner.com/cloud/
[hetzner-prices]: https://docs.hetzner.com/general/infrastructure-and-availability/price-adjustment/
[hetzner-k3s]: https://github.com/vitobotta/hetzner-k3s
[k3s-arch]: https://docs.k3s.io/architecture