Trey t 12b2f9d43b
Adopt pressly/goose for schema migrations
Replaces the previous hand-rolled MigrateWithLock + GORM AutoMigrate path,
which had two compounding problems:
- AutoMigrate ran on every pod startup (~5 min over the transatlantic
  link) even when no schema changes had landed
- pg_advisory_lock is session-scoped, which silently fails through
  Neon's pgbouncer transaction-mode pooler — turns out this is a
  known and documented limitation that bites golang-migrate too

Goose was chosen over golang-migrate (the other heavyweight) because:
- Goose wraps each migration file in a transaction by default, so a
  failure rolls back cleanly instead of leaving a "dirty" version
  state requiring manual force-reset (golang-migrate's known
  weakness, per its own issue tracker — see #1001 + Atlas's writeup)
- Goose's locking is opt-in. We don't opt in: migrations run as a
  single Kubernetes Job, which IS the singleton process. No advisory
  lock needed at all.

Layout:
- migrations/000001_init.sql — schema-only pg_dump of the live Neon
  DB at adoption, stripped of psql-only directives that block goose's
  bookkeeping insert. Pre-goose hand-numbered migrations 002-022 had
  their effects folded into this baseline; deleted from the live tree
  but preserved in git history at 58e6997.
- Dockerfile installs `goose v3.22.1` at build time and copies the
  binary into the api image. The migrate Job reuses the api image with
  command=goose, so no separate image to build/push/version.
- deploy-k3s/manifests/migrate/job.yaml: a one-shot Job that strips
  the -pooler segment from DB_HOST (advisory lock won't survive
  pgbouncer transaction-mode), runs `goose up`, exits.
- deploy-k3s/scripts/03-deploy.sh: deletes any prior Job, applies the
  fresh one, `kubectl wait --for=condition=complete --timeout=10m`,
  then proceeds with api/worker rollout. Job failure aborts the deploy
  before any new app pod sees a stale schema.
- internal/database/database.go::RequireSchemaApplied checks
  goose_db_version on startup. api/worker refuse to boot if the
  table is missing or its latest row has is_applied=false — the
  fail-fast for "operator forgot to run migrate."
- Makefile: migrate-up / migrate-down / migrate-status / migrate-new
  for local workflow.
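
The `-pooler` stripping that the migrate Job performs on DB_HOST can be sketched as below. This is a minimal Go illustration only: the Job's real implementation is not shown here and may well be a shell one-liner. It assumes the pooled Neon hostname differs from the direct one only by a `-pooler` suffix on the first DNS label. `directHost` and the example hostname are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// directHost returns the non-pooled form of a hostname by removing a
// trailing "-pooler" from the first DNS label. Hosts without that suffix
// are returned unchanged. (Assumption: the pooler marker only ever appears
// at the end of the first label.)
func directHost(host string) string {
	label, rest, found := strings.Cut(host, ".")
	label = strings.TrimSuffix(label, "-pooler")
	if !found {
		return label
	}
	return label + "." + rest
}

func main() {
	// Hypothetical pooled hostname, used only for illustration.
	fmt.Println(directHost("ep-example-123456-pooler.c-5.us-east-1.aws.neon.tech"))
}
```

Trimming only the first label, rather than replacing every occurrence of `-pooler` in the string, keeps the function from touching any other part of the hostname.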

Production DB was bootstrapped manually:
  $ goose -dir migrations postgres "$DSN" version  # creates table
  $ psql ... -c "INSERT INTO goose_db_version (version_id, is_applied, tstamp) VALUES (1, true, NOW());"

Smoke test against fresh Postgres locally: 50 user tables created in
284ms via `goose up`, version_id=1 + is_applied=t recorded.
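
The recorded row is what the startup check described under Layout looks for. A minimal sketch of such a check follows, assuming plain `database/sql` rather than whatever database.go actually uses. The function names `requireSchemaApplied` and `checkLatest` are hypothetical, and treating an empty table as a failure is an added assumption; the source only names a missing table and `is_applied=false`.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// checkLatest interprets the outcome of reading the newest
// goose_db_version row and returns nil only when the schema is applied.
func checkLatest(isApplied bool, err error) error {
	switch {
	case errors.Is(err, sql.ErrNoRows):
		// Assumption: an empty bookkeeping table is treated as "not migrated".
		return errors.New("goose_db_version is empty: run the migrate Job first")
	case err != nil:
		// A missing table surfaces here as a query error from the driver.
		return fmt.Errorf("reading goose_db_version (is the table missing?): %w", err)
	case !isApplied:
		return errors.New("latest goose_db_version row has is_applied=false")
	}
	return nil
}

// requireSchemaApplied reads the most recent bookkeeping row. Callers are
// expected to exit on a non-nil error so the process never serves traffic
// against an unmigrated schema.
func requireSchemaApplied(ctx context.Context, db *sql.DB) error {
	var applied bool
	err := db.QueryRowContext(ctx,
		`SELECT is_applied FROM goose_db_version ORDER BY id DESC LIMIT 1`,
	).Scan(&applied)
	return checkLatest(applied, err)
}

func main() {
	// Exercise the decision logic without a live database.
	fmt.Println(checkLatest(true, nil))
	fmt.Println(checkLatest(false, nil))
	fmt.Println(checkLatest(false, sql.ErrNoRows))
}
```

Splitting the decision logic from the query keeps the fail-fast rules testable without a Postgres instance.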

Verified the local goose CLI talks to prod successfully:
  $ goose ... status
  Applied At                  Migration
  =======================================
  Mon Apr 27 03:43:55 2026 -- 000001_init.sql

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:46:36 -05:00

honeyDue Production Deployment — The Book

This is the complete reference for the honeyDue production deployment as it exists on 2026-04-24. It serves two audiences:

  1. A new engineer learning the system for the first time. Start at Chapter 0 (Overview) and read in order. Concepts are built up; nothing is assumed beyond "you've deployed web apps before."
  2. The operator (future-you) needing a specific fact fast. Every chapter opens with a one-paragraph summary and has an operator runbook at its end. The appendices are a cheat sheet.

The deployment is non-trivial. It's a 3-node HA Kubernetes cluster running a Go API, a Next.js admin panel, a background worker, Redis, and Traefik — all fronted by Cloudflare, integrated with Neon Postgres, Backblaze B2, and a self-hosted Gitea registry. This book explains why each of those pieces was chosen (often over two or three alternatives we tried first), what they do, and how to operate them.

Table of Contents

Part I — The System

Part II — Networking

Part III — Security

Part IV — Workloads

Part V — Operation

Part VI — Context

Appendices

Quick Facts

Field                   Value
Orchestrator            K3s v1.34.6+k3s1 (3 nodes, HA control plane)
Ingress                 Traefik v3 (DaemonSet, hostNetwork)
Nodes                   3× Hetzner Cloud CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) in nbg1 (Nuremberg)
DNS & Edge              Cloudflare (Free plan), SSL=Flexible, round-robin across 3 node A records
Database                Neon Postgres, ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech
Cache + Queue           Redis 7-alpine, in-cluster, 1 replica, PVC-backed, pinned to nbg1-2
Object Storage          Backblaze B2, honeyDueProd bucket, us-east-005 region
Image Registry          Self-hosted Gitea v1.25.5 at gitea.treytartt.com
Transactional Email     Fastmail SMTP (smtp.fastmail.com:587)
Domains                 api.myhoneydue.com, admin.myhoneydue.com, myhoneydue.com
Monthly Cost (current)  ~$30-40 (3× Hetzner + Neon Launch + B2 + Cloudflare Free + Gitea free)
kubeconfig              ~/.kube/honeydue-k3s.yaml on operator workstation
Repo                    honeyDueAPI-go/deploy-k3s/ for manifests; deploy/ is the legacy Swarm config

How to Read This Book

  • "Why did we…?" answers are in the chapter covering that component. Every major design choice comes with an explicit rejection of 1-3 alternatives.
  • Historical bugs are in Chapter 19. The rest of the book describes the current (fixed) state; 19 is the forensic record of what was broken and how we figured it out.
  • Operator commands you'll run regularly are in Appendix B. Chapter 17 has longer procedures (cert rotation, DB migration, etc.).
  • Citations throughout use footnote-style links to the canonical source (k3s docs, moby issues, Cloudflare docs, etc.). Appendix D collects them.

Conventions

  • Kubernetes namespace for the app is honeydue.
  • SSH aliases are hetzner1, hetzner2, hetzner3 in your ~/.ssh/config.
  • Node hostnames in the cluster are ubuntu-8gb-nbg1-{1,2,3} (Hetzner-assigned).
  • The mapping is non-obvious because the Hetzner hostname suffix order does not match SSH alias order:
SSH alias  Public IP        Hostname in k3s
hetzner1   178.104.247.152  ubuntu-8gb-nbg1-2
hetzner2   178.105.32.198   ubuntu-8gb-nbg1-1
hetzner3   178.104.249.189  ubuntu-8gb-nbg1-3

When a chapter refers to "hetzner1" it means the box at 178.104.247.152 / nbg1-2.
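
Because the alias and hostname orders differ, tooling that takes an SSH alias and then talks to Kubernetes needs an explicit lookup rather than string manipulation. The sketch below shows one way to encode the table above. The `node` type and `k3sNameFor` helper are hypothetical and do not exist in the repo as far as this book states.

```go
package main

import "fmt"

// node ties together the three names used for one machine.
type node struct {
	SSHAlias string
	PublicIP string
	K3sName  string
}

// nodes mirrors the mapping table in the Conventions section.
var nodes = []node{
	{SSHAlias: "hetzner1", PublicIP: "178.104.247.152", K3sName: "ubuntu-8gb-nbg1-2"},
	{SSHAlias: "hetzner2", PublicIP: "178.105.32.198", K3sName: "ubuntu-8gb-nbg1-1"},
	{SSHAlias: "hetzner3", PublicIP: "178.104.249.189", K3sName: "ubuntu-8gb-nbg1-3"},
}

// k3sNameFor returns the Kubernetes node name for an SSH alias, and false
// when the alias is unknown.
func k3sNameFor(alias string) (string, bool) {
	for _, n := range nodes {
		if n.SSHAlias == alias {
			return n.K3sName, true
		}
	}
	return "", false
}

func main() {
	name, ok := k3sNameFor("hetzner1")
	fmt.Println(name, ok)
}
```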