Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes, all as server nodes)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
temporarily for reference
Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
callback (was causing 'unlock of unlocked mutex' fatal after
Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
+ allowlist fonts.googleapis.com so the marketing landing page CSS
actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
--platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
images runnable on x86_64 Hetzner nodes; fix array expansion under
set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
top-level aliases (the '${X_SECRET}' form never actually resolved);
dozzle ports: long-form host_ip is rejected by Swarm, switched to
short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
(Next.js serves at root; /admin/ returned 404 and killed pods);
startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
12 -> 48 (MigrateWithLock serializes migrations across the 3 replicas on
first boot; real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
and admin/src/app/api/*, hiding legitimate files)
New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log
Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
- Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
- Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
- Part III Security, Traefik ingress (Ch 5-6)
- Part IV Services, DB, storage, secrets, registry (Ch 7-11)
- Part V Data flow, deploy process, observability, failures, runbook
(Ch 12, 14-17)
- Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
- Appendices: glossary, kubectl cheat sheet, file locations,
consolidated citations
- README.md: Production Deployment section replaced with pointer to
the book; Go version bumped to 1.25
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -4,7 +4,7 @@ Go REST API for the honeyDue property management platform. Powers iOS and Androi
 
 ## Tech Stack
 
-- **Language**: Go 1.24
+- **Language**: Go 1.25
 - **HTTP Framework**: [Echo v4](https://github.com/labstack/echo)
 - **ORM**: [GORM](https://gorm.io/) with PostgreSQL
 - **Background Jobs**: [Asynq](https://github.com/hibiken/asynq) (Redis-backed)
@@ -16,7 +16,7 @@ Go REST API for the honeyDue property management platform. Powers iOS and Androi
 
 ## Prerequisites
 
-- **Go 1.24+** — [install](https://go.dev/dl/)
+- **Go 1.25+** — [install](https://go.dev/dl/)
 - **PostgreSQL 16+** — via Docker (recommended) or [native install](https://www.postgresql.org/download/)
 - **Redis 7+** — via Docker (recommended) or [native install](https://redis.io/docs/getting-started/)
 - **Docker & Docker Compose** — [install](https://docs.docker.com/get-docker/) (recommended for local development)
@@ -259,34 +259,43 @@ All protected endpoints require an `Authorization: Token <token>` header.
 
 ## Production Deployment
 
-### Dokku
+Production runs on a **3-node K3s HA cluster** on Hetzner Cloud, fronted
+by Cloudflare, with Neon Postgres, Backblaze B2, and a self-hosted Gitea
+container registry. See the full deployment book for every detail:
 
-```bash
-# Push to Dokku
-git push dokku main
+**→ [docs/deployment/](./docs/deployment/README.md) — The Deployment Book**
 
-# Seed lookup data
-cat seeds/001_lookups.sql | dokku postgres:connect honeydue-db
+26 chapters and ~42,000 words covering:
 
-# Check logs
-dokku logs honeydue-api -t
-```
+- **Part I — The System**: overview, Hetzner infrastructure, why K3s
+  (and not Swarm, full Kubernetes, or Nomad)
+- **Part II — Networking**: Flannel VXLAN, CoreDNS, kube-proxy, every
+  UFW rule on every node, Cloudflare DNS setup
+- **Part III — Security**: RBAC, Pod Security, secrets, TLS chain
+- **Part IV — Workloads**: api, admin, worker, redis per-service deep
+  dives; Neon Postgres config; Backblaze B2 storage; Gitea registry
+- **Part V — Operation**: end-to-end data flow, deploy process,
+  observability, failure modes, operator runbook
+- **Part VI — Context**: cost breakdown, postmortem of the bugs from
+  the Swarm→K3s migration, roadmap
 
-### Docker Swarm
+Quick links:
 
-```bash
-# Build and push production images
-make docker-build-prod
-docker push ${REGISTRY}/honeydue-api:${TAG}
-docker push ${REGISTRY}/honeydue-worker:${TAG}
-docker push ${REGISTRY}/honeydue-admin:${TAG}
+- **Runbook** — [docs/deployment/17-runbook.md](./docs/deployment/17-runbook.md) — 22 common ops procedures
+- **kubectl cheat sheet** — [docs/deployment/appendices/b-commands.md](./docs/deployment/appendices/b-commands.md)
+- **Deploy process** — [docs/deployment/14-deployment-process.md](./docs/deployment/14-deployment-process.md) — build → push → rollout
+- **Failure modes** — [docs/deployment/16-failure-modes.md](./docs/deployment/16-failure-modes.md) — what happens when X dies
+- **Swarm postmortem** — [docs/deployment/19-postmortem-swarm.md](./docs/deployment/19-postmortem-swarm.md) — why we migrated
 
-# Deploy the stack (all env vars must be set in .env or environment)
-docker stack deploy -c docker-compose.yml honeydue
-```
+Operational state lives under:
+
+- `deploy-k3s/manifests/` — Kubernetes manifests (apply with `kubectl`)
+- `deploy-k3s/MIGRATION_NOTES.md` — notes from the Swarm → K3s migration
+- `deploy/` — legacy Swarm config (retained temporarily; to be removed)
 
 ## Related Projects
 
+- **Deployment Book**: [`docs/deployment/`](./docs/deployment/README.md) — full production operations reference
 - **Mobile App (KMM)**: `../HoneyDueKMM` — Kotlin Multiplatform iOS/Android client
 - **Task Logic Docs**: `docs/TASK_LOGIC_ARCHITECTURE.md` — required reading before task-related work
 - **Push Notification Docs**: `docs/PUSH_NOTIFICATIONS.md`