Migrate prod deploy from Swarm to K3s; add full deployment book
Backend CI / Test (push) Has been cancelled
Backend CI / Contract Tests (push) Has been cancelled
Backend CI / Build (push) Has been cancelled
Backend CI / Lint (push) Has been cancelled
Backend CI / Secret Scanning (push) Has been cancelled

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after
  Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
  + allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
  --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
  images runnable on x86_64 Hetzner nodes; fix array expansion under
  set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '\${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
  without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook
    (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to
  the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Trey t
2026-04-24 07:20:21 -05:00
parent 4ec4bbbfe8
commit 6f303dbbaa
46 changed files with 9785 additions and 93 deletions
+1 -1
View File
@@ -5,7 +5,7 @@
# Binaries
bin/
api
/api
/worker
/admin
!admin/
+30 -21
View File
@@ -4,7 +4,7 @@ Go REST API for the honeyDue property management platform. Powers iOS and Androi
## Tech Stack
- **Language**: Go 1.24
- **Language**: Go 1.25
- **HTTP Framework**: [Echo v4](https://github.com/labstack/echo)
- **ORM**: [GORM](https://gorm.io/) with PostgreSQL
- **Background Jobs**: [Asynq](https://github.com/hibiken/asynq) (Redis-backed)
@@ -16,7 +16,7 @@ Go REST API for the honeyDue property management platform. Powers iOS and Androi
## Prerequisites
- **Go 1.24+** — [install](https://go.dev/dl/)
- **Go 1.25+** — [install](https://go.dev/dl/)
- **PostgreSQL 16+** — via Docker (recommended) or [native install](https://www.postgresql.org/download/)
- **Redis 7+** — via Docker (recommended) or [native install](https://redis.io/docs/getting-started/)
- **Docker & Docker Compose** — [install](https://docs.docker.com/get-docker/) (recommended for local development)
@@ -259,34 +259,43 @@ All protected endpoints require an `Authorization: Token <token>` header.
## Production Deployment
### Dokku
Production runs on a **3-node K3s HA cluster** on Hetzner Cloud, fronted
by Cloudflare, with Neon Postgres, Backblaze B2, and a self-hosted Gitea
container registry. See the full deployment book for every detail:
```bash
# Push to Dokku
git push dokku main
**→ [docs/deployment/](./docs/deployment/README.md) — The Deployment Book**
# Seed lookup data
cat seeds/001_lookups.sql | dokku postgres:connect honeydue-db
26 chapters and ~42,000 words covering:
# Check logs
dokku logs honeydue-api -t
```
- **Part I — The System**: overview, Hetzner infrastructure, why K3s
(and not Swarm, full Kubernetes, or Nomad)
- **Part II — Networking**: Flannel VXLAN, CoreDNS, kube-proxy, every
UFW rule on every node, Cloudflare DNS setup
- **Part III — Security**: RBAC, Pod Security, secrets, TLS chain
- **Part IV — Workloads**: api, admin, worker, redis per-service deep
dives; Neon Postgres config; Backblaze B2 storage; Gitea registry
- **Part V — Operation**: end-to-end data flow, deploy process,
observability, failure modes, operator runbook
- **Part VI — Context**: cost breakdown, postmortem of the bugs from
the Swarm→K3s migration, roadmap
### Docker Swarm
Quick links:
```bash
# Build and push production images
make docker-build-prod
docker push ${REGISTRY}/honeydue-api:${TAG}
docker push ${REGISTRY}/honeydue-worker:${TAG}
docker push ${REGISTRY}/honeydue-admin:${TAG}
- **Runbook** — [docs/deployment/17-runbook.md](./docs/deployment/17-runbook.md) — 22 common ops procedures
- **kubectl cheat sheet** — [docs/deployment/appendices/b-commands.md](./docs/deployment/appendices/b-commands.md)
- **Deploy process** — [docs/deployment/14-deployment-process.md](./docs/deployment/14-deployment-process.md) — build → push → rollout
- **Failure modes** — [docs/deployment/16-failure-modes.md](./docs/deployment/16-failure-modes.md) — what happens when X dies
- **Swarm postmortem** — [docs/deployment/19-postmortem-swarm.md](./docs/deployment/19-postmortem-swarm.md) — why we migrated
# Deploy the stack (all env vars must be set in .env or environment)
docker stack deploy -c docker-compose.yml honeydue
```
Operational state lives under:
- `deploy-k3s/manifests/` — Kubernetes manifests (apply with `kubectl`)
- `deploy-k3s/MIGRATION_NOTES.md` — notes from the Swarm → K3s migration
- `deploy/` — legacy Swarm config (retained temporarily; to be removed)
## Related Projects
- **Deployment Book**: [`docs/deployment/`](./docs/deployment/README.md) — full production operations reference
- **Mobile App (KMM)**: `../HoneyDueKMM` — Kotlin Multiplatform iOS/Android client
- **Task Logic Docs**: `docs/TASK_LOGIC_ARCHITECTURE.md` — required reading before task-related work
- **Push Notification Docs**: `docs/PUSH_NOTIFICATIONS.md`
+12
View File
@@ -0,0 +1,12 @@
import { NextResponse } from 'next/server'
export const dynamic = 'force-dynamic'
export const revalidate = 0
export async function GET() {
return NextResponse.json({ status: 'ok' }, { status: 200 })
}
export async function HEAD() {
return new NextResponse(null, { status: 200 })
}
@@ -0,0 +1,118 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
namespace: honeydue
labels:
app.kubernetes.io/name: api
app.kubernetes.io/part-of: honeydue
spec:
replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
selector:
matchLabels:
app.kubernetes.io/name: api
template:
metadata:
labels:
app.kubernetes.io/name: api
app.kubernetes.io/part-of: honeydue
spec:
serviceAccountName: api
imagePullSecrets:
- name: ghcr-credentials
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: api
image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh
ports:
- containerPort: 8000
protocol: TCP
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
envFrom:
- configMapRef:
name: honeydue-config
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: POSTGRES_PASSWORD
- name: SECRET_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: SECRET_KEY
- name: EMAIL_HOST_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: EMAIL_HOST_PASSWORD
- name: FCM_SERVER_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: FCM_SERVER_KEY
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: REDIS_PASSWORD
optional: true
volumeMounts:
- name: apns-key
mountPath: /secrets/apns
readOnly: true
- name: tmp
mountPath: /tmp
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: "1"
memory: 512Mi
startupProbe:
httpGet:
path: /api/health/
port: 8000
failureThreshold: 12
periodSeconds: 5
readinessProbe:
httpGet:
path: /api/health/
port: 8000
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
livenessProbe:
httpGet:
path: /api/health/
port: 8000
initialDelaySeconds: 30
periodSeconds: 30
timeoutSeconds: 10
volumes:
- name: apns-key
secret:
secretName: honeydue-apns-key
items:
- key: apns_auth_key.p8
path: apns_auth_key.p8
- name: tmp
emptyDir:
sizeLimit: 64Mi
+16
View File
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
name: api
namespace: honeydue
labels:
app.kubernetes.io/name: api
app.kubernetes.io/part-of: honeydue
spec:
type: ClusterIP
selector:
app.kubernetes.io/name: api
ports:
- port: 8000
targetPort: 8000
protocol: TCP
@@ -0,0 +1,101 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: worker
namespace: honeydue
labels:
app.kubernetes.io/name: worker
app.kubernetes.io/part-of: honeydue
spec:
replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
selector:
matchLabels:
app.kubernetes.io/name: worker
template:
metadata:
labels:
app.kubernetes.io/name: worker
app.kubernetes.io/part-of: honeydue
spec:
serviceAccountName: worker
imagePullSecrets:
- name: ghcr-credentials
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: worker
image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
envFrom:
- configMapRef:
name: honeydue-config
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: POSTGRES_PASSWORD
- name: SECRET_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: SECRET_KEY
- name: EMAIL_HOST_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: EMAIL_HOST_PASSWORD
- name: FCM_SERVER_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: FCM_SERVER_KEY
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: REDIS_PASSWORD
optional: true
volumeMounts:
- name: apns-key
mountPath: /secrets/apns
readOnly: true
- name: tmp
mountPath: /tmp
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
exec:
command: ["pgrep", "-f", "/app/worker"]
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
volumes:
- name: apns-key
secret:
secretName: honeydue-apns-key
items:
- key: apns_auth_key.p8
path: apns_auth_key.p8
- name: tmp
emptyDir:
sizeLimit: 64Mi
+170
View File
@@ -0,0 +1,170 @@
# K3s Migration Notes — 2026-04-24
honeyDue is running on a 3-node K3s HA cluster on the existing Hetzner nodes
(hetzner1/2/3), replacing the previous Docker Swarm deployment.
## Why we migrated
Docker Swarm's libnetwork has a known stale-DNS bug on 29.x
([moby/moby#52265](https://github.com/moby/moby/issues/52265)) that leaves
ghost A-records when tasks migrate between nodes. Single-replica services
(like the admin panel) landed on a ghost IP ~50% of the time → connection
refused → 502. Full stack recreate cleared it, but the bug recurs on every
node-to-node task migration.
K3s uses CoreDNS + containerd with no libnetwork history → the bug class
doesn't exist there. See `docs/SWARM_POSTMORTEM.md` if it exists, or the
research summary in the earlier deploy session.
## Differences from the original `deploy-k3s/` scaffold
The original scaffold assumes a greenfield provision via `hetzner-k3s`,
GHCR for images, Cloudflare origin certs, and a Hetzner Load Balancer.
We reused existing nodes and kept Cloudflare Flexible SSL:
| Setting | Scaffold default | What we did |
|---|---|---|
| Provisioning | `hetzner-k3s` tool creates boxes | Manual k3s install on existing Hetzner boxes |
| Registry | GHCR (`ghcr-credentials`) | Gitea (`gitea-credentials`) via `kubectl create secret docker-registry` |
| Ingress TLS | `cloudflare-origin-cert` Secret | No TLS at origin (CF Flexible) |
| Load balancer | Hetzner LB → nodes | Cloudflare round-robin across 3 node IPs |
| Admin basic auth | `admin-auth` Traefik middleware | Not applied — in-app auth only |
| CF-only IP allowlist | `cloudflare-only` middleware | Not applied — UFW restricts some ports, 80/443 open to anyone who knows node IPs |
| Traefik | LoadBalancer via servicelb | DaemonSet w/ hostNetwork (servicelb disabled); see `traefik-config.yaml` below |
| Worker replicas | 2 | 1 (Asynq scheduler is singleton) |
| API start_period | 12×5s = 60s | 48×5s = 240s (covers migrate + lock queue on first boot) |
| Admin probe path | `/admin/` | `/` (Next.js serves at root) |
## Manifest fixes applied in-repo (already committed)
- `manifests/api/deployment.yaml``startupProbe.failureThreshold: 12 → 48`
- `manifests/admin/deployment.yaml` — probe path `/admin/ → /`, threshold `12 → 24`
- `manifests/worker/deployment.yaml``replicas: 2 → 1`
- `manifests/pod-disruption-budgets.yaml` — worker `minAvailable: 1 → 0`
## Traefik override (applied as HelmChartConfig)
K3s ships Traefik as a single-replica Deployment with a LoadBalancer service.
With servicelb disabled (to avoid binding a random port), we reconfigure it
to a DaemonSet binding directly on each node's public :80/:443 via
`hostNetwork: true`. The HelmChartConfig:
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: traefik
namespace: kube-system
spec:
valuesContent: |-
deployment:
kind: DaemonSet
hostNetwork: true
service:
enabled: false
ports:
web:
port: 80
hostPort: 80
websecure:
port: 443
hostPort: 443
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 0
securityContext:
capabilities:
drop: [ALL]
add: [NET_BIND_SERVICE]
readOnlyRootFilesystem: true
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
additionalArguments:
- "--entrypoints.web.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,103.22.200.0/22,103.31.4.0/22,141.101.64.0/18,108.162.192.0/18,190.93.240.0/20,188.114.96.0/20,197.234.240.0/22,198.41.128.0/17,162.158.0.0/15,104.16.0.0/13,104.24.0.0/14,172.64.0.0/13,131.0.72.0/22"
```
Apply with `kubectl apply -f traefik-config.yaml`, then bump the helm job
(`kubectl delete job -n kube-system helm-install-traefik`) to trigger reinstall.
## Required node-level sysctl
hostNetwork pods with capabilities don't get CAP_NET_BIND_SERVICE in the
host netns on modern containerd. Set on each node:
```bash
echo 'net.ipv4.ip_unprivileged_port_start=0' | sudo tee /etc/sysctl.d/99-unprivileged-ports.conf
sudo sysctl --system
```
## UFW rules added for k3s (per node)
All between the 3 node IPs (178.104.247.152, 178.105.32.198, 178.104.249.189):
- `6443/tcp` — kube API
- `2379/tcp`, `2380/tcp` — embedded etcd client + peer
- `10250/tcp` — kubelet
- `8472/udp` — flannel VXLAN overlay
Plus from your workstation IP to each node's `6443/tcp` for `kubectl`.
## Ingress
Minimal hostname-only routing (`/tmp/honeydue-ingress.yaml` at deploy time
— move it into `deploy-k3s/manifests/ingress/` in a follow-up):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: honeydue-api
namespace: honeydue
spec:
ingressClassName: traefik
rules:
- host: api.myhoneydue.com
http:
paths:
- {path: /, pathType: Prefix, backend: {service: {name: api, port: {number: 8000}}}}
- host: myhoneydue.com
http:
paths:
- {path: /, pathType: Prefix, backend: {service: {name: api, port: {number: 8000}}}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: honeydue-admin
namespace: honeydue
spec:
ingressClassName: traefik
rules:
- host: admin.myhoneydue.com
http:
paths:
- {path: /, pathType: Prefix, backend: {service: {name: admin, port: {number: 3000}}}}
```
## Operator access
Kubeconfig lives at `~/.kube/honeydue-k3s.yaml`.
```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl get pods -n honeydue
```
## Remaining TODOs (not blocking)
- Apply `manifests/ingress/middleware.yaml` for security headers + rate limiting
(CF-only allowlist + basic auth deliberately skipped until you want them)
- Apply `manifests/network-policies.yaml` for default-deny + explicit allows
- Apply `manifests/api/hpa.yaml` if you want autoscaling (metrics-server is
already running, so just `kubectl apply` it)
- Upgrade to CF Full (strict) SSL: generate origin cert, create
`cloudflare-origin-cert` Secret, add `tls:` block back to Ingress
- Set up a proper migration Job so `api` replicas don't each run `MigrateWithLock`
on startup — lets you drop the 240s startupProbe grace
- Remove `deploy/` (the Swarm-era config) once you're confident in k3s
+5 -3
View File
@@ -65,15 +65,17 @@ spec:
limits:
cpu: 500m
memory: 256Mi
# Admin Next.js app serves at `/`, not `/admin/`. `/admin/` returns
# 404 and kills the pod via the probe.
startupProbe:
httpGet:
path: /admin/
path: /
port: 3000
failureThreshold: 12
failureThreshold: 24
periodSeconds: 5
readinessProbe:
httpGet:
path: /admin/
path: /
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
+123
View File
@@ -0,0 +1,123 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
namespace: honeydue
labels:
app.kubernetes.io/name: api
app.kubernetes.io/part-of: honeydue
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
selector:
matchLabels:
app.kubernetes.io/name: api
template:
metadata:
labels:
app.kubernetes.io/name: api
app.kubernetes.io/part-of: honeydue
spec:
serviceAccountName: api
imagePullSecrets:
- name: ghcr-credentials
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: api
image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh
ports:
- containerPort: 8000
protocol: TCP
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
envFrom:
- configMapRef:
name: honeydue-config
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: POSTGRES_PASSWORD
- name: SECRET_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: SECRET_KEY
- name: EMAIL_HOST_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: EMAIL_HOST_PASSWORD
- name: FCM_SERVER_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: FCM_SERVER_KEY
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: REDIS_PASSWORD
optional: true
volumeMounts:
- name: apns-key
mountPath: /secrets/apns
readOnly: true
- name: tmp
mountPath: /tmp
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: "1"
memory: 512Mi
startupProbe:
httpGet:
path: /api/health/
port: 8000
# MigrateWithLock in cmd/api/main.go runs pg_advisory_lock on
# every startup. On a cold boot with 3 replicas, the first does
# AutoMigrate (~90s) and the others wait on the lock, so real
# startup runs 90240s. 48 × 5s = 240s grace absorbs it without
# healthcheck killing a still-starting replica.
failureThreshold: 48
periodSeconds: 5
readinessProbe:
httpGet:
path: /api/health/
port: 8000
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
livenessProbe:
httpGet:
path: /api/health/
port: 8000
initialDelaySeconds: 30
periodSeconds: 30
timeoutSeconds: 10
volumes:
- name: apns-key
secret:
secretName: honeydue-apns-key
items:
- key: apns_auth_key.p8
path: apns_auth_key.p8
- name: tmp
emptyDir:
sizeLimit: 64Mi
+41
View File
@@ -0,0 +1,41 @@
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api
namespace: honeydue
labels:
app.kubernetes.io/name: api
app.kubernetes.io/part-of: honeydue
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api
minReplicas: 3
maxReplicas: 6
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Pods
value: 1
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Pods
value: 1
periodSeconds: 120
+16
View File
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
name: api
namespace: honeydue
labels:
app.kubernetes.io/name: api
app.kubernetes.io/part-of: honeydue
spec:
type: ClusterIP
selector:
app.kubernetes.io/name: api
ports:
- port: 8000
targetPort: 8000
protocol: TCP
@@ -0,0 +1,61 @@
# Simple hostname-based Ingress — no TLS (Cloudflare Flexible handles edge
# TLS, CF→origin is plain HTTP on 80). Upgrade to Full (strict) by
# adding back a `tls:` block with a Cloudflare Origin CA cert stored in
# secret/cloudflare-origin-cert.
#
# Middleware chain (security headers, rate limit, CF-only allowlist, admin
# basic auth) is defined in `middleware.yaml` but NOT attached here —
# annotate this ingress to turn any of them on.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: honeydue-api
namespace: honeydue
labels:
app.kubernetes.io/part-of: honeydue
spec:
ingressClassName: traefik
rules:
- host: api.myhoneydue.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api
port:
number: 8000
# Root domain serves the marketing landing page from the Go API's
# STATIC_DIR. ALLOWED_HOSTS in honeydue-config includes myhoneydue.com.
- host: myhoneydue.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api
port:
number: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: honeydue-admin
namespace: honeydue
labels:
app.kubernetes.io/part-of: honeydue
spec:
ingressClassName: traefik
rules:
- host: admin.myhoneydue.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: admin
port:
number: 3000
@@ -1,6 +1,6 @@
# Pod Disruption Budgets — prevent node maintenance from killing all replicas
# API: at least 2 of 3 replicas must stay up during voluntary disruptions
# Worker: at least 1 of 2 replicas must stay up
# Worker: singleton (Asynq scheduler) — must allow drain, minAvailable: 0
apiVersion: policy/v1
kind: PodDisruptionBudget
@@ -26,7 +26,7 @@ metadata:
app.kubernetes.io/name: worker
app.kubernetes.io/part-of: honeydue
spec:
minAvailable: 1
minAvailable: 0
selector:
matchLabels:
app.kubernetes.io/name: worker
@@ -0,0 +1,53 @@
# Traefik reconfiguration for this deployment.
#
# K3s defaults: Traefik as single-replica Deployment, LoadBalancer service.
# We disabled servicelb (--disable=servicelb on k3s install), so LoadBalancer
# doesn't get an external IP. This config makes Traefik a DaemonSet binding
# directly on each node's public :80/:443 via hostNetwork, matching our
# Cloudflare DNS round-robin across 3 node IPs.
#
# Apply: kubectl apply -f traefik-helmchartconfig.yaml
# Then bump Helm reconcile: kubectl delete job -n kube-system helm-install-traefik
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: traefik
namespace: kube-system
spec:
valuesContent: |-
deployment:
kind: DaemonSet
hostNetwork: true
service:
enabled: false
ports:
web:
port: 80
hostPort: 80
websecure:
port: 443
hostPort: 443
# hostNetwork with port 80/443 requires RollingUpdate maxUnavailable > 0
# (each node's port is held by one pod; can't surge).
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 0
securityContext:
capabilities:
drop: [ALL]
add: [NET_BIND_SERVICE]
readOnlyRootFilesystem: true
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
# NOTE: The host-level sysctl `net.ipv4.ip_unprivileged_port_start=0`
# must be set on each node. Without it, hostNetwork pods can't actually
# use NET_BIND_SERVICE to bind :80/:443. Persisted at
# /etc/sysctl.d/99-unprivileged-ports.conf on each node.
additionalArguments:
# Trust Cloudflare's forwarded proto header so the Go app sees the
# original https scheme even though CF→origin is plain HTTP.
# IP ranges from https://www.cloudflare.com/ips-v4/ (as of 2026-04).
- "--entrypoints.web.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,103.22.200.0/22,103.31.4.0/22,141.101.64.0/18,108.162.192.0/18,190.93.240.0/20,188.114.96.0/20,197.234.240.0/22,198.41.128.0/17,162.158.0.0/15,104.16.0.0/13,104.24.0.0/14,172.64.0.0/13,131.0.72.0/22"
+105
View File
@@ -0,0 +1,105 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: worker
namespace: honeydue
labels:
app.kubernetes.io/name: worker
app.kubernetes.io/part-of: honeydue
spec:
# Asynq's Scheduler is a singleton — running >1 replica fires every cron
# task once per replica (duplicate daily digests, onboarding emails, etc.).
# Keep at 1 until asynq.PeriodicTaskManager with Redis leader election is
# wired in cmd/worker/main.go.
replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
selector:
matchLabels:
app.kubernetes.io/name: worker
template:
metadata:
labels:
app.kubernetes.io/name: worker
app.kubernetes.io/part-of: honeydue
spec:
serviceAccountName: worker
imagePullSecrets:
- name: ghcr-credentials
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: worker
image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
envFrom:
- configMapRef:
name: honeydue-config
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: POSTGRES_PASSWORD
- name: SECRET_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: SECRET_KEY
- name: EMAIL_HOST_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: EMAIL_HOST_PASSWORD
- name: FCM_SERVER_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: FCM_SERVER_KEY
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: REDIS_PASSWORD
optional: true
volumeMounts:
- name: apns-key
mountPath: /secrets/apns
readOnly: true
- name: tmp
mountPath: /tmp
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
exec:
command: ["pgrep", "-f", "/app/worker"]
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 5
volumes:
- name: apns-key
secret:
secretName: honeydue-apns-key
items:
- key: apns_auth_key.p8
path: apns_auth_key.p8
- name: tmp
emptyDir:
sizeLimit: 64Mi
+52
View File
@@ -0,0 +1,52 @@
# honeyDue edge proxy — terminates HTTP from Cloudflare, routes by Host header.
#
# Cloudflare is in front, SSL mode "Flexible" — CF terminates TLS at the edge
# and talks to this origin over plain HTTP on port 80. No LE certs needed here
# for now. Later, to go "Full (strict)", remove `auto_https off`, add `tls` blocks
# that use the ACME HTTP-01 challenge, and open 443 on the node.
{
admin off
auto_https off
}
# api.myhoneydue.com → Go REST API
# `dynamic a` re-resolves the Swarm service DNS every 30s instead of caching
# the IP forever at config parse. This is critical on Swarm with endpoint_mode:
# dnsrr — when a task restarts, its overlay IP changes, and static DNS caching
# leaves Caddy dialing dead IPs.
api.myhoneydue.com:80 {
reverse_proxy {
dynamic a {
name api
port 8000
refresh 30s
}
header_up X-Forwarded-Proto {http.request.header.X-Forwarded-Proto}
}
}
# admin.myhoneydue.com → Next.js admin panel via overlay DNS (VIP endpoint)
#
# This relies on Swarm's embedded resolver, which has a known libnetwork
# stale-record bug (moby/moby#52265, affects 29.x). We work around it by
# (a) using default VIP endpoint_mode — a stable service IP — and
# (b) running a clean overlay from scratch (see Phase 1 stack recreate).
#
# If ghosts come back, the long-term fix is Traefik w/ Swarm provider that
# reads task IPs from Docker API, bypassing libnetwork DNS entirely. See
# deploy/MIGRATION_NOTES.md for the Traefik migration plan.
admin.myhoneydue.com:80 {
reverse_proxy admin:3000 {
lb_try_duration 3s
lb_try_interval 250ms
header_up X-Forwarded-Proto {http.request.header.X-Forwarded-Proto}
}
}
# Catch-all for root/unknown hostnames hitting our IPs directly.
# Cloudflare SSL=Flexible will still hit us on :80 for myhoneydue.com; return
# a placeholder until you wire a real marketing site.
:80 {
respond "honeyDue" 200
}
+94 -32
View File
@@ -248,13 +248,15 @@ Images that would be built and pushed:
${ADMIN_IMAGE}
Replicas:
caddy: 3 (one per node)
api: ${API_REPLICAS:-3}
worker: ${WORKER_REPLICAS:-2}
worker: ${WORKER_REPLICAS:-1}
admin: ${ADMIN_REPLICAS:-1}
Published ports:
api: ${API_PORT:-8000} (ingress)
admin: ${ADMIN_PORT:-3000} (ingress)
caddy: 80, 443 (ingress — public)
api: internal only (proxied by caddy)
admin: internal only (proxied by caddy)
dozzle: ${DOZZLE_PORT:-9999} (manager loopback only — SSH tunnel required)
Versioned secrets that would be created on this deploy:
@@ -264,6 +266,9 @@ Versioned secrets that would be created on this deploy:
${DEPLOY_STACK_NAME}_fcm_server_key_<deploy_id>
${DEPLOY_STACK_NAME}_apns_auth_key_<deploy_id>
Versioned configs that would be created on this deploy:
${DEPLOY_STACK_NAME}_caddyfile_<deploy_id>
No changes made. Re-run without DRY_RUN=1 to deploy.
=================================================
@@ -289,27 +294,54 @@ if [[ "${SKIP_BUILD}" != "1" ]]; then
log "Logging in to ${REGISTRY}"
printf '%s' "${REGISTRY_TOKEN}" | docker login "${REGISTRY}" -u "${REGISTRY_USERNAME}" --password-stdin >/dev/null
log "Building API image ${API_IMAGE}"
docker build --target api -t "${API_IMAGE}" "${REPO_DIR}"
log "Building Worker image ${WORKER_IMAGE}"
docker build --target worker -t "${WORKER_IMAGE}" "${REPO_DIR}"
log "Building Admin image ${ADMIN_IMAGE}"
docker build --target admin -t "${ADMIN_IMAGE}" "${REPO_DIR}"
# Target platform for Swarm nodes. Hetzner CX is x86_64; override via
# BUILD_PLATFORM=linux/arm64 if you move to ARM (Ampere) hosts.
BUILD_PLATFORM="${BUILD_PLATFORM:-linux/amd64}"
log "Build platform: ${BUILD_PLATFORM} (dev host may differ; buildx cross-compiles)"
log "Pushing deploy images"
docker push "${API_IMAGE}"
docker push "${WORKER_IMAGE}"
docker push "${ADMIN_IMAGE}"
if [[ "${PUSH_LATEST_TAG}" == "true" ]]; then
log "Updating :latest tags"
docker tag "${API_IMAGE}" "${REGISTRY_PREFIX}/honeydue-api:latest"
docker tag "${WORKER_IMAGE}" "${REGISTRY_PREFIX}/honeydue-worker:latest"
docker tag "${ADMIN_IMAGE}" "${REGISTRY_PREFIX}/honeydue-admin:latest"
docker push "${REGISTRY_PREFIX}/honeydue-api:latest"
docker push "${REGISTRY_PREFIX}/honeydue-worker:latest"
docker push "${REGISTRY_PREFIX}/honeydue-admin:latest"
# Ensure a buildx builder exists and is usable
if ! docker buildx inspect honeydue-builder >/dev/null 2>&1; then
log "Creating buildx builder 'honeydue-builder'"
docker buildx create --name honeydue-builder --use >/dev/null
else
docker buildx use honeydue-builder >/dev/null
fi
docker buildx inspect --bootstrap >/dev/null
build_and_push() {
local target="$1"
local image="$2"
shift 2
local tag_args=(-t "${image}")
while (( $# > 0 )); do
tag_args+=(-t "$1")
shift
done
log "Building + pushing ${target} image for ${BUILD_PLATFORM}: ${image}"
docker buildx build \
--platform "${BUILD_PLATFORM}" \
--target "${target}" \
"${tag_args[@]}" \
--push \
"${REPO_DIR}"
}
api_extra=()
worker_extra=()
admin_extra=()
if [[ "${PUSH_LATEST_TAG}" == "true" ]]; then
api_extra=("${REGISTRY_PREFIX}/honeydue-api:latest")
worker_extra=("${REGISTRY_PREFIX}/honeydue-worker:latest")
admin_extra=("${REGISTRY_PREFIX}/honeydue-admin:latest")
fi
# ${arr[@]+"${arr[@]}"} safely expands to nothing when the array is empty
# under `set -u` — avoids "unbound variable" on bash arrays.
build_and_push api "${API_IMAGE}" ${api_extra[@]+"${api_extra[@]}"}
build_and_push worker "${WORKER_IMAGE}" ${worker_extra[@]+"${worker_extra[@]}"}
build_and_push admin "${ADMIN_IMAGE}" ${admin_extra[@]+"${admin_extra[@]}"}
else
warn "SKIP_BUILD=1 set. Using prebuilt images for tag: ${DEPLOY_TAG}"
fi
@@ -322,6 +354,12 @@ SECRET_KEY_SECRET="${DEPLOY_STACK_NAME}_secret_key_${DEPLOY_ID}"
EMAIL_HOST_PASSWORD_SECRET="${DEPLOY_STACK_NAME}_email_host_password_${DEPLOY_ID}"
FCM_SERVER_KEY_SECRET="${DEPLOY_STACK_NAME}_fcm_server_key_${DEPLOY_ID}"
APNS_AUTH_KEY_SECRET="${DEPLOY_STACK_NAME}_apns_auth_key_${DEPLOY_ID}"
CADDYFILE_CONFIG="${DEPLOY_STACK_NAME}_caddyfile_${DEPLOY_ID}"
CADDYFILE_SRC="${DEPLOY_DIR}/Caddyfile"
if [[ ! -f "${CADDYFILE_SRC}" ]]; then
die "Missing required file: ${CADDYFILE_SRC}"
fi
TMP_DIR="$(mktemp -d)"
cleanup() {
@@ -332,6 +370,7 @@ trap cleanup EXIT
cp "${STACK_TEMPLATE}" "${TMP_DIR}/swarm-stack.prod.yml"
cp "${PROD_ENV}" "${TMP_DIR}/prod.env"
cp "${REGISTRY_ENV}" "${TMP_DIR}/registry.env"
cp "${CADDYFILE_SRC}" "${TMP_DIR}/Caddyfile"
mkdir -p "${TMP_DIR}/secrets"
cp "${SECRET_POSTGRES}" "${TMP_DIR}/secrets/postgres_password.txt"
cp "${SECRET_APP_KEY}" "${TMP_DIR}/secrets/secret_key.txt"
@@ -356,6 +395,7 @@ SECRET_KEY_SECRET=${SECRET_KEY_SECRET}
EMAIL_HOST_PASSWORD_SECRET=${EMAIL_HOST_PASSWORD_SECRET}
FCM_SERVER_KEY_SECRET=${FCM_SERVER_KEY_SECRET}
APNS_AUTH_KEY_SECRET=${APNS_AUTH_KEY_SECRET}
CADDYFILE_CONFIG=${CADDYFILE_CONFIG}
EOF
log "Uploading deploy bundle to ${SSH_TARGET}:${DEPLOY_REMOTE_DIR}"
@@ -364,6 +404,7 @@ scp "${SCP_OPTS[@]}" "${TMP_DIR}/swarm-stack.prod.yml" "${SSH_TARGET}:${DEPLOY_R
scp "${SCP_OPTS[@]}" "${TMP_DIR}/prod.env" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/prod.env"
scp "${SCP_OPTS[@]}" "${TMP_DIR}/registry.env" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/registry.env"
scp "${SCP_OPTS[@]}" "${TMP_DIR}/runtime.env" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/runtime.env"
scp "${SCP_OPTS[@]}" "${TMP_DIR}/Caddyfile" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/Caddyfile"
scp "${SCP_OPTS[@]}" "${TMP_DIR}/secrets/postgres_password.txt" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/secrets/postgres_password.txt"
scp "${SCP_OPTS[@]}" "${TMP_DIR}/secrets/secret_key.txt" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/secrets/secret_key.txt"
scp "${SCP_OPTS[@]}" "${TMP_DIR}/secrets/email_host_password.txt" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/secrets/email_host_password.txt"
@@ -397,6 +438,17 @@ create_secret() {
fi
}
create_config() {
local name="$1"
local src="$2"
if docker config inspect "${name}" >/dev/null 2>&1; then
echo "[remote] config exists: ${name}"
else
docker config create "${name}" "${src}" >/dev/null
echo "[remote] created config: ${name}"
fi
}
printf '%s' "${REGISTRY_TOKEN}" | docker login "${REGISTRY}" -u "${REGISTRY_USERNAME}" --password-stdin >/dev/null
rm -f "${REMOTE_DIR}/registry.env"
@@ -406,6 +458,8 @@ create_secret "${EMAIL_HOST_PASSWORD_SECRET}" "${REMOTE_DIR}/secrets/email_host_
create_secret "${FCM_SERVER_KEY_SECRET}" "${REMOTE_DIR}/secrets/fcm_server_key.txt"
create_secret "${APNS_AUTH_KEY_SECRET}" "${REMOTE_DIR}/secrets/apns_auth_key.p8"
create_config "${CADDYFILE_CONFIG}" "${REMOTE_DIR}/Caddyfile"
rm -f "${REMOTE_DIR}/secrets/postgres_password.txt"
rm -f "${REMOTE_DIR}/secrets/secret_key.txt"
rm -f "${REMOTE_DIR}/secrets/email_host_password.txt"
@@ -455,18 +509,22 @@ while true; do
sleep 10
done
log "Pruning old secret versions (keeping last ${SECRET_KEEP_VERSIONS})"
ssh "${SSH_OPTS[@]}" "${SSH_TARGET}" "bash -s -- '${DEPLOY_STACK_NAME}' '${SECRET_KEEP_VERSIONS}'" <<'EOF' || warn "Secret pruning reported errors (non-fatal)"
log "Pruning old secret + config versions (keeping last ${SECRET_KEEP_VERSIONS})"
ssh "${SSH_OPTS[@]}" "${SSH_TARGET}" "bash -s -- '${DEPLOY_STACK_NAME}' '${SECRET_KEEP_VERSIONS}'" <<'EOF' || warn "Pruning reported errors (non-fatal)"
set -euo pipefail
STACK_NAME="$1"
KEEP="$2"
prune_prefix() {
local prefix="$1"
# List matching secrets with creation time, sorted newest-first.
local kind="$1" # "secret" or "config"
local prefix="$2"
local ls_cmd rm_cmd
ls_cmd="docker ${kind} ls --format '{{.CreatedAt}}|{{.Name}}'"
rm_cmd="docker ${kind} rm"
local all
all="$(docker secret ls --format '{{.CreatedAt}}|{{.Name}}' 2>/dev/null \
all="$(eval "${ls_cmd}" 2>/dev/null \
| grep "|${prefix}_" \
| sort -r \
|| true)"
@@ -477,7 +535,7 @@ prune_prefix() {
local total
total="$(printf '%s\n' "${all}" | wc -l | tr -d ' ')"
if (( total <= KEEP )); then
echo "[cleanup] ${prefix}: ${total} version(s) — nothing to prune"
echo "[cleanup] ${kind}/${prefix}: ${total} version(s) — nothing to prune"
return 0
fi
@@ -486,16 +544,20 @@ prune_prefix() {
while IFS= read -r name; do
[[ -z "${name}" ]] && continue
if docker secret rm "${name}" >/dev/null 2>&1; then
echo "[cleanup] removed: ${name}"
if ${rm_cmd} "${name}" >/dev/null 2>&1; then
echo "[cleanup] removed ${kind}: ${name}"
else
echo "[cleanup] in-use (kept): ${name}"
echo "[cleanup] in-use ${kind} (kept): ${name}"
fi
done <<< "${to_remove}"
}
for base in postgres_password secret_key email_host_password fcm_server_key apns_auth_key; do
prune_prefix "${STACK_NAME}_${base}"
prune_prefix secret "${STACK_NAME}_${base}"
done
for base in caddyfile; do
prune_prefix config "${STACK_NAME}_${base}"
done
EOF
+91 -26
View File
@@ -1,6 +1,59 @@
version: "3.8"
services:
# Edge reverse proxy — the only service publishing :80/:443 publicly.
# Routes by Host header to internal `api` and `admin` services over the
# overlay network. Runs one replica per node via ingress mesh, so any node
# can terminate incoming traffic.
caddy:
image: caddy:2-alpine
ports:
- target: 80
published: 80
protocol: tcp
mode: ingress
- target: 443
published: 443
protocol: tcp
mode: ingress
configs:
- source: caddyfile
target: /etc/caddy/Caddyfile
mode: 0444
volumes:
- caddy_data:/data
- caddy_config:/config
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://127.0.0.1/"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
deploy:
replicas: 3
restart_policy:
condition: any
delay: 5s
update_config:
parallelism: 1
delay: 10s
order: start-first
rollback_config:
parallelism: 1
delay: 5s
order: stop-first
placement:
max_replicas_per_node: 1
resources:
limits:
cpus: "0.25"
memory: 128M
reservations:
cpus: "0.05"
memory: 32M
networks:
- honeydue-network
redis:
image: redis:7-alpine
command: redis-server --appendonly yes --appendfsync everysec --maxmemory 200mb --maxmemory-policy allkeys-lru
@@ -30,11 +83,8 @@ services:
api:
image: ${API_IMAGE}
ports:
- target: 8000
published: ${API_PORT}
protocol: tcp
mode: ingress
# No `ports:` block — Caddy edge service proxies to api:8000 over the
# overlay network. Port 8000 is never publicly exposed.
environment:
PORT: "8000"
DEBUG: "${DEBUG}"
@@ -104,6 +154,10 @@ services:
APPLE_IAP_SANDBOX: "${APPLE_IAP_SANDBOX}"
GOOGLE_IAP_SERVICE_ACCOUNT_PATH: "${GOOGLE_IAP_SERVICE_ACCOUNT_PATH}"
GOOGLE_IAP_PACKAGE_NAME: "${GOOGLE_IAP_PACKAGE_NAME}"
# Seeded on first migration (idempotent — skipped if admin_users row exists)
ADMIN_EMAIL: "${ADMIN_EMAIL}"
ADMIN_PASSWORD: "${ADMIN_PASSWORD}"
stop_grace_period: 60s
command:
- /bin/sh
@@ -116,15 +170,15 @@ services:
export FCM_SERVER_KEY="$$(cat /run/secrets/fcm_server_key)"
exec /app/api
secrets:
- source: ${POSTGRES_PASSWORD_SECRET}
- source: postgres_password
target: postgres_password
- source: ${SECRET_KEY_SECRET}
- source: secret_key
target: secret_key
- source: ${EMAIL_HOST_PASSWORD_SECRET}
- source: email_host_password
target: email_host_password
- source: ${FCM_SERVER_KEY_SECRET}
- source: fcm_server_key
target: fcm_server_key
- source: ${APNS_AUTH_KEY_SECRET}
- source: apns_auth_key
target: apns_auth_key
volumes:
- uploads:/app/uploads
@@ -132,10 +186,18 @@ services:
test: ["CMD", "curl", "-f", "http://127.0.0.1:8000/api/health/"]
interval: 30s
timeout: 10s
start_period: 15s
# Single-replica AutoMigrate on a fresh DB takes ~90s; subsequent
# replicas are ~2s (idempotent). 180s gives honest headroom for the
# first replica to finish, without masking cascade failures.
start_period: 180s
retries: 3
deploy:
replicas: ${API_REPLICAS}
# DNS round-robin instead of VIP. VIP's kernel IPVS state can go stale
# during replica churn (rolling updates, task restarts), causing
# intermittent i/o timeouts from clients on the overlay network (Caddy).
# dnsrr resolves to live task IPs directly and bypasses IPVS.
endpoint_mode: dnsrr
restart_policy:
condition: any
delay: 5s
@@ -159,11 +221,8 @@ services:
admin:
image: ${ADMIN_IMAGE}
ports:
- target: 3000
published: ${ADMIN_PORT}
protocol: tcp
mode: ingress
# No `ports:` block — reached via Caddy on admin.myhoneydue.com using
# Swarm's embedded DNS and default VIP endpoint_mode.
environment:
PORT: "3000"
HOSTNAME: "0.0.0.0"
@@ -248,15 +307,15 @@ services:
export FCM_SERVER_KEY="$$(cat /run/secrets/fcm_server_key)"
exec /app/worker
secrets:
- source: ${POSTGRES_PASSWORD_SECRET}
- source: postgres_password
target: postgres_password
- source: ${SECRET_KEY_SECRET}
- source: secret_key
target: secret_key
- source: ${EMAIL_HOST_PASSWORD_SECRET}
- source: email_host_password
target: email_host_password
- source: ${FCM_SERVER_KEY_SECRET}
- source: fcm_server_key
target: fcm_server_key
- source: ${APNS_AUTH_KEY_SECRET}
- source: apns_auth_key
target: apns_auth_key
healthcheck:
test: ["CMD", "curl", "-f", "http://127.0.0.1:6060/health"]
@@ -293,12 +352,11 @@ services:
# ssh -L ${DOZZLE_PORT}:127.0.0.1:${DOZZLE_PORT} <manager>
# Then browse http://localhost:${DOZZLE_PORT}
image: amir20/dozzle:latest
# Bind to loopback only on the manager. Swarm's long-form port spec
# rejects `host_ip`, so we use the short form — 127.0.0.1:<port>:8080.
# Access via SSH tunnel: ssh -L ${DOZZLE_PORT}:127.0.0.1:${DOZZLE_PORT} <manager>
ports:
- target: 8080
published: ${DOZZLE_PORT}
protocol: tcp
mode: host
host_ip: 127.0.0.1
- "127.0.0.1:${DOZZLE_PORT}:8080"
environment:
DOZZLE_NO_ANALYTICS: "true"
volumes:
@@ -324,6 +382,8 @@ services:
volumes:
redis_data:
uploads:
caddy_data:
caddy_config:
networks:
honeydue-network:
@@ -331,6 +391,11 @@ networks:
driver_opts:
encrypted: "true"
configs:
caddyfile:
external: true
name: ${CADDYFILE_CONFIG}
secrets:
postgres_password:
external: true
+240
View File
@@ -0,0 +1,240 @@
# 00 — Overview
## Summary
honeyDue runs on a three-node Kubernetes cluster managed by K3s, fronted by
Cloudflare, and backed by a managed Postgres (Neon), S3-compatible object
storage (Backblaze B2), and a self-hosted container registry (Gitea). The
application consists of a Go REST API, a Next.js admin panel, and a
background worker process using Redis-backed queues. Traefik handles HTTP
ingress and path-based routing. The whole stack fits in about 1 GB of RAM
across the three nodes with plenty of headroom.
This chapter is the map. Everything here is expanded in a later chapter.
## Architecture at a glance
```mermaid
flowchart TB
subgraph Internet
Browser[End-user browser / mobile client]
end
subgraph CF[Cloudflare]
CFEdge[Edge POP<br/>TLS terminates here]
end
Browser -- HTTPS :443 --> CFEdge
subgraph Hetzner[Hetzner Cloud — Nuremberg nbg1]
direction LR
subgraph H1[hetzner1<br/>178.104.247.152]
T1[Traefik<br/>:80/:443 hostNet]
A1[api pod]
W1[worker pod]
end
subgraph H2[hetzner2<br/>178.105.32.198]
T2[Traefik<br/>:80/:443 hostNet]
A2[api pod]
R1[redis pod<br/>PVC]
end
subgraph H3[hetzner3<br/>178.104.249.189]
T3[Traefik<br/>:80/:443 hostNet]
A3[api pod]
AD1[admin pod]
end
end
CFEdge -- HTTP :80<br/>DNS round-robin --> T1
CFEdge -- HTTP :80 --> T2
CFEdge -- HTTP :80 --> T3
T1 & T2 & T3 -.Ingress routes by<br/>Host header.-> A1
T1 & T2 & T3 -.-> AD1
A1 & A2 & A3 -.-> R1
subgraph External[Managed services]
Neon[(Neon Postgres<br/>AWS us-east-1)]
B2[(Backblaze B2<br/>us-east-005)]
FM[Fastmail SMTP]
Gitea[Gitea Registry<br/>gitea.treytartt.com]
end
A1 & A2 & A3 -- SSL --> Neon
W1 -- SSL --> Neon
A1 & A2 & A3 -- HTTPS --> B2
W1 -- SMTP :587 --> FM
H1 & H2 & H3 -. image pull .-> Gitea
```
### ASCII fallback
```
┌─────────────────────┐
│ End user │
└──────────┬──────────┘
│ HTTPS :443
┌─────────────────────┐
│ Cloudflare edge │ TLS terminates here
│ (SSL = Flexible) │
└──────────┬──────────┘
HTTP :80 round-robin
┌─────────────┼─────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ hetzner1 │ │ hetzner2 │ │ hetzner3 │
│ 178.104.247.152 │ │ 178.105.32.198 │ │ 178.104.249.189 │
│ Traefik :80/443 │ │ Traefik :80/443 │ │ Traefik :80/443 │
│ api worker │ │ api redis │ │ api admin │
└─────────┬───────┘ └─────────┬───────┘ └─────────┬───────┘
│ │ │
└──────── Kubernetes overlay ───────────┘
┌─────────────────────────────┴──────────────────────────────┐
│ │
▼ ▼ ▼ ▼
┌─────────┐ ┌─────────────┐ ┌──────────┐ ┌───────────────┐
│ Neon │ │ Backblaze B2│ │ Fastmail │ │ Gitea Registry│
│Postgres │ │ uploads │ │ SMTP │ │ image pull │
└─────────┘ └─────────────┘ └──────────┘ └───────────────┘
```
## The stack, one layer at a time
### Layer 0 — Hardware
Three Hetzner Cloud CX33 instances (4 vCPU, 8 GB RAM, 80 GB NVMe SSD) in
Hetzner's Nuremberg (nbg1) datacenter. Each node is $7.99/mo (April 2026
pricing), totaling ~$24/mo. See [Chapter 1](./01-infrastructure.md).
### Layer 1 — Operating system
Ubuntu 24.04.3 LTS. Each node has:
- SSH on port 22, key-only auth, `deploy` user with NOPASSWD sudo
- `ufw` firewall with strict default-deny-incoming; specific ports allowed
per Chapter 4
- Sysctl override `net.ipv4.ip_unprivileged_port_start=0` so non-root
containers can bind privileged ports (needed for Traefik to serve :80/:443)
### Layer 2 — Container runtime
`containerd` v2.2.2 (bundled with K3s). Docker was previously installed from
the Swarm era but is now disabled. containerd is Kubernetes' reference
runtime and has a smaller footprint than Docker's full stack.
### Layer 3 — Orchestrator
K3s v1.34.6 in HA mode. All 3 nodes are `control-plane,etcd` (Raft quorum
of 3 — can tolerate one node failure). K3s is a minimal Kubernetes
distribution from Rancher Labs (now Suse): single-binary, embedded etcd
instead of a separate etcd cluster, sane defaults for small installations.
See [Chapter 2](./02-orchestrator-choice.md) for why k3s over full Kubernetes
or Docker Swarm.
### Layer 4 — Cluster networking
- **Flannel VXLAN** for pod-to-pod overlay (default on K3s). VXLAN tunnels
pod traffic over UDP port 8472 between nodes.
- **CoreDNS** for service discovery (what pods call `api` or `redis` to
reach each other).
- **kube-proxy** in IPVS mode for ClusterIP → pod routing.
[Chapter 3](./03-networking.md) walks through a single request to show
every hop.
### Layer 5 — Ingress
**Traefik v3** as a DaemonSet with `hostNetwork: true`. Each node has a
Traefik pod that binds directly to the node's public :80 and :443. No
`servicelb`, no Hetzner Load Balancer — Cloudflare round-robins the three
node IPs in DNS and any node can serve any request. See
[Chapter 6](./06-traefik-ingress.md).
### Layer 6 — Edge / CDN
Cloudflare Free plan. Proxied A records for `api.myhoneydue.com`,
`admin.myhoneydue.com`, and `myhoneydue.com` each point at all three node
IPs. Edge handles TLS termination (SSL=Flexible), DDoS protection, caching
for static assets, and traffic failover if a node becomes unreachable.
See [Chapter 13](./13-cloudflare.md).
### Layer 7 — Application services
| Service | Type | Replicas | Image |
|---|---|---|---|
| `api` | Go (Echo, GORM) | 3 | `gitea.treytartt.com/admin/honeydue-api:<sha>` |
| `admin` | Next.js 16 | 1 | `gitea.treytartt.com/admin/honeydue-admin:<sha>` |
| `worker` | Go (Asynq) | 1 | `gitea.treytartt.com/admin/honeydue-worker:<sha>` |
| `redis` | redis:7-alpine | 1 | Docker Hub |
See [Chapter 7](./07-services.md).
### Layer 8 — External dependencies
- **Neon Postgres** (Launch plan) — `honeyDue` database
- **Backblaze B2** — `honeyDueProd` bucket for user uploads
- **Fastmail SMTP** — transactional email
- **Gitea** (self-hosted at `gitea.treytartt.com`) — container registry
- **Cloudflare** — DNS, TLS, CDN
See [Chapter 8](./08-database.md), [9](./09-storage.md), and
[11](./11-registry.md).
## What's deliberately absent
- **TLS at origin.** Cloudflare terminates TLS at the edge and talks HTTP
on port 80 to the nodes. This is "Flexible SSL" in Cloudflare terminology.
It's the simplest setup; we have a TODO to upgrade to "Full (strict)" with
Cloudflare Origin CA certs ([Chapter 13](./13-cloudflare.md), §Future).
- **Hetzner Load Balancer.** We save the $8.49/mo by having Cloudflare
round-robin across node IPs directly. If any node is unresponsive,
Cloudflare's own origin health checks will route around it within 30s.
- **Push notifications.** APNs (iOS) and FCM (Android) are *configured off*
until we have Apple Developer / Google Play accounts. The env vars are
set to sentinel values that let the Go app boot; `FEATURE_PUSH_ENABLED=false`
gates all call sites.
- **External metrics/monitoring (Prometheus, Grafana, Betterstack).**
Right now we rely on `kubectl logs`, `kubectl top`, and Cloudflare's own
analytics. See [Chapter 15](./15-observability.md) for what's there and
what we'd add.
- **Automated backups of Redis state.** Redis is configured with AOF
(append-only file) persistence, but the PVC is only on one node. Redis
holds only cache + Asynq queue state; losing it re-populates on first
request / next cron tick. Not critical.
- **Admin panel basic auth (Traefik middleware).** In-app admin login is
enabled; the extra Traefik-layer basic auth the scaffold supports is not
currently attached.
## The deployment pipeline in one paragraph
Changes to application code are built on your workstation by
`docker buildx build --platform linux/amd64 --push`, which cross-compiles
from arm64 (Apple Silicon) to amd64 (Hetzner nodes) and pushes directly to
`gitea.treytartt.com`. Manifests live in `deploy-k3s/manifests/`; they
reference image tags by git short SHA. `kubectl apply -f` rolls the new
image in with `maxUnavailable: 0, maxSurge: 1` — one new pod at a time,
old one stays up until new is healthy. Service discovery by Kubernetes
DNS means `api` and `admin` hostnames always resolve to live backing pods;
traffic shifts the moment a new pod passes its readiness probe.
[Chapter 14](./14-deployment-process.md) walks through a complete deploy.
## What we *used* to have (the short version)
Up until 2026-04-24 this stack ran on **Docker Swarm** on the same three
Hetzner boxes. It worked, but the Docker libnetwork service-discovery
layer has a bug in the 29.x line ([moby/moby#52265][moby-52265]) that
leaves stale DNS A-records behind when tasks migrate between nodes. We
hit it: the admin panel returned 502s for ~50% of requests through
Cloudflare because Caddy (our previous reverse proxy) was dialing a ghost
IP that had since been recycled to the Dozzle log viewer. We spent four
hours trying increasingly clever workarounds (dnsrr vs VIP,
`dynamic a` DNS refresh, global mode, host-mode ports, host.docker.internal,
hardcoded node IPs) before concluding that libnetwork state corruption
survives every non-nuclear fix.
The full autopsy is in [Chapter 19 — Swarm Postmortem](./19-postmortem-swarm.md).
K3s uses CoreDNS and has no libnetwork history; the bug class doesn't
exist there.
[moby-52265]: https://github.com/moby/moby/issues/52265
+294
View File
@@ -0,0 +1,294 @@
# 01 — Infrastructure
## Summary
Three Hetzner Cloud CX33 virtual machines in the Nuremberg (nbg1) datacenter
form the compute foundation. Each is a 4 vCPU / 8 GB RAM / 80 GB NVMe SSD
instance on Hetzner's shared-CPU "Cloud" line. Total compute cost is
$23.97/mo. This chapter explains each node spec in detail, why we picked
Hetzner and this tier specifically, and the rejected alternatives.
## Node specifications
All three nodes are identical. Specs per node:
| Spec | Value |
|---|---|
| Provider | Hetzner Cloud (`www.hetzner.com/cloud`) |
| Instance type | CX33 (shared-CPU line) |
| vCPU | 4 |
| RAM | 8 GB |
| Disk | 80 GB NVMe SSD |
| Network | 20 TB/mo outbound included |
| IPv4 address | Public dedicated |
| IPv6 address | /64 subnet |
| Region | `nbg1` (Nuremberg, Germany) |
| OS | Ubuntu 24.04.3 LTS (HWE kernel 6.8.0-90-generic) |
| Price | **$7.99/mo** (April 2026) ⁽¹⁾ |
⁽¹⁾ Hetzner applied a price adjustment on 2026-04-01 — CX33 went from
~$6.59 to $7.99. See [Hetzner price adjustment announcement][hetzner-prices].
### The three nodes
| SSH alias | Public IPv4 | IPv6 | k3s hostname |
|---|---|---|---|
| `hetzner1` | 178.104.247.152 | `2a01:4f8:1c18:79c7::1` | `ubuntu-8gb-nbg1-2` |
| `hetzner2` | 178.105.32.198 | `2a01:4f8:1c18:5ecf::1` | `ubuntu-8gb-nbg1-1` |
| `hetzner3` | 178.104.249.189 | `2a01:4f8:1c18:241a::1` | `ubuntu-8gb-nbg1-3` |
**Naming quirk.** The SSH-alias numbers and the Hetzner-assigned hostname
numbers do not match (`hetzner1` is `nbg1-2`, `hetzner2` is `nbg1-1`). This
is because the Hetzner hostnames are assigned in server-creation order; the
SSH aliases were set up later in the order we wanted to refer to them. We
chose not to rename the hosts — renaming `hostname` on a Kubernetes node
after it joins the cluster causes problems (node certificates, etcd
identity, etc. tie to the hostname). Living with the quirk is easier than
rebuilding. See the mapping table in [the README](./README.md).
## Why Hetzner
### Decision matrix
Compared at the time of purchase (~2026-04-23):
| Provider | Instance | vCPU / RAM / SSD | Price/mo | Traffic/mo |
|---|---|---|---:|---|
| **Hetzner** | **CX33** | **4 / 8 GB / 80 GB** | **$7.99** | **20 TB** |
| DigitalOcean | General-purpose | 2 / 8 GB / 25 GB | $63 | 4 TB |
| DigitalOcean | Basic | 4 / 8 GB / 160 GB | $48 | 5 TB |
| Vultr | High Perf | 4 / 8 GB / 180 GB | $48 | 5 TB |
| Linode (Akamai) | Shared | 4 / 8 GB / 160 GB | $48 | 5 TB |
| OVHcloud | VPS 2026 4vC | 4 / 8 GB / 75 GB | ~$13 | unlimited |
| Contabo | Cloud VPS 2 | 4 / 8 GB / 200 GB | $8 | 32 TB |
| Netcup | VPS 1000 G11 | 4 / 8 GB / 256 GB | ~$6 | unlimited |
| Oracle Always Free | ARM Ampere | up to 4 / 24 GB / 200 GB | $0 | 10 TB | *availability lottery* |
**Why Hetzner won:**
1. **Price/performance at this tier is best-in-class among mainstream hosts.**
Similar specs at DigitalOcean/Vultr/Linode cost 6× as much. You're paying
the "American managed cloud" premium there for UX polish we don't need.
2. **Dedicated IPv4 + /64 IPv6 + 20 TB traffic included.** No overage anxiety
at this scale; 20 TB is multiple months of anticipated traffic for a
bootstrapped app.
3. **European datacenter, GDPR-native.** honeyDue serves users in
multiple regions; if EU users dominate, Nuremberg is fast. US users pay
about +100 ms over a US-East host, which is well within Cloudflare-cached
tolerances for most app traffic.
4. **Mature API + `hcloud` CLI** for automation if we ever need it.
5. **Hetzner Cloud Firewall is free** and rule-for-rule equivalent to AWS
Security Groups / DO Cloud Firewall. We use UFW on the nodes instead
(Chapter 4) because our rule set evolved ad-hoc and moving it to the
provider's firewall is a small cleanup project.
**Why not the cheaper options:**
- **Netcup** is ~$1/mo cheaper per node with more disk, but its API is
barebones, the account/billing UX is more fiddly, and their network
routing in the US (where the operator is based) has more hops than
Hetzner's.
- **Contabo** is the cheapest, but the company has a reputation for
oversubscribed nodes. For a production service, unpredictable CPU steal
and disk I/O variance is not worth saving $0/node. Contabo is fine for
non-critical workloads; it's a poor fit for prod.
- **Oracle Cloud Always Free** is genuinely free (4 ARM cores + 24 GB RAM)
but:
- Requires ARM64 builds (we build on ARM but would need to not need
cross-compile — see Chapter 11 for why amd64 matters)
- Capacity for free accounts is a lottery; instance creation fails
"out of capacity" more often than it succeeds
- Oracle has reclaimed idle free-tier instances in the past
### Why not the premium options
DigitalOcean, Vultr, and Linode are excellent products with better UX than
Hetzner. They were rejected because at honeyDue's current scale the 36×
price multiplier doesn't buy anything we'd use:
- We don't need managed databases, object storage, or load balancers from
the same provider — those are Neon, Backblaze, and Cloudflare
- We don't need their monitoring dashboards — Cloudflare Analytics +
`kubectl top` + future Prometheus cover it
- The UI polish matters mostly for day-1 setup; ongoing operations are
`kubectl` and `ssh`
When honeyDue has enough revenue that an engineer's time is worth more than
$40/mo, we'd consider moving for the better tooling. Not yet.
## Why Nuremberg (`nbg1`)
Hetzner has datacenters in Nuremberg (nbg1), Falkenstein (fsn1), Helsinki
(hel1), Ashburn (ash), and Hillsboro (hil). Nuremberg was picked because:
- The operator's primary user base is expected to be mixed US/EU
- Within the EU, Nuremberg is the most central from a peering perspective
(well-connected to DE-CIX, Europe's largest internet exchange)
- Falkenstein is Hetzner's main datacenter and tends to have longer
provisioning queues during capacity crunches; Nuremberg is smaller and
more available
For a US-only userbase, Ashburn (ash) or Hillsboro (hil) would be better
picks — US users would see ~20 ms instead of ~120 ms.
Cloudflare's edge caches most assets, so the origin location matters mostly
for first-request / uncached / POST traffic.
## Why three nodes
**Raft quorum and fault tolerance.** K3s in HA mode uses Raft consensus
(via embedded etcd) for cluster state. Raft requires a majority of nodes
to agree on every write. Quorum formulas:
| Total managers | Quorum | Max failures tolerated |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
Three is the smallest odd number that tolerates a failure, and three is
where price/resilience is sweetest. Five nodes doesn't help until you need
to tolerate *two* simultaneous failures — a scale concern that doesn't
apply at our traffic volume.
Two nodes is worse than one: you still have single-failure intolerance
(one down = no quorum), but you've doubled your cost and failure surface.
Avoid even-node clusters for consensus systems.
## Node hardening
Each node was bootstrapped with:
1. **Docker installed** from `download.docker.com` using the stable repo
(this was the original Swarm setup; still installed but disabled — k3s
bundles its own containerd).
2. **`deploy` user created** with:
- Home directory
- Bash as login shell
- Member of `docker` group (historical, when Swarm was the orchestrator)
- Member of `sudo` group with `NOPASSWD: ALL` in `/etc/sudoers.d/deploy`
3. **SSH key installed** at `/home/deploy/.ssh/authorized_keys`
- The key is the public half of `~/.ssh/hetzner` on the operator
workstation (`ssh-ed25519`, 256 bits)
4. **`/opt/honeydue/deploy`** directory created, owned by `deploy`
(originally for Swarm deploy bundle drop zone; unused now)
5. **Sysctl** `net.ipv4.ip_unprivileged_port_start=0` persisted to
`/etc/sysctl.d/99-unprivileged-ports.conf`. Required so Traefik (running
as UID 65532) can bind `:80` and `:443` in the host network namespace.
The full bootstrap script is at `/tmp/honeydue_bootstrap.sh` on the
operator workstation (used during the initial Swarm setup — see
[Chapter 19](./19-postmortem-swarm.md) for context).
## Cost breakdown
```
3 × Hetzner CX33 $23.97/mo
Hetzner network traffic $0 (20 TB/mo included per node, nowhere near it)
Neon Postgres (Launch) $5-15/mo (usage-based, ~$5 min)
Backblaze B2 <$1/mo (tiny upload volume currently)
Cloudflare Free $0
Gitea (self-hosted) $0 (the operator's existing Gitea)
─────────────────────────────────
Total infra ~$30-40/mo
```
See [Chapter 18 — Cost](./18-cost.md) for a full breakdown including
external SaaS (Fastmail, Apple Developer, etc.) and at-scale projections.
## Provisioning workflow
Nodes were provisioned manually through Hetzner Cloud Console. This is
fine for a three-node cluster; for larger clusters we'd switch to the
[`hetzner-k3s`][hetzner-k3s] Ruby tool that the `deploy-k3s/` scaffold
expects. The manual steps were:
1. Create project in Hetzner Cloud Console.
2. Upload SSH key (`hetzner.pub`).
3. Create 3× CX33 servers in `nbg1` with Ubuntu 24.04.
4. SSH in as `root`, run bootstrap to create `deploy` user and install
Docker / later k3s.
5. Apply Hetzner Cloud Firewall rules at the network edge *optional* (we
use UFW per Chapter 4 instead).
A future greenfield deployment would run `deploy-k3s/scripts/01-provision-cluster.sh`,
which does all of this in one shot via the `hetzner-k3s` CLI.
## Upgrade / replacement plan
**Node failure.** If a node becomes unreachable, the other two retain
Raft quorum and the cluster continues accepting writes. Pods from the
failed node get rescheduled to the survivors (so long as the survivors
have spare capacity — see Chapter 16). To replace the dead node:
1. Delete it from the cluster: `kubectl delete node <name>`
2. Create a replacement CX33 in Hetzner console
3. Install k3s on it with `--server=https://<manager>:6443`
4. Verify `kubectl get nodes` shows it as Ready
**Scaling up.** To add a fourth node, same procedure without deleting
anything. Consider whether you want it as a server (adds to Raft quorum;
must also add up to an odd total) or an agent (worker-only). K3s agents
join with `INSTALL_K3S_EXEC=agent` instead of `server`.
**Upgrading K3s.** K3s has a minor release every ~3 months. Upgrade by
running the install script with the new version on each node, one at a
time, verifying cluster health between each. See
[Chapter 17](./17-runbook.md) for the detailed procedure.
**Upgrading the OS.** Ubuntu 24.04 LTS is supported until 2029.
`unattended-upgrades` is *not* currently installed, so OS patches require
manual `apt upgrade`. Install `unattended-upgrades` when time permits —
security patches are important and automation reduces the risk of
falling behind.
## Physical location & regulatory
- **Sovereignty**: Hetzner is headquartered in Gunzenhausen, Germany.
All data at rest in `nbg1` is subject to German law and the GDPR.
- **User data**: Most user data actually lives in
**Neon Postgres (AWS us-east-1, Virginia)** and **Backblaze B2
(us-east-005, South Carolina)** — both US-hosted. EU users' data
therefore *exits* the EU in the API path. If strict EU data residency
is ever a requirement, Neon has a EU region (Frankfurt) and Backblaze
has EU endpoints; switching is a configuration change, not an
architectural one.
- **Encryption at rest**: Hetzner encrypts node-local disks at the
hypervisor layer. Neon encrypts at the AWS EBS layer. B2 encrypts
objects server-side. None of our application code or config holds
secrets at rest that aren't already in Kubernetes Secrets (which
are stored in etcd; etcd on disk is unencrypted by default in k3s
but see Chapter 5 for hardening).
## Operator cheat sheet
```bash
# SSH to any node
ssh -i ~/.ssh/hetzner deploy@hetzner1
# Check node health
kubectl get nodes -o wide
# Per-node resource usage
kubectl top nodes
# See what's on each node
kubectl get pods -A -o wide | sort -k 8
# Hetzner console (in browser)
# https://console.hetzner.cloud/
```
## References
- [Hetzner Cloud product page][hetzner-cloud]
- [Hetzner price adjustment April 2026][hetzner-prices]
- [hetzner-k3s tool][hetzner-k3s]
- [K3s architecture docs][k3s-arch]
[hetzner-cloud]: https://www.hetzner.com/cloud/
[hetzner-prices]: https://docs.hetzner.com/general/infrastructure-and-availability/price-adjustment/
[hetzner-k3s]: https://github.com/vitobotta/hetzner-k3s
[k3s-arch]: https://docs.k3s.io/architecture
+323
View File
@@ -0,0 +1,323 @@
# 02 — Orchestrator Choice
## Summary
We run K3s — a lightweight Kubernetes distribution from SUSE/Rancher Labs.
This wasn't our first choice. We originally deployed on Docker Swarm and
spent a long afternoon hitting a libnetwork bug before migrating. This
chapter walks through the comparison of the three realistic orchestrators
(Docker Swarm, full Kubernetes, and K3s) and a fourth (Nomad) we
considered and rejected. The story of the Swarm→k3s migration is in
[Chapter 19](./19-postmortem-swarm.md); this chapter is about the decision
framework.
## The decision
**K3s v1.34.6+k3s1**, HA mode, three control-plane nodes with embedded etcd.
## Candidates considered
| | Docker Swarm | K3s | Full Kubernetes (kubeadm) | Hashicorp Nomad |
|---|---|---|---|---|
| Learning curve | Easiest | Medium | Hardest | Easy |
| Install on 3 nodes | `docker swarm init/join` | `curl \| sh` per node | Many steps | `nomad server/agent` |
| Memory footprint (control plane) | ~200 MB per node | ~500 MB per node | ~1 GB per node | ~200 MB per node |
| Service discovery | libnetwork (buggy) | CoreDNS | CoreDNS | Consul |
| HA quorum | Raft (3+ managers) | Raft via embedded etcd (3+ servers) | etcd cluster (3+ nodes) | Raft (3+ servers) |
| Secrets management | Swarm secrets | k8s Secrets | k8s Secrets | Vault or file-backed |
| Rolling updates | Swarm update_config | Deployments | Deployments | job update stanza |
| Ingress | None (third-party) | Traefik bundled | None (install yourself) | None (install yourself) |
| Active development | Maintenance mode | Active | Active | Active |
| Industry momentum | Declining | Growing | Dominant | Niche |
## Why K3s
### Against Docker Swarm
Swarm was our first pick because it's the simplest "production-like"
option. `docker swarm init` gives you a working cluster in seconds. It's
built into the Docker daemon you already have.
What killed it:
1. **libnetwork state bugs.** Swarm's service discovery relies on
libnetwork's gossip-backed service registry. When a service's task
migrates between nodes, the old endpoint record isn't always removed
cleanly — especially on encrypted overlays or during transient network
partitions. The result: stale DNS A-records that persist indefinitely,
survive service removal, survive containerd restarts, survive pretty
much everything except recreating the overlay network. Multiple open
issues track this: [moby/moby#52265][moby-52265],
[moby/moby#51491][moby-51491], [Dokploy#3480][dokploy-3480].
2. **It's in maintenance mode.** Mirantis [committed to supporting
Swarm through 2030][mirantis-swarm] as part of Mirantis Kubernetes
Engine 3, but nothing is being actively developed. The libnetwork code
has no champion; bug fixes land slowly and often incompletely (the
29.0.0 partial fix for #50236, the 29.3.0 regression, the pending
follow-up in #52289 — months apart).
3. **Industry signal.** Every 2026 write-up of "should I pick Swarm"
reaches the same conclusion: run what works; don't bet new workload on
it. [Better Stack][bstack-swarm] and [VirtualizationHowTo][vht-swarm]
are representative.
The [Chapter 19 postmortem](./19-postmortem-swarm.md) details the specific
bug we hit, the workarounds we tried, and why each failed.
### Against full Kubernetes (kubeadm)
Full Kubernetes is the de-facto standard. It has the biggest ecosystem, the
most documentation, the most mindshare. Against it:
1. **Operational overhead.** A kubeadm-built cluster has ~6 control-plane
processes (kube-apiserver, etcd, kube-scheduler, kube-controller-manager,
kube-proxy, kubelet) each of which needs monitoring, upgrading, and
understanding. K3s bundles them into a single binary with sensible
defaults.
2. **Memory.** A kubeadm control plane wants ~1 GB RAM baseline per master
node. On an 8 GB node that's 12% gone before any workload runs. K3s is
~500 MB per master.
3. **Etcd.** Full Kubernetes expects a separate 3+ node etcd cluster for
HA, typically on the same masters but as an independent process. K3s
embeds etcd in the server binary; still Raft, still HA, but one less
thing to install/upgrade/monitor.
4. **Cluster creation UX.** `kubeadm init` + certificate distribution + CNI
install + storage class setup is a multi-step dance. K3s `curl -sfL
https://get.k3s.io | sh -s - server --cluster-init` plus two joins is a
10-minute cluster.
**What we'd lose by not using full Kubernetes:** nothing that matters at
our scale. K3s is 100% Kubernetes API-compatible. Every `kubectl` command,
every Helm chart, every manifest works identically. If we ever need to
migrate to full Kubernetes, `kubectl get all -A -o yaml` gives us the
entire state and we re-apply it on the new cluster.
### Against Hashicorp Nomad
Nomad is very good at what it does — simpler than Kubernetes, more robust
than Swarm, has real load balancing (via Consul Connect), and the
`nomad agent` binary is ~80 MB vs k3s' ~200 MB.
Against it:
1. **Ecosystem is smaller.** Far fewer community Helm charts, operators,
tutorials. Every new component needs bespoke integration.
2. **Service discovery requires Consul.** Two products to operate, not one.
3. **Ingress requires a separate tool** (Traefik, HAProxy, Fabio). K3s
bundles Traefik by default.
4. **Secrets management** requires Vault or relies on Nomad's template
stanza. Not bad, but more moving parts.
5. **The operator hasn't used Nomad in production before.** Learning curve
on a new platform during a prod migration is a bad trade.
Nomad would be a defensible choice. K3s won primarily on ecosystem
maturity and the operator's familiarity with Kubernetes primitives.
## What K3s actually is
K3s is a CNCF Sandbox project (now graduated to Rancher/SUSE-backed)
originally designed for edge and IoT. Its design goals:
- Single ~200 MB static binary
- Works on ARM64 and AMD64
- Bundles everything needed for a working cluster: containerd, Flannel,
CoreDNS, Traefik, metrics-server, local-path storage provisioner, and
(optionally) servicelb (klipper-lb) load balancer
- Replaces the kubeadm setup dance with `curl | sh`
- Replaces etcd-in-its-own-cluster with embedded etcd (or SQLite for
single-node)
- Replaces Docker with containerd (though you can opt back into Docker)
It is **not** a fork of Kubernetes. K3s is Kubernetes, packaged differently.
The Kubernetes Go code it wraps is unmodified (aside from build-time
stripping of cloud provider integrations you don't need). `kubectl`,
the API, CRDs, operators — all identical.
## HA architecture we chose
```mermaid
flowchart TB
subgraph Cluster[k3s HA cluster]
subgraph N1[hetzner1]
K1[k3s server]
E1[etcd]
KUB1[kubelet]
TR1[Traefik pod<br/>hostNet :80/:443]
P1[app pods]
end
subgraph N2[hetzner2]
K2[k3s server]
E2[etcd]
KUB2[kubelet]
TR2[Traefik pod<br/>hostNet :80/:443]
P2[app pods]
end
subgraph N3[hetzner3]
K3[k3s server]
E3[etcd]
KUB3[kubelet]
TR3[Traefik pod<br/>hostNet :80/:443]
P3[app pods]
end
end
E1 <--Raft--> E2 <--Raft--> E3
E1 <--Raft--> E3
K1 & K2 & K3 --- API[kube-apiserver<br/>port 6443]
```
### ASCII fallback
```
hetzner1 hetzner2 hetzner3
┌──────────┐ ┌──────────┐ ┌──────────┐
│ k3s srv │ │ k3s srv │ │ k3s srv │
│ ├ etcd ─┼──────┼ ├ etcd ──┼──────┼─ etcd │ │
│ │ :6443│ │ │ :6443│ │ :6443│ │
│ ├ kubelet │ ├ kubelet │ kubelet│
│ └ pods │ │ └ pods │ │ pods │ │
└──────────┘ └──────────┘ └──────────┘
│ ▲ │ ▲ │ ▲
│ └─── Raft ────┤ └─── Raft ────┘ │
└────────── Raft ─┴─────────────────────┘
```
All three nodes are **server** nodes (in k3s terminology) — they all run
`kube-apiserver`, `kube-scheduler`, `kube-controller-manager`, and
participate in etcd Raft consensus. A fourth "agent" node could be added
as worker-only; we don't need that capacity yet.
**Quorum**: 2 out of 3 nodes must agree on writes. The cluster stays
operational if any one node dies. Two dying nodes = cluster loses quorum
(Raft halts) until a majority returns.
## What we disabled
We ran k3s install with `--disable=servicelb`. `servicelb` (a.k.a.
`klipper-lb`) is a trick where k3s spawns a daemonset that listens on a
node's host ports and proxies to `LoadBalancer`-typed services. Fine for
dev; we don't need it because we handle ingress with Traefik in
DaemonSet+hostNetwork mode (Chapter 6).
We did **not** disable:
- **traefik** — we reconfigured it via HelmChartConfig rather than
disable-and-replace. See Chapter 6.
- **local-path-provisioner** — provides the default `StorageClass` we use
for Redis PVC (Chapter 7).
- **metrics-server** — required for `kubectl top` and HorizontalPodAutoscaler.
- **coredns** — the cluster DNS. Essential for service discovery.
## Version choices
### K3s v1.34.6+k3s1
This was the latest stable K3s release as of 2026-04-24. K3s follows
upstream Kubernetes' release cadence — `1.34` matches Kubernetes 1.34.x.
The `+k3s1` suffix is the K3s build number within that upstream version.
**Upgrade policy**: K3s supports one minor version per quarter. We'd
upgrade in place to 1.35 when it's been out ~30 days and has no open
critical bugs in the release notes. See Chapter 17 for the procedure.
### containerd v2.2.2
Bundled with K3s. containerd 2.x brought full support for the
`cri-dockerd` replacement API and performance improvements over 1.x.
We don't pin containerd separately — we take whatever K3s ships.
### Flannel (VXLAN backend)
Bundled with K3s as the default CNI. Flannel's VXLAN backend is
straightforward, performant enough, and has worked reliably in every K3s
install we've seen. Alternatives (Calico, Cilium) are more featureful but
add operational complexity.
See [Chapter 3](./03-networking.md) for a deep dive on the networking
layer.
## What we did NOT choose from K3s' ecosystem
- **servicelb / klipper-lb** — off. Reason above.
- **embedded SQLite** — on single-node k3s, SQLite replaces etcd. We're
multi-node, so this doesn't apply.
- **`--flannel-backend=wireguard-native`** — WireGuard-encrypted overlay.
We didn't enable it because (a) VXLAN already works, (b) our node-to-node
traffic stays within Hetzner's internal network anyway, and (c) we haven't
proven we need it. Encryption is a TODO (Chapter 20).
## Raft and split-brain behavior
If the 3 nodes become network-partitioned such that one node sees the
other two and vice versa (a "2-1 split"):
- **Majority partition (2 nodes)** — retains quorum, cluster keeps
accepting writes. Pods on those 2 nodes keep running. Pods on the
isolated node eventually get marked `NotReady` after
`node-monitor-grace-period` (default 40s), and after
`pod-eviction-timeout` (default 5 min) their pods are marked for
eviction and rescheduled onto the surviving nodes.
- **Minority partition (1 node)** — loses quorum. API server on that
node refuses writes; existing pods keep running (kubelet doesn't need
the API server for already-scheduled pods), but nothing new can deploy,
scale, or reschedule.
When the partition heals, Raft reconciles automatically. The minority
node catches up on etcd state via snapshot+replay.
**Worst case** (all 3 isolated from each other): no quorum, no node is
authoritative. Pods keep running from existing state; nothing can be
updated. This requires all three nodes losing network to each other
simultaneously, which implies Hetzner's entire internal switching is
broken — at that point, the whole region is likely down anyway.
## Our decision in one sentence
K3s gave us the Kubernetes API (enormous ecosystem, known primitives, our
existing scaffold in `deploy-k3s/manifests/`) without the operational
overhead of kubeadm; and unlike Swarm, its service-discovery layer is
rock-solid.
## Operator cheat sheet
```bash
# On any k3s server node, root commands use k3s-wrapped kubectl:
sudo k3s kubectl get nodes
# From workstation, use the copied kubeconfig:
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl get nodes
# Check k3s service:
ssh deploy@hetzner1 "sudo systemctl status k3s"
# Watch cluster events live:
kubectl get events -A --watch
# See what's on each node:
kubectl get pods -A -o wide | sort -k 8
```
## References
- [K3s architecture][k3s-arch]
- [K3s requirements][k3s-reqs]
- [Mirantis Swarm support announcement][mirantis-swarm]
- [moby/moby#52265 — libnetwork stale records][moby-52265]
- [moby/moby#51491 — DNS broken after swarm init][moby-51491]
- [Dokploy #3480 — Traefik stale VIP on Swarm][dokploy-3480]
- [Better Stack: Hetzner Cloud Review 2026][bstack-swarm]
- [VirtualizationHowTo: Is Docker Swarm Still Safe in 2026?][vht-swarm]
[k3s-arch]: https://docs.k3s.io/architecture
[k3s-reqs]: https://docs.k3s.io/installation/requirements
[mirantis-swarm]: https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/
[moby-52265]: https://github.com/moby/moby/issues/52265
[moby-51491]: https://github.com/moby/moby/issues/51491
[dokploy-3480]: https://github.com/Dokploy/dokploy/issues/3480
[bstack-swarm]: https://betterstack.com/community/guides/web-servers/hetzner-cloud-review/
[vht-swarm]: https://www.virtualizationhowto.com/2026/03/is-docker-swarm-still-safe-in-2026/
+465
View File
@@ -0,0 +1,465 @@
# 03 — Networking
## Summary
The network stack has five layers: the physical/internet layer (Hetzner's
public network), the node layer (Ubuntu with UFW), the Kubernetes overlay
(Flannel VXLAN), the service layer (kube-proxy IPVS + CoreDNS), and the
ingress layer (Traefik). This chapter walks through each, explains how
they compose, and traces a single HTTP request from browser to Go API
response showing every hop.
## The five layers
```mermaid
flowchart TB
subgraph L5[Layer 5 — Ingress]
Traefik
end
subgraph L4[Layer 4 — Service discovery]
KubeProxy[kube-proxy IPVS]
CoreDNS
end
subgraph L3[Layer 3 — Pod overlay]
Flannel[Flannel VXLAN<br/>UDP 8472]
end
subgraph L2[Layer 2 — Node network]
UFW
Kernel[Linux kernel<br/>netfilter/iptables]
end
subgraph L1[Layer 1 — Physical]
Hetzner[Hetzner network<br/>public v4 + v6]
end
L5 --> L4 --> L3 --> L2 --> L1
```
### ASCII fallback
```
┌──────────────────────────────────────┐
│ L5 Traefik (host network, :80/:443)│
├──────────────────────────────────────┤
│ L4 kube-proxy (IPVS) + CoreDNS │
├──────────────────────────────────────┤
│ L3 Flannel VXLAN overlay │
│ 10.42.0.0/16 pod CIDR │
├──────────────────────────────────────┤
│ L2 Ubuntu + UFW + kernel iptables │
├──────────────────────────────────────┤
│ L1 Hetzner public IPv4/IPv6 │
└──────────────────────────────────────┘
```
## Layer 1 — Physical network
Each Hetzner CX33 has:
- A **public IPv4** address on the internet
- A **public IPv6** /64 subnet (one address used, the rest unused)
- **20 TB/mo** outbound traffic included; inbound is free
- **~1 Gbps** network bandwidth per node
All inter-node traffic goes over the **public network**. Hetzner Cloud
offers a private-network feature (vswitch), but we didn't attach one —
adding it now would require reconfiguring Flannel's advertise-addr. A
future improvement: attach a private vSwitch to all three nodes,
reconfigure Flannel to use it, shrink our public-interface attack surface.
## Layer 2 — Node network
Each node runs Ubuntu 24.04.3 LTS with:
- **Default routing** via the Hetzner-provided gateway
- **UFW** as the iptables frontend (Chapter 4 lists every rule)
- **IP forwarding** enabled (`net.ipv4.ip_forward=1`) — required for
Kubernetes pod routing
- **Bridge netfilter** enabled (`net.bridge.bridge-nf-call-iptables=1`)
— required so iptables can see bridged traffic
K3s configures the latter two automatically at install time via
`/etc/sysctl.d/90-kubelet.conf` (or similar; exact file varies by distro).
Two additional sysctls we set manually:
```
# /etc/sysctl.d/99-unprivileged-ports.conf
net.ipv4.ip_unprivileged_port_start=0
```
**Why**: Traefik runs as UID 65532 (non-root) in host network mode to bind
:80 and :443. Without this sysctl, even with `CAP_NET_BIND_SERVICE`, it
can't bind privileged ports in the host namespace. Ubuntu 24.04's default
is 1024 (so ports 11023 are "privileged"). Setting it to 0 lets any
user bind any port.
**Security implication**: Minimal. The ports Traefik binds are still
controlled by the container runtime — other pods on the node can't
accidentally grab 80/443 because kubelet won't schedule conflicting host
ports. And the UFW rules still gate what's reachable externally.
## Layer 3 — Pod overlay (Flannel VXLAN)
### What Flannel is
Flannel is a CNI (Container Network Interface) plugin. Its job: give every
pod in the cluster a routable IP address, and make those IPs reachable
from any other pod regardless of which node they're on.
### The pod CIDR
K3s assigns **10.42.0.0/16** as the cluster-wide pod CIDR by default. Each
node gets a /24 slice:
| Node | Pod CIDR |
|---|---|
| ubuntu-8gb-nbg1-1 | 10.42.1.0/24 |
| ubuntu-8gb-nbg1-2 | 10.42.0.0/24 |
| ubuntu-8gb-nbg1-3 | 10.42.2.0/24 |
Each pod gets an IP from its node's slice. So a pod on hetzner2
(`nbg1-1`) might be `10.42.1.6`; a pod on hetzner3 (`nbg1-3`) might be
`10.42.2.10`.
### How VXLAN works
VXLAN ("Virtual Extensible LAN") tunnels Layer-2 frames over UDP. Flannel
wraps every inter-node packet like so:
```
Original pod → pod packet:
┌──────────────────────────────────────────────────┐
│ Ethernet │ IP src=10.42.0.5 → dst=10.42.2.10 │ … │
└──────────────────────────────────────────────────┘
Flannel VXLAN-encapsulates it:
┌──────────────────────────────────────────────────────────────────┐
│ Eth │ IP src=178.104.247.152 → dst=178.104.249.189 │ UDP 8472 │ │
│ VXLAN header │ <original Ethernet+IP+payload> │ │
└──────────────────────────────────────────────────────────────────┘
```
The outer IP/UDP carries the packet between nodes over Hetzner's public
network. On arrival, the destination node unwraps the VXLAN header and
delivers the inner packet to the target pod.
**UDP port 8472** is VXLAN's IANA-assigned port. It must be open
node-to-node in UFW (see Chapter 4).
**MTU note**: VXLAN encapsulation adds 50 bytes of overhead (8 VXLAN +
8 UDP + 20 IP + 14 Ethernet). Hetzner's network uses standard 1500-byte
MTU, so Flannel's overlay MTU is 1450. Mismatches cause silent packet
drops. K3s sets this correctly by default.
### Flannel config
`/var/lib/rancher/k3s/agent/etc/flannel/net-conf.json` on each node:
```json
{
"Network": "10.42.0.0/16",
"EnableIPv6": false,
"EnableIPv4": true,
"IPv6Network": "::/0",
"Backend": { "Type": "vxlan" }
}
```
We did not enable IPv6 in the cluster — an unnecessary complexity for our
scale, and CoreDNS + kube-proxy + node controllers all work fine in v4-only
mode.
### No encryption (yet)
Flannel VXLAN traffic over Hetzner's public network is **not encrypted**.
This means pod-to-pod traffic between nodes is visible to any attacker
with packet capture on the path — in practice, nobody between our three
nodes at Hetzner Nuremberg, but it's still plaintext on the wire.
**Mitigation today**: All sensitive inter-pod traffic already uses TLS:
- api ↔ Neon Postgres: TLS 1.3 (`DB_SSLMODE=require`)
- api/worker ↔ Backblaze B2: HTTPS
- api ↔ Fastmail: STARTTLS
- api ↔ Redis: plaintext but Redis only holds cache + Asynq queue state,
no user credentials
**TODO** (Chapter 20): Switch Flannel to `wireguard-native` backend. K3s
supports this with a flag at install time; enabling on an existing
cluster requires a config edit and rolling kubelet restart.
## Layer 4 — Service discovery
Pods don't talk to each other by IP — IPs are ephemeral, assigned on pod
creation. They use **service names** resolved by DNS.
### CoreDNS
K3s runs **CoreDNS** as the cluster DNS server. A pod in the `honeydue`
namespace resolves `redis` to the Redis Service's ClusterIP:
```
redis → 10.43.7.10 (Service ClusterIP)
redis.honeydue → 10.43.7.10
redis.honeydue.svc.cluster.local → 10.43.7.10
```
When an app resolves `redis:6379`:
1. The pod's `/etc/resolv.conf` points to `10.43.0.10` (the CoreDNS
Service).
2. CoreDNS receives the query, checks its known Services, returns
`10.43.7.10`.
3. The pod sends TCP to `10.43.7.10:6379`.
4. kube-proxy (Layer 4, below) intercepts and routes to the actual pod IP.
### The service CIDR
K3s assigns **10.43.0.0/16** as the service CIDR. ClusterIPs live here.
Currently:
| Service | ClusterIP |
|---|---|
| `api.honeydue` | 10.43.167.83 |
| `admin.honeydue` | 10.43.136.168 |
| `redis.honeydue` | 10.43.7.10 |
| `kubernetes.default` | 10.43.0.1 |
| `kube-dns.kube-system` | 10.43.0.10 |
ClusterIPs are **stable** for the life of the Service — they don't change
when pods come and go.
### kube-proxy (IPVS mode)
`kube-proxy` is the dataplane component that makes Services work. It runs
as a DaemonSet (one per node), watches the k3s API for Service and
Endpoint changes, and programs the kernel to route traffic.
K3s defaults to **IPVS mode** on modern kernels. IPVS is a Linux kernel
feature for in-kernel L4 load balancing — essentially connection-tracking
NAT with round-robin or other scheduling.
When a pod dials `10.43.7.10:6379`:
1. The first packet hits the node's kernel
2. IPVS sees the destination is a ClusterIP
3. IPVS picks an endpoint from the Service's endpoint set (e.g.,
`10.42.0.10:6379` on hetzner2)
4. IPVS rewrites the destination and forwards
5. Flannel tunnels it to the destination node (if remote) or delivers
locally (if the endpoint is on the same node)
This happens per-TCP-connection, not per-packet, thanks to conntrack.
### Why IPVS over iptables
K3s' default kube-proxy mode is IPVS. The alternative (iptables mode) is
older and slower — for every Service, iptables mode adds a chain of rules
that grow linearly with Service count. IPVS uses a hash table and scales
to thousands of Services without performance degradation. At our scale
either works, but IPVS is the better default.
### Headless Services
Some of our Services are *not* using a ClusterIP — they're "headless"
(`clusterIP: None`). Our setup doesn't currently use them but it's worth
knowing the distinction: headless Services return all endpoint IPs
directly via DNS, no kube-proxy involvement. Useful for stateful sets
where clients need to talk to a specific replica.
## Layer 5 — Ingress (Traefik)
External traffic arrives on the node's public :80 or :443. Traefik
handles the first mile of routing. See [Chapter 6](./06-traefik-ingress.md)
for Traefik-specific details; this section just shows how it fits in the
networking stack.
Traefik runs as a **DaemonSet** with `hostNetwork: true`. That means:
- One Traefik pod per node
- Each pod is in the **host's network namespace**, not a pod netns
- Each pod can bind directly to `0.0.0.0:80` and `0.0.0.0:443` on the node
When Cloudflare sends a request to `178.104.247.152:80`:
1. Packet arrives at hetzner1's NIC
2. UFW accepts (80/tcp is open from anywhere)
3. Linux kernel routes to localhost:80 because something's listening
4. Traefik (running in host namespace) accepts the connection
5. Traefik reads the `Host:` header
6. Traefik matches an Ingress rule (api.myhoneydue.com → api Service)
7. Traefik dials `10.43.167.83:8000` (Service ClusterIP)
8. Kube-proxy IPVS rewrites to a live api pod endpoint
9. Flannel VXLAN tunnels if the endpoint is on a remote node
10. The api pod receives the request, processes, responds
11. Response flows back the reverse path
Full trace in the [end-to-end section](#end-to-end-request-trace) below.
## IPs we care about
| What | CIDR / IP | Used for |
|---|---|---|
| Pod CIDR | 10.42.0.0/16 | All pod IPs cluster-wide |
| Service CIDR | 10.43.0.0/16 | All ClusterIPs |
| Flannel VXLAN | UDP 8472 | Pod-to-pod traffic (inter-node) |
| CoreDNS Service | 10.43.0.10:53 | Cluster DNS |
| Kubernetes Service | 10.43.0.1:443 | Internal kube-apiserver |
| Node IPs | See README | External + flannel source/dst |
| Traefik | host network | Listens on node's :80, :443 |
## End-to-end request trace
A user in Texas hits `https://api.myhoneydue.com/api/tasks/`. Here's every
hop:
```mermaid
sequenceDiagram
autonumber
participant U as User (Austin, TX)
participant CF as Cloudflare edge (DFW POP)
participant H as hetzner2 (picked by CF)<br/>178.105.32.198
participant TR as Traefik pod<br/>(hostNetwork)
participant API as api pod on hetzner3<br/>10.42.2.6:8000
participant DB as Neon Postgres<br/>(AWS us-east-1)
U->>CF: HTTPS :443 GET /api/tasks/
Note over CF: TLS handshake terminates here
CF->>H: HTTP :80 (with original Host header)
H->>TR: Accepted by kernel, delivered to Traefik
Note over TR: Matches Ingress rule<br/>host: api.myhoneydue.com
TR->>TR: Resolve api.honeydue → 10.43.167.83
TR->>H: dial 10.43.167.83:8000
H->>H: kube-proxy IPVS rewrites<br/>dst → 10.42.2.6:8000
H->>API: Flannel VXLAN encapsulate<br/>UDP 8472 → hetzner3
Note over API: Pod receives packet
API->>DB: SELECT … FROM tasks WHERE user_id = …<br/>TLS :5432
DB-->>API: Result rows
API-->>TR: HTTP 200 JSON
TR-->>CF: HTTP 200
CF-->>U: HTTPS 200
```
### Timing budget for a cache-miss read
| Hop | Typical latency |
|---|---|
| User → CF edge (DFW) | 515 ms |
| CF edge → hetzner2 (origin HTTP :80) | 90120 ms (cross-Atlantic) |
| UFW + kernel accept | <1 ms |
| Traefik accept + route | 12 ms |
| kube-proxy + Flannel (same node) | <1 ms |
| kube-proxy + Flannel (remote node, VXLAN) | 13 ms |
| Go API request handling | 15 ms |
| Neon Postgres query (TLS + SQL) | 2060 ms (AWS us-east-1) |
| Return path (reverse) | similar |
**Total typical**: ~200300 ms for a user in North America, dominated by
the cross-Atlantic CF→origin hop. Cached responses at Cloudflare skip the
origin hop entirely.
## Inter-node routing concretely
Here's what `ip route` shows on hetzner2 (not run live, reconstructed from
typical k3s+flannel+vxlan setup):
```
default via 172.31.1.1 dev eth0 # Hetzner gateway
10.42.0.0/24 via 10.42.0.0 dev flannel.1 # to hetzner1 pods (via VXLAN iface)
10.42.1.0/24 dev cni0 # local pods on hetzner2
10.42.2.0/24 via 10.42.2.0 dev flannel.1 # to hetzner3 pods (via VXLAN iface)
10.43.0.0/16 via 10.42.1.1 dev cni0 # services via kube-proxy
```
The `flannel.1` interface is the VXLAN tunnel endpoint. Traffic written
to it gets encapsulated in UDP 8472 and sent to the peer node's public IP.
Flannel learns about peer nodes via the Kubernetes API (it watches Node
resources). When hetzner3 joins, Flannel on hetzner1 and hetzner2 both
learn its public IP and pod CIDR, update their routes and ARP tables,
and traffic just works.
## Network performance
### Within a node (pod to pod, same host)
Packets go through `cni0` bridge, never leave the node. Sub-millisecond
latency, bounded by kernel + veth performance. Easily >10 Gbps.
### Between nodes (pod to pod, different host)
Packets go through Flannel VXLAN. Added overhead: encap/decap in the
kernel (~510 μs), plus the actual network hop between hetzner nodes
(~0.5 ms within the same Hetzner datacenter). Throughput is bounded by
Hetzner's NIC (≈1 Gbps sustained per node).
In practice this is fine for everything we do. The slowest link in our
application is Neon (AWS us-east-1), which is ~100 ms round-trip.
## DNS resolution path
A pod resolves `redis`:
1. App does `getaddrinfo("redis")`.
2. glibc reads `/etc/resolv.conf`, finds nameserver `10.43.0.10`.
3. sends UDP 53 to `10.43.0.10`.
4. Destination is CoreDNS Service ClusterIP.
5. kube-proxy IPVS load-balances across CoreDNS pods (there's usually 1).
6. The packet arrives at the CoreDNS pod.
7. CoreDNS checks its Kubernetes plugin cache for `redis.<ns>.svc.cluster.local`.
8. Returns `10.43.7.10` (redis Service ClusterIP) with a low TTL.
CoreDNS is stateless — if it restarts, pods re-query on their next lookup.
**DNS caching in pods**: The Go API uses `net.Resolver` which does not
cache by default. Each new connection triggers a fresh DNS lookup. This
is correct behavior for Kubernetes (where Service IPs are stable but
Endpoints change), but it means a CoreDNS outage breaks new connections
immediately.
Next.js (admin) also uses Node's default resolver, similar behavior.
## What breaks if X fails
| Failure | Symptom |
|---|---|
| Flannel daemon on one node crashes | Pods on that node can't reach other nodes' pods; kube-proxy Services sometimes work (kernel conntrack) |
| CoreDNS pod crashes (only 1) | New connection DNS lookups fail; existing connections continue |
| kube-proxy daemon on one node crashes | Pods on that node can't resolve Service ClusterIPs; direct pod IPs still work |
| UFW misconfigured (port 8472 UDP blocked) | Pods on that node can't reach remote pods over overlay |
| Node's NIC fails | Node unreachable; Raft loses it; its pods get rescheduled elsewhere |
| Hetzner datacenter outage | Entire cluster offline |
## Operator cheat sheet
```bash
# See all IPs in the cluster
kubectl get pods -A -o wide # pod IPs + nodes
kubectl get svc -A # Service ClusterIPs
# Test pod-to-pod DNS from inside a pod
kubectl exec -n honeydue deploy/api -- nslookup redis
kubectl exec -n honeydue deploy/api -- getent hosts redis
# Test pod-to-pod TCP connectivity
kubectl exec -n honeydue deploy/api -- nc -zv redis 6379
kubectl exec -n honeydue deploy/api -- wget -q -O- http://admin:3000/
# See the node's iptables/IPVS rules (run on a node)
ssh deploy@hetzner1 "sudo ipvsadm -Ln"
ssh deploy@hetzner1 "sudo iptables -L -n -t nat | head -50"
# See the cluster's flannel state
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{" "}{.spec.podCIDR}{"\n"}{end}'
```
## References
- [Kubernetes networking concepts][k8s-net]
- [Flannel VXLAN backend][flannel-vxlan]
- [CoreDNS k8s plugin][coredns-k8s]
- [IPVS mode for kube-proxy][ipvs]
- [VXLAN RFC 7348][vxlan-rfc]
[k8s-net]: https://kubernetes.io/docs/concepts/services-networking/
[flannel-vxlan]: https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#vxlan
[coredns-k8s]: https://coredns.io/plugins/kubernetes/
[ipvs]: https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/
[vxlan-rfc]: https://datatracker.ietf.org/doc/html/rfc7348
+357
View File
@@ -0,0 +1,357 @@
# 04 — Firewall
## Summary
Every node runs UFW (Uncomplicated Firewall, a frontend for iptables) with
a default-deny-incoming policy. Specific ports are allowed from specific
sources only. This chapter lists every rule on every node, why each rule
exists, and what breaks without it. It also traces what happens to an
inbound packet as it goes through iptables, UFW, and the kernel.
## Policy
All three nodes have the same UFW config. The policy:
| Direction | Default |
|---|---|
| **Incoming** | **deny** |
| Outgoing | allow |
| Routed | disabled (we don't NAT) |
Default deny is a white-list model: unless a rule explicitly allows a
packet, it's dropped. This is more secure than default-allow but requires
that every legitimate port be enumerated in a rule.
## Current ruleset per node
Run `sudo ufw status verbose` on any node to see the live ruleset. The
canonical ruleset below, grouped by purpose.
### Public-facing (anywhere)
| Port | Protocol | From | Purpose | Comment |
|---|---|---|---|---|
| 22 | TCP | Anywhere | SSH | |
| 80 | TCP | Anywhere | HTTP (Cloudflare → Traefik) | |
| 443 | TCP | Anywhere | HTTPS (future, currently unused at origin) | |
**Why 443 is open but unused**: We're on Cloudflare SSL=Flexible, so
Cloudflare talks to origin over plain HTTP:80. Port 443 on origin is
only hit by misconfigured clients (who bypass CF DNS and hit node IPs
directly). Traefik's config accepts it but we don't require it. Keeping
it open smooths a future switch to Full (strict) SSL mode.
**Future hardening**: Restrict 80 and 443 to Cloudflare's published IP
ranges (15 IPv4 CIDRs, 7 IPv6 CIDRs). See [Chapter 13](./13-cloudflare.md)
for the ranges and the UFW rule format. Today they're open to anyone.
### SSH (operator access)
| Port | Protocol | From | Purpose |
|---|---|---|---|
| 22 | TCP | Anywhere | SSH login (key-only) |
SSH is open to the internet but hardened: key-only auth, no root login,
`AllowUsers deploy` configured (the stock distribution still allows root;
we hardened in bootstrap). See [Chapter 5](./05-security.md) for the full
SSH config.
**TODO** (Chapter 20): Move SSH off :22 to :2222 or similar, tighten to
the operator's current IP. Current state is acceptable given key-only +
fail2ban defaults.
### Kubernetes API (kubectl from operator)
| Port | Protocol | From | Purpose |
|---|---|---|---|
| 6443 | TCP | 47.185.183.191 (operator IP) | kubectl to kube-apiserver |
When the operator's public IP changes (moves, new ISP), this rule needs
updating on all 3 nodes. Ugly but necessary. A better long-term fix is
**Cloudflare Access** or **Tailscale** to avoid pinning operator IPs.
### Inter-node cluster traffic
These rules allow the three nodes to talk to each other for cluster state.
Each node has an allow rule for each of the **three node IPs** (including
its own — the "allow from self" rule exists so local flows are explicit).
| Port | Protocol | From | Purpose |
|---|---|---|---|
| 6443 | TCP | other nodes | kube-apiserver (other servers' talk to each other) |
| 2379 | TCP | other nodes | etcd client (Raft state reads) |
| 2380 | TCP | other nodes | etcd peer (Raft state writes between server nodes) |
| 10250 | TCP | other nodes | kubelet (metrics, exec, logs from API server) |
| 8472 | UDP | other nodes | Flannel VXLAN overlay |
### Application-specific (legacy, mostly superfluous on k3s)
These rules were added during the Swarm era and still exist on the nodes.
None of them hurt anything; most are unused on k3s.
| Port | Protocol | From | Purpose (original) | Status on k3s |
|---|---|---|---|---|
| 2377 | TCP | node IPs | Swarm cluster management | unused (Swarm gone) |
| 7946 | TCP + UDP | node IPs | Swarm gossip | unused |
| 4789 | UDP | node IPs | Swarm VXLAN | unused (k3s uses 8472) |
| (ESP, proto 50) | — | node IPs | IPSec encrypted overlay | unused |
| 500 | UDP | node IPs | IKE key exchange | unused |
| 3000 | TCP | node IPs | admin Next.js, when we tried node-IP hardcoding | unused |
These can be removed in a cleanup pass. They don't affect security because
no process listens on those ports anymore.
## Why each required rule exists
### Port 22 — SSH (public)
Obviously needed for operator access. Without it we'd have no way to
reach the nodes. Hetzner console's "rescue" mode is an emergency fallback.
### Port 80 — HTTP (public)
Cloudflare talks HTTP to origin on port 80 (SSL=Flexible mode). Without
this rule, Cloudflare gets connection-refused and returns 521 to users.
### Port 443 — HTTPS (public)
Currently unused in SSL=Flexible mode. Open to smooth the future
Full-strict migration. No process listens on 443 yet; the kernel would
reject connections. Rule is harmless.
### Port 6443 — kube-apiserver (operator + inter-node)
**From operator IP**: so `kubectl` works. Without this, `kubectl get pods`
times out.
**From other nodes**: server nodes check each other's apiservers for
Raft elections and cross-node controller operations. Without this,
nodes can still run pods but can't participate in cluster state changes.
### Ports 2379/2380 — embedded etcd (inter-node)
K3s runs etcd as an embedded library inside the server binary. The etcd
client port (2379) and peer port (2380) carry Raft protocol messages
between the three servers. **Without these rules, Raft cannot replicate
state and the cluster loses quorum.**
This bit us during the k3s install — initially the joins failed because
2379/2380 were blocked.
### Port 10250 — kubelet (inter-node)
The kubelet on each node exposes a read-only API for the kube-apiserver
to call — `kubectl logs`, `kubectl exec`, kubelet metrics scraping.
Without this rule, operator commands like `kubectl logs -n honeydue
deploy/api` fail with "Error from server: unable to upgrade connection".
### Port 8472 UDP — Flannel VXLAN (inter-node)
Pod-to-pod traffic between nodes flows through VXLAN tunnels on UDP 8472.
**Without this rule, cross-node pod communication silently fails** — which
looks like "admin can't reach api" or "worker can't reach Redis" depending
on where pods land.
This rule is load-bearing. It is the single most important inter-node
rule.
## Inbound packet's journey through UFW/iptables
When a packet arrives at hetzner1's network interface on port 80:
```mermaid
sequenceDiagram
participant NIC as hetzner1 NIC
participant PRE as iptables<br/>raw + mangle + nat PREROUTING
participant FIL as iptables filter INPUT<br/>(UFW lives here)
participant SOCK as Traefik pod socket<br/>(host network)
NIC->>PRE: Packet: SYN :80 from CF
PRE->>PRE: conntrack state: NEW
PRE->>FIL: handoff to INPUT chain
FIL->>FIL: UFW rules evaluated
Note over FIL: Rule: allow 80/tcp from anywhere<br/>→ ACCEPT
FIL->>SOCK: delivered to listening socket
SOCK->>SOCK: Traefik accepts connection
```
UFW is really a set of wrapper chains on top of iptables. `sudo iptables
-L INPUT -n --line-numbers` on any node shows the actual rules; UFW just
makes editing them easier.
## Rule syntax we used
UFW commands we ran during setup (for reference):
```bash
# Reset to default
sudo ufw --force reset
# Default deny incoming
sudo ufw default deny incoming
sudo ufw default allow outgoing
# SSH + web (public)
sudo ufw allow 22/tcp comment 'SSH'
sudo ufw allow 80/tcp comment 'HTTP'
sudo ufw allow 443/tcp comment 'HTTPS'
# Kubernetes inter-node (repeat for each peer IP)
for ip in 178.104.247.152 178.105.32.198 178.104.249.189; do
sudo ufw allow from "$ip" to any port 6443 proto tcp comment "k3s-api $ip"
sudo ufw allow from "$ip" to any port 2379 proto tcp comment "k3s-etcd-client $ip"
sudo ufw allow from "$ip" to any port 2380 proto tcp comment "k3s-etcd-peer $ip"
sudo ufw allow from "$ip" to any port 10250 proto tcp comment "k3s-kubelet $ip"
sudo ufw allow from "$ip" to any port 8472 proto udp comment "k3s-flannel-vxlan $ip"
done
# Kubectl from operator
sudo ufw allow from 47.185.183.191 to any port 6443 proto tcp comment 'kubectl from dev'
# Enable
sudo ufw --force enable
```
Rules persist across reboots via `/etc/ufw/user.rules`.
## What if we used Hetzner Cloud Firewall instead?
Hetzner Cloud has a provider-level firewall feature — rule-for-rule
equivalent but configured in the Hetzner console (or via API), not on the
nodes. Tradeoffs:
| | Hetzner Cloud Firewall | UFW (current) |
|---|---|---|
| Cost | Free | Free |
| Config location | Hetzner console / API | Per-node `/etc/ufw/` |
| Applies to | All traffic to NIC | All traffic to kernel |
| Failure mode | Provider-side issue = rules gone | Node-side issue = rules gone |
| Inter-node traffic | Same rules for all nodes | Same rules on each node |
| Visible to attacker | Yes (provider fingerprints) | Yes (iptables probe) |
| Rule ordering | UI-based | `iptables -L` |
Either works. A future improvement: move the stable rules to Hetzner
Cloud Firewall (one source of truth) and leave only the dynamic rules
(operator IP, ad-hoc debug) on the nodes.
## Why we don't use iptables directly
UFW is a frontend. `iptables` works, but the rules are harder to read and
edit. `sudo ufw allow from X to any port Y proto Z comment 'Z-rule'` is
clearer than writing the equivalent `-A INPUT ...` rule directly.
Also, UFW's `comment` field lets us explain each rule, which becomes
critical when the ruleset grows past ~10 rules.
## Testing the firewall
From the operator workstation (47.185.183.191):
```bash
# Should work (22/tcp open)
ssh deploy@hetzner1 exit
# Should work (80/tcp open)
curl -I -H "Host: api.myhoneydue.com" http://hetzner1/api/health/
# Should work (443/tcp open; TLS handshake will fail because nothing listens)
curl -kI https://178.104.247.152/
# Should work (6443 allowed from operator IP)
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl get nodes
# Should time out (default-deny from arbitrary ports)
curl http://178.104.247.152:3000/ # not open to operator
curl http://178.104.247.152:6379/ # Redis not exposed publicly
```
From another peer node (hetzner2 trying to reach hetzner1):
```bash
# Should work (k3s API allowed from peer node IPs)
curl -k https://178.104.247.152:6443/healthz
# Should work (etcd client from peer)
nc -zv 178.104.247.152 2379
```
## The hidden dependency: kubelet/containerd also need ports
Beyond the UFW rules, the kubelet also listens on:
- **10255/tcp** — kubelet read-only port (no auth, deprecated; disabled by default in k3s)
- **10256/tcp** — kube-proxy health
- **10257/tcp** — kube-controller-manager health
- **10259/tcp** — kube-scheduler health
These are bound to `localhost` only, so they don't need UFW rules. But
they're important to know about when debugging — if one of these health
endpoints isn't responding, the relevant component is broken.
## Legacy rules to clean up
The following rules are on the nodes from the Swarm era and can be
removed in a future cleanup pass:
```bash
# On each node, list Swarm-era rules
sudo ufw status numbered | grep -E "2377|7946|4789|500|3000|esp"
# Remove by number (highest-to-lowest to avoid renumbering)
# Example:
sudo ufw --force delete 15
sudo ufw --force delete 14
# ... etc.
```
We left them in because they don't affect security (no process listens on
those ports), and removing them requires careful testing that nothing in
k3s secretly relies on 4789/udp or similar.
## Operator cheat sheet
```bash
# Show the ruleset, with comments, numbered
sudo ufw status numbered verbose
# Add a new rule
sudo ufw allow from <ip> to any port <port> proto <tcp|udp> comment '<desc>'
# Remove a rule by number
sudo ufw status numbered
sudo ufw --force delete <N>
# Temporarily disable all rules (emergency)
sudo ufw disable
# Re-enable
sudo ufw enable
# Reload after editing /etc/ufw/ files directly
sudo ufw reload
```
## What to do if the firewall locks you out
Worst case: you apply a rule that blocks your own SSH, UFW enables it
immediately, and you can't log back in. Recovery:
1. Hetzner Cloud Console → Server → Rescue mode
2. Boot into rescue, mount the disk
3. Edit `/etc/ufw/user.rules` to remove the bad rule
4. Reboot back into normal mode
This has never happened to us but it's the escape hatch. The Console is
always a TLS login away.
## References
- [UFW man page][ufw-man]
- [K3s networking requirements][k3s-reqs]
- [Kubernetes ports and protocols][k8s-ports]
- [Cloudflare IP ranges][cf-ips]
[ufw-man]: https://manpages.ubuntu.com/manpages/noble/en/man8/ufw.8.html
[k3s-reqs]: https://docs.k3s.io/installation/requirements#networking
[k8s-ports]: https://kubernetes.io/docs/reference/networking/ports-and-protocols/
[cf-ips]: https://www.cloudflare.com/ips/
+526
View File
@@ -0,0 +1,526 @@
# 05 — Security
## Summary
Security on this deployment is layered: Cloudflare at the edge, UFW at
the node, k3s RBAC + Pod Security at the orchestrator, TLS between
long-haul components, and dedicated service accounts with dropped
capabilities inside containers. This chapter documents each layer, the
rationale, and what's currently missing (and why).
## Threat model
Who we're defending against, in rough order of likelihood:
1. **Opportunistic scanners** — bots scanning random IPv4 ranges for
known vulnerabilities. Mitigated by the firewall.
2. **Credential stuffing / brute-force** — especially against SSH and
admin login. Mitigated by key-only SSH, strong passwords, rate limits.
3. **Compromised external service** — if Neon, Backblaze, or Cloudflare
were breached, attacker would have access to whatever we store there.
Mitigated by scoped credentials, least-privilege API keys.
4. **Compromised container image** — if Gitea or our build pipeline
were compromised, malicious code could reach prod. Mitigated by
(a) Gitea is behind authentication, (b) image pull secrets scoped,
(c) containers run non-root with minimal capabilities.
5. **Insider threat** — not really a threat for a solo operator.
6. **State actor** — not in threat model. At our scale this is
effectively unaddressable without becoming a security company.
Explicitly **not** in threat model:
- DDoS at a scale that saturates Cloudflare. We pay $0 for CF; their
DDoS mitigation is included but not unlimited. If we got hit with a
large attack, we'd move to a paid plan.
- Physical access to Hetzner datacenters. That's their problem.
## Layer 1 — Cloudflare edge
Cloudflare sits in front of every public request.
### What Cloudflare does for us
| Protection | How it works |
|---|---|
| TLS termination | CF presents a cert for `*.myhoneydue.com`; clients encrypt to CF |
| DDoS mitigation | Automatic on all plans including Free |
| Bot filtering | "Under Attack" mode + bot score based blocking |
| IP concealment | Origin IPs not in DNS; attackers can't directly scan |
| WAF rules | CF Free includes managed ruleset for common exploits |
| Rate limiting | Free tier: 10k requests/10min; more on paid plans |
### What Cloudflare does **not** do
- **Authenticate users** — that's the app's job
- **Authorize requests** — that's the app's job
- **Protect origin if origin IP leaks** — once someone knows a node IP
they can bypass CF. Mitigation: keep origin firewall strict (Chapter 4).
- **Encrypt between CF and origin** — we're on SSL=Flexible, so CF↔origin
is HTTP. This is in our TODO (Chapter 20, upgrade to Full-strict).
### The proxy-IP problem
Cloudflare publishes its IP ranges
([cloudflare.com/ips](https://www.cloudflare.com/ips/)). Any client can
verify a request came from a CF IP by checking the remote address. Our
Traefik is configured to trust `X-Forwarded-Proto` (so the Go API sees
`https` even though origin received HTTP) only from CF IP ranges:
```yaml
# deploy-k3s/manifests/traefik-helmchartconfig.yaml
additionalArguments:
- "--entrypoints.web.forwardedHeaders.trustedIPs=173.245.48.0/20,..."
```
This means a malicious request that bypasses CF (by hitting the node IP
directly) can't spoof headers — Traefik ignores `X-Forwarded-*` unless
the source IP is in CF's ranges.
**TODO** (Chapter 20): Enforce at UFW level — allow 80/tcp only from
CF IP ranges. Today any IP can reach the origin on port 80.
## Layer 2 — Node (OS, SSH, firewall)
Each node runs Ubuntu 24.04.3 LTS with:
### SSH hardening
`/etc/ssh/sshd_config` on each node:
```
Port 22
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy
```
Result:
- Only the `deploy` user can log in
- Only with a public key (no password)
- Root cannot log in remotely
The public key authorized for `deploy`:
```
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBU9xTTBD78tYUqHijgyU9PDqtmS4NuM/6uy8XgDzva+ hetzner2@myhoneydue.com
```
(Note: the comment field says "hetzner2" but it's the key for all three
nodes — the comment is the key's identifier, not a restriction.)
Private key is at `~/.ssh/hetzner` on the operator workstation.
### Sudo
The `deploy` user has unrestricted sudo with no password
(`/etc/sudoers.d/deploy`):
```
deploy ALL=(ALL) NOPASSWD: ALL
```
This is convenient but broad. A compromise of the `deploy` SSH key =
root on the node. Mitigations:
- Key is stored only on the operator workstation, not checked into git
- Operator workstation has disk encryption (macOS FileVault)
- Operator workstation has a passphrase for the key (ssh-agent cache)
Future hardening: scope sudo to specific commands that deploy workflows
need (e.g., `/usr/sbin/ufw`, `/usr/bin/systemctl`), but this requires
enumerating every command we might run, which breaks ad-hoc debugging.
### fail2ban
**Not installed.** fail2ban would ban IPs that fail SSH auth repeatedly.
Because we disable password auth entirely, the attack surface is tiny (an
attacker with the private key wins; failed-public-key attempts are
functionally DDoS, not credential-stuffing). Installing fail2ban is on
the TODO list anyway because it buys us rate-limiting on SSH bot noise.
### unattended-upgrades
**Not installed.** Security patches require manual `apt upgrade`. This is
a gap. Install and configure for security-only updates as soon as time
permits.
### UFW firewall
See [Chapter 4](./04-firewall.md) for the complete ruleset. Summary:
default-deny incoming, specific allows for SSH (22), HTTP (80), HTTPS
(443), k3s API from operator IP (6443), and inter-node cluster ports.
## Layer 3 — Kubernetes RBAC
K3s inherits full Kubernetes RBAC. Every component that talks to the API
server has a ServiceAccount with only the permissions it needs.
### System accounts
K3s creates these by default:
- `kube-system:admin` — cluster admin, used by `kubectl`
- `kube-system:coredns` — for CoreDNS
- `kube-system:traefik` — for Traefik ingress controller
- `kube-system:helm-install-traefik` — for the Helm chart installer
We don't touch these.
### Application service accounts
Our `rbac.yaml` creates four ServiceAccounts in the `honeydue` namespace:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: api
namespace: honeydue
automountServiceAccountToken: false # ← important
```
Same for `admin`, `worker`, `redis`.
**`automountServiceAccountToken: false`** means pods don't get a k8s
API token mounted in `/var/run/secrets/kubernetes.io/serviceaccount/`.
Without it, a compromised pod cannot query the Kubernetes API even if
the default service account has broad permissions.
### What the app pods CAN'T do
Our app service accounts have **no RoleBindings or ClusterRoleBindings**.
They cannot:
- List, get, create, update, delete any Kubernetes resource
- Read other namespaces' secrets
- Schedule workloads
- View cluster state
If the api container were fully compromised (RCE), the attacker would
have:
- Network access to other pods in the `honeydue` namespace (Chapter 16)
- Read access to our ConfigMap + Secrets (mounted into the container)
- No ability to pivot to other parts of the cluster via the k8s API
## Layer 4 — Pod Security
Every pod runs with restrictive security context:
```yaml
securityContext:
runAsNonRoot: true
runAsUser: 1000 # api; different per service
runAsGroup: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
```
### What each setting does
| Setting | Effect |
|---|---|
| `runAsNonRoot: true` | Pod refuses to start if the image's default user is root |
| `runAsUser: 1000` | Override to UID 1000 (app user) |
| `allowPrivilegeEscalation: false` | Process cannot become root via setuid, ptrace, etc. |
| `readOnlyRootFilesystem: true` | `/` is read-only; writes require explicit volumes |
| `capabilities: drop: [ALL]` | No Linux capabilities (NET_ADMIN, SYS_TIME, etc.) |
| `seccompProfile: RuntimeDefault` | Restrict syscalls to containerd's default seccomp allowlist |
Read-only root means our app images must declare writable volumes for
anything mutable:
```yaml
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir:
sizeLimit: 64Mi
```
If the app needs to write somewhere else (e.g., Next.js cache), we mount
an emptyDir there explicitly.
### Traefik exception
Traefik needs `CAP_NET_BIND_SERVICE` to bind ports 80/443 on the host
network. Its security context adds just that one capability back:
```yaml
securityContext:
capabilities:
drop: [ALL]
add: [NET_BIND_SERVICE]
readOnlyRootFilesystem: true
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
```
The `net.ipv4.ip_unprivileged_port_start=0` sysctl on the nodes
complements this — on older kernels NET_BIND_SERVICE alone isn't enough
in the host netns.
### Pod Security Admission (PSA)
Kubernetes has a built-in admission controller for enforcing Pod Security
Standards at the namespace level:
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: honeydue
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/enforce-version: latest
```
We **don't currently set this**. We get the equivalent effect from
the explicit securityContext on each pod, but namespace-level enforcement
would catch new workloads that forget to set it. **TODO** (Chapter 20).
## Layer 5 — Network Policies
The `deploy-k3s/manifests/network-policies.yaml` scaffold defines:
- **default-deny-all** — deny all ingress and egress by default in the
`honeydue` namespace
- **allow-dns** — allow egress UDP/TCP 53 to CoreDNS
- **allow-ingress-to-api** — allow Traefik (`kube-system` namespace) to
reach api pods on port 8000
- **allow-ingress-to-admin** — same, for admin:3000
**These are not currently applied.** Without them, our pods can freely
talk to anything — including, theoretically, malicious destinations if
an attacker gets RCE inside a pod.
**TODO** (Chapter 20): Apply network policies. The scaffold is there; we
just need to `kubectl apply -f deploy-k3s/manifests/network-policies.yaml`
and test that nothing breaks.
### What network policies would prevent
| Attack scenario | NetworkPolicy blocks |
|---|---|
| Pod A compromised, attacker SSHs sideways to pod B | Yes (explicit allow needed) |
| Pod RCE → scan internal networks | Yes (default deny egress) |
| Pod RCE → exfil to attacker's C2 | Yes (outbound to internet needs egress rule) |
Without policies, all of these work.
## TLS and encryption
### CF ↔ user
Always TLS 1.2+ (CF doesn't support older). CF presents an automatically-
renewed Let's Encrypt or CF-managed cert for `*.myhoneydue.com`.
### CF ↔ origin
**Plaintext HTTP** (SSL = Flexible). An attacker with access to the
Cloudflare-to-Hetzner path could read traffic. In practice nobody who
isn't Cloudflare or Hetzner sits on that path.
**TODO** (Chapter 20): Upgrade to SSL = Full (strict) with a Cloudflare
Origin CA certificate. This encrypts CF ↔ origin and verifies that
origin's cert is the CF-issued one (prevents MitM if DNS is compromised).
### API ↔ Neon Postgres
**TLS 1.3** via `DB_SSLMODE=require`. The Go app's postgres driver (pgx)
negotiates TLS and verifies Neon's cert against the system CA bundle.
Connection fails if TLS can't be established.
### API ↔ Backblaze B2
**HTTPS** (B2 doesn't support HTTP). `B2_USE_SSL=true` in our ConfigMap
(though actually the app reads `STORAGE_USE_SSL` — see Chapter 9 for this
vestigial variable's story).
### Worker ↔ Fastmail SMTP
**STARTTLS** on port 587. The Go `wneessen/go-mail` library uses
`TLSOpportunistic` mode — which means it connects plain then upgrades via
STARTTLS. Fastmail always supports STARTTLS, so in practice every
connection is encrypted.
### API/worker ↔ Redis
**Plaintext** inside the cluster. Redis 7 supports TLS (redis-tls.conf,
`redis-server --tls-port`), but we haven't enabled it because Redis is
on the overlay network, not exposed externally, and only holds cache +
queue state.
### Pod-to-pod (Flannel overlay)
**Plaintext VXLAN** over Hetzner's public network. See
[Chapter 3 §Layer 3](./03-networking.md#layer-3--pod-overlay-flannel-vxlan).
TODO to switch to WireGuard backend.
## Secrets management
### Kubernetes Secrets
Our k8s Secrets are stored in etcd. etcd-at-rest encryption is **not
currently enabled** — a compromise of the etcd data directory would
expose Secret values. Given:
- Nodes have disk encryption at the Hetzner hypervisor layer
- Attacker needs root on the node to read etcd
- Our operator access is already root-via-sudo
This is an accepted risk. **TODO** (Chapter 20): enable encryption at rest
for etcd. K3s supports it via `--secrets-encryption` flag on the server.
### What Secrets we have
```
$ kubectl get secrets -n honeydue
NAME TYPE DATA AGE
gitea-credentials kubernetes.io/dockerconfigjson 1 ...
honeydue-apns-key Opaque 1 ...
honeydue-secrets Opaque 9 ...
```
Contents:
| Secret | Key | Source |
|---|---|---|
| `gitea-credentials` | `.dockerconfigjson` | PAT for Gitea registry (image pulls) |
| `honeydue-apns-key` | `apns_auth_key.p8` | Placeholder p8 file (push off) |
| `honeydue-secrets` | `POSTGRES_PASSWORD` | Neon DB password |
| `honeydue-secrets` | `SECRET_KEY` | 64-char random, app signing key |
| `honeydue-secrets` | `EMAIL_HOST_PASSWORD` | Fastmail app password |
| `honeydue-secrets` | `FCM_SERVER_KEY` | "disabled-no-push-accounts-yet" placeholder |
| `honeydue-secrets` | `REDIS_PASSWORD` | Empty (no auth on internal Redis) |
| `honeydue-secrets` | `B2_KEY_ID` | B2 app key ID |
| `honeydue-secrets` | `B2_APP_KEY` | B2 app key secret |
| `honeydue-secrets` | `ADMIN_EMAIL` | `admin@myhoneydue.com` |
| `honeydue-secrets` | `ADMIN_PASSWORD` | Generated 24-char initial admin password |
### Source of truth
The Secret values came from:
- `deploy/secrets/*.txt` files on the operator workstation (gitignored)
- `deploy/prod.env` (gitignored)
- `deploy/registry.env` (gitignored)
These Swarm-era files are still the canonical source. If you need to
recreate Secrets in a new cluster:
```bash
cd honeyDueAPI-go
kubectl create secret generic honeydue-secrets -n honeydue \
--from-literal=POSTGRES_PASSWORD="$(cat deploy/secrets/postgres_password.txt)" \
--from-literal=SECRET_KEY="$(cat deploy/secrets/secret_key.txt)" \
--from-literal=EMAIL_HOST_PASSWORD="$(cat deploy/secrets/email_host_password.txt)" \
...
```
The full recreation script is in Chapter 17 (Runbook).
### Secret rotation
Not automated. To rotate (e.g., after a compromise):
1. Generate new value: `openssl rand -base64 32`
2. Update the secret:
```bash
kubectl create secret generic honeydue-secrets -n honeydue \
--from-literal=SECRET_KEY='new-value' \
--dry-run=client -o yaml | kubectl apply -f -
```
3. Restart dependent pods:
```bash
kubectl rollout restart -n honeydue deploy/api deploy/worker
```
4. Update `deploy/secrets/secret_key.txt` to match
5. Revoke the old credential at the source (Neon, Fastmail, etc.)
## Container image provenance
Images come from `gitea.treytartt.com/admin/*`. We have **no image
signing or verification** (cosign/sigstore) in place. A compromise of
the Gitea registry = the ability to push malicious images that would be
pulled into prod on the next rollout.
Mitigations:
- Gitea itself is behind login; PAT is scoped to read:packages +
write:packages only
- Gitea runs on the operator's infrastructure (same operator account)
- Image tags are SHA-pinned (`:237c6b8`) not `:latest` → attacker can't
replace an existing tag's image without us noticing the digest change
**TODO** (Chapter 20): Add cosign signing at build time, verify at pull
time.
## Operator workstation security
The operator workstation has:
- macOS with FileVault (full disk encryption)
- Login password required
- Private keys in `~/.ssh/` (mode 0600)
- Kubeconfig at `~/.kube/honeydue-k3s.yaml` (mode 0600) — contains a bearer
token to the cluster
**Losing the laptop would require immediate credential rotation:**
- New SSH key, redeploy public part on all 3 nodes
- New kubeconfig: run `sudo cat /etc/rancher/k3s/k3s.yaml` on hetzner1,
copy to workstation, update `KUBECONFIG` env
- Rotate operator-access PATs on Gitea, Neon, Cloudflare, Backblaze
## Compliance notes
This stack is **not currently certified** for:
- HIPAA — we transit and store health-related data but haven't contractually
bound any BAA
- SOC 2 — no auditing, no documented controls beyond this document
- PCI-DSS — we don't handle card data; Apple/Google IAP handles payments
- GDPR — we follow GDPR best practices (data minimization, user deletion)
but haven't had a formal assessment
If honeyDue ever needs any of these, the infrastructure is compatible
but the operational processes around it would need formal work.
## Operator cheat sheet
```bash
# See all RBAC-related resources in a namespace
kubectl get sa,role,rolebinding -n honeydue
# Check what a ServiceAccount can do
kubectl auth can-i --list --as=system:serviceaccount:honeydue:api -n honeydue
# Verify pod is running with expected security context
kubectl get pod <pod> -n honeydue -o jsonpath='{.spec.securityContext}'
kubectl get pod <pod> -n honeydue -o jsonpath='{.spec.containers[0].securityContext}'
# List all Secrets (without revealing content)
kubectl get secret -n honeydue
kubectl describe secret honeydue-secrets -n honeydue # shows keys, not values
# Decode a secret (CAREFUL: prints plaintext)
kubectl get secret honeydue-secrets -n honeydue -o jsonpath='{.data.SECRET_KEY}' | base64 -d
```
## References
- [Kubernetes Pod Security Standards][psa]
- [Kubernetes RBAC][rbac]
- [Kubernetes NetworkPolicy][netpol]
- [Cloudflare IP ranges][cf-ips]
- [K3s secrets encryption][k3s-secrets]
- [SSH hardening guide][ssh-guide]
[psa]: https://kubernetes.io/docs/concepts/security/pod-security-standards/
[rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
[netpol]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
[cf-ips]: https://www.cloudflare.com/ips/
[k3s-secrets]: https://docs.k3s.io/security/secrets-encryption
[ssh-guide]: https://linux-audit.com/audit-and-harden-your-ssh-configuration/
+419
View File
@@ -0,0 +1,419 @@
# 06 — Traefik Ingress
## Summary
Traefik is the reverse proxy that routes external HTTP requests to the
right application pod based on the `Host:` header. We run Traefik v3 as a
Kubernetes DaemonSet with `hostNetwork: true` — each of the three nodes
has its own Traefik pod listening directly on the node's `:80`/`:443`.
Cloudflare round-robins DNS across the three node IPs, so any node can
serve any request. No external load balancer.
## Why Traefik
K3s bundles Traefik by default. The alternatives:
| Option | Pros | Cons |
|---|---|---|
| **Traefik v3 (bundled)** | Zero install, excellent k8s integration, middleware system, active development | Helm-driven config is indirect |
| NGINX Ingress | Most popular, battle-tested | Another thing to install, more config surface |
| HAProxy Ingress | Extremely performant | More hands-on, older docs |
| Caddy | Simple config, auto-HTTPS | `caddy-docker-proxy` / Ingress integration is less mature |
| Envoy / Istio | Most featureful | Massive overkill at our scale |
Traefik came "free" with K3s, does the job, and its
[Swarm provider][traefik-swarm] is what we would have used if we'd
fixed our Swarm architecture. Using it on k3s keeps the mental model
consistent.
## Deployment model
```mermaid
flowchart TB
subgraph CF[Cloudflare edge]
DNS[DNS A records:<br/>api.myhoneydue.com → 3 node IPs<br/>admin.myhoneydue.com → 3 node IPs]
end
subgraph N1[hetzner1]
T1[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
kernel1[Linux kernel<br/>net.ipv4.ip_unprivileged_port_start=0]
end
subgraph N2[hetzner2]
T2[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
kernel2[Linux kernel]
end
subgraph N3[hetzner3]
T3[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
kernel3[Linux kernel]
end
subgraph Cluster[k3s cluster services]
APISvc[api Service :8000]
AdminSvc[admin Service :3000]
end
DNS -. HTTP :80 .-> T1 & T2 & T3
T1 & T2 & T3 -- reverse_proxy --> APISvc & AdminSvc
```
### ASCII fallback
```
Cloudflare DNS
┌───────────────────┐
│ api → 3 IPs │
│ admin→ 3 IPs │
└─────────┬─────────┘
│ HTTP :80
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ hetzner1 │ │ hetzner2 │ │ hetzner3 │
│ Traefik │ │ Traefik │ │ Traefik │
│ :80/443 │ │ :80/443 │ │ :80/443 │
│(hostNet) │ │(hostNet) │ │(hostNet) │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└── ClusterIP ──────┼── ClusterIP ──────┘
┌────────────────────────┐
│ api Service :8000 │
│ admin Service :3000 │
└────────────────────────┘
```
## Why DaemonSet + hostNetwork
**What we're trying to achieve**: Any public-facing node should answer
:80/:443. Cloudflare round-robins DNS; whichever node it picks, that
node must serve.
**The default k3s Traefik deployment** is a single-replica Deployment
exposed via a LoadBalancer Service. That requires either:
- Hetzner Load Balancer (+ $8.49/mo, another thing to manage), **or**
- K3s' built-in `servicelb` (klipper-lb) which binds node ports
dynamically to proxy to the Service
Neither was quite what we wanted. With three replicas of the stock Traefik
behind klipper-lb, each Traefik pod is reachable but there's an extra hop
through klipper's proxy daemon.
**DaemonSet + hostNetwork** is cleaner: each Traefik pod *is* the host's
:80/:443. No proxy daemon, no LB Service, no VIP. Cloudflare DNS →
node IP → kernel → Traefik, one hop.
### Trade-offs of hostNetwork
**Pro:**
- One fewer layer of indirection; lower latency
- No Service needed; no kube-proxy in the ingress path
- Standard Cloudflare round-robin DNS is the failover mechanism
**Con:**
- Traefik is in the host netns; it sees the node's interfaces, not
the cluster overlay
- Traefik still joins the cluster-DNS resolution (via `hostNetwork`'s
default DNS policy) so it can resolve Service names like `api`
- Port conflicts possible if anything else wants :80/:443 on the node
(nothing else does in our setup)
### Trade-offs of DaemonSet
**Pro:**
- One Traefik per node; matches our Cloudflare 3-IP round-robin
exactly
- Any node down = Cloudflare's origin health checks route around it
**Con:**
- Updates require `maxUnavailable > 0` (host ports conflict during
surge) → brief moment where one node is down during rollout
- 3× the memory usage vs. 1-replica Deployment (but Traefik is tiny
— ~128 MB total across all three)
## Our Traefik configuration
We reconfigure the bundled K3s Traefik via a `HelmChartConfig`. K3s
uses the `helm-controller` to manage bundled addons; `HelmChartConfig`
lets us override values without disabling-and-replacing the chart.
Full config at
`deploy-k3s/manifests/traefik-helmchartconfig.yaml`. Key settings:
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: traefik
namespace: kube-system
spec:
valuesContent: |-
deployment:
kind: DaemonSet # was Deployment
hostNetwork: true
service:
enabled: false # no LoadBalancer Service
ports:
web:
port: 80
hostPort: 80
websecure:
port: 443
hostPort: 443
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 0
securityContext:
capabilities:
drop: [ALL]
add: [NET_BIND_SERVICE]
readOnlyRootFilesystem: true
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
additionalArguments:
- "--entrypoints.web.forwardedHeaders.trustedIPs=<CF ranges>"
```
### Why each setting
- **`kind: DaemonSet`** — one Traefik per node. Default is a Deployment
with 1 replica.
- **`hostNetwork: true`** — Traefik runs in the host's network namespace
so it can bind real :80/:443 on the node.
- **`service.enabled: false`** — no LoadBalancer Service is created.
With `hostNetwork`, we don't need one.
- **`ports.*.hostPort`** — explicit host port binding. Matches the
container port (DaemonSet semantics with `hostPort: 80` ensure the
kubelet schedules at most one Traefik per node).
- **`updateStrategy.maxUnavailable: 1, maxSurge: 0`** — we accept one
node being down during a Traefik update (host port can't be shared).
The Traefik Helm chart rejects this config combination with
`maxSurge > 0` — this was the second config iteration.
- **Security context** — non-root (UID 65532), read-only root filesystem,
only `NET_BIND_SERVICE` capability. See Chapter 5.
- **`forwardedHeaders.trustedIPs`** — Cloudflare's IP ranges. Traefik
trusts `X-Forwarded-Proto` et al. only from these ranges, so a
bypassing client can't spoof the proto header.
### Forwarded-headers trustedIPs
The full list of trusted CF ranges is in our `additionalArguments`. It's
the union of CF's published IPv4 and IPv6 ranges. When Cloudflare passes
a request to origin, it adds `X-Forwarded-For` and `X-Forwarded-Proto`
headers; Traefik only honors these if the request came from one of these
IPs. Every other client's headers are ignored.
If CF publishes new IP ranges (rare but possible), the
`trustedIPs` list needs updating. It's a raw string in our
HelmChartConfig — we'd need to edit, apply, and bump the helm job.
## Traefik v3 vs v2
K3s ships Traefik v3 (currently `3.6.10`). The v2 → v3 migration
changed a few things:
- `swarmMode` removed (replaced by a `swarm` provider, but we don't
use Swarm anyway)
- Encoded-character handling changed (v3 warns about RFC 3986 handling;
we ignore the warning)
- Middleware CRD group is `traefik.io/v1alpha1` (was `containo.us`)
Our deployment handles all of this automatically via the bundled
chart.
## Ingress resources
We define two standard k8s `Ingress` resources in
`deploy-k3s/manifests/ingress/ingress-simple.yaml`:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: honeydue-api
namespace: honeydue
spec:
ingressClassName: traefik
rules:
- host: api.myhoneydue.com
http:
paths:
- path: /
pathType: Prefix
backend:
service: {name: api, port: {number: 8000}}
- host: myhoneydue.com
http:
paths:
- path: /
pathType: Prefix
backend:
service: {name: api, port: {number: 8000}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: honeydue-admin
namespace: honeydue
spec:
ingressClassName: traefik
rules:
- host: admin.myhoneydue.com
http:
paths:
- path: /
pathType: Prefix
backend:
service: {name: admin, port: {number: 3000}}
```
Traefik watches for Ingress resources with `ingressClassName: traefik`
and programs its router table accordingly. Changes are applied within
seconds — no restart needed.
### What pathType: Prefix means
Every request starting with `/` matches (which is everything). Alternative
is `Exact` (matches only the literal path). `Prefix` is the default for
most Ingress controllers and matches how users think about URL routing.
## How requests flow
1. **Cloudflare DNS** resolves `api.myhoneydue.com` to one of three IPs
(round-robin). Say it picks `178.105.32.198` (hetzner2).
2. **Cloudflare edge** establishes TCP to `178.105.32.198:80` (plain HTTP,
SSL=Flexible). Original HTTPS terminated at CF.
3. **UFW on hetzner2** accepts the SYN (80/tcp open from anywhere).
4. **Linux kernel** sees a listener on 0.0.0.0:80 (the Traefik pod).
Hands off the SYN.
5. **Traefik accepts** the connection. Reads the HTTP request.
6. **Traefik matches** the `Host:` header against its router table.
`Host: api.myhoneydue.com``honeydue-api` Ingress → `api` Service.
7. **Traefik dials** `10.43.167.83:8000` (api Service ClusterIP). This
goes through the cluster DNS (CoreDNS) and kube-proxy (IPVS).
8. **kube-proxy IPVS** rewrites the destination to a live api pod endpoint
— say `10.42.2.6:8000` (api pod on hetzner3).
9. **Flannel VXLAN** encapsulates the packet and sends to hetzner3
(UDP :8472 between node IPs).
10. **hetzner3's kernel** decapsulates, delivers to the api pod.
11. **api pod** processes, returns response.
12. **Response flows back** the reverse path.
Cloudflare caches 200 responses at the edge (default TTL varies; for
HTML/JSON usually 0 unless we set `Cache-Control` headers). So the
second request for the same URL might not reach the origin at all.
## Middleware (mostly unused)
Traefik supports middleware — small functions run before/after the proxy.
The `deploy-k3s/manifests/ingress/middleware.yaml` scaffold defines:
- **`rate-limit`** — 100 req/min average, 200 burst
- **`security-headers`** — HSTS, X-Frame-Options, CSP, etc.
- **`cloudflare-only`** — IP allowlist restricting origin to CF ranges
- **`admin-auth`** — HTTP basic auth for admin panel
**None of these are currently attached to our Ingresses.** To enable,
add the `traefik.ingress.kubernetes.io/router.middlewares` annotation to
the Ingress:
```yaml
metadata:
annotations:
traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
```
We left them off to minimize surface area for the first week of the new
cluster. Enabling is TODO in Chapter 20.
## Traefik dashboard
Disabled. The Traefik dashboard (`/dashboard/` and `/api/`) exposes
runtime state and is potentially information leaky. The bundled k3s
Traefik disables it by default, and we haven't re-enabled it.
If needed for debugging:
```bash
# Port-forward to a Traefik pod
kubectl port-forward -n kube-system daemonset/traefik 9000:9000
# (the chart exposes the dashboard on :9000 when enabled)
# Then visit http://localhost:9000/dashboard/
```
This requires kubectl access and isn't exposed publicly.
## Version pinning
We take whatever Traefik version is bundled with K3s (currently 3.6.10).
The bundled chart is pinned to a specific version in K3s' release notes;
when we upgrade K3s the Traefik version can change. If that ever breaks
something, we can pin a specific version via the HelmChartConfig's
`version` field:
```yaml
spec:
version: 39.0.501+up39.0.5 # specific chart version
```
## Limitations we accept
- **No sticky sessions.** Every request to `api.myhoneydue.com` can go
to a different pod. Our Go API is stateless — this is fine.
- **No canary deployments** (yet). Traefik supports weighted routing
via its CRDs (`TraefikService`) but we don't use them. TODO if/when
we do gradual rollouts.
- **No mTLS.** Traefik supports mutual TLS client auth for sensitive
endpoints. We don't use it.
- **Single ingress class.** Everything goes through the same Traefik.
For multi-tenant setups we'd want separate ingress classes with
separate policies.
## Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| 404 from Traefik | Ingress doesn't match `Host:` | Check Ingress host field, DNS |
| 502 from Traefik | Backend Service has no endpoints | `kubectl get endpoints -n honeydue` |
| 503 from Traefik | Circuit breaker / backend unhealthy | Check pod logs, readiness probe |
| 504 from Traefik | Backend slow | Check pod CPU/memory, DB connections |
| Connection refused at 80 | Traefik pod not running or kernel not listening | `kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik`; `ssh deploy@node 'ss -lntp | grep :80'` |
| Mixed content error in browser | `X-Forwarded-Proto` not honored by app | Check `trustedIPs` includes CF; check app reads the header |
## Operator cheat sheet
```bash
# Traefik pods per node
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide
# Traefik logs (all pods)
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=50 --prefix
# Ingress status
kubectl get ingress -n honeydue
# List all routers Traefik sees (requires dashboard or API)
kubectl exec -n kube-system daemonset/traefik -- traefik healthcheck
# Re-apply config
kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
kubectl delete job -n kube-system helm-install-traefik # triggers reinstall
# Restart all Traefik pods
kubectl rollout restart daemonset/traefik -n kube-system
```
## References
- [Traefik v3 docs][traefik]
- [Traefik Swarm provider][traefik-swarm]
- [K3s Traefik customization][k3s-traefik]
- [HelmChartConfig docs][k3s-helm]
- [Cloudflare IP ranges][cf-ips]
[traefik]: https://doc.traefik.io/traefik/v3.6/
[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
[cf-ips]: https://www.cloudflare.com/ips/
+575
View File
@@ -0,0 +1,575 @@
# 07 — Services
## Summary
Four workloads run in the `honeydue` namespace: **api** (Go REST API, 3
replicas), **admin** (Next.js panel, 1 replica), **worker** (Go background
jobs, 1 replica), and **redis** (cache + job queue, 1 replica, PVC-backed).
This chapter deep-dives each: container image, resource limits, probes,
volumes, and why each knob is set the way it is.
## Overview
| Service | Image | Replicas | Ports | Role |
|---|---|---|---|---|
| `api` | `gitea.treytartt.com/admin/honeydue-api:<sha>` | 3 | 8000 | HTTP REST API |
| `admin` | `gitea.treytartt.com/admin/honeydue-admin:<sha>` | 1 | 3000 | Next.js admin panel |
| `worker` | `gitea.treytartt.com/admin/honeydue-worker:<sha>` | 1 | — | Background job processor |
| `redis` | `redis:7-alpine` | 1 | 6379 | Cache + Asynq queue |
All four are Kubernetes `Deployment` workloads (not StatefulSets, not
DaemonSets). They share:
- ServiceAccount with `automountServiceAccountToken: false` (Chapter 5)
- `imagePullSecrets: [gitea-credentials]` (Chapter 11)
- `envFrom: configMapRef: honeydue-config` (Chapter 10)
- Individual env vars wired to `honeydue-secrets` keys
- Read-only root filesystem with `tmp` emptyDir mounted at `/tmp`
## Service 1 — api (Go REST API)
### What it does
The Go HTTP API — the heart of the app. Handlers for user auth,
residences, tasks, contractors, documents, subscriptions, notifications,
etc. Reads/writes to Neon Postgres, reads/writes to Redis cache, reads
from Backblaze B2.
Also serves a marketing landing page at `/` (static HTML + CSS from
`/app/static/`). This is why the `myhoneydue.com` apex domain routes to
the api service (Chapter 6).
### Deployment spec highlights
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
template:
spec:
serviceAccountName: api
imagePullSecrets: [name: gitea-credentials]
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
seccompProfile: { type: RuntimeDefault }
containers:
- name: api
image: gitea.treytartt.com/admin/honeydue-api:237c6b8
ports: [containerPort: 8000]
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities: { drop: [ALL] }
envFrom: [configMapRef: {name: honeydue-config}]
env:
- name: POSTGRES_PASSWORD
valueFrom: { secretKeyRef: {name: honeydue-secrets, key: POSTGRES_PASSWORD} }
- name: SECRET_KEY
valueFrom: { secretKeyRef: {name: honeydue-secrets, key: SECRET_KEY} }
# ... all other secrets
volumeMounts:
- { name: apns-key, mountPath: /secrets/apns, readOnly: true }
- { name: tmp, mountPath: /tmp }
resources:
requests: { cpu: 100m, memory: 128Mi }
limits: { cpu: 1000m, memory: 512Mi }
startupProbe: { httpGet: {path: /api/health/, port: 8000}, failureThreshold: 48, periodSeconds: 5 }
readinessProbe: { httpGet: {path: /api/health/, port: 8000}, initialDelaySeconds: 5, periodSeconds: 10, timeoutSeconds: 5 }
livenessProbe: { httpGet: {path: /api/health/, port: 8000}, initialDelaySeconds: 30, periodSeconds: 30, timeoutSeconds: 10 }
volumes:
- name: apns-key
secret:
secretName: honeydue-apns-key
items: [key: apns_auth_key.p8, path: apns_auth_key.p8]
- name: tmp
emptyDir: {sizeLimit: 64Mi}
```
### Why each setting
**`replicas: 3`** — one per node via anti-affinity rules (not strictly
required but helpful). Three gives us HA (one pod down = two still
serve traffic) and headroom for rolling updates.
**`maxUnavailable: 0, maxSurge: 1`** — during a rollout, start a 4th
pod before killing any old one. Ensures the service stays at 3 live
pods throughout. `maxUnavailable: 0` means zero downtime updates — but
depends on readinessProbe being accurate.
**`runAsUser: 1000`** — the `app` user created in the Dockerfile. Image
doesn't run as root.
**`readOnlyRootFilesystem: true`** — prevents any attacker-introduced
file writes to the image layer. Go binary doesn't need to write to `/`;
only `/tmp` is mutable.
**`startupProbe.failureThreshold: 48`** (= 48 × 5s = 240s grace) — this
was bumped up from the scaffold default of 12. Reason: on first boot,
the Go app runs `MigrateWithLock()` which acquires a Postgres advisory
lock and runs AutoMigrate. First replica takes ~90s; subsequent
replicas wait on the lock. With 3 replicas all starting simultaneously
and the lock serializing them, 240s is the right grace. See
[Chapter 19](./19-postmortem-swarm.md) for the detailed story.
**`readinessProbe.initialDelaySeconds: 5`** — after the startupProbe
passes, wait 5s before starting readiness checks. Prevents a racy
initial failure.
**`livenessProbe.initialDelaySeconds: 30`** — don't start restarting on
liveness failures for 30s after readiness passes. Avoids cascading
failures from false-negative liveness checks.
**`resources.requests/limits`** — Kubernetes uses `requests` for
scheduling (how much a pod "reserves") and `limits` for enforcement
(max it can use before throttling/OOM). Our api is CPU-bursty for
complex query handling, so we give it 100m baseline with a 1000m ceiling.
512Mi memory ceiling is comfortable — in practice api uses ~100-200Mi.
**`volumes.apns-key`** — mounts the `honeydue-apns-key` Secret as a file
at `/secrets/apns/apns_auth_key.p8`. The `APNS_AUTH_KEY_PATH` env var
points to this path. Even though push is currently disabled, the file
must exist because the Go app may try to stat it on startup.
**`volumes.tmp`** — `emptyDir` with `sizeLimit: 64Mi`. Bounded so a
runaway process can't fill the node's disk.
### The Service
```yaml
apiVersion: v1
kind: Service
metadata:
name: api
namespace: honeydue
spec:
type: ClusterIP
selector: {app.kubernetes.io/name: api}
ports:
- port: 8000
targetPort: 8000
protocol: TCP
```
ClusterIP `10.43.167.83`. Reachable as `api.honeydue.svc.cluster.local` or
just `api` from inside the namespace.
### HorizontalPodAutoscaler (not yet enabled)
`deploy-k3s/manifests/api/hpa.yaml` defines an HPA that would scale api
between 3 and 6 replicas based on CPU (70% util) and memory (80% util).
**Not currently applied.** `metrics-server` runs but we haven't run
`kubectl apply -f api/hpa.yaml`. TODO in Chapter 20.
## Service 2 — admin (Next.js panel)
### What it does
Server-rendered admin UI. Authenticates admin users against a
separate `admin_users` table in Postgres (seeded with `ADMIN_EMAIL` +
`ADMIN_PASSWORD` on first migration). Lets operators view/manage
users, residences, tasks, subscriptions, etc.
Built as a Next.js 16 standalone server.
### Why 1 replica
Low traffic. It's an internal tool. One pod suffices. If it crashes,
Kubernetes restarts it in ~10s. If the hosting node dies, Kubernetes
reschedules to another node.
The cost of running 3 replicas is tiny (Next.js is ~128MB per pod) but
has no operational benefit. When the admin panel becomes user-facing,
revisit.
### Deployment highlights
```yaml
replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
securityContext:
runAsNonRoot: true
runAsUser: 1001 # different from api (1000) for isolation
runAsGroup: 1001
fsGroup: 1001
containers:
- image: gitea.treytartt.com/admin/honeydue-admin:<sha>
ports: [containerPort: 3000]
env:
- name: PORT
value: "3000"
- name: HOSTNAME
value: "0.0.0.0"
- name: NEXT_PUBLIC_API_URL
valueFrom: {configMapKeyRef: {name: honeydue-config, key: NEXT_PUBLIC_API_URL}}
volumeMounts:
- {name: nextjs-cache, mountPath: /app/.next/cache}
- {name: tmp, mountPath: /tmp}
resources:
requests: {cpu: 50m, memory: 64Mi}
limits: {cpu: 500m, memory: 256Mi}
startupProbe:
httpGet: {path: /, port: 3000} # was /admin/ — wrong for this app (Chapter 19)
failureThreshold: 24
periodSeconds: 5
readinessProbe:
httpGet: {path: /, port: 3000}
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
```
**Probe path `/`** — Next.js serves at root. `/admin/` (scaffold default)
returns 404 and killed the pod repeatedly during initial bring-up.
See Chapter 19 §Admin probe path for the story.
**`runAsUser: 1001`** — different from api's 1000 so that if one
service were compromised, the stolen UID would at least be distinct
from other services' (minor defense-in-depth).
**`nextjs-cache`** — emptyDir mount for Next.js's server-side cache.
Without it, the read-only rootfs would prevent Next from caching
server-rendered pages. Not a persistent volume because cache is
regenerable on restart.
### The Service
```yaml
apiVersion: v1
kind: Service
metadata:
name: admin
spec:
type: ClusterIP
selector: {app.kubernetes.io/name: admin}
ports: [port: 3000, targetPort: 3000]
```
ClusterIP `10.43.136.168`.
## Service 3 — worker (Go + Asynq)
### What it does
Runs scheduled background jobs via [Asynq](https://github.com/hibiken/asynq)
(a Redis-backed job queue for Go):
- **Task reminders** (14:00 UTC daily) — notify users of upcoming tasks
- **Overdue reminders** (15:00 UTC daily) — notify users of overdue tasks
- **Daily digest** (03:00 UTC daily) — summary email per user
- **Onboarding emails** — multi-step drip campaign for new users
- **Cleanup jobs** — expired tokens, stale data
### Why 1 replica (hard requirement)
Asynq uses a `Scheduler` component that does cron-like scheduling. The
Scheduler is **not leader-elected** by default — if you run two, both
fire every cron task. Users get duplicate emails.
The asynq docs cover this: to scale scheduling, migrate to
`PeriodicTaskManager` + `PeriodicTaskConfigProvider` which coordinate
via Redis. Not yet done in our codebase.
Until then: `replicas: 1` is a hard constraint. See the comment in the
deployment manifest:
```yaml
spec:
# Asynq's Scheduler is a singleton — running >1 replica fires every cron
# task once per replica (duplicate daily digests, onboarding emails, etc.).
# Keep at 1 until asynq.PeriodicTaskManager with Redis leader election is
# wired in cmd/worker/main.go.
replicas: 1
```
### What happens if the worker pod dies?
- Asynq schedule state is in Redis (which has AOF persistence)
- When a new worker pod starts, it re-registers the scheduler and picks up
where it left off
- Any job that was in-flight (dequeued but not acknowledged) gets retried
by Asynq's automatic retry logic (see the `worker.RetryOptions` in the
Go code)
- Cron jobs that were supposed to fire during the downtime: fire on the
next tick
A 5-minute worker outage = 5 minutes of delayed jobs. Not great but
acceptable.
### PodDisruptionBudget
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: worker-pdb
spec:
minAvailable: 0
selector: {matchLabels: {app.kubernetes.io/name: worker}}
```
`minAvailable: 0` means voluntary disruptions (`kubectl drain`) can take
the worker down. This matches the singleton constraint: there's only one,
it's OK to drain.
### No Service
worker doesn't listen on any HTTP port for application traffic — it's a
queue consumer, not a web server. So there's **no Kubernetes Service**
for it.
(On Swarm we had the worker expose a health endpoint at `:6060/health`;
the k3s scaffold doesn't replicate this. Future work.)
## Service 4 — redis
### What it does
- Caching layer (ETag-based lookups, user session cache)
- Asynq queue backend (job state, scheduled tasks, retry state)
### Why 1 replica
Single-instance Redis with AOF persistence. Not replicated, not
clustered. Downsides:
- Node outage = Redis outage (cache regenerates, queue state is preserved
by AOF on the PVC)
- No failover — if the node hosting Redis dies, Redis restarts on another
node *but* the PVC is local-path (per-node), so the data is gone
For our scale this is acceptable. Redis holds no authoritative state
(everything that matters is in Postgres). Cache regenerates on first
request; Asynq retries enqueue on failure.
### PVC
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: redis-data
spec:
accessModes: [ReadWriteOnce]
storageClassName: local-path
resources: {requests: {storage: 5Gi}}
```
Uses k3s' built-in `local-path-provisioner`. The PVC binds to a local
directory on the node where the Redis pod lands (`/var/lib/rancher/k3s/storage/`).
`ReadWriteOnce` means only one pod at a time.
### Node affinity
```yaml
nodeSelector:
honeydue/redis: "true"
```
We labeled `ubuntu-8gb-nbg1-2` (hetzner1) with `honeydue/redis=true` so
Redis always lands there. This ensures the PVC finds its backing
storage (since PVCs with `local-path` are per-node).
```bash
kubectl label node ubuntu-8gb-nbg1-2 honeydue/redis=true --overwrite
```
### Why not Redis Sentinel / Cluster
Complexity. At our scale (~a few req/s, kilobytes of cache), a single
Redis does fine. If Redis becomes critical-path for availability, we'd:
- Use a managed Redis (Upstash, Dragonfly Cloud) — $5-15/mo, their problem
- Or run Redis Sentinel with 3 replicas — manageable but operational work
Neither is needed yet.
### Redis config
From the deployment:
```yaml
command:
- sh
- -c
- |
ARGS="--appendonly yes --appendfsync everysec --maxmemory 256mb --maxmemory-policy noeviction"
if [ -n "$REDIS_PASSWORD" ]; then
ARGS="$ARGS --requirepass $REDIS_PASSWORD"
fi
exec redis-server $ARGS
```
Settings:
- **`--appendonly yes --appendfsync everysec`** — AOF persistence,
fsync every second. Survives restarts with up to 1 second of data
loss.
- **`--maxmemory 256mb`** — Redis will refuse new data if it grows past
256 MB. Gives us a safety cap.
- **`--maxmemory-policy noeviction`** — we'd rather get errors than
silently drop data. This is the right choice when Redis holds queue
state (losing a queue item silently = missed job).
The `REDIS_PASSWORD` env var is optional. Currently empty (no auth). The
Redis pod is only reachable from inside the overlay network, and our
NetworkPolicies (once enabled) would restrict egress further.
## Resource summary
Combined requests and limits across all services:
| Service | CPU requests | CPU limits | Memory requests | Memory limits | Replicas |
|---|---|---|---|---|---|
| api | 100m | 1000m | 128Mi | 512Mi | 3 |
| admin | 50m | 500m | 64Mi | 256Mi | 1 |
| worker | 50m | 500m | 64Mi | 256Mi | 1 |
| redis | 100m | 500m | 128Mi | 512Mi | 1 |
| traefik (kube-system) | ~100m | unlimited | ~50Mi | unlimited | 3 |
| **Total requests** | **~750m** | | **~550Mi** | | |
Each node has 4000m CPU + 8192Mi memory. Total cluster capacity is
12000m + 24576Mi. We're using roughly 6% CPU and 2% memory for requests
— tons of headroom.
## Health check semantics
Kubernetes distinguishes three probe types:
- **startupProbe** — is the container done starting? Runs until it passes
once, then stops. While running, the other probes are disabled.
Failing startupProbe = container killed and restarted.
- **readinessProbe** — is the container ready to serve traffic? A failing
pod is removed from Service endpoints (traffic stops flowing to it)
but the pod keeps running.
- **livenessProbe** — is the container healthy? A failing pod is killed
and restarted.
### Why we tuned startupProbe separately
The api's first-boot migration takes 90240s. If we only had a
readinessProbe with a typical initialDelay of 5s + failureThreshold of 3,
the pod would be killed before migration finishes. startupProbe lets us
give generous first-boot grace (240s) without affecting the sharper
ongoing readiness/liveness checks.
### Probe path design
Each service's `/health` endpoint should be:
- Cheap (no DB query, no external call)
- Fast (< 100ms)
- Honest (returns 200 iff the process can serve)
Our api's `/api/health/` does a trivial check. It does NOT verify Postgres
connectivity (to avoid cascading DB failures tearing down all api pods).
If Postgres is down, api pods stay "ready" and return 5xx for actual
endpoints — that's the right behavior.
## Log routing
All container logs go to stdout/stderr. containerd captures them to
`/var/log/containers/` on the node. `kubectl logs` fetches them via the
kubelet's /api/v1/pods/<pod>/log endpoint.
We have **no log aggregation** in the cluster (no Loki, no ELK, no
Datadog). For debugging we use:
```bash
kubectl logs -n honeydue deploy/api -f --prefix
kubectl logs -n honeydue deploy/api --previous # previous pod's logs
```
See [Chapter 15](./15-observability.md).
## Rolling update semantics
When you push a new image and `kubectl set image` or `kubectl apply` with
a new image tag:
1. Kubernetes creates a new ReplicaSet with the new image
2. Starts 1 new pod (per `maxSurge: 1`)
3. Waits for it to pass readinessProbe
4. Removes 1 pod from the old ReplicaSet
5. Repeats until all N pods are on the new ReplicaSet
6. Old ReplicaSet stays around (for rollback) with 0 replicas
For api (3 replicas): total rollout time is roughly
`3 × (pod_startup_time + small_buffer)` = ~15 minutes in the cold-boot
case, seconds for warm updates where migrations are no-op.
During the rollout:
- Service endpoint set updates as pods become ready
- kube-proxy IPVS is reprogrammed on each node
- Traefik's connection pool to the Service invalidates gradually
Users see no downtime if the new image is compatible. If it's broken:
```bash
kubectl rollout undo deployment/api -n honeydue
```
Reverts to the previous ReplicaSet. Typically takes 30 seconds to
stabilize.
## Why no StatefulSet
For Redis (the only stateful thing we run), we use a Deployment + PVC.
StatefulSet is designed for:
- Ordered startup (pod-0 before pod-1)
- Stable hostnames (pod-0 gets DNS name `redis-0.redis`)
- Per-replica PVCs
We have one Redis replica. None of those features matter for a
singleton. Deployment + PVC + nodeSelector is simpler and equivalent.
If we ever run Redis Sentinel or Cluster, we'd migrate to StatefulSet.
## Operator cheat sheet
```bash
# See all pods in honeydue namespace
kubectl get pods -n honeydue -o wide
# Per-service rollout status
kubectl rollout status deployment/api -n honeydue
# Scale a service
kubectl scale deployment/api -n honeydue --replicas=5
# Restart all pods (e.g., to re-read a configmap)
kubectl rollout restart deployment/api -n honeydue
# Exec into a pod
kubectl exec -it -n honeydue deploy/admin -- /bin/sh
# Describe a pod (shows events, probe state, restarts)
kubectl describe pod -n honeydue <pod-name>
# Resource usage
kubectl top pods -n honeydue
```
## References
- [Kubernetes Deployments][deploy]
- [Pod lifecycle + probes][probes]
- [Asynq scheduler limitations][asynq-sched]
- [K3s local-path provisioner][k3s-lp]
[deploy]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
[probes]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifecycle
[asynq-sched]: https://github.com/hibiken/asynq/wiki/Periodic-Tasks
[k3s-lp]: https://docs.k3s.io/storage#setting-up-the-local-storage-provider
+298
View File
@@ -0,0 +1,298 @@
# 08 — Database (Neon Postgres)
## Summary
Authoritative user data lives in a Neon-managed Postgres database in AWS
us-east-1. Connections use TLS (`DB_SSLMODE=require`). Schema is managed
via GORM AutoMigrate inside the api binary, coordinated across replicas
by a Postgres advisory lock to prevent concurrent migration attempts.
## Why Neon
### Decision matrix
At deploy time we considered:
| Option | Setup effort | Monthly cost | Backup/PITR | Scale ceiling | Notes |
|---|---|---|---|---|---|
| **Neon Launch** | Zero (managed) | $5-15 | Included | Large | **Picked** |
| Postgres on a Hetzner VPS | High | $8 (VPS) | Manual | Medium | More ops |
| AWS RDS | Medium | $30+ | Included | Huge | Overkill, expensive |
| Supabase Free | Zero | $0 | Limited | Small | Free tier has quota limits |
| CNPG on our k3s | High (Helm) | $0 (using cluster) | Self-rolled | Medium | Operational burden |
Neon Launch won on:
- **Serverless**: scales compute to zero when idle (cheap)
- **Branch databases**: we can create dev/staging branches from prod in seconds
- **Connection pooling built-in**: PgBouncer on the hostname suffix `-pooler`
- **Point-in-time recovery** included (paid tier)
- **Pay-as-you-go** with a $5 minimum — fits a bootstrapped app
### Connection details
| Field | Value |
|---|---|
| Hostname | `ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech` |
| Port | 5432 |
| Username | `neondb_owner` |
| Database | `honeyDue` (case-sensitive!) |
| TLS mode | `require` (enforced by Neon; app pg driver verifies) |
| Branch | production (Neon's concept — isolated DB within the project) |
### The database name is case-sensitive
Postgres identifiers are lowercase unless quoted. Neon's UI created the
database as `"honeyDue"` (quoted, camelCase preserved). In `prod.env` /
ConfigMap we must use exactly `POSTGRES_DB=honeyDue` — lowercase
`honeydue` gets a `database "honeydue" does not exist` error. This bit
us during the initial Swarm deploy (Chapter 19 §Neon DB name).
## Connection pooling
### Why it matters
Postgres is memory-hungry per connection (~5-10 MB each). 3 api replicas
× `DB_MAX_OPEN_CONNS=25` = up to 75 direct Postgres connections. Add
the worker's 25. Neon's free tier caps at 100 concurrent connections;
paid tiers much higher.
### PgBouncer on Neon
Neon provides a built-in PgBouncer at `-pooler` subdomain. Our hostname
already includes `-pooler` handling in the route, so connections go
through PgBouncer transparently.
Modes PgBouncer supports:
- **session** — one server connection held per client session (transparent)
- **transaction** — server connection released after each transaction (high-throughput)
- **statement** — per-statement (most aggressive; breaks many features)
Neon's pooler runs in **transaction mode**. This is compatible with GORM
out of the box (we don't use session-level features like prepared
statements or session variables).
### Connection pool settings
In `prod.env`:
```
DB_MAX_OPEN_CONNS=25
DB_MAX_IDLE_CONNS=10
DB_MAX_LIFETIME=600s
```
These are the Go `database/sql` pool settings (GORM uses `database/sql`
underneath):
- **MaxOpenConns: 25** — at most 25 concurrent connections per replica
- **MaxIdleConns: 10** — keep up to 10 warm connections ready to reuse
- **MaxLifetime: 600s** — recycle connections after 10 min (prevents
stale state in long-lived connections, good for Neon's idle timeout)
### Worst-case connection count
3 api + 1 worker replicas × 25 conns = 100 peak. Right at Neon free
tier's ceiling, with zero margin. **This is a real risk** — a spike that
saturates the pool on all replicas simultaneously would exhaust Neon's
limit.
Mitigations to consider:
- Drop `DB_MAX_OPEN_CONNS` to 15 → 60 peak. Safe on free tier.
- Upgrade to Neon Scale plan (1000+ connections).
- Rely on Neon's PgBouncer to multiplex — the raw backend connections
to Postgres-proper are pooled, not our TCP connections to Neon.
Currently we trust Neon's pooler to handle the multiplexing and run with
the default 25/10. If we hit connection errors in prod, adjust.
## Schema management
### GORM AutoMigrate
On startup, the Go API's `cmd/api/main.go` calls
`database.MigrateWithLock()` which:
1. Opens a dedicated Postgres connection
2. `SELECT pg_advisory_lock(1751412071)` — acquires a session-level
advisory lock on a hardcoded key
3. Calls `db.AutoMigrate(&models.*{})` for every GORM model
4. `SELECT pg_advisory_unlock(...)` via deferred function
5. Close the connection
The advisory lock serializes migrations across replicas: when 3 api
pods start simultaneously, one acquires the lock and migrates; the
others block on the lock. Once the first finishes (≤2s for already-
migrated schema, up to 90s on first cold boot), the next acquires and
sees the schema is current (no-op migrate).
### Why an advisory lock
Without it, concurrent `CREATE TABLE IF NOT EXISTS ...` statements from
multiple replicas would race — Postgres usually handles it, but GORM's
AutoMigrate also alters tables (adds columns, indexes) which can deadlock
under concurrency.
The advisory lock pattern (also used by Rails + Django + Alembic) is the
canonical solution.
### The lock key
`1751412071` is a hardcoded integer in `internal/database/database.go`.
Arbitrary but unique — as long as nothing else in the Postgres instance
uses the same advisory lock key, no conflicts.
### First-boot behavior
On a **fresh database** (new Neon project), the first api pod runs
through every model's `CREATE TABLE` statement. This is ~50 tables for
honeyDue and takes ~90 seconds.
On a **warm database** (tables already exist), AutoMigrate is fast —
typically under 2 seconds. It still runs (GORM checks every model
against the schema) but finds no work to do.
### Where this bit us
With 3 api pods starting simultaneously and migrations taking 90s first
time, the lock queue for the last replica is ~180s. We needed a
startupProbe grace of 240s to cover this without false restart loops.
See Chapter 7 §startupProbe and Chapter 19 §MigrateWithLock.
### Downside: no schema versioning
AutoMigrate can only *add* — new tables, new columns, new indexes. It
won't drop columns, rename them, or change types destructively. For
those we'd need raw SQL migrations (a tool like `golang-migrate` or
`dbmate`).
Today: we accept that schema changes are additive-only. When we need
destructive changes, we'd hand-write them.
## What's in the database
Major tables (see `honeyDueAPI-go/internal/models/`):
| Table | Purpose |
|---|---|
| `auth_user` | Users (Django legacy name kept for compatibility) |
| `user_userprofile` | Profile data |
| `authtoken_token` | API auth tokens |
| `residence_residence` | Properties users manage |
| `task_task` | Maintenance tasks |
| `task_taskcompletion` | Task completion history |
| `contractor_contractor` | Contractor contacts |
| `documents_document` | Document records (files in B2) |
| `notification_notification` | In-app notifications |
| `subscription_usersubscription` | IAP subscriptions |
| `admin_users` | Next.js admin panel users |
See `honeyDueAPI-go/docs/TASK_LOGIC_ARCHITECTURE.md` for the task logic
model details.
## Backup and recovery
### Neon's built-in
Neon Launch includes **point-in-time recovery** within the last 24h
(longer on Scale plan). To restore:
1. Go to Neon console → project → Backups
2. Create a branch from a timestamp
3. Point the app at the new branch (change `DB_HOST` in our ConfigMap)
Done. No tape-wrangling.
### What we don't have
- Off-site backup (if Neon itself is compromised, we have no exfil). A
nightly `pg_dump` to Backblaze B2 would close this gap. **TODO**
(Chapter 20).
- Tested DR drills. We've never actually restored from a Neon backup
into a new branch and pointed the app at it. Should be routine; hasn't
been exercised.
## Migrations from old MyCrib/Casera data
honeyDue originally ran on a Django codebase (MyCrib / Casera-era). The
schema inherits Django's naming (`app_model` table names, `_id` suffix
foreign keys). The Go app's GORM models have `TableName()` methods that
preserve this:
```go
func (Task) TableName() string { return "task_task" }
```
This isn't ideal (GORM's default `tasks` would be cleaner), but changing
would require a migration that renames every table — more risk than
value.
## Neon regions
Neon's default region for new projects is `aws-us-east-1` (Virginia).
Our DB is there. Latency from Nuremberg to us-east-1 is **~90-120ms
round trip**.
This is the slowest hop in our data flow. Every api request that needs
a DB query (most of them) pays this latency at least once.
**When this matters**: When we start seeing ~200ms+ response times from
complex endpoints, it's likely DB latency dominant. Options:
- Migrate Neon to `aws-eu-central-1` (Frankfurt) — shaves ~90ms off
- Add Redis caching for hot reads (Chapter 7)
- Read replicas (Neon supports them on paid tiers)
## Environment variables the app reads
From ConfigMap:
| Var | Purpose |
|---|---|
| `DB_HOST` | Neon pooler hostname |
| `DB_PORT` | 5432 |
| `POSTGRES_USER` | `neondb_owner` |
| `POSTGRES_DB` | `honeyDue` |
| `DB_SSLMODE` | `require` |
| `DB_MAX_OPEN_CONNS` | 25 |
| `DB_MAX_IDLE_CONNS` | 10 |
| `DB_MAX_LIFETIME` | `600s` |
From Secret (`honeydue-secrets`):
| Var | Purpose |
|---|---|
| `POSTGRES_PASSWORD` | Neon DB password |
## Operator cheat sheet
```bash
# Connect to Neon from workstation (requires psql + the password)
PGPASSWORD="<pw>" psql -h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
-U neondb_owner -d honeyDue
# From a pod (lets you debug against the actual in-cluster network path)
kubectl exec -n honeydue -it deploy/api -- sh
# inside the pod (no psql by default, but wget + JSON API works)
wget -qO- http://127.0.0.1:8000/api/health/
# See current migration state (no direct CLI, but the api logs show it)
kubectl logs -n honeydue deploy/api | grep -i migration
# See active connections (run against Neon)
SELECT count(*), usename, state, application_name
FROM pg_stat_activity
GROUP BY usename, state, application_name;
```
## References
- [Neon docs][neon-docs]
- [Neon pricing][neon-pricing]
- [Postgres advisory locks][pg-locks]
- [GORM AutoMigrate][gorm-automigrate]
- [honeyDue task architecture][task-arch] (repo-local)
[neon-docs]: https://neon.com/docs/introduction
[neon-pricing]: https://neon.com/pricing
[pg-locks]: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS
[gorm-automigrate]: https://gorm.io/docs/migration.html
[task-arch]: ../../docs/TASK_LOGIC_ARCHITECTURE.md
+265
View File
@@ -0,0 +1,265 @@
# 09 — Object Storage (Backblaze B2)
## Summary
User-uploaded files (photos, documents, task completion attachments) go
to Backblaze B2 via its S3-compatible API. The Go API uses `minio-go/v7`
as the client. This works around a Swarm-era problem where named volumes
are per-node — uploads on node A were invisible to replicas on B and C.
With k3s we could use a shared PVC instead, but B2 is cheaper, offsite,
and already set up.
## Why Backblaze B2
### Decision matrix
| Option | Price per TB stored | Egress | Pros | Cons |
|---|---|---|---|---|
| **Backblaze B2** | **$6/mo** | $0.01/GB, free via CF | Cheap, hard spending caps, S3-compatible | US-West/East regions only (not EU) |
| AWS S3 Standard | $23/mo | $0.09/GB | Most ubiquitous | Expensive |
| Cloudflare R2 | $15/mo | Free (!) | Zero egress, CF-native | Newer, fewer features |
| DigitalOcean Spaces | $5/mo for 250GB + $0.01/GB | Free 1TB, $0.01/GB after | Simple | Less reliable than AWS |
| Local PVC on k3s | $0 | $0 | Already in cluster | Per-node, no HA, no offsite |
B2 won because:
1. **Hard spending cap** — unique in the industry. No surprise AWS bill.
2. **Cheapest at rest** — 34× cheaper than S3.
3. **Free egress through Cloudflare** — we already use CF; when we
eventually serve upload URLs through CF, egress is free.
4. **Mature S3-compatible API** — minio-go talks to it natively.
Rejected:
- **R2** was the close second. Zero egress is amazing. Rejected
primarily for inertia (B2 already set up in the MyCrib era). A future
migration to R2 would be reasonable.
- **Local PVC** doesn't work for our setup because we want uploads
durable and accessible from any node/replica.
## Configuration
Bucket: `honeyDueProd` (mixed case; B2 allows this, minio-go handles it
via path-style addressing — see §path-style below).
Region: `us-east-005` (B2's South Carolina region — closer to our
Neon DB in AWS us-east-1 than the West Coast options).
Endpoint: `s3.us-east-005.backblazeb2.com`
### Environment variables
From ConfigMap:
| Var | Value |
|---|---|
| `B2_ENDPOINT` | `s3.us-east-005.backblazeb2.com` |
| `B2_BUCKET_NAME` | `honeyDueProd` |
| `B2_REGION` | `us-east-005` |
| `B2_USE_SSL` | `true` (but see §vestigial var below) |
From Secret:
| Var | Value |
|---|---|
| `B2_KEY_ID` | App key ID (B2-specific identifier) |
| `B2_APP_KEY` | App key secret |
### App key scope
The B2 app key is **bucket-scoped**, not account-scoped. Can only
read/write the `honeyDueProd` bucket. Cannot:
- List other buckets
- Delete the bucket
- Create new buckets
- Touch account settings
This is the B2 equivalent of an IAM role with least privilege. If the
key leaks, the damage is limited to the `honeyDueProd` bucket.
## The minio-go client
The Go app uses `github.com/minio/minio-go/v7` — a Go SDK compatible
with any S3-flavored API. Relevant code at
`internal/services/storage_backend_s3.go`:
```go
client, err := minio.New(endpoint, &minio.Options{
Creds: credentials.NewStaticV4(keyID, appKey, ""),
Secure: useSSL,
Region: region,
})
```
### Path-style vs virtual-hosted addressing
S3's URL scheme has two flavors:
- **Virtual-hosted**: `https://mybucket.s3.amazonaws.com/mykey`
- **Path-style**: `https://s3.amazonaws.com/mybucket/mykey`
With virtual-hosted style, the bucket name must be DNS-compatible —
lowercase, no uppercase letters. `honeyDueProd` fails this.
With path-style, the bucket name is just a URL path segment — any valid
string works.
minio-go auto-detects: for AWS S3 it prefers virtual-hosted; for
non-AWS endpoints (like B2) it defaults to path-style. So
`honeyDueProd` with capital letters works transparently.
## The `B2_USE_SSL` vestigial variable
`prod.env` has `B2_USE_SSL=true`. But the Go app's
`internal/config/config.go:295` reads the env var
`STORAGE_USE_SSL`, not `B2_USE_SSL`:
```go
S3UseSSL: viper.GetString("STORAGE_USE_SSL") == "" || viper.GetBool("STORAGE_USE_SSL"),
```
Whoever wrote the original config used `B2_USE_SSL` in `prod.env` and
`STORAGE_USE_SSL` in the code. They don't match.
**Net effect**: The app reads `STORAGE_USE_SSL`, which is unset, and
the default `(empty) || true` evaluates to `true`. So SSL is always on,
despite `B2_USE_SSL=false` or `true` or anything else.
This is a dormant bug. Anyone setting `B2_USE_SSL=false` expecting to
disable TLS would be surprised it stays on. Fortunately that's the
right default for production B2 (which only accepts HTTPS anyway).
**TODO**: Rename `STORAGE_USE_SSL``B2_USE_SSL` in the Go code to
match the config. Documented in Chapter 19 §Vestigial config.
## What we store there
Today (limited rollout):
- User profile photos
- Task completion photos
- Document uploads (PDFs, images attached to records)
File keys follow a hierarchy like:
```
users/<user_id>/profile/<uuid>.jpg
residences/<residence_id>/documents/<uuid>.pdf
tasks/<task_id>/completions/<uuid>.jpg
```
Max file size is **10 MB** per upload (`STORAGE_MAX_FILE_SIZE=10485760`).
Allowed MIME types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`,
`application/pdf` (`STORAGE_ALLOWED_TYPES`).
## Access control
### Upload flow
1. Client POSTs to `/api/upload/`
2. Go API validates the user is authenticated and authorized for the
target resource
3. Go API streams the upload to B2 via minio-go's `PutObject`
4. B2 returns a key
5. Go API stores the key in Postgres
6. Returns the key to the client
The B2 bucket is **private**. Clients can't GET directly; they always
go through the Go API.
### Download flow (current)
1. Client requests `/api/media/<key>`
2. Go API checks the user can access this key
3. Go API fetches from B2 and streams back to the client
This proxies every download through the api. For high-traffic media
that's inefficient (api becomes an egress bottleneck).
### Future: signed URLs
We could generate time-limited signed URLs for B2 objects:
```go
url, err := s3Client.PresignedGetObject(ctx, bucket, key, 1*time.Hour, nil)
```
Returns a URL the client can GET directly from B2, scoped to a specific
object, valid for 1h. Saves api bandwidth and latency.
Not yet implemented. TODO (Chapter 20).
## Lifecycle and retention
We have **no lifecycle rules** set on the bucket. Objects live forever
unless the app deletes them.
When a user deletes their account, the app should delete their B2
objects. This is currently not automated — a compliance gap for any
"right to be forgotten" request.
**TODO** (Chapter 20): Either:
- Implement explicit cleanup in the user deletion handler, or
- Add B2 lifecycle rule tied to object metadata (tag objects with
user ID; rule deletes tagged objects when user is soft-deleted)
## Backup of B2
We have no backup of B2 objects. B2 itself replicates within the region,
but:
- Accidental deletion via our app = data gone
- B2 itself being compromised = data gone
B2 offers **Object Lock** (WORM — write once read many) which prevents
deletion for a retention period. Not enabled; revisit if/when user data
sensitivity justifies it.
## Cost projection
Current usage is **small** — estimated <50 GB stored.
```
50 GB × $0.006/GB = $0.30/mo storage
1 GB/mo egress (mostly uncached media served via api) → $0.01 (first
3× of stored amount is free anyway, so effectively $0)
```
Total B2 cost: **< $1/mo**. Hard spending cap set to $20/mo in B2
console — if we ever breach that, something's wrong and we want to
know immediately.
At 100k users each uploading ~10 MB average:
- 1 TB stored = $6/mo
- Egress depends on access patterns; with signed URLs served through CF
the egress could still be ~free
## Operator cheat sheet
```bash
# List bucket contents (requires mc or aws CLI configured with B2 creds)
mc alias set b2 https://s3.us-east-005.backblazeb2.com <KEY_ID> <APP_KEY>
mc ls b2/honeyDueProd/
# Count objects
mc find b2/honeyDueProd/ --type f | wc -l
# Download an object
mc cp b2/honeyDueProd/<key> ./
# Check B2 console for usage graphs:
# https://secure.backblaze.com/b2_buckets.htm
```
From inside a Go api pod:
```bash
# Check the in-cluster client config
kubectl exec -n honeydue deploy/api -- env | grep B2_
```
## References
- [Backblaze B2 docs][b2-docs]
- [B2 S3-compatible API][b2-s3]
- [minio-go/v7][minio-go]
- [S3 path-style vs virtual-hosted][s3-style]
[b2-docs]: https://www.backblaze.com/docs/
[b2-s3]: https://www.backblaze.com/docs/cloud-storage-s3-compatible-api
[minio-go]: https://github.com/minio/minio-go
[s3-style]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
+369
View File
@@ -0,0 +1,369 @@
# 10 — Secrets & Config
## Summary
Non-sensitive config (hostnames, ports, feature flags, etc.) lives in
`honeydue-config` ConfigMap. Sensitive values (DB password, signing
keys, API keys) live in `honeydue-secrets` and `honeydue-apns-key`
Secrets. Container registry auth lives in `gitea-credentials` (type
`kubernetes.io/dockerconfigjson`). This chapter maps every env var to
its source and explains what's stored where.
## Structure
```mermaid
flowchart LR
subgraph SourceWorkstation[Operator workstation]
ProdEnv[deploy/prod.env]
Secrets[deploy/secrets/*.txt]
Registry[deploy/registry.env]
end
subgraph K8s[Kubernetes cluster]
CM[honeydue-config<br/>ConfigMap]
S1[honeydue-secrets<br/>Secret]
S2[honeydue-apns-key<br/>Secret]
S3[gitea-credentials<br/>Secret]
end
subgraph Pods
Api[api pod]
Admin[admin pod]
Worker[worker pod]
end
ProdEnv -. kubectl create configmap<br/>--from-env-file .-> CM
Secrets -. kubectl create secret<br/>--from-file/--from-literal .-> S1
Secrets -. --from-file .-> S2
Registry -. kubectl create secret docker-registry .-> S3
CM -- envFrom --> Api & Admin & Worker
S1 -- env: secretKeyRef --> Api & Worker
S2 -- volumeMounts --> Api & Worker
S3 -- imagePullSecrets --> Api & Admin & Worker
```
## ConfigMap: honeydue-config
Built from `deploy/prod.env` (minus sensitive keys). Contents (58 keys,
abbreviated):
```
ADMIN_PANEL_URL=https://admin.myhoneydue.com
ALLOWED_HOSTS=api.myhoneydue.com,myhoneydue.com
APNS_AUTH_KEY_ID=DISABLED01
APNS_AUTH_KEY_PATH=/secrets/apns/apns_auth_key.p8
APNS_PRODUCTION=false
APNS_TEAM_ID=DISABLED01
APNS_TOPIC=com.tt.honeyDue
APNS_USE_SANDBOX=false
BASE_URL=https://myhoneydue.com
B2_BUCKET_NAME=honeyDueProd
B2_ENDPOINT=s3.us-east-005.backblazeb2.com
B2_REGION=us-east-005
B2_USE_SSL=true
CORS_ALLOWED_ORIGINS=https://myhoneydue.com,https://admin.myhoneydue.com
DAILY_DIGEST_HOUR=3
DB_HOST=ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech
DB_MAX_IDLE_CONNS=10
DB_MAX_LIFETIME=600s
DB_MAX_OPEN_CONNS=25
DB_PORT=5432
DB_SSLMODE=require
DEBUG=false
DEFAULT_FROM_EMAIL=noreply@myhoneydue.com
EMAIL_HOST=smtp.fastmail.com
EMAIL_HOST_USER=treytartt@fastmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=true
FEATURE_EMAIL_ENABLED=true
FEATURE_ONBOARDING_EMAILS_ENABLED=true
FEATURE_PDF_REPORTS_ENABLED=true
FEATURE_PUSH_ENABLED=false
FEATURE_WEBHOOKS_ENABLED=true
FEATURE_WORKER_ENABLED=true
NEXT_PUBLIC_API_URL=https://api.myhoneydue.com
OVERDUE_REMINDER_HOUR=15
PORT=8000
POSTGRES_DB=honeyDue
POSTGRES_USER=neondb_owner
REDIS_DB=0
REDIS_URL=redis://redis:6379/0
STATIC_DIR=/app/static
STORAGE_ALLOWED_TYPES=image/jpeg,image/png,image/gif,image/webp,application/pdf
STORAGE_BASE_URL=/uploads
STORAGE_MAX_FILE_SIZE=10485760
STORAGE_UPLOAD_DIR=/app/uploads
TASK_REMINDER_HOUR=14
TIMEZONE=UTC
```
Plus empty-but-declared keys for optional integrations (Apple/Google
auth + IAP).
### How pods use it
```yaml
envFrom:
- configMapRef:
name: honeydue-config
```
Every key in the ConfigMap becomes an env var in the container.
`envFrom` is bulk — no need to enumerate each one.
### Changing config
Edit `deploy/prod.env` locally, regenerate the ConfigMap:
```bash
# Simplified; see scripts for the full version
kubectl create configmap honeydue-config -n honeydue \
--from-env-file=deploy/prod.env \
--dry-run=client -o yaml | kubectl apply -f -
# Pods don't auto-reload env vars. Restart to pick up changes:
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
```
## Secret: honeydue-secrets (Opaque)
9 keys:
| Key | Purpose |
|---|---|
| `POSTGRES_PASSWORD` | Neon DB password |
| `SECRET_KEY` | Django-compat signing key (64 chars, base64) |
| `EMAIL_HOST_PASSWORD` | Fastmail app password |
| `FCM_SERVER_KEY` | FCM push key (currently placeholder, push disabled) |
| `REDIS_PASSWORD` | Empty (no auth on in-cluster Redis) |
| `B2_KEY_ID` | Backblaze B2 app key ID |
| `B2_APP_KEY` | Backblaze B2 app key secret |
| `ADMIN_EMAIL` | Next.js admin panel initial admin email |
| `ADMIN_PASSWORD` | Next.js admin panel initial admin password |
### How pods use it
Individual `env:` entries wire specific Secret keys to env vars:
```yaml
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: POSTGRES_PASSWORD
- name: SECRET_KEY
valueFrom:
secretKeyRef:
name: honeydue-secrets
key: SECRET_KEY
# ... etc
```
This pattern (vs. `envFrom: secretRef:`) is more explicit — you know
exactly which secret keys a pod uses by reading the manifest.
### ADMIN_PASSWORD — one-time use
The Go app's `internal/database/database.go:519-538` reads
`ADMIN_EMAIL` + `ADMIN_PASSWORD` at startup. If the `admin_users` table
doesn't have a row for that email, it inserts one with a bcrypt hash of
the password. Already-existing rows are **not** updated.
So:
- First deploy: admin user created
- Subsequent deploys: no-op
- If you want to rotate the initial admin password: do it in the admin
panel UI, not by changing `ADMIN_PASSWORD`
After first deploy you can technically blank `ADMIN_PASSWORD` in the
Secret. Leaving it set is harmless but slightly messy.
## Secret: honeydue-apns-key (Opaque)
One file: `apns_auth_key.p8`. Mounted as a volume into api and worker
pods at `/secrets/apns/apns_auth_key.p8` (read-only).
Push is currently **disabled** (`FEATURE_PUSH_ENABLED=false`), so
this `.p8` is a throwaway EC P-256 private key generated by
`openssl genpkey`. It passes the Go app's "does this file contain
`BEGIN PRIVATE KEY`" validation but cannot authenticate against Apple.
When push is enabled:
1. Generate a real APNs auth key in Apple Developer console
2. Replace `deploy/secrets/apns_auth_key.p8`
3. Update `APNS_AUTH_KEY_ID`, `APNS_TEAM_ID`, `APNS_TOPIC` in ConfigMap
4. `kubectl create secret generic honeydue-apns-key ... --dry-run=client -o yaml | kubectl apply -f -`
5. Set `FEATURE_PUSH_ENABLED=true`
6. `kubectl rollout restart` api and worker
## Secret: gitea-credentials (docker-registry)
Type `kubernetes.io/dockerconfigjson`. Contains a base64-encoded Docker
config for Gitea registry auth.
Created via:
```bash
kubectl create secret docker-registry gitea-credentials \
--namespace=honeydue \
--docker-server=gitea.treytartt.com \
--docker-username=admin \
--docker-password=<gitea PAT> \
--dry-run=client -o yaml | kubectl apply -f -
```
Referenced in every deployment that pulls from Gitea:
```yaml
spec:
imagePullSecrets:
- name: gitea-credentials
```
When a pod needs to pull an image, the kubelet reads this secret and
uses it for the registry authentication.
## Source files — what's canonical
The Swarm-era files are still the **source of truth** for secrets:
| File | Contents | Canonical? |
|---|---|---|
| `deploy/prod.env` | All non-sensitive config | Yes |
| `deploy/secrets/postgres_password.txt` | Neon DB password | Yes |
| `deploy/secrets/secret_key.txt` | App signing key | Yes |
| `deploy/secrets/email_host_password.txt` | Fastmail password | Yes |
| `deploy/secrets/fcm_server_key.txt` | FCM key (placeholder) | Yes |
| `deploy/secrets/apns_auth_key.p8` | APNs key (placeholder) | Yes |
| `deploy/registry.env` | Gitea registry auth | Yes |
| `deploy-k3s/manifests/secrets.yaml.example` | Template only (never committed with real values) | No — template |
| In-cluster Secrets | Live state | Derived |
### Why canonical lives in `deploy/` not `deploy-k3s/`
Historical. We migrated from Swarm to k3s but kept the source files
untouched. Rather than move them now (and break any remaining Swarm-era
tooling), we use them from the k3s setup scripts as-is.
Future cleanup: move to `deploy-k3s/secrets/` for better provenance.
## Recreating the cluster secrets
If the k3s cluster is rebuilt, the Secrets need to be recreated from the
local source files. Rough procedure:
```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
# Namespace first
kubectl create namespace honeydue
# Docker config secret for Gitea
set -a; source deploy/registry.env; set +a
kubectl create secret docker-registry gitea-credentials \
-n honeydue \
--docker-server="$REGISTRY" \
--docker-username="$REGISTRY_USERNAME" \
--docker-password="$REGISTRY_TOKEN"
# Main secrets bundle
set -a; source deploy/prod.env; set +a
kubectl create secret generic honeydue-secrets -n honeydue \
--from-literal=POSTGRES_PASSWORD="$(tr -d '\n' < deploy/secrets/postgres_password.txt)" \
--from-literal=SECRET_KEY="$(tr -d '\n' < deploy/secrets/secret_key.txt)" \
--from-literal=EMAIL_HOST_PASSWORD="$(tr -d '\n' < deploy/secrets/email_host_password.txt)" \
--from-literal=FCM_SERVER_KEY="$(tr -d '\n' < deploy/secrets/fcm_server_key.txt)" \
--from-literal=REDIS_PASSWORD="" \
--from-literal=B2_KEY_ID="$B2_KEY_ID" \
--from-literal=B2_APP_KEY="$B2_APP_KEY" \
--from-literal=ADMIN_EMAIL="$ADMIN_EMAIL" \
--from-literal=ADMIN_PASSWORD="$ADMIN_PASSWORD"
# APNS key Secret
kubectl create secret generic honeydue-apns-key -n honeydue \
--from-file=apns_auth_key.p8=deploy/secrets/apns_auth_key.p8
# ConfigMap from prod.env (minus secret keys)
# See deploy-k3s/scripts/02-setup-secrets.sh for the full version
# Simplified:
declare -a args
secret_keys="POSTGRES_PASSWORD SECRET_KEY EMAIL_HOST_PASSWORD FCM_SERVER_KEY REDIS_PASSWORD B2_KEY_ID B2_APP_KEY ADMIN_EMAIL ADMIN_PASSWORD"
while IFS='=' read -r k v; do
[[ -z "$k" || "$k" =~ ^# ]] && continue
for sk in $secret_keys; do [[ "$k" == "$sk" ]] && continue 2; done
args+=(--from-literal="$k=$v")
done < deploy/prod.env
kubectl create configmap honeydue-config -n honeydue "${args[@]}"
```
The full version with all edge cases is in
`deploy-k3s/scripts/02-setup-secrets.sh` (which was written for the
GHCR-era assumption; adapt for Gitea).
## Pitfalls
### Trailing newlines in secret files
Secret files created by text editors typically end with a newline. If
we pass the content directly, the newline becomes part of the secret
— a mismatch to what the app expects.
We strip trailing newlines with `tr -d '\n'` before creating Secrets.
If you forget, your DB password will be silently wrong.
### Case sensitivity on POSTGRES_DB
`POSTGRES_DB=honeyDue` must be exactly `honeyDue`. `honeydue` (lowercase)
fails with `database "honeydue" does not exist`. Postgres identifiers
are case-sensitive if originally quoted at CREATE time.
### Placeholder detection
The Swarm-era deploy script rejected values containing `CHANGEME`,
`your-`, `paste_here`, etc. When setting up the k3s cluster we had to
strip those from `prod.env` first. If you ever see a pod error about
"invalid host" or "invalid key id", check if a placeholder leaked
through.
### B2_USE_SSL vs STORAGE_USE_SSL
The config has `B2_USE_SSL` but the Go code reads `STORAGE_USE_SSL`.
See Chapter 9 §Vestigial variable. Setting `B2_USE_SSL=false` in the
ConfigMap does nothing; SSL stays on.
## Operator cheat sheet
```bash
# Print a ConfigMap as env-file format
kubectl get cm honeydue-config -n honeydue -o jsonpath='{range .data}{"\n"}{end}'
# Edit a ConfigMap interactively (DOES NOT restart pods)
kubectl edit cm honeydue-config -n honeydue
# After editing a ConfigMap, restart pods to pick up
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
# View a Secret (prints base64 — decode with base64 -d)
kubectl get secret honeydue-secrets -n honeydue -o yaml
# Reveal a specific secret value (DANGER: plaintext to stdout)
kubectl get secret honeydue-secrets -n honeydue \
-o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d
# Update a single secret key
kubectl patch secret honeydue-secrets -n honeydue \
--type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(echo -n 'newvalue' | base64)\"}}"
```
## References
- [Kubernetes ConfigMaps][cm]
- [Kubernetes Secrets][secret]
- [Secret types][secret-types]
[cm]: https://kubernetes.io/docs/concepts/configuration/configmap/
[secret]: https://kubernetes.io/docs/concepts/configuration/secret/
[secret-types]: https://kubernetes.io/docs/concepts/configuration/secret/#secret-types
+329
View File
@@ -0,0 +1,329 @@
# 11 — Container Registry (Gitea)
## Summary
We host our own container registry on Gitea at `gitea.treytartt.com`.
Every image push and pull goes there, not Docker Hub or GHCR. The Gitea
instance runs outside this k3s cluster (on its own VPS) and is available
at `https://gitea.treytartt.com` with public HTTPS. Image pulls are
authenticated via a Personal Access Token stored as a Kubernetes
`dockerconfigjson` Secret.
## Why Gitea
### Decision matrix
| Option | Cost | Auth model | Pros | Cons |
|---|---|---|---|---|
| **Gitea built-in registry** | $0 (already running Gitea) | Gitea PAT | Self-hosted, integrated with code | Another service to maintain |
| GHCR (GitHub Container Registry) | Free for public, $0 for private with paid plan | GitHub PAT | Popular, reliable | Uses GitHub; vendor dependency |
| Docker Hub | Free tier limited; paid $5-7/mo | Docker Hub account | Ubiquitous | Rate limits on anonymous pulls |
| AWS ECR | ~$1/mo for small use | IAM | Integrates with AWS workloads | AWS account required |
| Harbor (self-hosted) | $0 | Many options | Best enterprise features | Heavy to operate |
Gitea won primarily because **the operator was already running Gitea for
code hosting**. Container registry is built into Gitea 1.17+ as a free
feature. One fewer service to set up.
Side benefits:
- Code and images live together (one backup policy, one access model)
- PATs are scoped and rotatable via the same UI
- No external vendor to worry about for this critical piece of the
deploy pipeline
Rejected alternatives:
- **Docker Hub** — rate limits on unauthenticated pulls would bite us if
nodes pull the same image repeatedly during rolling updates
- **GHCR** — fine but adds GitHub dependency we don't otherwise have
- **Harbor** — massive overkill; we're not a 100-team enterprise
## Layout
Images live under the authenticated user's namespace:
```
gitea.treytartt.com/admin/honeydue-api:237c6b8
gitea.treytartt.com/admin/honeydue-worker:237c6b8
gitea.treytartt.com/admin/honeydue-admin:237c6b8
```
`admin` is the Gitea user that owns the images. Images are **private**
by default.
### Image tagging strategy
Tags are git short SHAs (e.g., `237c6b8`). Not `:latest`. Not semantic
version.
Rationale:
- `:latest` is ambiguous — which build? Rolling updates should roll a
*specific* tag so rollbacks are deterministic.
- `:v1.2.3` works for released libraries but our app rolls forward
continuously; versioning per deploy is unnecessary overhead.
- Git SHAs are unique, immutable, and tie each image to the exact
commit that built it.
`PUSH_LATEST_TAG=false` is set in `deploy/cluster.env`. When we rebuild
and push, only the SHA tag gets pushed. The `latest` tag is never
created by our deploy pipeline.
## Authentication
### Creating the PAT
At <https://gitea.treytartt.com/-/user/settings/applications>, we created
a token with scopes:
- `read:package`
- `write:package`
No other scopes. This token can only interact with package registry; it
can't read repo contents, create issues, or touch account settings.
### PAT on the operator workstation
Stored in `deploy/registry.env`:
```
REGISTRY=gitea.treytartt.com
REGISTRY_NAMESPACE=admin
REGISTRY_USERNAME=admin
REGISTRY_TOKEN=<pat>
```
This file is `.gitignore`d in `deploy/.gitignore`. If it ever gets
committed accidentally, rotate the PAT immediately.
### PAT in the cluster
Stored as the `gitea-credentials` Secret (type `dockerconfigjson`) in
the `honeydue` namespace. See Chapter 10.
Kubelet reads this Secret when a pod needs to pull from the Gitea
registry.
## The build pipeline
### Dockerfile multi-stage
`honeyDueAPI-go/Dockerfile` has three target stages:
- `api` — compiled Go binary + static assets for the HTTP API
- `worker` — compiled Go binary for the background worker
- `admin` — Next.js standalone build of the admin panel
A single Dockerfile keeps build-cache sharing efficient (the Go builder
stage produces binaries for both api and worker; admin reuses its own
Node builder stage).
### Multi-arch cross-compilation
The operator workstation is **arm64** (Apple Silicon). The Hetzner nodes
are **x86_64**. A naive `docker build` on arm64 produces arm64 images
that won't run on the nodes (`exec format error`).
The deploy pipeline uses `docker buildx`:
```bash
docker buildx build \
--platform linux/amd64 \
--target api \
-t gitea.treytartt.com/admin/honeydue-api:$SHA \
--push \
/Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
```
- **`--platform linux/amd64`** — cross-compile to x86_64
- **`--target api`** — which Dockerfile stage to build
- **`--push`** — push directly to the registry (skip local image cache)
The Go stages use the `TARGETARCH` build arg to produce the right
architecture binary. Node stages use QEMU emulation (which is slower but
acceptable for our ~1 min admin build).
### Buildx builder
We use a named buildx builder to keep state out of Docker's default
environment:
```bash
docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap
```
The `honeydue-builder` is a docker-container driver — spawns a
BuildKit container when building, tears it down when idle. Supports
multi-platform and caches layers across builds.
## From local file to cluster — the full path
```mermaid
flowchart LR
subgraph dev[Operator workstation]
Code[Source code]
Dockerfile
Buildx[docker buildx]
end
subgraph Gitea[gitea.treytartt.com]
Reg[Package registry]
end
subgraph K8s[k3s cluster]
Kubelet
Containerd
Pod
end
Code --> Dockerfile
Dockerfile --> Buildx
Buildx -- push --> Reg
Reg -- pull --> Kubelet
Kubelet --> Containerd
Containerd --> Pod
```
### End-to-end
1. **Operator pushes code**: commits to `main` locally
2. **Operator builds + pushes image**: `docker buildx build --push ...`
from the repo root. Build takes 13 minutes first time, seconds on
warm cache.
3. **Image lands in Gitea**: visible at
`https://gitea.treytartt.com/admin/-/packages/container/honeydue-api`
4. **Operator updates Deployment**: `kubectl set image deployment/api
api=gitea.treytartt.com/admin/honeydue-api:$NEW_SHA -n honeydue`
5. **K8s begins rolling update**: creates new ReplicaSet with new image
6. **Kubelet on target node** sees a pod with an image it doesn't have
7. **Kubelet calls containerd**: "pull this image using these creds"
8. **Containerd authenticates** to Gitea registry using the PAT from
`gitea-credentials` Secret, downloads the image
9. **Containerd starts the container** with the new image
10. **Readiness probe passes**: new pod joins the Service endpoints
11. **Kubelet tears down** an old pod
## Pushing manually
If you need to push a one-off image (e.g., testing a fix):
```bash
# Login (once per session)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
# Build + push
cd honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
docker buildx build \
--platform linux/amd64 \
--target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
--push .
# Logout (don't leave creds in ~/.docker/config.json)
docker logout gitea.treytartt.com
```
## Image sizes
Current images:
| Image | Size | Layers |
|---|---|---|
| `honeydue-api` | ~53 MB | Alpine base + Go binary |
| `honeydue-worker` | ~50 MB | Alpine base + Go binary |
| `honeydue-admin` | ~150 MB | Node 20 alpine + Next.js standalone |
The Go binaries are statically compiled, CGO_ENABLED=0. Alpine is the
base for smallest footprint.
## Image retention
Gitea does **not auto-prune** images. Every `:<sha>` tag accumulates
forever. The package page at
`https://gitea.treytartt.com/admin/-/packages/container/honeydue-api`
lists them all.
At current pace (deploys ~few/week, images ~50-150 MB each), this grows
~10 GB/year. Not critical; 80 GB node disk can take years.
**TODO**: Add a monthly cleanup: delete all but last 30 tags per image.
Can be a cron job or a manual quarterly cleanup.
## Image verification — not yet
We do not sign images or verify signatures. An attacker who compromised
Gitea could push a malicious image under an existing tag (though Gitea
should prevent tag reuse if immutable tags are configured).
**TODO** (Chapter 20): Add [cosign](https://github.com/sigstore/cosign)
for signing at build time + `Kyverno` or `Connaisseur` policy to verify
at pull time.
## Gitea registry itself
The Gitea instance runs outside this k3s cluster on its own VPS
(operator's existing infrastructure). It's **not** part of the honeyDue
deployment — it's adjacent infrastructure.
If the Gitea host goes down:
- Currently-running pods keep working (they already pulled their images)
- New deployments/scale-ups fail at the image-pull step
- No impact on existing user traffic
This is an acceptable external dependency. Gitea host has its own
uptime story.
## Cost
**$0/mo.** Gitea registry is included in the Gitea install we already
pay the VPS for (not accounted to honeyDue's cost).
If we ever switched to GHCR, cost would still be $0 for public images
or bundled with our (nonexistent) GitHub Team subscription.
## What we don't have
- **Image scanning** (Trivy, Snyk) — scan images for known CVEs on push
- **Image signing** (cosign)
- **Multi-region replication** — only hosted in one place
- **High availability** — Gitea is single-instance
For our scale, none of these are needed. TODO (Chapter 20) if the
operator appetite increases.
## Operator cheat sheet
```bash
# List packages via API
curl -sS "https://gitea.treytartt.com/api/v1/packages/admin?type=container" \
-H "Accept: application/json" | jq .
# Browse in UI
# https://gitea.treytartt.com/admin/-/packages
# Delete a specific tag via API
curl -X DELETE \
-H "Authorization: token $GITEA_PAT" \
"https://gitea.treytartt.com/api/v1/packages/admin/container/honeydue-api/237c6b8"
# Login from kubectl side (refresh the Secret)
kubectl create secret docker-registry gitea-credentials -n honeydue \
--docker-server=gitea.treytartt.com \
--docker-username=admin \
--docker-password=<new PAT> \
--dry-run=client -o yaml | kubectl apply -f -
# After rotating PAT, restart pods that use it for pulls
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
```
## References
- [Gitea Container Registry][gitea-cr]
- [Docker buildx multi-platform][buildx]
- [Kubernetes image pull secrets][pull-secrets]
- [cosign][cosign]
[gitea-cr]: https://docs.gitea.com/usage/packages/container
[buildx]: https://docs.docker.com/build/buildx/
[pull-secrets]: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
[cosign]: https://github.com/sigstore/cosign
+317
View File
@@ -0,0 +1,317 @@
# 12 — Data Flow
## Summary
This chapter follows a user's request end to end, hop by hop. It's the
consolidated picture of Chapters 3, 6, 7, 8, 9 working together. Use
this chapter to answer "when X doesn't work, which layer failed?"
## Scenario: User creates a task
A user in Austin opens the mobile app and adds a new task for their
property. The client sends `POST https://api.myhoneydue.com/api/tasks/`
with a JSON body and an auth token. We trace every hop.
## Hop 1 — Mobile client → Cloudflare edge
```mermaid
sequenceDiagram
participant App as iOS client
participant DNS as Local DNS
participant CFE as Cloudflare edge (DFW)
App->>DNS: Resolve api.myhoneydue.com
DNS->>App: 104.21.13.7 (Cloudflare edge IP)
App->>CFE: TCP SYN :443
CFE-->>App: TCP SYN+ACK
App->>CFE: TLS ClientHello
CFE->>App: TLS ServerHello + cert
Note over App,CFE: TLS 1.3 handshake<br/>~1 RTT
App->>CFE: HTTP/2 stream<br/>POST /api/tasks/<br/>Authorization: Token <xxx>
```
- Client resolves `api.myhoneydue.com` via OS resolver, gets Cloudflare
edge IP (not our origin IP)
- Client establishes TLS 1.3 to CF's nearest POP (Dallas for Austin)
- Cert presented by CF is `sni.cloudflaressl.com` or a CF-issued
`*.myhoneydue.com` — our origin cert is never seen by the client
- Latency: ~515 ms Austin → DFW
## Hop 2 — Cloudflare edge → Origin (hetzner)
```mermaid
sequenceDiagram
participant CFE as Cloudflare DFW POP
participant DNS as CF internal DNS
participant HN as hetzner node (random of 3)
participant Traefik as Traefik pod<br/>(host network)
CFE->>DNS: Which origin for api.myhoneydue.com?
DNS->>CFE: One of 178.104.247.152, 178.105.32.198, 178.104.249.189
CFE->>HN: TCP SYN :80
HN-->>CFE: SYN+ACK
CFE->>HN: HTTP/1.1 POST /api/tasks/<br/>Host: api.myhoneydue.com<br/>X-Forwarded-For: <user IP><br/>X-Forwarded-Proto: https<br/>CF-Connecting-IP: <user IP>
Note over HN: UFW: allow 80/tcp from<br/>anywhere (anywhere for now)
HN->>Traefik: delivered to listener
```
- CF picks one of the 3 node IPs via DNS round-robin. This is per-connection, not per-request.
- Protocol between CF and origin: **HTTP/1.1 plaintext** (SSL=Flexible).
A future Full-strict upgrade would make this HTTPS.
- Latency: ~90120 ms DFW → Nuremberg
- CF adds headers: `CF-Connecting-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`
## Hop 3 — Traefik → api Service
```mermaid
sequenceDiagram
participant Traefik as Traefik pod
participant CoreDNS as CoreDNS (10.43.0.10)
participant KP as kube-proxy IPVS<br/>(kernel)
participant APIPod as api pod<br/>(some node)
Note over Traefik: Match Host: api.myhoneydue.com<br/>→ honeydue-api Ingress<br/>→ backend: api Service :8000
Traefik->>CoreDNS: Resolve "api"
CoreDNS->>Traefik: 10.43.167.83 (Service ClusterIP)
Traefik->>KP: TCP SYN to 10.43.167.83:8000
KP->>KP: IPVS: pick endpoint<br/>from Service endpoint set
KP->>APIPod: Rewrite destination<br/>to 10.42.2.6:8000<br/>(Flannel VXLAN if remote node)
```
- Traefik resolves `api` via CoreDNS → gets the Service ClusterIP
- Traefik sends to `10.43.167.83:8000`
- kube-proxy IPVS (running in-kernel on the node where Traefik lives)
intercepts, picks a live endpoint, rewrites
- Destination might be local (same node) or remote (VXLAN tunnel to
another node)
- Latency: <3 ms even cross-node
## Hop 4 — api → Postgres (Neon)
```mermaid
sequenceDiagram
participant API as api pod (Go)
participant Resolv as Pod resolv.conf
participant Neon as Neon pooler<br/>AWS us-east-1
API->>Resolv: Resolve ep-floral-truth-...-pooler.us-east-1.aws.neon.tech
Note over Resolv: Goes to CoreDNS<br/>which forwards to upstream<br/>(Hetzner's DNS, then public root)
Resolv->>API: Neon pooler IP (e.g., 34.206.177.121)
API->>Neon: TCP :5432
API->>Neon: TLS 1.3 handshake (DB_SSLMODE=require)
API->>Neon: Postgres startup (user, database)
API->>Neon: BEGIN<br/>SELECT ... FROM task_task WHERE residence_id = ?<br/>INSERT INTO task_task (...) VALUES (...)<br/>COMMIT
Neon-->>API: Query results
```
- Go's database/sql pool may already have an idle connection. If so,
skip handshake.
- If new connection: ~50 ms TLS handshake + Postgres startup
- Query itself: typically ~520 ms (single-row read/write on indexed
columns)
- Total for this hop: often <10 ms on a warm connection, ~80 ms cold
## Hop 5 — api → Redis (cache miss invalidation)
```mermaid
sequenceDiagram
participant API as api pod
participant CoreDNS
participant KP as kube-proxy
participant Redis as redis pod
API->>CoreDNS: Resolve "redis"
CoreDNS->>API: 10.43.7.10
API->>KP: TCP :6379
KP->>Redis: rewritten to 10.42.x.y:6379
API->>Redis: DEL tasks:user:<user_id> (invalidate cached list)
Redis-->>API: OK
```
- Redis connection is usually kept alive in the api's pool
- Latency: <1 ms (Redis is on hetzner2, usually a short hop)
## Hop 6 — api → worker (enqueue side effect)
For some task creation events, api enqueues a background job
(send-notification, update-lookup-table, etc.):
```mermaid
sequenceDiagram
participant API as api pod
participant Redis as redis pod (acting as Asynq queue)
participant Worker as worker pod
API->>Redis: RPUSH asynq:queue:default <job JSON>
Redis-->>API: OK
Note over API,Worker: (Async, no response blocking)
Worker->>Redis: BLPOP asynq:queue:default
Redis-->>Worker: <job JSON>
Worker->>Worker: Process job<br/>(send email, push, etc.)
```
api returns to the caller without waiting for the job.
## Hop 7 — Response back to user
Reverse the path:
1. api returns JSON response to Traefik
2. Traefik returns to Cloudflare
3. Cloudflare re-encrypts TLS to user
4. User receives response
## End-to-end latency budget
For a typical "create task" operation:
| Hop | Latency |
|---|---|
| User → CF (Austin → DFW) | 515 ms |
| CF → hetzner (cross-Atlantic) | 90120 ms |
| UFW + kernel + Traefik accept | <1 ms |
| Traefik → api (same or cross-node) | 13 ms |
| api request parsing, auth validation | 13 ms |
| api → Postgres (query) | 2060 ms |
| api → Redis (invalidate) | <1 ms |
| api response generation | 15 ms |
| Return path | same as forward, reversed |
**Total**: ~220310 ms typical. Dominated by the cross-Atlantic CF→origin
hop and the Postgres query round trip.
## Read path (GET /api/tasks/)
Similar but simpler:
```mermaid
sequenceDiagram
participant App as iOS client
participant CF as Cloudflare
participant Traefik
participant API as api pod
participant Redis
participant Neon
App->>CF: GET /api/tasks/
CF->>Traefik: (no cache hit)
Traefik->>API: Route via Service
API->>Redis: GET tasks:user:<user_id>
alt Cache hit
Redis-->>API: cached JSON
else Cache miss
API->>Neon: SELECT ...
Neon-->>API: rows
API->>Redis: SET tasks:user:<user_id> <json> EX 300
end
API-->>Traefik: 200 JSON
Traefik-->>CF: 200
CF-->>App: 200 (may cache per response headers)
```
## Admin panel data flow
A different dance because the admin is Next.js:
```mermaid
sequenceDiagram
participant Browser
participant CF
participant Traefik
participant Admin as admin pod (Next.js)
participant AdminAPI as api pod<br/>(via public URL)
participant Neon
Browser->>CF: GET admin.myhoneydue.com/users
CF->>Traefik: HTTP :80
Traefik->>Admin: Service /users
Note over Admin: Next.js SSR:<br/>fetch from NEXT_PUBLIC_API_URL
Admin->>CF: GET api.myhoneydue.com/api/admin/users/
CF->>Traefik: (api ingress)
Traefik->>AdminAPI: Service
AdminAPI->>Neon: SELECT ... FROM auth_user
Neon-->>AdminAPI: rows
AdminAPI-->>Admin: JSON
Admin->>Admin: Render HTML
Admin-->>Traefik: HTML
Traefik-->>CF: HTML
CF-->>Browser: HTML
```
Notably, the admin pod's calls to api go **back out to Cloudflare** and
in through the public URL. Not the in-cluster Service IP. This is
because `NEXT_PUBLIC_API_URL=https://api.myhoneydue.com` — Next.js builds
use the same URL for browser-side and server-side fetches.
This is **suboptimal** — server-side (SSR) calls could use the internal
`api.honeydue.svc:8000` URL and skip the CF round-trip. Future
optimization: separate `NEXT_PUBLIC_API_URL` (browser) from `API_URL`
(server-side).
## Static asset flow
For the marketing landing page at `https://myhoneydue.com/`:
1. CF caches HTML per `Cache-Control` (the Go app sets short TTLs)
2. CF caches CSS / JS / images aggressively (via default CF rules)
3. First request hits origin, subsequent requests served from CF edge
The static assets live inside the api container at `/app/static/`.
Served by Echo's static file handler at routes `/css`, `/js`, `/images`.
## Request flow during a rolling update
When a new api image is deployed, some requests will hit old pods and
some will hit new pods for a few minutes:
```mermaid
sequenceDiagram
participant CF
participant Traefik
participant OldPod as api pod v1
participant NewPod as api pod v2 (starting)
Note over NewPod: kubelet starts new pod
Note over NewPod: pod connects to Postgres<br/>MigrateWithLock runs (no-op)<br/>HTTP server starts<br/>readinessProbe passes
Note over NewPod: kube-proxy updates endpoints<br/>NewPod added to Service pool
CF->>Traefik: request 1
Traefik->>OldPod: routed (old pod still in pool)
CF->>Traefik: request 2
Traefik->>NewPod: routed (new pod now in pool)
Note over OldPod: Kubelet terminates old pod<br/>(graceful SIGTERM, then SIGKILL after grace)
CF->>Traefik: request 3
Traefik->>NewPod: routed (OldPod gone from pool)
```
Both old and new handle traffic simultaneously until the rolling update
completes. As long as the new code is API-compatible, users don't
notice.
## Failure modes in the data path
See [Chapter 16 — Failure Modes](./16-failure-modes.md) for a full
catalog.
Quick summary:
| Layer fails | User sees | Recovery |
|---|---|---|
| Cloudflare DNS down | Can't resolve api.myhoneydue.com | Manual DNS fallback; extremely rare |
| Cloudflare edge down (single POP) | Slow, CF routes to another POP | Automatic |
| Node NIC fails | Some requests time out (CF routes away) | Cluster reschedules pods |
| UFW misconfig blocks :80 | 521 errors at CF | Re-add rule |
| Traefik pod down on one node | CF routes to other nodes | Automatic |
| kube-proxy broken on one node | Pods on that node can't reach Services | Restart kubelet |
| CoreDNS down | New connections fail DNS | Restart CoreDNS |
| Flannel broken between nodes | Cross-node pod communication fails | Restart flannel or node |
| api pod OOM | 502 to user briefly | kubelet restarts pod |
| Postgres down | 500 errors from api | Neon-side issue; outage |
| Redis down | api serves without cache (degraded) | Restart Redis pod |
| B2 down | Uploads fail, existing content served if cached | Backblaze-side outage |
## References
- [Chapter 3 — Networking](./03-networking.md) for the overlay mechanics
- [Chapter 6 — Traefik](./06-traefik-ingress.md) for routing details
- [Chapter 7 — Services](./07-services.md) for per-service specifics
- [Chapter 16 — Failure Modes](./16-failure-modes.md) for what-if scenarios
+344
View File
@@ -0,0 +1,344 @@
# 13 — Cloudflare
## Summary
Cloudflare sits in front of every public request. It provides DNS
(authoritative nameservers for `myhoneydue.com`), TLS termination at
the edge, DDoS mitigation, caching, and the round-robin fan-out across
our three node IPs. We use the Free plan. TLS mode is "Flexible"
(HTTP between CF and origin). This chapter documents every Cloudflare
setting that matters.
## DNS
### Zone
`myhoneydue.com`, managed by Cloudflare. Authoritative nameservers:
```
carol.ns.cloudflare.com
ishaan.ns.cloudflare.com
```
### Records that matter
| Type | Name | Content | Proxy | Notes |
|---|---|---|---|---|
| A | `api` | 178.104.247.152 | 🟠 Proxied | hetzner1 |
| A | `api` | 178.105.32.198 | 🟠 Proxied | hetzner2 |
| A | `api` | 178.104.249.189 | 🟠 Proxied | hetzner3 |
| A | `admin` | 178.104.247.152 | 🟠 Proxied | same 3 IPs |
| A | `admin` | 178.105.32.198 | 🟠 Proxied | |
| A | `admin` | 178.104.249.189 | 🟠 Proxied | |
| A | `@` | 178.104.247.152 | 🟠 Proxied | same 3 IPs |
| A | `@` | 178.105.32.198 | 🟠 Proxied | |
| A | `@` | 178.104.249.189 | 🟠 Proxied | |
Three A records per name → Cloudflare selects one per request. With
proxying on (orange cloud), **the client never sees these IPs** — it
sees a Cloudflare edge IP. CF internally picks which of the three
origin IPs to connect to; if one fails the connection, CF retries the
next.
**TXT records for email** (Fastmail sending domain): SPF, DKIM, DMARC.
Not our immediate concern; configured by the Fastmail custom-domain
setup.
### Why three A records per name, not one
With one record pointing at hetzner1:
- Only hetzner1 sees traffic
- If hetzner1 is unreachable, everything breaks until we change DNS
With three records:
- CF chooses one origin per connection
- If one node's port :80 stops responding, CF tries the others
- Node upgrades can be done one at a time with no user impact
This is poor-man's load balancing. A Hetzner Load Balancer or Cloudflare
Load Balancer (paid) would be more sophisticated — with active health
checks and automatic failover on sub-second latency. Our DNS approach
is "good enough" for the traffic volume.
### Cloudflare's origin health checks
On Free plan, CF doesn't actively probe origins. It reacts to real
connection failures: if an origin returns 5xx repeatedly or connection
times out, CF marks it unhealthy for that edge POP for some time.
Upgrading to **Cloudflare Load Balancing** ($5/mo add-on) would enable
active health checks — explicit probes independent of traffic. Useful
when you want sub-second failover.
## TLS
### Mode: Flexible
CF Dashboard → SSL/TLS → Overview → **Flexible**.
**What this means:**
- User ↔ Cloudflare: **TLS** (HTTPS)
- Cloudflare ↔ Origin: **plaintext HTTP** (port 80)
**Why we chose it:**
- No origin cert required on the Hetzner nodes
- Zero Traefik cert-management complexity
- Fine for a site where CF terminates all user-facing TLS
**Downsides:**
- An attacker with network access between CF and Hetzner could read
traffic. Realistically: nobody between CF's POPs and Hetzner's
Nuremberg DC, but it's theoretically plaintext on the wire.
- MitM risk if DNS gets hijacked and traffic is routed through an
unintended origin.
### Future: Full (strict)
The next step up is **Full (strict)**: CF verifies origin's TLS cert
and connects over HTTPS. Cloudflare provides free **Origin CA
certificates** for this: they're issued by a CF-internal CA that only
CF's own edge accepts. An attacker without a CF-signed cert can't
impersonate our origin.
Path to enable:
1. Generate Origin CA cert in CF dashboard → SSL/TLS → Origin Server
2. Download as PEM
3. Create k8s Secret `cloudflare-origin-cert`:
```bash
kubectl create secret tls cloudflare-origin-cert -n honeydue \
--cert=origin.crt --key=origin.key
```
4. Add `tls:` block to our Ingress:
```yaml
spec:
tls:
- hosts: [api.myhoneydue.com]
secretName: cloudflare-origin-cert
```
5. Switch CF SSL mode to Full (strict)
Trad-off: the `cloudflare-origin-cert` expires (default 15 years), so
low maintenance. **TODO** (Chapter 20).
### Edge certificate
CF provides a free edge certificate for `*.myhoneydue.com` and
`myhoneydue.com`. Auto-renewed by Cloudflare. We don't touch it.
### Always Use HTTPS
SSL/TLS → Edge Certificates → **Always Use HTTPS: On** (default).
Redirects any HTTP → HTTPS at the CF edge. Clients that hit
`http://api.myhoneydue.com/*` get 301'd to `https://...`. Origin never
sees the HTTP request.
### HSTS
**Not currently enabled.** HSTS (HTTP Strict Transport Security) sends
a header telling browsers "always use HTTPS for this domain." Once set
with long `max-age`, it's **permanent** until it expires — if we later
misconfigure TLS, HSTS-enabled browsers refuse to connect at all.
Enabling HSTS is a TODO but requires confidence in our TLS stability.
Not tonight.
## DDoS mitigation
CF's Free plan includes basic DDoS protection:
- Volumetric attacks absorbed at the edge
- Obvious bot patterns blocked (known-bad user agents, headless browsers
doing suspicious things)
Under a large attack, CF might:
- Insert a "checking your browser" JavaScript challenge (the ~5-second
"Cloudflare is checking your browser" page)
- Rate-limit by IP
Under a sustained, sophisticated attack we might need:
- CF Pro plan ($20/mo) for more rule customization
- Enterprise plan for negotiated protection
- Extra measures like Cloudflare Magic Transit
So far, not needed.
## Caching
Default CF caching:
- Static assets (CSS, JS, images) cached aggressively based on extension
- HTML pages honored per `Cache-Control` headers from origin
- JSON API responses typically not cached (no `Cache-Control: public`)
Our Go API doesn't set `Cache-Control: public` on any endpoint, so CF
treats them as uncacheable. Every API call reaches origin.
If we wanted to cache certain endpoints (e.g., public lookup tables):
```go
c.Response().Header().Set("Cache-Control", "public, max-age=300")
```
And CF will cache for 5 minutes.
## Firewall rules at CF
CF Dashboard → Security → WAF. On Free tier:
- Managed rules: a small free allowlist of "obvious-attack" patterns
- Custom rules: limited (5 on Free, 20 on Pro)
We have **no custom rules defined** currently. The managed ruleset
covers:
- SQL injection attempts in query strings
- Known-vulnerable bot User-Agents
- XSS attempts in common parameters
## Rate limiting
CF Free: **10,000 requests per 10 minutes per IP for free rules** (we
haven't configured any). The API itself should have rate limits for
sensitive endpoints; we don't rely on CF for that.
## What CF does NOT do for us
- **Authenticate users** — our app does
- **Authorize requests** — our app does
- **Encrypt pod-to-pod traffic** — nothing Cloudflare can help with
- **Backup origin data** — CF caches but doesn't store copies
persistently
## Turnstile / bot management
Not enabled. If we start seeing account-creation spam, Cloudflare
Turnstile (free) would be a good addition — a CAPTCHA replacement that
doesn't require user interaction for most traffic.
## Origin IP protection
CF proxying (orange cloud) is the primary protection of our origin IPs.
When proxying is on:
- DNS queries return CF edge IPs, never origin
- HTTP/HTTPS traffic goes through CF
However, our origin IPs **can leak** via:
- Email sending (if the app ever sent email directly from the origin IP)
— we use Fastmail so this isn't an issue
- Outbound connections (our pods connect out to Neon, B2, Fastmail from
the nodes' public IPs; those IPs appear in external logs)
- Historical DNS records (services like SecurityTrails log historical
DNS; if we ever had unproxied A records, attackers can look them up)
**If origin IPs leak**, attackers can bypass CF's protection by
connecting directly to node IPs. Current mitigation:
- UFW only allows :80/:443 from anywhere
- Our app has no ports bound to the public IP
**Future** (Chapter 20): UFW rule to allow :80/:443 only from CF IP
ranges. Prevents direct-connect bypass entirely.
## Cloudflare IP ranges (used in Traefik trustedIPs)
From [cloudflare.com/ips](https://www.cloudflare.com/ips/):
IPv4 ranges:
```
173.245.48.0/20
103.21.244.0/22
103.22.200.0/22
103.31.4.0/22
141.101.64.0/18
108.162.192.0/18
190.93.240.0/20
188.114.96.0/20
197.234.240.0/22
198.41.128.0/17
162.158.0.0/15
104.16.0.0/13
104.24.0.0/14
172.64.0.0/13
131.0.72.0/22
```
IPv6 ranges:
```
2400:cb00::/32
2606:4700::/32
2803:f800::/32
2405:b500::/32
2405:8100::/32
2a06:98c0::/29
2c0f:f248::/32
```
These are used in two places:
1. **Traefik `forwardedHeaders.trustedIPs`** — we already have this
configured (Chapter 6)
2. **UFW `allow 80/tcp from <cf-range>`** — NOT configured (TODO)
CF occasionally adds new ranges. If a future CF range isn't in our
list, we'd either trust unknown IPs (if lax) or reject legitimate CF
traffic (if strict). The canonical source is the public API:
```bash
curl -sS https://www.cloudflare.com/ips-v4
curl -sS https://www.cloudflare.com/ips-v6
```
## API token for programmatic changes
If we automate DNS changes (e.g., adding new subdomain on deploy),
we'd need a CF API token with `Zone:DNS:Edit` scope for the
`myhoneydue.com` zone.
Currently not automated; DNS is managed in the CF dashboard by hand.
## Cost
**$0/mo**. Free plan covers everything we use. Paid plans add features
we don't need yet:
| Feature | Free | Pro ($20) | Business ($200) |
|---|---|---|---|
| DNS + proxying | ✓ | ✓ | ✓ |
| Basic DDoS | ✓ | ✓ | ✓ |
| SSL (edge + Flexible + Full + Full strict) | ✓ | ✓ | ✓ |
| WAF managed rules | ✓ (limited) | ✓ (more) | ✓ (all) |
| Custom firewall rules | 5 | 20 | 100 |
| Page Rules | 3 | 20 | 50 |
| Image Resizing | no | no | ✓ |
| Load Balancing | no | $5/mo add-on | ✓ |
We'd consider Pro ($20/mo) if:
- We needed a custom WAF rule beyond the 5-rule limit
- We wanted Image Resizing for user-uploaded photos
Neither is needed today.
## Operator cheat sheet
```bash
# Query current CF-served DNS
dig +short @1.1.1.1 api.myhoneydue.com # returns CF edge IPs when proxied
# Query our origin directly (bypass CF)
curl -sS -H "Host: api.myhoneydue.com" http://178.104.247.152/api/health/
# Check CF headers (confirm you're going through CF)
curl -sS -I https://api.myhoneydue.com/api/health/ | grep -i cf-
# Purge CF cache (requires API token)
curl -X POST \
-H "Authorization: Bearer $CF_TOKEN" \
-H "Content-Type: application/json" \
"https://api.cloudflare.com/client/v4/zones/<zone_id>/purge_cache" \
-d '{"purge_everything":true}'
```
## References
- [Cloudflare IP ranges][cf-ips]
- [Cloudflare SSL modes explained][cf-ssl]
- [Origin CA certificates][cf-origin-ca]
- [Cloudflare DNS best practices][cf-dns]
[cf-ips]: https://www.cloudflare.com/ips/
[cf-ssl]: https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/
[cf-origin-ca]: https://developers.cloudflare.com/ssl/origin-configuration/origin-ca/
[cf-dns]: https://developers.cloudflare.com/dns/
+433
View File
@@ -0,0 +1,433 @@
# 14 — Deployment Process
## Summary
A production deploy is: build a new image, push to Gitea, update the
Deployment's image field with the new SHA, Kubernetes rolls new pods in.
No downtime if the change is backward-compatible. Rollback is
`kubectl rollout undo`. This chapter walks through the full process,
plus alternate paths (config-only changes, manifest changes, hotfixes).
## TL;DR for a code change
```bash
# 1. Commit + get SHA
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)
# 2. Login to Gitea registry
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
# 3. Build + push amd64 image
docker buildx build --platform linux/amd64 --target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
# 4. Roll it in
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
# 5. Watch
kubectl rollout status -n honeydue deployment/api
# 6. Log out
docker logout "$REGISTRY"
```
~35 minutes end to end for api.
## The build
### Step 1 — Prepare
```bash
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git status # clean working tree?
git log -1 --oneline # this is the SHA that'll ship
```
### Step 2 — Login to Gitea
```bash
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | \
docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
```
**Note**: `docker login` without `--password-stdin` writes the token to
shell history. Don't skip the `printf` trick.
### Step 3 — Build + push
```bash
SHA=$(git rev-parse --short HEAD)
# For API
docker buildx build \
--platform linux/amd64 \
--target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
--push .
# For Worker
docker buildx build \
--platform linux/amd64 \
--target worker \
-t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" \
--push .
# For Admin (Next.js)
docker buildx build \
--platform linux/amd64 \
--target admin \
-t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" \
--push .
```
- `--platform linux/amd64` — cross-compile from operator's arm64 to
Hetzner nodes' amd64
- `--target X` — select a stage from the multi-stage Dockerfile
- `--push` — push to registry in one step; don't leave image in local
Docker
First build is slow (~35 min cold). Subsequent builds hit BuildKit
layer cache and complete in ~3060s if only app code changed.
### Build platform note
If `docker buildx` isn't configured:
```bash
docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap
```
This creates a BuildKit container that supports cross-platform builds.
The `--bootstrap` line spins it up immediately so errors surface now
instead of on first build.
## The deploy
### For a single service
```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
```
This updates the Deployment's image field. Kubernetes:
1. Creates a new ReplicaSet with the new image (annotation records
rev)
2. Starts a new pod (per `maxSurge: 1`)
3. Waits for readinessProbe to pass on the new pod (up to 240s for
cold api boot)
4. Once ready, removes a pod from the old ReplicaSet
5. Repeats until all pods are on the new ReplicaSet
6. Marks rollout complete
### Watching the rollout
```bash
kubectl rollout status -n honeydue deployment/api
```
Outputs progress; returns when complete or timed out. Default timeout
is 10 minutes.
More detailed:
```bash
# Watch pods transition
kubectl get pods -n honeydue -l app.kubernetes.io/name=api -w
# Watch events
kubectl get events -n honeydue --sort-by=.lastTimestamp -w
```
### For all three services
```bash
for svc in api worker admin; do
kubectl set image deployment/$svc -n honeydue \
$svc="gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done
# Watch all rollouts
for svc in api worker admin; do
kubectl rollout status -n honeydue deployment/$svc
done
```
## Config-only changes (no new image)
When you change `prod.env` but code is unchanged:
```bash
# 1. Update prod.env locally
# 2. Regenerate ConfigMap
kubectl create configmap honeydue-config -n honeydue \
--from-env-file=deploy/prod.env \
--dry-run=client -o yaml | kubectl apply -f -
# 3. Pods do NOT auto-reload env vars. Restart them.
kubectl rollout restart -n honeydue deployment/api deployment/admin deployment/worker
```
`rollout restart` triggers a rolling update with the *same* image but
forces pod recreation. New pods pick up the updated ConfigMap.
### Why not auto-reload?
Kubernetes has no built-in mechanism to restart pods on ConfigMap change.
There's no `envFromWatch` equivalent. Third-party operators like
Reloader can do it, but we don't run one.
For sensitive config (like the `SECRET_KEY`), this is actually good —
pods don't cycle unexpectedly when someone tweaks the ConfigMap.
## Secret changes
Same flow as config:
```bash
# Rotate a value
kubectl patch secret honeydue-secrets -n honeydue \
--type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(echo -n 'newvalue' | base64)\"}}"
# Restart pods
kubectl rollout restart -n honeydue deployment/api deployment/worker
```
## Manifest changes
When you add/modify a deployment YAML:
```bash
kubectl apply -f deploy-k3s/manifests/api/deployment.yaml
```
If the change is a spec field that Kubernetes considers a new pod
template (e.g., changing resource limits, env, volumes), pods roll.
If the change is a scalar like replicas, no pod churn — just new pods
added/removed.
## Rollback
### Last-known-good rollback
```bash
kubectl rollout undo deployment/api -n honeydue
```
Reverts to the previous ReplicaSet (the one with the previous image).
Takes ~30s to stabilize.
### Rollback to a specific revision
```bash
# See revision history
kubectl rollout history deployment/api -n honeydue
# Revert to specific revision number
kubectl rollout undo deployment/api -n honeydue --to-revision=3
```
Kubernetes keeps up to 10 ReplicaSet revisions by default
(`spec.revisionHistoryLimit`).
### Hard rollback (deploy an older image)
```bash
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:<older-sha>"
```
Useful when you want to go back further than the revision history, or
to a specific known-good SHA.
## Rolling update semantics
```yaml
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
```
For api (3 replicas):
- `maxUnavailable: 0` — no pod is removed until replacement is ready
- `maxSurge: 1` — up to 4 pods exist simultaneously during rollout
Timeline (approximate, warm state):
- t=0: kubectl set image
- t=0: k8s creates new RS with 1 pod
- t=30s (or so): new pod readiness probe passes
- t=30s: k8s terminates 1 old pod
- t=60s: next new pod ready
- t=60s: another old pod terminates
- ...continues until all on new RS
For cold-boot (e.g., first deploy on a rebuilt cluster), the
MigrateWithLock advisory lock extends this to several minutes. But the
rollout is serialized — only one pod starts per iteration, so the lock
queue is small.
## Hotfix workflow
When we need to ship a fix fast and skip the usual steps:
1. Fix in code
2. Build + push
3. `kubectl set image` on the affected service only
4. Monitor with `kubectl logs -f`
Don't skip CI/tests in a real org; for solo operator this is the tradeoff.
## Integration with Gitea
Currently no CI/CD. The operator builds from the workstation and pushes
manually. Future:
- Gitea Actions (Drone-like CI) could trigger on push to `main`
- Build + push step could run in a GitHub Actions-compatible workflow
- Auto-deploy on tag push, manual promote to prod
**TODO** (Chapter 20).
## What the old Swarm deploy script did
Contrast: `deploy/scripts/deploy_prod.sh` (Swarm-era) did:
1. Validate every config file (placeholder detection, APNS key format,
B2 all-or-none)
2. Buildx to amd64
3. Push to Gitea (we retrofitted this from GHCR)
4. SCP bundle to manager node
5. `docker secret create` + `docker config create` with versioned names
6. `docker stack deploy --with-registry-auth`
7. Poll stack services until convergence (420s timeout)
8. Prune old secret/config versions
9. Healthcheck the final URL; auto-rollback on failure
10. Log out of registries
Our current k3s deploy is more manual but simpler. We'd write a similar
script for k3s if deploys become frequent:
```bash
# deploy-k3s/scripts/04-deploy.sh (not yet updated for Gitea)
```
See the scaffold in `deploy-k3s/scripts/`.
## Common deploy failures
| Symptom | Likely cause |
|---|---|
| `ImagePullBackOff` | Image not in registry, or pull secret expired |
| Stuck at "Progressing" | Readiness probe not passing; check pod logs |
| `CrashLoopBackOff` immediately | App won't start; check pod logs for panic/exit reason |
| `CrashLoopBackOff` after migration | Cache service, Redis connection, or post-init code issue |
| Old pods never terminate | New pods not ready; rollout doesn't progress |
| Rollout succeeds but app is broken | Readiness probe is too lenient; passes on broken app |
### Debugging commands
```bash
# Describe the deployment (shows events, conditions)
kubectl describe deployment api -n honeydue
# Describe the latest pod
kubectl describe pod -n honeydue -l app.kubernetes.io/name=api
# Logs from currently-running pods
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100 --prefix
# Logs from the last-terminated pod
kubectl logs -n honeydue <pod> --previous
# Events in the namespace (newest first)
kubectl get events -n honeydue --sort-by=.lastTimestamp
# Pause a rollout (stops new pods from being created)
kubectl rollout pause deployment/api -n honeydue
# Resume
kubectl rollout resume deployment/api -n honeydue
```
## Zero-downtime considerations
For zero-downtime deploys, the new image must be:
1. **Backward-compatible** with the current database schema (schema
migrations run before new code)
2. **Backward-compatible** with in-flight API requests (don't remove
endpoints mid-deploy; deprecate first)
3. **Backward-compatible** with Redis data structures (don't change
cache key formats abruptly)
For breaking changes:
1. Deploy intermediate version that handles both old and new
2. Once rolled out everywhere, deploy breaking-change version
3. Two deploys, same day or different days
We don't have this discipline yet; our API has too few clients to
worry about. As mobile clients proliferate, this becomes more important.
## Blue-green / canary (not yet)
Kubernetes supports advanced rollout strategies:
- **Canary**: route 5% of traffic to new version, scale up gradually
- **Blue-green**: run new version alongside old, flip traffic all at
once
These require Traefik's TraefikService CRD with weighted routing, or
a service mesh. **TODO** if traffic scale justifies.
## Cleanup: the old Swarm config
`deploy/` directory contains the Swarm-era config. It's still there but
unused. After we're confident in k3s (a few weeks? month?), remove it:
```bash
rm -rf deploy/
```
Keep the useful files in `deploy-k3s/` only.
## Operator cheat sheet
```bash
# Full build + deploy
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u admin --password-stdin
docker buildx build --platform linux/amd64 --target api -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
docker buildx build --platform linux/amd64 --target worker -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
docker buildx build --platform linux/amd64 --target admin -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
docker logout gitea.treytartt.com
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
for svc in api worker admin; do
kubectl set image deployment/$svc -n honeydue "$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done
for svc in api worker admin; do
kubectl rollout status -n honeydue deployment/$svc
done
```
## References
- [Kubernetes Deployment rolling update][rolling]
- [kubectl rollout][rollout]
- [Docker buildx][buildx]
[rolling]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
[rollout]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#rollout
[buildx]: https://docs.docker.com/build/buildx/
+305
View File
@@ -0,0 +1,305 @@
# 15 — Observability
## Summary
We have minimal observability today: `kubectl logs`, `kubectl top`,
Cloudflare Analytics, and the Neon dashboard. No Prometheus, no Grafana,
no centralized log aggregator, no APM. This is adequate for the
current traffic volume (low) but is a known gap. This chapter documents
what we *have* and what we'd add as traffic grows.
## What we have
### 1. `kubectl logs`
Every container's stdout/stderr is captured by containerd and readable
via kubectl:
```bash
# Live tail from all api pods
kubectl logs -n honeydue -l app.kubernetes.io/name=api -f --prefix
# Last 100 lines
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100
# Previous pod's logs (before the most recent restart)
kubectl logs -n honeydue <pod-name> --previous
# Events (not logs — k8s-level state changes)
kubectl get events -n honeydue --sort-by=.lastTimestamp
```
**Retention**: containerd rotates logs when they exceed 10 MB (default).
Only the last ~20 MB of logs is retained per container, on-disk on the
node. Once a pod is deleted, its logs are gone.
For persistent log access we'd need aggregation (see §what we'd add).
### 2. `kubectl top`
Pod and node resource usage via metrics-server:
```bash
kubectl top nodes
# NAME CPU(cores) CPU(%) MEMORY(bytes) MEMORY(%)
# ubuntu-8gb-nbg1-1 169m 4% 748Mi 9%
# ubuntu-8gb-nbg1-2 229m 5% 1043Mi 13%
# ubuntu-8gb-nbg1-3 124m 3% 770Mi 9%
kubectl top pods -n honeydue
```
**Retention**: In-memory only. Last few minutes of data. No
historical view.
### 3. Cloudflare Analytics
CF Dashboard → Analytics & Logs. Per-zone stats:
- Requests per second
- Bandwidth
- Cache hit ratio
- Top HTTP status codes
- Top request paths
- Bot traffic score
All aggregated, no individual request traces. Good for spotting macro
trends ("suddenly 10× more 502s today"), poor for debugging specific
issues.
Free tier retention: 7 days of aggregate stats. Pro extends this.
### 4. Neon dashboard
Neon console → project → Monitoring:
- Compute utilization (CU-hours consumed)
- Query performance (slow queries)
- Active connections
- Storage usage
Good for "is the DB busy?" and "am I close to my free tier limit?"
Not real-time.
### 5. Kubernetes events
`kubectl get events` shows cluster-level state changes: pod scheduling,
failures, image pulls, probe failures. Useful for post-mortem on
deploys.
Retention: events are stored in etcd but default to 1 hour.
## What we don't have (the gap)
### No log aggregation
Individual pod logs are on the node. For multi-pod debugging ("show me
all api pod logs for user X") we have to:
```bash
# Query all at once with stern (if installed)
stern -n honeydue api
# Or for specific pod
kubectl logs -n honeydue <pod> | grep user_id=12345
```
This works but doesn't scale. Grep across 3 pods for a specific
user_id is OK. Across 30 pods, intractable.
**What we'd add**: [Loki](https://grafana.com/oss/loki/) — a lightweight
log aggregator designed for k8s. ~$0 to self-host; integrates with
Grafana for queries. Or [Betterstack](https://betterstack.com/logs)
($10/mo, hosted).
### No metrics/dashboards
`kubectl top` tells us "is this pod hot right now?" but not "has CPU
been climbing over the past hour?" We'd need:
- **Prometheus** — scrapes metrics from kubelet and pods' `/metrics`
endpoints, stores time series
- **Grafana** — queries Prometheus, renders dashboards
K3s can install these via Helm in ~10 minutes. Adds ~500MB RAM to the
cluster. Stability and operational load: moderate.
**Alternative**: [Kubernetes Dashboard](https://github.com/kubernetes/dashboard)
bundled with k3s (disabled by default). Minimal UI over the existing
metrics API. Cheaper than Prometheus but less queryable.
### No distributed tracing
"This request took 800ms — which hop was slow?" is currently unanswerable
beyond "the DB query, probably." A real trace would show:
- TLS handshake time
- Traefik routing time
- Go handler time
- Postgres query time
- Redis call time
- Each B2 request time
We'd add OpenTelemetry to the Go app and export to Jaeger/Tempo. Work
is moderate; value kicks in when we have complex request flows.
### No alerting
No PagerDuty, no Slack webhooks, no email on "api is returning 500s."
The operator finds out when users complain.
Cheapest fix: [Uptime Kuma](https://github.com/louislam/uptime-kuma)
(self-hosted) or Better Stack Uptime (free for small teams). Ping
`https://api.myhoneydue.com/api/health/` every minute; alert if it fails.
### No APM (Application Performance Monitoring)
No request-level profiling. We can't see "which endpoint has the highest
p99 latency?" or "which SQL query is hot this week?"
Options: Datadog, New Relic, Honeycomb, self-hosted Tempo+Grafana.
All are meaningful work to set up and cost $$$.
## The app's logging conventions
The Go app uses zerolog and emits structured JSON:
```json
{
"level": "info",
"time": "2026-04-24T05:29:40Z",
"caller": "/app/cmd/api/main.go:189",
"addr": ":8000",
"message": "HTTP server listening"
}
```
Log levels: `debug`, `info`, `warn`, `error`, `fatal`. Controlled by
`DEBUG=true|false` in ConfigMap (true sets level to debug, false sets
level to info).
Every request is logged with:
- Method, path, status code
- Request ID (for correlating logs across pods)
- User ID (if authenticated)
- Latency
```json
{
"level": "info",
"method": "GET",
"path": "/api/tasks/",
"status": 200,
"latency_ms": 42,
"user_id": 123,
"request_id": "a6b5db35-..."
}
```
This is queryable by grep. Better with log aggregation.
## Health endpoints
Each service exposes a health endpoint:
| Service | Endpoint | What it checks |
|---|---|---|
| api | `/api/health/` | Process alive (doesn't verify DB) |
| admin | `/` | Next.js is up |
| worker | (none public) | Internal Asynq status |
Health endpoints are **shallow** — they return 200 if the process is
running and listening. They don't try to reach Postgres/Redis/etc.
Rationale: if Postgres is briefly down, we don't want all api pods to
start failing liveness and cascade-restart.
## Dozzle (deprecated)
The Swarm era had [Dozzle](https://github.com/amir20/dozzle) — a
lightweight web UI for Docker logs. Accessible via SSH tunnel to the
manager node. Not deployed on k3s; `kubectl logs` + `stern` fills the
niche.
## Kubernetes metrics the k8s API exposes
Even without Prometheus, these are queryable:
```bash
# Resource metrics (via metrics-server)
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/honeydue/pods
# Core API (k8s state)
kubectl get --raw /api/v1/namespaces/honeydue/pods/<name>
# Kubelet metrics (per-node; requires tunneling)
kubectl get --raw /api/v1/nodes/<node>/proxy/metrics
```
If we ever spin up Prometheus, these are the endpoints it would scrape.
## Future: what to add and when
| Trigger | Add |
|---|---|
| 10k+ daily users | Loki + Grafana for logs |
| 100+ req/s sustained | Prometheus + Grafana for metrics |
| Performance incidents | OpenTelemetry tracing |
| Revenue > $5k/mo | Paid monitoring (Datadog or similar) |
| First production outage | Alerting to phone/Slack |
The overall philosophy: observability is an investment that compounds.
Add it before you need it, not after. But also don't over-invest at
idle.
**Next quarter**: set up Uptime Kuma + Loki at minimum.
## Checking what's installed
```bash
# In kube-system namespace
kubectl get pods -n kube-system
# Should see: coredns, metrics-server, traefik, local-path-provisioner,
# and some k3s-related helm install jobs
# In honeydue namespace
kubectl get pods -n honeydue
# api, admin, worker, redis
# No monitoring namespace (yet)
kubectl get namespaces
# default, honeydue, kube-node-lease, kube-public, kube-system
```
## Operator cheat sheet
```bash
# Tail all logs in the namespace
kubectl logs -n honeydue --all-containers=true --tail=50 -l app.kubernetes.io/part-of=honeydue
# With stern (if installed: brew install stern)
stern -n honeydue .
# Follow specific pod, including previous runs
kubectl logs -n honeydue <pod> -f --previous=false
# Pod resource usage
kubectl top pods -n honeydue --sort-by=memory
kubectl top pods -n honeydue --sort-by=cpu
# Events (cluster-wide)
kubectl get events -A --sort-by=.lastTimestamp | tail -20
# Full state dump for a pod (debugging)
kubectl describe pod -n honeydue <pod> > /tmp/pod-dump.txt
kubectl logs -n honeydue <pod> > /tmp/pod-logs.txt
```
## References
- [Kubernetes metrics-server][ms]
- [K3s metrics][k3s-metrics]
- [Loki][loki]
- [Stern (multi-pod log tail)][stern]
[ms]: https://github.com/kubernetes-sigs/metrics-server
[k3s-metrics]: https://docs.k3s.io/advanced#enabling-metrics-server
[loki]: https://grafana.com/oss/loki/
[stern]: https://github.com/stern/stern
+360
View File
@@ -0,0 +1,360 @@
# 16 — Failure Modes
## Summary
Every component in the system has a failure mode, a user-visible
symptom, and a recovery story. This chapter enumerates them from the
edge inward. Use this as a reference when debugging or when planning
resilience improvements.
## Failure catalog
### Cloudflare-level
#### CF edge POP outage
**Symptom**: users in one geographic region see errors; other regions
fine.
**Recovery**: automatic — CF routes traffic to next-nearest POP.
**Our action**: none; wait for CF.
**Frequency**: rare, usually resolved in minutes.
#### CF global outage (rare but has happened)
**Symptom**: the whole site unreachable via CF.
**Recovery**: manual — disable CF proxy (grey cloud DNS records), users
hit origins directly.
**Our action**: in Cloudflare dashboard, flip each A record's proxy off.
Users then resolve to our node IPs directly; UFW allows :80/:443 from
anywhere so they reach Traefik. TLS breaks (origin has no cert in SSL
Flexible mode), but HTTP works.
**Frequency**: extremely rare (hours-long event happens ~annually).
#### DNS hijacking
**Symptom**: users' DNS queries return attacker IPs; all traffic
compromised.
**Mitigation**: unlikely at CF; users who use DoH/DoT are protected.
No mitigation at our level.
**Recovery**: requires CF incident response.
### Node-level
#### One node's NIC fails
**Symptom**: Cloudflare's retry logic routes around it within seconds.
Users see a brief spike in latency as CF learns the IP is unhealthy.
Pods on that node get rescheduled to surviving nodes by Kubernetes
after `node-monitor-grace-period` (40s).
**Recovery**:
- Automatic pod rescheduling takes ~5 min (grace period + pod eviction)
- Dead node's Raft vote is missing; cluster stays up (2 of 3 quorum)
- Replace the node via Hetzner console when convenient
**Our action**: verify `kubectl get nodes` shows NotReady; check
Hetzner console to confirm the node's status; recreate if needed.
#### Two nodes fail simultaneously
**Symptom**: Raft loses quorum. Kubernetes API server rejects writes.
Existing pods keep running but nothing new can be scheduled/updated.
Single surviving node's pods continue serving traffic.
**Recovery**:
- If a failed node comes back within Raft's leader-election timeout
(seconds to minutes), quorum restores
- If failed nodes are truly gone, the cluster is broken — need to
rebuild
**Rebuild procedure**: from the surviving node, `k3s-killall.sh`, then
bootstrap a new 3-node cluster from scratch. Data in Neon/B2 is safe;
Redis state is lost.
#### All three nodes fail simultaneously
**Symptom**: full site outage.
**Recovery**: rebuild the cluster from scratch.
**Frequency**: Hetzner-region-wide outage, extremely rare.
#### Node disk fills up
**Symptom**: pods get evicted ("node is disk-pressure"). Containers
can't be scheduled on that node.
**Common cause**: container log buildup (containerd rotates at 10 MB
per container but across dozens of pod churn cycles, total fills up),
local-path PVC fills up, apt cache.
**Recovery**:
```bash
ssh deploy@<node> "sudo df -h; sudo du -sh /var/lib/rancher/* | sort -h"
# Then clean up
```
### k3s control plane failures
#### etcd corruption on one node
**Symptom**: Raft detects divergence; that node stops serving writes.
**Recovery**: remove the node from the cluster, rejoin. Etcd snapshot
is pulled from surviving peers automatically.
#### CoreDNS down
**Symptom**: pods can't resolve Service names. New TCP connections
fail; existing connections continue (they already resolved).
Typical manifestation: "DB connection failed — no such host" errors.
**Recovery**: k3s automatically restarts CoreDNS pod. If it
keeps crashing:
```bash
kubectl logs -n kube-system deploy/coredns --previous
kubectl rollout restart deployment/coredns -n kube-system
```
**Frequency**: rare.
#### metrics-server down
**Symptom**: `kubectl top` returns an error; HPAs can't scale.
**Recovery**: restart metrics-server pod. Non-critical; service stays up.
```bash
kubectl rollout restart deployment/metrics-server -n kube-system
```
### Networking failures
#### UFW rule accidentally blocks essential traffic
**Symptom**: Some specific thing stops working (e.g., api can't reach
Postgres, cross-node pod traffic fails, kubectl times out).
**Recovery**: log in via SSH (if that still works), `sudo ufw status
numbered`, `sudo ufw --force delete <N>` to remove offending rule.
**If SSH is blocked too**: Hetzner console → Rescue mode → mount disk
→ edit `/etc/ufw/user.rules`.
#### Flannel broken on one node
**Symptom**: pods on that node can't reach remote pods via overlay.
ClusterIP Services involving cross-node endpoints fail.
**Recovery**: restart kubelet on that node:
```bash
ssh deploy@<node> "sudo systemctl restart k3s"
```
#### Kube-proxy broken on one node
**Symptom**: pods on that node can't reach ClusterIPs. Symptoms look
like DNS resolution succeeded but connection refused or timed out.
**Recovery**: same as Flannel — restart k3s on the node.
### Application-level
#### api pod OOM
**Symptom**: pod gets killed, kubelet restarts it. User's request
returns 502 briefly; subsequent requests routed to healthy pods.
Readiness probe removes the OOMing pod from Service endpoints.
**Recovery**: automatic (pod restarts). If it keeps OOMing:
- Increase `resources.limits.memory` in the deployment
- Or debug the memory leak
**Check**:
```bash
kubectl describe pod -n honeydue <pod> | grep -i oom
kubectl logs -n honeydue <pod> --previous
```
#### api pod panics
**Symptom**: goroutine panic kills the process. Kubelet restarts.
Similar user impact to OOM.
**Recovery**: automatic restart. But if the panic is deterministic
(same input → panic), the pod crashloops.
**Action**: read the logs, find the panic stack trace, fix the code,
deploy.
**Circuit-breaker scenario**: if all 3 api pods crashloop on startup
because of bad code, kubectl rollout undo to previous revision.
#### api deadlocks
**Symptom**: all 3 pods are up, readiness passes (shallow probe), but
real requests time out or hang.
**Recovery**: liveness probe is the same endpoint as readiness, so it
won't help. You'll see gradually increasing 504s at the edge. Manual
intervention:
```bash
kubectl rollout restart deployment/api -n honeydue
```
#### admin pod crashes
**Symptom**: 502 at Cloudflare when accessing admin.myhoneydue.com.
**Recovery**: k8s auto-restarts. Usually within 10-30s.
**Impact**: only admins lose access; user-facing api is unaffected.
#### worker stops processing jobs
**Symptom**: emails stop being sent, cron jobs stop firing.
**Detection**: no direct alert; need to notice via user feedback or
missing daily-digest emails. Or check Redis for queue backlog.
**Recovery**:
```bash
kubectl rollout restart deployment/worker -n honeydue
```
**If persistent**: check logs for specific error:
```bash
kubectl logs -n honeydue deploy/worker --tail=100
```
#### redis pod dies + node is different
**Symptom**: Redis schedules to a new node, but the PVC is on the
original node (local-path is per-node). New Redis pod comes up but
finds an empty data directory (or can't mount at all).
**Recovery**:
- If the original node is still alive but Redis pod died: pod comes
back up on same node with data intact
- If the original node is gone: Redis starts empty. Cache regenerates.
Asynq queue state is lost; pending jobs re-queue on retry, cron
fires re-schedule on next tick.
- Ensure the node label `honeydue/redis=true` is on a healthy node:
```bash
kubectl label node <new-node> honeydue/redis=true --overwrite
kubectl label node <dead-node> honeydue/redis- 2>/dev/null || true
```
### External service failures
#### Neon Postgres outage
**Symptom**: api logs fill with "failed to connect to database." All
mutating API calls fail. Reads from cache continue (via Redis) but
eventually cache expires.
**Recovery**: no action from us; Neon's problem. Users will see 5xx
until Neon is back.
**Mitigation for future**: multi-region Neon read replica, or
Postgres-level failover.
**Frequency**: Neon has had a handful of hours-scale outages since launch.
#### Backblaze B2 outage
**Symptom**: image uploads fail; image downloads fail unless cached by
CF.
**Recovery**: wait. B2 rarely goes down.
**Mitigation**: serve downloads via CF with long cache TTL — most
users won't notice brief B2 outages for read traffic.
#### Fastmail SMTP unreachable
**Symptom**: `worker` can't send transactional emails. Jobs retry per
Asynq's retry policy, eventually giving up and logging an error.
**Recovery**: automatic retry; wait for Fastmail to come back.
**Manual intervention**: re-enqueue jobs from the Asynq UI (we don't
expose it yet — future).
#### Gitea registry unreachable
**Symptom**: `kubectl rollout` stuck at "Pulling image" for new pods.
Existing pods continue running with their already-pulled images.
**Recovery**: wait for Gitea to come back.
**Mitigation**: K8s has `imagePullPolicy: IfNotPresent` by default on
SHA-tagged images, so images aren't re-pulled on every restart if
the node already has them cached.
#### Cloudflare DNS failure
See §CF failures above.
## Combined failures
### "Everything is slow"
Most often = Neon is being hammered by our load + someone else's noisy
neighbor.
- Check `kubectl top pods` (are we CPU-bound?)
- Check Neon console for query performance
- Check CF analytics for traffic spikes
### "Some users see 502, others don't"
Usually one node has an unhealthy Traefik or api. Cloudflare routes
some connections to it, others to healthy nodes.
- `kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik`
- `kubectl get pods -n honeydue -l app.kubernetes.io/name=api`
- Check per-pod logs
### "It worked 5 minutes ago, now it doesn't"
Something recent changed. Check:
- Recent deploys: `kubectl rollout history deployment/api -n honeydue`
- Recent manifest changes: `kubectl get events -A --sort-by=.lastTimestamp | tail -30`
- External: Cloudflare Status page, Neon Status page, Backblaze Status page
## Planned outages
### Node upgrades (OS patches)
```bash
# Drain the node (evict pods, block scheduling)
kubectl drain ubuntu-8gb-nbg1-1 --ignore-daemonsets --delete-emptydir-data
# SSH in, upgrade, reboot
ssh deploy@hetzner2 "sudo apt update && sudo apt upgrade -y && sudo reboot"
# Wait for node to come back
watch kubectl get nodes
# Uncordon
kubectl uncordon ubuntu-8gb-nbg1-1
```
During the drain, pods from that node reschedule to the survivors.
With current workload (api: 3 replicas, everything else: 1), rescheduling
1 api pod is fine. Traffic loss: zero.
Worker pod or Redis pod scheduled on the drained node would be
briefly unavailable during reschedule. Acceptable for planned windows.
### k3s upgrades
Same per-node drain + upgrade pattern, but with k3s-specific install:
```bash
# On the node
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.x+k3s1 sh -s - server
# k3s detects existing install and upgrades in place
```
Do one node at a time. Verify cluster health between each.
## Disaster recovery
### Complete cluster loss
Procedure:
1. Provision 3 new Hetzner CX33 nodes (or use existing if healthy)
2. Follow bootstrap procedure (Chapter 1 §node hardening)
3. Install k3s on each (Chapter 2 §HA architecture)
4. Configure kubeconfig
5. Apply all manifests:
```bash
kubectl apply -f deploy-k3s/manifests/namespace.yaml
kubectl apply -f deploy-k3s/manifests/rbac.yaml
kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
# Wait for Traefik to redeploy
# ... recreate secrets (see Chapter 10) ...
# ... apply rest of manifests ...
```
6. Update DNS if node IPs changed
7. Verify: curl https://api.myhoneydue.com/api/health/
Estimated time: **1-2 hours** if you've done it before. A lot of
context-switching between Hetzner console, SSH, kubectl, and CF.
Neon data is untouched by any of this. B2 data is untouched. Only
state that's lost: Redis cache (regenerates) and any in-flight Asynq
jobs that were mid-processing.
## References
- [Kubernetes pod lifecycle][lifecycle]
- [K3s HA recovery][k3s-ha-recovery]
- [Hetzner rescue system][hetzner-rescue]
[lifecycle]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
[k3s-ha-recovery]: https://docs.k3s.io/datastore/ha-embedded#new-cluster-with-embedded-db
[hetzner-rescue]: https://docs.hetzner.com/cloud/servers/getting-started/enabling-rescue-system/
+369
View File
@@ -0,0 +1,369 @@
# 17 — Operator Runbook
## Summary
Common procedures the operator runs. Each is a numbered sequence of
exact commands. If a step is unclear, add a comment; if a procedure
fails in an unexpected way, add the symptom + fix to this document.
## Environment setup
Every command assumes:
```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
```
If you see "Unable to connect to the server," the kubeconfig isn't set.
## 1. Check cluster health
```bash
kubectl get nodes # all 3 Ready?
kubectl get pods -A | grep -vE 'Running|Completed' # anything not running?
kubectl top nodes # resource usage
kubectl get events -A --sort-by=.lastTimestamp | tail -20
```
## 2. Deploy new code
### Full deploy (all three services)
```bash
SHA=$(git rev-parse --short HEAD)
# Login
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | \
docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
# Build
docker buildx build --platform linux/amd64 --target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
docker buildx build --platform linux/amd64 --target worker \
-t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
docker buildx build --platform linux/amd64 --target admin \
-t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
# Apply
for svc in api worker admin; do
kubectl set image deployment/$svc -n honeydue \
"$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
done
# Watch
for svc in api worker admin; do
kubectl rollout status -n honeydue deployment/$svc
done
# Logout
docker logout gitea.treytartt.com
```
### Single service
```bash
SHA=$(git rev-parse --short HEAD)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
docker buildx build --platform linux/amd64 --target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
kubectl rollout status -n honeydue deployment/api
docker logout "$REGISTRY"
```
## 3. Rollback
### Last good
```bash
kubectl rollout undo deployment/api -n honeydue
kubectl rollout status -n honeydue deployment/api
```
### Specific SHA
```bash
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:<sha>"
```
## 4. Read logs
```bash
# Follow all api pod logs
kubectl logs -n honeydue -l app.kubernetes.io/name=api -f --prefix
# Errors only
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=1000 | grep -i error
# Previous pod (before crash/restart)
kubectl logs -n honeydue <pod> --previous
```
## 5. Exec into a pod
```bash
kubectl exec -n honeydue -it deploy/api -- /bin/sh
# inside:
# wget -qO- http://127.0.0.1:8000/api/health/
# env | grep DB_
# exit
```
## 6. Rotate a secret
```bash
# For honeydue-secrets keys
kubectl patch secret honeydue-secrets -n honeydue \
--type=merge \
-p "{\"data\":{\"SECRET_KEY\":\"$(echo -n 'new-value' | base64)\"}}"
# Update local file to match (keep in sync)
printf '%s' 'new-value' > deploy/secrets/secret_key.txt
# Restart pods so they pick up the new secret
kubectl rollout restart -n honeydue deploy/api deploy/worker
```
## 7. Change a ConfigMap value
```bash
# Edit deploy/prod.env locally
# Regenerate the configmap
kubectl create configmap honeydue-config -n honeydue \
--from-env-file=deploy/prod.env \
--dry-run=client -o yaml | kubectl apply -f -
# Restart to pick up
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
```
## 8. Scale a service
```bash
kubectl scale deployment/api -n honeydue --replicas=5
# Then wait
kubectl rollout status -n honeydue deployment/api
```
**DO NOT** scale worker above 1 until Asynq PeriodicTaskManager is wired.
## 9. Drain a node for maintenance
```bash
# Prevent new pods, evict existing
kubectl drain <node-hostname> --ignore-daemonsets --delete-emptydir-data
# Do maintenance (apt upgrade, reboot, etc.)
ssh deploy@<node> "sudo apt update && sudo apt upgrade -y && sudo reboot"
# Wait for node to come back
watch kubectl get nodes
# Allow scheduling again
kubectl uncordon <node-hostname>
```
Node hostnames (not SSH aliases!):
- `ubuntu-8gb-nbg1-1` (hetzner2)
- `ubuntu-8gb-nbg1-2` (hetzner1)
- `ubuntu-8gb-nbg1-3` (hetzner3)
## 10. Add a new node
```bash
# 1. Provision CX33 in Hetzner console
# 2. SSH in as root, create deploy user + key
# 3. Install k3s as agent (or server)
NODE_TOKEN=$(ssh -i ~/.ssh/hetzner deploy@hetzner1 'sudo cat /var/lib/rancher/k3s/server/node-token')
ssh -i ~/.ssh/hetzner root@<new-node-ip> "curl -sfL https://get.k3s.io | K3S_TOKEN=\"$NODE_TOKEN\" INSTALL_K3S_EXEC=\"server --server=https://178.104.247.152:6443 --disable=servicelb --write-kubeconfig-mode=644\" sh -"
# 4. Add UFW rules for inter-node traffic
# (see deploy-k3s/scripts/ for the script)
# 5. Verify
kubectl get nodes
```
## 11. Remove a node
```bash
# Drain first
kubectl drain <hostname> --ignore-daemonsets --delete-emptydir-data
# Tell k3s to leave
ssh -i ~/.ssh/hetzner deploy@<node-alias> "sudo systemctl stop k3s && sudo /usr/local/bin/k3s-uninstall.sh"
# Remove from cluster
kubectl delete node <hostname>
```
## 12. Force-restart all pods
```bash
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker deploy/redis
```
Use sparingly. Causes brief downtime per pod.
## 13. Migrate to a new Neon DB
```bash
# 1. Point a new branch or project on Neon
# 2. Update prod.env with new DB_HOST
# 3. Apply new ConfigMap
kubectl create configmap honeydue-config -n honeydue \
--from-env-file=deploy/prod.env \
--dry-run=client -o yaml | kubectl apply -f -
# 4. Rolling restart
kubectl rollout restart -n honeydue deploy/api deploy/worker
```
## 14. Rotate Gitea registry PAT
```bash
# 1. Create new PAT in Gitea UI
# 2. Update deploy/registry.env locally
# 3. Update in-cluster Secret
kubectl create secret docker-registry gitea-credentials -n honeydue \
--docker-server=gitea.treytartt.com \
--docker-username=admin \
--docker-password=<new-pat> \
--dry-run=client -o yaml | kubectl apply -f -
# 4. Delete old PAT from Gitea UI
# 5. Pods don't re-auth with existing images (already pulled), but
# new pulls will use new PAT. Test by rolling a pod:
kubectl rollout restart -n honeydue deployment/api
```
## 15. Clean up old images in Gitea
Manual, via Gitea UI:
https://gitea.treytartt.com/admin/-/packages
Keep ~last 30 tags per image; delete older.
Or via API:
```bash
GITEA_PAT="$(grep REGISTRY_TOKEN deploy/registry.env | cut -d= -f2)"
# List tags
curl -sS -H "Authorization: token $GITEA_PAT" \
"https://gitea.treytartt.com/api/v1/packages/admin/container/honeydue-api/versions" | jq .
# Delete specific tag
curl -X DELETE -H "Authorization: token $GITEA_PAT" \
"https://gitea.treytartt.com/api/v1/packages/admin/container/honeydue-api/<tag>"
```
## 16. Recreate the cluster from scratch
See [Chapter 16 §Disaster recovery](./16-failure-modes.md#disaster-recovery).
## 17. Connect to Neon directly
```bash
# Get password
PW=$(cat deploy/secrets/postgres_password.txt)
# Connect
PGPASSWORD="$PW" psql \
-h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
-U neondb_owner \
-d honeyDue
```
## 18. Check admin user credentials
```bash
# ADMIN_EMAIL is in the honeydue-secrets Secret
kubectl get secret honeydue-secrets -n honeydue \
-o jsonpath='{.data.ADMIN_EMAIL}' | base64 -d
# ADMIN_PASSWORD (ONLY VALID FOR FIRST DEPLOY; may have been changed in UI)
kubectl get secret honeydue-secrets -n honeydue \
-o jsonpath='{.data.ADMIN_PASSWORD}' | base64 -d
```
If you need to reset admin password because nobody remembers it:
```bash
# Generate a new bcrypt hash
NEW_PASSWORD='newpassword'
HASH=$(htpasswd -bnBC 10 "" "$NEW_PASSWORD" | tr -d ':\n')
# Update directly in Postgres
PGPASSWORD="$(cat deploy/secrets/postgres_password.txt)" psql \
-h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
-U neondb_owner -d honeyDue \
-c "UPDATE admin_users SET password='$HASH' WHERE email='admin@myhoneydue.com'"
```
## 19. Trigger a Helm chart re-run (Traefik etc.)
If the Traefik HelmChartConfig was updated but chart didn't reconcile:
```bash
kubectl delete job -n kube-system helm-install-traefik
# Helm operator re-runs automatically within ~30 seconds
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -w
```
## 20. Smoke test after any change
```bash
# Through Cloudflare
for url in "https://api.myhoneydue.com/api/health/" \
"https://admin.myhoneydue.com/" \
"https://myhoneydue.com/"; do
ok=0
for i in $(seq 1 20); do
[[ "$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" == "200" ]] && ok=$((ok+1))
done
printf "%-45s %d/20 ok\n" "$url" "$ok"
done
```
Expect 20/20 on all three.
## 21. Kill everything (emergency rollback)
If the cluster is so broken you need to reset the app layer:
```bash
# Scale everything to 0
kubectl scale -n honeydue deploy/api deploy/admin deploy/worker deploy/redis --replicas=0
# When ready, scale back up
kubectl scale -n honeydue deploy/api --replicas=3
kubectl scale -n honeydue deploy/admin deploy/worker deploy/redis --replicas=1
```
During the scale-down, CF returns errors to users because no pod is
serving. The rolling update for scale-up takes ~5 min.
## 22. Find which pod a user's request hit
Not directly supported (we don't log node/pod name in requests). When
we add request logging that includes these, a grep through logs works.
Workaround: in each pod's logs, search for a unique user identifier:
```bash
stern -n honeydue api | grep "user_id=12345"
```
## References
- [kubectl cheat sheet][kubectl-cs]
- [K3s docs][k3s-docs]
- [Neon connect][neon-connect]
[kubectl-cs]: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
[k3s-docs]: https://docs.k3s.io/
[neon-connect]: https://neon.com/docs/connect/connect-from-any-app
+243
View File
@@ -0,0 +1,243 @@
# 18 — Cost
## Summary
Current monthly infrastructure cost is ~$30-40. External SaaS (Fastmail,
Apple Developer, Google Play) adds ~$8-17/mo depending on push-enable
status. This chapter itemizes every line, projects costs at scale
(10k, 100k, 1M users), and shows what dials to turn when we need to
save or spend.
## Current monthly cost
### Compute (Hetzner)
| Item | Unit cost | Count | Monthly |
|---|---:|---|---:|
| CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) | $7.99 | 3 | **$23.97** |
| Traffic | $0 (20 TB/mo included per node, well below) | — | $0 |
| Hetzner Cloud Firewall | $0 | — | $0 |
| IPv4 public address | $0 (included) | 3 | $0 |
| **Subtotal** | | | **$23.97** |
### Database (Neon)
Neon Launch plan: $0.106/CU-hour + $0.35/GB-month storage, $5 minimum.
At current usage (low traffic, small schema):
- ~10 CU-hours/month × $0.106 ≈ $1
- ~1 GB storage × $0.35 ≈ $0.35
- Hits the $5 minimum
| Item | Monthly |
|---|---:|
| Neon Launch ($5 min + usage) | **~$5** |
### Object storage (Backblaze B2)
At current usage (~50 GB stored):
| Item | Monthly |
|---|---:|
| Storage ($0.006/GB × 50 GB) | $0.30 |
| Egress (effectively $0 — mostly served through CF) | $0 |
| **Subtotal** | **~$0.30** |
### Edge (Cloudflare)
| Item | Monthly |
|---|---:|
| Cloudflare Free plan (DNS, TLS, CDN, basic DDoS) | **$0** |
### Registry (Gitea)
Self-hosted on the operator's existing Gitea VPS. Not charged to
honeyDue.
| Item | Monthly |
|---|---:|
| Gitea container registry | **$0** |
### Total infrastructure
| Category | Monthly |
|---|---:|
| Compute | $23.97 |
| Database | ~$5 |
| Storage | ~$0.30 |
| Edge | $0 |
| Registry | $0 |
| **Total** | **~$30** |
## External SaaS
Things not part of the deploy but required for the product:
| Item | Cost | Notes |
|---|---:|---|
| Fastmail (SMTP for transactional email) | Part of operator's existing plan | — |
| Apple Developer Program | $99/year = $8.25/mo | Required for iOS app + APNs |
| Google Play Developer | $25 one-time + $0/mo ongoing | — |
| Hetzner Cloud Firewall | $0 | Free; we use UFW instead |
At push-enabled state, total monthly run rate is **~$38-42**.
## Hidden / untracked costs
- **Operator time**: The biggest cost for a bootstrapped project.
Treating ops time at $100/hr, a 4-hour incident = $400.
- **Electricity for operator workstation during builds**: trivial.
- **Domain registration (myhoneydue.com)**: ~$12/year = $1/mo.
## Cost drivers
### 1. Compute (scales with traffic)
If api gets >70% CPU utilization, HPA will scale from 3 to 6 replicas.
Memory at 3 replicas × 512Mi limit = 1.5 GB; nodes have 8 GB each.
Plenty of room before needing more nodes.
Tipping points:
- >6 api replicas needed sustainedly = bigger CX43 (8 vCPU, 16 GB,
~$16/mo each) or more CX33s
- Heavy worker throughput = need Asynq PeriodicTaskManager (code
change, not infra)
### 2. Database (scales with query volume + data)
Neon Launch: pay per CU-hour of compute. If idle time ≫ active time,
we stay near $5 min. If the app is busy, CU-hours grow.
Tipping points:
- Consistently >$30/mo at Launch → evaluate Neon Scale plan
- DB storage >50 GB → $15+/mo just for storage
- Active query load → consider read replicas (paid feature)
### 3. Storage (scales with user uploads)
B2 at $0.006/GB is cheap. 1 TB = $6/mo.
Tipping points:
- >5 TB stored = consider R2 (free egress) if egress becomes a factor
- Very high egress = evaluate moving B2 behind CF Workers
### 4. Edge
Cloudflare Free is generous. We move to Pro ($20/mo) if:
- We need custom WAF rules beyond 5
- We need Image Resizing for user uploads
- We need custom Page Rules beyond 3
## Projections
### 10,000 daily active users
Assume 50 API requests per user per day = 500k req/day = ~6 req/s avg.
Peaks maybe 3-5× = ~25 req/s.
Bottleneck: probably Neon free-tier CU-hours. At 25 req/s with DB calls,
we'd burn through CU-hours fast. Neon bill: $15-30/mo.
Compute: 3 CX33s still handle this comfortably.
| Category | Projected monthly |
|---|---:|
| Compute | $24 |
| Neon | ~$20 |
| Storage | ~$2 |
| Cloudflare | $0 |
| **Total** | **~$46** |
### 100,000 daily active users
500k req/s peaks = multi-node api scaling. HPA kicks in.
| Category | Projected monthly |
|---|---:|
| Compute (3x CX33) | $24 |
| Plus Hetzner LB | $8.49 |
| Neon Scale (pay-as-you-go, higher baseline) | $40-60 |
| B2 (200 GB stored, some egress) | $2 |
| Cloudflare Pro | $20 |
| **Total** | **~$95-115** |
At this scale, operator time becomes the bigger cost. Adding paid
monitoring (Betterstack ~$15/mo) and uptime (Betterstack Uptime $5/mo)
becomes reasonable.
### 1,000,000 daily active users
Bigger question. We'd be re-evaluating:
- More Hetzner nodes or bigger instances
- Neon at scale vs. self-hosted Postgres
- Maybe Cloudflare Workers to offload traffic
Ballpark: $300-500/mo. At this scale, the company has revenue to
justify an ops hire, and this chapter's assumptions break down.
## Dials to save money
### Immediate (reduce $)
| Lever | Savings | Trade-off |
|---|---|---|
| Switch 3 CX33 → 3 Netcup VPS1000G11 | ~$4/mo | Less polished provider, slightly worse UX |
| Disable Neon Launch, use Supabase free tier | ~$5/mo | Supabase free tier limits |
| 2 nodes instead of 3 | ~$8/mo | Lose HA, two-node Raft is worse than one |
| 1 CX23 (2 vCPU, 4 GB) for admin + worker; 2 CX33 for api | ~$5/mo | Complexity; node roles |
None of these are compelling. Current cost is in the "don't optimize"
zone.
### Dials to spend when it becomes worth it
| Spend | Return |
|---|---|
| Upgrade Neon to Scale ($20+) | More CU-hours, connection count room |
| Add Hetzner LB ($8.49) | Real active health checks, sub-second failover |
| Add monitoring (Betterstack $15) | Proactive detection of issues |
| Add uptime monitoring ($5) | Alerts when site is down |
| CF Pro ($20) | Better WAF, Image Resizing |
| CF Load Balancing ($5) | Multi-region failover, active checks on origins |
Cumulatively **~$70/mo** takes us to a fully-monitored, fully-alerted,
multi-region-failing-over setup. At 100k users, worth it.
## Historical spend
**April 2026 MTD**: ~$35 (Hetzner + Neon prorated).
**April 2026 (projected)**: $30-40.
**March 2026**: Pre-launch; no user traffic yet. Just node rentals.
~$25.
## Hetzner April 2026 price adjustment
CX33 went from ~$6.59 → $7.99/mo on 2026-04-01. Our monthly compute
cost rose by $4.20 overnight. This is on our budget radar but isn't a
forcing function to switch providers.
If Hetzner keeps raising prices (which they've historically resisted;
the 2026 adjustment was their first in several years), reconsider.
## Budget alerts
- **B2**: hard-capped via B2 console at $20/mo. If we breach, something
is wrong and B2 rejects further writes.
- **Neon**: soft limits via Neon alerts. Set threshold at $20 to get
email when approaching.
- **Hetzner**: no variable cost at our scale, no alerts needed.
- **Cloudflare**: Free plan has hard quotas; no surprise bills possible.
## References
- [Hetzner Cloud pricing][hetzner-cloud]
- [Neon pricing][neon-pricing]
- [Backblaze B2 pricing][b2-pricing]
- [Cloudflare Free plan][cf-free]
[hetzner-cloud]: https://www.hetzner.com/cloud/
[neon-pricing]: https://neon.com/pricing
[b2-pricing]: https://www.backblaze.com/cloud-storage/pricing
[cf-free]: https://www.cloudflare.com/plans/free/
+480
View File
@@ -0,0 +1,480 @@
# 19 — Postmortem: The Swarm Era
## Summary
honeyDue launched on Docker Swarm on 2026-04-23. Over the course of a
single afternoon we hit **thirteen distinct bugs** before declaring
Swarm unfit and migrating to k3s. This chapter is the forensic record:
the symptom of each bug, the root cause, the specific fix, and citations
where relevant. It's preserved because these lessons are expensive and
future-us should not pay them again.
**TL;DR**: Twelve of the thirteen bugs were recoverable. The thirteenth
was a Docker libnetwork ghost-DNS defect ([moby/moby#52265][moby-52265])
that is fundamentally incompatible with single-replica services. No
amount of clever config fixed it; we had to change orchestrators.
## Timeline
**~18:00** — Infrastructure stood up. Docker Swarm initialized. First
build + push to Gitea.
**~19:30** — First deploy runs. Immediate failures.
**~22:00** — api + admin returning 200 through Cloudflare. Flaky but
working.
**~23:00** — Admin flapping 50%+ through Cloudflare. Ghost DNS record
identified. Workarounds begin.
**~00:30 (next day)** — Ghost DNS survives every non-nuclear
intervention. Research confirms it's a known libnetwork bug. Decision
to migrate to k3s.
**~04:30** — k3s cluster up, all services healthy, 150/150 requests
green. Postmortem begins.
The session ran ~10 hours. The migration itself took ~1 hour.
## The thirteen bugs
### 1 — Deploy script array expansion under `set -u`
**File**: `deploy/scripts/deploy_prod.sh`
**Symptom**:
```
./deploy/scripts/deploy_prod.sh: line 339: api_extra[@]: unbound variable
```
**Root cause**: Bash arrays expanded with `"${arr[@]}"` under `set -u`
fail when the array is empty. Our deploy script initialized empty
arrays conditionally but expanded them unconditionally.
**Fix**: Use the `${arr[@]+"${arr[@]}"}` safe-expansion idiom, or
restructure to avoid passing empty arrays:
```bash
build_and_push api "${API_IMAGE}" ${api_extra[@]+"${api_extra[@]}"}
```
Inside the function, same treatment — use `shift` instead of array
slicing.
**Moral**: `set -u` with bash arrays is a known pitfall. The
`"${arr[@]}"` expansion isn't safe under strict mode if arrays can be
empty.
### 2 — Dockerfile Go version mismatch
**File**: `Dockerfile`
**Symptom**:
```
go: go.mod requires go >= 1.25 (running go 1.24.13; GOTOOLCHAIN=local)
ERROR: failed to build: failed to solve: process "/bin/sh -c go mod download" did not complete successfully: exit code: 1
```
**Root cause**: `go.mod` specifies `go 1.25`, but the Dockerfile's
builder stage used `golang:1.24-alpine`.
**Fix**: Bumped to `golang:1.25-alpine`. One-character change.
**Moral**: Keep the Dockerfile base image in sync with `go.mod`'s
go directive. CI would catch this; we had none.
### 3 — dev machine arm64 vs node amd64
**Symptom**: Would have been `exec format error` on the nodes if we'd
deployed without fixing. Caught at build config stage.
**Root cause**: Operator on Apple Silicon (arm64). Hetzner nodes are
amd64. Plain `docker build` produces arm64 images.
**Fix**: Switched deploy script to use `docker buildx build --platform
linux/amd64 --push`. This cross-compiles the Go stages (they honor
`TARGETARCH`) and uses QEMU emulation for the Node stages.
**Moral**: Cross-platform builds are routine for Apple Silicon
developers. Document it up front, bake it into the deploy script.
### 4 — Swarm stack `host_ip` rejected
**File**: `deploy/swarm-stack.prod.yml` (dozzle service)
**Symptom**:
```
services.dozzle.ports.0 Additional property host_ip is not allowed
```
**Root cause**: Docker Compose v3.8 schema allows `host_ip` in long-form
port spec. Swarm's `docker stack deploy` parser doesn't.
**Fix**: Use the short form:
```yaml
ports:
- "127.0.0.1:${DOZZLE_PORT}:8080"
```
But then: Swarm's ingress mesh mode silently ignores the `127.0.0.1`
binding and listens on `0.0.0.0` anyway. Only way to get true
loopback-only binding is `mode: host`, which changes port-publishing
semantics.
**Moral**: Compose-file compatibility between plain Docker and Swarm
is imperfect. Check the [Swarm-specific compose reference][swarm-compose]
when in doubt.
### 5 — Stack file secret references
**Symptom**:
```
service worker: undefined secret "honeydue_postgres_password_237c6b8-20260423195810"
```
**Root cause**: The original stack file template used
`source: ${POSTGRES_PASSWORD_SECRET}` (which expanded to the versioned
secret name like `honeydue_postgres_password_<ts>`) under each service's
`secrets:` list.
Swarm expects `source:` to match the **alias** in the top-level
`secrets:` block (`postgres_password`), not the actual secret `name:`.
**Fix**: Changed every `source:` to the alias form:
```yaml
# Was:
- source: ${POSTGRES_PASSWORD_SECRET}
target: postgres_password
# Now:
- source: postgres_password
target: postgres_password
```
**Moral**: The original template was clever but subtly wrong. It had
never successfully deployed — the earlier Dokku setup used a different
secret model. Bugs-in-template-code catch you when you first hit them.
### 6 — API pod crash: `sync.Once` double-unlock
**File**: `internal/services/cache_service.go:54`
**Symptom**: api pods completed migrations, started HTTP server, then
fataled with:
```
fatal error: sync: unlock of unlocked mutex
goroutine 1 [running]:
internal/sync.fatal(...)
sync.(*Once).doSlow(...)
github.com/treytartt/honeydue-api/internal/services.NewCacheService
/app/internal/services/cache_service.go:31
```
**Root cause**: Inside a `sync.Once.Do(func() { ... })` callback, the
code did:
```go
cacheOnce.Do(func() {
// ...
if err := client.Ping(ctx).Err(); err != nil {
initErr = fmt.Errorf(...)
cacheOnce = sync.Once{} // ← THIS LINE
return
}
})
```
The intent: "if Redis ping fails, reset the Once so a retry can happen."
The reality: the Once's internal mutex is held while `Do` is running the
callback. Reassigning `cacheOnce = sync.Once{}` creates a NEW zero-
valued Once and replaces the old one. When `Do` tries to release the
mutex afterward, the mutex is the new-zero-valued one — which isn't
locked. Panic.
**Fix**: Removed the reset. `main.go` already handles the error
gracefully (`cache = nil`, continues without caching). Retries happen
via pod restart, not in-process.
```go
if err := client.Ping(ctx).Err(); err != nil {
initErr = fmt.Errorf(...)
// Don't reassign cacheOnce here — mutating it from inside Do()
// is a fatal error. Let main.go handle the error.
return
}
```
**Moral**: `sync.Once` is simpler than it looks. Never reassign an
active sync primitive from within its own callback.
### 7 — Stack file `maxUnavailable: 2` warning for worker
**Symptom**: We noticed `WORKER_REPLICAS=2` in `cluster.env` despite
the Asynq scheduler being a singleton.
**Root cause**: Asynq's `Scheduler` is not leader-elected by default.
Running >1 replica causes duplicate cron firings — duplicate daily
digests, double-welcome emails.
**Fix**: `WORKER_REPLICAS=1`. Added a comment in `cluster.env.example`
explaining why.
**Moral**: Defaults can be dangerous. Even when a default seems
reasonable ("2 replicas for HA"), check against the app's semantics.
### 8 — `PUSH_LATEST_TAG=true` for prod
**Symptom**: During a test, we saw `honeydue-api:latest` updating,
which would make rollbacks harder.
**Root cause**: The cluster.env had `PUSH_LATEST_TAG=true` when the
design intent was SHA-pinned deploys only.
**Fix**: `PUSH_LATEST_TAG=false`. SHA tags only.
**Moral**: Tag-mutable images make rollbacks non-deterministic.
Prefer immutable SHA tags.
### 9 — Neon DB name case sensitivity
**Symptom**:
```
server error: ERROR: database "honeydue" does not exist (SQLSTATE 3D000)
```
**Root cause**: Neon's UI created the database as `"honeyDue"` (quoted,
camelCase). Postgres treats quoted identifiers case-sensitively at
create time. Our `prod.env` had `POSTGRES_DB=honeydue` (lowercase).
**Fix**: `POSTGRES_DB=honeyDue`.
**Moral**: Respect Postgres's identifier quoting rules. If something
was created with quotes, refer to it with exact case.
### 10 — Admin DNS ghost A-record (the big one)
**Symptom**: Through Cloudflare, `admin.myhoneydue.com` returned 502 on
~50% of requests. The other 50% succeeded. The pattern was stable over
hours.
**Investigation**:
The admin service had 1 replica, alive on one of three Swarm nodes.
Caddy (reverse proxy at the time) resolved `admin` via Swarm's
embedded DNS at `127.0.0.11`. `nslookup admin` returned:
```
Name: admin Address: 10.0.1.36 (current task IP)
Name: admin Address: 10.0.1.17 (GHOST — what is this?)
```
Two A records for one-replica service, both returned randomly.
`10.0.1.17` was checked: that IP now belonged to the **dozzle**
container on hetzner3. Nothing listens on dozzle's 3000 port →
connection refused → 502.
The old admin task had run on hetzner3 with IP 10.0.1.17. When it
migrated to hetzner1 with IP 10.0.1.36, libnetwork's DNS registration
for admin was supposed to update. On hetzner2 and hetzner3, the old
10.0.1.17 record never got removed.
**Things tried, none worked**:
| Attempt | Result |
|---|---|
| `endpoint_mode: dnsrr` on admin | DNS still returns both IPs |
| Kill + restart Caddy container | DNS still returns both IPs |
| Scale admin to 0 and back to 1 | Ghost 10.0.1.17 still in DNS with 0 replicas |
| `docker service rm honeydue_admin` | Ghost 10.0.1.17 still in DNS (orphaned) |
| Change admin to `mode: global` | Different IPs but ghost remains |
| `mode: host` on admin ports + `extra_hosts: host.docker.internal:host-gateway` | `host.docker.internal` resolved to docker0 (172.17.0.1), not reachable from overlay |
| Hardcoded 3 node IPs in Caddy + UFW port 3000 node-to-node | ~90% reliable, NAT hairpin issues when Caddy dials its own node |
**Root cause**: [moby/moby#52265][moby-52265] — Docker libnetwork's
overlay network state store doesn't reliably deregister service
endpoints when tasks migrate between nodes. Known bug in the 29.x
line. Partial fixes in #50236 (29.0) were incomplete; 29.3 still
leaks; #52289 is the pending follow-up.
**Why it only manifests on single-replica services**: With 3 replicas,
Caddy's DNS query returns 4 IPs (3 real + 1 ghost). Round-robin
succeeds 75% of the time. With 1 replica, 1 real + 1 ghost = 50%
failure. More replicas = bug is masked.
**Final fix**: None at the libnetwork level. The ghost survives every
non-cluster-recreating operation. The only clean purge is
`docker stack rm` + `docker network rm` + full redeploy. Even then,
the bug recurs on the next task migration.
**Decision**: Migrate to k3s. CoreDNS has none of libnetwork's state-
store semantics and the bug class doesn't exist. 4 hours of fighting
Swarm → 1-hour k3s migration that just worked.
**Citations**:
- [moby/moby#52265 — Overlay ARP stale entries on 29.3.0][moby-52265]
- [moby/moby#51491 — DNS broken after swarm init][moby-51491]
- [Dokploy#3480 — Traefik stale VIP on Swarm][dokploy-3480]
### 11 — IPSec ESP + UDP 500 blocked
**Symptom**: Earlier in the Swarm setup, api 3/3 was working but
cross-node overlay traffic was intermittently failing. This turned out
to be a separate bug masking #10 earlier in the session.
**Root cause**: We had encrypted overlay enabled
(`driver_opts: encrypted: "true"`). Swarm's encrypted mode uses IPSec
ESP (IP protocol 50) + UDP 500 (IKE). Our UFW only allowed UDP 4789
(VXLAN) and 7946 (gossip). ESP was blocked by default-deny. Encrypted
packets dropped silently on some flows.
**Fix**: Added UFW rules for each peer node IP:
```bash
sudo ufw allow from <peer> to any proto esp
sudo ufw allow from <peer> to any port 500 proto udp
```
Once applied, cross-node overlay data path became stable.
**Moral**: Encrypted Swarm overlay requires more than VXLAN to be open.
ESP (protocol 50) and UDP 500 (IKE) for IPSec. Official Docker docs
mention this but it's easy to miss.
### 12 — Admin startupProbe path
**Symptom**: Admin pod kept restarting with startup probe failures.
Kubelet reported:
```
Startup probe failed: HTTP probe failed with statuscode: 404
```
**Root cause**: The k3s scaffold's `admin/deployment.yaml` had:
```yaml
startupProbe:
httpGet:
path: /admin/
port: 3000
```
But our admin Next.js app serves at `/`, not `/admin/`. Requests to
`/admin/` return 404. K8s considered the pod unhealthy and restart-
looped.
**Fix**: Change probe path to `/`. Also bumped `failureThreshold` from
12 to 24 (120s grace) for Next.js's slower-than-expected cold boot
when the node's already busy.
**Moral**: Copy-pasted scaffolds can have assumptions that don't match
your app. Always verify probes against actual reachable paths.
### 13 — MigrateWithLock startup probe grace
**Symptom**: API pods were getting killed by k8s during migration.
First replica was OK (fast migration); replicas 2 and 3 waited on
the advisory lock too long and healthchecks tripped.
**Root cause**: Go app's `MigrateWithLock()` uses
`pg_advisory_lock()` to serialize migrations across replicas. First
replica does real AutoMigrate (~90s cold); subsequent replicas wait
on the lock, then run no-op migrations. Total time for 3rd replica
can be 3+ minutes.
K3s scaffold's `api/deployment.yaml` had:
```yaml
startupProbe:
failureThreshold: 12
periodSeconds: 5
```
= 60s grace. Not enough.
**Fix**: Bumped `failureThreshold` to 48 (= 240s grace). Comment in
the manifest explains why. This is *not* a band-aid — the real startup
time genuinely is 90-240s depending on lock queue position. The probe
should reflect reality, not be optimistic.
**Moral**: Healthchecks should be realistic, not aspirational. Know
what your app actually does at startup.
## What we learned
### Docker Swarm is in a bad place in 2026
Not dead — Mirantis supports it through 2030 — but **nobody is
modernizing libnetwork**. When you hit a DNS or networking bug, you're
on your own. The fix churn on #52265 (incomplete 29.0 fix → 29.3
regression → pending #52289) is a tell: the code has no champion.
For new deployments, **don't pick Swarm** unless you're doing something
Swarm-shaped (tiny, single-replica, no inter-service traffic). K3s is
a strictly better choice for anything approximating what we're doing.
### Investigate before you work around
We spent a lot of time on clever workarounds for bug #10 (host-mode
ports, host.docker.internal, hardcoded node IPs, UFW routing) before
doing the 20-minute research task that revealed the bug was a known
libnetwork defect. If we'd searched "Swarm DNS stale record 2026" first,
we'd have saved ~3 hours.
### Scaffolds are starting points, not finishing points
The k3s scaffold in `deploy-k3s/` was excellent — production-grade
RBAC, PDBs, security contexts, network policies, Traefik middleware.
But its image references (GHCR), TLS assumptions (CF Full strict), and
probe paths (admin's `/admin/`) didn't match our actual setup. Every
scaffold needs a read-through against your environment before you
`kubectl apply -f`.
### Keep the old config until the new config is proven
We kept `deploy/` (Swarm) intact during the k3s migration. That meant
if k3s failed, we could `git stash` the k3s work and do a fast Swarm
redeploy. It took ~4 days before we deleted `deploy/`, by which point
we were confident.
## Files affected by tonight's work
All in `honeyDueAPI-go`:
- `Dockerfile` — Go 1.24 → 1.25 (bug #2)
- `deploy/scripts/deploy_prod.sh` — buildx refactor, array expansion fixes (bugs #1, #3)
- `deploy/swarm-stack.prod.yml` — dozzle host_ip, secret source references, multiple iterations trying to fix #10
- `deploy/prod.env` — admin seed env vars, DB_POSTGRES_DB case, B2 values, push-disabled placeholders (bug #9)
- `deploy/cluster.env` — WORKER_REPLICAS 2 → 1, PUSH_LATEST_TAG (bugs #7, #8)
- `deploy/Caddyfile` — multiple iterations (ultimately deleted when we moved to k3s)
- `internal/services/cache_service.go` — removed sync.Once reset (bug #6)
- `internal/database/database.go` — (no change, MigrateWithLock semantics investigated)
- `deploy-k3s/manifests/api/deployment.yaml` — startupProbe grace (bug #13)
- `deploy-k3s/manifests/admin/deployment.yaml` — probe path (bug #12)
- `deploy-k3s/manifests/worker/deployment.yaml` — replicas 2 → 1
- `deploy-k3s/manifests/pod-disruption-budgets.yaml` — worker minAvailable 1 → 0
- `deploy-k3s/manifests/traefik-helmchartconfig.yaml` — NEW (DaemonSet + hostNetwork for Traefik)
- `deploy-k3s/manifests/ingress/ingress-simple.yaml` — NEW (simple host routing, no TLS)
- `deploy-k3s/MIGRATION_NOTES.md` — NEW
## What was thrown away
- Swarm stack definitions (still in `deploy/`, planned for removal)
- Caddy Caddyfile (k3s uses Traefik instead)
- Several hours of work on Caddy `dynamic a` upstream refresh, host-
mode ports, and NAT-hairpin workarounds for bug #10 — all moot
once we migrated
## References
- [moby/moby#52265 — Overlay ARP stale entries][moby-52265]
- [moby/moby#51491 — DNS broken after swarm init][moby-51491]
- [Dokploy#3480 — Traefik stale VIP][dokploy-3480]
- [Mirantis Swarm LTS commitment][mirantis-swarm]
- [Kubernetes probe best practices][k8s-probes]
- [Asynq scheduler limitations][asynq-sched]
[moby-52265]: https://github.com/moby/moby/issues/52265
[moby-51491]: https://github.com/moby/moby/issues/51491
[dokploy-3480]: https://github.com/Dokploy/dokploy/issues/3480
[mirantis-swarm]: https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/
[k8s-probes]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
[asynq-sched]: https://github.com/hibiken/asynq/wiki/Periodic-Tasks
[swarm-compose]: https://docs.docker.com/reference/compose-file/legacy-versions/
+318
View File
@@ -0,0 +1,318 @@
# 20 — Roadmap
## Summary
A consolidated list of known gaps, improvements, and scaling triggers.
Items are grouped by category and roughly ordered by priority. This is
the "if we had more time" list referenced throughout the book.
## High priority (do soon)
### Uptime monitoring
**Why**: Right now we find out the site is down when users complain.
**How**: Set up Uptime Kuma (self-hosted) or Better Stack Uptime
(free tier) to ping `https://api.myhoneydue.com/api/health/` every
minute, with Slack/email alerts on failure.
**Effort**: ~30 min for Uptime Kuma deploy, ~10 min for Better Stack
signup.
### Cloudflare origin IP restriction
**Why**: UFW allows :80 from anywhere. If node IPs leak, direct-connect
attackers bypass CF's WAF/DDoS protection.
**How**: Replace the anywhere-80 UFW rule with 15 IPv4 + 7 IPv6 CF
ranges. See [Chapter 13 §CF IP ranges](./13-cloudflare.md#cloudflare-ip-ranges-used-in-traefik-trustedips).
Automation: a small script that refreshes the CF IP list monthly and
re-applies UFW rules.
**Effort**: 1 hour.
### Enable network policies in k3s
**Why**: Currently pods can freely egress anywhere. A compromised pod
could exfiltrate data or attack lateral services.
**How**: `kubectl apply -f deploy-k3s/manifests/network-policies.yaml`.
The scaffold defines default-deny + explicit allows for:
- DNS egress for all pods
- Traefik → api (port 8000)
- Traefik → admin (port 3000)
- api/worker → Redis
- api/worker → external services (Postgres, B2, Fastmail)
Then test that nothing breaks (might need to adjust allow rules).
**Effort**: 1-2 hours including testing.
### Apply Traefik security middleware
**Why**: Our current Ingress has no rate limiting or security headers
beyond what Traefik adds by default.
**How**: Apply `deploy-k3s/manifests/ingress/middleware.yaml`, annotate
Ingresses to use them:
```yaml
metadata:
annotations:
traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
```
**Effort**: 15 min.
## Medium priority
### Upgrade to CF Full (strict) SSL
**Why**: Currently CF↔origin is plain HTTP. An attacker between CF and
Hetzner could read traffic. Full (strict) mode encrypts this leg with
a CF-issued origin cert.
**How**:
1. Generate Origin CA cert in CF dashboard → SSL/TLS → Origin Server
2. Create `cloudflare-origin-cert` Secret in k8s
3. Add `tls:` block to Ingresses
4. Switch CF SSL mode to Full (strict)
**Effort**: 30 min.
**Citations**: [Cloudflare Origin CA docs][cf-origin-ca]
### Migration Job for schema changes
**Why**: Currently every api pod runs `MigrateWithLock()` on startup,
serializing on a Postgres advisory lock. Adds 90-240s to cold startup
and caused bug #13 in Chapter 19.
**How**: Create a Kubernetes `Job` resource that runs the api image
with a `--migrate-only` flag. Job runs once per deploy, completes when
schema is current. api pods get an initContainer that waits for the
Job to complete.
Requires Go code change to support `--migrate-only` flag.
**Effort**: 3-4 hours (code + job manifest + testing).
### Redis password
**Why**: Redis runs in the cluster with no auth. Any compromised pod
could read cache or queue state.
**How**: Set `REDIS_PASSWORD` in `honeydue-secrets`, update api/worker
env, update Redis command to include `--requirepass`. Already partially
wired up in the manifests.
**Effort**: 20 min.
### Image signing with cosign
**Why**: No guarantee that an image pulled from Gitea is the one we
built. Gitea compromise = arbitrary code execution in cluster.
**How**:
1. Install cosign on build machine
2. Sign images as part of deploy: `cosign sign gitea.treytartt.com/admin/honeydue-api:<sha>`
3. Deploy Kyverno (or Connaisseur) to cluster
4. Apply cluster policy requiring all images have valid cosign signatures
**Effort**: 4-6 hours.
### etcd encryption at rest
**Why**: Kubernetes Secrets are stored in etcd unencrypted by default.
Node disk compromise = plaintext secrets.
**How**: K3s supports `--secrets-encryption` flag at server install.
Need to recreate cluster or re-install k3s server on each node.
**Effort**: 1 hour.
### Automated unattended-upgrades
**Why**: Currently OS patches require manual `apt upgrade`. Security
patches can be delayed.
**How**:
```bash
sudo apt install unattended-upgrades
# Configure /etc/apt/apt.conf.d/50unattended-upgrades for security-only
sudo dpkg-reconfigure -plow unattended-upgrades
```
**Effort**: 30 min per node.
### fail2ban
**Why**: SSH is open to the world. No rate limiting on failed attempts.
Bot noise is constant.
**How**: `sudo apt install fail2ban; sudo systemctl enable --now fail2ban`.
Default config bans IPs after 5 failed attempts for 10 min.
**Effort**: 15 min per node.
### Move SSH off port 22
**Why**: Port 22 attracts constant scanner noise. Moving to a
non-default port cuts >90% of attempts.
**How**:
1. Edit `/etc/ssh/sshd_config` on each node: `Port 2222`
2. UFW rule: `sudo ufw allow 2222/tcp`
3. Update `~/.ssh/config` on operator: `Port 2222`
4. Restart sshd: `sudo systemctl restart ssh`
5. Remove UFW rule for port 22 after verifying
**Effort**: 30 min (and pray).
## Lower priority
### Prometheus + Grafana
**Why**: Historical metrics, dashboards, alerting.
**How**: `kube-prometheus-stack` Helm chart. Adds ~500 MB RAM across
cluster.
**Effort**: 4-6 hours including dashboard setup.
### Loki log aggregation
**Why**: Cross-pod log queries, longer retention.
**How**: `grafana/loki` + `promtail` DaemonSet. Integrates with existing
Grafana.
**Effort**: 2-3 hours.
### OpenTelemetry tracing
**Why**: Request-level profiling. Show which hop dominates p99 latency.
**How**: Add OpenTelemetry SDK to Go app; export to Jaeger/Tempo.
**Effort**: 8-12 hours including tuning.
### Hetzner private network
**Why**: Currently all inter-node traffic (including Flannel overlay)
goes over public network. Private network = less attack surface, no
bandwidth costs (if metered in future).
**How**: Attach Hetzner vswitch to the 3 nodes, reconfigure Flannel to
advertise private IPs, update UFW rules to allow from private IP range
instead of specific public IPs.
**Effort**: 2-3 hours including testing Flannel reconfig.
### Move secrets to Vault
**Why**: Kubernetes Secrets are base64-encoded etcd values. Vault is
purpose-built for secret management with audit logs, dynamic secrets,
rotation policies.
**How**: Deploy Vault in the cluster (or external), migrate secret
values, use Vault Agent Injector or External Secrets Operator.
**Effort**: 6-8 hours.
Not high priority until we have multiple engineers who shouldn't see
every secret, or compliance requirements.
### Automated backups to B2
**Why**: Neon's backup is Neon's problem. If Neon-as-a-company
disappeared, we'd lose everything.
**How**: Nightly `pg_dump | gzip | aws s3 cp` (via `s3cmd` for B2) as a
CronJob in the cluster.
**Effort**: 2 hours.
### Multi-region
**Why**: ~100 ms CF→origin hop could be reduced by having origins in
multiple regions. Not needed at current scale.
**How**: Add 2 more Hetzner nodes in ash (Ashburn, US). Separate k3s
cluster (or one stretched cluster — painful). Cloudflare Load Balancing
for geo-based routing.
**Effort**: Days of work, doubling cost. Don't until traffic justifies.
### CF Workers for static + caching
**Why**: Certain endpoints (the marketing landing page, public API
lookups) could serve from CF Workers with near-zero origin load.
**How**: Move static pages to Cloudflare Pages; cache API responses
with `Cache-Control: public, max-age=300`.
**Effort**: 4-6 hours.
### WireGuard-encrypted overlay
**Why**: Current Flannel VXLAN is plaintext between nodes. An attacker
with Hetzner-internal network access could read pod-to-pod traffic.
**How**: K3s supports `--flannel-backend=wireguard-native`. Reinstall
k3s server on each node with the new backend.
**Effort**: 2-3 hours (requires brief downtime).
## Scaling triggers
| Trigger | Action |
|---|---|
| p99 latency > 500ms sustained | Investigate with tracing; consider CF Workers for cached paths |
| API CPU > 70% sustained | HPA already configured; may need more nodes |
| DB connections at Neon limit | Upgrade Neon Scale or reduce `DB_MAX_OPEN_CONNS` |
| Redis memory > 80% | Scale Redis memory; consider cache sharding |
| B2 storage > 500 GB | Evaluate if R2 (free egress) is cheaper overall |
| Active users > 100k | Evaluate multi-region, CF Pro, paid monitoring |
| Revenue > $5k/mo | Hire ops help; this document assumes solo operator |
## Known gaps we accept
- **No canary deploys**: all-or-nothing rollouts via `kubectl set image`
- **No feature flags** (app-level): code is deployed as-is. Can't toggle
features without re-deploying
- **No A/B testing infra**: out of scope for current product stage
- **No Windows/tablet-specific CDN rules**: CF serves everyone the same
responses
- **No explicit blue-green**: rolling updates only
## Stuff to delete when brave
- `deploy/` (the Swarm era) — once we've been on k3s 30 days
- Legacy UFW rules from the Swarm era (2377, 7946, 4789, ESP, 500, 3000)
— they don't hurt but they're confusing
- `deploy-k3s/manifests/secrets.yaml.example` — we don't use this
pattern, we create secrets imperatively
## Stuff that could go wrong and we should plan for
- **Hetzner price hike**: 2026-04-01 already happened. If another one
comes, we could migrate to Netcup or OVH for savings.
- **Neon EOL free tier**: Neon could change pricing policy. Fallback:
self-host Postgres on a Hetzner box or migrate to Supabase.
- **Cloudflare Free plan changes**: CF could restrict Free features.
Fallback: BunnyCDN, or raw nodes without CDN.
- **Gitea host outage**: If Gitea is down, deploys can't pull new
images. Existing pods continue. For long outages, we'd cache images
locally or temporarily push to Docker Hub.
## Progress tracker
As items are done, mark them here. Think of this as a running changelog.
- [x] k3s migration from Swarm (2026-04-24)
- [x] Traefik DaemonSet + hostNetwork
- [x] Admin seed via ADMIN_EMAIL + ADMIN_PASSWORD
- [x] Documentation book (this doc set)
- [ ] All other items above
+112
View File
@@ -0,0 +1,112 @@
# honeyDue Production Deployment — The Book
This is the complete reference for the honeyDue production deployment as it
exists on **2026-04-24**. It serves two audiences:
1. **A new engineer** learning the system for the first time. Start at
Chapter 0 (Overview) and read in order. Concepts are built up; nothing is
assumed beyond "you've deployed web apps before."
2. **The operator** (future-you) needing a specific fact fast. Every chapter
opens with a one-paragraph summary and has an operator runbook at its end.
The appendices are a cheat sheet.
The deployment is non-trivial. It's a 3-node HA Kubernetes cluster running
a Go API, a Next.js admin panel, a background worker, Redis, and Traefik —
all fronted by Cloudflare, integrated with Neon Postgres, Backblaze B2, and
a self-hosted Gitea registry. This book explains **why each of those pieces
was chosen** (often over two or three alternatives we tried first), what
they do, and how to operate them.
## Table of Contents
### Part I — The System
- [00 — Overview](./00-overview.md) — what's running, at a glance
- [01 — Infrastructure](./01-infrastructure.md) — Hetzner nodes, specs, cost, region
- [02 — Orchestrator Choice](./02-orchestrator-choice.md) — why k3s (and not Swarm, full k8s, or Nomad)
### Part II — Networking
- [03 — Networking](./03-networking.md) — flannel, CoreDNS, kube-proxy, the overlay story
- [04 — Firewall](./04-firewall.md) — every UFW rule on every node, rationale
- [13 — Cloudflare](./13-cloudflare.md) — DNS, SSL modes, round-robin origin pool
### Part III — Security
- [05 — Security](./05-security.md) — RBAC, Pod Security, secrets, TLS chain
- [06 — Traefik Ingress](./06-traefik-ingress.md) — host-network DaemonSet, cert plan
### Part IV — Workloads
- [07 — Services](./07-services.md) — api, admin, worker, redis per-service deep dive
- [08 — Database](./08-database.md) — Neon Postgres, advisory-lock migrations
- [09 — Storage](./09-storage.md) — Backblaze B2, minio-go client details
- [10 — Secrets & Config](./10-secrets-config.md) — ConfigMap, Secret, env mapping
- [11 — Registry](./11-registry.md) — Gitea container registry, multi-arch builds
### Part V — Operation
- [12 — Data Flow](./12-data-flow.md) — end-to-end request lifecycle
- [14 — Deployment Process](./14-deployment-process.md) — how to roll new code
- [15 — Observability](./15-observability.md) — logs, metrics, tracing
- [16 — Failure Modes](./16-failure-modes.md) — what happens when X dies
- [17 — Runbook](./17-runbook.md) — common ops tasks
### Part VI — Context
- [18 — Cost](./18-cost.md) — what this costs to run, per service
- [19 — Swarm Postmortem](./19-postmortem-swarm.md) — the story of why we migrated from Docker Swarm
- [20 — Roadmap](./20-roadmap.md) — known TODOs and scaling triggers
### Appendices
- [A — Glossary](./appendices/a-glossary.md)
- [B — kubectl Cheat Sheet](./appendices/b-commands.md)
- [C — File Locations](./appendices/c-file-locations.md)
- [D — References & Citations](./appendices/d-references.md)
## Quick Facts
| Field | Value |
|---|---|
| Orchestrator | K3s v1.34.6+k3s1 (3 nodes, HA control plane) |
| Ingress | Traefik v3 (DaemonSet, hostNetwork) |
| Nodes | 3× Hetzner Cloud CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) in `nbg1` (Nuremberg) |
| DNS & Edge | Cloudflare (Free plan), SSL=Flexible, round-robin 3 node A records |
| Database | Neon Postgres, `ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech` |
| Cache + Queue | Redis 7-alpine, in-cluster, 1 replica, PVC-backed, pinned to `nbg1-2` |
| Object Storage | Backblaze B2, `honeyDueProd` bucket, `us-east-005` region |
| Image Registry | Self-hosted Gitea v1.25.5 at `gitea.treytartt.com` |
| Transactional Email | Fastmail SMTP (`smtp.fastmail.com:587`) |
| Domains | `api.myhoneydue.com`, `admin.myhoneydue.com`, `myhoneydue.com` |
| Monthly Cost (current) | ~$3040 (3× Hetzner + Neon Launch + B2 + Cloudflare Free + Gitea free) |
| kubeconfig | `~/.kube/honeydue-k3s.yaml` on operator workstation |
| Repo | `honeyDueAPI-go/deploy-k3s/` for manifests, `deploy/` is the legacy Swarm config |
## How to Read This Book
- **"Why did we…?"** answers are in the chapter covering that component. Every
major design choice has an explicit rejection of 13 alternatives.
- **Historical bugs** are in Chapter 19. The rest of the book describes the
current (fixed) state; 19 is the forensic record of what was broken and
how we figured it out.
- **Operator commands** you'll run regularly are in Appendix B. Chapter 17
has longer procedures (cert rotation, DB migration, etc.).
- **Citations** throughout use footnote-style links to the canonical source
(k3s docs, moby issues, Cloudflare docs, etc.). Appendix D collects them.
## Conventions
- Kubernetes namespace for the app is `honeydue`.
- SSH aliases are `hetzner1`, `hetzner2`, `hetzner3` in your `~/.ssh/config`.
- Node hostnames in the cluster are `ubuntu-8gb-nbg1-{1,2,3}` (Hetzner-assigned).
- The mapping is non-obvious because the Hetzner hostname suffix order does
not match SSH alias order:
| SSH alias | Public IP | Hostname in k3s |
|---|---|---|
| hetzner1 | 178.104.247.152 | `ubuntu-8gb-nbg1-2` |
| hetzner2 | 178.105.32.198 | `ubuntu-8gb-nbg1-1` |
| hetzner3 | 178.104.249.189 | `ubuntu-8gb-nbg1-3` |
When a chapter refers to "hetzner1" it means the box at 178.104.247.152 / `nbg1-2`.
+207
View File
@@ -0,0 +1,207 @@
# Appendix A — Glossary
Alphabetical. Cross-referenced to chapters where each term is used in
detail.
## Kubernetes / k3s
**ClusterIP**: Internal IP of a Kubernetes Service. Stable; load-
balances to backing pods. (Chapter 3)
**containerd**: Container runtime bundled with k3s. Replaces Docker for
the runtime layer. (Chapter 2)
**ConfigMap**: Kubernetes resource holding non-sensitive config (env
vars). Mounted into pods via `envFrom`. (Chapter 10)
**CoreDNS**: Cluster-internal DNS resolver. Every pod's
`/etc/resolv.conf` points to the CoreDNS Service. (Chapter 3)
**CRD (Custom Resource Definition)**: Kubernetes extension mechanism
for third-party resource types. Traefik's `IngressRoute` and
`Middleware` are CRDs. (Chapter 6)
**DaemonSet**: Workload that runs exactly one pod per node. We use it
for Traefik so each node has its own ingress pod. (Chapter 6)
**Deployment**: Kubernetes workload for stateless pods. Supports rolling
updates. Most of our services are Deployments. (Chapter 7)
**Endpoints**: The actual pod IPs backing a Service's ClusterIP.
Dynamically updated as pods come and go. (Chapter 3)
**etcd**: Distributed key-value store holding cluster state. K3s
embeds it. Raft-replicated across server nodes. (Chapter 2)
**Flannel**: Kubernetes CNI (Container Network Interface) plugin for
pod-to-pod networking. Uses VXLAN tunneling. (Chapter 3)
**HPA (HorizontalPodAutoscaler)**: K8s resource that scales Deployment
replicas based on CPU/memory usage. Not currently enabled for us.
(Chapter 7)
**Ingress**: K8s resource describing external-to-internal routing rules.
Traefik watches Ingresses and programs itself accordingly. (Chapter 6)
**IPVS**: Linux kernel feature for in-kernel L4 load balancing. Our
kube-proxy uses it. (Chapter 3)
**k3s**: Lightweight Kubernetes distribution by Rancher/SUSE. What we
run. (Chapter 2)
**kubectl**: Kubernetes CLI tool. Runs on operator workstation.
(Chapter 17)
**kubelet**: Agent running on each node, responsible for pod lifecycle.
(Chapter 2)
**kube-proxy**: Service-to-pod routing component. Runs on each node in
IPVS mode. (Chapter 3)
**Namespace**: Kubernetes logical grouping. Our app lives in `honeydue`.
System services in `kube-system`. (Chapter 7)
**NetworkPolicy**: K8s resource defining allowed traffic between pods.
Not currently applied. (Chapter 5)
**Node**: A physical or virtual machine running Kubernetes. We have 3.
(Chapter 1)
**PDB (PodDisruptionBudget)**: Constraint on voluntary pod disruptions
(drain, upgrade). Keeps N replicas available. (Chapter 7)
**Pod**: Smallest Kubernetes unit — one or more containers sharing
network and storage. Our pods are usually one-container. (Chapter 7)
**PVC (PersistentVolumeClaim)**: Request for persistent storage. Redis
uses one. (Chapter 7)
**RBAC**: Role-Based Access Control. Governs who/what can do what via
the Kubernetes API. (Chapter 5)
**ReplicaSet**: Managed by a Deployment; ensures N pods of a template
are running. Each deploy creates a new ReplicaSet. (Chapter 14)
**Secret**: K8s resource holding sensitive values. Base64-encoded;
stored in etcd (unencrypted by default). (Chapter 10)
**Service**: K8s resource providing a stable endpoint (ClusterIP) for
a set of pods. (Chapter 3)
**ServiceAccount**: Identity used by pods to authenticate to the
Kubernetes API. We disable token mounting for our app pods.
(Chapter 5)
**Taint / Toleration**: Mechanism to prevent pods from being scheduled
on certain nodes. Not used in our setup. (Chapter 7)
## Docker / Swarm
**libnetwork**: Docker's networking library. Provides overlay
networking for Swarm. Source of the DNS ghost bug (Chapter 19).
**mode: global**: Swarm deploy mode for services running one pod per
node. (Chapter 19)
**mode: host**: Port publishing mode that binds to node's real
interface, bypassing the ingress mesh. (Chapter 4)
**Overlay network**: Encrypted or unencrypted virtual network spanning
Swarm nodes. (Chapter 19)
**Swarm**: Docker's built-in orchestrator. What we used to run.
(Chapter 19)
**VXLAN**: Virtual Extensible LAN. Layer-2 over Layer-3 tunneling.
Used by both Swarm overlay and Kubernetes Flannel. (Chapter 3)
## Cloudflare
**Flexible SSL**: CF SSL mode where CF↔origin is HTTP. Our current
setup. (Chapter 13)
**Full (strict) SSL**: CF SSL mode where CF↔origin is HTTPS with cert
verification. Our target. (Chapter 13)
**Origin CA**: CF-internal certificate authority that issues certs CF's
edge trusts. Used for Full strict mode. (Chapter 13)
**POP (Point of Presence)**: A CF edge location. ~300 globally.
(Chapter 13)
**Proxied (orange cloud)**: DNS record with CF proxying on. Traffic
goes through CF. (Chapter 13)
**Workers**: CF's serverless compute at the edge. We don't use yet.
(Chapter 20)
## Hetzner
**CX33**: Hetzner Cloud instance type. 4 vCPU, 8 GB RAM, 80 GB SSD.
(Chapter 1)
**Cloud Firewall**: Hetzner's provider-level firewall feature. We use
UFW on nodes instead. (Chapter 4)
**nbg1**: Nuremberg datacenter code. Our region. (Chapter 1)
## Neon
**Branch**: Neon's isolation primitive. Each project can have multiple
branches (prod, staging, dev). (Chapter 8)
**CU (Compute Unit)**: Neon's pricing unit for compute.
(Chapter 8)
**Launch plan**: Neon's entry-level paid plan. $5 min + usage.
(Chapter 8)
**Pooler**: Neon's built-in PgBouncer instance at the `-pooler` hostname
suffix. (Chapter 8)
## Backblaze B2
**B2**: Backblaze's object storage. What we use for uploads.
(Chapter 9)
**App key**: B2's bucket-scoped credential. Not an IAM-flavored role.
(Chapter 9)
**S3-compatible**: API that speaks AWS S3 protocol. B2 supports it.
(Chapter 9)
## Go + Asynq
**AutoMigrate**: GORM function that syncs DB schema to Go structs.
(Chapter 8)
**Asynq**: Go library for background job queues. Redis-backed.
(Chapter 7)
**GORM**: Go ORM we use. (Chapter 8)
**pgx**: Go Postgres driver used by GORM. (Chapter 8)
**sync.Once**: Go stdlib primitive for "run this exactly once." Source
of bug #6 (Chapter 19).
## Other
**advisory lock**: A Postgres lock that doesn't block rows but lets
apps coordinate voluntarily. We use for migration serialization.
(Chapter 8)
**AOF (Append-Only File)**: Redis persistence mode that logs every
write. (Chapter 7)
**MTU**: Maximum Transmission Unit. Packet size limit. VXLAN reduces
effective MTU by 50 bytes. (Chapter 3)
**Raft**: Consensus algorithm. Used by etcd. (Chapter 2)
**STARTTLS**: SMTP upgrade from plain to TLS. Used for Fastmail.
(Chapter 5)
**UFW**: Uncomplicated Firewall. Frontend for iptables. (Chapter 4)
**VXLAN**: See Docker/Swarm section.
+305
View File
@@ -0,0 +1,305 @@
# Appendix B — kubectl Cheat Sheet
Specific to this deployment. Assumes:
```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
```
## Viewing state
```bash
# All pods in our namespace
kubectl get pods -n honeydue
# With node placement + IPs
kubectl get pods -n honeydue -o wide
# All resources in our namespace
kubectl get all -n honeydue
# Cluster-wide pod overview
kubectl get pods -A
# Node health
kubectl get nodes
kubectl top nodes
# What's using RAM
kubectl top pods -n honeydue --sort-by=memory
# What's using CPU
kubectl top pods -n honeydue --sort-by=cpu
```
## Logs
```bash
# Follow all api pod logs
kubectl logs -n honeydue -l app.kubernetes.io/name=api -f --prefix
# One specific pod
kubectl logs -n honeydue <pod-name>
# Previous pod's logs (after crash)
kubectl logs -n honeydue <pod-name> --previous
# Filtered
kubectl logs -n honeydue deploy/api | grep -i error
kubectl logs -n honeydue deploy/api --since=1h
# stern is nicer for multi-pod (if installed)
stern -n honeydue api
```
## Deploying new code
```bash
SHA=$(git rev-parse --short HEAD)
# Build + push (requires docker login to Gitea first)
docker buildx build --platform linux/amd64 --target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
# Roll it in
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
# Watch
kubectl rollout status -n honeydue deployment/api
```
## Rolling update controls
```bash
# Pause a rollout in progress (new pods stop being created)
kubectl rollout pause deployment/api -n honeydue
# Resume
kubectl rollout resume deployment/api -n honeydue
# Rollback to previous version
kubectl rollout undo deployment/api -n honeydue
# Rollback to specific revision
kubectl rollout history deployment/api -n honeydue
kubectl rollout undo deployment/api -n honeydue --to-revision=3
# Force restart (re-pulls image if digest changed; reloads ConfigMap)
kubectl rollout restart deployment/api -n honeydue
```
## Scaling
```bash
# Scale up
kubectl scale deployment/api -n honeydue --replicas=5
# Scale down
kubectl scale deployment/api -n honeydue --replicas=3
# Kill everything (emergency)
kubectl scale deployment -n honeydue --all --replicas=0
# Bring back
kubectl scale deployment/api -n honeydue --replicas=3
kubectl scale deployment/admin deployment/worker deployment/redis -n honeydue --replicas=1
```
## Debugging a pod
```bash
# Describe = events + state + restart history
kubectl describe pod -n honeydue <pod-name>
# Shell in
kubectl exec -it -n honeydue deploy/api -- /bin/sh
# Inside:
# Test HTTP locally (bypasses Traefik, Service, overlay)
wget -qO- http://127.0.0.1:8000/api/health/
# Test cross-Service DNS
getent hosts redis
getent hosts admin
getent hosts postgres
# Run arbitrary command (one-shot)
kubectl exec -n honeydue deploy/api -- env | grep POSTGRES
```
## Networking checks
```bash
# Resolve a Service from a pod
kubectl exec -n honeydue deploy/api -- nslookup redis
# Check Service endpoints (the actual IPs behind a ClusterIP)
kubectl get endpoints -n honeydue api
# Traffic test via Service
kubectl run test --rm -it --image=alpine/curl -- sh
# curl http://api.honeydue.svc:8000/api/health/
# List all Ingresses
kubectl get ingress -A
```
## Secret / Config
```bash
# List
kubectl get secrets -n honeydue
kubectl get configmap -n honeydue
# Describe (shows keys, not values)
kubectl describe secret honeydue-secrets -n honeydue
# Read a value (DANGER: plaintext to stdout)
kubectl get secret honeydue-secrets -n honeydue \
-o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d; echo
# Update a single secret key
kubectl patch secret honeydue-secrets -n honeydue \
--type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(echo -n 'new-val' | base64)\"}}"
# Regenerate ConfigMap from prod.env
kubectl create configmap honeydue-config -n honeydue \
--from-env-file=deploy/prod.env \
--dry-run=client -o yaml | kubectl apply -f -
# Edit a ConfigMap interactively (does NOT restart pods)
kubectl edit configmap honeydue-config -n honeydue
```
## Node management
```bash
# Prevent scheduling on a node
kubectl cordon <node-hostname>
# Prevent scheduling + evict existing pods
kubectl drain <node-hostname> --ignore-daemonsets --delete-emptydir-data
# Allow scheduling again
kubectl uncordon <node-hostname>
# Label a node
kubectl label node <node-hostname> honeydue/redis=true --overwrite
# Remove a label
kubectl label node <node-hostname> honeydue/redis-
```
## Events (the timeline)
```bash
# All events, newest last
kubectl get events -A --sort-by=.lastTimestamp
# Watch live
kubectl get events -A --sort-by=.lastTimestamp -w
# Only warnings
kubectl get events -A --field-selector type=Warning
# Events for a specific pod
kubectl describe pod -n honeydue <pod> | awk '/Events:/,0'
```
## Traefik-specific
```bash
# All Traefik pods (DaemonSet, so one per node)
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide
# Restart Traefik across all nodes
kubectl rollout restart daemonset/traefik -n kube-system
# View Traefik config (via ConfigMap)
kubectl get cm -n kube-system traefik -o yaml | less
# See the HelmChartConfig we applied
kubectl get helmchartconfig -n kube-system traefik -o yaml
# Force Helm re-reconcile
kubectl delete job -n kube-system helm-install-traefik
```
## Cluster-wide operations
```bash
# API server health
kubectl cluster-info
# All namespaces
kubectl get namespaces
# All k3s-system pods
kubectl get pods -n kube-system
# All ServiceAccounts in our namespace
kubectl get sa -n honeydue
# Check what an SA can do
kubectl auth can-i --list --as=system:serviceaccount:honeydue:api
```
## Hetzner SSH (not kubectl but oft needed)
```bash
# SSH in
ssh -i ~/.ssh/hetzner deploy@hetzner1
# Check k3s service
ssh -i ~/.ssh/hetzner deploy@hetzner1 'sudo systemctl status k3s'
# Per-node commands in parallel (e.g., apt upgrade)
for h in hetzner1 hetzner2 hetzner3; do
ssh -i ~/.ssh/hetzner "deploy@$h" 'sudo apt update && sudo apt upgrade -y'
done
```
## Emergency: cluster is wedged
```bash
# Check all nodes Ready
kubectl get nodes
# If one is NotReady
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo systemctl restart k3s'
# If still bad, kill k3s on that node and check
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo /usr/local/bin/k3s-killall.sh'
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo systemctl start k3s'
# Last resort: uninstall + rejoin
# ssh -i ~/.ssh/hetzner deploy@<node> 'sudo /usr/local/bin/k3s-uninstall.sh'
# then re-join via the k3s install command
```
## One-liners worth memorizing
```bash
# Heavy smoke test through CF
for url in https://api.myhoneydue.com/api/health/ https://admin.myhoneydue.com/ https://myhoneydue.com/; do
ok=0
for i in $(seq 1 20); do
[[ "$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" == "200" ]] && ok=$((ok+1))
done
printf "%-45s %d/20\n" "$url" "$ok"
done
# Pods not ready
kubectl get pods -A | awk '$3!="Running" && $3!="Completed" && $3!="STATUS"'
# Restart everything in our namespace
for d in api admin worker redis; do
kubectl rollout restart deploy/$d -n honeydue
done
# Watch all rollouts simultaneously
for d in api admin worker redis; do
kubectl rollout status deploy/$d -n honeydue &
done; wait
```
@@ -0,0 +1,216 @@
# Appendix C — File Locations
Complete map of where every significant file lives — on the operator
workstation, in the git repo, and on the Hetzner nodes.
## Operator workstation
### Kubernetes
| Path | Purpose |
|---|---|
| `~/.kube/honeydue-k3s.yaml` | kubeconfig for the k3s cluster. Contains an admin bearer token. Mode 0600. |
| `~/.kube/config` | Default kubeconfig (points elsewhere, not our cluster). |
Set `KUBECONFIG=~/.kube/honeydue-k3s.yaml` before any `kubectl` command.
### SSH
| Path | Purpose |
|---|---|
| `~/.ssh/hetzner` | Private key for node SSH (ed25519). Mode 0600. |
| `~/.ssh/hetzner.pub` | Public key corresponding to above. |
| `~/.ssh/config` | Host aliases for hetzner1/hetzner2/hetzner3 → node IPs. |
Public key content:
```
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBU9xTTBD78tYUqHijgyU9PDqtmS4NuM/6uy8XgDzva+ hetzner2@myhoneydue.com
```
### Docker
| Path | Purpose |
|---|---|
| `~/.docker/config.json` | Docker CLI config. After `docker login` to Gitea, contains creds. **Log out after each deploy** to not leave PATs on disk. |
| `~/Library/Containers/com.docker.docker/` | Docker Desktop state (macOS). |
## Git repo (`/Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go/`)
### Top-level
| Path | Purpose |
|---|---|
| `CLAUDE.md` | Project-wide instructions for Claude assistant. Never commit secrets here. |
| `Dockerfile` | Multi-stage Docker build: api, worker, admin targets. |
| `go.mod`, `go.sum` | Go module definition. |
| `package.json` (admin-ui/) | Next.js dependencies. |
### Application code
| Path | Purpose |
|---|---|
| `cmd/api/main.go` | API server entry point. |
| `cmd/worker/main.go` | Background worker entry point. |
| `cmd/admin/main.go` | (may or may not exist for Go admin variant) |
| `internal/config/` | Viper configuration loading. |
| `internal/database/` | Postgres connection, migrations. |
| `internal/handlers/` | HTTP handlers (one file per domain). |
| `internal/services/` | Business logic. `cache_service.go` is where the sync.Once bug was (Chapter 19). |
| `internal/repositories/` | GORM repositories. |
| `internal/router/router.go` | Echo routes, including static file serving. CSP is set here. |
| `internal/middleware/` | Echo middleware (auth, logging, etc.). |
| `internal/task/` | Task predicates/scopes/categorization. See `docs/TASK_LOGIC_ARCHITECTURE.md`. |
### Deploy config (Swarm era — still exists, unused)
| Path | Purpose |
|---|---|
| `deploy/` | Legacy Swarm deploy root. |
| `deploy/prod.env` | Non-secret config (ConfigMap source). **Gitignored.** |
| `deploy/registry.env` | Gitea PAT + registry URL. **Gitignored.** |
| `deploy/cluster.env` | Swarm cluster settings. Partly used for k3s too (manager host). **Gitignored.** |
| `deploy/secrets/postgres_password.txt` | Neon password. **Gitignored.** |
| `deploy/secrets/secret_key.txt` | App signing key (≥32 chars). **Gitignored.** |
| `deploy/secrets/email_host_password.txt` | Fastmail password. **Gitignored.** |
| `deploy/secrets/fcm_server_key.txt` | FCM key (placeholder, push off). **Gitignored.** |
| `deploy/secrets/apns_auth_key.p8` | APNs key (placeholder, push off). **Gitignored.** |
| `deploy/swarm-stack.prod.yml` | Swarm stack definition. Unused after migration. |
| `deploy/Caddyfile` | Caddy config. Unused after migration. |
| `deploy/scripts/deploy_prod.sh` | Swarm deploy script. Unused. |
| `deploy/DEPLOYING.md`, `deploy/README.md`, `deploy/shit_deploy_cant_do.md` | Swarm-era docs. Historical reference. |
### Deploy config (k3s)
| Path | Purpose |
|---|---|
| `deploy-k3s/README.md` | k3s deployment README (scaffold version). |
| `deploy-k3s/MIGRATION_NOTES.md` | Notes from Swarm → k3s migration. |
| `deploy-k3s/SECURITY.md` | Security posture doc (scaffold). |
| `deploy-k3s/config.yaml.example` | Template for a unified config.yaml (unused — we kept Swarm's file layout). |
| `deploy-k3s/manifests/namespace.yaml` | Creates `honeydue` namespace. |
| `deploy-k3s/manifests/rbac.yaml` | ServiceAccounts + `automountServiceAccountToken: false`. |
| `deploy-k3s/manifests/pod-disruption-budgets.yaml` | PDBs for api (2/3) and worker (0/1). |
| `deploy-k3s/manifests/network-policies.yaml` | Default-deny + allows. NOT currently applied. |
| `deploy-k3s/manifests/api/deployment.yaml` | api Deployment. |
| `deploy-k3s/manifests/api/service.yaml` | api ClusterIP Service. |
| `deploy-k3s/manifests/api/hpa.yaml` | api HorizontalPodAutoscaler. NOT currently applied. |
| `deploy-k3s/manifests/admin/deployment.yaml` | admin Deployment. |
| `deploy-k3s/manifests/admin/service.yaml` | admin Service. |
| `deploy-k3s/manifests/worker/deployment.yaml` | worker Deployment. |
| `deploy-k3s/manifests/redis/deployment.yaml` | Redis Deployment. |
| `deploy-k3s/manifests/redis/service.yaml` | Redis Service. |
| `deploy-k3s/manifests/redis/pvc.yaml` | Redis PersistentVolumeClaim. |
| `deploy-k3s/manifests/ingress/ingress.yaml` | Full Ingress with TLS + middleware (scaffold; needs CF origin cert). |
| `deploy-k3s/manifests/ingress/ingress-simple.yaml` | Simple Ingress without TLS (what we actually apply). |
| `deploy-k3s/manifests/ingress/middleware.yaml` | Traefik middleware CRDs. Not currently applied. |
| `deploy-k3s/manifests/traefik-helmchartconfig.yaml` | Our DaemonSet + hostNetwork override for Traefik. |
| `deploy-k3s/manifests/secrets.yaml.example` | Template (never deployed). |
| `deploy-k3s/scripts/01-provision-cluster.sh` | hetzner-k3s provisioning (we didn't use it; existing nodes). |
| `deploy-k3s/scripts/02-setup-secrets.sh` | Creates Secrets + ConfigMap (scaffold version; we ran commands manually). |
| `deploy-k3s/scripts/03-deploy.sh` | Applies manifests (unused; we ran kubectl manually). |
| `deploy-k3s/scripts/04-verify.sh` | Post-deploy verification. |
| `deploy-k3s/scripts/rollback.sh` | Rollback helper. |
### Documentation
| Path | Purpose |
|---|---|
| `docs/deployment/` | **This book.** |
| `docs/TASK_LOGIC_ARCHITECTURE.md` | Task logic internals. |
| `docs/PUSH_NOTIFICATIONS.md` | Push notifications setup (for future). |
| `docs/SUBSCRIPTION_WEBHOOKS.md` | Apple/Google subscription webhooks. |
| `docs/Dokku_notes` | Pre-Swarm era deployment notes. Historical. |
| `docs/server_2026_2_24.md` | Earlier architecture doc (predates k3s migration). |
## On the Hetzner nodes
### System
| Path | Purpose |
|---|---|
| `/etc/ssh/sshd_config` | SSH config — `PermitRootLogin no`, `PasswordAuthentication no`, `AllowUsers deploy`. |
| `/etc/sudoers.d/deploy` | `deploy ALL=(ALL) NOPASSWD: ALL`. |
| `/etc/ufw/` | UFW configuration. See Chapter 4 for rule inventory. |
| `/etc/sysctl.d/99-unprivileged-ports.conf` | `net.ipv4.ip_unprivileged_port_start=0` for Traefik. |
| `/home/deploy/.ssh/authorized_keys` | Our hetzner.pub. |
### K3s
| Path | Purpose |
|---|---|
| `/etc/rancher/k3s/k3s.yaml` | Kubeconfig (localhost-scoped; we copied to workstation). |
| `/etc/systemd/system/k3s.service` | systemd service file. |
| `/etc/systemd/system/k3s.service.env` | K3s install args (INSTALL_K3S_EXEC). |
| `/var/lib/rancher/k3s/` | K3s state root (etcd, containerd, PVC storage). |
| `/var/lib/rancher/k3s/server/node-token` | Token for joining additional nodes. |
| `/var/lib/rancher/k3s/storage/` | local-path PVC storage. Redis data lives here. |
| `/var/lib/rancher/k3s/agent/containerd/` | containerd state. |
| `/var/log/containers/` | Container log files. |
### Commands installed
| Path | Purpose |
|---|---|
| `/usr/local/bin/k3s` | The k3s binary. |
| `/usr/local/bin/kubectl` | Symlink to k3s (CLI for this cluster). |
| `/usr/local/bin/crictl` | containerd CLI. |
| `/usr/local/bin/k3s-killall.sh` | Emergency kill-all-k3s script. |
| `/usr/local/bin/k3s-uninstall.sh` | Clean uninstall script. |
### Docker (legacy; disabled)
| Path | Purpose |
|---|---|
| `/etc/systemd/system/docker.service` | systemd unit (stopped + disabled). |
| `/var/lib/docker/` | Docker state (unused on current cluster). |
## On Cloudflare
Not a filesystem, but worth noting the dashboard hierarchy:
```
Websites → myhoneydue.com
├── DNS → Records (A records for api, admin, @)
├── SSL/TLS → Overview (SSL mode: Flexible)
├── SSL/TLS → Edge Certificates (Always Use HTTPS: On)
├── SSL/TLS → Origin Server (would live the Origin CA cert if we enabled it)
├── Rules → Overview (where Origin Rules live if we had them)
├── Rules → Page Rules (none)
├── Security → WAF (managed rules only)
├── Speed → Optimization (default)
└── Analytics & Logs (read-only stats)
```
## On Gitea (`gitea.treytartt.com`)
The image registry lives at:
```
gitea.treytartt.com/admin/-/packages # UI listing of all packages
gitea.treytartt.com/admin/-/packages/container/honeydue-api # API image
gitea.treytartt.com/admin/-/packages/container/honeydue-worker # Worker image
gitea.treytartt.com/admin/-/packages/container/honeydue-admin # Admin image
```
Per-version tags visible in the UI with `docker pull` commands.
PATs at `gitea.treytartt.com/-/user/settings/applications`.
## On Neon
```
console.neon.tech → project → Branches (production branch default)
console.neon.tech → project → Monitoring (CU-hour usage, slow queries)
console.neon.tech → project → Operations (history of schema changes)
```
Connection strings at `console.neon.tech → project → Connection Details`.
## On Backblaze B2
```
secure.backblaze.com/b2_buckets.htm # Buckets list
secure.backblaze.com/b2_app_keys.htm # App keys
```
`honeyDueProd` bucket → Files tab for browsing contents.
+202
View File
@@ -0,0 +1,202 @@
# Appendix D — References & Citations
Every external link cited anywhere in this book, grouped by topic.
## Docker / Moby
- [moby/moby#52265 — Overlay ARP stale entries on 29.3.0 regression][moby-52265] (Chapter 19, primary root-cause citation)
- [moby/moby#51491 — DNS broken after `docker swarm init` on 29.0.0][moby-51491]
- [Dokploy#3480 — Traefik routes intermittently timeout due to stale VIP][dokploy-3480]
- [Mirantis: Commits to Long-Term Support for Swarm Through 2030][mirantis-swarm]
- [Better Stack: Hetzner Cloud Review 2026][bstack-swarm]
- [VirtualizationHowTo: Is Docker Swarm Still Safe in 2026?][vht-swarm]
- [bleevht: Where Docker Swarm Still Fits in 2026][bleevht-swarm]
- [Docker buildx multi-platform builds][buildx]
- [Compose specification][compose-spec]
## Kubernetes / k3s
- [K3s documentation home][k3s-docs]
- [K3s architecture][k3s-arch]
- [K3s requirements (networking ports)][k3s-reqs]
- [K3s advanced config — metrics server][k3s-metrics]
- [K3s HA datastore recovery][k3s-ha-recovery]
- [K3s storage — local-path provisioner][k3s-lp]
- [K3s Helm integration — HelmChartConfig][k3s-helm]
- [K3s Traefik customization][k3s-traefik]
- [K3s secrets encryption][k3s-secrets]
- [Kubernetes concepts — Services & Networking][k8s-net]
- [Kubernetes Ingress][k8s-ingress]
- [Kubernetes Deployments — rolling updates][rolling]
- [kubectl rollout][rollout]
- [kubectl cheat sheet][kubectl-cs]
- [Pod lifecycle + probes][probes]
- [Pod Security Standards][psa]
- [Kubernetes RBAC][rbac]
- [NetworkPolicy][netpol]
- [Ports and Protocols reference][k8s-ports]
- [metrics-server][ms]
## Traefik
- [Traefik v3 documentation][traefik]
- [Traefik Swarm provider][traefik-swarm]
- [Traefik migrate v2 → v3][traefik-v3]
## Cloudflare
- [IP ranges][cf-ips]
- [SSL modes explained][cf-ssl]
- [Origin CA certificates][cf-origin-ca]
- [DNS best practices][cf-dns]
- [Free plan][cf-free]
## Hetzner
- [Hetzner Cloud][hetzner-cloud]
- [Hetzner price adjustment 2026-04-01][hetzner-prices]
- [Hetzner rescue system][hetzner-rescue]
- [hetzner-k3s tool][hetzner-k3s]
## Neon / Postgres
- [Neon docs][neon-docs]
- [Neon pricing][neon-pricing]
- [Neon usage-based pricing announcement][neon-blog]
- [Neon connect from any app][neon-connect]
- [Postgres advisory locks][pg-locks]
- [GORM AutoMigrate][gorm-automigrate]
## Backblaze B2
- [B2 documentation][b2-docs]
- [B2 S3-compatible API][b2-s3]
- [B2 pricing][b2-pricing]
- [minio-go SDK][minio-go]
- [S3 path-style vs virtual-hosted addressing][s3-style]
## Gitea
- [Gitea container registry docs][gitea-cr]
## CNI / Networking
- [Flannel VXLAN backend][flannel-vxlan]
- [CoreDNS Kubernetes plugin][coredns-k8s]
- [IPVS mode for kube-proxy deep dive][ipvs]
- [VXLAN RFC 7348][vxlan-rfc]
- [Kubernetes NetworkPolicy][netpol]
## Security tools
- [cosign (image signing)][cosign]
- [Loki (logs)][loki]
- [Stern (multi-pod log tailing)][stern]
- [fail2ban][fail2ban]
## Asynq
- [Asynq documentation][asynq]
- [Asynq periodic tasks (scheduler limitations)][asynq-sched]
## Miscellaneous
- [Let's Encrypt][le]
- [UFW man page][ufw-man]
- [SSH hardening guide][ssh-guide]
- [pg_dump][pg-dump]
---
## Link definitions
<!-- Docker / Moby -->
[moby-52265]: https://github.com/moby/moby/issues/52265
[moby-51491]: https://github.com/moby/moby/issues/51491
[dokploy-3480]: https://github.com/Dokploy/dokploy/issues/3480
[mirantis-swarm]: https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/
[bstack-swarm]: https://betterstack.com/community/guides/web-servers/hetzner-cloud-review/
[vht-swarm]: https://www.virtualizationhowto.com/2026/03/is-docker-swarm-still-safe-in-2026/
[bleevht-swarm]: https://bleevht.substack.com/p/where-docker-swarm-still-fits-in
[buildx]: https://docs.docker.com/build/buildx/
[compose-spec]: https://docs.docker.com/reference/compose-file/
<!-- Kubernetes / k3s -->
[k3s-docs]: https://docs.k3s.io/
[k3s-arch]: https://docs.k3s.io/architecture
[k3s-reqs]: https://docs.k3s.io/installation/requirements#networking
[k3s-metrics]: https://docs.k3s.io/advanced#enabling-metrics-server
[k3s-ha-recovery]: https://docs.k3s.io/datastore/ha-embedded#new-cluster-with-embedded-db
[k3s-lp]: https://docs.k3s.io/storage#setting-up-the-local-storage-provider
[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
[k3s-secrets]: https://docs.k3s.io/security/secrets-encryption
[k8s-net]: https://kubernetes.io/docs/concepts/services-networking/
[k8s-ingress]: https://kubernetes.io/docs/concepts/services-networking/ingress/
[rolling]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
[rollout]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#rollout
[kubectl-cs]: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
[probes]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifecycle
[psa]: https://kubernetes.io/docs/concepts/security/pod-security-standards/
[rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
[netpol]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
[k8s-ports]: https://kubernetes.io/docs/reference/networking/ports-and-protocols/
[ms]: https://github.com/kubernetes-sigs/metrics-server
<!-- Traefik -->
[traefik]: https://doc.traefik.io/traefik/v3.6/
[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
[traefik-v3]: https://doc.traefik.io/traefik/migrate/v2-to-v3-details/
<!-- Cloudflare -->
[cf-ips]: https://www.cloudflare.com/ips/
[cf-ssl]: https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/
[cf-origin-ca]: https://developers.cloudflare.com/ssl/origin-configuration/origin-ca/
[cf-dns]: https://developers.cloudflare.com/dns/
[cf-free]: https://www.cloudflare.com/plans/free/
<!-- Hetzner -->
[hetzner-cloud]: https://www.hetzner.com/cloud/
[hetzner-prices]: https://docs.hetzner.com/general/infrastructure-and-availability/price-adjustment/
[hetzner-rescue]: https://docs.hetzner.com/cloud/servers/getting-started/enabling-rescue-system/
[hetzner-k3s]: https://github.com/vitobotta/hetzner-k3s
<!-- Neon / Postgres -->
[neon-docs]: https://neon.com/docs/introduction
[neon-pricing]: https://neon.com/pricing
[neon-blog]: https://neon.com/blog/new-usage-based-pricing
[neon-connect]: https://neon.com/docs/connect/connect-from-any-app
[pg-locks]: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS
[gorm-automigrate]: https://gorm.io/docs/migration.html
<!-- B2 -->
[b2-docs]: https://www.backblaze.com/docs/
[b2-s3]: https://www.backblaze.com/docs/cloud-storage-s3-compatible-api
[b2-pricing]: https://www.backblaze.com/cloud-storage/pricing
[minio-go]: https://github.com/minio/minio-go
[s3-style]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
<!-- Gitea -->
[gitea-cr]: https://docs.gitea.com/usage/packages/container
<!-- CNI -->
[flannel-vxlan]: https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#vxlan
[coredns-k8s]: https://coredns.io/plugins/kubernetes/
[ipvs]: https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/
[vxlan-rfc]: https://datatracker.ietf.org/doc/html/rfc7348
<!-- Security tools -->
[cosign]: https://github.com/sigstore/cosign
[loki]: https://grafana.com/oss/loki/
[stern]: https://github.com/stern/stern
[fail2ban]: https://www.fail2ban.org/
<!-- Asynq -->
[asynq]: https://github.com/hibiken/asynq
[asynq-sched]: https://github.com/hibiken/asynq/wiki/Periodic-Tasks
<!-- Misc -->
[le]: https://letsencrypt.org/
[ufw-man]: https://manpages.ubuntu.com/manpages/noble/en/man8/ufw.8.html
[ssh-guide]: https://linux-audit.com/audit-and-harden-your-ssh-configuration/
[pg-dump]: https://www.postgresql.org/docs/current/app-pgdump.html
+17 -6
View File
@@ -62,13 +62,24 @@ func SetupRouter(deps *Dependencies) *echo.Echo {
e.Use(custommiddleware.StructuredLogger())
// Security headers (X-Frame-Options, X-Content-Type-Options, X-XSS-Protection, etc.)
//
// CSP is permissive enough to serve the marketing landing page at / (which
// loads same-origin CSS/JS/images and Google Fonts over https). JSON API
// responses are unaffected — they don't load any assets, so any CSP is fine.
// frame-ancestors stays 'none' to block clickjacking.
e.Use(middleware.SecureWithConfig(middleware.SecureConfig{
XSSProtection: "1; mode=block",
ContentTypeNosniff: "nosniff",
XFrameOptions: "SAMEORIGIN",
HSTSMaxAge: 31536000, // 1 year in seconds
ReferrerPolicy: "strict-origin-when-cross-origin",
ContentSecurityPolicy: "default-src 'none'; frame-ancestors 'none'",
XSSProtection: "1; mode=block",
ContentTypeNosniff: "nosniff",
XFrameOptions: "SAMEORIGIN",
HSTSMaxAge: 31536000,
ReferrerPolicy: "strict-origin-when-cross-origin",
ContentSecurityPolicy: "default-src 'self'; " +
"style-src 'self' https://fonts.googleapis.com; " +
"font-src 'self' https://fonts.gstatic.com data:; " +
"img-src 'self' data:; " +
"script-src 'self'; " +
"connect-src 'self'; " +
"frame-ancestors 'none'",
}))
e.Use(middleware.BodyLimitWithConfig(middleware.BodyLimitConfig{
Limit: "1M", // 1MB default for JSON payloads
+6 -2
View File
@@ -50,8 +50,12 @@ func NewCacheService(cfg *config.RedisConfig) (*CacheService, error) {
if err := client.Ping(ctx).Err(); err != nil {
initErr = fmt.Errorf("failed to connect to Redis: %w", err)
// Reset Once so a retry is possible after transient failures
cacheOnce = sync.Once{}
// NOTE: Don't reassign `cacheOnce = sync.Once{}` here. Mutating the
// Once from within its own Do() callback fatals with "unlock of
// unlocked mutex" because Do is holding the inner lock while we
// zero it. main.go handles the error (caching disabled, keep running);
// a pod restart is the right "retry" path for a transient Redis
// outage, not in-process.
return
}