Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes, all running
  as K3s servers)
- Traefik DaemonSet + hostNetwork replaces Caddy + Swarm ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after
  Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
  + allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
  --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
  images runnable on x86_64 Hetzner nodes; fix array expansion under
  set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
  without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/: full deployment book, 26 files, ~42k words:
  - Part I    Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II   Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III  Security, Traefik ingress (Ch 5-6)
  - Part IV   Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V    Data flow, deploy process, observability, failures, runbook
              (Ch 12, 14-17)
  - Part VI   Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to
  the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -5,7 +5,7 @@
 
 # Binaries
 bin/
-api
+/api
 /worker
 /admin
 !admin/
@@ -4,7 +4,7 @@ Go REST API for the honeyDue property management platform. Powers iOS and Androi
 
 ## Tech Stack
 
-- **Language**: Go 1.24
+- **Language**: Go 1.25
 - **HTTP Framework**: [Echo v4](https://github.com/labstack/echo)
 - **ORM**: [GORM](https://gorm.io/) with PostgreSQL
 - **Background Jobs**: [Asynq](https://github.com/hibiken/asynq) (Redis-backed)
@@ -16,7 +16,7 @@ Go REST API for the honeyDue property management platform. Powers iOS and Androi
 
 ## Prerequisites
 
-- **Go 1.24+** - [install](https://go.dev/dl/)
+- **Go 1.25+** - [install](https://go.dev/dl/)
 - **PostgreSQL 16+** - via Docker (recommended) or [native install](https://www.postgresql.org/download/)
 - **Redis 7+** - via Docker (recommended) or [native install](https://redis.io/docs/getting-started/)
 - **Docker & Docker Compose** - [install](https://docs.docker.com/get-docker/) (recommended for local development)
@@ -259,34 +259,43 @@ All protected endpoints require an `Authorization: Token <token>` header.
 
 ## Production Deployment
 
-### Dokku
+Production runs on a **3-node K3s HA cluster** on Hetzner Cloud, fronted
+by Cloudflare, with Neon Postgres, Backblaze B2, and a self-hosted Gitea
+container registry. See the full deployment book for every detail:
 
-```bash
-# Push to Dokku
-git push dokku main
+**→ [docs/deployment/](./docs/deployment/README.md) - The Deployment Book**
 
-# Seed lookup data
-cat seeds/001_lookups.sql | dokku postgres:connect honeydue-db
+26 chapters and ~42,000 words covering:
 
-# Check logs
-dokku logs honeydue-api -t
-```
+- **Part I - The System**: overview, Hetzner infrastructure, why K3s
+  (and not Swarm, full Kubernetes, or Nomad)
+- **Part II - Networking**: Flannel VXLAN, CoreDNS, kube-proxy, every
+  UFW rule on every node, Cloudflare DNS setup
+- **Part III - Security**: RBAC, Pod Security, secrets, TLS chain
+- **Part IV - Workloads**: api, admin, worker, redis per-service deep
+  dives; Neon Postgres config; Backblaze B2 storage; Gitea registry
+- **Part V - Operation**: end-to-end data flow, deploy process,
+  observability, failure modes, operator runbook
+- **Part VI - Context**: cost breakdown, postmortem of the bugs from
+  the Swarm→K3s migration, roadmap
 
-### Docker Swarm
+Quick links:
 
-```bash
-# Build and push production images
-make docker-build-prod
-docker push ${REGISTRY}/honeydue-api:${TAG}
-docker push ${REGISTRY}/honeydue-worker:${TAG}
-docker push ${REGISTRY}/honeydue-admin:${TAG}
+- **Runbook** - [docs/deployment/17-runbook.md](./docs/deployment/17-runbook.md) - 22 common ops procedures
+- **kubectl cheat sheet** - [docs/deployment/appendices/b-commands.md](./docs/deployment/appendices/b-commands.md)
+- **Deploy process** - [docs/deployment/14-deployment-process.md](./docs/deployment/14-deployment-process.md) - build → push → rollout
+- **Failure modes** - [docs/deployment/16-failure-modes.md](./docs/deployment/16-failure-modes.md) - what happens when X dies
+- **Swarm postmortem** - [docs/deployment/19-postmortem-swarm.md](./docs/deployment/19-postmortem-swarm.md) - why we migrated
 
-# Deploy the stack (all env vars must be set in .env or environment)
-docker stack deploy -c docker-compose.yml honeydue
-```
+Operational state lives under:
 
+- `deploy-k3s/manifests/` - Kubernetes manifests (apply with `kubectl`)
+- `deploy-k3s/MIGRATION_NOTES.md` - notes from the Swarm → K3s migration
+- `deploy/` - legacy Swarm config (retained temporarily; to be removed)
 
 ## Related Projects
 
+- **Deployment Book**: [`docs/deployment/`](./docs/deployment/README.md) - full production operations reference
 - **Mobile App (KMM)**: `../HoneyDueKMM` - Kotlin Multiplatform iOS/Android client
 - **Task Logic Docs**: `docs/TASK_LOGIC_ARCHITECTURE.md` - required reading before task-related work
 - **Push Notification Docs**: `docs/PUSH_NOTIFICATIONS.md`
@@ -0,0 +1,12 @@
import { NextResponse } from 'next/server'

export const dynamic = 'force-dynamic'
export const revalidate = 0

export async function GET() {
  return NextResponse.json({ status: 'ok' }, { status: 200 })
}

export async function HEAD() {
  return new NextResponse(null, { status: 200 })
}
@@ -0,0 +1,118 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: honeydue
  labels:
    app.kubernetes.io/name: api
    app.kubernetes.io/part-of: honeydue
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: api
        app.kubernetes.io/part-of: honeydue
    spec:
      serviceAccountName: api
      imagePullSecrets:
        - name: ghcr-credentials
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: api
          image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh
          ports:
            - containerPort: 8000
              protocol: TCP
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          envFrom:
            - configMapRef:
                name: honeydue-config
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: POSTGRES_PASSWORD
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: SECRET_KEY
            - name: EMAIL_HOST_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: EMAIL_HOST_PASSWORD
            - name: FCM_SERVER_KEY
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: FCM_SERVER_KEY
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: REDIS_PASSWORD
                  optional: true
          volumeMounts:
            - name: apns-key
              mountPath: /secrets/apns
              readOnly: true
            - name: tmp
              mountPath: /tmp
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: "1"
              memory: 512Mi
          startupProbe:
            httpGet:
              path: /api/health/
              port: 8000
            failureThreshold: 12
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /api/health/
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            httpGet:
              path: /api/health/
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
      volumes:
        - name: apns-key
          secret:
            secretName: honeydue-apns-key
            items:
              - key: apns_auth_key.p8
                path: apns_auth_key.p8
        - name: tmp
          emptyDir:
            sizeLimit: 64Mi
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: honeydue
  labels:
    app.kubernetes.io/name: api
    app.kubernetes.io/part-of: honeydue
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: api
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
@@ -0,0 +1,101 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: honeydue
  labels:
    app.kubernetes.io/name: worker
    app.kubernetes.io/part-of: honeydue
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: worker
  template:
    metadata:
      labels:
        app.kubernetes.io/name: worker
        app.kubernetes.io/part-of: honeydue
    spec:
      serviceAccountName: worker
      imagePullSecrets:
        - name: ghcr-credentials
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: worker
          image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          envFrom:
            - configMapRef:
                name: honeydue-config
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: POSTGRES_PASSWORD
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: SECRET_KEY
            - name: EMAIL_HOST_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: EMAIL_HOST_PASSWORD
            - name: FCM_SERVER_KEY
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: FCM_SERVER_KEY
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: REDIS_PASSWORD
                  optional: true
          volumeMounts:
            - name: apns-key
              mountPath: /secrets/apns
              readOnly: true
            - name: tmp
              mountPath: /tmp
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            exec:
              command: ["pgrep", "-f", "/app/worker"]
            initialDelaySeconds: 15
            periodSeconds: 30
            timeoutSeconds: 5
      volumes:
        - name: apns-key
          secret:
            secretName: honeydue-apns-key
            items:
              - key: apns_auth_key.p8
                path: apns_auth_key.p8
        - name: tmp
          emptyDir:
            sizeLimit: 64Mi
@@ -0,0 +1,170 @@
# K3s Migration Notes - 2026-04-24

honeyDue is running on a 3-node K3s HA cluster on the existing Hetzner nodes
(hetzner1/2/3), replacing the previous Docker Swarm deployment.

## Why we migrated

Docker Swarm's libnetwork has a known stale-DNS bug on Docker Engine 29.x
([moby/moby#52265](https://github.com/moby/moby/issues/52265)) that leaves
ghost A-records when tasks migrate between nodes. Single-replica services
(like the admin panel) landed on a ghost IP ~50% of the time → connection
refused → 502. Full stack recreate cleared it, but the bug recurs on every
node-to-node task migration.

K3s uses CoreDNS + containerd with no libnetwork history → the bug class
doesn't exist there. See `docs/deployment/19-postmortem-swarm.md` for the
full postmortem.

## Differences from the original `deploy-k3s/` scaffold

The original scaffold assumes a greenfield provision via `hetzner-k3s`,
GHCR for images, Cloudflare origin certs, and a Hetzner Load Balancer.
We reused existing nodes and kept Cloudflare Flexible SSL:

| Setting | Scaffold default | What we did |
|---|---|---|
| Provisioning | `hetzner-k3s` tool creates boxes | Manual k3s install on existing Hetzner boxes |
| Registry | GHCR (`ghcr-credentials`) | Gitea (`gitea-credentials`) via `kubectl create secret docker-registry` |
| Ingress TLS | `cloudflare-origin-cert` Secret | No TLS at origin (CF Flexible) |
| Load balancer | Hetzner LB → nodes | Cloudflare round-robin across 3 node IPs |
| Admin basic auth | `admin-auth` Traefik middleware | Not applied; in-app auth only |
| CF-only IP allowlist | `cloudflare-only` middleware | Not applied; UFW restricts some ports, 80/443 open to anyone who knows node IPs |
| Traefik | LoadBalancer via servicelb | DaemonSet w/ hostNetwork (servicelb disabled); see the HelmChartConfig below |
| Worker replicas | 2 | 1 (Asynq scheduler is singleton) |
| API startup grace (`startupProbe`) | 12×5s = 60s | 48×5s = 240s (covers migrate + lock queue on first boot) |
| Admin probe path | `/admin/` | `/` (Next.js serves at root) |

## Manifest fixes applied in-repo (already committed)

- `manifests/api/deployment.yaml`: `startupProbe.failureThreshold: 12 → 48`
- `manifests/admin/deployment.yaml`: probe path `/admin/ → /`, threshold `12 → 24`
- `manifests/worker/deployment.yaml`: `replicas: 2 → 1`
- `manifests/pod-disruption-budgets.yaml`: worker `minAvailable: 1 → 0`

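
The `minAvailable: 1 → 0` change follows from how a PodDisruptionBudget with an integer `minAvailable` is evaluated: the number of voluntary evictions allowed is the healthy pod count minus `minAvailable`, never below zero. A minimal sketch of that arithmetic, assuming all replicas are healthy (`allowedDisruptions` is a hypothetical name, not a Kubernetes API):

```go
package main

import "fmt"

// allowedDisruptions mirrors the arithmetic of a PodDisruptionBudget that
// uses an integer minAvailable: evictions are allowed only while the
// number of healthy pods stays at or above minAvailable.
// Hypothetical helper for illustration only.
func allowedDisruptions(healthyPods, minAvailable int) int {
	if healthyPods <= minAvailable {
		return 0
	}
	return healthyPods - minAvailable
}

func main() {
	// Worker singleton with the old budget: no eviction is ever allowed,
	// so draining the node that hosts it would block.
	fmt.Println("worker, minAvailable=1:", allowedDisruptions(1, 1))
	// Worker singleton with the new budget: the single pod may be evicted.
	fmt.Println("worker, minAvailable=0:", allowedDisruptions(1, 0))
	// API with 3 replicas and at least 2 required: one eviction at a time.
	fmt.Println("api, minAvailable=2:   ", allowedDisruptions(3, 2))
}
```

With one replica and `minAvailable: 1` the budget never permits an eviction, which is why the singleton worker needs `minAvailable: 0` to allow node drains.
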
## Traefik override (applied as HelmChartConfig)

K3s ships Traefik as a single-replica Deployment with a LoadBalancer service.
With servicelb disabled (to avoid binding a random port), we reconfigure it
to a DaemonSet binding directly on each node's public :80/:443 via
`hostNetwork: true`. The HelmChartConfig:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet
    hostNetwork: true
    service:
      enabled: false
    ports:
      web:
        port: 80
        hostPort: 80
      websecure:
        port: 443
        hostPort: 443
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
        maxSurge: 0
    securityContext:
      capabilities:
        drop: [ALL]
        add: [NET_BIND_SERVICE]
      readOnlyRootFilesystem: true
      runAsGroup: 65532
      runAsNonRoot: true
      runAsUser: 65532
    additionalArguments:
      - "--entrypoints.web.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,103.22.200.0/22,103.31.4.0/22,141.101.64.0/18,108.162.192.0/18,190.93.240.0/20,188.114.96.0/20,197.234.240.0/22,198.41.128.0/17,162.158.0.0/15,104.16.0.0/13,104.24.0.0/14,172.64.0.0/13,131.0.72.0/22"
```

Apply with `kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml`,
then delete the Helm install job
(`kubectl delete job -n kube-system helm-install-traefik`) to trigger a reinstall.

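
The `forwardedHeaders.trustedIPs` argument means Traefik only honors `X-Forwarded-*` headers when the connecting peer falls inside one of the listed Cloudflare ranges. The sketch below shows that kind of CIDR membership check with Go's `net/netip`. It illustrates the matching rule only and is not Traefik's implementation; `trustedCIDRs`, `parsePrefixes`, and `isTrustedPeer` are hypothetical names, and only a subset of the ranges is included.

```go
package main

import (
	"fmt"
	"net/netip"
)

// A few of the Cloudflare ranges from the trustedIPs argument
// (subset only, to keep the sketch short).
var trustedCIDRs = []string{
	"173.245.48.0/20",
	"104.16.0.0/13",
	"172.64.0.0/13",
}

// parsePrefixes converts CIDR strings into netip.Prefix values.
func parsePrefixes(cidrs []string) ([]netip.Prefix, error) {
	prefixes := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", c, err)
		}
		prefixes = append(prefixes, p)
	}
	return prefixes, nil
}

// isTrustedPeer reports whether the connecting address falls inside any
// trusted prefix. Unparseable addresses are treated as untrusted.
func isTrustedPeer(peer string, trusted []netip.Prefix) bool {
	addr, err := netip.ParseAddr(peer)
	if err != nil {
		return false
	}
	for _, p := range trusted {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	trusted, err := parsePrefixes(trustedCIDRs)
	if err != nil {
		panic(err)
	}
	fmt.Println("173.245.48.10:", isTrustedPeer("173.245.48.10", trusted))
	fmt.Println("203.0.113.7:  ", isTrustedPeer("203.0.113.7", trusted))
}
```

Because 80/443 are reachable by anyone who knows the node IPs, this check is what keeps a direct caller from spoofing the forwarded scheme or client address.
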
## Required node-level sysctl

Traefik runs as a non-root user (UID 65532) in the host network namespace.
On modern containerd, adding `NET_BIND_SERVICE` to the pod was not enough
for it to bind :80/:443 there, so lower the unprivileged port floor on each
node instead:

```bash
echo 'net.ipv4.ip_unprivileged_port_start=0' | sudo tee /etc/sysctl.d/99-unprivileged-ports.conf
sudo sysctl --system
```

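
The effect of the sysctl can be stated as a one-line rule: a port needs `CAP_NET_BIND_SERVICE` only when it is below `net.ipv4.ip_unprivileged_port_start`, which defaults to 1024 on Linux. A small sketch of that rule, using hypothetical names chosen for illustration:

```go
package main

import "fmt"

// defaultUnprivilegedPortStart is the Linux default for
// net.ipv4.ip_unprivileged_port_start.
const defaultUnprivilegedPortStart = 1024

// needsBindCapability reports whether binding the given port requires
// CAP_NET_BIND_SERVICE under the given sysctl value.
// Hypothetical helper for illustration only.
func needsBindCapability(port, unprivilegedPortStart int) bool {
	return port < unprivilegedPortStart
}

func main() {
	for _, port := range []int{80, 443, 8000} {
		fmt.Printf("port %d: default=%v, with sysctl 0=%v\n",
			port,
			needsBindCapability(port, defaultUnprivilegedPortStart),
			needsBindCapability(port, 0))
	}
}
```

With the value set to 0, no port requires the capability, so any process on the node can bind 80 and 443. That trade-off is worth keeping in mind alongside the UFW rules.
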
## UFW rules added for k3s (per node)

Allowed between the 3 node IPs (178.104.247.152, 178.105.32.198, 178.104.249.189):

- `6443/tcp`: kube API
- `2379/tcp`, `2380/tcp`: embedded etcd client + peer
- `10250/tcp`: kubelet
- `8472/udp`: flannel VXLAN overlay

Plus from your workstation IP to each node's `6443/tcp` for `kubectl`.

## Ingress

Minimal hostname-only routing. It was applied from `/tmp/honeydue-ingress.yaml`
at deploy time and is now committed as
`deploy-k3s/manifests/ingress/ingress-simple.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-api
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: api.myhoneydue.com
      http:
        paths:
          - {path: /, pathType: Prefix, backend: {service: {name: api, port: {number: 8000}}}}
    - host: myhoneydue.com
      http:
        paths:
          - {path: /, pathType: Prefix, backend: {service: {name: api, port: {number: 8000}}}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-admin
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: admin.myhoneydue.com
      http:
        paths:
          - {path: /, pathType: Prefix, backend: {service: {name: admin, port: {number: 3000}}}}
```

## Operator access

Kubeconfig lives at `~/.kube/honeydue-k3s.yaml`.

```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl get pods -n honeydue
```

## Remaining TODOs (not blocking)

- Apply `manifests/ingress/middleware.yaml` for security headers + rate limiting
  (CF-only allowlist + basic auth deliberately skipped until you want them)
- Apply `manifests/network-policies.yaml` for default-deny + explicit allows
- Apply `manifests/api/hpa.yaml` if you want autoscaling (metrics-server is
  already running, so just `kubectl apply` it)
- Upgrade to CF Full (strict) SSL: generate origin cert, create
  `cloudflare-origin-cert` Secret, add `tls:` block back to Ingress
- Set up a proper migration Job so `api` replicas don't each run `MigrateWithLock`
  on startup; that lets you drop the 240s startupProbe grace
- Remove `deploy/` (the Swarm-era config) once you're confident in k3s

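
For the autoscaling TODO, it may help to see what `manifests/api/hpa.yaml` would do once applied. The sketch below uses the core HorizontalPodAutoscaler formula, `ceil(currentReplicas × currentUtilization / targetUtilization)`, clamped to the manifest's `minReplicas: 3` and `maxReplicas: 6` with its 70% CPU target. It deliberately ignores the controller's tolerance band and the `behavior` scale-up/scale-down policies, and `desiredReplicas` is a hypothetical helper, not controller code.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the core HPA formula and clamps the result to
// the [minReplicas, maxReplicas] range. Tolerance and scaling policies
// are intentionally left out of this sketch.
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
	desired := int(math.Ceil(float64(current) * currentUtil / targetUtil))
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	// 3 replicas averaging 90% CPU against a 70% target.
	fmt.Println(desiredReplicas(3, 90, 70, 3, 6))
	// 3 replicas averaging 35% CPU: the raw result is below minReplicas.
	fmt.Println(desiredReplicas(3, 35, 70, 3, 6))
	// 6 replicas averaging 140% CPU: the raw result exceeds maxReplicas.
	fmt.Println(desiredReplicas(6, 140, 70, 3, 6))
}
```

In practice the manifest's `behavior` block would further limit changes to one pod per period, so the computed target is reached gradually.
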
@@ -65,15 +65,17 @@ spec:
             limits:
               cpu: 500m
               memory: 256Mi
+          # Admin Next.js app serves at `/`, not `/admin/`. `/admin/` returns
+          # 404 and kills the pod via the probe.
           startupProbe:
             httpGet:
-              path: /admin/
+              path: /
               port: 3000
-            failureThreshold: 12
+            failureThreshold: 24
             periodSeconds: 5
           readinessProbe:
             httpGet:
-              path: /admin/
+              path: /
               port: 3000
             initialDelaySeconds: 5
             periodSeconds: 10
@@ -0,0 +1,123 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: honeydue
  labels:
    app.kubernetes.io/name: api
    app.kubernetes.io/part-of: honeydue
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: api
        app.kubernetes.io/part-of: honeydue
    spec:
      serviceAccountName: api
      imagePullSecrets:
        - name: ghcr-credentials
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: api
          image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh
          ports:
            - containerPort: 8000
              protocol: TCP
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          envFrom:
            - configMapRef:
                name: honeydue-config
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: POSTGRES_PASSWORD
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: SECRET_KEY
            - name: EMAIL_HOST_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: EMAIL_HOST_PASSWORD
            - name: FCM_SERVER_KEY
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: FCM_SERVER_KEY
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: REDIS_PASSWORD
                  optional: true
          volumeMounts:
            - name: apns-key
              mountPath: /secrets/apns
              readOnly: true
            - name: tmp
              mountPath: /tmp
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: "1"
              memory: 512Mi
          startupProbe:
            httpGet:
              path: /api/health/
              port: 8000
            # MigrateWithLock in cmd/api/main.go runs pg_advisory_lock on
            # every startup. On a cold boot with 3 replicas, the first does
            # AutoMigrate (~90s) and the others wait on the lock, so real
            # startup runs 90–240s. 48 × 5s = 240s grace absorbs it without
            # healthcheck killing a still-starting replica.
            failureThreshold: 48
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /api/health/
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            httpGet:
              path: /api/health/
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
      volumes:
        - name: apns-key
          secret:
            secretName: honeydue-apns-key
            items:
              - key: apns_auth_key.p8
                path: apns_auth_key.p8
        - name: tmp
          emptyDir:
            sizeLimit: 64Mi
@@ -0,0 +1,41 @@
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: honeydue
  labels:
    app.kubernetes.io/name: api
    app.kubernetes.io/part-of: honeydue
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: honeydue
  labels:
    app.kubernetes.io/name: api
    app.kubernetes.io/part-of: honeydue
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: api
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
@@ -0,0 +1,61 @@
# Simple hostname-based Ingress, no TLS (Cloudflare Flexible handles edge
# TLS, CF→origin is plain HTTP on 80). Upgrade to Full (strict) by
# adding back a `tls:` block with a Cloudflare Origin CA cert stored in
# secret/cloudflare-origin-cert.
#
# Middleware chain (security headers, rate limit, CF-only allowlist, admin
# basic auth) is defined in `middleware.yaml` but NOT attached here;
# annotate this ingress to turn any of them on.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-api
  namespace: honeydue
  labels:
    app.kubernetes.io/part-of: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: api.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8000
    # Root domain serves the marketing landing page from the Go API's
    # STATIC_DIR. ALLOWED_HOSTS in honeydue-config includes myhoneydue.com.
    - host: myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-admin
  namespace: honeydue
  labels:
    app.kubernetes.io/part-of: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: admin.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: admin
                port:
                  number: 3000
@@ -1,6 +1,6 @@
 # Pod Disruption Budgets: prevent node maintenance from killing all replicas
 # API: at least 2 of 3 replicas must stay up during voluntary disruptions
-# Worker: at least 1 of 2 replicas must stay up
+# Worker: singleton (Asynq scheduler), must allow drain, minAvailable: 0
 
 apiVersion: policy/v1
 kind: PodDisruptionBudget
@@ -26,7 +26,7 @@ metadata:
     app.kubernetes.io/name: worker
     app.kubernetes.io/part-of: honeydue
 spec:
-  minAvailable: 1
+  minAvailable: 0
   selector:
     matchLabels:
       app.kubernetes.io/name: worker
@@ -0,0 +1,53 @@
# Traefik reconfiguration for this deployment.
#
# K3s defaults: Traefik as single-replica Deployment, LoadBalancer service.
# We disabled servicelb (--disable=servicelb on k3s install), so LoadBalancer
# doesn't get an external IP. This config makes Traefik a DaemonSet binding
# directly on each node's public :80/:443 via hostNetwork, matching our
# Cloudflare DNS round-robin across 3 node IPs.
#
# Apply: kubectl apply -f traefik-helmchartconfig.yaml
# Then bump Helm reconcile: kubectl delete job -n kube-system helm-install-traefik
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet
    hostNetwork: true
    service:
      enabled: false
    ports:
      web:
        port: 80
        hostPort: 80
      websecure:
        port: 443
        hostPort: 443
    # hostNetwork with port 80/443 requires RollingUpdate maxUnavailable > 0
    # (each node's port is held by one pod; can't surge).
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
        maxSurge: 0
    securityContext:
      capabilities:
        drop: [ALL]
        add: [NET_BIND_SERVICE]
      readOnlyRootFilesystem: true
      runAsGroup: 65532
      runAsNonRoot: true
      runAsUser: 65532
    # NOTE: The host-level sysctl `net.ipv4.ip_unprivileged_port_start=0`
    # must be set on each node. Without it, hostNetwork pods can't actually
    # use NET_BIND_SERVICE to bind :80/:443. Persisted at
    # /etc/sysctl.d/99-unprivileged-ports.conf on each node.
    additionalArguments:
      # Trust Cloudflare's forwarded proto header so the Go app sees the
      # original https scheme even though CF→origin is plain HTTP.
      # IP ranges from https://www.cloudflare.com/ips-v4/ (as of 2026-04).
      - "--entrypoints.web.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,103.22.200.0/22,103.31.4.0/22,141.101.64.0/18,108.162.192.0/18,190.93.240.0/20,188.114.96.0/20,197.234.240.0/22,198.41.128.0/17,162.158.0.0/15,104.16.0.0/13,104.24.0.0/14,172.64.0.0/13,131.0.72.0/22"
@@ -0,0 +1,105 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
  namespace: honeydue
  labels:
    app.kubernetes.io/name: worker
    app.kubernetes.io/part-of: honeydue
spec:
  # Asynq's Scheduler is a singleton; running >1 replica fires every cron
  # task once per replica (duplicate daily digests, onboarding emails, etc.).
  # Keep at 1 until asynq.PeriodicTaskManager with Redis leader election is
  # wired in cmd/worker/main.go.
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: worker
  template:
    metadata:
      labels:
        app.kubernetes.io/name: worker
        app.kubernetes.io/part-of: honeydue
    spec:
      serviceAccountName: worker
      imagePullSecrets:
        - name: ghcr-credentials
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: worker
          image: IMAGE_PLACEHOLDER # Replaced by 03-deploy.sh
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          envFrom:
            - configMapRef:
                name: honeydue-config
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: POSTGRES_PASSWORD
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: SECRET_KEY
            - name: EMAIL_HOST_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: EMAIL_HOST_PASSWORD
            - name: FCM_SERVER_KEY
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: FCM_SERVER_KEY
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: honeydue-secrets
                  key: REDIS_PASSWORD
                  optional: true
          volumeMounts:
            - name: apns-key
              mountPath: /secrets/apns
              readOnly: true
            - name: tmp
              mountPath: /tmp
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            exec:
              command: ["pgrep", "-f", "/app/worker"]
            initialDelaySeconds: 15
            periodSeconds: 30
            timeoutSeconds: 5
      volumes:
        - name: apns-key
          secret:
            secretName: honeydue-apns-key
            items:
              - key: apns_auth_key.p8
                path: apns_auth_key.p8
        - name: tmp
          emptyDir:
            sizeLimit: 64Mi
@@ -0,0 +1,52 @@
|
||||
# honeyDue edge proxy — terminates HTTP from Cloudflare, routes by Host header.
|
||||
#
|
||||
# Cloudflare is in front, SSL mode "Flexible" — CF terminates TLS at the edge
|
||||
# and talks to this origin over plain HTTP on port 80. No LE certs needed here
|
||||
# for now. Later, to go "Full (strict)", remove `auto_https off`, add `tls` blocks
|
||||
# that use the ACME HTTP-01 challenge, and open 443 on the node.
|
||||
|
||||
{
|
||||
admin off
|
||||
auto_https off
|
||||
}
|
||||
|
||||
# api.myhoneydue.com → Go REST API
|
||||
# `dynamic a` re-resolves the Swarm service DNS every 30s instead of caching
|
||||
# the IP forever at config parse. This is critical on Swarm with endpoint_mode:
|
||||
# dnsrr — when a task restarts, its overlay IP changes, and static DNS caching
|
||||
# leaves Caddy dialing dead IPs.
|
||||
api.myhoneydue.com:80 {
|
||||
reverse_proxy {
|
||||
dynamic a {
|
||||
name api
|
||||
port 8000
|
||||
refresh 30s
|
||||
}
|
||||
header_up X-Forwarded-Proto {http.request.header.X-Forwarded-Proto}
|
||||
}
|
||||
}
|
||||
|
||||
# admin.myhoneydue.com → Next.js admin panel via overlay DNS (VIP endpoint)
|
||||
#
|
||||
# This relies on Swarm's embedded resolver, which has a known libnetwork
|
||||
# stale-record bug (moby/moby#52265, affects 29.x). We work around it by
|
||||
# (a) using default VIP endpoint_mode — a stable service IP — and
|
||||
# (b) running a clean overlay from scratch (see Phase 1 stack recreate).
|
||||
#
|
||||
# If ghosts come back, the long-term fix is Traefik w/ Swarm provider that
|
||||
# reads task IPs from Docker API, bypassing libnetwork DNS entirely. See
|
||||
# deploy/MIGRATION_NOTES.md for the Traefik migration plan.
|
||||
admin.myhoneydue.com:80 {
|
||||
reverse_proxy admin:3000 {
|
||||
lb_try_duration 3s
|
||||
lb_try_interval 250ms
|
||||
header_up X-Forwarded-Proto {http.request.header.X-Forwarded-Proto}
|
||||
}
|
||||
}
|
||||
|
||||
# Catch-all for root/unknown hostnames hitting our IPs directly.
|
||||
# Cloudflare SSL=Flexible will still hit us on :80 for myhoneydue.com; return
|
||||
# a placeholder until you wire a real marketing site.
|
||||
:80 {
|
||||
respond "honeyDue" 200
|
||||
}
|
||||
@@ -248,13 +248,15 @@ Images that would be built and pushed:
|
||||
${ADMIN_IMAGE}
|
||||
|
||||
Replicas:
|
||||
caddy: 3 (one per node)
|
||||
api: ${API_REPLICAS:-3}
|
||||
worker: ${WORKER_REPLICAS:-2}
|
||||
worker: ${WORKER_REPLICAS:-1}
|
||||
admin: ${ADMIN_REPLICAS:-1}
|
||||
|
||||
Published ports:
|
||||
api: ${API_PORT:-8000} (ingress)
|
||||
admin: ${ADMIN_PORT:-3000} (ingress)
|
||||
caddy: 80, 443 (ingress — public)
|
||||
api: internal only (proxied by caddy)
|
||||
admin: internal only (proxied by caddy)
|
||||
dozzle: ${DOZZLE_PORT:-9999} (manager loopback only — SSH tunnel required)
|
||||
|
||||
Versioned secrets that would be created on this deploy:
|
||||
@@ -264,6 +266,9 @@ Versioned secrets that would be created on this deploy:
|
||||
${DEPLOY_STACK_NAME}_fcm_server_key_<deploy_id>
|
||||
${DEPLOY_STACK_NAME}_apns_auth_key_<deploy_id>
|
||||
|
||||
Versioned configs that would be created on this deploy:
|
||||
${DEPLOY_STACK_NAME}_caddyfile_<deploy_id>
|
||||
|
||||
No changes made. Re-run without DRY_RUN=1 to deploy.
|
||||
=================================================
|
||||
|
||||
@@ -289,27 +294,54 @@ if [[ "${SKIP_BUILD}" != "1" ]]; then
|
||||
log "Logging in to ${REGISTRY}"
|
||||
printf '%s' "${REGISTRY_TOKEN}" | docker login "${REGISTRY}" -u "${REGISTRY_USERNAME}" --password-stdin >/dev/null
|
||||
|
||||
log "Building API image ${API_IMAGE}"
|
||||
docker build --target api -t "${API_IMAGE}" "${REPO_DIR}"
|
||||
log "Building Worker image ${WORKER_IMAGE}"
|
||||
docker build --target worker -t "${WORKER_IMAGE}" "${REPO_DIR}"
|
||||
log "Building Admin image ${ADMIN_IMAGE}"
|
||||
docker build --target admin -t "${ADMIN_IMAGE}" "${REPO_DIR}"
|
||||
# Target platform for Swarm nodes. Hetzner CX is x86_64; override via
|
||||
# BUILD_PLATFORM=linux/arm64 if you move to ARM (Ampere) hosts.
|
||||
BUILD_PLATFORM="${BUILD_PLATFORM:-linux/amd64}"
|
||||
log "Build platform: ${BUILD_PLATFORM} (dev host may differ; buildx cross-compiles)"
|
||||
|
||||
log "Pushing deploy images"
|
||||
docker push "${API_IMAGE}"
|
||||
docker push "${WORKER_IMAGE}"
|
||||
docker push "${ADMIN_IMAGE}"
|
||||
|
||||
if [[ "${PUSH_LATEST_TAG}" == "true" ]]; then
|
||||
log "Updating :latest tags"
|
||||
docker tag "${API_IMAGE}" "${REGISTRY_PREFIX}/honeydue-api:latest"
|
||||
docker tag "${WORKER_IMAGE}" "${REGISTRY_PREFIX}/honeydue-worker:latest"
|
||||
docker tag "${ADMIN_IMAGE}" "${REGISTRY_PREFIX}/honeydue-admin:latest"
|
||||
docker push "${REGISTRY_PREFIX}/honeydue-api:latest"
|
||||
docker push "${REGISTRY_PREFIX}/honeydue-worker:latest"
|
||||
docker push "${REGISTRY_PREFIX}/honeydue-admin:latest"
|
||||
# Ensure a buildx builder exists and is usable
|
||||
if ! docker buildx inspect honeydue-builder >/dev/null 2>&1; then
|
||||
log "Creating buildx builder 'honeydue-builder'"
|
||||
docker buildx create --name honeydue-builder --use >/dev/null
|
||||
else
|
||||
docker buildx use honeydue-builder >/dev/null
|
||||
fi
|
||||
docker buildx inspect --bootstrap >/dev/null
|
||||
|
||||
build_and_push() {
|
||||
local target="$1"
|
||||
local image="$2"
|
||||
shift 2
|
||||
|
||||
local tag_args=(-t "${image}")
|
||||
while (( $# > 0 )); do
|
||||
tag_args+=(-t "$1")
|
||||
shift
|
||||
done
|
||||
|
||||
log "Building + pushing ${target} image for ${BUILD_PLATFORM}: ${image}"
|
||||
docker buildx build \
|
||||
--platform "${BUILD_PLATFORM}" \
|
||||
--target "${target}" \
|
||||
"${tag_args[@]}" \
|
||||
--push \
|
||||
"${REPO_DIR}"
|
||||
}
|
||||
|
||||
api_extra=()
|
||||
worker_extra=()
|
||||
admin_extra=()
|
||||
if [[ "${PUSH_LATEST_TAG}" == "true" ]]; then
|
||||
api_extra=("${REGISTRY_PREFIX}/honeydue-api:latest")
|
||||
worker_extra=("${REGISTRY_PREFIX}/honeydue-worker:latest")
|
||||
admin_extra=("${REGISTRY_PREFIX}/honeydue-admin:latest")
|
||||
fi
|
||||
|
||||
# ${arr[@]+"${arr[@]}"} safely expands to nothing when the array is empty
|
||||
# under `set -u` — avoids "unbound variable" on bash arrays.
|
||||
build_and_push api "${API_IMAGE}" ${api_extra[@]+"${api_extra[@]}"}
|
||||
build_and_push worker "${WORKER_IMAGE}" ${worker_extra[@]+"${worker_extra[@]}"}
|
||||
build_and_push admin "${ADMIN_IMAGE}" ${admin_extra[@]+"${admin_extra[@]}"}
|
||||
else
|
||||
warn "SKIP_BUILD=1 set. Using prebuilt images for tag: ${DEPLOY_TAG}"
|
||||
fi
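
The `${arr[@]+"${arr[@]}"}` expansion used in the `build_and_push` calls above can be tried in isolation. This is a minimal sketch, assuming bash; `count_args` and the image names are hypothetical stand-ins for `build_and_push` and the real tags, not part of the deploy script.

```bash
#!/usr/bin/env bash
# Demonstrates the empty-array-safe expansion under `set -u`.
# count_args is a hypothetical stand-in for build_and_push: it only
# reports how many arguments it received.
set -u

count_args() { echo "$#"; }

empty=()                                  # PUSH_LATEST_TAG != "true"
extra=("registry.example/app:latest")     # PUSH_LATEST_TAG == "true" (made-up tag)

# With an empty array the expansion contributes no arguments at all.
count_args "registry.example/app:abc123" ${empty[@]+"${empty[@]}"}

# With one element it contributes exactly one, with quoting preserved.
count_args "registry.example/app:abc123" ${extra[@]+"${extra[@]}"}
```

A plain `"${empty[@]}"` is an "unbound variable" error under `set -u` on older bash releases (before 4.4), which is why the guarded form is the portable choice.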
|
||||
@@ -322,6 +354,12 @@ SECRET_KEY_SECRET="${DEPLOY_STACK_NAME}_secret_key_${DEPLOY_ID}"
|
||||
EMAIL_HOST_PASSWORD_SECRET="${DEPLOY_STACK_NAME}_email_host_password_${DEPLOY_ID}"
|
||||
FCM_SERVER_KEY_SECRET="${DEPLOY_STACK_NAME}_fcm_server_key_${DEPLOY_ID}"
|
||||
APNS_AUTH_KEY_SECRET="${DEPLOY_STACK_NAME}_apns_auth_key_${DEPLOY_ID}"
|
||||
CADDYFILE_CONFIG="${DEPLOY_STACK_NAME}_caddyfile_${DEPLOY_ID}"
|
||||
|
||||
CADDYFILE_SRC="${DEPLOY_DIR}/Caddyfile"
|
||||
if [[ ! -f "${CADDYFILE_SRC}" ]]; then
|
||||
die "Missing required file: ${CADDYFILE_SRC}"
|
||||
fi
|
||||
|
||||
TMP_DIR="$(mktemp -d)"
|
||||
cleanup() {
|
||||
@@ -332,6 +370,7 @@ trap cleanup EXIT
|
||||
cp "${STACK_TEMPLATE}" "${TMP_DIR}/swarm-stack.prod.yml"
|
||||
cp "${PROD_ENV}" "${TMP_DIR}/prod.env"
|
||||
cp "${REGISTRY_ENV}" "${TMP_DIR}/registry.env"
|
||||
cp "${CADDYFILE_SRC}" "${TMP_DIR}/Caddyfile"
|
||||
mkdir -p "${TMP_DIR}/secrets"
|
||||
cp "${SECRET_POSTGRES}" "${TMP_DIR}/secrets/postgres_password.txt"
|
||||
cp "${SECRET_APP_KEY}" "${TMP_DIR}/secrets/secret_key.txt"
|
||||
@@ -356,6 +395,7 @@ SECRET_KEY_SECRET=${SECRET_KEY_SECRET}
|
||||
EMAIL_HOST_PASSWORD_SECRET=${EMAIL_HOST_PASSWORD_SECRET}
|
||||
FCM_SERVER_KEY_SECRET=${FCM_SERVER_KEY_SECRET}
|
||||
APNS_AUTH_KEY_SECRET=${APNS_AUTH_KEY_SECRET}
|
||||
CADDYFILE_CONFIG=${CADDYFILE_CONFIG}
|
||||
EOF
|
||||
|
||||
log "Uploading deploy bundle to ${SSH_TARGET}:${DEPLOY_REMOTE_DIR}"
|
||||
@@ -364,6 +404,7 @@ scp "${SCP_OPTS[@]}" "${TMP_DIR}/swarm-stack.prod.yml" "${SSH_TARGET}:${DEPLOY_R
|
||||
scp "${SCP_OPTS[@]}" "${TMP_DIR}/prod.env" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/prod.env"
|
||||
scp "${SCP_OPTS[@]}" "${TMP_DIR}/registry.env" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/registry.env"
|
||||
scp "${SCP_OPTS[@]}" "${TMP_DIR}/runtime.env" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/runtime.env"
|
||||
scp "${SCP_OPTS[@]}" "${TMP_DIR}/Caddyfile" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/Caddyfile"
|
||||
scp "${SCP_OPTS[@]}" "${TMP_DIR}/secrets/postgres_password.txt" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/secrets/postgres_password.txt"
|
||||
scp "${SCP_OPTS[@]}" "${TMP_DIR}/secrets/secret_key.txt" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/secrets/secret_key.txt"
|
||||
scp "${SCP_OPTS[@]}" "${TMP_DIR}/secrets/email_host_password.txt" "${SSH_TARGET}:${DEPLOY_REMOTE_DIR}/secrets/email_host_password.txt"
|
||||
@@ -397,6 +438,17 @@ create_secret() {
|
||||
fi
|
||||
}
|
||||
|
||||
create_config() {
|
||||
local name="$1"
|
||||
local src="$2"
|
||||
if docker config inspect "${name}" >/dev/null 2>&1; then
|
||||
echo "[remote] config exists: ${name}"
|
||||
else
|
||||
docker config create "${name}" "${src}" >/dev/null
|
||||
echo "[remote] created config: ${name}"
|
||||
fi
|
||||
}
|
||||
|
||||
printf '%s' "${REGISTRY_TOKEN}" | docker login "${REGISTRY}" -u "${REGISTRY_USERNAME}" --password-stdin >/dev/null
|
||||
rm -f "${REMOTE_DIR}/registry.env"
|
||||
|
||||
@@ -406,6 +458,8 @@ create_secret "${EMAIL_HOST_PASSWORD_SECRET}" "${REMOTE_DIR}/secrets/email_host_
|
||||
create_secret "${FCM_SERVER_KEY_SECRET}" "${REMOTE_DIR}/secrets/fcm_server_key.txt"
|
||||
create_secret "${APNS_AUTH_KEY_SECRET}" "${REMOTE_DIR}/secrets/apns_auth_key.p8"
|
||||
|
||||
create_config "${CADDYFILE_CONFIG}" "${REMOTE_DIR}/Caddyfile"
|
||||
|
||||
rm -f "${REMOTE_DIR}/secrets/postgres_password.txt"
|
||||
rm -f "${REMOTE_DIR}/secrets/secret_key.txt"
|
||||
rm -f "${REMOTE_DIR}/secrets/email_host_password.txt"
|
||||
@@ -455,18 +509,22 @@ while true; do
|
||||
sleep 10
|
||||
done
|
||||
|
||||
log "Pruning old secret versions (keeping last ${SECRET_KEEP_VERSIONS})"
|
||||
ssh "${SSH_OPTS[@]}" "${SSH_TARGET}" "bash -s -- '${DEPLOY_STACK_NAME}' '${SECRET_KEEP_VERSIONS}'" <<'EOF' || warn "Secret pruning reported errors (non-fatal)"
|
||||
log "Pruning old secret + config versions (keeping last ${SECRET_KEEP_VERSIONS})"
|
||||
ssh "${SSH_OPTS[@]}" "${SSH_TARGET}" "bash -s -- '${DEPLOY_STACK_NAME}' '${SECRET_KEEP_VERSIONS}'" <<'EOF' || warn "Pruning reported errors (non-fatal)"
|
||||
set -euo pipefail
|
||||
|
||||
STACK_NAME="$1"
|
||||
KEEP="$2"
|
||||
|
||||
prune_prefix() {
|
||||
local prefix="$1"
|
||||
# List matching secrets with creation time, sorted newest-first.
|
||||
local kind="$1" # "secret" or "config"
|
||||
local prefix="$2"
|
||||
local ls_cmd rm_cmd
|
||||
ls_cmd="docker ${kind} ls --format '{{.CreatedAt}}|{{.Name}}'"
|
||||
rm_cmd="docker ${kind} rm"
|
||||
|
||||
local all
|
||||
all="$(docker secret ls --format '{{.CreatedAt}}|{{.Name}}' 2>/dev/null \
|
||||
all="$(eval "${ls_cmd}" 2>/dev/null \
|
||||
| grep "|${prefix}_" \
|
||||
| sort -r \
|
||||
|| true)"
|
||||
@@ -477,7 +535,7 @@ prune_prefix() {
|
||||
local total
|
||||
total="$(printf '%s\n' "${all}" | wc -l | tr -d ' ')"
|
||||
if (( total <= KEEP )); then
|
||||
echo "[cleanup] ${prefix}: ${total} version(s) — nothing to prune"
|
||||
echo "[cleanup] ${kind}/${prefix}: ${total} version(s) — nothing to prune"
|
||||
return 0
|
||||
fi
|
||||
|
||||
@@ -486,16 +544,20 @@ prune_prefix() {
|
||||
|
||||
while IFS= read -r name; do
|
||||
[[ -z "${name}" ]] && continue
|
||||
if docker secret rm "${name}" >/dev/null 2>&1; then
|
||||
echo "[cleanup] removed: ${name}"
|
||||
if ${rm_cmd} "${name}" >/dev/null 2>&1; then
|
||||
echo "[cleanup] removed ${kind}: ${name}"
|
||||
else
|
||||
echo "[cleanup] in-use (kept): ${name}"
|
||||
echo "[cleanup] in-use ${kind} (kept): ${name}"
|
||||
fi
|
||||
done <<< "${to_remove}"
|
||||
}
|
||||
|
||||
for base in postgres_password secret_key email_host_password fcm_server_key apns_auth_key; do
|
||||
prune_prefix "${STACK_NAME}_${base}"
|
||||
prune_prefix secret "${STACK_NAME}_${base}"
|
||||
done
|
||||
|
||||
for base in caddyfile; do
|
||||
prune_prefix config "${STACK_NAME}_${base}"
|
||||
done
|
||||
EOF
|
||||
|
||||
|
||||
+91
-26
@@ -1,6 +1,59 @@
|
||||
version: "3.8"
|
||||
|
||||
services:
|
||||
# Edge reverse proxy — the only service publishing :80/:443 publicly.
|
||||
# Routes by Host header to internal `api` and `admin` services over the
|
||||
# overlay network. Runs one replica per node via ingress mesh, so any node
|
||||
# can terminate incoming traffic.
|
||||
caddy:
|
||||
image: caddy:2-alpine
|
||||
ports:
|
||||
- target: 80
|
||||
published: 80
|
||||
protocol: tcp
|
||||
mode: ingress
|
||||
- target: 443
|
||||
published: 443
|
||||
protocol: tcp
|
||||
mode: ingress
|
||||
configs:
|
||||
- source: caddyfile
|
||||
target: /etc/caddy/Caddyfile
|
||||
mode: 0444
|
||||
volumes:
|
||||
- caddy_data:/data
|
||||
- caddy_config:/config
|
||||
healthcheck:
|
||||
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://127.0.0.1/"]
|
||||
interval: 30s
|
||||
timeout: 5s
|
||||
retries: 3
|
||||
start_period: 10s
|
||||
deploy:
|
||||
replicas: 3
|
||||
restart_policy:
|
||||
condition: any
|
||||
delay: 5s
|
||||
update_config:
|
||||
parallelism: 1
|
||||
delay: 10s
|
||||
order: start-first
|
||||
rollback_config:
|
||||
parallelism: 1
|
||||
delay: 5s
|
||||
order: stop-first
|
||||
placement:
|
||||
max_replicas_per_node: 1
|
||||
resources:
|
||||
limits:
|
||||
cpus: "0.25"
|
||||
memory: 128M
|
||||
reservations:
|
||||
cpus: "0.05"
|
||||
memory: 32M
|
||||
networks:
|
||||
- honeydue-network
|
||||
|
||||
redis:
|
||||
image: redis:7-alpine
|
||||
command: redis-server --appendonly yes --appendfsync everysec --maxmemory 200mb --maxmemory-policy allkeys-lru
|
||||
@@ -30,11 +83,8 @@ services:
|
||||
|
||||
api:
|
||||
image: ${API_IMAGE}
|
||||
ports:
|
||||
- target: 8000
|
||||
published: ${API_PORT}
|
||||
protocol: tcp
|
||||
mode: ingress
|
||||
# No `ports:` block — Caddy edge service proxies to api:8000 over the
|
||||
# overlay network. Port 8000 is never publicly exposed.
|
||||
environment:
|
||||
PORT: "8000"
|
||||
DEBUG: "${DEBUG}"
|
||||
@@ -104,6 +154,10 @@ services:
|
||||
APPLE_IAP_SANDBOX: "${APPLE_IAP_SANDBOX}"
|
||||
GOOGLE_IAP_SERVICE_ACCOUNT_PATH: "${GOOGLE_IAP_SERVICE_ACCOUNT_PATH}"
|
||||
GOOGLE_IAP_PACKAGE_NAME: "${GOOGLE_IAP_PACKAGE_NAME}"
|
||||
|
||||
# Seeded on first migration (idempotent — skipped if admin_users row exists)
|
||||
ADMIN_EMAIL: "${ADMIN_EMAIL}"
|
||||
ADMIN_PASSWORD: "${ADMIN_PASSWORD}"
|
||||
stop_grace_period: 60s
|
||||
command:
|
||||
- /bin/sh
|
||||
@@ -116,15 +170,15 @@ services:
|
||||
export FCM_SERVER_KEY="$$(cat /run/secrets/fcm_server_key)"
|
||||
exec /app/api
|
||||
secrets:
|
||||
- source: ${POSTGRES_PASSWORD_SECRET}
|
||||
- source: postgres_password
|
||||
target: postgres_password
|
||||
- source: ${SECRET_KEY_SECRET}
|
||||
- source: secret_key
|
||||
target: secret_key
|
||||
- source: ${EMAIL_HOST_PASSWORD_SECRET}
|
||||
- source: email_host_password
|
||||
target: email_host_password
|
||||
- source: ${FCM_SERVER_KEY_SECRET}
|
||||
- source: fcm_server_key
|
||||
target: fcm_server_key
|
||||
- source: ${APNS_AUTH_KEY_SECRET}
|
||||
- source: apns_auth_key
|
||||
target: apns_auth_key
|
||||
volumes:
|
||||
- uploads:/app/uploads
|
||||
@@ -132,10 +186,18 @@ services:
|
||||
test: ["CMD", "curl", "-f", "http://127.0.0.1:8000/api/health/"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
start_period: 15s
|
||||
# Single-replica AutoMigrate on a fresh DB takes ~90s; subsequent
|
||||
# replicas are ~2s (idempotent). 180s gives honest headroom for the
|
||||
# first replica to finish, without masking cascade failures.
|
||||
start_period: 180s
|
||||
retries: 3
|
||||
deploy:
|
||||
replicas: ${API_REPLICAS}
|
||||
# DNS round-robin instead of VIP. VIP's kernel IPVS state can go stale
|
||||
# during replica churn (rolling updates, task restarts), causing
|
||||
# intermittent i/o timeouts from clients on the overlay network (Caddy).
|
||||
# dnsrr resolves to live task IPs directly and bypasses IPVS.
|
||||
endpoint_mode: dnsrr
|
||||
restart_policy:
|
||||
condition: any
|
||||
delay: 5s
|
||||
@@ -159,11 +221,8 @@ services:
|
||||
|
||||
admin:
|
||||
image: ${ADMIN_IMAGE}
|
||||
ports:
|
||||
- target: 3000
|
||||
published: ${ADMIN_PORT}
|
||||
protocol: tcp
|
||||
mode: ingress
|
||||
# No `ports:` block — reached via Caddy on admin.myhoneydue.com using
|
||||
# Swarm's embedded DNS and default VIP endpoint_mode.
|
||||
environment:
|
||||
PORT: "3000"
|
||||
HOSTNAME: "0.0.0.0"
|
||||
@@ -248,15 +307,15 @@ services:
|
||||
export FCM_SERVER_KEY="$$(cat /run/secrets/fcm_server_key)"
|
||||
exec /app/worker
|
||||
secrets:
|
||||
- source: ${POSTGRES_PASSWORD_SECRET}
|
||||
- source: postgres_password
|
||||
target: postgres_password
|
||||
- source: ${SECRET_KEY_SECRET}
|
||||
- source: secret_key
|
||||
target: secret_key
|
||||
- source: ${EMAIL_HOST_PASSWORD_SECRET}
|
||||
- source: email_host_password
|
||||
target: email_host_password
|
||||
- source: ${FCM_SERVER_KEY_SECRET}
|
||||
- source: fcm_server_key
|
||||
target: fcm_server_key
|
||||
- source: ${APNS_AUTH_KEY_SECRET}
|
||||
- source: apns_auth_key
|
||||
target: apns_auth_key
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://127.0.0.1:6060/health"]
|
||||
@@ -293,12 +352,11 @@ services:
|
||||
# ssh -L ${DOZZLE_PORT}:127.0.0.1:${DOZZLE_PORT} <manager>
|
||||
# Then browse http://localhost:${DOZZLE_PORT}
|
||||
image: amir20/dozzle:latest
|
||||
# Bind to loopback only on the manager. Swarm's long-form port spec
|
||||
# rejects `host_ip`, so we use the short form — 127.0.0.1:<port>:8080.
|
||||
# Access via SSH tunnel: ssh -L ${DOZZLE_PORT}:127.0.0.1:${DOZZLE_PORT} <manager>
|
||||
ports:
|
||||
- target: 8080
|
||||
published: ${DOZZLE_PORT}
|
||||
protocol: tcp
|
||||
mode: host
|
||||
host_ip: 127.0.0.1
|
||||
- "127.0.0.1:${DOZZLE_PORT}:8080"
|
||||
environment:
|
||||
DOZZLE_NO_ANALYTICS: "true"
|
||||
volumes:
|
||||
@@ -324,6 +382,8 @@ services:
|
||||
volumes:
|
||||
redis_data:
|
||||
uploads:
|
||||
caddy_data:
|
||||
caddy_config:
|
||||
|
||||
networks:
|
||||
honeydue-network:
|
||||
@@ -331,6 +391,11 @@ networks:
|
||||
driver_opts:
|
||||
encrypted: "true"
|
||||
|
||||
configs:
|
||||
caddyfile:
|
||||
external: true
|
||||
name: ${CADDYFILE_CONFIG}
|
||||
|
||||
secrets:
|
||||
postgres_password:
|
||||
external: true
|
||||
|
||||
@@ -0,0 +1,240 @@
|
||||
# 00 — Overview
|
||||
|
||||
## Summary
|
||||
|
||||
honeyDue runs on a three-node Kubernetes cluster managed by K3s, fronted by
Cloudflare, and backed by a managed Postgres (Neon), S3-compatible object
storage (Backblaze B2), and a self-hosted container registry (Gitea). The
application consists of a Go REST API, a Next.js admin panel, and a
background worker process using Redis-backed queues. Traefik handles HTTP
ingress and routes requests by Host header. The whole stack uses about 1 GB
of the 24 GB of RAM available across the three nodes.
|
||||
|
||||
This chapter is the map. Everything here is expanded in a later chapter.
|
||||
|
||||
## Architecture at a glance
|
||||
|
||||
```mermaid
flowchart TB
    subgraph Internet
        Browser[End-user browser / mobile client]
    end

    subgraph CF[Cloudflare]
        CFEdge[Edge POP<br/>TLS terminates here]
    end

    Browser -- HTTPS :443 --> CFEdge

    subgraph Hetzner[Hetzner Cloud - Nuremberg nbg1]
        direction LR
        subgraph H1[hetzner1<br/>178.104.247.152]
            T1[Traefik<br/>:80/:443 hostNet]
            A1[api pod]
            W1[worker pod]
        end
        subgraph H2[hetzner2<br/>178.105.32.198]
            T2[Traefik<br/>:80/:443 hostNet]
            A2[api pod]
            R1[redis pod<br/>PVC]
        end
        subgraph H3[hetzner3<br/>178.104.249.189]
            T3[Traefik<br/>:80/:443 hostNet]
            A3[api pod]
            AD1[admin pod]
        end
    end

    CFEdge -- HTTP :80<br/>DNS round-robin --> T1
    CFEdge -- HTTP :80 --> T2
    CFEdge -- HTTP :80 --> T3

    T1 & T2 & T3 -.Ingress routes by<br/>Host header.-> A1
    T1 & T2 & T3 -.-> AD1
    A1 & A2 & A3 -.-> R1

    subgraph External[Managed services]
        Neon[(Neon Postgres<br/>AWS us-east-1)]
        B2[(Backblaze B2<br/>us-east-005)]
        FM[Fastmail SMTP]
        Gitea[Gitea Registry<br/>gitea.treytartt.com]
    end

    A1 & A2 & A3 -- SSL --> Neon
    W1 -- SSL --> Neon
    A1 & A2 & A3 -- HTTPS --> B2
    W1 -- SMTP :587 --> FM
    H1 & H2 & H3 -. image pull .-> Gitea
```
|
||||
|
||||
### ASCII fallback
|
||||
|
||||
```
                   ┌─────────────────────┐
                   │      End user       │
                   └──────────┬──────────┘
                              │ HTTPS :443
                              ▼
                   ┌─────────────────────┐
                   │   Cloudflare edge   │  TLS terminates here
                   │  (SSL = Flexible)   │
                   └──────────┬──────────┘
                              │ HTTP :80 round-robin
                ┌─────────────┼─────────────┐
                ▼             ▼             ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ hetzner1        │ │ hetzner2        │ │ hetzner3        │
│ 178.104.247.152 │ │ 178.105.32.198  │ │ 178.104.249.189 │
│ Traefik :80/443 │ │ Traefik :80/443 │ │ Traefik :80/443 │
│ api  worker     │ │ api  redis      │ │ api  admin      │
└─────────┬───────┘ └─────────┬───────┘ └─────────┬───────┘
          │                   │                   │
          └──────── Kubernetes overlay ───────────┘
                              │
     ┌─────────────┬──────────┴──┬───────────────┐
     ▼             ▼             ▼               ▼
┌─────────┐ ┌─────────────┐ ┌──────────┐ ┌───────────────┐
│  Neon   │ │ Backblaze B2│ │ Fastmail │ │ Gitea Registry│
│Postgres │ │   uploads   │ │   SMTP   │ │  image pull   │
└─────────┘ └─────────────┘ └──────────┘ └───────────────┘
```
|
||||
|
||||
## The stack, one layer at a time
|
||||
|
||||
### Layer 0 — Hardware
|
||||
|
||||
Three Hetzner Cloud CX33 instances (4 vCPU, 8 GB RAM, 80 GB NVMe SSD) in
|
||||
Hetzner's Nuremberg (nbg1) datacenter. Each node is $7.99/mo (April 2026
|
||||
pricing), totaling ~$24/mo. See [Chapter 1](./01-infrastructure.md).
|
||||
|
||||
### Layer 1 — Operating system
|
||||
|
||||
Ubuntu 24.04.3 LTS. Each node has:
|
||||
- SSH on port 22, key-only auth, `deploy` user with NOPASSWD sudo
|
||||
- `ufw` firewall with strict default-deny-incoming; specific ports allowed
|
||||
per Chapter 4
|
||||
- Sysctl override `net.ipv4.ip_unprivileged_port_start=0` so non-root
|
||||
containers can bind privileged ports (needed for Traefik to serve :80/:443)
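
The effect of that sysctl can be sketched as a single comparison: a process without `CAP_NET_BIND_SERVICE` may bind a port only if the port number is at or above `ip_unprivileged_port_start`. The helper name below is illustrative and not something from this repo.

```bash
#!/bin/sh
# Illustrative helper (hypothetical name): succeeds if an unprivileged
# process may bind port $1 when net.ipv4.ip_unprivileged_port_start is $2.
can_bind_unprivileged() {
    [ "$1" -ge "$2" ]
}

# 1024 is the Linux default threshold; this cluster overrides it to 0.
for start in 1024 0; do
    for port in 80 443 8000; do
        if can_bind_unprivileged "$port" "$start"; then
            echo "start=$start port=$port: allowed"
        else
            echo "start=$start port=$port: needs CAP_NET_BIND_SERVICE or root"
        fi
    done
done
```

On a node, the live value can be read with `sysctl net.ipv4.ip_unprivileged_port_start`.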
|
||||
|
||||
### Layer 2 — Container runtime
|
||||
|
||||
`containerd` v2.2.2 (bundled with K3s). Docker is still installed from the
Swarm era but is now disabled. containerd is Kubernetes' reference
runtime and has a smaller footprint than Docker's full stack.
|
||||
|
||||
### Layer 3 — Orchestrator
|
||||
|
||||
K3s v1.34.6 in HA mode. All three nodes carry the `control-plane,etcd`
roles and form a three-member etcd (Raft) cluster: quorum is two, so the
cluster can tolerate one node failure. K3s is a minimal Kubernetes
distribution from Rancher Labs (now part of SUSE): a single binary, embedded
etcd instead of a separate etcd cluster, and sane defaults for small
installations. See [Chapter 2](./02-orchestrator-choice.md) for why K3s over
full Kubernetes or Docker Swarm.
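
The one-node fault tolerance follows from Raft's majority rule. A minimal sketch of the arithmetic in POSIX shell; the function names are hypothetical helpers for illustration only.

```bash
#!/bin/sh
# Raft needs a strict majority of members to commit a write.
#   quorum(n)             = floor(n / 2) + 1
#   tolerated_failures(n) = n - quorum(n)
# Both function names are illustrative, not part of K3s or etcd.

quorum() {
    echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
    echo "members=$n quorum=$(quorum "$n") tolerates=$(tolerated_failures "$n")"
done
```

For the three-node cluster here, quorum is 2 and one node can fail. A fourth node would raise quorum to 3 without raising the tolerance, which is why etcd clusters are normally sized in odd numbers.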
|
||||
|
||||
### Layer 4 — Cluster networking
|
||||
|
||||
- **Flannel VXLAN** for pod-to-pod overlay (default on K3s). VXLAN tunnels
|
||||
pod traffic over UDP port 8472 between nodes.
|
||||
- **CoreDNS** for service discovery (it resolves the service names, such as
  `api` or `redis`, that pods use to reach each other).
|
||||
- **kube-proxy** in IPVS mode for ClusterIP → pod routing.
|
||||
|
||||
[Chapter 3](./03-networking.md) walks through a single request to show
|
||||
every hop.
|
||||
|
||||
### Layer 5 — Ingress
|
||||
|
||||
**Traefik v3** as a DaemonSet with `hostNetwork: true`. Each node has a
|
||||
Traefik pod that binds directly to the node's public :80 and :443. No
|
||||
`servicelb`, no Hetzner Load Balancer — Cloudflare round-robins the three
|
||||
node IPs in DNS and any node can serve any request. See
|
||||
[Chapter 6](./06-traefik-ingress.md).
|
||||
|
||||
### Layer 6 — Edge / CDN
|
||||
|
||||
Cloudflare Free plan. Proxied A records for `api.myhoneydue.com`,
|
||||
`admin.myhoneydue.com`, and `myhoneydue.com` each point at all three node
|
||||
IPs. Edge handles TLS termination (SSL=Flexible), DDoS protection, caching
|
||||
for static assets, and traffic failover if a node becomes unreachable.
|
||||
See [Chapter 13](./13-cloudflare.md).
|
||||
|
||||
### Layer 7 — Application services
|
||||
|
||||
| Service | Type | Replicas | Image |
|---|---|---|---|
| `api` | Go (Echo, GORM) | 3 | `gitea.treytartt.com/admin/honeydue-api:<sha>` |
| `admin` | Next.js 16 | 1 | `gitea.treytartt.com/admin/honeydue-admin:<sha>` |
| `worker` | Go (Asynq) | 1 | `gitea.treytartt.com/admin/honeydue-worker:<sha>` |
| `redis` | redis:7-alpine | 1 | Docker Hub |
|
||||
|
||||
See [Chapter 7](./07-services.md).
|
||||
|
||||
### Layer 8 — External dependencies
|
||||
|
||||
- **Neon Postgres** (Launch plan) — `honeyDue` database
|
||||
- **Backblaze B2** — `honeyDueProd` bucket for user uploads
|
||||
- **Fastmail SMTP** — transactional email
|
||||
- **Gitea** (self-hosted at `gitea.treytartt.com`) — container registry
|
||||
- **Cloudflare** — DNS, TLS, CDN
|
||||
|
||||
See [Chapter 8](./08-database.md), [9](./09-storage.md), and
|
||||
[11](./11-registry.md).
|
||||
|
||||
## What's deliberately absent
|
||||
|
||||
- **TLS at origin.** Cloudflare terminates TLS at the edge and talks HTTP
|
||||
on port 80 to the nodes. This is "Flexible SSL" in Cloudflare terminology.
|
||||
It's the simplest setup; we have a TODO to upgrade to "Full (strict)" with
|
||||
Cloudflare Origin CA certs ([Chapter 13](./13-cloudflare.md), §Future).
|
||||
- **Hetzner Load Balancer.** We save the $8.49/mo by having Cloudflare
|
||||
round-robin across node IPs directly. If any node is unresponsive,
|
||||
Cloudflare's own origin health checks will route around it within 30s.
|
||||
- **Push notifications.** APNs (iOS) and FCM (Android) are *configured off*
|
||||
until we have Apple Developer / Google Play accounts. The env vars are
|
||||
set to sentinel values that let the Go app boot; `FEATURE_PUSH_ENABLED=false`
|
||||
gates all call sites.
|
||||
- **External metrics/monitoring (Prometheus, Grafana, Betterstack).**
|
||||
Right now we rely on `kubectl logs`, `kubectl top`, and Cloudflare's own
|
||||
analytics. See [Chapter 15](./15-observability.md) for what's there and
|
||||
what we'd add.
|
||||
- **Automated backups of Redis state.** Redis is configured with AOF
|
||||
(append-only file) persistence, but the PVC is only on one node. Redis
|
||||
holds only cache + Asynq queue state; losing it re-populates on first
|
||||
request / next cron tick. Not critical.
|
||||
- **Admin panel basic auth (Traefik middleware).** In-app admin login is
|
||||
enabled; the extra Traefik-layer basic auth the scaffold supports is not
|
||||
currently attached.
|
||||
|
||||
## The deployment pipeline in one paragraph
|
||||
|
||||
Changes to application code are built on your workstation by
|
||||
`docker buildx build --platform linux/amd64 --push`, which cross-compiles
|
||||
from arm64 (Apple Silicon) to amd64 (Hetzner nodes) and pushes directly to
|
||||
`gitea.treytartt.com`. Manifests live in `deploy-k3s/manifests/`; they
|
||||
reference image tags by git short SHA. `kubectl apply -f` rolls the new
|
||||
image in with `maxUnavailable: 0, maxSurge: 1` — one new pod at a time,
|
||||
old one stays up until new is healthy. Service discovery by Kubernetes
|
||||
DNS means `api` and `admin` hostnames always resolve to live backing pods;
|
||||
traffic shifts the moment a new pod passes its readiness probe.
|
||||
[Chapter 14](./14-deployment-process.md) walks through a complete deploy.
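
The pod-count bounds implied by `maxUnavailable: 0, maxSurge: 1` can be worked out directly. A small sketch, assuming both settings are absolute numbers rather than percentages (as in the manifests in this commit); `rollout_bounds` is a hypothetical helper name.

```bash
#!/bin/sh
# During a RollingUpdate, Kubernetes keeps at least
# (replicas - maxUnavailable) pods available and runs at most
# (replicas + maxSurge) pods in total.
rollout_bounds() {
    # $1 = replicas, $2 = maxUnavailable, $3 = maxSurge
    echo "min_available=$(( $1 - $2 )) max_total=$(( $1 + $3 ))"
}

echo "api:    $(rollout_bounds 3 0 1)"
echo "worker: $(rollout_bounds 1 0 1)"
```

One consequence worth keeping in mind: for a single-replica Deployment such as `worker`, these settings allow two pods to coexist briefly while the new one starts.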
|
||||
|
||||
## What we *used* to have (the short version)
|
||||
|
||||
Up until 2026-04-24 this stack ran on **Docker Swarm** on the same three
|
||||
Hetzner boxes. It worked, but the Docker libnetwork service-discovery
|
||||
layer has a bug in the 29.x line ([moby/moby#52265][moby-52265]) that
|
||||
leaves stale DNS A-records behind when tasks migrate between nodes. We
|
||||
hit it: the admin panel returned 502s for ~50% of requests through
|
||||
Cloudflare because Caddy (our previous reverse proxy) was dialing a ghost
|
||||
IP that had since been recycled to the Dozzle log viewer. We spent four
|
||||
hours trying increasingly clever workarounds (dnsrr vs VIP,
|
||||
`dynamic a` DNS refresh, global mode, host-mode ports, host.docker.internal,
|
||||
hardcoded node IPs) before concluding that libnetwork state corruption
|
||||
survives every non-nuclear fix.
|
||||
|
||||
The full autopsy is in [Chapter 19 — Swarm Postmortem](./19-postmortem-swarm.md).
|
||||
K3s uses CoreDNS and has no libnetwork history; the bug class doesn't
|
||||
exist there.
|
||||
|
||||
[moby-52265]: https://github.com/moby/moby/issues/52265
|
||||
@@ -0,0 +1,294 @@
|
||||
# 01 — Infrastructure
|
||||
|
||||
## Summary
|
||||
|
||||
Three Hetzner Cloud CX33 virtual machines in the Nuremberg (nbg1) datacenter
|
||||
form the compute foundation. Each is a 4 vCPU / 8 GB RAM / 80 GB NVMe SSD
|
||||
instance on Hetzner's shared-CPU "Cloud" line. Total compute cost is
|
||||
$23.97/mo. This chapter explains each node spec in detail, why we picked
|
||||
Hetzner and this tier specifically, and the rejected alternatives.
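
The total is simply three nodes at the per-node price. A short sketch of the arithmetic, done in integer cents because POSIX shell arithmetic has no floating point; both function names are illustrative.

```bash
#!/bin/sh
# Monthly compute cost in integer cents.
monthly_cost_cents() {
    # $1 = node count, $2 = price per node in cents
    echo $(( $1 * $2 ))
}

format_usd() {
    # $1 = amount in cents; prints dollars with two decimals
    printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

# Three CX33 nodes at $7.99/mo each.
format_usd "$(monthly_cost_cents 3 799)"
```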
|
||||
|
||||
## Node specifications
|
||||
|
||||
All three nodes are identical. Specs per node:
|
||||
|
||||
| Spec | Value |
|---|---|
| Provider | Hetzner Cloud (`www.hetzner.com/cloud`) |
| Instance type | CX33 (shared-CPU line) |
| vCPU | 4 |
| RAM | 8 GB |
| Disk | 80 GB NVMe SSD |
| Network | 20 TB/mo outbound included |
| IPv4 address | Public dedicated |
| IPv6 address | /64 subnet |
| Region | `nbg1` (Nuremberg, Germany) |
| OS | Ubuntu 24.04.3 LTS (kernel 6.8.0-90-generic) |
| Price | **$7.99/mo** (April 2026) ⁽¹⁾ |
|
||||
|
||||
⁽¹⁾ Hetzner applied a price adjustment on 2026-04-01 — CX33 went from
|
||||
~$6.59 to $7.99. See [Hetzner price adjustment announcement][hetzner-prices].
|
||||
|
||||
### The three nodes
|
||||
|
||||
| SSH alias | Public IPv4 | IPv6 | k3s hostname |
|---|---|---|---|
| `hetzner1` | 178.104.247.152 | `2a01:4f8:1c18:79c7::1` | `ubuntu-8gb-nbg1-2` |
| `hetzner2` | 178.105.32.198 | `2a01:4f8:1c18:5ecf::1` | `ubuntu-8gb-nbg1-1` |
| `hetzner3` | 178.104.249.189 | `2a01:4f8:1c18:241a::1` | `ubuntu-8gb-nbg1-3` |
|
||||
|
||||
**Naming quirk.** The SSH-alias numbers and the Hetzner-assigned hostname
|
||||
numbers do not match (`hetzner1` is `nbg1-2`, `hetzner2` is `nbg1-1`). This
|
||||
is because the Hetzner hostnames are assigned in server-creation order; the
|
||||
SSH aliases were set up later in the order we wanted to refer to them. We
|
||||
chose not to rename the hosts — renaming `hostname` on a Kubernetes node
|
||||
after it joins the cluster causes problems (node certificates, etcd
|
||||
identity, etc. tie to the hostname). Living with the quirk is easier than
|
||||
rebuilding. See the mapping table in [the README](./README.md).
|
||||
|
||||
## Why Hetzner
|
||||
|
||||
### Decision matrix
|
||||
|
||||
Compared at the time of purchase (~2026-04-23):
|
||||
|
||||
| Provider | Instance | vCPU / RAM / SSD | Price/mo | Traffic/mo |
|
||||
|---|---|---|---:|---|
|
||||
| **Hetzner** | **CX33** | **4 / 8 GB / 80 GB** | **$7.99** | **20 TB** |
|
||||
| DigitalOcean | General-purpose | 2 / 8 GB / 25 GB | $63 | 4 TB |
|
||||
| DigitalOcean | Basic | 4 / 8 GB / 160 GB | $48 | 5 TB |
|
||||
| Vultr | High Perf | 4 / 8 GB / 180 GB | $48 | 5 TB |
|
||||
| Linode (Akamai) | Shared | 4 / 8 GB / 160 GB | $48 | 5 TB |
|
||||
| OVHcloud | VPS 2026 4vC | 4 / 8 GB / 75 GB | ~$13 | unlimited |
|
||||
| Contabo | Cloud VPS 2 | 4 / 8 GB / 200 GB | $8 | 32 TB |
|
||||
| Netcup | VPS 1000 G11 | 4 / 8 GB / 256 GB | ~$6 | unlimited |
|
||||
| Oracle Always Free | ARM Ampere (*availability lottery*) | up to 4 / 24 GB / 200 GB | $0 | 10 TB |
|
||||
|
||||
**Why Hetzner won:**
|
||||
|
||||
1. **Price/performance at this tier is best-in-class among mainstream hosts.**
|
||||
Similar specs at DigitalOcean/Vultr/Linode cost 6× as much. You're paying
|
||||
the "American managed cloud" premium there for UX polish we don't need.
|
||||
2. **Dedicated IPv4 + /64 IPv6 + 20 TB traffic included.** No overage anxiety
|
||||
at this scale; 20 TB is multiple months of anticipated traffic for a
|
||||
bootstrapped app.
|
||||
3. **European datacenter, GDPR-native.** honeyDue serves users in
|
||||
multiple regions; if EU users dominate, Nuremberg is fast. US users pay
|
||||
about +100 ms over a US-East host, which is well within Cloudflare-cached
|
||||
tolerances for most app traffic.
|
||||
4. **Mature API + `hcloud` CLI** for automation if we ever need it.
|
||||
5. **Hetzner Cloud Firewall is free** and rule-for-rule equivalent to AWS
|
||||
Security Groups / DO Cloud Firewall. We use UFW on the nodes instead
|
||||
(Chapter 4) because our rule set evolved ad-hoc and moving it to the
|
||||
provider's firewall is a small cleanup project.
|
||||
|
||||
**Why not the cheaper options:**
|
||||
|
||||
- **Netcup** is ~$2/mo cheaper per node with more disk, but its API is
|
||||
barebones, the account/billing UX is more fiddly, and their network
|
||||
routing in the US (where the operator is based) has more hops than
|
||||
Hetzner's.
|
||||
- **Contabo** costs essentially the same ($8 vs $7.99) with more disk and
traffic, but the company has a reputation for oversubscribed nodes. For a
production service, unpredictable CPU steal and disk I/O variance are not
worth the extra headroom on paper. Contabo is fine for non-critical
workloads; it's a poor fit for prod.
|
||||
- **Oracle Cloud Always Free** is genuinely free (4 ARM cores + 24 GB RAM)
|
||||
but:
|
||||
- Requires ARM64 images. Our dev machines are ARM, but the build pipeline
targets amd64 (see Chapter 11 for why amd64 matters)
|
||||
- Capacity for free accounts is a lottery; instance creation fails
|
||||
"out of capacity" more often than it succeeds
|
||||
- Oracle has reclaimed idle free-tier instances in the past
|
||||
|
||||
### Why not the premium options
|
||||
|
||||
DigitalOcean, Vultr, and Linode are excellent products with better UX than
|
||||
Hetzner. They were rejected because at honeyDue's current scale the 6–8×
|
||||
price multiplier doesn't buy anything we'd use:
|
||||
|
||||
- We don't need managed databases, object storage, or load balancers from
|
||||
the same provider — those are Neon, Backblaze, and Cloudflare
|
||||
- We don't need their monitoring dashboards — Cloudflare Analytics +
|
||||
`kubectl top` + future Prometheus cover it
|
||||
- The UI polish matters mostly for day-1 setup; ongoing operations are
|
||||
`kubectl` and `ssh`
|
||||
|
||||
When honeyDue has enough revenue that an engineer's time is worth more than
the ~$40/mo per node (~$120/mo for the cluster) we save, we'd consider
moving for the better tooling. Not yet.
|
||||
|
||||
## Why Nuremberg (`nbg1`)
|
||||
|
||||
Hetzner has datacenters in Nuremberg (nbg1), Falkenstein (fsn1), Helsinki
|
||||
(hel1), Ashburn (ash), and Hillsboro (hil). Nuremberg was picked because:
|
||||
|
||||
- The operator's primary user base is expected to be mixed US/EU
|
||||
- Within the EU, Nuremberg is the most central from a peering perspective
|
||||
(well-connected to DE-CIX, Europe's largest internet exchange)
|
||||
- Falkenstein is Hetzner's main datacenter and tends to have longer
|
||||
provisioning queues during capacity crunches; Nuremberg is smaller and
|
||||
more available
|
||||
|
||||
For a US-only userbase, Ashburn (ash) or Hillsboro (hil) would be better
|
||||
picks — US users would see ~20 ms instead of ~120 ms.
|
||||
|
||||
Cloudflare's edge caches most assets, so the origin location matters mostly
|
||||
for first-request / uncached / POST traffic.
|
||||
|
||||
## Why three nodes
|
||||
|
||||
**Raft quorum and fault tolerance.** K3s in HA mode uses Raft consensus
|
||||
(via embedded etcd) for cluster state. Raft requires a majority of nodes
|
||||
to agree on every write. Quorum formulas:
|
||||
|
||||
| Total managers | Quorum | Max failures tolerated |
|
||||
|---|---|---|
|
||||
| 1 | 1 | 0 |
|
||||
| 2 | 2 | 0 |
|
||||
| 3 | 2 | 1 |
|
||||
| 4 | 3 | 1 |
|
||||
| 5 | 3 | 2 |
|
||||
|
||||
Three is the smallest odd number that tolerates a failure, and three is
|
||||
where the price/resilience trade-off is sweetest. Five nodes don't help until you need
|
||||
to tolerate *two* simultaneous failures — a scale concern that doesn't
|
||||
apply at our traffic volume.
|
||||
|
||||
Two nodes is worse than one: you still have single-failure intolerance
|
||||
(one down = no quorum), but you've doubled your cost and failure surface.
|
||||
Avoid even-node clusters for consensus systems.
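
The quorum table above follows from integer division. A minimal sketch that reproduces it; the function names `quorum` and `tolerated` are illustrative and not part of any K3s tooling:

```bash
#!/usr/bin/env bash
# Raft majority: more than half of the voting members must agree.
quorum() {            # $1 = number of server (voting) nodes
  echo $(( $1 / 2 + 1 ))
}

# Failures the cluster can absorb while still holding a majority.
tolerated() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 4 5; do
  printf '%d servers: quorum %d, tolerates %d failure(s)\n' \
    "$n" "$(quorum "$n")" "$(tolerated "$n")"
done
```

Going from 3 to 4 servers raises the quorum to 3 without raising the tolerated failures, which is why even totals buy nothing.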
|
||||
|
||||
## Node hardening
|
||||
|
||||
Each node was bootstrapped with:
|
||||
|
||||
1. **Docker installed** from `download.docker.com` using the stable repo
|
||||
(this was the original Swarm setup; still installed but disabled — k3s
|
||||
bundles its own containerd).
|
||||
2. **`deploy` user created** with:
|
||||
- Home directory
|
||||
- Bash as login shell
|
||||
- Member of `docker` group (historical, when Swarm was the orchestrator)
|
||||
- Member of `sudo` group with `NOPASSWD: ALL` in `/etc/sudoers.d/deploy`
|
||||
3. **SSH key installed** at `/home/deploy/.ssh/authorized_keys`
|
||||
- The key is the public half of `~/.ssh/hetzner` on the operator
|
||||
workstation (`ssh-ed25519`, 256 bits)
|
||||
4. **`/opt/honeydue/deploy`** directory created, owned by `deploy`
|
||||
(originally for Swarm deploy bundle drop zone; unused now)
|
||||
5. **Sysctl** `net.ipv4.ip_unprivileged_port_start=0` persisted to
|
||||
`/etc/sysctl.d/99-unprivileged-ports.conf`. Required so Traefik (running
|
||||
as UID 65532) can bind `:80` and `:443` in the host network namespace.
|
||||
|
||||
The full bootstrap script is at `/tmp/honeydue_bootstrap.sh` on the
|
||||
operator workstation (used during the initial Swarm setup — see
|
||||
[Chapter 19](./19-postmortem-swarm.md) for context).
|
||||
|
||||
## Cost breakdown
|
||||
|
||||
```
|
||||
3 × Hetzner CX33 $23.97/mo
|
||||
Hetzner network traffic $0 (20 TB/mo included per node, nowhere near it)
|
||||
Neon Postgres (Launch) $5-15/mo (usage-based, ~$5 min)
|
||||
Backblaze B2 <$1/mo (tiny upload volume currently)
|
||||
Cloudflare Free $0
|
||||
Gitea (self-hosted) $0 (the operator's existing Gitea)
|
||||
─────────────────────────────────
|
||||
Total infra ~$30-40/mo
|
||||
```
|
||||
|
||||
See [Chapter 18 — Cost](./18-cost.md) for a full breakdown including
|
||||
external SaaS (Fastmail, Apple Developer, etc.) and at-scale projections.
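
The total can be re-derived from the line items. A rough sketch working in cents; the Backblaze range of 0 to 100 cents is an assumption standing in for "<$1/mo", and the helper name `fmt` is illustrative:

```bash
#!/usr/bin/env bash
# All amounts in US cents to stay within bash's integer arithmetic.
nodes=3
node_cents=799                 # CX33 price per node (April 2026)
neon_min=500;  neon_max=1500   # Neon Launch, usage-based
b2_min=0;      b2_max=100      # Backblaze B2, "<$1/mo" (assumed 0 to 100 cents)

# Format a cent amount as dollars.
fmt() { printf '$%d.%02d' $(( $1 / 100 )) $(( $1 % 100 )); }

compute=$(( nodes * node_cents ))
low=$((  compute + neon_min + b2_min ))
high=$(( compute + neon_max + b2_max ))

echo "compute: $(fmt "$compute")/mo"
echo "total:   $(fmt "$low") to $(fmt "$high")/mo"
```

If Hetzner adjusts prices again, only `node_cents` needs to change.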
|
||||
|
||||
## Provisioning workflow
|
||||
|
||||
Nodes were provisioned manually through Hetzner Cloud Console. This is
|
||||
fine for a three-node cluster; for larger clusters we'd switch to the
|
||||
[`hetzner-k3s`][hetzner-k3s] CLI tool that the `deploy-k3s/` scaffold
|
||||
expects. The manual steps were:
|
||||
|
||||
1. Create project in Hetzner Cloud Console.
|
||||
2. Upload SSH key (`hetzner.pub`).
|
||||
3. Create 3× CX33 servers in `nbg1` with Ubuntu 24.04.
|
||||
4. SSH in as `root`, run bootstrap to create `deploy` user and install
|
||||
Docker / later k3s.
|
||||
5. *(Optional)* Apply Hetzner Cloud Firewall rules at the network edge. We
skipped this and use UFW per Chapter 4 instead.
|
||||
|
||||
A future greenfield deployment would run `deploy-k3s/scripts/01-provision-cluster.sh`,
|
||||
which does all of this in one shot via the `hetzner-k3s` CLI.
|
||||
|
||||
## Upgrade / replacement plan
|
||||
|
||||
**Node failure.** If a node becomes unreachable, the other two retain
|
||||
Raft quorum and the cluster continues accepting writes. Pods from the
|
||||
failed node get rescheduled to the survivors (so long as the survivors
|
||||
have spare capacity — see Chapter 16). To replace the dead node:
|
||||
|
||||
1. Delete it from the cluster: `kubectl delete node <name>`
|
||||
2. Create a replacement CX33 in Hetzner console
|
||||
3. Install k3s on it as a server with `--server=https://<manager>:6443` and the cluster join token
|
||||
4. Verify `kubectl get nodes` shows it as Ready
|
||||
|
||||
**Scaling up.** To add a fourth node, same procedure without deleting
|
||||
anything. Consider whether you want it as a server (adds to Raft quorum;
|
||||
must also add up to an odd total) or an agent (worker-only). K3s agents
|
||||
join with `INSTALL_K3S_EXEC=agent` instead of `server`.
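
A hedged sketch of the join step for either role. It only prints the install command so it can be reviewed before it is run on the new node. `INSTALL_K3S_VERSION` and `K3S_TOKEN` are environment variables read by the `get.k3s.io` install script; the function name `k3s_join_cmd` and the `<manager>` placeholder are illustrative, and the token is assumed to be exported on the target node beforehand:

```bash
#!/usr/bin/env bash
# Print (not run) the install command for a node joining an existing cluster.
#   $1 = role: "server" (joins etcd/Raft) or "agent" (worker-only)
#   $2 = URL of an existing server, e.g. https://<manager>:6443
#   $3 = K3s version to pin, matching the rest of the cluster
# The join token is read from $K3S_TOKEN on the target node, never printed here.
k3s_join_cmd() {
  local role=$1 url=$2 version=$3
  case $role in
    server|agent) ;;
    *) echo "role must be 'server' or 'agent'" >&2; return 1 ;;
  esac
  printf 'curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=%s K3S_TOKEN="$K3S_TOKEN" sh -s - %s --server %s\n' \
    "$version" "$role" "$url"
}

k3s_join_cmd server 'https://<manager>:6443' 'v1.34.6+k3s1'
k3s_join_cmd agent  'https://<manager>:6443' 'v1.34.6+k3s1'
```

Pinning the version keeps a replacement node from installing a newer K3s than the rest of the cluster.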
|
||||
|
||||
**Upgrading K3s.** K3s tracks upstream Kubernetes, which ships a minor release roughly every four months. Upgrade by
|
||||
running the install script with the new version on each node, one at a
|
||||
time, verifying cluster health between each. See
|
||||
[Chapter 17](./17-runbook.md) for the detailed procedure.
|
||||
|
||||
**Upgrading the OS.** Ubuntu 24.04 LTS is supported until 2029.
|
||||
`unattended-upgrades` is *not* currently installed, so OS patches require
|
||||
manual `apt upgrade`. Install `unattended-upgrades` when time permits —
|
||||
security patches are important and automation reduces the risk of
|
||||
falling behind.
|
||||
|
||||
## Physical location & regulatory
|
||||
|
||||
- **Sovereignty**: Hetzner is headquartered in Gunzenhausen, Germany.
|
||||
All data at rest in `nbg1` is subject to German law and the GDPR.
|
||||
- **User data**: Most user data actually lives in
|
||||
**Neon Postgres (AWS us-east-1, Virginia)** and **Backblaze B2
|
||||
(us-east-005, South Carolina)** — both US-hosted. EU users' data
|
||||
therefore *exits* the EU in the API path. If strict EU data residency
|
||||
is ever a requirement, Neon has an EU region (Frankfurt) and Backblaze
|
||||
has EU endpoints; switching is a configuration change, not an
|
||||
architectural one.
|
||||
- **Encryption at rest**: Hetzner encrypts node-local disks at the
|
||||
hypervisor layer. Neon encrypts at the AWS EBS layer. B2 encrypts
|
||||
objects server-side. None of our application code or config holds
|
||||
secrets at rest that aren't already in Kubernetes Secrets (which
|
||||
are stored in etcd; etcd on disk is unencrypted by default in k3s
|
||||
but see Chapter 5 for hardening).
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
|
||||
# SSH to any node
|
||||
ssh -i ~/.ssh/hetzner deploy@hetzner1
|
||||
|
||||
# Check node health
|
||||
kubectl get nodes -o wide
|
||||
|
||||
# Per-node resource usage
|
||||
kubectl top nodes
|
||||
|
||||
# See what's on each node
|
||||
kubectl get pods -A -o wide --sort-by=.spec.nodeName
|
||||
|
||||
# Hetzner console (in browser)
|
||||
# https://console.hetzner.cloud/
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [Hetzner Cloud product page][hetzner-cloud]
|
||||
- [Hetzner price adjustment April 2026][hetzner-prices]
|
||||
- [hetzner-k3s tool][hetzner-k3s]
|
||||
- [K3s architecture docs][k3s-arch]
|
||||
|
||||
[hetzner-cloud]: https://www.hetzner.com/cloud/
|
||||
[hetzner-prices]: https://docs.hetzner.com/general/infrastructure-and-availability/price-adjustment/
|
||||
[hetzner-k3s]: https://github.com/vitobotta/hetzner-k3s
|
||||
[k3s-arch]: https://docs.k3s.io/architecture
|
||||
@@ -0,0 +1,323 @@
|
||||
# 02 — Orchestrator Choice
|
||||
|
||||
## Summary
|
||||
|
||||
We run K3s — a lightweight Kubernetes distribution from SUSE/Rancher Labs.
|
||||
This wasn't our first choice. We originally deployed on Docker Swarm and
|
||||
spent a long afternoon hitting a libnetwork bug before migrating. This
|
||||
chapter walks through the comparison of the three realistic orchestrators
|
||||
(Docker Swarm, full Kubernetes, and K3s) and a fourth (Nomad) we
|
||||
considered and rejected. The story of the Swarm→k3s migration is in
|
||||
[Chapter 19](./19-postmortem-swarm.md); this chapter is about the decision
|
||||
framework.
|
||||
|
||||
## The decision
|
||||
|
||||
**K3s v1.34.6+k3s1**, HA mode, three control-plane nodes with embedded etcd.
|
||||
|
||||
## Candidates considered
|
||||
|
||||
| | Docker Swarm | K3s | Full Kubernetes (kubeadm) | HashiCorp Nomad |
|
||||
|---|---|---|---|---|
|
||||
| Learning curve | Easiest | Medium | Hardest | Easy |
|
||||
| Install on 3 nodes | `docker swarm init/join` | `curl \| sh` per node | Many steps | `nomad server/agent` |
|
||||
| Memory footprint (control plane) | ~200 MB per node | ~500 MB per node | ~1 GB per node | ~200 MB per node |
|
||||
| Service discovery | libnetwork (buggy) | CoreDNS | CoreDNS | Consul |
|
||||
| HA quorum | Raft (3+ managers) | Raft via embedded etcd (3+ servers) | etcd cluster (3+ nodes) | Raft (3+ servers) |
|
||||
| Secrets management | Swarm secrets | k8s Secrets | k8s Secrets | Vault or file-backed |
|
||||
| Rolling updates | Swarm update_config | Deployments | Deployments | job update stanza |
|
||||
| Ingress | None (third-party) | Traefik bundled | None (install yourself) | None (install yourself) |
|
||||
| Active development | Maintenance mode | Active | Active | Active |
|
||||
| Industry momentum | Declining | Growing | Dominant | Niche |
|
||||
|
||||
## Why K3s
|
||||
|
||||
### Against Docker Swarm
|
||||
|
||||
Swarm was our first pick because it's the simplest "production-like"
|
||||
option. `docker swarm init` gives you a working cluster in seconds. It's
|
||||
built into the Docker daemon you already have.
|
||||
|
||||
What killed it:
|
||||
|
||||
1. **libnetwork state bugs.** Swarm's service discovery relies on
|
||||
libnetwork's gossip-backed service registry. When a service's task
|
||||
migrates between nodes, the old endpoint record isn't always removed
|
||||
cleanly — especially on encrypted overlays or during transient network
|
||||
partitions. The result: stale DNS A-records that persist indefinitely,
|
||||
survive service removal, survive containerd restarts, survive pretty
|
||||
much everything except recreating the overlay network. Multiple open
|
||||
issues track this: [moby/moby#52265][moby-52265],
|
||||
[moby/moby#51491][moby-51491], [Dokploy#3480][dokploy-3480].
|
||||
|
||||
2. **It's in maintenance mode.** Mirantis [committed to supporting
|
||||
Swarm through 2030][mirantis-swarm] as part of Mirantis Kubernetes
|
||||
Engine 3, but nothing is being actively developed. The libnetwork code
|
||||
has no champion; bug fixes land slowly and often incompletely (the
|
||||
29.0.0 partial fix for #50236, the 29.3.0 regression, the pending
|
||||
follow-up in #52289 — months apart).
|
||||
|
||||
3. **Industry signal.** Every 2026 write-up of "should I pick Swarm"
|
||||
reaches the same conclusion: run what works; don't bet new workloads on
|
||||
it. [Better Stack][bstack-swarm] and [VirtualizationHowTo][vht-swarm]
|
||||
are representative.
|
||||
|
||||
The [Chapter 19 postmortem](./19-postmortem-swarm.md) details the specific
|
||||
bug we hit, the workarounds we tried, and why each failed.
|
||||
|
||||
### Against full Kubernetes (kubeadm)
|
||||
|
||||
Full Kubernetes is the de-facto standard. It has the biggest ecosystem, the
|
||||
most documentation, the most mindshare. Against it:
|
||||
|
||||
1. **Operational overhead.** A kubeadm-built cluster runs ~6 core
processes (kube-apiserver, etcd, kube-scheduler, and kube-controller-manager
on the control plane, plus kube-proxy and kubelet on every node), each of
which needs monitoring, upgrading, and
|
||||
understanding. K3s bundles them into a single binary with sensible
|
||||
defaults.
|
||||
|
||||
2. **Memory.** A kubeadm control plane wants ~1 GB RAM baseline per master
|
||||
node. On an 8 GB node that's 12% gone before any workload runs. K3s is
|
||||
~500 MB per master.
|
||||
|
||||
3. **Etcd.** Full Kubernetes expects a separate 3+ node etcd cluster for
|
||||
HA, typically on the same masters but as an independent process. K3s
|
||||
embeds etcd in the server binary; still Raft, still HA, but one less
|
||||
thing to install/upgrade/monitor.
|
||||
|
||||
4. **Cluster creation UX.** `kubeadm init` + certificate distribution + CNI
|
||||
install + storage class setup is a multi-step dance. K3s `curl -sfL
|
||||
https://get.k3s.io | sh -s - server --cluster-init` plus two joins is a
|
||||
10-minute cluster.
|
||||
|
||||
**What we'd lose by not using full Kubernetes:** nothing that matters at
|
||||
our scale. K3s is 100% Kubernetes API-compatible. Every `kubectl` command,
|
||||
every Helm chart, every manifest works identically. If we ever need to
migrate to full Kubernetes, we re-apply `deploy-k3s/manifests/` on the new
cluster. (`kubectl get all -A -o yaml` is a useful snapshot, but `all`
omits Secrets, ConfigMaps, Ingresses, PVCs, and custom resources.)
|
||||
|
||||
### Against HashiCorp Nomad
|
||||
|
||||
Nomad is very good at what it does — simpler than Kubernetes, more robust
|
||||
than Swarm, has real load balancing (via Consul Connect), and the
|
||||
`nomad agent` binary is ~80 MB vs k3s' ~200 MB.
|
||||
|
||||
Against it:
|
||||
|
||||
1. **Ecosystem is smaller.** Far fewer community Helm charts, operators,
|
||||
tutorials. Every new component needs bespoke integration.
|
||||
2. **Service discovery requires Consul.** Two products to operate, not one.
|
||||
3. **Ingress requires a separate tool** (Traefik, HAProxy, Fabio). K3s
|
||||
bundles Traefik by default.
|
||||
4. **Secrets management** requires Vault or relies on Nomad's template
|
||||
stanza. Not bad, but more moving parts.
|
||||
5. **The operator hasn't used Nomad in production before.** Learning curve
|
||||
on a new platform during a prod migration is a bad trade.
|
||||
|
||||
Nomad would be a defensible choice. K3s won primarily on ecosystem
|
||||
maturity and the operator's familiarity with Kubernetes primitives.
|
||||
|
||||
## What K3s actually is
|
||||
|
||||
K3s is a CNCF Sandbox project, created by Rancher Labs (now part of SUSE) and
originally designed for edge and IoT. Its design goals:
|
||||
|
||||
- Single ~200 MB static binary
|
||||
- Works on ARM64 and AMD64
|
||||
- Bundles everything needed for a working cluster: containerd, Flannel,
|
||||
CoreDNS, Traefik, metrics-server, local-path storage provisioner, and
|
||||
(optionally) servicelb (klipper-lb) load balancer
|
||||
- Replaces the kubeadm setup dance with `curl | sh`
|
||||
- Replaces etcd-in-its-own-cluster with embedded etcd (or SQLite for
|
||||
single-node)
|
||||
- Replaces Docker with containerd (though you can opt back into Docker)
|
||||
|
||||
It is **not** a fork of Kubernetes. K3s is Kubernetes, packaged differently.
|
||||
The Kubernetes Go code it wraps is unmodified (aside from build-time
|
||||
stripping of cloud provider integrations you don't need). `kubectl`,
|
||||
the API, CRDs, operators — all identical.
|
||||
|
||||
## HA architecture we chose
|
||||
|
||||
```mermaid
|
||||
flowchart TB
|
||||
subgraph Cluster[k3s HA cluster]
|
||||
subgraph N1[hetzner1]
|
||||
K1[k3s server]
|
||||
E1[etcd]
|
||||
KUB1[kubelet]
|
||||
TR1[Traefik pod<br/>hostNet :80/:443]
|
||||
P1[app pods]
|
||||
end
|
||||
subgraph N2[hetzner2]
|
||||
K2[k3s server]
|
||||
E2[etcd]
|
||||
KUB2[kubelet]
|
||||
TR2[Traefik pod<br/>hostNet :80/:443]
|
||||
P2[app pods]
|
||||
end
|
||||
subgraph N3[hetzner3]
|
||||
K3[k3s server]
|
||||
E3[etcd]
|
||||
KUB3[kubelet]
|
||||
TR3[Traefik pod<br/>hostNet :80/:443]
|
||||
P3[app pods]
|
||||
end
|
||||
end
|
||||
|
||||
E1 <--Raft--> E2 <--Raft--> E3
|
||||
E1 <--Raft--> E3
|
||||
|
||||
K1 & K2 & K3 --- API[kube-apiserver<br/>port 6443]
|
||||
```
|
||||
|
||||
### ASCII fallback
|
||||
|
||||
```
|
||||
   hetzner1           hetzner2           hetzner3
 ┌───────────┐      ┌───────────┐      ┌───────────┐
 │ k3s srv   │      │ k3s srv   │      │ k3s srv   │
 │ ├ etcd  ──┼──────┼─ etcd  ───┼──────┼─ etcd     │
 │ ├ :6443   │      │ ├ :6443   │      │ ├ :6443   │
 │ ├ kubelet │      │ ├ kubelet │      │ ├ kubelet │
 │ └ pods    │      │ └ pods    │      │ └ pods    │
 └───────────┘      └───────────┘      └───────────┘
       ▲                  ▲                  ▲
       ├────── Raft ──────┼────── Raft ──────┤
       └──────────────── Raft ───────────────┘
|
||||
```
|
||||
|
||||
All three nodes are **server** nodes (in k3s terminology) — they all run
|
||||
`kube-apiserver`, `kube-scheduler`, `kube-controller-manager`, and
|
||||
participate in etcd Raft consensus. A fourth "agent" node could be added
|
||||
as worker-only; we don't need that capacity yet.
|
||||
|
||||
**Quorum**: 2 out of 3 nodes must agree on writes. The cluster stays
|
||||
operational if any one node dies. Two dying nodes = cluster loses quorum
|
||||
(Raft halts) until a majority returns.
|
||||
|
||||
## What we disabled
|
||||
|
||||
We ran k3s install with `--disable=servicelb`. `servicelb` (a.k.a.
|
||||
`klipper-lb`) is a trick where k3s spawns a daemonset that listens on a
|
||||
node's host ports and proxies to `LoadBalancer`-typed services. Fine for
|
||||
dev; we don't need it because we handle ingress with Traefik in
|
||||
DaemonSet+hostNetwork mode (Chapter 6).
|
||||
|
||||
We did **not** disable:
|
||||
- **traefik** — we reconfigured it via HelmChartConfig rather than
|
||||
disable-and-replace. See Chapter 6.
|
||||
- **local-path-provisioner** — provides the default `StorageClass` we use
|
||||
for Redis PVC (Chapter 7).
|
||||
- **metrics-server** — required for `kubectl top` and HorizontalPodAutoscaler.
|
||||
- **coredns** — the cluster DNS. Essential for service discovery.
|
||||
|
||||
## Version choices
|
||||
|
||||
### K3s v1.34.6+k3s1
|
||||
|
||||
This was the latest stable K3s release as of 2026-04-24. K3s follows
|
||||
upstream Kubernetes' release cadence — `1.34` matches Kubernetes 1.34.x.
|
||||
The `+k3s1` suffix is the K3s build number within that upstream version.
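
The version string can be taken apart with plain parameter expansion, which is handy in upgrade scripts. A small sketch; the function name `parse_k3s_version` is illustrative:

```bash
#!/usr/bin/env bash
# Split a K3s version string into its upstream and K3s-specific parts
# using only parameter expansion.
parse_k3s_version() {
  local v=${1#v}              # drop the leading "v"
  local upstream=${v%%+*}     # upstream Kubernetes version, e.g. 1.34.6
  local build=${v#*+}         # K3s build within that version, e.g. k3s1
  local minor=${upstream%.*}  # Kubernetes minor line, e.g. 1.34
  echo "$upstream $minor $build"
}

parse_k3s_version 'v1.34.6+k3s1'
```

The minor line is the part to compare when deciding whether an upgrade crosses a Kubernetes minor version.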
|
||||
|
||||
**Upgrade policy**: K3s ships a new minor version roughly every four months, following upstream. We'd
|
||||
upgrade in place to 1.35 when it's been out ~30 days and has no open
|
||||
critical bugs in the release notes. See Chapter 17 for the procedure.
|
||||
|
||||
### containerd v2.2.2
|
||||
|
||||
Bundled with K3s. containerd implements the Kubernetes CRI natively, so no
Docker shim such as `cri-dockerd` is involved.
|
||||
We don't pin containerd separately — we take whatever K3s ships.
|
||||
|
||||
### Flannel (VXLAN backend)
|
||||
|
||||
Bundled with K3s as the default CNI. Flannel's VXLAN backend is
|
||||
straightforward, performant enough, and has worked reliably in every K3s
|
||||
install we've seen. Alternatives (Calico, Cilium) are more featureful but
|
||||
add operational complexity.
|
||||
|
||||
See [Chapter 3](./03-networking.md) for a deep dive on the networking
|
||||
layer.
|
||||
|
||||
## What we did NOT choose from K3s' ecosystem
|
||||
|
||||
- **servicelb / klipper-lb** — off. Reason above.
|
||||
- **embedded SQLite** — on single-node k3s, SQLite replaces etcd. We're
|
||||
multi-node, so this doesn't apply.
|
||||
- **`--flannel-backend=wireguard-native`** — WireGuard-encrypted overlay.
|
||||
We didn't enable it because (a) VXLAN already works, (b) our node-to-node
|
||||
traffic stays within Hetzner's internal network anyway, and (c) we haven't
|
||||
proven we need it. Encryption is a TODO (Chapter 20).
|
||||
|
||||
## Raft and split-brain behavior
|
||||
|
||||
If the 3 nodes become network-partitioned such that two nodes can still see
each other but neither can reach the third (a "2-1 split"):
|
||||
|
||||
- **Majority partition (2 nodes)** — retains quorum, cluster keeps
|
||||
accepting writes. Pods on those 2 nodes keep running. The isolated node
is marked `NotReady` after `node-monitor-grace-period` (default 40s),
and once the default 300-second `node.kubernetes.io/unreachable`
toleration expires (about 5 min) its pods are marked for eviction and
rescheduled onto the surviving nodes.
|
||||
- **Minority partition (1 node)** — loses quorum. API server on that
|
||||
node refuses writes; existing pods keep running (kubelet doesn't need
|
||||
the API server for already-scheduled pods), but nothing new can deploy,
|
||||
scale, or reschedule.
|
||||
|
||||
When the partition heals, Raft reconciles automatically. The minority
|
||||
node catches up on etcd state via snapshot+replay.
|
||||
|
||||
**Worst case** (all 3 isolated from each other): no quorum, no node is
|
||||
authoritative. Pods keep running from existing state; nothing can be
|
||||
updated. This requires all three nodes losing network to each other
|
||||
simultaneously, which implies Hetzner's entire internal switching is
|
||||
broken — at that point, the whole region is likely down anyway.
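
The partition cases above reduce to one comparison: a side keeps quorum only if it holds more than half of the servers. A minimal sketch; `partition_state` is an illustrative name and the status strings are paraphrases, not K3s output:

```bash
#!/usr/bin/env bash
# For a cluster of $1 servers, report whether a partition containing
# $2 of them still holds a Raft majority.
partition_state() {
  local total=$1 reachable=$2
  if (( reachable > total / 2 )); then
    echo "quorum: accepts writes"
  else
    echo "no quorum: existing pods keep running, API writes fail"
  fi
}

echo "2-1 split, majority side: $(partition_state 3 2)"
echo "2-1 split, minority side: $(partition_state 3 1)"
echo "1-1-1 split, each node:   $(partition_state 3 1)"
```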
|
||||
|
||||
## Our decision in one sentence
|
||||
|
||||
K3s gave us the Kubernetes API (enormous ecosystem, known primitives, our
|
||||
existing scaffold in `deploy-k3s/manifests/`) without the operational
|
||||
overhead of kubeadm; and unlike Swarm, its service-discovery layer is
|
||||
rock-solid.
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
|
||||
# On any k3s server node, root commands use k3s-wrapped kubectl:
|
||||
sudo k3s kubectl get nodes
|
||||
|
||||
# From workstation, use the copied kubeconfig:
|
||||
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
|
||||
kubectl get nodes
|
||||
|
||||
# Check k3s service:
|
||||
ssh deploy@hetzner1 "sudo systemctl status k3s"
|
||||
|
||||
# Watch cluster events live:
|
||||
kubectl get events -A --watch
|
||||
|
||||
# See what's on each node:
|
||||
kubectl get pods -A -o wide --sort-by=.spec.nodeName
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [K3s architecture][k3s-arch]
|
||||
- [K3s requirements][k3s-reqs]
|
||||
- [Mirantis Swarm support announcement][mirantis-swarm]
|
||||
- [moby/moby#52265 — libnetwork stale records][moby-52265]
|
||||
- [moby/moby#51491 — DNS broken after swarm init][moby-51491]
|
||||
- [Dokploy #3480 — Traefik stale VIP on Swarm][dokploy-3480]
|
||||
- [Better Stack: Hetzner Cloud Review 2026][bstack-swarm]
|
||||
- [VirtualizationHowTo: Is Docker Swarm Still Safe in 2026?][vht-swarm]
|
||||
|
||||
[k3s-arch]: https://docs.k3s.io/architecture
|
||||
[k3s-reqs]: https://docs.k3s.io/installation/requirements
|
||||
[mirantis-swarm]: https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/
|
||||
[moby-52265]: https://github.com/moby/moby/issues/52265
|
||||
[moby-51491]: https://github.com/moby/moby/issues/51491
|
||||
[dokploy-3480]: https://github.com/Dokploy/dokploy/issues/3480
|
||||
[bstack-swarm]: https://betterstack.com/community/guides/web-servers/hetzner-cloud-review/
|
||||
[vht-swarm]: https://www.virtualizationhowto.com/2026/03/is-docker-swarm-still-safe-in-2026/
|
||||
@@ -0,0 +1,465 @@
|
||||
# 03 — Networking
|
||||
|
||||
## Summary
|
||||
|
||||
The network stack has five layers: the physical/internet layer (Hetzner's
|
||||
public network), the node layer (Ubuntu with UFW), the Kubernetes overlay
|
||||
(Flannel VXLAN), the service layer (kube-proxy IPVS + CoreDNS), and the
|
||||
ingress layer (Traefik). This chapter walks through each, explains how
|
||||
they compose, and traces a single HTTP request from browser to Go API
|
||||
response showing every hop.
|
||||
|
||||
## The five layers
|
||||
|
||||
```mermaid
|
||||
flowchart TB
|
||||
subgraph L5[Layer 5 — Ingress]
|
||||
Traefik
|
||||
end
|
||||
subgraph L4[Layer 4 — Service discovery]
|
||||
KubeProxy[kube-proxy IPVS]
|
||||
CoreDNS
|
||||
end
|
||||
subgraph L3[Layer 3 — Pod overlay]
|
||||
Flannel[Flannel VXLAN<br/>UDP 8472]
|
||||
end
|
||||
subgraph L2[Layer 2 — Node network]
|
||||
UFW
|
||||
Kernel[Linux kernel<br/>netfilter/iptables]
|
||||
end
|
||||
subgraph L1[Layer 1 — Physical]
|
||||
Hetzner[Hetzner network<br/>public v4 + v6]
|
||||
end
|
||||
|
||||
L5 --> L4 --> L3 --> L2 --> L1
|
||||
```
|
||||
|
||||
### ASCII fallback
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────┐
|
||||
│ L5 Traefik (host network, :80/:443)│
|
||||
├──────────────────────────────────────┤
|
||||
│ L4 kube-proxy (IPVS) + CoreDNS │
|
||||
├──────────────────────────────────────┤
|
||||
│ L3 Flannel VXLAN overlay │
|
||||
│ 10.42.0.0/16 pod CIDR │
|
||||
├──────────────────────────────────────┤
|
||||
│ L2 Ubuntu + UFW + kernel iptables │
|
||||
├──────────────────────────────────────┤
|
||||
│ L1 Hetzner public IPv4/IPv6 │
|
||||
└──────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Layer 1 — Physical network
|
||||
|
||||
Each Hetzner CX33 has:
|
||||
- A **public IPv4** address on the internet
|
||||
- A **public IPv6** /64 subnet (one address used, the rest unused)
|
||||
- **20 TB/mo** outbound traffic included; inbound is free
|
||||
- **~1 Gbps** network bandwidth per node
|
||||
|
||||
All inter-node traffic goes over the **public network**. Hetzner Cloud
|
||||
offers a private-network feature (vswitch), but we didn't attach one —
|
||||
adding it now would require reconfiguring Flannel's advertise-addr. A
|
||||
future improvement: attach a private vSwitch to all three nodes,
|
||||
reconfigure Flannel to use it, shrink our public-interface attack surface.
|
||||
|
||||
## Layer 2 — Node network
|
||||
|
||||
Each node runs Ubuntu 24.04.3 LTS with:
|
||||
|
||||
- **Default routing** via the Hetzner-provided gateway
|
||||
- **UFW** as the iptables frontend (Chapter 4 lists every rule)
|
||||
- **IP forwarding** enabled (`net.ipv4.ip_forward=1`) — required for
|
||||
Kubernetes pod routing
|
||||
- **Bridge netfilter** enabled (`net.bridge.bridge-nf-call-iptables=1`)
|
||||
— required so iptables can see bridged traffic
|
||||
|
||||
K3s configures the latter two automatically at install time via
|
||||
`/etc/sysctl.d/90-kubelet.conf` (or similar; exact file varies by distro).
|
||||
|
||||
One additional sysctl we set manually:
|
||||
|
||||
```
|
||||
# /etc/sysctl.d/99-unprivileged-ports.conf
|
||||
net.ipv4.ip_unprivileged_port_start=0
|
||||
```
|
||||
|
||||
**Why**: Traefik runs as UID 65532 (non-root) in host network mode to bind
|
||||
:80 and :443. Without this sysctl, even with `CAP_NET_BIND_SERVICE`, it
|
||||
can't bind privileged ports in the host namespace. Ubuntu 24.04's default
|
||||
is 1024 (so ports 1–1023 are "privileged"). Setting it to 0 lets any
|
||||
user bind any port.
|
||||
|
||||
**Security implication**: Minimal. The ports Traefik binds are still
|
||||
controlled by the container runtime — other pods on the node can't
|
||||
accidentally grab 80/443 because kubelet won't schedule conflicting host
|
||||
ports. And the UFW rules still gate what's reachable externally.
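
The three kernel settings this layer depends on can be verified on a node by reading them back from `/proc/sys`. A sketch under the assumption that the expected values are exactly the ones listed above; the function names `check_sysctl` and `check_all` are illustrative:

```bash
#!/usr/bin/env bash
# Compare a kernel parameter against an expected value.
#   $1 = sysctl key, e.g. net.ipv4.ip_forward
#   $2 = expected value
#   $3 = root of the sysctl tree (defaults to /proc/sys; overridable for testing)
check_sysctl() {
  local key=$1 want=$2 root=${3:-/proc/sys}
  local path="$root/${key//.//}"     # dots in the key become path separators
  local got
  got=$(cat "$path" 2>/dev/null) || { echo "MISSING $key"; return 1; }
  if [[ $got == "$want" ]]; then
    echo "ok      $key = $got"
  else
    echo "DIFF    $key = $got (want $want)"
    return 1
  fi
}

# Check every setting and return non-zero if any of them is off.
check_all() {
  local root=${1:-/proc/sys} rc=0
  check_sysctl net.ipv4.ip_forward 1 "$root" || rc=1
  check_sysctl net.bridge.bridge-nf-call-iptables 1 "$root" || rc=1
  check_sysctl net.ipv4.ip_unprivileged_port_start 0 "$root" || rc=1
  return $rc
}
```

Running `check_all` on each node after a reboot confirms that the persisted sysctl file was picked up.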
|
||||
|
||||
## Layer 3 — Pod overlay (Flannel VXLAN)
|
||||
|
||||
### What Flannel is
|
||||
|
||||
Flannel is a CNI (Container Network Interface) plugin. Its job: give every
|
||||
pod in the cluster a routable IP address, and make those IPs reachable
|
||||
from any other pod regardless of which node they're on.
|
||||
|
||||
### The pod CIDR
|
||||
|
||||
K3s assigns **10.42.0.0/16** as the cluster-wide pod CIDR by default. Each
|
||||
node gets a /24 slice:
|
||||
|
||||
| Node | Pod CIDR |
|---|---|
| ubuntu-8gb-nbg1-1 | 10.42.1.0/24 |
| ubuntu-8gb-nbg1-2 | 10.42.0.0/24 |
| ubuntu-8gb-nbg1-3 | 10.42.2.0/24 |
|
||||
|
||||
Each pod gets an IP from its node's slice. So a pod on hetzner2
|
||||
(`nbg1-1`) might be `10.42.1.6`; a pod on hetzner3 (`nbg1-3`) might be
|
||||
`10.42.2.10`.
|
||||
|
||||
### How VXLAN works
|
||||
|
||||
VXLAN ("Virtual Extensible LAN") tunnels Layer-2 frames over UDP. Flannel
|
||||
wraps every inter-node packet like so:
|
||||
|
||||
```
Original pod → pod packet:
┌──────────────────────────────────────────────────┐
│ Ethernet │ IP src=10.42.0.5 → dst=10.42.2.10 │ … │
└──────────────────────────────────────────────────┘

Flannel VXLAN-encapsulates it:
┌──────────────────────────────────────────────────────────────────┐
│ Eth │ IP src=178.104.247.152 → dst=178.104.249.189 │ UDP 8472 │ │
│ VXLAN header │ <original Ethernet+IP+payload> │ │
└──────────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
The outer IP/UDP carries the packet between nodes over Hetzner's public
|
||||
network. On arrival, the destination node unwraps the VXLAN header and
|
||||
delivers the inner packet to the target pod.
|
||||
|
||||
**UDP port 8472** is the Linux kernel's default VXLAN port, which Flannel
uses; the IANA-assigned VXLAN port is 4789. Port 8472 must be open
node-to-node in UFW (see Chapter 4).
|
||||
|
||||
**MTU note**: VXLAN encapsulation adds 50 bytes of overhead (8 VXLAN +
8 UDP + 20 outer IP + 14 inner Ethernet). Hetzner's network uses a standard
1500-byte MTU, so Flannel's overlay MTU is 1450. Mismatches cause silent
packet drops. K3s sets this correctly by default.
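The MTU arithmetic can be checked in a few lines of shell. This is a sketch of the calculation only; the function name `overlay_mtu` is invented for the example and nothing here reads the live interface.

```bash
# Overlay MTU = underlay MTU minus VXLAN encapsulation overhead.
overlay_mtu() {
  underlay_mtu=$1
  # 8 VXLAN + 8 UDP + 20 outer IP + 14 inner Ethernet = 50 bytes
  overhead=$((8 + 8 + 20 + 14))
  echo $((underlay_mtu - overhead))
}

overlay_mtu 1500   # prints 1450
```

On a node, `ip link show flannel.1` should report the same value as the function does for that node's underlay MTU.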
|
||||
|
||||
### Flannel config
|
||||
|
||||
`/var/lib/rancher/k3s/agent/etc/flannel/net-conf.json` on each node:
|
||||
|
||||
```json
{
  "Network": "10.42.0.0/16",
  "EnableIPv6": false,
  "EnableIPv4": true,
  "IPv6Network": "::/0",
  "Backend": { "Type": "vxlan" }
}
```
|
||||
|
||||
We did not enable IPv6 in the cluster — an unnecessary complexity for our
|
||||
scale, and CoreDNS + kube-proxy + node controllers all work fine in v4-only
|
||||
mode.
|
||||
|
||||
### No encryption (yet)
|
||||
|
||||
Flannel VXLAN traffic over Hetzner's public network is **not encrypted**.
|
||||
This means pod-to-pod traffic between nodes is visible to any attacker
|
||||
with packet capture on the path — in practice, nobody between our three
|
||||
nodes at Hetzner Nuremberg, but it's still plaintext on the wire.
|
||||
|
||||
**Mitigation today**: All sensitive inter-pod traffic already uses TLS:
|
||||
- api ↔ Neon Postgres: TLS 1.3 (`DB_SSLMODE=require`)
|
||||
- api/worker ↔ Backblaze B2: HTTPS
|
||||
- api ↔ Fastmail: STARTTLS
|
||||
- api ↔ Redis: plaintext but Redis only holds cache + Asynq queue state,
|
||||
no user credentials
|
||||
|
||||
**TODO** (Chapter 20): Switch Flannel to the `wireguard-native` backend.
K3s supports this via `--flannel-backend=wireguard-native` at install time;
enabling it on an existing cluster requires a config edit, a rolling
restart of the K3s service on each node, and a UFW rule for WireGuard's
UDP 51820 between nodes.
|
||||
|
||||
## Layer 4 — Service discovery
|
||||
|
||||
Pods don't talk to each other by IP — IPs are ephemeral, assigned on pod
|
||||
creation. They use **service names** resolved by DNS.
|
||||
|
||||
### CoreDNS
|
||||
|
||||
K3s runs **CoreDNS** as the cluster DNS server. A pod in the `honeydue`
|
||||
namespace resolves `redis` to the Redis Service's ClusterIP:
|
||||
|
||||
```
redis                             → 10.43.7.10 (Service ClusterIP)
redis.honeydue                    → 10.43.7.10
redis.honeydue.svc.cluster.local  → 10.43.7.10
```
|
||||
|
||||
When an app resolves `redis:6379`:
|
||||
|
||||
1. The pod's `/etc/resolv.conf` points to `10.43.0.10` (the CoreDNS
|
||||
Service).
|
||||
2. CoreDNS receives the query, checks its known Services, returns
|
||||
`10.43.7.10`.
|
||||
3. The pod sends TCP to `10.43.7.10:6379`.
|
||||
4. kube-proxy (Layer 4, below) intercepts and routes to the actual pod IP.
|
||||
|
||||
### The service CIDR
|
||||
|
||||
K3s assigns **10.43.0.0/16** as the service CIDR. ClusterIPs live here.
|
||||
Currently:
|
||||
|
||||
| Service | ClusterIP |
|---|---|
| `api.honeydue` | 10.43.167.83 |
| `admin.honeydue` | 10.43.136.168 |
| `redis.honeydue` | 10.43.7.10 |
| `kubernetes.default` | 10.43.0.1 |
| `kube-dns.kube-system` | 10.43.0.10 |
|
||||
|
||||
ClusterIPs are **stable** for the life of the Service — they don't change
|
||||
when pods come and go.
|
||||
|
||||
### kube-proxy (IPVS mode)
|
||||
|
||||
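`kube-proxy` is the dataplane component that makes Services work. In K3s it
runs embedded in the `k3s` process on every node (not as a separate
DaemonSet), watches the API server for Service and Endpoint changes, and
programs the kernel to route traffic.

K3s's kube-proxy defaults to **iptables mode**; **IPVS mode** is opt-in via
`--kube-proxy-arg=proxy-mode=ipvs`. IPVS is a Linux kernel feature for
in-kernel L4 load balancing: essentially connection-tracking NAT with
round-robin or other scheduling. The walkthrough below is written for IPVS;
in iptables mode the same destination rewrite is done by NAT rules that
pick an endpoint at random.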
|
||||
|
||||
When a pod dials `10.43.7.10:6379`:
|
||||
|
||||
1. The first packet hits the node's kernel
|
||||
2. IPVS sees the destination is a ClusterIP
|
||||
3. IPVS picks an endpoint from the Service's endpoint set (e.g.,
`10.42.0.10:6379` on hetzner1)
|
||||
4. IPVS rewrites the destination and forwards
|
||||
5. Flannel tunnels it to the destination node (if remote) or delivers
|
||||
locally (if the endpoint is on the same node)
|
||||
|
||||
This happens per-TCP-connection, not per-packet, thanks to conntrack.
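Step 3 can be sketched as a round-robin pick, assuming IPVS with its `rr` scheduler. The real selection happens in the kernel once per new connection; the function name and the first two endpoint IPs below are illustrative, not read from the cluster.

```bash
# pick_endpoint CONN_INDEX ENDPOINT...
# Round-robin: connection N goes to endpoint (N mod count).
pick_endpoint() {
  conn=$1
  shift
  n=$(( conn % $# ))
  # Drop the first n endpoints so the chosen one becomes $1.
  while [ "$n" -gt 0 ]; do
    shift
    n=$((n - 1))
  done
  echo "$1"
}

# Three hypothetical api endpoints, one per node slice.
for conn in 0 1 2 3; do
  pick_endpoint "$conn" 10.42.0.7:8000 10.42.1.9:8000 10.42.2.6:8000
done
```

Every packet of an established connection follows the choice made for its first packet, which is the conntrack behaviour described above.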
|
||||
|
||||
### Why IPVS over iptables
|
||||
|
||||
K3s's stock kube-proxy mode is iptables, which is older and slower at
scale: for every Service it adds a chain of rules, and the rule list grows
linearly with Service count. IPVS uses a hash table and scales to thousands
of Services without performance degradation. At our scale either works;
IPVS is the better choice once the Service count grows.
|
||||
|
||||
### Headless Services
|
||||
|
||||
Not every Service needs a ClusterIP: a "headless" Service
(`clusterIP: None`) has none. Our setup doesn't currently use them, but
it's worth knowing the distinction: headless Services return all endpoint
IPs directly via DNS, with no kube-proxy involvement. Useful for
StatefulSets where clients need to talk to a specific replica.
|
||||
|
||||
## Layer 5 — Ingress (Traefik)
|
||||
|
||||
External traffic arrives on the node's public :80 or :443. Traefik
|
||||
handles the first mile of routing. See [Chapter 6](./06-traefik-ingress.md)
|
||||
for Traefik-specific details; this section just shows how it fits in the
|
||||
networking stack.
|
||||
|
||||
Traefik runs as a **DaemonSet** with `hostNetwork: true`. That means:
|
||||
- One Traefik pod per node
|
||||
- Each pod is in the **host's network namespace**, not a pod netns
|
||||
- Each pod can bind directly to `0.0.0.0:80` and `0.0.0.0:443` on the node
|
||||
|
||||
When Cloudflare sends a request to `178.104.247.152:80`:
|
||||
|
||||
1. Packet arrives at hetzner1's NIC
|
||||
2. UFW accepts (80/tcp is open from anywhere)
|
||||
3. Linux kernel delivers the packet to the socket listening on `0.0.0.0:80`
|
||||
4. Traefik (running in host namespace) accepts the connection
|
||||
5. Traefik reads the `Host:` header
|
||||
6. Traefik matches an Ingress rule (api.myhoneydue.com → api Service)
|
||||
7. Traefik dials `10.43.167.83:8000` (Service ClusterIP)
|
||||
8. Kube-proxy IPVS rewrites to a live api pod endpoint
|
||||
9. Flannel VXLAN tunnels if the endpoint is on a remote node
|
||||
10. The api pod receives the request, processes, responds
|
||||
11. Response flows back the reverse path
|
||||
|
||||
Full trace in the [end-to-end section](#end-to-end-request-trace) below.
|
||||
|
||||
## IPs we care about
|
||||
|
||||
| What | CIDR / IP | Used for |
|---|---|---|
| Pod CIDR | 10.42.0.0/16 | All pod IPs cluster-wide |
| Service CIDR | 10.43.0.0/16 | All ClusterIPs |
| Flannel VXLAN | UDP 8472 | Pod-to-pod traffic (inter-node) |
| CoreDNS Service | 10.43.0.10:53 | Cluster DNS |
| Kubernetes Service | 10.43.0.1:443 | Internal kube-apiserver |
| Node IPs | See README | External + flannel source/dst |
| Traefik | host network | Listens on node's :80, :443 |
|
||||
|
||||
## End-to-end request trace
|
||||
|
||||
A user in Texas hits `https://api.myhoneydue.com/api/tasks/`. Here's every
|
||||
hop:
|
||||
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant U as User (Austin, TX)
    participant CF as Cloudflare edge (DFW POP)
    participant H as hetzner2 (picked by CF)<br/>178.105.32.198
    participant TR as Traefik pod<br/>(hostNetwork)
    participant API as api pod on hetzner3<br/>10.42.2.6:8000
    participant DB as Neon Postgres<br/>(AWS us-east-1)

    U->>CF: HTTPS :443 GET /api/tasks/
    Note over CF: TLS handshake terminates here
    CF->>H: HTTP :80 (with original Host header)
    H->>TR: Accepted by kernel, delivered to Traefik
    Note over TR: Matches Ingress rule<br/>host: api.myhoneydue.com
    TR->>TR: Resolve api.honeydue → 10.43.167.83
    TR->>H: dial 10.43.167.83:8000
    H->>H: kube-proxy IPVS rewrites<br/>dst → 10.42.2.6:8000
    H->>API: Flannel VXLAN encapsulate<br/>UDP 8472 → hetzner3
    Note over API: Pod receives packet
    API->>DB: SELECT … FROM tasks WHERE user_id = …<br/>TLS :5432
    DB-->>API: Result rows
    API-->>TR: HTTP 200 JSON
    TR-->>CF: HTTP 200
    CF-->>U: HTTPS 200
```
|
||||
|
||||
### Timing budget for a cache-miss read
|
||||
|
||||
| Hop | Typical latency |
|---|---|
| User → CF edge (DFW) | 5–15 ms |
| CF edge → hetzner2 (origin HTTP :80) | 90–120 ms (cross-Atlantic) |
| UFW + kernel accept | <1 ms |
| Traefik accept + route | 1–2 ms |
| kube-proxy + Flannel (same node) | <1 ms |
| kube-proxy + Flannel (remote node, VXLAN) | 1–3 ms |
| Go API request handling | 1–5 ms |
| Neon Postgres query (TLS + SQL) | 20–60 ms (AWS us-east-1) |
| Return path (reverse) | similar |
|
||||
|
||||
**Total typical**: ~200–300 ms for a user in North America, dominated by
|
||||
the cross-Atlantic CF→origin hop. Cached responses at Cloudflare skip the
|
||||
origin hop entirely.
|
||||
|
||||
## Inter-node routing concretely
|
||||
|
||||
Here's what `ip route` shows on hetzner2 (not run live, reconstructed from
|
||||
typical k3s+flannel+vxlan setup):
|
||||
|
||||
```
default via 172.31.1.1 dev eth0                      # Hetzner gateway
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink      # to hetzner1 pods (via VXLAN iface)
10.42.1.0/24 dev cni0                                # local pods on hetzner2
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink      # to hetzner3 pods (via VXLAN iface)
# No route for 10.43.0.0/16: kube-proxy rewrites ClusterIPs to pod IPs
# before the routing decision, so Service traffic follows the pod routes.
```
|
||||
|
||||
The `flannel.1` interface is the VXLAN tunnel endpoint. Traffic written
|
||||
to it gets encapsulated in UDP 8472 and sent to the peer node's public IP.
|
||||
|
||||
Flannel learns about peer nodes via the Kubernetes API (it watches Node
|
||||
resources). When hetzner3 joins, Flannel on hetzner1 and hetzner2 both
|
||||
learn its public IP and pod CIDR, update their routes and ARP tables,
|
||||
and traffic just works.
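The routing decision on hetzner2 can be sketched as a prefix match over its pod routes. This mirrors the reconstructed table above rather than querying the kernel, and the function name `egress_iface` is made up; on a real node, `ip route get <ip>` gives the authoritative answer.

```bash
# Which interface does hetzner2 use for a given destination IP?
egress_iface() {
  case "$1" in
    10.42.1.*)            echo cni0 ;;       # local pods: bridge, no tunnel
    10.42.0.*|10.42.2.*)  echo flannel.1 ;;  # remote pods: VXLAN tunnel
    *)                    echo eth0 ;;       # everything else: default route
  esac
}

egress_iface 10.42.1.6     # pod on this node
egress_iface 10.42.2.10    # pod on hetzner3
egress_iface 1.1.1.1       # internet
```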
|
||||
|
||||
## Network performance
|
||||
|
||||
### Within a node (pod to pod, same host)
|
||||
|
||||
Packets go through `cni0` bridge, never leave the node. Sub-millisecond
|
||||
latency, bounded by kernel + veth performance. Easily >10 Gbps.
|
||||
|
||||
### Between nodes (pod to pod, different host)
|
||||
|
||||
Packets go through Flannel VXLAN. Added overhead: encap/decap in the
|
||||
kernel (~5–10 μs), plus the actual network hop between hetzner nodes
|
||||
(~0.5 ms within the same Hetzner datacenter). Throughput is bounded by
|
||||
Hetzner's NIC (≈1 Gbps sustained per node).
|
||||
|
||||
In practice this is fine for everything we do. The slowest link in our
|
||||
application is Neon (AWS us-east-1), which is ~100 ms round-trip.
|
||||
|
||||
## DNS resolution path
|
||||
|
||||
A pod resolves `redis`:
|
||||
|
||||
1. App does `getaddrinfo("redis")`.
|
||||
2. The resolver (libc, or Go's built-in resolver) reads `/etc/resolv.conf`
   and finds nameserver `10.43.0.10` plus the cluster search domains.
3. It sends a UDP query to `10.43.0.10:53`.
|
||||
4. Destination is CoreDNS Service ClusterIP.
|
||||
5. kube-proxy IPVS load-balances across CoreDNS pods (there's usually 1).
|
||||
6. The packet arrives at the CoreDNS pod.
|
||||
7. CoreDNS checks its Kubernetes plugin cache for `redis.<ns>.svc.cluster.local`.
|
||||
8. Returns `10.43.7.10` (redis Service ClusterIP) with a low TTL.
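How the short name `redis` becomes `redis.<ns>.svc.cluster.local` in step 7 can be sketched as search-list expansion. This assumes the standard Kubernetes pod `resolv.conf` (search list of `<ns>.svc.cluster.local svc.cluster.local cluster.local` with `options ndots:5`); the function name `candidates` is invented for the example.

```bash
# Print the names a resolver tries, in order, for a short NAME looked up
# from a pod in namespace NS. With ndots:5, a name with fewer than five
# dots goes through the search list before being tried as-is.
candidates() {
  name=$1
  ns=$2
  for domain in "$ns.svc.cluster.local" svc.cluster.local cluster.local; do
    echo "$name.$domain"
  done
  echo "$name"
}

candidates redis honeydue
```

The first candidate already matches a Service, so CoreDNS answers on the first query. Names for external hosts walk the whole list first, which is why each such lookup costs several extra queries.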
|
||||
|
||||
CoreDNS is stateless — if it restarts, pods re-query on their next lookup.
|
||||
|
||||
**DNS caching in pods**: The Go API uses `net.Resolver` which does not
|
||||
cache by default. Each new connection triggers a fresh DNS lookup. This
|
||||
is correct behavior for Kubernetes (where Service IPs are stable but
|
||||
Endpoints change), but it means a CoreDNS outage breaks new connections
|
||||
immediately.
|
||||
|
||||
Next.js (admin) also uses Node's default resolver, similar behavior.
|
||||
|
||||
## What breaks if X fails
|
||||
|
||||
| Failure | Symptom |
|---|---|
| Flannel daemon on one node crashes | Pods on that node can't reach other nodes' pods; kube-proxy Services sometimes work (kernel conntrack) |
| CoreDNS pod crashes (only 1) | New connection DNS lookups fail; existing connections continue |
| kube-proxy on one node crashes | Service rules already in that node's kernel keep working but go stale, so Endpoint changes aren't picked up; direct pod IPs still work |
| UFW misconfigured (port 8472 UDP blocked) | Pods on that node can't reach remote pods over overlay |
| Node's NIC fails | Node unreachable; Raft loses it; its pods get rescheduled elsewhere |
| Hetzner datacenter outage | Entire cluster offline |
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
# See all IPs in the cluster
kubectl get pods -A -o wide # pod IPs + nodes
kubectl get svc -A # Service ClusterIPs

# Test pod-to-pod DNS from inside a pod
kubectl exec -n honeydue deploy/api -- nslookup redis
kubectl exec -n honeydue deploy/api -- getent hosts redis

# Test pod-to-pod TCP connectivity
kubectl exec -n honeydue deploy/api -- nc -zv redis 6379
kubectl exec -n honeydue deploy/api -- wget -q -O- http://admin:3000/

# See the node's iptables/IPVS rules (run on a node)
ssh deploy@hetzner1 "sudo ipvsadm -Ln"
ssh deploy@hetzner1 "sudo iptables -L -n -t nat | head -50"

# See the cluster's flannel state
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{" "}{.spec.podCIDR}{"\n"}{end}'
```
|
||||
|
||||
## References
|
||||
|
||||
- [Kubernetes networking concepts][k8s-net]
|
||||
- [Flannel VXLAN backend][flannel-vxlan]
|
||||
- [CoreDNS k8s plugin][coredns-k8s]
|
||||
- [IPVS mode for kube-proxy][ipvs]
|
||||
- [VXLAN RFC 7348][vxlan-rfc]
|
||||
|
||||
[k8s-net]: https://kubernetes.io/docs/concepts/services-networking/
|
||||
[flannel-vxlan]: https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#vxlan
|
||||
[coredns-k8s]: https://coredns.io/plugins/kubernetes/
|
||||
[ipvs]: https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/
|
||||
[vxlan-rfc]: https://datatracker.ietf.org/doc/html/rfc7348
|
||||
@@ -0,0 +1,357 @@
|
||||
# 04 — Firewall
|
||||
|
||||
## Summary
|
||||
|
||||
Every node runs UFW (Uncomplicated Firewall, a frontend for iptables) with
|
||||
a default-deny-incoming policy. Specific ports are allowed from specific
|
||||
sources only. This chapter lists every rule on every node, why each rule
|
||||
exists, and what breaks without it. It also traces what happens to an
|
||||
inbound packet as it goes through iptables, UFW, and the kernel.
|
||||
|
||||
## Policy
|
||||
|
||||
All three nodes have the same UFW config. The policy:
|
||||
|
||||
| Direction | Default |
|---|---|
| **Incoming** | **deny** |
| Outgoing | allow |
| Routed | disabled (we don't NAT) |
|
||||
|
||||
Default deny is a white-list model: unless a rule explicitly allows a
|
||||
packet, it's dropped. This is more secure than default-allow but requires
|
||||
that every legitimate port be enumerated in a rule.
|
||||
|
||||
## Current ruleset per node
|
||||
|
||||
Run `sudo ufw status verbose` on any node to see the live ruleset. The
canonical ruleset is listed below, grouped by purpose.
|
||||
|
||||
### Public-facing (anywhere)
|
||||
|
||||
| Port | Protocol | From | Purpose | Comment |
|---|---|---|---|---|
| 22 | TCP | Anywhere | SSH | |
| 80 | TCP | Anywhere | HTTP (Cloudflare → Traefik) | |
| 443 | TCP | Anywhere | HTTPS (future, currently unused at origin) | |
|
||||
|
||||
**Why 443 is open but unused**: We're on Cloudflare SSL=Flexible, so
|
||||
Cloudflare talks to origin over plain HTTP:80. Port 443 on origin is
|
||||
only hit by misconfigured clients (who bypass CF DNS and hit node IPs
|
||||
directly). Traefik's config accepts it but we don't require it. Keeping
|
||||
it open smooths a future switch to Full (strict) SSL mode.
|
||||
|
||||
**Future hardening**: Restrict 80 and 443 to Cloudflare's published IP
|
||||
ranges (15 IPv4 CIDRs, 7 IPv6 CIDRs). See [Chapter 13](./13-cloudflare.md)
|
||||
for the ranges and the UFW rule format. Today they're open to anyone.
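A sketch of how that hardening could be generated, under stated assumptions: the helper name `cf_rules` and the `cloudflare` comment text are made up, the two CIDRs are only a sample of Cloudflare's list, and the function prints commands for review rather than running them.

```bash
# Read CIDRs (one per line) on stdin; print one ufw command per CIDR.
# Dry-run by design: nothing is applied to the firewall.
cf_rules() {
  while read -r cidr; do
    [ -n "$cidr" ] || continue   # skip blank lines
    echo "sudo ufw allow from $cidr to any port 80,443 proto tcp comment 'cloudflare'"
  done
}

# Two sample ranges; the full list lives at https://www.cloudflare.com/ips/
printf '%s\n' 173.245.48.0/20 103.21.244.0/22 | cf_rules
```

The existing open-to-anyone rules for 80 and 443 would have to be deleted afterwards, otherwise the narrower rules change nothing.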
|
||||
|
||||
### SSH (operator access)
|
||||
|
||||
| Port | Protocol | From | Purpose |
|---|---|---|---|
| 22 | TCP | Anywhere | SSH login (key-only) |
|
||||
|
||||
SSH is open to the internet but hardened: key-only auth, no root login,
|
||||
`AllowUsers deploy` configured (the stock distribution still allows root;
|
||||
we hardened in bootstrap). See [Chapter 5](./05-security.md) for the full
|
||||
SSH config.
|
||||
|
||||
**TODO** (Chapter 20): Move SSH off :22 to :2222 or similar, tighten to
|
||||
the operator's current IP. Current state is acceptable given key-only +
|
||||
fail2ban defaults.
|
||||
|
||||
### Kubernetes API (kubectl from operator)
|
||||
|
||||
| Port | Protocol | From | Purpose |
|---|---|---|---|
| 6443 | TCP | 47.185.183.191 (operator IP) | kubectl to kube-apiserver |
|
||||
|
||||
When the operator's public IP changes (moves, new ISP), this rule needs
|
||||
updating on all 3 nodes. Ugly but necessary. A better long-term fix is
|
||||
**Cloudflare Access** or **Tailscale** to avoid pinning operator IPs.
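Until then, the per-node update can be scripted. This is a sketch, not a script from the repo: the function name `rotate_operator_ip` is hypothetical, `203.0.113.7` is a documentation-range placeholder for the new IP, and the function only prints the commands so they can be reviewed before running.

```bash
# Print the commands that swap the operator IP on all three nodes.
# The new rule is added before the old one is deleted, so kubectl access
# is never left without a rule.
rotate_operator_ip() {
  old=$1
  new=$2
  for node in hetzner1 hetzner2 hetzner3; do
    echo "ssh deploy@$node \"sudo ufw allow from $new to any port 6443 proto tcp comment 'kubectl from dev'\""
    echo "ssh deploy@$node \"sudo ufw delete allow from $old to any port 6443 proto tcp\""
  done
}

rotate_operator_ip 47.185.183.191 203.0.113.7
```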
|
||||
|
||||
### Inter-node cluster traffic
|
||||
|
||||
These rules allow the three nodes to talk to each other for cluster state.
|
||||
Each node has an allow rule for each of the **three node IPs** (including
|
||||
its own — the "allow from self" rule exists so local flows are explicit).
|
||||
|
||||
| Port | Protocol | From | Purpose |
|---|---|---|---|
| 6443 | TCP | other nodes | kube-apiserver (server nodes talk to each other) |
| 2379 | TCP | other nodes | etcd client API (reads and writes from the K3s servers) |
| 2380 | TCP | other nodes | etcd peer traffic (Raft replication between server nodes) |
| 10250 | TCP | other nodes | kubelet (metrics, exec, logs from API server) |
| 8472 | UDP | other nodes | Flannel VXLAN overlay |
|
||||
|
||||
### Application-specific (legacy, mostly superfluous on k3s)
|
||||
|
||||
These rules were added during the Swarm era and still exist on the nodes.
|
||||
None of them hurt anything; most are unused on k3s.
|
||||
|
||||
| Port | Protocol | From | Purpose (original) | Status on k3s |
|---|---|---|---|---|
| 2377 | TCP | node IPs | Swarm cluster management | unused (Swarm gone) |
| 7946 | TCP + UDP | node IPs | Swarm gossip | unused |
| 4789 | UDP | node IPs | Swarm VXLAN | unused (k3s uses 8472) |
| (ESP, proto 50) | — | node IPs | IPSec encrypted overlay | unused |
| 500 | UDP | node IPs | IKE key exchange | unused |
| 3000 | TCP | node IPs | admin Next.js, when we tried node-IP hardcoding | unused |
|
||||
|
||||
These can be removed in a cleanup pass. They don't affect security because
|
||||
no process listens on those ports anymore.
|
||||
|
||||
## Why each required rule exists
|
||||
|
||||
### Port 22 — SSH (public)
|
||||
|
||||
Obviously needed for operator access. Without it we'd have no way to
|
||||
reach the nodes. Hetzner console's "rescue" mode is an emergency fallback.
|
||||
|
||||
### Port 80 — HTTP (public)
|
||||
|
||||
Cloudflare talks HTTP to origin on port 80 (SSL=Flexible mode). Without
this rule, UFW drops Cloudflare's connection attempts, they time out, and
Cloudflare returns 522 to users.
|
||||
|
||||
### Port 443 — HTTPS (public)
|
||||
|
||||
Not used by Cloudflare in SSL=Flexible mode. Open to smooth the future
Full (strict) migration. Traefik already binds :443 in the host namespace
(Chapter 3), so the only traffic it sees there today is clients that hit
node IPs directly. Rule is harmless.
|
||||
|
||||
### Port 6443 — kube-apiserver (operator + inter-node)
|
||||
|
||||
**From operator IP**: so `kubectl` works. Without this, `kubectl get pods`
|
||||
times out.
|
||||
|
||||
**From other nodes**: new servers join the cluster through an existing
server's supervisor on 6443, and node components on one server can reach
the apiserver on another. (Raft elections themselves run over 2380.)
Without this, existing pods keep running but joins and cross-node API
calls fail.
|
||||
|
||||
### Ports 2379/2380 — embedded etcd (inter-node)
|
||||
|
||||
K3s runs etcd as an embedded library inside the server binary. The peer
port (2380) carries Raft protocol messages between the three servers; the
client port (2379) serves the etcd client API that the servers read and
write through. **Without these rules, Raft cannot replicate state and the
cluster loses quorum.**
|
||||
|
||||
This bit us during the k3s install — initially the joins failed because
|
||||
2379/2380 were blocked.
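The quorum arithmetic behind that warning, as a small sketch. The function names are invented for the example; the formula is the usual Raft majority rule.

```bash
# Raft needs a strict majority of members to commit writes.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many members can be lost before the cluster stops accepting writes.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

quorum 3              # prints 2
tolerated_failures 3  # prints 1
```

With three servers, one unreachable peer is survivable. Blocking 2380 on a single node isolates it from both peers at once, and blocking it everywhere leaves every node a minority of one.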
|
||||
|
||||
### Port 10250 — kubelet (inter-node)
|
||||
|
||||
The kubelet on each node exposes an authenticated HTTPS API on 10250 for
the kube-apiserver to call: `kubectl logs`, `kubectl exec`, kubelet metrics
scraping. (The unauthenticated read-only port is 10255, which K3s
disables.) Without this rule, operator commands like `kubectl logs -n
honeydue deploy/api` fail with "Error from server: unable to upgrade
connection".
|
||||
|
||||
### Port 8472 UDP — Flannel VXLAN (inter-node)
|
||||
|
||||
Pod-to-pod traffic between nodes flows through VXLAN tunnels on UDP 8472.
|
||||
**Without this rule, cross-node pod communication silently fails** — which
|
||||
looks like "admin can't reach api" or "worker can't reach Redis" depending
|
||||
on where pods land.
|
||||
|
||||
This rule is load-bearing. It is the single most important inter-node
|
||||
rule.
|
||||
|
||||
## Inbound packet's journey through UFW/iptables
|
||||
|
||||
When a packet arrives at hetzner1's network interface on port 80:
|
||||
|
||||
```mermaid
sequenceDiagram
    participant NIC as hetzner1 NIC
    participant PRE as iptables<br/>raw + mangle + nat PREROUTING
    participant FIL as iptables filter INPUT<br/>(UFW lives here)
    participant SOCK as Traefik pod socket<br/>(host network)

    NIC->>PRE: Packet: SYN :80 from CF
    PRE->>PRE: conntrack state: NEW
    PRE->>FIL: handoff to INPUT chain
    FIL->>FIL: UFW rules evaluated
    Note over FIL: Rule: allow 80/tcp from anywhere<br/>→ ACCEPT
    FIL->>SOCK: delivered to listening socket
    SOCK->>SOCK: Traefik accepts connection
```
|
||||
|
||||
UFW is really a set of wrapper chains on top of iptables. `sudo iptables
|
||||
-L INPUT -n --line-numbers` on any node shows the actual rules; UFW just
|
||||
makes editing them easier.
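The required rules from this chapter can be modelled as a small decision function, which is handy for reasoning about what a given packet should do. It is a simplified sketch: it ignores conntrack state and the legacy Swarm-era rules, and the function names are made up. The IPs are the node and operator addresses used elsewhere in this chapter.

```bash
NODES="178.104.247.152 178.105.32.198 178.104.249.189"
OPERATOR="47.185.183.191"

# Succeeds if the given IP is one of the three cluster nodes.
is_node() {
  for n in $NODES; do
    [ "$1" = "$n" ] && return 0
  done
  return 1
}

# verdict SRC_IP PORT PROTO  -> prints ACCEPT or DROP
verdict() {
  src=$1
  port=$2
  proto=$3
  case "$proto/$port" in
    tcp/22|tcp/80|tcp/443)
      echo ACCEPT; return ;;
    tcp/6443)
      if [ "$src" = "$OPERATOR" ] || is_node "$src"; then
        echo ACCEPT; return
      fi ;;
    tcp/2379|tcp/2380|tcp/10250|udp/8472)
      if is_node "$src"; then
        echo ACCEPT; return
      fi ;;
  esac
  echo DROP   # default deny incoming
}

verdict 47.185.183.191 6443 tcp   # operator kubectl
verdict 178.105.32.198 8472 udp   # Flannel VXLAN from hetzner2
verdict 47.185.183.191 6379 tcp   # Redis is not exposed
```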
|
||||
|
||||
## Rule syntax we used
|
||||
|
||||
UFW commands we ran during setup (for reference):
|
||||
|
||||
```bash
# Reset to default
sudo ufw --force reset

# Default deny incoming
sudo ufw default deny incoming
sudo ufw default allow outgoing

# SSH + web (public)
sudo ufw allow 22/tcp comment 'SSH'
sudo ufw allow 80/tcp comment 'HTTP'
sudo ufw allow 443/tcp comment 'HTTPS'

# Kubernetes inter-node (repeat for each peer IP)
for ip in 178.104.247.152 178.105.32.198 178.104.249.189; do
  sudo ufw allow from "$ip" to any port 6443 proto tcp comment "k3s-api $ip"
  sudo ufw allow from "$ip" to any port 2379 proto tcp comment "k3s-etcd-client $ip"
  sudo ufw allow from "$ip" to any port 2380 proto tcp comment "k3s-etcd-peer $ip"
  sudo ufw allow from "$ip" to any port 10250 proto tcp comment "k3s-kubelet $ip"
  sudo ufw allow from "$ip" to any port 8472 proto udp comment "k3s-flannel-vxlan $ip"
done

# Kubectl from operator
sudo ufw allow from 47.185.183.191 to any port 6443 proto tcp comment 'kubectl from dev'

# Enable
sudo ufw --force enable
```
|
||||
|
||||
Rules persist across reboots via `/etc/ufw/user.rules`.
|
||||
|
||||
## What if we used Hetzner Cloud Firewall instead?
|
||||
|
||||
Hetzner Cloud has a provider-level firewall feature — rule-for-rule
|
||||
equivalent but configured in the Hetzner console (or via API), not on the
|
||||
nodes. Tradeoffs:
|
||||
|
||||
| | Hetzner Cloud Firewall | UFW (current) |
|---|---|---|
| Cost | Free | Free |
| Config location | Hetzner console / API | Per-node `/etc/ufw/` |
| Applies to | All traffic to NIC | All traffic to kernel |
| Failure mode | Provider-side issue = rules gone | Node-side issue = rules gone |
| Inter-node traffic | Same rules for all nodes | Same rules on each node |
| Visible to attacker | Yes (provider fingerprints) | Yes (iptables probe) |
| Rule ordering | UI-based | `iptables -L` |
|
||||
|
||||
Either works. A future improvement: move the stable rules to Hetzner
|
||||
Cloud Firewall (one source of truth) and leave only the dynamic rules
|
||||
(operator IP, ad-hoc debug) on the nodes.
|
||||
|
||||
## Why we don't use iptables directly
|
||||
|
||||
UFW is a frontend. `iptables` works, but the rules are harder to read and
|
||||
edit. `sudo ufw allow from X to any port Y proto Z comment 'Z-rule'` is
|
||||
clearer than writing the equivalent `-A INPUT ...` rule directly.
|
||||
|
||||
Also, UFW's `comment` field lets us explain each rule, which becomes
|
||||
critical when the ruleset grows past ~10 rules.
|
||||
|
||||
## Testing the firewall
|
||||
|
||||
From the operator workstation (47.185.183.191):
|
||||
|
||||
```bash
# Should work (22/tcp open)
ssh deploy@hetzner1 exit

# Should work (80/tcp open)
curl -I -H "Host: api.myhoneydue.com" http://hetzner1/api/health/

# Should connect (443/tcp open; Cloudflare never uses this port in Flexible mode)
curl -kI https://178.104.247.152/

# Should work (6443 allowed from operator IP)
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl get nodes

# Should time out (default-deny from arbitrary ports); cap the wait at 5 s
curl --max-time 5 http://178.104.247.152:3000/ # not open to operator
curl --max-time 5 http://178.104.247.152:6379/ # Redis not exposed publicly
```
|
||||
|
||||
From another peer node (hetzner2 trying to reach hetzner1):
|
||||
|
||||
```bash
# Should work (k3s API allowed from peer node IPs)
curl -k https://178.104.247.152:6443/healthz

# Should work (etcd client from peer)
nc -zv 178.104.247.152 2379
```
|
||||
|
||||
## The hidden dependency: kubelet/containerd also need ports
|
||||
|
||||
Beyond the ports covered by UFW rules, several node components also listen on:
|
||||
- **10255/tcp** — kubelet read-only port (no auth, deprecated; disabled by default in k3s)
|
||||
- **10256/tcp** — kube-proxy health
|
||||
- **10257/tcp** — kube-controller-manager health
|
||||
- **10259/tcp** — kube-scheduler health
|
||||
|
||||
These are bound to `localhost` only, so they don't need UFW rules. But
|
||||
they're important to know about when debugging — if one of these health
|
||||
endpoints isn't responding, the relevant component is broken.
|
||||
|
||||
## Legacy rules to clean up
|
||||
|
||||
The following rules are on the nodes from the Swarm era and can be
|
||||
removed in a future cleanup pass:
|
||||
|
||||
```bash
# On each node, list Swarm-era rules
sudo ufw status numbered | grep -E "2377|7946|4789|500|3000|esp"

# Remove by number (highest-to-lowest to avoid renumbering)
# Example:
sudo ufw --force delete 15
sudo ufw --force delete 14
# ... etc.
```
|
||||
|
||||
We left them in because they don't affect security (no process listens on
|
||||
those ports), and removing them requires careful testing that nothing in
|
||||
k3s secretly relies on 4789/udp or similar.
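The highest-to-lowest ordering can be generated instead of typed by hand. A minimal sketch, assuming the usual `[ N] ...` line format of `ufw status numbered`; the function name `legacy_delete_cmds` is hypothetical, and it only prints the delete commands so they can be reviewed first.

```bash
# Read `ufw status numbered` output on stdin; print delete commands for the
# Swarm-era rules, highest rule number first so earlier deletes don't
# renumber the rules still to be removed.
legacy_delete_cmds() {
  grep -E "2377|7946|4789|500|3000|esp" |
    sed -n 's/^\[ *\([0-9][0-9]*\)\].*/\1/p' |
    sort -rn |
    while read -r n; do
      echo "sudo ufw --force delete $n"
    done
}
```

Usage would look like `ssh deploy@hetzner1 "sudo ufw status numbered" | legacy_delete_cmds`. The grep pattern is the same loose one used above, so check the printed list against the legacy table before running anything.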
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
# Show the ruleset, with comments, numbered
# (ufw accepts `numbered` or `verbose` after `status`, not both)
sudo ufw status numbered

# Add a new rule
sudo ufw allow from <ip> to any port <port> proto <tcp|udp> comment '<desc>'

# Remove a rule by number
sudo ufw status numbered
sudo ufw --force delete <N>

# Temporarily disable all rules (emergency)
sudo ufw disable

# Re-enable
sudo ufw enable

# Reload after editing /etc/ufw/ files directly
sudo ufw reload
```
|
||||
|
||||
## What to do if the firewall locks you out
|
||||
|
||||
Worst case: you apply a rule that blocks your own SSH, UFW enables it
|
||||
immediately, and you can't log back in. Recovery:
|
||||
|
||||
1. Hetzner Cloud Console → Server → Rescue mode
|
||||
2. Boot into rescue, mount the disk
|
||||
3. Edit `/etc/ufw/user.rules` to remove the bad rule
|
||||
4. Reboot back into normal mode
|
||||
|
||||
This has never happened to us but it's the escape hatch. The Console is
|
||||
always a TLS login away.
|
||||
|
||||
## References
|
||||
|
||||
- [UFW man page][ufw-man]
|
||||
- [K3s networking requirements][k3s-reqs]
|
||||
- [Kubernetes ports and protocols][k8s-ports]
|
||||
- [Cloudflare IP ranges][cf-ips]
|
||||
|
||||
[ufw-man]: https://manpages.ubuntu.com/manpages/noble/en/man8/ufw.8.html
|
||||
[k3s-reqs]: https://docs.k3s.io/installation/requirements#networking
|
||||
[k8s-ports]: https://kubernetes.io/docs/reference/networking/ports-and-protocols/
|
||||
[cf-ips]: https://www.cloudflare.com/ips/
|
||||
@@ -0,0 +1,526 @@
|
||||
# 05 — Security
|
||||
|
||||
## Summary
|
||||
|
||||
Security on this deployment is layered: Cloudflare at the edge, UFW at
|
||||
the node, k3s RBAC + Pod Security at the orchestrator, TLS between
|
||||
long-haul components, and dedicated service accounts with dropped
|
||||
capabilities inside containers. This chapter documents each layer, the
|
||||
rationale, and what's currently missing (and why).
|
||||
|
||||
## Threat model
|
||||
|
||||
Who we're defending against, in rough order of likelihood:
|
||||
|
||||
1. **Opportunistic scanners** — bots scanning random IPv4 ranges for
|
||||
known vulnerabilities. Mitigated by the firewall.
|
||||
2. **Credential stuffing / brute-force** — especially against SSH and
|
||||
admin login. Mitigated by key-only SSH, strong passwords, rate limits.
|
||||
3. **Compromised external service** — if Neon, Backblaze, or Cloudflare
|
||||
were breached, attacker would have access to whatever we store there.
|
||||
Mitigated by scoped credentials, least-privilege API keys.
|
||||
4. **Compromised container image** — if Gitea or our build pipeline
|
||||
were compromised, malicious code could reach prod. Mitigated by
|
||||
(a) Gitea is behind authentication, (b) image pull secrets scoped,
|
||||
(c) containers run non-root with minimal capabilities.
|
||||
5. **Insider threat** — not really a threat for a solo operator.
|
||||
6. **State actor** — not in threat model. At our scale this is
|
||||
effectively unaddressable without becoming a security company.
|
||||
|
||||
Explicitly **not** in threat model:
|
||||
- DDoS at a scale that saturates Cloudflare. We pay $0 for CF; their
|
||||
DDoS mitigation is included but not unlimited. If we got hit with a
|
||||
large attack, we'd move to a paid plan.
|
||||
- Physical access to Hetzner datacenters. That's their problem.
|
||||
|
||||
## Layer 1 — Cloudflare edge
|
||||
|
||||
Cloudflare sits in front of every public request.
|
||||
|
||||
### What Cloudflare does for us
|
||||
|
||||
| Protection | How it works |
|
||||
|---|---|
|
||||
| TLS termination | CF presents a cert for `*.myhoneydue.com`; clients encrypt to CF |
|
||||
| DDoS mitigation | Automatic on all plans including Free |
|
||||
| Bot filtering | "Under Attack" mode + bot score based blocking |
|
||||
| IP concealment | Origin IPs not in DNS; attackers can't directly scan |
|
||||
| WAF rules | CF Free includes managed ruleset for common exploits |
|
||||
| Rate limiting | Free tier: 10k requests/10min; more on paid plans |
|
||||
|
||||
### What Cloudflare does **not** do
|
||||
|
||||
- **Authenticate users** — that's the app's job
|
||||
- **Authorize requests** — that's the app's job
|
||||
- **Protect origin if origin IP leaks** — once someone knows a node IP
|
||||
they can bypass CF. Mitigation: keep origin firewall strict (Chapter 4).
|
||||
- **Encrypt between CF and origin** — we're on SSL=Flexible, so CF↔origin
|
||||
is HTTP. This is in our TODO (Chapter 20, upgrade to Full-strict).
|
||||
|
||||
### The proxy-IP problem
|
||||
|
||||
Cloudflare publishes its IP ranges
([cloudflare.com/ips](https://www.cloudflare.com/ips/)), so an origin can
check whether a request arrived from a CF IP by inspecting the remote
address. Our Traefik is configured to trust `X-Forwarded-Proto` (so the Go
API sees `https` even though the origin received HTTP) only from CF IP ranges:
|
||||
|
||||
```yaml
|
||||
# deploy-k3s/manifests/traefik-helmchartconfig.yaml
|
||||
additionalArguments:
|
||||
- "--entrypoints.web.forwardedHeaders.trustedIPs=173.245.48.0/20,..."
|
||||
```
|
||||
|
||||
This means a malicious request that bypasses CF (by hitting the node IP
|
||||
directly) can't spoof headers — Traefik ignores `X-Forwarded-*` unless
|
||||
the source IP is in CF's ranges.
|
||||
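The trust decision described above comes down to a CIDR membership test on the connection's source address. The sketch below illustrates that test in plain shell; it is not Traefik's implementation, the function names are hypothetical, and it handles IPv4 only.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 when IPv4 address $1 lies inside CIDR block $2.
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}

# Return 0 when $1 is inside any of the remaining CIDR arguments.
is_trusted_source() {
  addr=$1; shift
  for cidr in "$@"; do
    in_cidr "$addr" "$cidr" && return 0
  done
  return 1
}
```

For example, `is_trusted_source 173.245.48.5 173.245.48.0/20` succeeds, while an address outside every listed range fails and its `X-Forwarded-*` headers would be ignored.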
|
||||
**TODO** (Chapter 20): Enforce at UFW level — allow 80/tcp only from
|
||||
CF IP ranges. Today any IP can reach the origin on port 80.
|
||||
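One way to approach this TODO is to generate the UFW rules from the CIDR list and review them before running anything. The helper below is a hypothetical sketch that only prints commands. It assumes the `ufw allow proto tcp from <cidr> to any port <port>` rule form, and the existing allow-from-anywhere rule for port 80 would still need to be deleted separately.

```bash
# Print (do not run) one UFW allow rule per CIDR argument for TCP port $1.
ufw_allow_rules() {
  port=$1; shift
  for cidr in "$@"; do
    printf 'ufw allow proto tcp from %s to any port %s\n' "$cidr" "$port"
  done
}

# Example: review the output, then pipe it to `sudo sh` once it looks right.
# ufw_allow_rules 80 173.245.48.0/20
```

Printing instead of executing keeps a typo in the range list from locking out legitimate traffic.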
|
||||
## Layer 2 — Node (OS, SSH, firewall)
|
||||
|
||||
Each node runs Ubuntu 24.04.3 LTS. The subsections below cover its hardening and the known gaps.
|
||||
|
||||
### SSH hardening
|
||||
|
||||
`/etc/ssh/sshd_config` on each node:
|
||||
|
||||
```
|
||||
Port 22
|
||||
PermitRootLogin no
|
||||
PasswordAuthentication no
|
||||
PubkeyAuthentication yes
|
||||
AllowUsers deploy
|
||||
```
|
||||
|
||||
Result:
|
||||
- Only the `deploy` user can log in
|
||||
- Only with a public key (no password)
|
||||
- Root cannot log in remotely
|
||||
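A quick way to confirm a node still carries these directives is to grep the config file for each one. This is a minimal sketch with a hypothetical function name; it matches whole lines literally, so it would miss directives written with different spacing or set in an included drop-in file. Inspecting the effective configuration with `sshd -T` is the more thorough check.

```bash
# Return 0 when sshd_config file $1 contains every expected hardening line.
check_sshd_hardening() {
  for want in 'PermitRootLogin no' 'PasswordAuthentication no' \
              'PubkeyAuthentication yes' 'AllowUsers deploy'; do
    # -x: the whole line must match; -q: no output, exit status only.
    grep -qx "$want" "$1" || { echo "missing: $want"; return 1; }
  done
}

# Example: check_sshd_hardening /etc/ssh/sshd_config
```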
|
||||
The public key authorized for `deploy`:
|
||||
|
||||
```
|
||||
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBU9xTTBD78tYUqHijgyU9PDqtmS4NuM/6uy8XgDzva+ hetzner2@myhoneydue.com
|
||||
```
|
||||
|
||||
(Note: the comment field says "hetzner2" but it's the key for all three
|
||||
nodes — the comment is the key's identifier, not a restriction.)
|
||||
|
||||
Private key is at `~/.ssh/hetzner` on the operator workstation.
|
||||
|
||||
### Sudo
|
||||
|
||||
The `deploy` user has unrestricted sudo with no password
|
||||
(`/etc/sudoers.d/deploy`):
|
||||
|
||||
```
|
||||
deploy ALL=(ALL) NOPASSWD: ALL
|
||||
```
|
||||
|
||||
This is convenient but broad. A compromise of the `deploy` SSH key =
|
||||
root on the node. Mitigations:
|
||||
- Key is stored only on the operator workstation, not checked into git
|
||||
- Operator workstation has disk encryption (macOS FileVault)
|
||||
- The key is protected by a passphrase (cached by ssh-agent)
|
||||
|
||||
Future hardening: scope sudo to specific commands that deploy workflows
|
||||
need (e.g., `/usr/sbin/ufw`, `/usr/bin/systemctl`), but this requires
|
||||
enumerating every command we might run, which breaks ad-hoc debugging.
|
||||
|
||||
### fail2ban
|
||||
|
||||
**Not installed.** fail2ban would ban IPs that fail SSH auth repeatedly.
|
||||
Because we disable password auth entirely, the attack surface is tiny (an
|
||||
attacker with the private key wins; failed-public-key attempts are
|
||||
functionally DDoS, not credential-stuffing). Installing fail2ban is on
|
||||
the TODO list anyway because it buys us rate-limiting on SSH bot noise.
|
||||
|
||||
### unattended-upgrades
|
||||
|
||||
**Not installed.** Security patches require manual `apt upgrade`. This is
|
||||
a gap. Install and configure for security-only updates as soon as time
|
||||
permits.
|
||||
|
||||
### UFW firewall
|
||||
|
||||
See [Chapter 4](./04-firewall.md) for the complete ruleset. Summary:
|
||||
default-deny incoming, specific allows for SSH (22), HTTP (80), HTTPS
|
||||
(443), k3s API from operator IP (6443), and inter-node cluster ports.
|
||||
|
||||
## Layer 3 — Kubernetes RBAC
|
||||
|
||||
K3s inherits full Kubernetes RBAC. Every component that talks to the API
|
||||
server has a ServiceAccount with only the permissions it needs.
|
||||
|
||||
### System accounts
|
||||
|
||||
K3s creates these by default:
|
||||
- `kube-system:admin` — cluster admin, used by `kubectl`
|
||||
- `kube-system:coredns` — for CoreDNS
|
||||
- `kube-system:traefik` — for Traefik ingress controller
|
||||
- `kube-system:helm-install-traefik` — for the Helm chart installer
|
||||
|
||||
We don't touch these.
|
||||
|
||||
### Application service accounts
|
||||
|
||||
Our `rbac.yaml` creates four ServiceAccounts in the `honeydue` namespace:
|
||||
|
||||
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api
  namespace: honeydue
automountServiceAccountToken: false # ← important
```
|
||||
|
||||
Same for `admin`, `worker`, `redis`.
|
||||
|
||||
**`automountServiceAccountToken: false`** means pods don't get a k8s
API token mounted in `/var/run/secrets/kubernetes.io/serviceaccount/`.
Without that token, a compromised pod cannot authenticate to the
Kubernetes API, even if the default service account has broad permissions.
|
||||
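Whether the token is really absent can be checked from inside a container by testing for the file. The helper below is a hypothetical sketch; the optional root argument exists only so it can be exercised outside a pod, and running it in our images assumes they ship a POSIX shell.

```bash
# Return 0 when a ServiceAccount token file exists under root $1 (default: /).
has_sa_token() {
  [ -f "${1:-}/var/run/secrets/kubernetes.io/serviceaccount/token" ]
}

# Example, run inside a pod:
# has_sa_token && echo "token mounted" || echo "no token"
```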
|
||||
### What the app pods CAN'T do
|
||||
|
||||
Our app service accounts have **no RoleBindings or ClusterRoleBindings**.
|
||||
They cannot:
|
||||
- List, get, create, update, delete any Kubernetes resource
|
||||
- Read other namespaces' secrets
|
||||
- Schedule workloads
|
||||
- View cluster state
|
||||
|
||||
If the api container were fully compromised (RCE), the attacker would
|
||||
have:
|
||||
- Network access to other pods in the `honeydue` namespace (Chapter 16)
|
||||
- Read access to our ConfigMap + Secrets (mounted into the container)
|
||||
- No ability to pivot to other parts of the cluster via the k8s API
|
||||
|
||||
## Layer 4 — Pod Security
|
||||
|
||||
Every pod runs with restrictive security context:
|
||||
|
||||
```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000 # api; different per service
  runAsGroup: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

containers:
  - securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```
|
||||
|
||||
### What each setting does
|
||||
|
||||
| Setting | Effect |
|
||||
|---|---|
|
||||
| `runAsNonRoot: true` | Pod refuses to start if the image's default user is root |
|
||||
| `runAsUser: 1000` | Override to UID 1000 (app user) |
|
||||
| `allowPrivilegeEscalation: false` | Sets `no_new_privs`: the process cannot gain more privileges than its parent (e.g., via setuid binaries) |
|
||||
| `readOnlyRootFilesystem: true` | `/` is read-only; writes require explicit volumes |
|
||||
| `capabilities: drop: [ALL]` | No Linux capabilities (NET_ADMIN, SYS_TIME, etc.) |
|
||||
| `seccompProfile: RuntimeDefault` | Restrict syscalls to containerd's default seccomp allowlist |
|
||||
|
||||
Read-only root means our app images must declare writable volumes for
|
||||
anything mutable:
|
||||
|
||||
```yaml
volumeMounts:
  - name: tmp
    mountPath: /tmp
volumes:
  - name: tmp
    emptyDir:
      sizeLimit: 64Mi
```
|
||||
|
||||
If the app needs to write somewhere else (e.g., Next.js cache), we mount
|
||||
an emptyDir there explicitly.
|
||||
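When tracking down which paths need an emptyDir, a small write probe run inside the container can help. This is a sketch with a hypothetical function name: it tries to create and then remove a scratch file in the given directory.

```bash
# Return 0 when a file can be created in directory $1, 1 otherwise.
can_write() {
  probe="${1%/}/.write-probe.$$"
  if ( : > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  return 1
}

# Example, run inside a pod: /tmp should pass, / should fail.
# can_write /tmp && echo "/tmp writable"
# can_write /    || echo "/ is read-only"
```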
|
||||
### Traefik exception
|
||||
|
||||
Traefik needs `CAP_NET_BIND_SERVICE` to bind ports 80/443 on the host
|
||||
network. Its security context adds just that one capability back:
|
||||
|
||||
```yaml
securityContext:
  capabilities:
    drop: [ALL]
    add: [NET_BIND_SERVICE]
  readOnlyRootFilesystem: true
  runAsGroup: 65532
  runAsNonRoot: true
  runAsUser: 65532
```
|
||||
|
||||
The `net.ipv4.ip_unprivileged_port_start=0` sysctl on the nodes
|
||||
complements this — on older kernels NET_BIND_SERVICE alone isn't enough
|
||||
in the host netns.
|
||||
|
||||
### Pod Security Admission (PSA)
|
||||
|
||||
Kubernetes has a built-in admission controller for enforcing Pod Security
|
||||
Standards at the namespace level:
|
||||
|
||||
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: honeydue
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
```
|
||||
|
||||
We **don't currently set this**. We get the equivalent effect from
|
||||
the explicit securityContext on each pod, but namespace-level enforcement
|
||||
would catch new workloads that forget to set it. **TODO** (Chapter 20).
|
||||
|
||||
## Layer 5 — Network Policies
|
||||
|
||||
The `deploy-k3s/manifests/network-policies.yaml` scaffold defines:
|
||||
|
||||
- **default-deny-all** — deny all ingress and egress by default in the
|
||||
`honeydue` namespace
|
||||
- **allow-dns** — allow egress UDP/TCP 53 to CoreDNS
|
||||
- **allow-ingress-to-api** — allow Traefik (`kube-system` namespace) to
|
||||
reach api pods on port 8000
|
||||
- **allow-ingress-to-admin** — same, for admin:3000
|
||||
|
||||
**These are not currently applied.** Without them, our pods can freely
|
||||
talk to anything — including, theoretically, malicious destinations if
|
||||
an attacker gets RCE inside a pod.
|
||||
|
||||
**TODO** (Chapter 20): Apply network policies. The scaffold is there; we
|
||||
just need to `kubectl apply -f deploy-k3s/manifests/network-policies.yaml`
|
||||
and test that nothing breaks.
|
||||
|
||||
### What network policies would prevent
|
||||
|
||||
| Attack scenario | NetworkPolicy blocks |
|
||||
|---|---|
|
||||
| Pod A compromised, attacker SSHs sideways to pod B | Yes (explicit allow needed) |
|
||||
| Pod RCE → scan internal networks | Yes (default deny egress) |
|
||||
| Pod RCE → exfil to attacker's C2 | Yes (outbound to internet needs egress rule) |
|
||||
|
||||
Without policies, all of these work.
|
||||
|
||||
## TLS and encryption
|
||||
|
||||
### CF ↔ user
|
||||
|
||||
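TLS 1.2+ for current clients. The floor is set by the zone's Minimum TLS
Version setting, which should be confirmed as 1.2 because Cloudflare's
default permits older versions. CF presents an automatically renewed
Let's Encrypt or CF-managed cert for `*.myhoneydue.com`.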
|
||||
|
||||
### CF ↔ origin
|
||||
|
||||
**Plaintext HTTP** (SSL = Flexible). An attacker with access to the
|
||||
Cloudflare-to-Hetzner path could read traffic. In practice nobody who
|
||||
isn't Cloudflare or Hetzner sits on that path.
|
||||
|
||||
**TODO** (Chapter 20): Upgrade to SSL = Full (strict) with a Cloudflare
|
||||
Origin CA certificate. This encrypts CF ↔ origin and verifies that
|
||||
origin's cert is the CF-issued one (prevents MitM if DNS is compromised).
|
||||
|
||||
### API ↔ Neon Postgres
|
||||
|
||||
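**TLS 1.3** via `DB_SSLMODE=require`. The Go app's postgres driver (pgx)
negotiates TLS, and the connection fails if TLS can't be established.
Under libpq semantics, which pgx follows, `require` encrypts the
connection but does not validate the server certificate; checking Neon's
cert chain and hostname against the system CA bundle requires
`verify-full`.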
|
||||
|
||||
### API ↔ Backblaze B2
|
||||
|
||||
**HTTPS** (B2 doesn't support HTTP). `B2_USE_SSL=true` in our ConfigMap
|
||||
(though actually the app reads `STORAGE_USE_SSL` — see Chapter 9 for this
|
||||
vestigial variable's story).
|
||||
|
||||
### Worker ↔ Fastmail SMTP
|
||||
|
||||
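**STARTTLS** on port 587. The Go `wneessen/go-mail` library uses
`TLSOpportunistic` mode: it connects in plaintext, upgrades via STARTTLS
when the server offers it, and falls back to plaintext when it does not.
Fastmail always supports STARTTLS, so in practice every connection is
encrypted, but an on-path attacker who strips the STARTTLS capability
could force a downgrade. `TLSMandatory` would close that gap.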
|
||||
|
||||
### API/worker ↔ Redis
|
||||
|
||||
**Plaintext** inside the cluster. Redis 7 supports TLS (redis-tls.conf,
|
||||
`redis-server --tls-port`), but we haven't enabled it because Redis is
|
||||
on the overlay network, not exposed externally, and only holds cache +
|
||||
queue state.
|
||||
|
||||
### Pod-to-pod (Flannel overlay)
|
||||
|
||||
**Plaintext VXLAN** over Hetzner's public network. See
|
||||
[Chapter 3 §Layer 3](./03-networking.md#layer-3--pod-overlay-flannel-vxlan).
|
||||
TODO to switch to WireGuard backend.
|
||||
|
||||
## Secrets management
|
||||
|
||||
### Kubernetes Secrets
|
||||
|
||||
Our k8s Secrets are stored in etcd. etcd-at-rest encryption is **not
|
||||
currently enabled** — a compromise of the etcd data directory would
|
||||
expose Secret values. Given:
|
||||
- Nodes have disk encryption at the Hetzner hypervisor layer
|
||||
- Attacker needs root on the node to read etcd
|
||||
- Our operator access is already root-via-sudo
|
||||
|
||||
This is an accepted risk. **TODO** (Chapter 20): enable encryption at rest
|
||||
for etcd. K3s supports it via `--secrets-encryption` flag on the server.
|
||||
|
||||
### What Secrets we have
|
||||
|
||||
```
$ kubectl get secrets -n honeydue
NAME                TYPE                             DATA   AGE
gitea-credentials   kubernetes.io/dockerconfigjson   1      ...
honeydue-apns-key   Opaque                           1      ...
honeydue-secrets    Opaque                           9      ...
```
|
||||
|
||||
Contents:
|
||||
|
||||
| Secret | Key | Source |
|
||||
|---|---|---|
|
||||
| `gitea-credentials` | `.dockerconfigjson` | PAT for Gitea registry (image pulls) |
|
||||
| `honeydue-apns-key` | `apns_auth_key.p8` | Placeholder p8 file (push off) |
|
||||
| `honeydue-secrets` | `POSTGRES_PASSWORD` | Neon DB password |
|
||||
| `honeydue-secrets` | `SECRET_KEY` | 64-char random, app signing key |
|
||||
| `honeydue-secrets` | `EMAIL_HOST_PASSWORD` | Fastmail app password |
|
||||
| `honeydue-secrets` | `FCM_SERVER_KEY` | "disabled-no-push-accounts-yet" placeholder |
|
||||
| `honeydue-secrets` | `REDIS_PASSWORD` | Empty (no auth on internal Redis) |
|
||||
| `honeydue-secrets` | `B2_KEY_ID` | B2 app key ID |
|
||||
| `honeydue-secrets` | `B2_APP_KEY` | B2 app key secret |
|
||||
| `honeydue-secrets` | `ADMIN_EMAIL` | `admin@myhoneydue.com` |
|
||||
| `honeydue-secrets` | `ADMIN_PASSWORD` | Generated 24-char initial admin password |
|
||||
|
||||
### Source of truth
|
||||
|
||||
The Secret values came from:
|
||||
- `deploy/secrets/*.txt` files on the operator workstation (gitignored)
|
||||
- `deploy/prod.env` (gitignored)
|
||||
- `deploy/registry.env` (gitignored)
|
||||
|
||||
These Swarm-era files are still the canonical source. If you need to
|
||||
recreate Secrets in a new cluster:
|
||||
|
||||
```bash
cd honeyDueAPI-go
kubectl create secret generic honeydue-secrets -n honeydue \
  --from-literal=POSTGRES_PASSWORD="$(cat deploy/secrets/postgres_password.txt)" \
  --from-literal=SECRET_KEY="$(cat deploy/secrets/secret_key.txt)" \
  --from-literal=EMAIL_HOST_PASSWORD="$(cat deploy/secrets/email_host_password.txt)" \
  ...
```
|
||||
|
||||
The full recreation script is in Chapter 17 (Runbook).
|
||||
|
||||
### Secret rotation
|
||||
|
||||
Not automated. To rotate (e.g., after a compromise):
|
||||
|
||||
1. Generate new value: `openssl rand -base64 32`
2. Update the secret:
   ```bash
   kubectl create secret generic honeydue-secrets -n honeydue \
     --from-literal=SECRET_KEY='new-value' \
     --dry-run=client -o yaml | kubectl apply -f -
   ```
3. Restart dependent pods:
   ```bash
   kubectl rollout restart -n honeydue deploy/api deploy/worker
   ```
4. Update `deploy/secrets/secret_key.txt` to match
5. Revoke the old credential at the source (Neon, Fastmail, etc.)
|
||||
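An alternative for the update step that names only the key being rotated is a JSON merge patch, which leaves the other keys in `honeydue-secrets` untouched by construction. The helper below is a hypothetical sketch that builds such a patch. It assumes the key name needs no JSON escaping, and the `kubectl patch` call in the comment is an illustration, not a tested runbook step.

```bash
# Build a JSON merge patch that sets key $1 of a Secret's base64 `data` map
# to the (plaintext) value $2.
build_secret_patch() {
  encoded=$(printf '%s' "$2" | base64 | tr -d '\n')
  printf '{"data":{"%s":"%s"}}\n' "$1" "$encoded"
}

# Example (not run here):
# kubectl patch secret honeydue-secrets -n honeydue --type merge \
#   -p "$(build_secret_patch SECRET_KEY "$(openssl rand -base64 32)")"
```

The restart, file update, and revocation steps still apply afterwards.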
|
||||
## Container image provenance
|
||||
|
||||
Images come from `gitea.treytartt.com/admin/*`. We have **no image
|
||||
signing or verification** (cosign/sigstore) in place. A compromise of
|
||||
the Gitea registry = the ability to push malicious images that would be
|
||||
pulled into prod on the next rollout.
|
||||
|
||||
Mitigations:
|
||||
- Gitea itself is behind login; PAT is scoped to read:packages +
|
||||
write:packages only
|
||||
- Gitea runs on the operator's infrastructure (same operator account)
|
||||
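- Image tags are named after the git commit (`:237c6b8`) rather than
`:latest`. A tag is still mutable, though: someone with registry write
access could repoint it, and we would only notice if manifests pinned
the image by digest (`@sha256:<digest>`)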
|
||||
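A cheap guard in the meantime is to check whether an image reference is pinned by digest, since a tag such as `:237c6b8` can be repointed while an `@sha256:` reference cannot. The function below is a hypothetical sketch, and the image name in the example is a placeholder.

```bash
# Return 0 when image reference $1 ends in a full sha256 digest.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Example (hypothetical image name):
# is_digest_pinned "gitea.treytartt.com/admin/api:237c6b8" || echo "tag only"
```

Running this over the image fields of the manifests in CI would flag any workload that relies on a tag alone.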
|
||||
**TODO** (Chapter 20): Add cosign signing at build time, verify at pull
|
||||
time.
|
||||
|
||||
## Operator workstation security
|
||||
|
||||
The operator workstation has:
|
||||
- macOS with FileVault (full disk encryption)
|
||||
- Login password required
|
||||
- Private keys in `~/.ssh/` (mode 0600)
|
||||
- Kubeconfig at `~/.kube/honeydue-k3s.yaml` (mode 0600), which contains the
admin client certificate and key for the cluster
|
||||
|
||||
**Losing the laptop would require immediate credential rotation:**
|
||||
- New SSH key, redeploy public part on all 3 nodes
|
||||
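- New kubeconfig: rotate the admin credential on the server first (copying
`/etc/rancher/k3s/k3s.yaml` again would reuse the credential that was on
the lost laptop), then run `sudo cat /etc/rancher/k3s/k3s.yaml` on hetzner1,
copy to workstation, update `KUBECONFIG` env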
|
||||
- Rotate operator-access PATs on Gitea, Neon, Cloudflare, Backblaze
|
||||
|
||||
## Compliance notes
|
||||
|
||||
This stack is **not currently certified** for:
|
||||
- HIPAA — we transit and store health-related data but haven't signed a
BAA with any provider
|
||||
- SOC 2 — no auditing, no documented controls beyond this document
|
||||
- PCI-DSS — we don't handle card data; Apple/Google IAP handles payments
|
||||
- GDPR — we follow GDPR best practices (data minimization, user deletion)
|
||||
but haven't had a formal assessment
|
||||
|
||||
If honeyDue ever needs any of these, the infrastructure is compatible
|
||||
but the operational processes around it would need formal work.
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
|
||||
# See all RBAC-related resources in a namespace
|
||||
kubectl get sa,role,rolebinding -n honeydue
|
||||
|
||||
# Check what a ServiceAccount can do
|
||||
kubectl auth can-i --list --as=system:serviceaccount:honeydue:api -n honeydue
|
||||
|
||||
# Verify pod is running with expected security context
|
||||
kubectl get pod <pod> -n honeydue -o jsonpath='{.spec.securityContext}'
|
||||
kubectl get pod <pod> -n honeydue -o jsonpath='{.spec.containers[0].securityContext}'
|
||||
|
||||
# List all Secrets (without revealing content)
|
||||
kubectl get secret -n honeydue
|
||||
kubectl describe secret honeydue-secrets -n honeydue # shows keys, not values
|
||||
|
||||
# Decode a secret (CAREFUL: prints plaintext)
|
||||
kubectl get secret honeydue-secrets -n honeydue -o jsonpath='{.data.SECRET_KEY}' | base64 -d
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [Kubernetes Pod Security Standards][psa]
|
||||
- [Kubernetes RBAC][rbac]
|
||||
- [Kubernetes NetworkPolicy][netpol]
|
||||
- [Cloudflare IP ranges][cf-ips]
|
||||
- [K3s secrets encryption][k3s-secrets]
|
||||
- [SSH hardening guide][ssh-guide]
|
||||
|
||||
[psa]: https://kubernetes.io/docs/concepts/security/pod-security-standards/
|
||||
[rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
|
||||
[netpol]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
|
||||
[cf-ips]: https://www.cloudflare.com/ips/
|
||||
[k3s-secrets]: https://docs.k3s.io/security/secrets-encryption
|
||||
[ssh-guide]: https://linux-audit.com/audit-and-harden-your-ssh-configuration/
|
||||
@@ -0,0 +1,419 @@
|
||||
# 06 — Traefik Ingress
|
||||
|
||||
## Summary
|
||||
|
||||
Traefik is the reverse proxy that routes external HTTP requests to the
|
||||
right application pod based on the `Host:` header. We run Traefik v3 as a
|
||||
Kubernetes DaemonSet with `hostNetwork: true` — each of the three nodes
|
||||
has its own Traefik pod listening directly on the node's `:80`/`:443`.
|
||||
Cloudflare round-robins DNS across the three node IPs, so any node can
|
||||
serve any request. No external load balancer.
|
||||
|
||||
## Why Traefik
|
||||
|
||||
K3s bundles Traefik by default. The alternatives:
|
||||
|
||||
| Option | Pros | Cons |
|
||||
|---|---|---|
|
||||
| **Traefik v3 (bundled)** | Zero install, excellent k8s integration, middleware system, active development | Helm-driven config is indirect |
|
||||
| NGINX Ingress | Most popular, battle-tested | Another thing to install, more config surface |
|
||||
| HAProxy Ingress | Extremely performant | More hands-on, older docs |
|
||||
| Caddy | Simple config, auto-HTTPS | `caddy-docker-proxy` / Ingress integration is less mature |
|
||||
| Envoy / Istio | Most featureful | Massive overkill at our scale |
|
||||
|
||||
Traefik came "free" with K3s, does the job, and its
|
||||
[Swarm provider][traefik-swarm] is what we would have used if we'd
|
||||
fixed our Swarm architecture. Using it on k3s keeps the mental model
|
||||
consistent.
|
||||
|
||||
## Deployment model
|
||||
|
||||
```mermaid
flowchart TB
    subgraph CF[Cloudflare edge]
        DNS[DNS A records:<br/>api.myhoneydue.com → 3 node IPs<br/>admin.myhoneydue.com → 3 node IPs]
    end

    subgraph N1[hetzner1]
        T1[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel1[Linux kernel<br/>net.ipv4.ip_unprivileged_port_start=0]
    end
    subgraph N2[hetzner2]
        T2[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel2[Linux kernel]
    end
    subgraph N3[hetzner3]
        T3[Traefik pod<br/>hostNetwork:true<br/>:80/:443]
        kernel3[Linux kernel]
    end

    subgraph Cluster[k3s cluster services]
        APISvc[api Service :8000]
        AdminSvc[admin Service :3000]
    end

    DNS -. HTTP :80 .-> T1 & T2 & T3
    T1 & T2 & T3 -- reverse_proxy --> APISvc & AdminSvc
```
|
||||
|
||||
### ASCII fallback
|
||||
|
||||
```
                      Cloudflare DNS
                   ┌───────────────────┐
                   │ api  → 3 IPs      │
                   │ admin→ 3 IPs      │
                   └─────────┬─────────┘
                             │ HTTP :80
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
    ┌──────────┐        ┌──────────┐        ┌──────────┐
    │ hetzner1 │        │ hetzner2 │        │ hetzner3 │
    │ Traefik  │        │ Traefik  │        │ Traefik  │
    │ :80/443  │        │ :80/443  │        │ :80/443  │
    │(hostNet) │        │(hostNet) │        │(hostNet) │
    └────┬─────┘        └────┬─────┘        └────┬─────┘
         │                   │                   │
         └── ClusterIP ──────┼── ClusterIP ──────┘
                             ▼
                ┌────────────────────────┐
                │ api Service   :8000    │
                │ admin Service :3000    │
                └────────────────────────┘
```
|
||||
|
||||
## Why DaemonSet + hostNetwork
|
||||
|
||||
**What we're trying to achieve**: Any public-facing node should answer
|
||||
:80/:443. Cloudflare round-robins DNS; whichever node it picks, that
|
||||
node must serve.
|
||||
|
||||
**The default k3s Traefik deployment** is a single-replica Deployment
|
||||
exposed via a LoadBalancer Service. That requires either:
|
||||
- Hetzner Load Balancer (+ $8.49/mo, another thing to manage), **or**
|
||||
- K3s' built-in `servicelb` (klipper-lb) which binds node ports
|
||||
dynamically to proxy to the Service
|
||||
|
||||
Neither was quite what we wanted. With three replicas of the stock Traefik
|
||||
behind klipper-lb, each Traefik pod is reachable but there's an extra hop
|
||||
through klipper's proxy daemon.
|
||||
|
||||
**DaemonSet + hostNetwork** is cleaner: each Traefik pod *is* the host's
|
||||
:80/:443. No proxy daemon, no LB Service, no VIP. Cloudflare DNS →
|
||||
node IP → kernel → Traefik, one hop.
|
||||
|
||||
### Trade-offs of hostNetwork
|
||||
|
||||
**Pro:**
|
||||
- One fewer layer of indirection; lower latency
|
||||
- No Service needed; no kube-proxy in the ingress path
|
||||
- Standard Cloudflare round-robin DNS is the failover mechanism
|
||||
|
||||
**Con:**
|
||||
- Traefik is in the host netns; it sees the node's interfaces, not
|
||||
the cluster overlay
|
||||
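- With `hostNetwork: true`, the default `ClusterFirst` DNS policy falls
back to the node's resolver, so cluster DNS names resolve only if
`dnsPolicy: ClusterFirstWithHostNet` is set. Traefik's Kubernetes provider
reads Service and endpoint addresses from the API server, so routing to
`api` does not depend on DNS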
|
||||
- Port conflicts possible if anything else wants :80/:443 on the node
|
||||
(nothing else does in our setup)
|
||||
|
||||
### Trade-offs of DaemonSet
|
||||
|
||||
**Pro:**
|
||||
- One Traefik per node; matches our Cloudflare 3-IP round-robin
|
||||
exactly
|
||||
- Any node down = Cloudflare's origin health checks route around it
|
||||
|
||||
**Con:**
|
||||
- Updates require `maxUnavailable > 0` (host ports conflict during
|
||||
surge) → brief moment where one node is down during rollout
|
||||
- 3× the memory usage vs. 1-replica Deployment (but Traefik is tiny
|
||||
— ~128 MB total across all three)
|
||||
|
||||
## Our Traefik configuration
|
||||
|
||||
We reconfigure the bundled K3s Traefik via a `HelmChartConfig`. K3s
|
||||
uses the `helm-controller` to manage bundled addons; `HelmChartConfig`
|
||||
lets us override values without disabling-and-replacing the chart.
|
||||
|
||||
Full config at
|
||||
`deploy-k3s/manifests/traefik-helmchartconfig.yaml`. Key settings:
|
||||
|
||||
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet # was Deployment
    hostNetwork: true
    service:
      enabled: false # no LoadBalancer Service
    ports:
      web:
        port: 80
        hostPort: 80
      websecure:
        port: 443
        hostPort: 443
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
        maxSurge: 0
    securityContext:
      capabilities:
        drop: [ALL]
        add: [NET_BIND_SERVICE]
      readOnlyRootFilesystem: true
      runAsGroup: 65532
      runAsNonRoot: true
      runAsUser: 65532
    additionalArguments:
      - "--entrypoints.web.forwardedHeaders.trustedIPs=<CF ranges>"
```
|
||||
|
||||
### Why each setting
|
||||
|
||||
- **`kind: DaemonSet`** — one Traefik per node. Default is a Deployment
|
||||
with 1 replica.
|
||||
- **`hostNetwork: true`** — Traefik runs in the host's network namespace
|
||||
so it can bind real :80/:443 on the node.
|
||||
- **`service.enabled: false`** — no LoadBalancer Service is created.
|
||||
With `hostNetwork`, we don't need one.
|
||||
- **`ports.*.hostPort`** — explicit host port binding that matches the
container port. Because only one pod can claim `hostPort: 80` on a node,
the scheduler can place at most one Traefik per node.
- **`updateStrategy.maxUnavailable: 1, maxSurge: 0`** — we accept one
node being down during a Traefik update (host port can't be shared).
The Traefik Helm chart rejects `hostNetwork` combined with
`maxSurge > 0`, which is why this was the second config iteration.
|
||||
- **Security context** — non-root (UID 65532), read-only root filesystem,
|
||||
only `NET_BIND_SERVICE` capability. See Chapter 5.
|
||||
- **`forwardedHeaders.trustedIPs`** — Cloudflare's IP ranges. Traefik
|
||||
trusts `X-Forwarded-Proto` et al. only from these ranges, so a
|
||||
bypassing client can't spoof the proto header.
|
||||
|
||||
### Forwarded-headers trustedIPs
|
||||
|
||||
The full list of trusted CF ranges is in our `additionalArguments`. It's
|
||||
the union of CF's published IPv4 and IPv6 ranges. When Cloudflare passes
|
||||
a request to origin, it adds `X-Forwarded-For` and `X-Forwarded-Proto`
|
||||
headers; Traefik only honors these if the request came from one of these
|
||||
IPs. Every other client's headers are ignored.
|
||||
|
||||
If CF publishes new IP ranges (rare but possible), the
|
||||
`trustedIPs` list needs updating. It's a raw string in our
|
||||
HelmChartConfig — we'd need to edit, apply, and bump the helm job.
|
||||
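Since the list is a raw comma-separated string, regenerating it from a file of ranges is less error-prone than editing it by hand. The helper below is a hypothetical sketch that reads one CIDR per line and prints the argument. The plain-text endpoints in the usage comment (`/ips-v4` and `/ips-v6` on cloudflare.com) are an assumption to verify before relying on them.

```bash
# Read one CIDR per line on stdin (blank lines ignored) and print the
# Traefik argument with the ranges joined by commas.
build_trusted_ips_arg() {
  list=$(grep -v '^[[:space:]]*$' | paste -sd, -)
  printf '%s%s\n' '--entrypoints.web.forwardedHeaders.trustedIPs=' "$list"
}

# Example (assumed endpoints; not run here):
# { curl -fsS https://www.cloudflare.com/ips-v4; echo; \
#   curl -fsS https://www.cloudflare.com/ips-v6; } | build_trusted_ips_arg
```

The output can be pasted into `additionalArguments` and diffed against the current value to see whether anything changed.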
|
||||
## Traefik v3 vs v2
|
||||
|
||||
K3s ships Traefik v3 (currently `3.6.10`). The v2 → v3 migration
|
||||
changed a few things:
|
||||
- `swarmMode` removed (replaced by a `swarm` provider, but we don't
|
||||
use Swarm anyway)
|
||||
- Encoded-character handling changed (v3 warns about RFC 3986 handling;
|
||||
we ignore the warning)
|
||||
- Middleware CRD group is `traefik.io/v1alpha1` (was `traefik.containo.us/v1alpha1`)
|
||||
|
||||
Our deployment handles all of this automatically via the bundled
|
||||
chart.
|
||||
|
||||
## Ingress resources
|
||||
|
||||
We define two standard k8s `Ingress` resources in
|
||||
`deploy-k3s/manifests/ingress/ingress-simple.yaml`:
|
||||
|
||||
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-api
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: api.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
    - host: myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: api, port: {number: 8000}}
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: honeydue-admin
  namespace: honeydue
spec:
  ingressClassName: traefik
  rules:
    - host: admin.myhoneydue.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: {name: admin, port: {number: 3000}}
```
|
||||
|
||||
Traefik watches for Ingress resources with `ingressClassName: traefik`
|
||||
and programs its router table accordingly. Changes are applied within
|
||||
seconds — no restart needed.
|
||||
|
||||
### What pathType: Prefix means
|
||||
|
||||
Every request path starting with `/` matches (which is everything). The
alternatives are `Exact` (matches only the literal path) and
`ImplementationSpecific` (matching is left to the controller). `Prefix`
is the common choice for catch-all routes and matches how users think
about URL routing.
|
||||
|
||||
## How requests flow
|
||||
|
||||
1. **Cloudflare DNS** resolves `api.myhoneydue.com` to one of three IPs
|
||||
(round-robin). Say it picks `178.105.32.198` (hetzner2).
|
||||
2. **Cloudflare edge** establishes TCP to `178.105.32.198:80` (plain HTTP,
|
||||
SSL=Flexible). Original HTTPS terminated at CF.
|
||||
3. **UFW on hetzner2** accepts the SYN (80/tcp open from anywhere).
|
||||
4. **Linux kernel** sees a listener on 0.0.0.0:80 (the Traefik pod).
|
||||
Hands off the SYN.
|
||||
5. **Traefik accepts** the connection. Reads the HTTP request.
|
||||
6. **Traefik matches** the `Host:` header against its router table.
|
||||
`Host: api.myhoneydue.com` → `honeydue-api` Ingress → `api` Service.
|
||||
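7. **Traefik dials** `10.43.167.83:8000` (api Service ClusterIP). Dialing
an IP needs no CoreDNS lookup; kube-proxy (IPVS) handles the ClusterIP.
Note that Traefik's Kubernetes provider normally load-balances across pod
endpoint IPs itself and goes through the ClusterIP only when its native
load-balancing option is enabled, so verify which applies here.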
|
||||
8. **kube-proxy IPVS** rewrites the destination to a live api pod endpoint
|
||||
— say `10.42.2.6:8000` (api pod on hetzner3).
|
||||
9. **Flannel VXLAN** encapsulates the packet and sends to hetzner3
|
||||
(UDP :8472 between node IPs).
|
||||
10. **hetzner3's kernel** decapsulates, delivers to the api pod.
|
||||
11. **api pod** processes, returns response.
|
||||
12. **Response flows back** the reverse path.
|
||||
|
||||
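Step 6 is essentially a table lookup keyed on the `Host:` header. The sketch below mirrors the three Ingress rules from this chapter; the `backend` type and `lookup` function are illustrative names, not Traefik's internals.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// backend is the Service and port an Ingress rule points at.
type backend struct {
	Service string
	Port    int
}

// routes mirrors the host rules of the honeydue Ingresses.
var routes = map[string]backend{
	"api.myhoneydue.com":   {Service: "api", Port: 8000},
	"myhoneydue.com":       {Service: "api", Port: 8000},
	"admin.myhoneydue.com": {Service: "admin", Port: 3000},
}

// lookup resolves a Host header to a backend. It strips an optional
// port and lowercases the name, since hostnames are case-insensitive.
func lookup(hostHeader string) (backend, bool) {
	host := hostHeader
	if h, _, err := net.SplitHostPort(hostHeader); err == nil {
		host = h
	}
	b, ok := routes[strings.ToLower(host)]
	return b, ok
}

func main() {
	for _, h := range []string{"api.myhoneydue.com", "ADMIN.myhoneydue.com:80", "other.example.com"} {
		b, ok := lookup(h)
		fmt.Println(h, b, ok)
	}
}
```

A host with no entry corresponds to the 404 case in the troubleshooting table: the request reached Traefik but no router claimed it.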
Cloudflare caches 200 responses at the edge (default TTL varies; for
|
||||
HTML/JSON usually 0 unless we set `Cache-Control` headers). So the
|
||||
second request for the same URL might not reach the origin at all.
|
||||
|
||||
## Middleware (mostly unused)
|
||||
|
||||
Traefik supports middleware — small functions run before/after the proxy.
|
||||
The `deploy-k3s/manifests/ingress/middleware.yaml` scaffold defines:
|
||||
|
||||
- **`rate-limit`** — 100 req/min average, 200 burst
|
||||
- **`security-headers`** — HSTS, X-Frame-Options, CSP, etc.
|
||||
- **`cloudflare-only`** — IP allowlist restricting origin to CF ranges
|
||||
- **`admin-auth`** — HTTP basic auth for admin panel
|
||||
|
||||
**None of these are currently attached to our Ingresses.** To enable,
|
||||
add the `traefik.ingress.kubernetes.io/router.middlewares` annotation to
|
||||
the Ingress:
|
||||
|
||||
```yaml
|
||||
metadata:
|
||||
annotations:
|
||||
traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
|
||||
```
|
||||
|
||||
We left them off to minimize surface area for the first week of the new
|
||||
cluster. Enabling is TODO in Chapter 20.
|
||||
|
||||
## Traefik dashboard
|
||||
|
||||
Disabled. The Traefik dashboard (`/dashboard/` and `/api/`) exposes
|
||||
runtime state and is potentially information leaky. The bundled k3s
|
||||
Traefik disables it by default, and we haven't re-enabled it.
|
||||
|
||||
If needed for debugging:
|
||||
|
||||
```bash
|
||||
# Port-forward to a Traefik pod
|
||||
kubectl port-forward -n kube-system daemonset/traefik 9000:9000
|
||||
# (the chart exposes the dashboard on :9000 when enabled)
|
||||
# Then visit http://localhost:9000/dashboard/
|
||||
```
|
||||
|
||||
This requires kubectl access and isn't exposed publicly.
|
||||
|
||||
## Version pinning
|
||||
|
||||
We take whatever Traefik version is bundled with K3s (currently 3.6.10).
|
||||
The bundled chart is pinned to a specific version in K3s' release notes;
|
||||
when we upgrade K3s the Traefik version can change. If that ever breaks
|
||||
something, we can pin a specific version via the HelmChartConfig's
|
||||
`version` field:
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
version: 39.0.501+up39.0.5 # specific chart version
|
||||
```
|
||||
|
||||
## Limitations we accept
|
||||
|
||||
- **No sticky sessions.** Every request to `api.myhoneydue.com` can go
|
||||
to a different pod. Our Go API is stateless — this is fine.
|
||||
- **No canary deployments** (yet). Traefik supports weighted routing
|
||||
via its CRDs (`TraefikService`) but we don't use them. TODO if/when
|
||||
we do gradual rollouts.
|
||||
- **No mTLS.** Traefik supports mutual TLS client auth for sensitive
|
||||
endpoints. We don't use it.
|
||||
- **Single ingress class.** Everything goes through the same Traefik.
|
||||
For multi-tenant setups we'd want separate ingress classes with
|
||||
separate policies.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Symptom | Likely cause | Fix |
|---|---|---|
| 404 from Traefik | No Ingress rule matches the `Host:` header | Check Ingress host field, DNS |
| 502 from Traefik | Backend unreachable (pod crashed, wrong port, connection refused) | `kubectl get endpoints -n honeydue`; check pod logs |
| 503 from Traefik | No healthy backend (no ready endpoints, or circuit breaker open) | Check pod logs, readiness probe |
| 504 from Traefik | Backend slow | Check pod CPU/memory, DB connections |
| Connection refused at 80 | Traefik pod not running or kernel not listening | `kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik`; `ssh deploy@node 'ss -lntp \| grep :80'` |
| Mixed content error in browser | `X-Forwarded-Proto` not honored by app | Check `trustedIPs` includes CF; check app reads the header |
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
|
||||
# Traefik pods per node
|
||||
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide
|
||||
|
||||
# Traefik logs (all pods)
|
||||
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik --tail=50 --prefix
|
||||
|
||||
# Ingress status
|
||||
kubectl get ingress -n honeydue
|
||||
|
||||
# Check Traefik's own health (uses the ping endpoint; it does not list routers)
kubectl exec -n kube-system daemonset/traefik -- traefik healthcheck
|
||||
|
||||
# Re-apply config
|
||||
kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
|
||||
kubectl delete job -n kube-system helm-install-traefik # triggers reinstall
|
||||
|
||||
# Restart all Traefik pods
|
||||
kubectl rollout restart daemonset/traefik -n kube-system
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [Traefik v3 docs][traefik]
|
||||
- [Traefik Swarm provider][traefik-swarm]
|
||||
- [K3s Traefik customization][k3s-traefik]
|
||||
- [HelmChartConfig docs][k3s-helm]
|
||||
- [Cloudflare IP ranges][cf-ips]
|
||||
|
||||
[traefik]: https://doc.traefik.io/traefik/v3.6/
|
||||
[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
|
||||
[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
|
||||
[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
|
||||
[cf-ips]: https://www.cloudflare.com/ips/
|
||||
@@ -0,0 +1,575 @@
|
||||
# 07 — Services
|
||||
|
||||
## Summary
|
||||
|
||||
Four workloads run in the `honeydue` namespace: **api** (Go REST API, 3
|
||||
replicas), **admin** (Next.js panel, 1 replica), **worker** (Go background
|
||||
jobs, 1 replica), and **redis** (cache + job queue, 1 replica, PVC-backed).
|
||||
This chapter deep-dives each: container image, resource limits, probes,
|
||||
volumes, and why each knob is set the way it is.
|
||||
|
||||
## Overview
|
||||
|
||||
| Service | Image | Replicas | Ports | Role |
|
||||
|---|---|---|---|---|
|
||||
| `api` | `gitea.treytartt.com/admin/honeydue-api:<sha>` | 3 | 8000 | HTTP REST API |
|
||||
| `admin` | `gitea.treytartt.com/admin/honeydue-admin:<sha>` | 1 | 3000 | Next.js admin panel |
|
||||
| `worker` | `gitea.treytartt.com/admin/honeydue-worker:<sha>` | 1 | — | Background job processor |
|
||||
| `redis` | `redis:7-alpine` | 1 | 6379 | Cache + Asynq queue |
|
||||
|
||||
All four are Kubernetes `Deployment` workloads (not StatefulSets, not
|
||||
DaemonSets). They share:
|
||||
- ServiceAccount with `automountServiceAccountToken: false` (Chapter 5)
|
||||
- `imagePullSecrets: [gitea-credentials]` (Chapter 11)
|
||||
- `envFrom: configMapRef: honeydue-config` (Chapter 10)
|
||||
- Individual env vars wired to `honeydue-secrets` keys
|
||||
- Read-only root filesystem with `tmp` emptyDir mounted at `/tmp`
|
||||
|
||||
## Service 1 — api (Go REST API)
|
||||
|
||||
### What it does
|
||||
|
||||
The Go HTTP API — the heart of the app. Handlers for user auth,
|
||||
residences, tasks, contractors, documents, subscriptions, notifications,
|
||||
etc. Reads/writes to Neon Postgres, reads/writes to Redis cache, reads
|
||||
from Backblaze B2.
|
||||
|
||||
Also serves a marketing landing page at `/` (static HTML + CSS from
|
||||
`/app/static/`). This is why the `myhoneydue.com` apex domain routes to
|
||||
the api service (Chapter 6).
|
||||
|
||||
### Deployment spec highlights
|
||||
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    spec:
      serviceAccountName: api
      imagePullSecrets: [name: gitea-credentials]
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: api
          image: gitea.treytartt.com/admin/honeydue-api:237c6b8
          ports: [containerPort: 8000]
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities: { drop: [ALL] }
          envFrom: [configMapRef: {name: honeydue-config}]
          env:
            - name: POSTGRES_PASSWORD
              valueFrom: { secretKeyRef: {name: honeydue-secrets, key: POSTGRES_PASSWORD} }
            - name: SECRET_KEY
              valueFrom: { secretKeyRef: {name: honeydue-secrets, key: SECRET_KEY} }
            # ... all other secrets
          volumeMounts:
            - { name: apns-key, mountPath: /secrets/apns, readOnly: true }
            - { name: tmp, mountPath: /tmp }
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits: { cpu: 1000m, memory: 512Mi }
          startupProbe: { httpGet: {path: /api/health/, port: 8000}, failureThreshold: 48, periodSeconds: 5 }
          readinessProbe: { httpGet: {path: /api/health/, port: 8000}, initialDelaySeconds: 5, periodSeconds: 10, timeoutSeconds: 5 }
          livenessProbe: { httpGet: {path: /api/health/, port: 8000}, initialDelaySeconds: 30, periodSeconds: 30, timeoutSeconds: 10 }
      volumes:
        - name: apns-key
          secret:
            secretName: honeydue-apns-key
            # one item with both fields; [key: ..., path: ...] would parse as two items
            items: [{key: apns_auth_key.p8, path: apns_auth_key.p8}]
        - name: tmp
          emptyDir: {sizeLimit: 64Mi}
```
|
||||
|
||||
### Why each setting
|
||||
|
||||
**`replicas: 3`** — one per node via anti-affinity rules (not strictly
|
||||
required but helpful). Three gives us HA (one pod down = two still
|
||||
serve traffic) and headroom for rolling updates.
|
||||
|
||||
**`maxUnavailable: 0, maxSurge: 1`** — during a rollout, start a 4th
|
||||
pod before killing any old one. Ensures the service stays at 3 live
|
||||
pods throughout. `maxUnavailable: 0` means zero downtime updates — but
|
||||
depends on readinessProbe being accurate.
|
||||
|
||||
**`runAsUser: 1000`** — the `app` user created in the Dockerfile. Image
|
||||
doesn't run as root.
|
||||
|
||||
**`readOnlyRootFilesystem: true`** — prevents any attacker-introduced
|
||||
file writes to the image layer. Go binary doesn't need to write to `/`;
|
||||
only `/tmp` is mutable.
|
||||
|
||||
**`startupProbe.failureThreshold: 48`** (= 48 × 5s = 240s grace) — this
|
||||
was bumped up from the scaffold default of 12. Reason: on first boot,
|
||||
the Go app runs `MigrateWithLock()` which acquires a Postgres advisory
|
||||
lock and runs AutoMigrate. First replica takes ~90s; subsequent
|
||||
replicas wait on the lock. With 3 replicas all starting simultaneously
|
||||
and the lock serializing them, 240s is the right grace. See
|
||||
[Chapter 19](./19-postmortem-swarm.md) for the detailed story.
|
||||
|
||||
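The grace period quoted above is just `failureThreshold × periodSeconds`. A minimal sketch of that arithmetic follows; `startupGrace` is a hypothetical helper, and it ignores `initialDelaySeconds` and probe timeouts, which are not set on these startup probes.

```go
package main

import (
	"fmt"
	"time"
)

// startupGrace returns how long a container may keep failing its
// startupProbe before the kubelet restarts it.
func startupGrace(failureThreshold, periodSeconds int) time.Duration {
	return time.Duration(failureThreshold*periodSeconds) * time.Second
}

func main() {
	fmt.Println("scaffold default (12 x 5s):", startupGrace(12, 5))
	fmt.Println("api today (48 x 5s):       ", startupGrace(48, 5))
}
```

With the scaffold default, a replica that spends ~90s migrating (or waiting on the lock behind one that does) runs out of grace before it ever answers the probe.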
**`readinessProbe.initialDelaySeconds: 5`** — after the startupProbe
|
||||
passes, wait 5s before starting readiness checks. Prevents a racy
|
||||
initial failure.
|
||||
|
||||
**`livenessProbe.initialDelaySeconds: 30`**: wait 30s after the
startupProbe succeeds before running liveness checks (the delay counts
from startup success, not from readiness). This avoids restarts, and the
cascading failures they cause, from false-negative liveness checks right
after boot.
|
||||
|
||||
**`resources.requests/limits`** — Kubernetes uses `requests` for
|
||||
scheduling (how much a pod "reserves") and `limits` for enforcement
|
||||
(max it can use before throttling/OOM). Our api is CPU-bursty for
|
||||
complex query handling, so we give it 100m baseline with a 1000m ceiling.
|
||||
512Mi memory ceiling is comfortable — in practice api uses ~100-200Mi.
|
||||
|
||||
**`volumes.apns-key`** — mounts the `honeydue-apns-key` Secret as a file
|
||||
at `/secrets/apns/apns_auth_key.p8`. The `APNS_AUTH_KEY_PATH` env var
|
||||
points to this path. Even though push is currently disabled, the file
|
||||
must exist because the Go app may try to stat it on startup.
|
||||
|
||||
**`volumes.tmp`** — `emptyDir` with `sizeLimit: 64Mi`. Bounded so a
|
||||
runaway process can't fill the node's disk.
|
||||
|
||||
### The Service
|
||||
|
||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: honeydue
spec:
  type: ClusterIP
  selector: {app.kubernetes.io/name: api}
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
```
|
||||
|
||||
ClusterIP `10.43.167.83`. Reachable as `api.honeydue.svc.cluster.local` or
|
||||
just `api` from inside the namespace.
|
||||
|
||||
### HorizontalPodAutoscaler (not yet enabled)
|
||||
|
||||
`deploy-k3s/manifests/api/hpa.yaml` defines an HPA that would scale api
|
||||
between 3 and 6 replicas based on CPU (70% util) and memory (80% util).
|
||||
|
||||
**Not currently applied.** `metrics-server` runs but we haven't run
|
||||
`kubectl apply -f api/hpa.yaml`. TODO in Chapter 20.
|
||||
|
||||
## Service 2 — admin (Next.js panel)
|
||||
|
||||
### What it does
|
||||
|
||||
Server-rendered admin UI. Authenticates admin users against a
|
||||
separate `admin_users` table in Postgres (seeded with `ADMIN_EMAIL` +
|
||||
`ADMIN_PASSWORD` on first migration). Lets operators view/manage
|
||||
users, residences, tasks, subscriptions, etc.
|
||||
|
||||
Built as a Next.js 16 standalone server.
|
||||
|
||||
### Why 1 replica
|
||||
|
||||
Low traffic. It's an internal tool. One pod suffices. If it crashes,
|
||||
Kubernetes restarts it in ~10s. If the hosting node dies, Kubernetes
|
||||
reschedules to another node.
|
||||
|
||||
The cost of running 3 replicas is tiny (Next.js is ~128MB per pod) but
|
||||
has no operational benefit. When the admin panel becomes user-facing,
|
||||
revisit.
|
||||
|
||||
### Deployment highlights
|
||||
|
||||
```yaml
replicas: 1
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1

securityContext:
  runAsNonRoot: true
  runAsUser: 1001   # different from api (1000) for isolation
  runAsGroup: 1001
  fsGroup: 1001

containers:
  - image: gitea.treytartt.com/admin/honeydue-admin:<sha>
    ports: [containerPort: 3000]
    env:
      - name: PORT
        value: "3000"
      - name: HOSTNAME
        value: "0.0.0.0"
      - name: NEXT_PUBLIC_API_URL
        valueFrom: {configMapKeyRef: {name: honeydue-config, key: NEXT_PUBLIC_API_URL}}
    volumeMounts:
      - {name: nextjs-cache, mountPath: /app/.next/cache}
      - {name: tmp, mountPath: /tmp}
    resources:
      requests: {cpu: 50m, memory: 64Mi}
      limits: {cpu: 500m, memory: 256Mi}
    startupProbe:
      httpGet: {path: /, port: 3000}   # was /admin/, which is wrong for this app (Chapter 19)
      failureThreshold: 24
      periodSeconds: 5
    readinessProbe:
      httpGet: {path: /, port: 3000}
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
```
|
||||
|
||||
**Probe path `/`** — Next.js serves at root. `/admin/` (scaffold default)
|
||||
returns 404 and killed the pod repeatedly during initial bring-up.
|
||||
See Chapter 19 §Admin probe path for the story.
|
||||
|
||||
**`runAsUser: 1001`** — different from api's 1000 so that if one
|
||||
service were compromised, the stolen UID would at least be distinct
|
||||
from other services' (minor defense-in-depth).
|
||||
|
||||
**`nextjs-cache`** — emptyDir mount for Next.js's server-side cache.
|
||||
Without it, the read-only rootfs would prevent Next from caching
|
||||
server-rendered pages. Not a persistent volume because cache is
|
||||
regenerable on restart.
|
||||
|
||||
### The Service
|
||||
|
||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: admin
spec:
  type: ClusterIP
  selector: {app.kubernetes.io/name: admin}
  # one port entry with both fields; [port: ..., targetPort: ...] would parse as two entries
  ports: [{port: 3000, targetPort: 3000}]
```
|
||||
|
||||
ClusterIP `10.43.136.168`.
|
||||
|
||||
## Service 3 — worker (Go + Asynq)
|
||||
|
||||
### What it does
|
||||
|
||||
Runs scheduled background jobs via [Asynq](https://github.com/hibiken/asynq)
|
||||
(a Redis-backed job queue for Go):
|
||||
|
||||
- **Task reminders** (14:00 UTC daily) — notify users of upcoming tasks
|
||||
- **Overdue reminders** (15:00 UTC daily) — notify users of overdue tasks
|
||||
- **Daily digest** (03:00 UTC daily) — summary email per user
|
||||
- **Onboarding emails** — multi-step drip campaign for new users
|
||||
- **Cleanup jobs** — expired tokens, stale data
|
||||
|
||||
### Why 1 replica (hard requirement)
|
||||
|
||||
Asynq uses a `Scheduler` component that does cron-like scheduling. The
|
||||
Scheduler is **not leader-elected** by default — if you run two, both
|
||||
fire every cron task. Users get duplicate emails.
|
||||
|
||||
The asynq docs cover this: to scale scheduling, migrate to
|
||||
`PeriodicTaskManager` + `PeriodicTaskConfigProvider` which coordinate
|
||||
via Redis. Not yet done in our codebase.
|
||||
|
||||
Until then: `replicas: 1` is a hard constraint. See the comment in the
|
||||
deployment manifest:
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
# Asynq's Scheduler is a singleton — running >1 replica fires every cron
|
||||
# task once per replica (duplicate daily digests, onboarding emails, etc.).
|
||||
# Keep at 1 until asynq.PeriodicTaskManager with Redis leader election is
|
||||
# wired in cmd/worker/main.go.
|
||||
replicas: 1
|
||||
```
|
||||
|
||||
### What happens if the worker pod dies?
|
||||
|
||||
- Asynq schedule state is in Redis (which has AOF persistence)
|
||||
- When a new worker pod starts, it re-registers the scheduler and picks up
|
||||
where it left off
|
||||
- Any job that was in-flight (dequeued but not acknowledged) gets retried
|
||||
by Asynq's automatic retry logic (see the `worker.RetryOptions` in the
|
||||
Go code)
|
||||
- Cron jobs whose fire time fell inside the downtime are not backfilled;
  each fires at its next scheduled time

A 5-minute worker outage delays already-queued jobs by about 5 minutes, and
a cron job that was due during the outage is skipped until its next run.
Not great but acceptable.
|
||||
|
||||
### PodDisruptionBudget
|
||||
|
||||
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  minAvailable: 0
  selector: {matchLabels: {app.kubernetes.io/name: worker}}
```
|
||||
|
||||
`minAvailable: 0` means voluntary disruptions (`kubectl drain`) can take
|
||||
the worker down. This matches the singleton constraint: there's only one,
|
||||
it's OK to drain.
|
||||
|
||||
### No Service
|
||||
|
||||
worker doesn't listen on any HTTP port for application traffic — it's a
|
||||
queue consumer, not a web server. So there's **no Kubernetes Service**
|
||||
for it.
|
||||
|
||||
(On Swarm we had the worker expose a health endpoint at `:6060/health`;
|
||||
the k3s scaffold doesn't replicate this. Future work.)
|
||||
|
||||
## Service 4 — redis
|
||||
|
||||
### What it does
|
||||
|
||||
- Caching layer (ETag-based lookups, user session cache)
|
||||
- Asynq queue backend (job state, scheduled tasks, retry state)
|
||||
|
||||
### Why 1 replica
|
||||
|
||||
Single-instance Redis with AOF persistence. Not replicated, not
clustered. Downsides:
- Node outage = Redis outage. The PVC is `local-path` (per-node) and the
  pod is pinned by `nodeSelector`, so Redis cannot be rescheduled onto
  another node; it stays Pending until the node returns, at which point
  AOF on the PVC restores queue state.
- No failover. If the node or its disk is lost for good, the cache and
  any queued jobs are gone.

For our scale this is acceptable. Redis holds no authoritative state
(everything that matters is in Postgres). Cache regenerates on first
request; Asynq retries enqueue on failure.
|
||||
|
||||
### PVC
|
||||
|
||||
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: local-path
  resources: {requests: {storage: 5Gi}}
```
|
||||
|
||||
Uses k3s' built-in `local-path-provisioner`. The PVC binds to a local
|
||||
directory on the node where the Redis pod lands (`/var/lib/rancher/k3s/storage/`).
|
||||
`ReadWriteOnce` means only one pod at a time.
|
||||
|
||||
### Node affinity
|
||||
|
||||
```yaml
nodeSelector:
  honeydue/redis: "true"
```
|
||||
|
||||
We labeled `ubuntu-8gb-nbg1-2` (hetzner1) with `honeydue/redis=true` so
|
||||
Redis always lands there. This ensures the PVC finds its backing
|
||||
storage (since PVCs with `local-path` are per-node).
|
||||
|
||||
```bash
|
||||
kubectl label node ubuntu-8gb-nbg1-2 honeydue/redis=true --overwrite
|
||||
```
|
||||
|
||||
### Why not Redis Sentinel / Cluster
|
||||
|
||||
Complexity. At our scale (~a few req/s, kilobytes of cache), a single
|
||||
Redis does fine. If Redis becomes critical-path for availability, we'd:
|
||||
- Use a managed Redis (Upstash, Dragonfly Cloud) — $5-15/mo, their problem
|
||||
- Or run Redis Sentinel with 3 replicas — manageable but operational work
|
||||
|
||||
Neither is needed yet.
|
||||
|
||||
### Redis config
|
||||
|
||||
From the deployment:
|
||||
|
||||
```yaml
command:
  - sh
  - -c
  - |
    ARGS="--appendonly yes --appendfsync everysec --maxmemory 256mb --maxmemory-policy noeviction"
    if [ -n "$REDIS_PASSWORD" ]; then
      ARGS="$ARGS --requirepass $REDIS_PASSWORD"
    fi
    exec redis-server $ARGS
```
|
||||
|
||||
Settings:
|
||||
- **`--appendonly yes --appendfsync everysec`** — AOF persistence,
|
||||
fsync every second. Survives restarts with up to 1 second of data
|
||||
loss.
|
||||
- **`--maxmemory 256mb`** — Redis will refuse new data if it grows past
|
||||
256 MB. Gives us a safety cap.
|
||||
- **`--maxmemory-policy noeviction`** — we'd rather get errors than
|
||||
silently drop data. This is the right choice when Redis holds queue
|
||||
state (losing a queue item silently = missed job).
|
||||
|
||||
The `REDIS_PASSWORD` env var is optional. Currently empty (no auth). The
|
||||
Redis pod is only reachable from inside the overlay network, and our
|
||||
NetworkPolicies (once enabled) would restrict egress further.
|
||||
|
||||
## Resource summary
|
||||
|
||||
Per-pod requests and limits for each service, with requests totalled across
all replicas:

| Service | CPU requests | CPU limits | Memory requests | Memory limits | Replicas |
|---|---|---|---|---|---|
| api | 100m | 1000m | 128Mi | 512Mi | 3 |
| admin | 50m | 500m | 64Mi | 256Mi | 1 |
| worker | 50m | 500m | 64Mi | 256Mi | 1 |
| redis | 100m | 500m | 128Mi | 512Mi | 1 |
| traefik (kube-system) | ~100m | unlimited | ~50Mi | unlimited | 3 |
| **Total requests (per-pod × replicas)** | **~800m** | | **~790Mi** | | |

Each node has 4000m CPU + 8192Mi memory. Total cluster capacity is
12000m + 24576Mi. We're using roughly 7% CPU and 3% memory for requests,
so there is plenty of headroom.
|
||||
|
||||
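The totals can be re-derived from the per-service rows by multiplying each request by its replica count. A minimal sketch, assuming the approximate Traefik figures from the table; the `workload` type and `totals` function are illustrative names.

```go
package main

import "fmt"

// workload holds per-pod requests and the replica count for one service.
type workload struct {
	Name     string
	CPUm     int // CPU request per pod, in millicores
	MemMi    int // memory request per pod, in MiB
	Replicas int
}

var workloads = []workload{
	{"api", 100, 128, 3},
	{"admin", 50, 64, 1},
	{"worker", 50, 64, 1},
	{"redis", 100, 128, 1},
	{"traefik", 100, 50, 3}, // approximate values, as in the table
}

// totals sums requests across all replicas of all workloads.
func totals(ws []workload) (cpuM, memMi int) {
	for _, w := range ws {
		cpuM += w.CPUm * w.Replicas
		memMi += w.MemMi * w.Replicas
	}
	return cpuM, memMi
}

func main() {
	const nodes, nodeCPUm, nodeMemMi = 3, 4000, 8192
	cpu, mem := totals(workloads)
	fmt.Printf("requests: %dm CPU, %dMi memory\n", cpu, mem)
	fmt.Printf("share of cluster: %.1f%% CPU, %.1f%% memory\n",
		100*float64(cpu)/float64(nodes*nodeCPUm),
		100*float64(mem)/float64(nodes*nodeMemMi))
}
```

Rerunning this after changing a replica count or request is a quick check that the cluster still has scheduling headroom.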
## Health check semantics
|
||||
|
||||
Kubernetes distinguishes three probe types:
|
||||
|
||||
- **startupProbe** — is the container done starting? Runs until it passes
|
||||
once, then stops. While running, the other probes are disabled.
|
||||
Failing startupProbe = container killed and restarted.
|
||||
- **readinessProbe** — is the container ready to serve traffic? A failing
|
||||
pod is removed from Service endpoints (traffic stops flowing to it)
|
||||
but the pod keeps running.
|
||||
- **livenessProbe** — is the container healthy? A failing pod is killed
|
||||
and restarted.
|
||||
|
||||
### Why we tuned startupProbe separately
|
||||
|
||||
The api's first-boot migration takes 90 to 240s. If we only had a
livenessProbe with a typical initialDelaySeconds of 5 and failureThreshold
of 3, the container would be killed before migration finishes (a failing
readinessProbe alone never kills a pod; it only removes it from the
Service). startupProbe lets us give generous first-boot grace (240s)
without loosening the sharper ongoing readiness/liveness checks.
|
||||
|
||||
### Probe path design
|
||||
|
||||
Each service's `/health` endpoint should be:
|
||||
- Cheap (no DB query, no external call)
|
||||
- Fast (< 100ms)
|
||||
- Honest (returns 200 iff the process can serve)
|
||||
|
||||
Our api's `/api/health/` does a trivial check. It does NOT verify Postgres
|
||||
connectivity (to avoid cascading DB failures tearing down all api pods).
|
||||
If Postgres is down, api pods stay "ready" and return 5xx for actual
|
||||
endpoints — that's the right behavior.
|
||||
|
||||
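A handler with those three properties can be very small. The sketch below is an assumption about the shape of such an endpoint, not the actual handler in the api codebase; `healthHandler`, `newMux`, and `probe` are hypothetical names, and `probe` stands in for the kubelet's `httpGet`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler answers probes without touching Postgres or Redis,
// so a database outage cannot fail the probe on every pod at once.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, `{"status":"ok"}`)
}

// newMux registers the health route on the path the probes use.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/health/", healthHandler)
	return mux
}

// probe issues an in-memory GET and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println("health:", probe(newMux(), "/api/health/"))
	fmt.Println("other: ", probe(newMux(), "/does-not-exist"))
}
```

The trade-off is deliberate: the probe reports whether the process can accept requests, while dependency failures surface as 5xx on real endpoints and in monitoring rather than as pod restarts.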
## Log routing
|
||||
|
||||
All container logs go to stdout/stderr. containerd captures them to
`/var/log/containers/` on the node. `kubectl logs` fetches them through the
API server (`/api/v1/namespaces/<namespace>/pods/<pod>/log`), which proxies
the request to the kubelet on that node.
|
||||
|
||||
We have **no log aggregation** in the cluster (no Loki, no ELK, no
|
||||
Datadog). For debugging we use:
|
||||
|
||||
```bash
|
||||
kubectl logs -n honeydue deploy/api -f --prefix
kubectl logs -n honeydue deploy/api --previous   # logs from the previous container instance (after a restart)
|
||||
```
|
||||
|
||||
See [Chapter 15](./15-observability.md).
|
||||
|
||||
## Rolling update semantics
|
||||
|
||||
When you push a new image and `kubectl set image` or `kubectl apply` with
|
||||
a new image tag:
|
||||
|
||||
1. Kubernetes creates a new ReplicaSet with the new image
|
||||
2. Starts 1 new pod (per `maxSurge: 1`)
|
||||
3. Waits for it to pass readinessProbe
|
||||
4. Removes 1 pod from the old ReplicaSet
|
||||
5. Repeats until all N pods are on the new ReplicaSet
|
||||
6. Old ReplicaSet stays around (for rollback) with 0 replicas
|
||||
|
||||
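The steps above can be checked with a toy simulation of `maxUnavailable: 0` and `maxSurge: 1`. This is a simplified model, not the Deployment controller's logic: it assumes every new pod becomes ready before an old one is removed, and `rollout` is a hypothetical name.

```go
package main

import "fmt"

// rollout simulates a surge-first rolling update. It returns the lowest
// ready-pod count and the highest total pod count seen along the way.
func rollout(replicas int) (minReady, maxTotal int) {
	oldPods, newPods := replicas, 0
	minReady, maxTotal = replicas, replicas
	for newPods < replicas {
		newPods++ // surge: start one new pod and wait until it is ready
		if total := oldPods + newPods; total > maxTotal {
			maxTotal = total
		}
		oldPods-- // only then remove one pod from the old ReplicaSet
		if ready := oldPods + newPods; ready < minReady {
			minReady = ready
		}
	}
	return minReady, maxTotal
}

func main() {
	minReady, maxTotal := rollout(3)
	fmt.Println("min ready pods:", minReady)
	fmt.Println("max total pods:", maxTotal)
}
```

The invariant to look for is that the ready count never drops below the desired replica count, at the cost of briefly running one extra pod.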
For api (3 replicas): total rollout time is roughly
|
||||
`3 × (pod_startup_time + small_buffer)` = ~15 minutes in the cold-boot
|
||||
case, seconds for warm updates where migrations are no-op.
|
||||
|
||||
During the rollout:
|
||||
- Service endpoint set updates as pods become ready
|
||||
- kube-proxy IPVS is reprogrammed on each node
|
||||
- Traefik's connection pool to the Service invalidates gradually
|
||||
|
||||
Users see no downtime if the new image is compatible. If it's broken:
|
||||
|
||||
```bash
|
||||
kubectl rollout undo deployment/api -n honeydue
|
||||
```
|
||||
|
||||
Reverts to the previous ReplicaSet. Typically takes 30 seconds to
|
||||
stabilize.
|
||||
|
||||
## Why no StatefulSet
|
||||
|
||||
For Redis (the only stateful thing we run), we use a Deployment + PVC.
|
||||
StatefulSet is designed for:
|
||||
- Ordered startup (pod-0 before pod-1)
|
||||
- Stable hostnames (pod-0 gets DNS name `redis-0.redis`)
|
||||
- Per-replica PVCs
|
||||
|
||||
We have one Redis replica. None of those features matter for a
|
||||
singleton. Deployment + PVC + nodeSelector is simpler and equivalent.
|
||||
|
||||
If we ever run Redis Sentinel or Cluster, we'd migrate to StatefulSet.
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
|
||||
# See all pods in honeydue namespace
|
||||
kubectl get pods -n honeydue -o wide
|
||||
|
||||
# Per-service rollout status
|
||||
kubectl rollout status deployment/api -n honeydue
|
||||
|
||||
# Scale a service
|
||||
kubectl scale deployment/api -n honeydue --replicas=5
|
||||
|
||||
# Restart all pods (e.g., to re-read a configmap)
|
||||
kubectl rollout restart deployment/api -n honeydue
|
||||
|
||||
# Exec into a pod
|
||||
kubectl exec -it -n honeydue deploy/admin -- /bin/sh
|
||||
|
||||
# Describe a pod (shows events, probe state, restarts)
|
||||
kubectl describe pod -n honeydue <pod-name>
|
||||
|
||||
# Resource usage
|
||||
kubectl top pods -n honeydue
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [Kubernetes Deployments][deploy]
|
||||
- [Pod lifecycle + probes][probes]
|
||||
- [Asynq scheduler limitations][asynq-sched]
|
||||
- [K3s local-path provisioner][k3s-lp]
|
||||
|
||||
[deploy]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
|
||||
[probes]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifecycle
|
||||
[asynq-sched]: https://github.com/hibiken/asynq/wiki/Periodic-Tasks
|
||||
[k3s-lp]: https://docs.k3s.io/storage#setting-up-the-local-storage-provider
|
||||
@@ -0,0 +1,298 @@
|
||||
# 08 — Database (Neon Postgres)
|
||||
|
||||
## Summary
|
||||
|
||||
Authoritative user data lives in a Neon-managed Postgres database in AWS
|
||||
us-east-1. Connections use TLS (`DB_SSLMODE=require`). Schema is managed
|
||||
via GORM AutoMigrate inside the api binary, coordinated across replicas
|
||||
by a Postgres advisory lock to prevent concurrent migration attempts.
|
||||
|
||||
## Why Neon
|
||||
|
||||
### Decision matrix
|
||||
|
||||
At deploy time we considered:
|
||||
|
||||
| Option | Setup effort | Monthly cost | Backup/PITR | Scale ceiling | Notes |
|
||||
|---|---|---|---|---|---|
|
||||
| **Neon Launch** | Zero (managed) | $5-15 | Included | Large | **Picked** |
|
||||
| Postgres on a Hetzner VPS | High | $8 (VPS) | Manual | Medium | More ops |
|
||||
| AWS RDS | Medium | $30+ | Included | Huge | Overkill, expensive |
|
||||
| Supabase Free | Zero | $0 | Limited | Small | Free tier has quota limits |
|
||||
| CNPG on our k3s | High (Helm) | $0 (using cluster) | Self-rolled | Medium | Operational burden |
|
||||
|
||||
Neon Launch won on:
|
||||
- **Serverless**: scales compute to zero when idle (cheap)
|
||||
- **Branch databases**: we can create dev/staging branches from prod in seconds
|
||||
- **Connection pooling built-in**: PgBouncer on the hostname suffix `-pooler`
|
||||
- **Point-in-time recovery** included (paid tier)
|
||||
- **Pay-as-you-go** with a $5 minimum — fits a bootstrapped app
|
||||
|
||||
### Connection details
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Hostname | `ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech` |
|
||||
| Port | 5432 |
|
||||
| Username | `neondb_owner` |
|
||||
| Database | `honeyDue` (case-sensitive!) |
|
||||
| TLS mode | `require` (enforced by Neon; app pg driver verifies) |
|
||||
| Branch | production (Neon's concept — isolated DB within the project) |
|
||||
|
||||
### The database name is case-sensitive
|
||||
|
||||
Postgres identifiers are lowercase unless quoted. Neon's UI created the
|
||||
database as `"honeyDue"` (quoted, camelCase preserved). In `prod.env` /
|
||||
ConfigMap we must use exactly `POSTGRES_DB=honeyDue` — lowercase
|
||||
`honeydue` gets a `database "honeydue" does not exist` error. This bit
|
||||
us during the initial Swarm deploy (Chapter 19 §Neon DB name).
|
||||
|
||||
## Connection pooling
|
||||
|
||||
### Why it matters
|
||||
|
||||
Postgres is memory-hungry per connection (~5-10 MB each). 3 api replicas
|
||||
× `DB_MAX_OPEN_CONNS=25` = up to 75 direct Postgres connections. Add
|
||||
the worker's 25. Neon's free tier caps at 100 concurrent connections;
|
||||
paid tiers much higher.
|
||||
|
||||
### PgBouncer on Neon
|
||||
|
||||
Neon provides a built-in PgBouncer at `-pooler` subdomain. Our hostname
|
||||
already includes `-pooler` handling in the route, so connections go
|
||||
through PgBouncer transparently.
|
||||
|
||||
Modes PgBouncer supports:
|
||||
- **session** — one server connection held per client session (transparent)
|
||||
- **transaction** — server connection released after each transaction (high-throughput)
|
||||
- **statement** — per-statement (most aggressive; breaks many features)
|
||||
|
||||
Neon's pooler runs in **transaction mode**. This is compatible with GORM
|
||||
out of the box (we don't use session-level features like prepared
|
||||
statements or session variables).
|
||||
|
||||
### Connection pool settings
|
||||
|
||||
In `prod.env`:
|
||||
|
||||
```
|
||||
DB_MAX_OPEN_CONNS=25
|
||||
DB_MAX_IDLE_CONNS=10
|
||||
DB_MAX_LIFETIME=600s
|
||||
```
|
||||
|
||||
These are the Go `database/sql` pool settings (GORM uses `database/sql`
|
||||
underneath):
|
||||
|
||||
- **MaxOpenConns: 25** — at most 25 concurrent connections per replica
|
||||
- **MaxIdleConns: 10** — keep up to 10 warm connections ready to reuse
|
||||
- **MaxLifetime: 600s** — recycle connections after 10 min (prevents
|
||||
stale state in long-lived connections, good for Neon's idle timeout)
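
As a rough illustration of how these three variables could reach the `database/sql` pool, here is a minimal sketch. `PoolConfig`, `parsePoolConfig`, and `applyPool` are hypothetical names, not the app's actual code; only the `SetMaxOpenConns`, `SetMaxIdleConns`, and `SetConnMaxLifetime` calls are real `database/sql` methods.

```go
package main

import (
	"database/sql"
	"fmt"
	"strconv"
	"time"
)

// PoolConfig mirrors the three DB_MAX_* variables from prod.env.
// The type is hypothetical and exists only for this sketch.
type PoolConfig struct {
	MaxOpen  int
	MaxIdle  int
	Lifetime time.Duration
}

// parsePoolConfig reads the settings through a lookup function, so it can be
// fed from os.Getenv in a real process or from a map in a test.
func parsePoolConfig(get func(string) string) (PoolConfig, error) {
	maxOpen, err := strconv.Atoi(get("DB_MAX_OPEN_CONNS"))
	if err != nil {
		return PoolConfig{}, fmt.Errorf("DB_MAX_OPEN_CONNS: %w", err)
	}
	maxIdle, err := strconv.Atoi(get("DB_MAX_IDLE_CONNS"))
	if err != nil {
		return PoolConfig{}, fmt.Errorf("DB_MAX_IDLE_CONNS: %w", err)
	}
	// "600s" is a valid Go duration string and parses as 10 minutes.
	lifetime, err := time.ParseDuration(get("DB_MAX_LIFETIME"))
	if err != nil {
		return PoolConfig{}, fmt.Errorf("DB_MAX_LIFETIME: %w", err)
	}
	return PoolConfig{MaxOpen: maxOpen, MaxIdle: maxIdle, Lifetime: lifetime}, nil
}

// applyPool copies the settings onto a database/sql pool.
func applyPool(db *sql.DB, c PoolConfig) {
	db.SetMaxOpenConns(c.MaxOpen)
	db.SetMaxIdleConns(c.MaxIdle)
	db.SetConnMaxLifetime(c.Lifetime)
}

func main() {
	env := map[string]string{
		"DB_MAX_OPEN_CONNS": "25",
		"DB_MAX_IDLE_CONNS": "10",
		"DB_MAX_LIFETIME":   "600s",
	}
	cfg, err := parsePoolConfig(func(k string) string { return env[k] })
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```

With GORM, the underlying `*sql.DB` to pass to `applyPool` would come from the `DB()` method on the GORM handle.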
|
||||
|
||||
### Worst-case connection count
|
||||
|
||||
3 api + 1 worker replicas × 25 conns = 100 peak. Right at Neon free
|
||||
tier's ceiling, with zero margin. **This is a real risk** — a spike that
|
||||
saturates the pool on all replicas simultaneously would exhaust Neon's
|
||||
limit.
|
||||
|
||||
Mitigations to consider:
|
||||
- Drop `DB_MAX_OPEN_CONNS` to 15 → 60 peak. Safe on free tier.
|
||||
- Upgrade to Neon Scale plan (1000+ connections).
|
||||
- Rely on Neon's PgBouncer to multiplex — the raw backend connections
|
||||
to Postgres-proper are pooled, not our TCP connections to Neon.
|
||||
|
||||
Currently we trust Neon's pooler to handle the multiplexing and run with
|
||||
the default 25/10. If we hit connection errors in prod, adjust.
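
The worst-case arithmetic above is easy to re-run when replica counts or pool sizes change. A small sketch (the function name `peakConns` is made up for this example; the 100-connection limit is the free-tier figure quoted in this chapter):

```go
package main

import "fmt"

// peakConns returns the worst-case number of client connections the stack can
// open: every replica saturating its own database/sql pool at the same time.
func peakConns(apiReplicas, workerReplicas, maxOpenPerReplica int) int {
	return (apiReplicas + workerReplicas) * maxOpenPerReplica
}

func main() {
	const neonLimit = 100 // free-tier cap quoted in this chapter
	for _, maxOpen := range []int{25, 15} {
		peak := peakConns(3, 1, maxOpen)
		fmt.Printf("DB_MAX_OPEN_CONNS=%d -> peak %d, headroom %d\n",
			maxOpen, peak, neonLimit-peak)
	}
}
```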
|
||||
|
||||
## Schema management
|
||||
|
||||
### GORM AutoMigrate
|
||||
|
||||
On startup, the Go API's `cmd/api/main.go` calls
|
||||
`database.MigrateWithLock()` which:
|
||||
|
||||
1. Opens a dedicated Postgres connection
2. Runs `SELECT pg_advisory_lock(1751412071)`, acquiring a session-level
   advisory lock on a hardcoded key
3. Calls `db.AutoMigrate(&models.*{})` for every GORM model
4. Runs `SELECT pg_advisory_unlock(...)` via a deferred function
5. Closes the connection
|
||||
|
||||
The advisory lock serializes migrations across replicas: when 3 api
|
||||
pods start simultaneously, one acquires the lock and migrates; the
|
||||
others block on the lock. Once the first finishes (≤2s for already-
|
||||
migrated schema, up to 90s on first cold boot), the next acquires and
|
||||
sees the schema is current (no-op migrate).
|
||||
|
||||
### Why an advisory lock
|
||||
|
||||
Without it, concurrent `CREATE TABLE IF NOT EXISTS ...` statements from
|
||||
multiple replicas would race — Postgres usually handles it, but GORM's
|
||||
AutoMigrate also alters tables (adds columns, indexes) which can deadlock
|
||||
under concurrency.
|
||||
|
||||
The advisory lock pattern is a well-established solution; Rails, for
example, takes an advisory lock around its own migrations.
|
||||
|
||||
### The lock key
|
||||
|
||||
`1751412071` is a hardcoded integer in `internal/database/database.go`.
|
||||
Arbitrary but unique — as long as nothing else in the Postgres instance
|
||||
uses the same advisory lock key, no conflicts.
|
||||
|
||||
### First-boot behavior
|
||||
|
||||
On a **fresh database** (new Neon project), the first api pod runs
|
||||
through every model's `CREATE TABLE` statement. This is ~50 tables for
|
||||
honeyDue and takes ~90 seconds.
|
||||
|
||||
On a **warm database** (tables already exist), AutoMigrate is fast —
|
||||
typically under 2 seconds. It still runs (GORM checks every model
|
||||
against the schema) but finds no work to do.
|
||||
|
||||
### Where this bit us
|
||||
|
||||
With 3 api pods starting simultaneously and migrations taking 90s first
|
||||
time, the lock queue for the last replica is ~180s. We needed a
|
||||
startupProbe grace of 240s to cover this without false restart loops.
|
||||
See Chapter 7 §startupProbe and Chapter 19 §MigrateWithLock.
|
||||
|
||||
### Downside: no schema versioning
|
||||
|
||||
AutoMigrate can only *add* — new tables, new columns, new indexes. It
|
||||
won't drop columns, rename them, or change types destructively. For
|
||||
those we'd need raw SQL migrations (a tool like `golang-migrate` or
|
||||
`dbmate`).
|
||||
|
||||
Today: we accept that schema changes are additive-only. When we need
|
||||
destructive changes, we'd hand-write them.
|
||||
|
||||
## What's in the database
|
||||
|
||||
Major tables (see `honeyDueAPI-go/internal/models/`):
|
||||
|
||||
| Table | Purpose |
|---|---|
| `auth_user` | Users (Django legacy name kept for compatibility) |
| `user_userprofile` | Profile data |
| `authtoken_token` | API auth tokens |
| `residence_residence` | Properties users manage |
| `task_task` | Maintenance tasks |
| `task_taskcompletion` | Task completion history |
| `contractor_contractor` | Contractor contacts |
| `documents_document` | Document records (files in B2) |
| `notification_notification` | In-app notifications |
| `subscription_usersubscription` | IAP subscriptions |
| `admin_users` | Next.js admin panel users |
|
||||
|
||||
See `honeyDueAPI-go/docs/TASK_LOGIC_ARCHITECTURE.md` for the task logic
|
||||
model details.
|
||||
|
||||
## Backup and recovery
|
||||
|
||||
### Neon's built-in
|
||||
|
||||
Neon Launch includes **point-in-time recovery** within the last 24h
|
||||
(longer on Scale plan). To restore:
|
||||
|
||||
1. Go to Neon console → project → Backups
|
||||
2. Create a branch from a timestamp
|
||||
3. Point the app at the new branch (change `DB_HOST` in our ConfigMap)
|
||||
|
||||
Done. No tape-wrangling.
|
||||
|
||||
### What we don't have
|
||||
|
||||
- Off-site backup (if Neon itself is compromised or lost, we have no
  independent copy of the data). A nightly `pg_dump` to Backblaze B2 would
  close this gap. **TODO** (Chapter 20).
- Tested DR drills. We've never actually restored from a Neon backup
  into a new branch and pointed the app at it. It should be routine, but
  it hasn't been exercised.
|
||||
|
||||
## Migrations from old MyCrib/Casera data
|
||||
|
||||
honeyDue originally ran on a Django codebase (MyCrib / Casera-era). The
|
||||
schema inherits Django's naming (`app_model` table names, `_id` suffix
|
||||
foreign keys). The Go app's GORM models have `TableName()` methods that
|
||||
preserve this:
|
||||
|
||||
```go
|
||||
func (Task) TableName() string { return "task_task" }
|
||||
```
|
||||
|
||||
This isn't ideal (GORM's default `tasks` would be cleaner), but changing
|
||||
would require a migration that renames every table — more risk than
|
||||
value.
|
||||
|
||||
## Neon regions
|
||||
|
||||
Neon's default region for new projects is `aws-us-east-1` (Virginia).
|
||||
Our DB is there. Latency from Nuremberg to us-east-1 is **~90-120ms
|
||||
round trip**.
|
||||
|
||||
This is the slowest hop in our data flow. Every api request that needs
|
||||
a DB query (most of them) pays this latency at least once.
|
||||
|
||||
**When this matters**: When complex endpoints start showing ~200ms+
response times, DB round-trip latency is the likely dominant cost. Options:
- Migrate Neon to `aws-eu-central-1` (Frankfurt), which shaves ~90ms off each round trip
- Add Redis caching for hot reads (Chapter 7)
- Read replicas (Neon supports them on paid tiers)
|
||||
|
||||
## Environment variables the app reads
|
||||
|
||||
From ConfigMap:
|
||||
|
||||
| Var | Purpose |
|---|---|
| `DB_HOST` | Neon pooler hostname |
| `DB_PORT` | 5432 |
| `POSTGRES_USER` | `neondb_owner` |
| `POSTGRES_DB` | `honeyDue` |
| `DB_SSLMODE` | `require` |
| `DB_MAX_OPEN_CONNS` | 25 |
| `DB_MAX_IDLE_CONNS` | 10 |
| `DB_MAX_LIFETIME` | `600s` |
|
||||
|
||||
From Secret (`honeydue-secrets`):
|
||||
|
||||
| Var | Purpose |
|
||||
|---|---|
|
||||
| `POSTGRES_PASSWORD` | Neon DB password |
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
# Connect to Neon from workstation (requires psql + the password)
PGPASSWORD="<pw>" psql -h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
  -U neondb_owner -d honeyDue

# From a pod (lets you debug against the actual in-cluster network path)
kubectl exec -n honeydue -it deploy/api -- sh
# inside the pod (no psql by default, but wget + JSON API works)
wget -qO- http://127.0.0.1:8000/api/health/

# See current migration state (no direct CLI, but the api logs show it)
kubectl logs -n honeydue deploy/api | grep -i migration

# See active connections (SQL, so run it through psql against Neon)
PGPASSWORD="<pw>" psql -h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
  -U neondb_owner -d honeyDue -c "
    SELECT count(*), usename, state, application_name
    FROM pg_stat_activity
    GROUP BY usename, state, application_name;"
```
|
||||
|
||||
## References
|
||||
|
||||
- [Neon docs][neon-docs]
|
||||
- [Neon pricing][neon-pricing]
|
||||
- [Postgres advisory locks][pg-locks]
|
||||
- [GORM AutoMigrate][gorm-automigrate]
|
||||
- [honeyDue task architecture][task-arch] (repo-local)
|
||||
|
||||
[neon-docs]: https://neon.com/docs/introduction
|
||||
[neon-pricing]: https://neon.com/pricing
|
||||
[pg-locks]: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS
|
||||
[gorm-automigrate]: https://gorm.io/docs/migration.html
|
||||
[task-arch]: ../../docs/TASK_LOGIC_ARCHITECTURE.md
|
||||
@@ -0,0 +1,265 @@
|
||||
# 09 — Object Storage (Backblaze B2)
|
||||
|
||||
## Summary
|
||||
|
||||
User-uploaded files (photos, documents, task completion attachments) go
|
||||
to Backblaze B2 via its S3-compatible API. The Go API uses `minio-go/v7`
|
||||
as the client. This works around a Swarm-era problem where named volumes
|
||||
are per-node — uploads on node A were invisible to replicas on B and C.
|
||||
With k3s we could use a shared PVC instead, but B2 is cheaper, offsite,
|
||||
and already set up.
|
||||
|
||||
## Why Backblaze B2
|
||||
|
||||
### Decision matrix
|
||||
|
||||
| Option | Price per TB stored | Egress | Pros | Cons |
|---|---|---|---|---|
| **Backblaze B2** | **$6/mo** | $0.01/GB, free via CF | Cheap, hard spending caps, S3-compatible | Only a handful of regions |
| AWS S3 Standard | $23/mo | $0.09/GB | Most ubiquitous | Expensive |
| Cloudflare R2 | $15/mo | Free (!) | Zero egress, CF-native | Newer, fewer features |
| DigitalOcean Spaces | $5/mo for 250GB + $0.01/GB | Free 1TB, $0.01/GB after | Simple | Less reliable than AWS |
| Local PVC on k3s | $0 | $0 | Already in cluster | Per-node, no HA, no offsite |
|
||||
|
||||
B2 won because:
|
||||
1. **Hard spending cap** — unique in the industry. No surprise AWS bill.
|
||||
2. **Cheapest at rest** — 3–4× cheaper than S3.
|
||||
3. **Free egress through Cloudflare** — we already use CF; when we
|
||||
eventually serve upload URLs through CF, egress is free.
|
||||
4. **Mature S3-compatible API** — minio-go talks to it natively.
|
||||
|
||||
Rejected:
|
||||
- **R2** was the close second. Zero egress is amazing. Rejected
|
||||
primarily for inertia (B2 already set up in the MyCrib era). A future
|
||||
migration to R2 would be reasonable.
|
||||
- **Local PVC** doesn't work for our setup because we want uploads
|
||||
durable and accessible from any node/replica.
|
||||
|
||||
## Configuration
|
||||
|
||||
Bucket: `honeyDueProd` (mixed case; B2 allows this, minio-go handles it
|
||||
via path-style addressing — see §path-style below).
|
||||
|
||||
Region: `us-east-005` (B2's South Carolina region — closer to our
|
||||
Neon DB in AWS us-east-1 than the West Coast options).
|
||||
|
||||
Endpoint: `s3.us-east-005.backblazeb2.com`
|
||||
|
||||
### Environment variables
|
||||
|
||||
From ConfigMap:
|
||||
|
||||
| Var | Value |
|
||||
|---|---|
|
||||
| `B2_ENDPOINT` | `s3.us-east-005.backblazeb2.com` |
|
||||
| `B2_BUCKET_NAME` | `honeyDueProd` |
|
||||
| `B2_REGION` | `us-east-005` |
|
||||
| `B2_USE_SSL` | `true` (but see §vestigial var below) |
|
||||
|
||||
From Secret:
|
||||
|
||||
| Var | Value |
|
||||
|---|---|
|
||||
| `B2_KEY_ID` | App key ID (B2-specific identifier) |
|
||||
| `B2_APP_KEY` | App key secret |
|
||||
|
||||
### App key scope
|
||||
|
||||
The B2 app key is **bucket-scoped**, not account-scoped. Can only
|
||||
read/write the `honeyDueProd` bucket. Cannot:
|
||||
- List other buckets
|
||||
- Delete the bucket
|
||||
- Create new buckets
|
||||
- Touch account settings
|
||||
|
||||
This is the B2 equivalent of an IAM role with least privilege. If the
|
||||
key leaks, the damage is limited to the `honeyDueProd` bucket.
|
||||
|
||||
## The minio-go client
|
||||
|
||||
The Go app uses `github.com/minio/minio-go/v7` — a Go SDK compatible
|
||||
with any S3-flavored API. Relevant code at
|
||||
`internal/services/storage_backend_s3.go`:
|
||||
|
||||
```go
client, err := minio.New(endpoint, &minio.Options{
	Creds:  credentials.NewStaticV4(keyID, appKey, ""),
	Secure: useSSL,
	Region: region,
})
```
|
||||
|
||||
### Path-style vs virtual-hosted addressing
|
||||
|
||||
S3's URL scheme has two flavors:
|
||||
|
||||
- **Virtual-hosted**: `https://mybucket.s3.amazonaws.com/mykey`
|
||||
- **Path-style**: `https://s3.amazonaws.com/mybucket/mykey`
|
||||
|
||||
With virtual-hosted style, the bucket name becomes part of the hostname,
so it must be DNS-compatible, which rules out uppercase letters.
`honeyDueProd` fails this.
|
||||
|
||||
With path-style, the bucket name is just a URL path segment — any valid
|
||||
string works.
|
||||
|
||||
minio-go auto-detects: for AWS S3 it prefers virtual-hosted; for
|
||||
non-AWS endpoints (like B2) it defaults to path-style. So
|
||||
`honeyDueProd` with capital letters works transparently.
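
To make the difference concrete, the sketch below builds both URL shapes and checks a bucket name against a simplified DNS-label rule. The helper names and the object key are invented for illustration, and the regular expression is an approximation of the naming rule, not minio-go's actual detection logic.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsLabel approximates what virtual-hosted addressing needs: lowercase
// letters, digits and hyphens, starting and ending with a letter or digit.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`)

func isDNSCompatible(bucket string) bool { return dnsLabel.MatchString(bucket) }

// pathStyleURL puts the bucket in the URL path, where any casing is fine.
func pathStyleURL(endpoint, bucket, key string) string {
	return fmt.Sprintf("https://%s/%s/%s", endpoint, bucket, key)
}

// virtualHostedURL puts the bucket in the hostname.
func virtualHostedURL(endpoint, bucket, key string) string {
	return fmt.Sprintf("https://%s.%s/%s", bucket, endpoint, key)
}

func main() {
	const endpoint, bucket = "s3.us-east-005.backblazeb2.com", "honeyDueProd"
	key := "users/42/profile/example.jpg" // hypothetical object key

	fmt.Println("DNS-compatible:", isDNSCompatible(bucket))
	fmt.Println("path-style:    ", pathStyleURL(endpoint, bucket, key))
	fmt.Println("virtual-hosted:", virtualHostedURL(endpoint, bucket, key))
}
```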
|
||||
|
||||
## The `B2_USE_SSL` vestigial variable
|
||||
|
||||
`prod.env` has `B2_USE_SSL=true`. But the Go app's
|
||||
`internal/config/config.go:295` reads the env var
|
||||
`STORAGE_USE_SSL`, not `B2_USE_SSL`:
|
||||
|
||||
```go
|
||||
S3UseSSL: viper.GetString("STORAGE_USE_SSL") == "" || viper.GetBool("STORAGE_USE_SSL"),
|
||||
```
|
||||
|
||||
Whoever wrote the original config used `B2_USE_SSL` in `prod.env` and
|
||||
`STORAGE_USE_SSL` in the code. They don't match.
|
||||
|
||||
**Net effect**: The app reads `STORAGE_USE_SSL`, which is unset, so the
first clause (`== ""`) is true and the whole expression evaluates to
`true`. SSL is always on, whether `B2_USE_SSL` is `false`, `true`, or
anything else.
|
||||
|
||||
This is a dormant bug. Anyone setting `B2_USE_SSL=false` expecting to
|
||||
disable TLS would be surprised it stays on. Fortunately that's the
|
||||
right default for production B2 (which only accepts HTTPS anyway).
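
The behaviour can be reproduced in isolation. In this sketch `s3UseSSL` is a hypothetical stand-in for the config expression, and `strconv.ParseBool` replaces `viper.GetBool` (an assumption made so the example has no dependencies):

```go
package main

import (
	"fmt"
	"strconv"
)

// s3UseSSL mirrors the shape of the config expression: an empty value means
// "on", anything else is parsed as a boolean (unparseable values count as false).
func s3UseSSL(storageUseSSL string) bool {
	if storageUseSSL == "" {
		return true
	}
	b, err := strconv.ParseBool(storageUseSSL)
	return err == nil && b
}

func main() {
	// What an operator might set, expecting TLS to be turned off.
	env := map[string]string{"B2_USE_SSL": "false"}

	// The app never looks at B2_USE_SSL; it reads STORAGE_USE_SSL, which is absent.
	fmt.Println(s3UseSSL(env["STORAGE_USE_SSL"])) // prints: true
}
```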
|
||||
|
||||
**TODO**: Rename `STORAGE_USE_SSL` → `B2_USE_SSL` in the Go code to
|
||||
match the config. Documented in Chapter 19 §Vestigial config.
|
||||
|
||||
## What we store there
|
||||
|
||||
Today (limited rollout):
|
||||
- User profile photos
|
||||
- Task completion photos
|
||||
- Document uploads (PDFs, images attached to records)
|
||||
|
||||
File keys follow a hierarchy like:
|
||||
```
|
||||
users/<user_id>/profile/<uuid>.jpg
|
||||
residences/<residence_id>/documents/<uuid>.pdf
|
||||
tasks/<task_id>/completions/<uuid>.jpg
|
||||
```
|
||||
|
||||
Max file size is **10 MB** per upload (`STORAGE_MAX_FILE_SIZE=10485760`).
|
||||
Allowed MIME types: `image/jpeg`, `image/png`, `image/gif`, `image/webp`,
|
||||
`application/pdf` (`STORAGE_ALLOWED_TYPES`).
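
A minimal sketch of how those two limits could be enforced before anything is streamed to B2. `validateUpload` and the error values are illustrative names, not the app's actual handler code; the constants are the values quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Values quoted in this chapter.
const (
	maxFileSize  = 10485760 // STORAGE_MAX_FILE_SIZE (10 MB)
	allowedTypes = "image/jpeg,image/png,image/gif,image/webp,application/pdf"
)

var (
	errTooLarge = errors.New("file exceeds STORAGE_MAX_FILE_SIZE")
	errBadType  = errors.New("MIME type not in STORAGE_ALLOWED_TYPES")
)

// validateUpload rejects oversized files first, then checks the MIME type
// against the comma-separated allow list.
func validateUpload(size int64, mimeType string) error {
	if size > maxFileSize {
		return errTooLarge
	}
	for _, t := range strings.Split(allowedTypes, ",") {
		if strings.EqualFold(strings.TrimSpace(t), mimeType) {
			return nil
		}
	}
	return errBadType
}

func main() {
	fmt.Println(validateUpload(2<<20, "image/png"))  // 2 MiB PNG
	fmt.Println(validateUpload(11<<20, "image/png")) // 11 MiB PNG
	fmt.Println(validateUpload(1024, "video/mp4"))   // disallowed type
}
```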
|
||||
|
||||
## Access control
|
||||
|
||||
### Upload flow
|
||||
|
||||
1. Client POSTs to `/api/upload/`
|
||||
2. Go API validates the user is authenticated and authorized for the
|
||||
target resource
|
||||
3. Go API streams the upload to B2 via minio-go's `PutObject`
|
||||
4. B2 returns a key
|
||||
5. Go API stores the key in Postgres
|
||||
6. Returns the key to the client
|
||||
|
||||
The B2 bucket is **private**. Clients can't GET directly; they always
|
||||
go through the Go API.
|
||||
|
||||
### Download flow (current)
|
||||
|
||||
1. Client requests `/api/media/<key>`
|
||||
2. Go API checks the user can access this key
|
||||
3. Go API fetches from B2 and streams back to the client
|
||||
|
||||
This proxies every download through the api. For high-traffic media
|
||||
that's inefficient (api becomes an egress bottleneck).
|
||||
|
||||
### Future: signed URLs
|
||||
|
||||
We could generate time-limited signed URLs for B2 objects:
|
||||
|
||||
```go
|
||||
url, err := s3Client.PresignedGetObject(ctx, bucket, key, 1*time.Hour, nil)
|
||||
```
|
||||
|
||||
Returns a URL the client can GET directly from B2, scoped to a specific
|
||||
object, valid for 1h. Saves api bandwidth and latency.
|
||||
|
||||
Not yet implemented. TODO (Chapter 20).
|
||||
|
||||
## Lifecycle and retention
|
||||
|
||||
We have **no lifecycle rules** set on the bucket. Objects live forever
|
||||
unless the app deletes them.
|
||||
|
||||
When a user deletes their account, the app should delete their B2
|
||||
objects. This is currently not automated — a compliance gap for any
|
||||
"right to be forgotten" request.
|
||||
|
||||
**TODO** (Chapter 20): Either:
|
||||
- Implement explicit cleanup in the user deletion handler, or
|
||||
- Add B2 lifecycle rule tied to object metadata (tag objects with
|
||||
user ID; rule deletes tagged objects when user is soft-deleted)
|
||||
|
||||
## Backup of B2
|
||||
|
||||
We have no backup of B2 objects. B2 itself replicates within the region,
|
||||
but:
|
||||
- Accidental deletion via our app = data gone
|
||||
- B2 itself being compromised = data gone
|
||||
|
||||
B2 offers **Object Lock** (WORM — write once read many) which prevents
|
||||
deletion for a retention period. Not enabled; revisit if/when user data
|
||||
sensitivity justifies it.
|
||||
|
||||
## Cost projection
|
||||
|
||||
Current usage is **small** — estimated <50 GB stored.
|
||||
|
||||
```
50 GB × $0.006/GB = $0.30/mo storage
1 GB/mo egress (mostly uncached media served via api) → $0.01 (first
3× of stored amount is free anyway, so effectively $0)
```
|
||||
|
||||
Total B2 cost: **< $1/mo**. Hard spending cap set to $20/mo in B2
|
||||
console — if we ever breach that, something's wrong and we want to
|
||||
know immediately.
|
||||
|
||||
At 100k users each uploading ~10 MB average:
|
||||
- 1 TB stored = $6/mo
|
||||
- Egress depends on access patterns; with signed URLs served through CF
|
||||
the egress could still be ~free
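
The projection can be reproduced with a few lines of arithmetic. The rates are the ones quoted in this chapter; `monthlyCost` is a made-up helper, and real invoices may include charges (for example API transactions) that this sketch ignores.

```go
package main

import (
	"fmt"
	"math"
)

// Rates as quoted in this chapter.
const (
	storagePerGB = 0.006 // $ per GB-month ($6/TB)
	egressPerGB  = 0.01  // $ per GB beyond the free allowance
	freeEgressX  = 3.0   // egress up to 3x the stored volume is free
)

// monthlyCost returns the estimated monthly bill in dollars.
func monthlyCost(storedGB, egressGB float64) float64 {
	billableEgress := math.Max(0, egressGB-freeEgressX*storedGB)
	return storedGB*storagePerGB + billableEgress*egressPerGB
}

func main() {
	fmt.Printf("today (50 GB, 1 GB egress):   $%.2f/mo\n", monthlyCost(50, 1))
	fmt.Printf("100k users x 10 MB (1000 GB): $%.2f/mo\n", monthlyCost(1000, 0))
}
```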
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
# List bucket contents (requires mc or aws CLI configured with B2 creds)
mc alias set b2 https://s3.us-east-005.backblazeb2.com <KEY_ID> <APP_KEY>
mc ls b2/honeyDueProd/

# Count objects
mc find b2/honeyDueProd/ --type f | wc -l

# Download an object
mc cp b2/honeyDueProd/<key> ./

# Check B2 console for usage graphs:
# https://secure.backblaze.com/b2_buckets.htm
```
|
||||
|
||||
From inside a Go api pod:
|
||||
```bash
|
||||
# Check the in-cluster client config
|
||||
kubectl exec -n honeydue deploy/api -- env | grep B2_
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [Backblaze B2 docs][b2-docs]
|
||||
- [B2 S3-compatible API][b2-s3]
|
||||
- [minio-go/v7][minio-go]
|
||||
- [S3 path-style vs virtual-hosted][s3-style]
|
||||
|
||||
[b2-docs]: https://www.backblaze.com/docs/
|
||||
[b2-s3]: https://www.backblaze.com/docs/cloud-storage-s3-compatible-api
|
||||
[minio-go]: https://github.com/minio/minio-go
|
||||
[s3-style]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
|
||||
@@ -0,0 +1,369 @@
|
||||
# 10 — Secrets & Config
|
||||
|
||||
## Summary
|
||||
|
||||
Non-sensitive config (hostnames, ports, feature flags, etc.) lives in
|
||||
`honeydue-config` ConfigMap. Sensitive values (DB password, signing
|
||||
keys, API keys) live in `honeydue-secrets` and `honeydue-apns-key`
|
||||
Secrets. Container registry auth lives in `gitea-credentials` (type
|
||||
`kubernetes.io/dockerconfigjson`). This chapter maps every env var to
|
||||
its source and explains what's stored where.
|
||||
|
||||
## Structure
|
||||
|
||||
```mermaid
flowchart LR
    subgraph SourceWorkstation[Operator workstation]
        ProdEnv[deploy/prod.env]
        Secrets[deploy/secrets/*.txt]
        Registry[deploy/registry.env]
    end

    subgraph K8s[Kubernetes cluster]
        CM[honeydue-config<br/>ConfigMap]
        S1[honeydue-secrets<br/>Secret]
        S2[honeydue-apns-key<br/>Secret]
        S3[gitea-credentials<br/>Secret]
    end

    subgraph Pods
        Api[api pod]
        Admin[admin pod]
        Worker[worker pod]
    end

    ProdEnv -. kubectl create configmap<br/>--from-env-file .-> CM
    Secrets -. kubectl create secret<br/>--from-file/--from-literal .-> S1
    Secrets -. --from-file .-> S2
    Registry -. kubectl create secret docker-registry .-> S3

    CM -- envFrom --> Api & Admin & Worker
    S1 -- env: secretKeyRef --> Api & Worker
    S2 -- volumeMounts --> Api & Worker
    S3 -- imagePullSecrets --> Api & Admin & Worker
```
|
||||
|
||||
## ConfigMap: honeydue-config
|
||||
|
||||
Built from `deploy/prod.env` (minus sensitive keys). Contents (58 keys,
|
||||
abbreviated):
|
||||
|
||||
```
ADMIN_PANEL_URL=https://admin.myhoneydue.com
ALLOWED_HOSTS=api.myhoneydue.com,myhoneydue.com
APNS_AUTH_KEY_ID=DISABLED01
APNS_AUTH_KEY_PATH=/secrets/apns/apns_auth_key.p8
APNS_PRODUCTION=false
APNS_TEAM_ID=DISABLED01
APNS_TOPIC=com.tt.honeyDue
APNS_USE_SANDBOX=false
BASE_URL=https://myhoneydue.com
B2_BUCKET_NAME=honeyDueProd
B2_ENDPOINT=s3.us-east-005.backblazeb2.com
B2_REGION=us-east-005
B2_USE_SSL=true
CORS_ALLOWED_ORIGINS=https://myhoneydue.com,https://admin.myhoneydue.com
DAILY_DIGEST_HOUR=3
DB_HOST=ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech
DB_MAX_IDLE_CONNS=10
DB_MAX_LIFETIME=600s
DB_MAX_OPEN_CONNS=25
DB_PORT=5432
DB_SSLMODE=require
DEBUG=false
DEFAULT_FROM_EMAIL=noreply@myhoneydue.com
EMAIL_HOST=smtp.fastmail.com
EMAIL_HOST_USER=treytartt@fastmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=true
FEATURE_EMAIL_ENABLED=true
FEATURE_ONBOARDING_EMAILS_ENABLED=true
FEATURE_PDF_REPORTS_ENABLED=true
FEATURE_PUSH_ENABLED=false
FEATURE_WEBHOOKS_ENABLED=true
FEATURE_WORKER_ENABLED=true
NEXT_PUBLIC_API_URL=https://api.myhoneydue.com
OVERDUE_REMINDER_HOUR=15
PORT=8000
POSTGRES_DB=honeyDue
POSTGRES_USER=neondb_owner
REDIS_DB=0
REDIS_URL=redis://redis:6379/0
STATIC_DIR=/app/static
STORAGE_ALLOWED_TYPES=image/jpeg,image/png,image/gif,image/webp,application/pdf
STORAGE_BASE_URL=/uploads
STORAGE_MAX_FILE_SIZE=10485760
STORAGE_UPLOAD_DIR=/app/uploads
TASK_REMINDER_HOUR=14
TIMEZONE=UTC
```
|
||||
|
||||
Plus empty-but-declared keys for optional integrations (Apple/Google
|
||||
auth + IAP).
|
||||
|
||||
### How pods use it
|
||||
|
||||
```yaml
envFrom:
  - configMapRef:
      name: honeydue-config
```
|
||||
|
||||
Every key in the ConfigMap becomes an env var in the container.
|
||||
`envFrom` is bulk — no need to enumerate each one.
|
||||
|
||||
### Changing config
|
||||
|
||||
Edit `deploy/prod.env` locally, regenerate the ConfigMap:
|
||||
|
||||
```bash
# Simplified; see scripts for the full version.
# Caution: --from-env-file copies every key in the file, so strip the
# sensitive keys from prod.env first (the full script filters them out).
kubectl create configmap honeydue-config -n honeydue \
  --from-env-file=deploy/prod.env \
  --dry-run=client -o yaml | kubectl apply -f -

# Pods don't auto-reload env vars. Restart to pick up changes:
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
```
|
||||
|
||||
## Secret: honeydue-secrets (Opaque)
|
||||
|
||||
9 keys:
|
||||
|
||||
| Key | Purpose |
|---|---|
| `POSTGRES_PASSWORD` | Neon DB password |
| `SECRET_KEY` | Django-compat signing key (64 chars, base64) |
| `EMAIL_HOST_PASSWORD` | Fastmail app password |
| `FCM_SERVER_KEY` | FCM push key (currently placeholder, push disabled) |
| `REDIS_PASSWORD` | Empty (no auth on in-cluster Redis) |
| `B2_KEY_ID` | Backblaze B2 app key ID |
| `B2_APP_KEY` | Backblaze B2 app key secret |
| `ADMIN_EMAIL` | Next.js admin panel initial admin email |
| `ADMIN_PASSWORD` | Next.js admin panel initial admin password |
|
||||
|
||||
### How pods use it
|
||||
|
||||
Individual `env:` entries wire specific Secret keys to env vars:
|
||||
|
||||
```yaml
env:
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: honeydue-secrets
        key: POSTGRES_PASSWORD
  - name: SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: honeydue-secrets
        key: SECRET_KEY
  # ... etc
```
|
||||
|
||||
This pattern (vs. `envFrom: secretRef:`) is more explicit — you know
|
||||
exactly which secret keys a pod uses by reading the manifest.
|
||||
|
||||
### ADMIN_PASSWORD — one-time use
|
||||
|
||||
The Go app's `internal/database/database.go:519-538` reads
|
||||
`ADMIN_EMAIL` + `ADMIN_PASSWORD` at startup. If the `admin_users` table
|
||||
doesn't have a row for that email, it inserts one with a bcrypt hash of
|
||||
the password. Already-existing rows are **not** updated.
|
||||
|
||||
So:
|
||||
- First deploy: admin user created
|
||||
- Subsequent deploys: no-op
|
||||
- If you want to rotate the initial admin password: do it in the admin
|
||||
panel UI, not by changing `ADMIN_PASSWORD`
|
||||
|
||||
After first deploy you can technically blank `ADMIN_PASSWORD` in the
|
||||
Secret. Leaving it set is harmless but slightly messy.
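
The insert-if-absent behaviour can be sketched as follows. This is not the code at `database.go:519-538`; `seedAdmin` is a hypothetical helper, the map stands in for the `admin_users` table, and the `hash` parameter stands in for bcrypt.

```go
package main

import "fmt"

// seedAdmin inserts the bootstrap admin only when no row exists for the email.
// It reports whether a row was created; existing rows are never updated.
func seedAdmin(users map[string]string, email, password string, hash func(string) string) bool {
	if _, exists := users[email]; exists {
		return false
	}
	users[email] = hash(password)
	return true
}

func main() {
	// Placeholder only; it is NOT a real password hash.
	hash := func(p string) string { return "hashed:" + p }
	users := map[string]string{}

	fmt.Println(seedAdmin(users, "admin@example.com", "first", hash))  // first deploy
	fmt.Println(seedAdmin(users, "admin@example.com", "second", hash)) // later deploy, changed ADMIN_PASSWORD
	fmt.Println(users["admin@example.com"])                            // still derived from "first"
}
```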
|
||||
|
||||
## Secret: honeydue-apns-key (Opaque)
|
||||
|
||||
One file: `apns_auth_key.p8`. Mounted as a volume into api and worker
|
||||
pods at `/secrets/apns/apns_auth_key.p8` (read-only).
|
||||
|
||||
Push is currently **disabled** (`FEATURE_PUSH_ENABLED=false`), so
|
||||
this `.p8` is a throwaway EC P-256 private key generated by
|
||||
`openssl genpkey`. It passes the Go app's "does this file contain
|
||||
`BEGIN PRIVATE KEY`" validation but cannot authenticate against Apple.
|
||||
|
||||
When push is enabled:
|
||||
|
||||
1. Generate a real APNs auth key in Apple Developer console
|
||||
2. Replace `deploy/secrets/apns_auth_key.p8`
|
||||
3. Update `APNS_AUTH_KEY_ID`, `APNS_TEAM_ID`, `APNS_TOPIC` in ConfigMap
|
||||
4. `kubectl create secret generic honeydue-apns-key ... --dry-run=client -o yaml | kubectl apply -f -`
|
||||
5. Set `FEATURE_PUSH_ENABLED=true`
|
||||
6. `kubectl rollout restart` api and worker
|
||||
|
||||
## Secret: gitea-credentials (docker-registry)
|
||||
|
||||
Type `kubernetes.io/dockerconfigjson`. Contains a base64-encoded Docker
|
||||
config for Gitea registry auth.
|
||||
|
||||
Created via:
|
||||
|
||||
```bash
|
||||
kubectl create secret docker-registry gitea-credentials \
|
||||
--namespace=honeydue \
|
||||
--docker-server=gitea.treytartt.com \
|
||||
--docker-username=admin \
|
||||
--docker-password=<gitea PAT> \
|
||||
--dry-run=client -o yaml | kubectl apply -f -
|
||||
```
|
||||
|
||||
Referenced in every deployment that pulls from Gitea:
|
||||
|
||||
```yaml
spec:
  imagePullSecrets:
    - name: gitea-credentials
```
|
||||
|
||||
When a pod needs to pull an image, the kubelet reads this secret and
|
||||
uses it for the registry authentication.
|
||||
|
||||
## Source files — what's canonical
|
||||
|
||||
The Swarm-era files are still the **source of truth** for secrets:
|
||||
|
||||
| File | Contents | Canonical? |
|---|---|---|
| `deploy/prod.env` | All non-sensitive config (plus the B2 and admin credentials that the setup script routes into `honeydue-secrets`) | Yes |
| `deploy/secrets/postgres_password.txt` | Neon DB password | Yes |
| `deploy/secrets/secret_key.txt` | App signing key | Yes |
| `deploy/secrets/email_host_password.txt` | Fastmail password | Yes |
| `deploy/secrets/fcm_server_key.txt` | FCM key (placeholder) | Yes |
| `deploy/secrets/apns_auth_key.p8` | APNs key (placeholder) | Yes |
| `deploy/registry.env` | Gitea registry auth | Yes |
| `deploy-k3s/manifests/secrets.yaml.example` | Template only (never committed with real values) | No (template) |
| In-cluster Secrets | Live state | Derived |
|
||||
|
||||
### Why canonical lives in `deploy/` not `deploy-k3s/`
|
||||
|
||||
Historical. We migrated from Swarm to k3s but kept the source files
|
||||
untouched. Rather than move them now (and break any remaining Swarm-era
|
||||
tooling), we use them from the k3s setup scripts as-is.
|
||||
|
||||
Future cleanup: move to `deploy-k3s/secrets/` for better provenance.
|
||||
|
||||
## Recreating the cluster secrets
|
||||
|
||||
If the k3s cluster is rebuilt, the Secrets need to be recreated from the
|
||||
local source files. Rough procedure:
|
||||
|
||||
```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml

# Namespace first
kubectl create namespace honeydue

# Docker config secret for Gitea
set -a; source deploy/registry.env; set +a
kubectl create secret docker-registry gitea-credentials \
  -n honeydue \
  --docker-server="$REGISTRY" \
  --docker-username="$REGISTRY_USERNAME" \
  --docker-password="$REGISTRY_TOKEN"

# Main secrets bundle
set -a; source deploy/prod.env; set +a
kubectl create secret generic honeydue-secrets -n honeydue \
  --from-literal=POSTGRES_PASSWORD="$(tr -d '\n' < deploy/secrets/postgres_password.txt)" \
  --from-literal=SECRET_KEY="$(tr -d '\n' < deploy/secrets/secret_key.txt)" \
  --from-literal=EMAIL_HOST_PASSWORD="$(tr -d '\n' < deploy/secrets/email_host_password.txt)" \
  --from-literal=FCM_SERVER_KEY="$(tr -d '\n' < deploy/secrets/fcm_server_key.txt)" \
  --from-literal=REDIS_PASSWORD="" \
  --from-literal=B2_KEY_ID="$B2_KEY_ID" \
  --from-literal=B2_APP_KEY="$B2_APP_KEY" \
  --from-literal=ADMIN_EMAIL="$ADMIN_EMAIL" \
  --from-literal=ADMIN_PASSWORD="$ADMIN_PASSWORD"

# APNS key Secret
kubectl create secret generic honeydue-apns-key -n honeydue \
  --from-file=apns_auth_key.p8=deploy/secrets/apns_auth_key.p8

# ConfigMap from prod.env (minus secret keys)
# See deploy-k3s/scripts/02-setup-secrets.sh for the full version
# Simplified:
declare -a args
secret_keys="POSTGRES_PASSWORD SECRET_KEY EMAIL_HOST_PASSWORD FCM_SERVER_KEY REDIS_PASSWORD B2_KEY_ID B2_APP_KEY ADMIN_EMAIL ADMIN_PASSWORD"
while IFS='=' read -r k v; do
  [[ -z "$k" || "$k" =~ ^# ]] && continue
  for sk in $secret_keys; do [[ "$k" == "$sk" ]] && continue 2; done
  args+=(--from-literal="$k=$v")
done < deploy/prod.env
kubectl create configmap honeydue-config -n honeydue "${args[@]}"
```
|
||||
|
||||
The full version with all edge cases is in
`deploy-k3s/scripts/02-setup-secrets.sh`. That script was written when we
assumed GHCR as the registry, so adapt the registry step for Gitea.
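
For readers more comfortable in Go than bash, the filtering loop at the end of the script can be sketched like this. `configMapEntries` is a hypothetical helper that mirrors the loop's intent (skip blanks, comments, and the nine secret keys); unlike the bash version it also skips lines that contain no `=`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// secretKeys lists the keys that belong in honeydue-secrets, not the ConfigMap.
var secretKeys = map[string]bool{
	"POSTGRES_PASSWORD": true, "SECRET_KEY": true, "EMAIL_HOST_PASSWORD": true,
	"FCM_SERVER_KEY": true, "REDIS_PASSWORD": true, "B2_KEY_ID": true,
	"B2_APP_KEY": true, "ADMIN_EMAIL": true, "ADMIN_PASSWORD": true,
}

// configMapEntries returns the KEY=VALUE lines of an env file that are safe
// to place in the ConfigMap.
func configMapEntries(envFile string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(envFile))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Split on the first '=' only, so values may themselves contain '='.
		key, value, ok := strings.Cut(line, "=")
		if !ok || secretKeys[key] {
			continue
		}
		out = append(out, key+"="+value)
	}
	return out
}

func main() {
	sample := "# database\nPORT=8000\n\nB2_APP_KEY=not-for-configmap\nREDIS_URL=redis://redis:6379/0\n"
	for _, e := range configMapEntries(sample) {
		fmt.Println(e)
	}
}
```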
|
||||
|
||||
## Pitfalls
|
||||
|
||||
### Trailing newlines in secret files
|
||||
|
||||
Secret files created by text editors typically end with a newline. If
we pass the content directly, the newline becomes part of the secret,
which then no longer matches the real credential.
|
||||
|
||||
We strip trailing newlines with `tr -d '\n'` before creating Secrets.
|
||||
If you forget, your DB password will be silently wrong.
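
A tiny demonstration of why the stray newline matters. `cleanSecret` is an illustrative helper; note that it trims only trailing line endings, whereas `tr -d '\n'` removes every newline in the file (equivalent for single-line secrets).

```go
package main

import (
	"fmt"
	"strings"
)

// cleanSecret drops trailing line endings left behind by text editors.
func cleanSecret(raw string) string {
	return strings.TrimRight(raw, "\r\n")
}

func main() {
	raw := "s3cret\n" // file content as saved by a typical editor
	want := "s3cret"  // what the database actually expects

	fmt.Println("raw matches:    ", raw == want)
	fmt.Println("cleaned matches:", cleanSecret(raw) == want)
}
```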
|
||||
|
||||
### Case sensitivity on POSTGRES_DB
|
||||
|
||||
`POSTGRES_DB=honeyDue` must be exactly `honeyDue`. `honeydue` (lowercase)
|
||||
fails with `database "honeydue" does not exist`. Postgres identifiers
|
||||
are case-sensitive if originally quoted at CREATE time.
|
||||
|
||||
### Placeholder detection
|
||||
|
||||
The Swarm-era deploy script rejected values containing `CHANGEME`,
|
||||
`your-`, `paste_here`, etc. When setting up the k3s cluster we had to
|
||||
strip those from `prod.env` first. If you ever see a pod error about
|
||||
"invalid host" or "invalid key id", check if a placeholder leaked
|
||||
through.
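
A quick pre-flight check for leftover placeholders could look like the sketch below. The marker list contains only the three examples named above (the original script's full list is not reproduced here), matching is case-sensitive by assumption, and `looksLikePlaceholder` is a made-up name.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholderMarkers holds the example markers mentioned in this chapter;
// the Swarm-era script's complete list may differ.
var placeholderMarkers = []string{"CHANGEME", "your-", "paste_here"}

// looksLikePlaceholder reports whether a config value still contains a marker.
func looksLikePlaceholder(value string) bool {
	for _, m := range placeholderMarkers {
		if strings.Contains(value, m) {
			return true
		}
	}
	return false
}

func main() {
	for _, v := range []string{"smtp.fastmail.com", "CHANGEME_db_host", "your-key-id"} {
		fmt.Printf("%-20s placeholder=%v\n", v, looksLikePlaceholder(v))
	}
}
```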
|
||||
|
||||
### B2_USE_SSL vs STORAGE_USE_SSL
|
||||
|
||||
The config has `B2_USE_SSL` but the Go code reads `STORAGE_USE_SSL`.
|
||||
See Chapter 9 §Vestigial variable. Setting `B2_USE_SSL=false` in the
|
||||
ConfigMap does nothing; SSL stays on.
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
# Print a ConfigMap as env-file format
kubectl get cm honeydue-config -n honeydue \
  -o go-template='{{range $k, $v := .data}}{{$k}}={{$v}}{{"\n"}}{{end}}'

# Edit a ConfigMap interactively (DOES NOT restart pods)
kubectl edit cm honeydue-config -n honeydue

# After editing a ConfigMap, restart pods to pick up the change
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker

# View a Secret (prints base64; decode with base64 -d)
kubectl get secret honeydue-secrets -n honeydue -o yaml

# Reveal a specific secret value (DANGER: plaintext to stdout)
kubectl get secret honeydue-secrets -n honeydue \
  -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d

# Update a single secret key (printf avoids a trailing newline;
# tr removes base64 line wrapping on long values)
kubectl patch secret honeydue-secrets -n honeydue \
  --type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(printf '%s' 'newvalue' | base64 | tr -d '\n')\"}}"
```
|
||||
|
||||
## References
|
||||
|
||||
- [Kubernetes ConfigMaps][cm]
|
||||
- [Kubernetes Secrets][secret]
|
||||
- [Secret types][secret-types]
|
||||
|
||||
[cm]: https://kubernetes.io/docs/concepts/configuration/configmap/
|
||||
[secret]: https://kubernetes.io/docs/concepts/configuration/secret/
|
||||
[secret-types]: https://kubernetes.io/docs/concepts/configuration/secret/#secret-types
|
||||
@@ -0,0 +1,329 @@
|
||||
# 11 — Container Registry (Gitea)
|
||||
|
||||
## Summary
|
||||
|
||||
We host our own container registry on Gitea at `gitea.treytartt.com`.
|
||||
Every image push and pull goes there, not Docker Hub or GHCR. The Gitea
|
||||
instance runs outside this k3s cluster (on its own VPS) and is available
|
||||
at `https://gitea.treytartt.com` with public HTTPS. Image pulls are
|
||||
authenticated via a Personal Access Token stored as a Kubernetes
|
||||
`kubernetes.io/dockerconfigjson` Secret.
|
||||
|
||||
## Why Gitea
|
||||
|
||||
### Decision matrix
|
||||
|
||||
| Option | Cost | Auth model | Pros | Cons |
|---|---|---|---|---|
| **Gitea built-in registry** | $0 (already running Gitea) | Gitea PAT | Self-hosted, integrated with code | Another service to maintain |
| GHCR (GitHub Container Registry) | Free for public, $0 for private with paid plan | GitHub PAT | Popular, reliable | Uses GitHub; vendor dependency |
| Docker Hub | Free tier limited; paid $5-7/mo | Docker Hub account | Ubiquitous | Rate limits on anonymous pulls |
| AWS ECR | ~$1/mo for small use | IAM | Integrates with AWS workloads | AWS account required |
| Harbor (self-hosted) | $0 | Many options | Best enterprise features | Heavy to operate |
|
||||
|
||||
Gitea won primarily because **the operator was already running Gitea for
|
||||
code hosting**. Container registry is built into Gitea 1.17+ as a free
|
||||
feature. One fewer service to set up.
|
||||
|
||||
Side benefits:
|
||||
- Code and images live together (one backup policy, one access model)
|
||||
- PATs are scoped and rotatable via the same UI
|
||||
- No external vendor to worry about for this critical piece of the
|
||||
deploy pipeline
|
||||
|
||||
Rejected alternatives:
|
||||
- **Docker Hub** — rate limits on unauthenticated pulls would bite us if
|
||||
nodes pull the same image repeatedly during rolling updates
|
||||
- **GHCR** — fine but adds GitHub dependency we don't otherwise have
|
||||
- **Harbor** — massive overkill; we're not a 100-team enterprise
|
||||
|
||||
## Layout
|
||||
|
||||
Images live under the authenticated user's namespace:
|
||||
|
||||
```
|
||||
gitea.treytartt.com/admin/honeydue-api:237c6b8
|
||||
gitea.treytartt.com/admin/honeydue-worker:237c6b8
|
||||
gitea.treytartt.com/admin/honeydue-admin:237c6b8
|
||||
```
|
||||
|
||||
`admin` is the Gitea user that owns the images. Images are **private**
|
||||
by default.
|
||||
|
||||
### Image tagging strategy
|
||||
|
||||
Tags are git short SHAs (e.g., `237c6b8`). Not `:latest`. Not semantic
versions.
|
||||
|
||||
Rationale:
|
||||
- `:latest` is ambiguous — which build? Rolling updates should roll a
|
||||
*specific* tag so rollbacks are deterministic.
|
||||
- `:v1.2.3` works for released libraries but our app rolls forward
|
||||
continuously; versioning per deploy is unnecessary overhead.
|
||||
- Git SHAs are unique, immutable, and tie each image to the exact
|
||||
commit that built it.
|
||||
|
||||
`PUSH_LATEST_TAG=false` is set in `deploy/cluster.env`. When we rebuild
|
||||
and push, only the SHA tag gets pushed. The `latest` tag is never
|
||||
created by our deploy pipeline.
|
||||
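The tagging rule can be sketched as a small helper that builds a full image reference and refuses mutable tags. `image_ref` is a hypothetical name, and the defaults below simply mirror the registry host and namespace shown in this chapter; the real deploy script may assemble references differently.

```bash
#!/usr/bin/env bash
# Assumed defaults mirror the values shown in this chapter.
REGISTRY=${REGISTRY:-gitea.treytartt.com}
REGISTRY_NAMESPACE=${REGISTRY_NAMESPACE:-admin}

# Hypothetical helper: build the full image reference for one component.
# Usage: image_ref <api|worker|admin> <tag>
image_ref() {
  local component=$1 tag=$2
  # Refuse mutable tags so every deploy stays pinned to a commit.
  if [[ -z "$tag" || "$tag" == "latest" ]]; then
    echo "image_ref: refusing empty or 'latest' tag" >&2
    return 1
  fi
  printf '%s/%s/honeydue-%s:%s\n' "$REGISTRY" "$REGISTRY_NAMESPACE" "$component" "$tag"
}
```

A typical call would be `image_ref api "$(git rev-parse --short HEAD)"`, whose output can be passed to `docker buildx build -t` and later to `kubectl set image`.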
|
||||
## Authentication
|
||||
|
||||
### Creating the PAT
|
||||
|
||||
At <https://gitea.treytartt.com/-/user/settings/applications>, we created
|
||||
a token with scopes:
|
||||
|
||||
- `read:package`
|
||||
- `write:package`
|
||||
|
||||
No other scopes. This token can only interact with the package registry; it
|
||||
can't read repo contents, create issues, or touch account settings.
|
||||
|
||||
### PAT on the operator workstation
|
||||
|
||||
Stored in `deploy/registry.env`:
|
||||
|
||||
```
|
||||
REGISTRY=gitea.treytartt.com
|
||||
REGISTRY_NAMESPACE=admin
|
||||
REGISTRY_USERNAME=admin
|
||||
REGISTRY_TOKEN=<pat>
|
||||
```
|
||||
|
||||
This file is `.gitignore`d in `deploy/.gitignore`. If it ever gets
|
||||
committed accidentally, rotate the PAT immediately.
|
||||
|
||||
### PAT in the cluster
|
||||
|
||||
Stored as the `gitea-credentials` Secret (type `kubernetes.io/dockerconfigjson`) in
|
||||
the `honeydue` namespace. See Chapter 10.
|
||||
|
||||
Kubelet reads this Secret when a pod needs to pull from the Gitea
|
||||
registry.
|
||||
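For orientation, the payload inside such a Secret is a small JSON document keyed by registry host. The sketch below rebuilds that shape by hand; `dockerconfigjson` is a hypothetical helper name, and the function assumes its arguments contain no double quotes or backslashes. In practice `kubectl create secret docker-registry` generates this for us.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the JSON stored under the ".dockerconfigjson"
# key of a kubernetes.io/dockerconfigjson Secret.
# Assumes the arguments contain no double quotes or backslashes.
dockerconfigjson() {
  local server=$1 user=$2 token=$3 auth
  # "auth" is base64("user:token"); tr removes any line wrapping.
  auth=$(printf '%s:%s' "$user" "$token" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$user" "$token" "$auth"
}
```

To compare against what is actually in the cluster, the stored value can be decoded with `kubectl get secret gitea-credentials -n honeydue -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d`. That prints the PAT in plaintext, so treat the output accordingly.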
|
||||
## The build pipeline
|
||||
|
||||
### Dockerfile multi-stage
|
||||
|
||||
`honeyDueAPI-go/Dockerfile` has three target stages:
|
||||
|
||||
- `api` — compiled Go binary + static assets for the HTTP API
|
||||
- `worker` — compiled Go binary for the background worker
|
||||
- `admin` — Next.js standalone build of the admin panel
|
||||
|
||||
A single Dockerfile keeps build-cache sharing efficient (the Go builder
|
||||
stage produces binaries for both api and worker; admin reuses its own
|
||||
Node builder stage).
|
||||
|
||||
### Multi-arch cross-compilation
|
||||
|
||||
The operator workstation is **arm64** (Apple Silicon). The Hetzner nodes
|
||||
are **x86_64**. A naive `docker build` on arm64 produces arm64 images
|
||||
that won't run on the nodes (`exec format error`).
|
||||
|
||||
The deploy pipeline uses `docker buildx`:
|
||||
|
||||
```bash
|
||||
docker buildx build \
|
||||
--platform linux/amd64 \
|
||||
--target api \
|
||||
-t gitea.treytartt.com/admin/honeydue-api:$SHA \
|
||||
--push \
|
||||
/Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
|
||||
```
|
||||
|
||||
- **`--platform linux/amd64`** — cross-compile to x86_64
|
||||
- **`--target api`** — which Dockerfile stage to build
|
||||
- **`--push`** — push directly to the registry (skip local image cache)
|
||||
|
||||
The Go stages use the `TARGETARCH` build arg to produce the right
|
||||
architecture binary. Node stages use QEMU emulation (which is slower but
|
||||
acceptable for our ~1 min admin build).
|
||||
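A small guard can make the architecture mismatch explicit before a build starts. This is a sketch under assumptions: `docker_platform` is a hypothetical helper, and the target platform is hard-coded to the value this chapter gives for the Hetzner nodes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a `uname -m` value to a Docker platform string.
docker_platform() {
  case "$1" in
    x86_64|amd64)   echo "linux/amd64" ;;
    arm64|aarch64)  echo "linux/arm64" ;;
    *)              echo "unknown"; return 1 ;;
  esac
}

TARGET_PLATFORM="linux/amd64"   # the Hetzner nodes, per this chapter
host_platform=$(docker_platform "$(uname -m)") || host_platform="unknown"
if [[ "$host_platform" != "$TARGET_PLATFORM" ]]; then
  echo "host is ${host_platform}; building for ${TARGET_PLATFORM} needs --platform"
fi
```

After a push, `docker buildx imagetools inspect <image-ref>` lists the platforms recorded in the pushed manifest, which is a quick way to confirm the image really is `linux/amd64`.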
|
||||
### Buildx builder
|
||||
|
||||
We use a named buildx builder to keep state out of Docker's default
|
||||
environment:
|
||||
|
||||
```bash
|
||||
docker buildx create --name honeydue-builder --use
|
||||
docker buildx inspect --bootstrap
|
||||
```
|
||||
|
||||
`honeydue-builder` uses the `docker-container` driver: BuildKit runs in
its own container, started on first use and kept until it is stopped or
removed (`docker buildx stop` / `docker buildx rm`). The driver supports
multi-platform builds and caches layers across builds.
|
||||
|
||||
## From local file to cluster — the full path
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph dev[Operator workstation]
|
||||
Code[Source code]
|
||||
Dockerfile
|
||||
Buildx[docker buildx]
|
||||
end
|
||||
subgraph Gitea[gitea.treytartt.com]
|
||||
Reg[Package registry]
|
||||
end
|
||||
subgraph K8s[k3s cluster]
|
||||
Kubelet
|
||||
Containerd
|
||||
Pod
|
||||
end
|
||||
|
||||
Code --> Dockerfile
|
||||
Dockerfile --> Buildx
|
||||
Buildx -- push --> Reg
|
||||
Reg -- pull --> Kubelet
|
||||
Kubelet --> Containerd
|
||||
Containerd --> Pod
|
||||
```
|
||||
|
||||
### End-to-end
|
||||
|
||||
1. **Operator commits code**: the change lands on `main` locally
|
||||
2. **Operator builds + pushes image**: `docker buildx build --push ...`
|
||||
from the repo root. Build takes 1–3 minutes first time, seconds on
|
||||
warm cache.
|
||||
3. **Image lands in Gitea**: visible at
|
||||
`https://gitea.treytartt.com/admin/-/packages/container/honeydue-api`
|
||||
4. **Operator updates Deployment**: `kubectl set image deployment/api
|
||||
api=gitea.treytartt.com/admin/honeydue-api:$NEW_SHA -n honeydue`
|
||||
5. **K8s begins rolling update**: creates new ReplicaSet with new image
|
||||
6. **Kubelet on target node** sees a pod with an image it doesn't have
|
||||
7. **Kubelet calls containerd**: "pull this image using these creds"
|
||||
8. **Containerd authenticates** to Gitea registry using the PAT from
|
||||
`gitea-credentials` Secret, downloads the image
|
||||
9. **Containerd starts the container** with the new image
|
||||
10. **Readiness probe passes**: new pod joins the Service endpoints
|
||||
11. **Kubelet tears down** an old pod
|
||||
|
||||
## Pushing manually
|
||||
|
||||
If you need to push a one-off image (e.g., testing a fix):
|
||||
|
||||
```bash
|
||||
# Login (once per session)
|
||||
set -a; source deploy/registry.env; set +a
|
||||
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
|
||||
|
||||
# Build + push
|
||||
cd honeyDueAPI-go
|
||||
SHA=$(git rev-parse --short HEAD)
|
||||
docker buildx build \
|
||||
--platform linux/amd64 \
|
||||
--target api \
|
||||
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
|
||||
--push .
|
||||
|
||||
# Logout (don't leave creds in ~/.docker/config.json)
|
||||
docker logout gitea.treytartt.com
|
||||
```
|
||||
|
||||
## Image sizes
|
||||
|
||||
Current images:
|
||||
|
||||
| Image | Size | Contents |
|---|---|---|
| `honeydue-api` | ~53 MB | Alpine base + Go binary |
| `honeydue-worker` | ~50 MB | Alpine base + Go binary |
| `honeydue-admin` | ~150 MB | Node 20 alpine + Next.js standalone |
|
||||
|
||||
The Go binaries are statically compiled (`CGO_ENABLED=0`). Alpine is the
base image, chosen for its small footprint.
|
||||
|
||||
## Image retention
|
||||
|
||||
Gitea does **not auto-prune** images. Every `:<sha>` tag accumulates
|
||||
forever. The package page at
|
||||
`https://gitea.treytartt.com/admin/-/packages/container/honeydue-api`
|
||||
lists them all.
|
||||
|
||||
At the current pace (a few deploys per week, images ~50-150 MB each),
this grows ~10 GB/year. Not critical; an 80 GB node disk can take years.
|
||||
|
||||
**TODO**: Add a monthly cleanup: delete all but last 30 tags per image.
|
||||
Can be a cron job or a manual quarterly cleanup.
|
||||
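The selection half of that cleanup is simple enough to sketch. `tags_to_prune` is a hypothetical helper; it assumes the caller supplies tags already ordered newest first, because git short SHAs do not sort chronologically on their own.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read tags on stdin, newest first, and print the ones
# that fall outside the newest KEEP entries (default 30, per the TODO).
tags_to_prune() {
  local keep=${1:-30}
  # tail -n +K starts printing at line K, so this skips the first $keep lines.
  tail -n "+$((keep + 1))"
}
```

Each tag it prints could then be fed to the package DELETE call shown in the cheat sheet. Before deleting anything, it would be prudent to exclude tags that a Deployment currently references, since rollbacks depend on them.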
|
||||
## Image verification — not yet
|
||||
|
||||
We do not sign images or verify signatures. An attacker who compromised
|
||||
Gitea could push a malicious image under an existing tag (though Gitea
|
||||
should prevent tag reuse if immutable tags are configured).
|
||||
|
||||
**TODO** (Chapter 20): Add [cosign](https://github.com/sigstore/cosign)
|
||||
for signing at build time + `Kyverno` or `Connaisseur` policy to verify
|
||||
at pull time.
|
||||
|
||||
## Gitea registry itself
|
||||
|
||||
The Gitea instance runs outside this k3s cluster on its own VPS
|
||||
(operator's existing infrastructure). It's **not** part of the honeyDue
|
||||
deployment — it's adjacent infrastructure.
|
||||
|
||||
If the Gitea host goes down:
|
||||
- Currently-running pods keep working (they already pulled their images)
|
||||
- New deployments/scale-ups fail at the image-pull step
|
||||
- No impact on existing user traffic
|
||||
|
||||
This is an acceptable external dependency. Gitea host has its own
|
||||
uptime story.
|
||||
|
||||
## Cost
|
||||
|
||||
**$0/mo.** Gitea registry is included in the Gitea install we already
|
||||
pay the VPS for (not accounted to honeyDue's cost).
|
||||
|
||||
If we ever switched to GHCR, cost would still be $0 for public images
|
||||
or bundled with our (nonexistent) GitHub Team subscription.
|
||||
|
||||
## What we don't have
|
||||
|
||||
- **Image scanning** (Trivy, Snyk) — scan images for known CVEs on push
|
||||
- **Image signing** (cosign)
|
||||
- **Multi-region replication** — only hosted in one place
|
||||
- **High availability** — Gitea is single-instance
|
||||
|
||||
For our scale, none of these are needed. TODO (Chapter 20) if the
operator's appetite increases.
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
# List packages via API (images are private, so send the PAT)
curl -sS "https://gitea.treytartt.com/api/v1/packages/admin?type=container" \
  -H "Authorization: token $GITEA_PAT" \
  -H "Accept: application/json" | jq .

# Browse in UI
# https://gitea.treytartt.com/admin/-/packages

# Delete a specific tag via API
curl -X DELETE \
  -H "Authorization: token $GITEA_PAT" \
  "https://gitea.treytartt.com/api/v1/packages/admin/container/honeydue-api/237c6b8"

# Login from kubectl side (refresh the Secret)
# Replace the quoted placeholder; unquoted, the shell parses < > as redirects
kubectl create secret docker-registry gitea-credentials -n honeydue \
  --docker-server=gitea.treytartt.com \
  --docker-username=admin \
  --docker-password='<new PAT>' \
  --dry-run=client -o yaml | kubectl apply -f -

# After rotating PAT, restart pods that use it for pulls
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
```
|
||||
|
||||
## References
|
||||
|
||||
- [Gitea Container Registry][gitea-cr]
|
||||
- [Docker buildx multi-platform][buildx]
|
||||
- [Kubernetes image pull secrets][pull-secrets]
|
||||
- [cosign][cosign]
|
||||
|
||||
[gitea-cr]: https://docs.gitea.com/usage/packages/container
|
||||
[buildx]: https://docs.docker.com/build/buildx/
|
||||
[pull-secrets]: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
|
||||
[cosign]: https://github.com/sigstore/cosign
|
||||
@@ -0,0 +1,317 @@
|
||||
# 12 — Data Flow
|
||||
|
||||
## Summary
|
||||
|
||||
This chapter follows a user's request end to end, hop by hop. It's the
|
||||
consolidated picture of Chapters 3, 6, 7, 8, 9 working together. Use
|
||||
this chapter to answer "when X doesn't work, which layer failed?"
|
||||
|
||||
## Scenario: User creates a task
|
||||
|
||||
A user in Austin opens the mobile app and adds a new task for their
|
||||
property. The client sends `POST https://api.myhoneydue.com/api/tasks/`
|
||||
with a JSON body and an auth token. We trace every hop.
|
||||
|
||||
## Hop 1 — Mobile client → Cloudflare edge
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant App as iOS client
|
||||
participant DNS as Local DNS
|
||||
participant CFE as Cloudflare edge (DFW)
|
||||
|
||||
App->>DNS: Resolve api.myhoneydue.com
|
||||
DNS->>App: 104.21.13.7 (Cloudflare edge IP)
|
||||
App->>CFE: TCP SYN :443
|
||||
CFE-->>App: TCP SYN+ACK
|
||||
App->>CFE: TLS ClientHello
|
||||
CFE->>App: TLS ServerHello + cert
|
||||
Note over App,CFE: TLS 1.3 handshake<br/>~1 RTT
|
||||
App->>CFE: HTTP/2 stream<br/>POST /api/tasks/<br/>Authorization: Token <xxx>
|
||||
```
|
||||
|
||||
- Client resolves `api.myhoneydue.com` via OS resolver, gets Cloudflare
|
||||
edge IP (not our origin IP)
|
||||
- Client establishes TLS 1.3 to CF's nearest POP (Dallas for Austin)
|
||||
- Cert presented by CF is `sni.cloudflaressl.com` or a CF-issued
|
||||
`*.myhoneydue.com` — our origin cert is never seen by the client
|
||||
- Latency: ~5–15 ms Austin → DFW
|
||||
|
||||
## Hop 2 — Cloudflare edge → Origin (hetzner)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant CFE as Cloudflare DFW POP
|
||||
participant DNS as CF internal DNS
|
||||
participant HN as hetzner node (random of 3)
|
||||
participant Traefik as Traefik pod<br/>(host network)
|
||||
|
||||
CFE->>DNS: Which origin for api.myhoneydue.com?
|
||||
DNS->>CFE: One of 178.104.247.152, 178.105.32.198, 178.104.249.189
|
||||
CFE->>HN: TCP SYN :80
|
||||
HN-->>CFE: SYN+ACK
|
||||
CFE->>HN: HTTP/1.1 POST /api/tasks/<br/>Host: api.myhoneydue.com<br/>X-Forwarded-For: <user IP><br/>X-Forwarded-Proto: https<br/>CF-Connecting-IP: <user IP>
|
||||
Note over HN: UFW: allow 80/tcp from<br/>anywhere (for now)
|
||||
HN->>Traefik: delivered to listener
|
||||
```
|
||||
|
||||
- CF picks one of the 3 node IPs via DNS round-robin. This is per-connection, not per-request.
|
||||
- Protocol between CF and origin: **HTTP/1.1 plaintext** (SSL=Flexible).
|
||||
A future Full-strict upgrade would make this HTTPS.
|
||||
- Latency: ~90–120 ms DFW → Nuremberg
|
||||
- CF adds headers: `CF-Connecting-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`
|
||||
|
||||
## Hop 3 — Traefik → api Service
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Traefik as Traefik pod
|
||||
participant CoreDNS as CoreDNS (10.43.0.10)
|
||||
participant KP as kube-proxy IPVS<br/>(kernel)
|
||||
participant APIPod as api pod<br/>(some node)
|
||||
|
||||
Note over Traefik: Match Host: api.myhoneydue.com<br/>→ honeydue-api Ingress<br/>→ backend: api Service :8000
|
||||
Traefik->>CoreDNS: Resolve "api"
|
||||
CoreDNS->>Traefik: 10.43.167.83 (Service ClusterIP)
|
||||
Traefik->>KP: TCP SYN to 10.43.167.83:8000
|
||||
KP->>KP: IPVS: pick endpoint<br/>from Service endpoint set
|
||||
KP->>APIPod: Rewrite destination<br/>to 10.42.2.6:8000<br/>(Flannel VXLAN if remote node)
|
||||
```
|
||||
|
||||
- Traefik resolves `api` via CoreDNS → gets the Service ClusterIP
|
||||
- Traefik sends to `10.43.167.83:8000`
|
||||
- kube-proxy IPVS (running in-kernel on the node where Traefik lives)
|
||||
intercepts, picks a live endpoint, rewrites
|
||||
- Destination might be local (same node) or remote (VXLAN tunnel to
|
||||
another node)
|
||||
- Latency: <3 ms even cross-node
|
||||
|
||||
## Hop 4 — api → Postgres (Neon)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant API as api pod (Go)
|
||||
participant Resolv as Pod resolv.conf
|
||||
participant Neon as Neon pooler<br/>AWS us-east-1
|
||||
|
||||
API->>Resolv: Resolve ep-floral-truth-...-pooler.us-east-1.aws.neon.tech
|
||||
Note over Resolv: Goes to CoreDNS<br/>which forwards to upstream<br/>(Hetzner's DNS, then public root)
|
||||
Resolv->>API: Neon pooler IP (e.g., 34.206.177.121)
|
||||
API->>Neon: TCP :5432
|
||||
API->>Neon: TLS 1.3 handshake (DB_SSLMODE=require)
|
||||
API->>Neon: Postgres startup (user, database)
|
||||
API->>Neon: BEGIN<br/>SELECT ... FROM task_task WHERE residence_id = ?<br/>INSERT INTO task_task (...) VALUES (...)<br/>COMMIT
|
||||
Neon-->>API: Query results
|
||||
```
|
||||
|
||||
- Go's database/sql pool may already have an idle connection. If so,
|
||||
skip handshake.
|
||||
- If new connection: ~50 ms TLS handshake + Postgres startup
|
||||
- Query itself: typically ~5–20 ms (single-row read/write on indexed
|
||||
columns)
|
||||
- Total for this hop: often <10 ms on a warm connection, ~80 ms cold
|
||||
|
||||
## Hop 5 — api → Redis (cache miss invalidation)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant API as api pod
|
||||
participant CoreDNS
|
||||
participant KP as kube-proxy
|
||||
participant Redis as redis pod
|
||||
|
||||
API->>CoreDNS: Resolve "redis"
|
||||
CoreDNS->>API: 10.43.7.10
|
||||
API->>KP: TCP :6379
|
||||
KP->>Redis: rewritten to 10.42.x.y:6379
|
||||
API->>Redis: DEL tasks:user:<user_id> (invalidate cached list)
|
||||
Redis-->>API: OK
|
||||
```
|
||||
|
||||
- Redis connection is usually kept alive in the api's pool
|
||||
- Latency: <1 ms (Redis is on hetzner2, usually a short hop)
|
||||
|
||||
## Hop 6 — api → worker (enqueue side effect)
|
||||
|
||||
For some task creation events, api enqueues a background job
|
||||
(send-notification, update-lookup-table, etc.):
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant API as api pod
|
||||
participant Redis as redis pod (acting as Asynq queue)
|
||||
participant Worker as worker pod
|
||||
|
||||
API->>Redis: RPUSH asynq:queue:default <job JSON>
|
||||
Redis-->>API: OK
|
||||
Note over API,Worker: (Async, no response blocking)
|
||||
Worker->>Redis: BLPOP asynq:queue:default
|
||||
Redis-->>Worker: <job JSON>
|
||||
Worker->>Worker: Process job<br/>(send email, push, etc.)
|
||||
```
|
||||
|
||||
api returns to the caller without waiting for the job.
|
||||
|
||||
## Hop 7 — Response back to user
|
||||
|
||||
Reverse the path:
|
||||
|
||||
1. api returns JSON response to Traefik
|
||||
2. Traefik returns to Cloudflare
|
||||
3. Cloudflare returns it to the user over the existing TLS connection
|
||||
4. User receives response
|
||||
|
||||
## End-to-end latency budget
|
||||
|
||||
For a typical "create task" operation:
|
||||
|
||||
| Hop | Latency |
|---|---|
| User → CF (Austin → DFW) | 5–15 ms |
| CF → hetzner (cross-Atlantic) | 90–120 ms |
| UFW + kernel + Traefik accept | <1 ms |
| Traefik → api (same or cross-node) | 1–3 ms |
| api request parsing, auth validation | 1–3 ms |
| api → Postgres (query) | 20–60 ms |
| api → Redis (invalidate) | <1 ms |
| api response generation | 1–5 ms |
| Return path | same as forward, reversed |
|
||||
|
||||
**Total**: ~220–310 ms typical. Dominated by the cross-Atlantic CF→origin
|
||||
hop and the Postgres query round trip.
|
||||
|
||||
## Read path (GET /api/tasks/)
|
||||
|
||||
Similar but simpler:
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant App as iOS client
|
||||
participant CF as Cloudflare
|
||||
participant Traefik
|
||||
participant API as api pod
|
||||
participant Redis
|
||||
participant Neon
|
||||
|
||||
App->>CF: GET /api/tasks/
|
||||
CF->>Traefik: (no cache hit)
|
||||
Traefik->>API: Route via Service
|
||||
API->>Redis: GET tasks:user:<user_id>
|
||||
alt Cache hit
|
||||
Redis-->>API: cached JSON
|
||||
else Cache miss
|
||||
API->>Neon: SELECT ...
|
||||
Neon-->>API: rows
|
||||
API->>Redis: SET tasks:user:<user_id> <json> EX 300
|
||||
end
|
||||
API-->>Traefik: 200 JSON
|
||||
Traefik-->>CF: 200
|
||||
CF-->>App: 200 (may cache per response headers)
|
||||
```
|
||||
|
||||
## Admin panel data flow
|
||||
|
||||
A different dance because the admin is Next.js:
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Browser
|
||||
participant CF
|
||||
participant Traefik
|
||||
participant Admin as admin pod (Next.js)
|
||||
participant AdminAPI as api pod<br/>(via public URL)
|
||||
participant Neon
|
||||
|
||||
Browser->>CF: GET admin.myhoneydue.com/users
|
||||
CF->>Traefik: HTTP :80
|
||||
Traefik->>Admin: Service /users
|
||||
Note over Admin: Next.js SSR:<br/>fetch from NEXT_PUBLIC_API_URL
|
||||
Admin->>CF: GET api.myhoneydue.com/api/admin/users/
|
||||
CF->>Traefik: (api ingress)
|
||||
Traefik->>AdminAPI: Service
|
||||
AdminAPI->>Neon: SELECT ... FROM auth_user
|
||||
Neon-->>AdminAPI: rows
|
||||
AdminAPI-->>Admin: JSON
|
||||
Admin->>Admin: Render HTML
|
||||
Admin-->>Traefik: HTML
|
||||
Traefik-->>CF: HTML
|
||||
CF-->>Browser: HTML
|
||||
```
|
||||
|
||||
Notably, the admin pod's calls to api go **back out to Cloudflare** and
|
||||
in through the public URL. Not the in-cluster Service IP. This is
|
||||
because `NEXT_PUBLIC_API_URL=https://api.myhoneydue.com` — Next.js builds
|
||||
use the same URL for browser-side and server-side fetches.
|
||||
|
||||
This is **suboptimal** — server-side (SSR) calls could use the internal
|
||||
`api.honeydue.svc:8000` URL and skip the CF round-trip. Future
|
||||
optimization: separate `NEXT_PUBLIC_API_URL` (browser) from `API_URL`
|
||||
(server-side).
|
||||
|
||||
## Static asset flow
|
||||
|
||||
For the marketing landing page at `https://myhoneydue.com/`:
|
||||
|
||||
1. CF caches HTML per `Cache-Control` (the Go app sets short TTLs)
|
||||
2. CF caches CSS / JS / images aggressively (via default CF rules)
|
||||
3. First request hits origin, subsequent requests served from CF edge
|
||||
|
||||
The static assets live inside the api container at `/app/static/`.
|
||||
Served by Echo's static file handler at routes `/css`, `/js`, `/images`.
|
||||
|
||||
## Request flow during a rolling update
|
||||
|
||||
When a new api image is deployed, some requests will hit old pods and
|
||||
some will hit new pods for a few minutes:
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant CF
|
||||
participant Traefik
|
||||
participant OldPod as api pod v1
|
||||
participant NewPod as api pod v2 (starting)
|
||||
|
||||
Note over NewPod: kubelet starts new pod
|
||||
Note over NewPod: pod connects to Postgres<br/>MigrateWithLock runs (no-op)<br/>HTTP server starts<br/>readinessProbe passes
|
||||
Note over NewPod: kube-proxy updates endpoints<br/>NewPod added to Service pool
|
||||
CF->>Traefik: request 1
|
||||
Traefik->>OldPod: routed (old pod still in pool)
|
||||
CF->>Traefik: request 2
|
||||
Traefik->>NewPod: routed (new pod now in pool)
|
||||
Note over OldPod: Kubelet terminates old pod<br/>(graceful SIGTERM, then SIGKILL after grace)
|
||||
CF->>Traefik: request 3
|
||||
Traefik->>NewPod: routed (OldPod gone from pool)
|
||||
```
|
||||
|
||||
Both old and new handle traffic simultaneously until the rolling update
|
||||
completes. As long as the new code is API-compatible, users don't
|
||||
notice.
|
||||
|
||||
## Failure modes in the data path
|
||||
|
||||
See [Chapter 16 — Failure Modes](./16-failure-modes.md) for a full
|
||||
catalog.
|
||||
|
||||
Quick summary:
|
||||
|
||||
| Layer fails | User sees | Recovery |
|---|---|---|
| Cloudflare DNS down | Can't resolve api.myhoneydue.com | Manual DNS fallback; extremely rare |
| Cloudflare edge down (single POP) | Slow, CF routes to another POP | Automatic |
| Node NIC fails | Some requests time out (CF routes away) | Cluster reschedules pods |
| UFW misconfig blocks :80 | 521 errors at CF | Re-add rule |
| Traefik pod down on one node | CF routes to other nodes | Automatic |
| kube-proxy broken on one node | Pods on that node can't reach Services | Restart kubelet |
| CoreDNS down | New connections fail DNS | Restart CoreDNS |
| Flannel broken between nodes | Cross-node pod communication fails | Restart flannel or node |
| api pod OOM | 502 to user briefly | kubelet restarts pod |
| Postgres down | 500 errors from api | Neon-side issue; outage |
| Redis down | api serves without cache (degraded) | Restart Redis pod |
| B2 down | Uploads fail, existing content served if cached | Backblaze-side outage |
|
||||
|
||||
## References
|
||||
|
||||
- [Chapter 3 — Networking](./03-networking.md) for the overlay mechanics
|
||||
- [Chapter 6 — Traefik](./06-traefik-ingress.md) for routing details
|
||||
- [Chapter 7 — Services](./07-services.md) for per-service specifics
|
||||
- [Chapter 16 — Failure Modes](./16-failure-modes.md) for what-if scenarios
|
||||
@@ -0,0 +1,344 @@
|
||||
# 13 — Cloudflare
|
||||
|
||||
## Summary
|
||||
|
||||
Cloudflare sits in front of every public request. It provides DNS
|
||||
(authoritative nameservers for `myhoneydue.com`), TLS termination at
|
||||
the edge, DDoS mitigation, caching, and the round-robin fan-out across
|
||||
our three node IPs. We use the Free plan. TLS mode is "Flexible"
|
||||
(HTTP between CF and origin). This chapter documents every Cloudflare
|
||||
setting that matters.
|
||||
|
||||
## DNS
|
||||
|
||||
### Zone
|
||||
|
||||
`myhoneydue.com`, managed by Cloudflare. Authoritative nameservers:
|
||||
|
||||
```
|
||||
carol.ns.cloudflare.com
|
||||
ishaan.ns.cloudflare.com
|
||||
```
|
||||
|
||||
### Records that matter
|
||||
|
||||
| Type | Name | Content | Proxy | Notes |
|---|---|---|---|---|
| A | `api` | 178.104.247.152 | 🟠 Proxied | hetzner1 |
| A | `api` | 178.105.32.198 | 🟠 Proxied | hetzner2 |
| A | `api` | 178.104.249.189 | 🟠 Proxied | hetzner3 |
| A | `admin` | 178.104.247.152 | 🟠 Proxied | same 3 IPs |
| A | `admin` | 178.105.32.198 | 🟠 Proxied | |
| A | `admin` | 178.104.249.189 | 🟠 Proxied | |
| A | `@` | 178.104.247.152 | 🟠 Proxied | same 3 IPs |
| A | `@` | 178.105.32.198 | 🟠 Proxied | |
| A | `@` | 178.104.249.189 | 🟠 Proxied | |
|
||||
|
||||
Three A records per name → Cloudflare selects one per connection. With
|
||||
proxying on (orange cloud), **the client never sees these IPs** — it
|
||||
sees a Cloudflare edge IP. CF internally picks which of the three
|
||||
origin IPs to connect to; if one fails the connection, CF retries the
|
||||
next.
|
||||
|
||||
**TXT records for email** (Fastmail sending domain): SPF, DKIM, DMARC.
|
||||
Not our immediate concern; configured by the Fastmail custom-domain
|
||||
setup.
|
||||
|
||||
### Why three A records per name, not one
|
||||
|
||||
With one record pointing at hetzner1:
|
||||
- Only hetzner1 sees traffic
|
||||
- If hetzner1 is unreachable, everything breaks until we change DNS
|
||||
|
||||
With three records:
|
||||
- CF chooses one origin per connection
|
||||
- If one node's port :80 stops responding, CF tries the others
|
||||
- Node upgrades can be done one at a time with no user impact
|
||||
|
||||
This is poor-man's load balancing. A Hetzner Load Balancer or Cloudflare
Load Balancer (paid) would be more sophisticated, with active health
checks and sub-second automatic failover. Our DNS approach is "good
enough" for the traffic volume.
|
||||
|
||||
### Cloudflare's origin health checks
|
||||
|
||||
On the Free plan, CF doesn't actively probe origins. It reacts to real
connection failures: if an origin returns 5xx repeatedly or a connection
times out, CF marks it unhealthy for that edge POP for some time.
|
||||
|
||||
Upgrading to **Cloudflare Load Balancing** ($5/mo add-on) would enable
|
||||
active health checks — explicit probes independent of traffic. Useful
|
||||
when you want sub-second failover.
|
||||
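Since nothing probes the origins on our behalf, an operator-run check is one way to confirm that every node still answers on port 80. The sketch below is an assumption-laden illustration: `probe_origin` and `check_origins` are hypothetical names, the probe hits `/` over plain HTTP (matching Flexible mode), and it only works from hosts that UFW lets reach port 80.

```bash
#!/usr/bin/env bash
# Default probe: plain HTTP to one origin IP with the public Host header,
# printing only the status code. Fails if the connection fails.
probe_origin() {
  local host=$1 ip=$2
  curl -sS -o /dev/null -m 5 -w '%{http_code}' -H "Host: ${host}" "http://${ip}/"
}

# Hypothetical helper: probe every origin and report per-IP results.
# PROBE can be overridden with another function, e.g. for testing.
check_origins() {
  local host=$1; shift
  local ip code failed=0
  for ip in "$@"; do
    if code=$("${PROBE:-probe_origin}" "$host" "$ip" 2>/dev/null); then
      echo "${ip} ${code}"
    else
      echo "${ip} unreachable"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage would look like `check_origins api.myhoneydue.com 178.104.247.152 178.105.32.198 178.104.249.189`. The function returns non-zero if any origin could not be reached; an HTTP error status is printed but not treated as unreachable, so read the codes as well as the exit status.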
|
||||
## TLS
|
||||
|
||||
### Mode: Flexible
|
||||
|
||||
CF Dashboard → SSL/TLS → Overview → **Flexible**.
|
||||
|
||||
**What this means:**
|
||||
- User ↔ Cloudflare: **TLS** (HTTPS)
|
||||
- Cloudflare ↔ Origin: **plaintext HTTP** (port 80)
|
||||
|
||||
**Why we chose it:**
|
||||
- No origin cert required on the Hetzner nodes
|
||||
- Zero Traefik cert-management complexity
|
||||
- Fine for a site where CF terminates all user-facing TLS
|
||||
|
||||
**Downsides:**
|
||||
- An attacker with network access between CF and Hetzner could read
|
||||
traffic. Realistically: nobody between CF's POPs and Hetzner's
|
||||
Nuremberg DC, but it's theoretically plaintext on the wire.
|
||||
- MitM risk if DNS gets hijacked and traffic is routed through an
|
||||
unintended origin.
|
||||
|
||||
### Future: Full (strict)
|
||||
|
||||
The next step up is **Full (strict)**: CF verifies origin's TLS cert
|
||||
and connects over HTTPS. Cloudflare provides free **Origin CA
|
||||
certificates** for this: they're issued by a CF-internal CA that only
|
||||
CF's own edge accepts. An attacker without a CF-signed cert can't
|
||||
impersonate our origin.
|
||||
|
||||
Path to enable:
|
||||
1. Generate Origin CA cert in CF dashboard → SSL/TLS → Origin Server
|
||||
2. Download as PEM
|
||||
3. Create k8s Secret `cloudflare-origin-cert`:
|
||||
```bash
|
||||
kubectl create secret tls cloudflare-origin-cert -n honeydue \
|
||||
--cert=origin.crt --key=origin.key
|
||||
```
|
||||
4. Add `tls:` block to our Ingress:
|
||||
```yaml
spec:
  tls:
    - hosts: [api.myhoneydue.com]
      secretName: cloudflare-origin-cert
```
|
||||
5. Switch CF SSL mode to Full (strict)
|
||||
|
||||
Trade-off: the `cloudflare-origin-cert` does expire, but the default
validity is 15 years, so maintenance is low. **TODO** (Chapter 20).
|
||||
|
||||
### Edge certificate
|
||||
|
||||
CF provides a free edge certificate for `*.myhoneydue.com` and
|
||||
`myhoneydue.com`. Auto-renewed by Cloudflare. We don't touch it.
|
||||
|
||||
### Always Use HTTPS
|
||||
|
||||
SSL/TLS → Edge Certificates → **Always Use HTTPS: On** (default).
|
||||
|
||||
Redirects any HTTP → HTTPS at the CF edge. Clients that hit
|
||||
`http://api.myhoneydue.com/*` get 301'd to `https://...`. Origin never
|
||||
sees the HTTP request.
|
||||
|
||||
### HSTS
|
||||
|
||||
**Not currently enabled.** HSTS (HTTP Strict Transport Security) sends
a header telling browsers "always use HTTPS for this domain." Once a
browser has seen it with a long `max-age`, the policy is effectively
**irrevocable** for that browser until `max-age` elapses: if we later
misconfigure TLS, HSTS-enabled browsers refuse to connect at all.
|
||||
|
||||
Enabling HSTS is a TODO but requires confidence in our TLS stability.
|
||||
Not tonight.
|
||||
|
||||
## DDoS mitigation
|
||||
|
||||
CF's Free plan includes basic DDoS protection:
|
||||
- Volumetric attacks absorbed at the edge
|
||||
- Obvious bot patterns blocked (known-bad user agents, headless browsers
|
||||
doing suspicious things)
|
||||
|
||||
Under a large attack, CF might:
|
||||
- Insert a "checking your browser" JavaScript challenge (the ~5-second
|
||||
"Cloudflare is checking your browser" page)
|
||||
- Rate-limit by IP
|
||||
|
||||
Under a sustained, sophisticated attack we might need:
|
||||
- CF Pro plan ($20/mo) for more rule customization
|
||||
- Enterprise plan for negotiated protection
|
||||
- Extra measures like Cloudflare Magic Transit
|
||||
|
||||
So far, not needed.
|
||||
|
||||
## Caching
|
||||
|
||||
Default CF caching:
- Static assets (CSS, JS, images) cached aggressively, selected by file extension
- HTML pages not cached by default; they go to origin unless a Cache Rule marks them eligible
- JSON API responses not cached by default, for the same reason

None of our API paths match a cacheable extension and we have no Cache
Rules, so CF treats every API response as uncacheable. Every API call
reaches origin.

If we wanted to cache certain endpoints (e.g., public lookup tables),
the origin would send:
```go
c.Response().Header().Set("Cache-Control", "public, max-age=300")
```
With a Cache Rule (or a "Cache Everything" Page Rule) covering that
path, CF would then cache the response for 5 minutes. The header alone
is not enough for JSON.
|
||||
|
||||
## Firewall rules at CF
|
||||
|
||||
CF Dashboard → Security → WAF. On Free tier:
|
||||
- Managed rules: a small free ruleset covering "obvious-attack" patterns
|
||||
- Custom rules: limited (5 on Free, 20 on Pro)
|
||||
|
||||
We have **no custom rules defined** currently. The managed ruleset
|
||||
covers:
|
||||
- SQL injection attempts in query strings
|
||||
- Known-vulnerable bot User-Agents
|
||||
- XSS attempts in common parameters
|
||||
|
||||
## Rate limiting
|
||||
|
||||
CF Free: **10,000 requests per 10 minutes per IP for free rules** (we
|
||||
haven't configured any). The API itself should have rate limits for
|
||||
sensitive endpoints; we don't rely on CF for that.
|
||||
|
||||
## What CF does NOT do for us
|
||||
|
||||
- **Authenticate users** — our app does
|
||||
- **Authorize requests** — our app does
|
||||
- **Encrypt pod-to-pod traffic** — nothing Cloudflare can help with
|
||||
- **Backup origin data** — CF caches but doesn't store copies
|
||||
persistently
|
||||
|
||||
## Turnstile / bot management
|
||||
|
||||
Not enabled. If we start seeing account-creation spam, Cloudflare
|
||||
Turnstile (free) would be a good addition — a CAPTCHA replacement that
|
||||
doesn't require user interaction for most traffic.
|
||||
|
||||
## Origin IP protection
|
||||
|
||||
CF proxying (orange cloud) is the primary protection of our origin IPs.
|
||||
When proxying is on:
|
||||
- DNS queries return CF edge IPs, never origin
|
||||
- HTTP/HTTPS traffic goes through CF
|
||||
|
||||
However, our origin IPs **can leak** via:
|
||||
- Email sending (if the app ever sent email directly from the origin IP)
|
||||
— we use Fastmail so this isn't an issue
|
||||
- Outbound connections (our pods connect out to Neon, B2, Fastmail from
|
||||
the nodes' public IPs; those IPs appear in external logs)
|
||||
- Historical DNS records (services like SecurityTrails log historical
|
||||
DNS; if we ever had unproxied A records, attackers can look them up)
|
||||
|
||||
**If origin IPs leak**, attackers can bypass CF's protection by
|
||||
connecting directly to node IPs. Current mitigation:
|
||||
- UFW allows inbound traffic only on :80/:443 (from any source); every other port is closed
|
||||
- Our app has no ports bound to the public IP
|
||||
|
||||
**Future** (Chapter 20): UFW rule to allow :80/:443 only from CF IP
|
||||
ranges. Prevents direct-connect bypass entirely.
|
||||
|
||||
## Cloudflare IP ranges (used in Traefik trustedIPs)
|
||||
|
||||
From [cloudflare.com/ips](https://www.cloudflare.com/ips/):
|
||||
|
||||
IPv4 ranges:
|
||||
```
173.245.48.0/20
103.21.244.0/22
103.22.200.0/22
103.31.4.0/22
141.101.64.0/18
108.162.192.0/18
190.93.240.0/20
188.114.96.0/20
197.234.240.0/22
198.41.128.0/17
162.158.0.0/15
104.16.0.0/13
104.24.0.0/14
172.64.0.0/13
131.0.72.0/22
```
|
||||
|
||||
IPv6 ranges:
|
||||
```
2400:cb00::/32
2606:4700::/32
2803:f800::/32
2405:b500::/32
2405:8100::/32
2a06:98c0::/29
2c0f:f248::/32
```
|
||||
|
||||
These are used in two places:
|
||||
1. **Traefik `forwardedHeaders.trustedIPs`** — we already have this
|
||||
configured (Chapter 6)
|
||||
2. **UFW `allow 80/tcp from <cf-range>`** — NOT configured (TODO)
|
||||
|
||||
CF occasionally adds new ranges. If a future CF range isn't in our
|
||||
list, we'd either trust unknown IPs (if lax) or reject legitimate CF
|
||||
traffic (if strict). The canonical source is the public API:
|
||||
|
||||
```bash
|
||||
curl -sS https://www.cloudflare.com/ips-v4
|
||||
curl -sS https://www.cloudflare.com/ips-v6
|
||||
```
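
The UFW restriction marked TODO above could be generated from the published lists rather than typed by hand. A minimal sketch, assuming UFW's `allow proto tcp from <cidr> to any port 80,443` rule form; `cf_ufw_rules` is a hypothetical helper name, and it only prints commands so they can be reviewed before anything is applied:

```bash
#!/usr/bin/env bash
# Sketch: turn a list of CIDR ranges (one per line on stdin) into UFW
# allow rules for ports 80 and 443. Prints the commands; does not run them.
# cf_ufw_rules is a hypothetical helper, not part of the repo.
cf_ufw_rules() {
  local cidr
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r cidr || [ -n "$cidr" ]; do
    # Skip blank lines and comments.
    case "$cidr" in ''|'#'*) continue ;; esac
    printf 'ufw allow proto tcp from %s to any port 80,443\n' "$cidr"
  done
}
```

Piping `curl -sS https://www.cloudflare.com/ips-v4` into `cf_ufw_rules` prints one rule per range. The existing allow-from-anywhere rules for :80/:443 would still have to be removed before the restriction takes effect.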
|
||||
|
||||
## API token for programmatic changes
|
||||
|
||||
If we automate DNS changes (e.g., adding new subdomain on deploy),
|
||||
we'd need a CF API token with `Zone:DNS:Edit` scope for the
|
||||
`myhoneydue.com` zone.
|
||||
|
||||
Currently not automated; DNS is managed in the CF dashboard by hand.
|
||||
|
||||
## Cost
|
||||
|
||||
**$0/mo**. Free plan covers everything we use. Paid plans add features
|
||||
we don't need yet:
|
||||
|
||||
| Feature | Free | Pro ($20) | Business ($200) |
|---|---|---|---|
| DNS + proxying | ✓ | ✓ | ✓ |
| Basic DDoS | ✓ | ✓ | ✓ |
| SSL (edge + Flexible + Full + Full strict) | ✓ | ✓ | ✓ |
| WAF managed rules | ✓ (limited) | ✓ (more) | ✓ (all) |
| Custom firewall rules | 5 | 20 | 100 |
| Page Rules | 3 | 20 | 50 |
| Image Resizing | no | no | ✓ |
| Load Balancing | no | $5/mo add-on | ✓ |
|
||||
|
||||
We'd consider Pro ($20/mo) if:
|
||||
- We needed a custom WAF rule beyond the 5-rule limit
|
||||
- We wanted Image Resizing for user-uploaded photos
|
||||
|
||||
Neither is needed today.
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
# Query current CF-served DNS
dig +short @1.1.1.1 api.myhoneydue.com   # returns CF edge IPs when proxied

# Query our origin directly (bypass CF)
curl -sS -H "Host: api.myhoneydue.com" http://178.104.247.152/api/health/

# Check CF headers (confirm you're going through CF)
curl -sS -I https://api.myhoneydue.com/api/health/ | grep -i cf-

# Purge CF cache (requires API token)
curl -X POST \
  -H "Authorization: Bearer $CF_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.cloudflare.com/client/v4/zones/<zone_id>/purge_cache" \
  -d '{"purge_everything":true}'
```
|
||||
|
||||
## References
|
||||
|
||||
- [Cloudflare IP ranges][cf-ips]
|
||||
- [Cloudflare SSL modes explained][cf-ssl]
|
||||
- [Origin CA certificates][cf-origin-ca]
|
||||
- [Cloudflare DNS best practices][cf-dns]
|
||||
|
||||
[cf-ips]: https://www.cloudflare.com/ips/
|
||||
[cf-ssl]: https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/
|
||||
[cf-origin-ca]: https://developers.cloudflare.com/ssl/origin-configuration/origin-ca/
|
||||
[cf-dns]: https://developers.cloudflare.com/dns/
|
||||
@@ -0,0 +1,433 @@
|
||||
# 14 — Deployment Process
|
||||
|
||||
## Summary
|
||||
|
||||
A production deploy is: build a new image, push to Gitea, update the
|
||||
Deployment's image field with the new SHA, Kubernetes rolls new pods in.
|
||||
No downtime if the change is backward-compatible. Rollback is
|
||||
`kubectl rollout undo`. This chapter walks through the full process,
|
||||
plus alternate paths (config-only changes, manifest changes, hotfixes).
|
||||
|
||||
## TL;DR for a code change
|
||||
|
||||
```bash
# 1. Commit + get SHA
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
git add . && git commit -m "..." && SHA=$(git rev-parse --short HEAD)

# 2. Login to Gitea registry
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin

# 3. Build + push amd64 image
docker buildx build --platform linux/amd64 --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .

# 4. Roll it in
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

# 5. Watch
kubectl rollout status -n honeydue deployment/api

# 6. Log out
docker logout "$REGISTRY"
```
|
||||
|
||||
~3–5 minutes end to end for api.
|
||||
|
||||
## The build
|
||||
|
||||
### Step 1 — Prepare
|
||||
|
||||
```bash
|
||||
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
|
||||
git status # clean working tree?
|
||||
git log -1 --oneline # this is the SHA that'll ship
|
||||
```
|
||||
|
||||
### Step 2 — Login to Gitea
|
||||
|
||||
```bash
|
||||
set -a; source deploy/registry.env; set +a
|
||||
printf '%s' "$REGISTRY_TOKEN" | \
|
||||
docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
|
||||
```
|
||||
|
||||
**Note**: passing the token as an argument (`docker login -p <token>`)
exposes it in shell history and in the process list. Piping it through
`printf` to `--password-stdin` avoids both; don't skip it.
|
||||
|
||||
### Step 3 — Build + push
|
||||
|
||||
```bash
|
||||
SHA=$(git rev-parse --short HEAD)
|
||||
|
||||
# For API
|
||||
docker buildx build \
|
||||
--platform linux/amd64 \
|
||||
--target api \
|
||||
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
|
||||
--push .
|
||||
|
||||
# For Worker
|
||||
docker buildx build \
|
||||
--platform linux/amd64 \
|
||||
--target worker \
|
||||
-t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" \
|
||||
--push .
|
||||
|
||||
# For Admin (Next.js)
|
||||
docker buildx build \
|
||||
--platform linux/amd64 \
|
||||
--target admin \
|
||||
-t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" \
|
||||
--push .
|
||||
```
|
||||
|
||||
- `--platform linux/amd64` — cross-compile from operator's arm64 to
|
||||
Hetzner nodes' amd64
|
||||
- `--target X` — select a stage from the multi-stage Dockerfile
|
||||
- `--push` — push to registry in one step; don't leave image in local
|
||||
Docker
|
||||
|
||||
First build is slow (~3–5 min cold). Subsequent builds hit BuildKit
|
||||
layer cache and complete in ~30–60s if only app code changed.
|
||||
|
||||
### Build platform note
|
||||
|
||||
If `docker buildx` isn't configured:
|
||||
|
||||
```bash
|
||||
docker buildx create --name honeydue-builder --use
|
||||
docker buildx inspect --bootstrap
|
||||
```
|
||||
|
||||
This creates a BuildKit container that supports cross-platform builds.
|
||||
The `--bootstrap` line spins it up immediately so errors surface now
|
||||
instead of on first build.
|
||||
|
||||
## The deploy
|
||||
|
||||
### For a single service
|
||||
|
||||
```bash
|
||||
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
|
||||
|
||||
kubectl set image deployment/api -n honeydue \
|
||||
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
|
||||
```
|
||||
|
||||
This updates the Deployment's image field. Kubernetes:
|
||||
1. Creates a new ReplicaSet with the new image (the
   `deployment.kubernetes.io/revision` annotation records the revision
   number)
|
||||
2. Starts a new pod (per `maxSurge: 1`)
|
||||
3. Waits for readinessProbe to pass on the new pod (up to 240s for
|
||||
cold api boot)
|
||||
4. Once ready, removes a pod from the old ReplicaSet
|
||||
5. Repeats until all pods are on the new ReplicaSet
|
||||
6. Marks rollout complete
|
||||
|
||||
### Watching the rollout
|
||||
|
||||
```bash
|
||||
kubectl rollout status -n honeydue deployment/api
|
||||
```
|
||||
|
||||
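Outputs progress; returns when the rollout completes or fails. `kubectl`
itself waits indefinitely by default (`--timeout=0s`), but the
Deployment's `progressDeadlineSeconds` (default 600) marks the rollout
as failed after 10 minutes without progress, and `rollout status` then
exits non-zero.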
|
||||
|
||||
More detailed:
|
||||
|
||||
```bash
|
||||
# Watch pods transition
|
||||
kubectl get pods -n honeydue -l app.kubernetes.io/name=api -w
|
||||
|
||||
# Watch events
|
||||
kubectl get events -n honeydue --sort-by=.lastTimestamp -w
|
||||
```
|
||||
|
||||
### For all three services
|
||||
|
||||
```bash
|
||||
for svc in api worker admin; do
|
||||
kubectl set image deployment/$svc -n honeydue \
|
||||
$svc="gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
|
||||
done
|
||||
|
||||
# Watch all rollouts
|
||||
for svc in api worker admin; do
|
||||
kubectl rollout status -n honeydue deployment/$svc
|
||||
done
|
||||
```
|
||||
|
||||
## Config-only changes (no new image)
|
||||
|
||||
When you change `prod.env` but code is unchanged:
|
||||
|
||||
```bash
|
||||
# 1. Update prod.env locally
|
||||
# 2. Regenerate ConfigMap
|
||||
kubectl create configmap honeydue-config -n honeydue \
|
||||
--from-env-file=deploy/prod.env \
|
||||
--dry-run=client -o yaml | kubectl apply -f -
|
||||
|
||||
# 3. Pods do NOT auto-reload env vars. Restart them.
|
||||
kubectl rollout restart -n honeydue deployment/api deployment/admin deployment/worker
|
||||
```
|
||||
|
||||
`rollout restart` triggers a rolling update with the *same* image but
|
||||
forces pod recreation. New pods pick up the updated ConfigMap.
|
||||
|
||||
### Why not auto-reload?
|
||||
|
||||
Kubernetes has no built-in mechanism to restart pods on ConfigMap change.
|
||||
There's no `envFromWatch` equivalent. Third-party operators like
|
||||
Reloader can do it, but we don't run one.
|
||||
|
||||
For sensitive config (like the `SECRET_KEY`), this is actually good —
|
||||
pods don't cycle unexpectedly when someone tweaks the ConfigMap.
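
If automatic rolls on config change were ever wanted without running an operator, one common pattern is to stamp a checksum of the config into the pod template, so the template only changes when the content does. A hedged sketch: `config_checksum` and `stamp_config_checksum` are hypothetical helpers, the `checksum/config` annotation key is an arbitrary choice, and `KUBECTL` is an override hook added here for dry runs:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not in the repo).

# Print the SHA-256 of a file. Falls back to shasum on macOS, where
# sha256sum is usually absent.
config_checksum() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Patch the checksum into a Deployment's pod-template annotations.
# A changed pod-template annotation triggers a rolling update; patching
# the same value again changes nothing.
stamp_config_checksum() {
  local deploy="$1" file="$2" sum
  sum="$(config_checksum "$file")"
  "${KUBECTL:-kubectl}" patch deployment "$deploy" -n honeydue --type=merge \
    -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"checksum/config\":\"${sum}\"}}}}}"
}
```

Running `KUBECTL=echo stamp_config_checksum api deploy/prod.env` prints the patch command without touching the cluster. The trade-off noted above still applies: with this in place, a config tweak does cycle pods.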
|
||||
|
||||
## Secret changes
|
||||
|
||||
Same flow as config:
|
||||
|
||||
```bash
|
||||
# Rotate a value
|
||||
# printf avoids echo -n portability issues; tr strips base64 line wraps
kubectl patch secret honeydue-secrets -n honeydue \
  --type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(printf '%s' 'newvalue' | base64 | tr -d '\n')\"}}"
|
||||
|
||||
# Restart pods
|
||||
kubectl rollout restart -n honeydue deployment/api deployment/worker
|
||||
```
|
||||
|
||||
## Manifest changes
|
||||
|
||||
When you add/modify a deployment YAML:
|
||||
|
||||
```bash
|
||||
kubectl apply -f deploy-k3s/manifests/api/deployment.yaml
|
||||
```
|
||||
|
||||
If the change touches the pod template (`spec.template`: resource
limits, env, volumes, image), Kubernetes treats it as a new revision and
pods roll. If the change is outside the template (e.g., `replicas`),
existing pods are left alone; pods are only added or removed.
|
||||
|
||||
## Rollback
|
||||
|
||||
### Last-known-good rollback
|
||||
|
||||
```bash
|
||||
kubectl rollout undo deployment/api -n honeydue
|
||||
```
|
||||
|
||||
Reverts to the previous ReplicaSet (the one with the previous image).
|
||||
Takes ~30s to stabilize.
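
Watching and undoing can be chained so that a stalled rollout reverts itself. A minimal sketch, assuming a 5-minute wait is acceptable; `rollout_or_undo` is a hypothetical helper and `KUBECTL` is an override hook added here so the function can be dry-run:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): wait for a rollout, undo it on failure.
# Usage: rollout_or_undo <deployment> [timeout]   e.g. rollout_or_undo api 300s
rollout_or_undo() {
  local deploy="$1" timeout="${2:-300s}"
  local kubectl="${KUBECTL:-kubectl}"
  if "$kubectl" rollout status "deployment/${deploy}" -n honeydue --timeout="$timeout"; then
    echo "rollout of ${deploy} succeeded"
  else
    echo "rollout of ${deploy} failed; reverting to previous revision" >&2
    "$kubectl" rollout undo "deployment/${deploy}" -n honeydue
    return 1
  fi
}
```

For a cold boot, where api startup can take up to 240s per pod, pass a longer timeout than the default used here.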
|
||||
|
||||
### Rollback to a specific revision
|
||||
|
||||
```bash
|
||||
# See revision history
|
||||
kubectl rollout history deployment/api -n honeydue
|
||||
|
||||
# Revert to specific revision number
|
||||
kubectl rollout undo deployment/api -n honeydue --to-revision=3
|
||||
```
|
||||
|
||||
Kubernetes keeps up to 10 ReplicaSet revisions by default
|
||||
(`spec.revisionHistoryLimit`).
|
||||
|
||||
### Hard rollback (deploy an older image)
|
||||
|
||||
```bash
|
||||
kubectl set image deployment/api -n honeydue \
|
||||
api="gitea.treytartt.com/admin/honeydue-api:<older-sha>"
|
||||
```
|
||||
|
||||
Useful when you want to go back further than the revision history, or
|
||||
to a specific known-good SHA.
|
||||
|
||||
## Rolling update semantics
|
||||
|
||||
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
```
|
||||
|
||||
For api (3 replicas):
|
||||
- `maxUnavailable: 0` — no pod is removed until replacement is ready
|
||||
- `maxSurge: 1` — up to 4 pods exist simultaneously during rollout
|
||||
|
||||
Timeline (approximate, warm state):
|
||||
- t=0: kubectl set image
|
||||
- t=0: k8s creates new RS with 1 pod
|
||||
- t=30s (or so): new pod readiness probe passes
|
||||
- t=30s: k8s terminates 1 old pod
|
||||
- t=60s: next new pod ready
|
||||
- t=60s: another old pod terminates
|
||||
- ...continues until all on new RS
|
||||
|
||||
For cold-boot (e.g., first deploy on a rebuilt cluster), the
|
||||
MigrateWithLock advisory lock extends this to several minutes. But the
|
||||
rollout is serialized — only one pod starts per iteration, so the lock
|
||||
queue is small.
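
Because `maxSurge: 1` with `maxUnavailable: 0` replaces pods one at a time, total rollout time is roughly replicas × time-to-ready. A back-of-envelope sketch using the figures above (30s warm, 240s worst-case cold); `rollout_estimate` is a hypothetical helper and ignores image pull and termination time:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): estimate rollout duration in seconds when
# pods are replaced strictly one at a time.
# Usage: rollout_estimate <replicas> <seconds-until-ready-per-pod>
rollout_estimate() {
  echo $(( $1 * $2 ))
}

rollout_estimate 3 30    # warm: prints 90
rollout_estimate 3 240   # cold worst case: prints 720
```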
|
||||
|
||||
## Hotfix workflow
|
||||
|
||||
When we need to ship a fix fast and skip the usual steps:
|
||||
|
||||
1. Fix in code
|
||||
2. Build + push
|
||||
3. `kubectl set image` on the affected service only
|
||||
4. Monitor with `kubectl logs -f`
|
||||
|
||||
Don't skip CI/tests in a real org; for a solo operator this is the accepted tradeoff.
|
||||
|
||||
## Integration with Gitea
|
||||
|
||||
Currently no CI/CD. The operator builds from the workstation and pushes
|
||||
manually. Future:
|
||||
|
||||
- Gitea Actions (Drone-like CI) could trigger on push to `main`
|
||||
- Build + push step could run in a GitHub Actions-compatible workflow
|
||||
- Auto-deploy on tag push, manual promote to prod
|
||||
|
||||
**TODO** (Chapter 20).
|
||||
|
||||
## What the old Swarm deploy script did
|
||||
|
||||
Contrast: `deploy/scripts/deploy_prod.sh` (Swarm-era) did:
|
||||
|
||||
1. Validate every config file (placeholder detection, APNS key format,
|
||||
B2 all-or-none)
|
||||
2. Buildx to amd64
|
||||
3. Push to Gitea (we retrofitted this from GHCR)
|
||||
4. SCP bundle to manager node
|
||||
5. `docker secret create` + `docker config create` with versioned names
|
||||
6. `docker stack deploy --with-registry-auth`
|
||||
7. Poll stack services until convergence (420s timeout)
|
||||
8. Prune old secret/config versions
|
||||
9. Healthcheck the final URL; auto-rollback on failure
|
||||
10. Log out of registries
|
||||
|
||||
Our current k3s deploy is more manual but simpler. We'd write a similar
|
||||
script for k3s if deploys become frequent:
|
||||
|
||||
```bash
|
||||
# deploy-k3s/scripts/04-deploy.sh (not yet updated for Gitea)
|
||||
```
|
||||
|
||||
See the scaffold in `deploy-k3s/scripts/`.
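
As a starting point, the manual steps from this chapter could be wrapped as below. This is a sketch under stated assumptions, not the contents of `04-deploy.sh`: the function names are hypothetical, it assumes the registry login has already happened, and `DRY_RUN=1` (an illustration-only switch) prints commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not the existing scaffold).
REGISTRY_PREFIX="gitea.treytartt.com/admin"
NAMESPACE="honeydue"

# Build the image reference for a service and tag.
image_ref() {
  printf '%s/honeydue-%s:%s\n' "$REGISTRY_PREFIX" "$1" "$2"
}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Build, push, roll out, and wait for one service.
deploy_service() {
  local svc="$1" sha="$2" image
  image="$(image_ref "$svc" "$sha")"
  run docker buildx build --platform linux/amd64 --target "$svc" -t "$image" --push . || return 1
  run kubectl set image "deployment/${svc}" -n "$NAMESPACE" "${svc}=${image}" || return 1
  run kubectl rollout status -n "$NAMESPACE" "deployment/${svc}"
}
```

`DRY_RUN=1 deploy_service api "$(git rev-parse --short HEAD)"` prints the three commands for review. The Swarm script's config validation and post-deploy healthcheck are deliberately left out of this sketch.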
|
||||
|
||||
## Common deploy failures
|
||||
|
||||
| Symptom | Likely cause |
|---|---|
| `ImagePullBackOff` | Image not in registry, or pull secret expired |
| Stuck at "Progressing" | Readiness probe not passing; check pod logs |
| `CrashLoopBackOff` immediately | App won't start; check pod logs for panic/exit reason |
| `CrashLoopBackOff` after migration | Cache service, Redis connection, or post-init code issue |
| Old pods never terminate | New pods not ready; rollout doesn't progress |
| Rollout succeeds but app is broken | Readiness probe is too lenient; passes on broken app |
|
||||
|
||||
### Debugging commands
|
||||
|
||||
```bash
|
||||
# Describe the deployment (shows events, conditions)
|
||||
kubectl describe deployment api -n honeydue
|
||||
|
||||
# Describe the latest pod
|
||||
kubectl describe pod -n honeydue -l app.kubernetes.io/name=api
|
||||
|
||||
# Logs from currently-running pods
|
||||
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100 --prefix
|
||||
|
||||
# Logs from the last-terminated pod
|
||||
kubectl logs -n honeydue <pod> --previous
|
||||
|
||||
# Events in the namespace (newest first)
|
||||
kubectl get events -n honeydue --sort-by=.lastTimestamp
|
||||
|
||||
# Pause a rollout (stops new pods from being created)
|
||||
kubectl rollout pause deployment/api -n honeydue
|
||||
|
||||
# Resume
|
||||
kubectl rollout resume deployment/api -n honeydue
|
||||
```
|
||||
|
||||
## Zero-downtime considerations
|
||||
|
||||
For zero-downtime deploys, the new image must be:
|
||||
|
||||
1. **Backward-compatible** with the current database schema (schema
|
||||
migrations run before new code)
|
||||
2. **Backward-compatible** with in-flight API requests (don't remove
|
||||
endpoints mid-deploy; deprecate first)
|
||||
3. **Backward-compatible** with Redis data structures (don't change
|
||||
cache key formats abruptly)
|
||||
|
||||
For breaking changes:
|
||||
1. Deploy intermediate version that handles both old and new
|
||||
2. Once rolled out everywhere, deploy breaking-change version
|
||||
3. Two deploys, same day or different days
|
||||
|
||||
We don't have this discipline yet; our API has too few clients to
|
||||
worry about. As mobile clients proliferate, this becomes more important.
|
||||
|
||||
## Blue-green / canary (not yet)
|
||||
|
||||
Kubernetes supports advanced rollout strategies:
|
||||
- **Canary**: route 5% of traffic to new version, scale up gradually
|
||||
- **Blue-green**: run new version alongside old, flip traffic all at
|
||||
once
|
||||
|
||||
These require Traefik's TraefikService CRD with weighted routing, or
|
||||
a service mesh. **TODO** if traffic scale justifies.
|
||||
|
||||
## Cleanup: the old Swarm config
|
||||
|
||||
`deploy/` directory contains the Swarm-era config. It's still there but
|
||||
unused. After we're confident in k3s (a few weeks? month?), remove it:
|
||||
|
||||
```bash
|
||||
rm -rf deploy/
|
||||
```
|
||||
|
||||
Keep the useful files in `deploy-k3s/` only.
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
|
||||
# Full build + deploy
|
||||
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
|
||||
SHA=$(git rev-parse --short HEAD)
|
||||
set -a; source deploy/registry.env; set +a
|
||||
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
|
||||
docker buildx build --platform linux/amd64 --target api -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
|
||||
docker buildx build --platform linux/amd64 --target worker -t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
|
||||
docker buildx build --platform linux/amd64 --target admin -t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
|
||||
docker logout gitea.treytartt.com
|
||||
|
||||
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
|
||||
for svc in api worker admin; do
|
||||
kubectl set image deployment/$svc -n honeydue "$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
|
||||
done
|
||||
|
||||
for svc in api worker admin; do
|
||||
kubectl rollout status -n honeydue deployment/$svc
|
||||
done
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [Kubernetes Deployment rolling update][rolling]
|
||||
- [kubectl rollout][rollout]
|
||||
- [Docker buildx][buildx]
|
||||
|
||||
[rolling]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
|
||||
[rollout]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#rollout
|
||||
[buildx]: https://docs.docker.com/build/buildx/
|
||||
@@ -0,0 +1,305 @@
|
||||
# 15 — Observability
|
||||
|
||||
## Summary
|
||||
|
||||
We have minimal observability today: `kubectl logs`, `kubectl top`,
|
||||
Cloudflare Analytics, and the Neon dashboard. No Prometheus, no Grafana,
|
||||
no centralized log aggregator, no APM. This is adequate for the
|
||||
current traffic volume (low) but is a known gap. This chapter documents
|
||||
what we *have* and what we'd add as traffic grows.
|
||||
|
||||
## What we have
|
||||
|
||||
### 1. `kubectl logs`
|
||||
|
||||
Every container's stdout/stderr is captured by containerd and readable
|
||||
via kubectl:
|
||||
|
||||
```bash
|
||||
# Live tail from all api pods
|
||||
kubectl logs -n honeydue -l app.kubernetes.io/name=api -f --prefix
|
||||
|
||||
# Last 100 lines
|
||||
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=100
|
||||
|
||||
# Previous pod's logs (before the most recent restart)
|
||||
kubectl logs -n honeydue <pod-name> --previous
|
||||
|
||||
# Events (not logs — k8s-level state changes)
|
||||
kubectl get events -n honeydue --sort-by=.lastTimestamp
|
||||
```
|
||||
|
||||
**Retention**: with default settings, the kubelet rotates each
container's log when it exceeds 10 MiB (`containerLogMaxSize`) and keeps
up to 5 files (`containerLogMaxFiles`), so roughly the last 50 MiB per
container is retained on the node's disk. Once a pod is deleted, its
logs are gone.
|
||||
|
||||
For persistent log access we'd need aggregation (see "No log aggregation" below).
|
||||
|
||||
### 2. `kubectl top`
|
||||
|
||||
Pod and node resource usage via metrics-server:
|
||||
|
||||
```bash
|
||||
kubectl top nodes
|
||||
# NAME CPU(cores) CPU(%) MEMORY(bytes) MEMORY(%)
|
||||
# ubuntu-8gb-nbg1-1 169m 4% 748Mi 9%
|
||||
# ubuntu-8gb-nbg1-2 229m 5% 1043Mi 13%
|
||||
# ubuntu-8gb-nbg1-3 124m 3% 770Mi 9%
|
||||
|
||||
kubectl top pods -n honeydue
|
||||
```
|
||||
|
||||
**Retention**: In-memory only. Last few minutes of data. No
|
||||
historical view.
|
||||
|
||||
### 3. Cloudflare Analytics
|
||||
|
||||
CF Dashboard → Analytics & Logs. Per-zone stats:
|
||||
- Requests per second
|
||||
- Bandwidth
|
||||
- Cache hit ratio
|
||||
- Top HTTP status codes
|
||||
- Top request paths
|
||||
- Bot traffic score
|
||||
|
||||
All aggregated, no individual request traces. Good for spotting macro
|
||||
trends ("suddenly 10× more 502s today"), poor for debugging specific
|
||||
issues.
|
||||
|
||||
Free tier retention: 7 days of aggregate stats. Pro extends this.
|
||||
|
||||
### 4. Neon dashboard
|
||||
|
||||
Neon console → project → Monitoring:
|
||||
- Compute utilization (CU-hours consumed)
|
||||
- Query performance (slow queries)
|
||||
- Active connections
|
||||
- Storage usage
|
||||
|
||||
Good for "is the DB busy?" and "am I close to my free tier limit?"
|
||||
Not real-time.
|
||||
|
||||
### 5. Kubernetes events
|
||||
|
||||
`kubectl get events` shows cluster-level state changes: pod scheduling,
|
||||
failures, image pulls, probe failures. Useful for post-mortem on
|
||||
deploys.
|
||||
|
||||
Retention: events are stored in the cluster datastore (etcd) but expire after 1 hour by default.
|
||||
|
||||
## What we don't have (the gap)
|
||||
|
||||
### No log aggregation
|
||||
|
||||
Individual pod logs are on the node. For multi-pod debugging ("show me
|
||||
all api pod logs for user X") we have to:
|
||||
|
||||
```bash
|
||||
# Query all at once with stern (if installed)
|
||||
stern -n honeydue api
|
||||
|
||||
# Or for specific pod
|
||||
kubectl logs -n honeydue <pod> | grep '"user_id":12345'
|
||||
```
|
||||
|
||||
This works but doesn't scale. Grep across 3 pods for a specific
|
||||
user_id is OK. Across 30 pods, intractable.
|
||||
|
||||
**What we'd add**: [Loki](https://grafana.com/oss/loki/) — a lightweight
|
||||
log aggregator designed for k8s. ~$0 to self-host; integrates with
|
||||
Grafana for queries. Or [Betterstack](https://betterstack.com/logs)
|
||||
($10/mo, hosted).
|
||||
|
||||
### No metrics/dashboards
|
||||
|
||||
`kubectl top` tells us "is this pod hot right now?" but not "has CPU
|
||||
been climbing over the past hour?" We'd need:
|
||||
|
||||
- **Prometheus** — scrapes metrics from kubelet and pods' `/metrics`
|
||||
endpoints, stores time series
|
||||
- **Grafana** — queries Prometheus, renders dashboards
|
||||
|
||||
K3s can install these via Helm in ~10 minutes. Adds ~500MB RAM to the
|
||||
cluster. Stability and operational load: moderate.
|
||||
|
||||
**Alternative**: [Kubernetes Dashboard](https://github.com/kubernetes/dashboard),
installed separately (k3s does not bundle it). Minimal UI over the
existing metrics API. Cheaper than Prometheus but less queryable.
|
||||
|
||||
### No distributed tracing
|
||||
|
||||
"This request took 800ms — which hop was slow?" is currently unanswerable
|
||||
beyond "the DB query, probably." A real trace would show:
|
||||
- TLS handshake time
|
||||
- Traefik routing time
|
||||
- Go handler time
|
||||
- Postgres query time
|
||||
- Redis call time
|
||||
- Each B2 request time
|
||||
|
||||
We'd add OpenTelemetry to the Go app and export to Jaeger/Tempo. Work
|
||||
is moderate; value kicks in when we have complex request flows.
|
||||
|
||||
### No alerting
|
||||
|
||||
No PagerDuty, no Slack webhooks, no email on "api is returning 500s."
|
||||
The operator finds out when users complain.
|
||||
|
||||
Cheapest fix: [Uptime Kuma](https://github.com/louislam/uptime-kuma)
|
||||
(self-hosted) or Better Stack Uptime (free for small teams). Ping
|
||||
`https://api.myhoneydue.com/api/health/` every minute; alert if it fails.
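
Until a hosted monitor exists, the same check can be approximated from any machine with cron. A minimal sketch, assuming that anything other than HTTP 200 counts as a failure; `is_healthy` and `probe_health` are hypothetical helpers, and the alert here is just a line on stderr:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): probe the health endpoint once.

# Succeeds only for HTTP 200.
is_healthy() {
  [ "$1" = "200" ]
}

# Fetch the URL and report on stderr if it is not healthy.
# curl prints 000 as the status code when the connection itself fails.
probe_health() {
  local url="$1" code
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url" 2>/dev/null)"
  if is_healthy "$code"; then
    return 0
  fi
  echo "ALERT: ${url} returned ${code:-no response}" >&2
  return 1
}
```

Running `probe_health https://api.myhoneydue.com/api/health/` once a minute from a machine outside the cluster covers the basic case. Because the health endpoint is shallow, this detects a dead process or broken routing but not a database outage.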
|
||||
|
||||
### No APM (Application Performance Monitoring)
|
||||
|
||||
No request-level profiling. We can't see "which endpoint has the highest
|
||||
p99 latency?" or "which SQL query is hot this week?"
|
||||
|
||||
Options: Datadog, New Relic, Honeycomb, self-hosted Tempo+Grafana.
|
||||
All are meaningful work to set up and cost $$$.
|
||||
|
||||
## The app's logging conventions
|
||||
|
||||
The Go app uses zerolog and emits structured JSON:
|
||||
|
||||
```json
|
||||
{
|
||||
"level": "info",
|
||||
"time": "2026-04-24T05:29:40Z",
|
||||
"caller": "/app/cmd/api/main.go:189",
|
||||
"addr": ":8000",
|
||||
"message": "HTTP server listening"
|
||||
}
|
||||
```
|
||||
|
||||
Log levels: `debug`, `info`, `warn`, `error`, `fatal`. Controlled by
|
||||
`DEBUG=true|false` in ConfigMap (true sets level to debug, false sets
|
||||
level to info).
|
||||
|
||||
Every request is logged with:
|
||||
- Method, path, status code
|
||||
- Request ID (for correlating logs across pods)
|
||||
- User ID (if authenticated)
|
||||
- Latency
|
||||
|
||||
```json
|
||||
{
|
||||
"level": "info",
|
||||
"method": "GET",
|
||||
"path": "/api/tasks/",
|
||||
"status": 200,
|
||||
"latency_ms": 42,
|
||||
"user_id": 123,
|
||||
"request_id": "a6b5db35-..."
|
||||
}
|
||||
```
|
||||
|
||||
This is queryable by grep. Better with log aggregation.
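
For example, pulling one user's requests out of a log stream. A sketch assuming the compact one-object-per-line JSON that zerolog writes (the examples above are pretty-printed for readability); `filter_user` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): keep only log lines for one numeric user ID.
# The trailing [,}] stops user_id 123 from also matching 1234.
filter_user() {
  grep -E "\"user_id\":${1}[,}]"
}
```

Usage: `kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=1000 | filter_user 123`.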
|
||||
|
||||
## Health endpoints
|
||||
|
||||
Each service exposes a health endpoint:
|
||||
|
||||
| Service | Endpoint | What it checks |
|---|---|---|
| api | `/api/health/` | Process alive (doesn't verify DB) |
| admin | `/` | Next.js is up |
| worker | (none public) | Internal Asynq status |
|
||||
|
||||
Health endpoints are **shallow** — they return 200 if the process is
|
||||
running and listening. They don't try to reach Postgres/Redis/etc.
|
||||
Rationale: if Postgres is briefly down, we don't want all api pods to
|
||||
start failing liveness and cascade-restart.
|
||||
|
||||
## Dozzle (deprecated)
|
||||
|
||||
The Swarm era had [Dozzle](https://github.com/amir20/dozzle) — a
|
||||
lightweight web UI for Docker logs. Accessible via SSH tunnel to the
|
||||
manager node. Not deployed on k3s; `kubectl logs` + `stern` fills the
|
||||
niche.
|
||||
|
||||
## Kubernetes metrics the k8s API exposes
|
||||
|
||||
Even without Prometheus, these are queryable:
|
||||
|
||||
```bash
|
||||
# Resource metrics (via metrics-server)
|
||||
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
|
||||
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/honeydue/pods
|
||||
|
||||
# Core API (k8s state)
|
||||
kubectl get --raw /api/v1/namespaces/honeydue/pods/<name>
|
||||
|
||||
# Kubelet metrics (per-node; proxied through the API server)
|
||||
kubectl get --raw /api/v1/nodes/<node>/proxy/metrics
|
||||
```
|
||||
|
||||
If we ever spin up Prometheus, these are the endpoints it would scrape.
|
||||
|
||||
## Future: what to add and when
|
||||
|
||||
| Trigger | Add |
|---|---|
| 10k+ daily users | Loki + Grafana for logs |
| 100+ req/s sustained | Prometheus + Grafana for metrics |
| Performance incidents | OpenTelemetry tracing |
| Revenue > $5k/mo | Paid monitoring (Datadog or similar) |
| First production outage | Alerting to phone/Slack |
|
||||
|
||||
The overall philosophy: observability is an investment that compounds.
|
||||
Add it before you need it, not after. But also don't over-invest at
|
||||
idle.
|
||||
|
||||
**Next quarter**: set up Uptime Kuma + Loki at minimum.
|
||||
|
||||
## Checking what's installed
|
||||
|
||||
```bash
|
||||
# In kube-system namespace
|
||||
kubectl get pods -n kube-system
|
||||
# Should see: coredns, metrics-server, traefik, local-path-provisioner,
|
||||
# and some k3s-related helm install jobs
|
||||
|
||||
# In honeydue namespace
|
||||
kubectl get pods -n honeydue
|
||||
# api, admin, worker, redis
|
||||
|
||||
# No monitoring namespace (yet)
|
||||
kubectl get namespaces
|
||||
# default, honeydue, kube-node-lease, kube-public, kube-system
|
||||
```
|
||||
|
||||
## Operator cheat sheet
|
||||
|
||||
```bash
|
||||
# Tail all logs in the namespace
|
||||
kubectl logs -n honeydue --all-containers=true --tail=50 -l app.kubernetes.io/part-of=honeydue
|
||||
|
||||
# With stern (if installed: brew install stern)
|
||||
stern -n honeydue .
|
||||
|
||||
# Follow a specific pod (current container only; for the last crashed
# container, drop -f and pass --previous instead)
kubectl logs -n honeydue <pod> -f
|
||||
|
||||
# Pod resource usage
|
||||
kubectl top pods -n honeydue --sort-by=memory
|
||||
kubectl top pods -n honeydue --sort-by=cpu
|
||||
|
||||
# Events (cluster-wide)
|
||||
kubectl get events -A --sort-by=.lastTimestamp | tail -20
|
||||
|
||||
# Full state dump for a pod (debugging)
|
||||
kubectl describe pod -n honeydue <pod> > /tmp/pod-dump.txt
|
||||
kubectl logs -n honeydue <pod> > /tmp/pod-logs.txt
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [Kubernetes metrics-server][ms]
|
||||
- [K3s metrics][k3s-metrics]
|
||||
- [Loki][loki]
|
||||
- [Stern (multi-pod log tail)][stern]
|
||||
|
||||
[ms]: https://github.com/kubernetes-sigs/metrics-server
|
||||
[k3s-metrics]: https://docs.k3s.io/advanced#enabling-metrics-server
|
||||
[loki]: https://grafana.com/oss/loki/
|
||||
[stern]: https://github.com/stern/stern
|
||||
@@ -0,0 +1,360 @@
|
||||
# 16 — Failure Modes
|
||||
|
||||
## Summary
|
||||
|
||||
Every component in the system has a failure mode, a user-visible
|
||||
symptom, and a recovery story. This chapter enumerates them from the
|
||||
edge inward. Use this as a reference when debugging or when planning
|
||||
resilience improvements.
|
||||
|
||||
## Failure catalog
|
||||
|
||||
### Cloudflare-level
|
||||
|
||||
#### CF edge POP outage
|
||||
|
||||
**Symptom**: users in one geographic region see errors; other regions
|
||||
fine.
|
||||
**Recovery**: automatic — CF routes traffic to next-nearest POP.
|
||||
**Our action**: none; wait for CF.
|
||||
**Frequency**: rare, usually resolved in minutes.
|
||||
|
||||
#### CF global outage (rare but has happened)
|
||||
|
||||
**Symptom**: the whole site unreachable via CF.
|
||||
**Recovery**: manual — disable CF proxy (grey cloud DNS records), users
|
||||
hit origins directly.
|
||||
**Our action**: in Cloudflare dashboard, flip each A record's proxy off.
|
||||
Users then resolve to our node IPs directly; UFW allows :80/:443 from
|
||||
anywhere so they reach Traefik. TLS breaks (origin has no cert in SSL
|
||||
Flexible mode), but HTTP works.
|
||||
**Frequency**: extremely rare (hours-long event happens ~annually).
|
||||
|
||||
#### DNS hijacking
|
||||
|
||||
**Symptom**: users' DNS queries return attacker IPs; all traffic
|
||||
compromised.
|
||||
**Mitigation**: unlikely at CF; users who use DoH/DoT are protected.
|
||||
No mitigation at our level.
|
||||
**Recovery**: requires CF incident response.
|
||||
|
||||
### Node-level
|
||||
|
||||
#### One node's NIC fails
|
||||
|
||||
**Symptom**: Cloudflare's retry logic routes around it within seconds.
|
||||
Users see a brief spike in latency as CF learns the IP is unhealthy.
|
||||
Kubernetes marks the node NotReady after `node-monitor-grace-period`
(40s); its pods are rescheduled to surviving nodes only once the
eviction timeout that follows has also expired.
|
||||
**Recovery**:
|
||||
- Automatic pod rescheduling takes ~5 min (grace period + pod eviction)
|
||||
- Dead node's Raft vote is missing; cluster stays up (2 of 3 quorum)
|
||||
- Replace the node via Hetzner console when convenient
|
||||
**Our action**: verify `kubectl get nodes` shows NotReady; check
|
||||
Hetzner console to confirm the node's status; recreate if needed.
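The "~5 min" figure can be reproduced with simple arithmetic. This is a sketch under two assumptions: the 40s grace period quoted above, and Kubernetes' default 300s not-ready/unreachable toleration being left unchanged in our manifests. `eviction_delay_seconds` is a hypothetical helper name.

```bash
# Rough worst-case seconds before pods on a dead node are evicted.
#   $1 = node-monitor-grace-period (defaults to the 40s quoted above)
#   $2 = pod toleration for not-ready/unreachable (Kubernetes default 300s,
#        assumed not overridden in our deployments)
eviction_delay_seconds() {
  local grace="${1:-40}" toleration="${2:-300}"
  echo $(( grace + toleration ))
}

eviction_delay_seconds   # 40 + 300 = 340 seconds, a little under 6 minutes
```

Scheduling and image pull on the new node come on top of this, so treat the result as a lower bound for full recovery.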
|
||||
|
||||
#### Two nodes fail simultaneously
|
||||
|
||||
**Symptom**: Raft loses quorum. Kubernetes API server rejects writes.
|
||||
Existing pods keep running but nothing new can be scheduled/updated.
|
||||
Single surviving node's pods continue serving traffic.
|
||||
**Recovery**:
|
||||
- If a failed node comes back, etcd regains quorum on its own once two
  of the three members are reachable again
- If the failed nodes are truly gone, quorum cannot recover by itself;
  the cluster has to be reset or rebuilt
**Rebuild procedure**: from the surviving node, `k3s-killall.sh`, then
bootstrap a new 3-node cluster from scratch. k3s also documents
`k3s server --cluster-reset`, which resets a surviving etcd member to a
single-node cluster, as a possible alternative to a full rebuild. Data
in Neon/B2 is safe; Redis state is lost.
|
||||
|
||||
#### All three nodes fail simultaneously
|
||||
|
||||
**Symptom**: full site outage.
|
||||
**Recovery**: rebuild the cluster from scratch.
|
||||
**Frequency**: Hetzner-region-wide outage, extremely rare.
|
||||
|
||||
#### Node disk fills up
|
||||
|
||||
**Symptom**: pods get evicted ("node is disk-pressure"). Containers
|
||||
can't be scheduled on that node.
|
||||
**Common cause**: container log buildup (containerd rotates at 10 MB
|
||||
per container but across dozens of pod churn cycles, total fills up),
|
||||
local-path PVC fills up, apt cache.
|
||||
**Recovery**:
|
||||
```bash
|
||||
ssh deploy@<node> "sudo df -h; sudo du -sh /var/lib/rancher/* | sort -h"
|
||||
# Then clean up
|
||||
```
|
||||
|
||||
### k3s control plane failures
|
||||
|
||||
#### etcd corruption on one node
|
||||
|
||||
**Symptom**: Raft detects divergence; that node stops serving writes.
|
||||
**Recovery**: remove the node from the cluster, rejoin. Etcd snapshot
|
||||
is pulled from surviving peers automatically.
|
||||
|
||||
#### CoreDNS down
|
||||
|
||||
**Symptom**: pods can't resolve Service names. New TCP connections
|
||||
fail; existing connections continue (they already resolved).
|
||||
Typical manifestation: "DB connection failed — no such host" errors.
|
||||
**Recovery**: k3s automatically restarts CoreDNS pod. If it
|
||||
keeps crashing:
|
||||
```bash
|
||||
kubectl logs -n kube-system deploy/coredns --previous
|
||||
kubectl rollout restart deployment/coredns -n kube-system
|
||||
```
|
||||
**Frequency**: rare.
|
||||
|
||||
#### metrics-server down
|
||||
|
||||
**Symptom**: `kubectl top` returns an error; HPAs can't scale.
|
||||
**Recovery**: restart metrics-server pod. Non-critical; service stays up.
|
||||
```bash
|
||||
kubectl rollout restart deployment/metrics-server -n kube-system
|
||||
```
|
||||
|
||||
### Networking failures
|
||||
|
||||
#### UFW rule accidentally blocks essential traffic
|
||||
|
||||
**Symptom**: Some specific thing stops working (e.g., api can't reach
|
||||
Postgres, cross-node pod traffic fails, kubectl times out).
|
||||
**Recovery**: log in via SSH (if that still works), `sudo ufw status
|
||||
numbered`, `sudo ufw --force delete <N>` to remove offending rule.
|
||||
**If SSH is blocked too**: Hetzner console → Rescue mode → mount disk
|
||||
→ edit `/etc/ufw/user.rules`.
|
||||
|
||||
#### Flannel broken on one node
|
||||
|
||||
**Symptom**: pods on that node can't reach remote pods via overlay.
|
||||
ClusterIP Services involving cross-node endpoints fail.
|
||||
**Recovery**: restart the k3s service on that node (k3s embeds both kubelet and Flannel):
|
||||
```bash
|
||||
ssh deploy@<node> "sudo systemctl restart k3s"
|
||||
```
|
||||
|
||||
#### Kube-proxy broken on one node
|
||||
|
||||
**Symptom**: pods on that node can't reach ClusterIPs. Symptoms look
|
||||
like DNS resolution succeeded but connection refused or timed out.
|
||||
**Recovery**: same as Flannel — restart k3s on the node.
|
||||
|
||||
### Application-level
|
||||
|
||||
#### api pod OOM
|
||||
|
||||
**Symptom**: pod gets killed, kubelet restarts it. User's request
|
||||
returns 502 briefly; subsequent requests routed to healthy pods.
|
||||
Readiness probe removes the OOMing pod from Service endpoints.
|
||||
**Recovery**: automatic (pod restarts). If it keeps OOMing:
|
||||
- Increase `resources.limits.memory` in the deployment
|
||||
- Or debug the memory leak
|
||||
**Check**:
|
||||
```bash
|
||||
kubectl describe pod -n honeydue <pod> | grep -i oom
|
||||
kubectl logs -n honeydue <pod> --previous
|
||||
```
|
||||
|
||||
#### api pod panics
|
||||
|
||||
**Symptom**: goroutine panic kills the process. Kubelet restarts.
|
||||
Similar user impact to OOM.
|
||||
**Recovery**: automatic restart. But if the panic is deterministic
|
||||
(same input → panic), the pod crashloops.
|
||||
**Action**: read the logs, find the panic stack trace, fix the code,
|
||||
deploy.
|
||||
**Circuit-breaker scenario**: if all 3 api pods crashloop on startup
because of bad code, run `kubectl rollout undo deployment/api -n honeydue`
to return to the previous revision.
|
||||
|
||||
#### api deadlocks
|
||||
|
||||
**Symptom**: all 3 pods are up, readiness passes (shallow probe), but
|
||||
real requests time out or hang.
|
||||
**Recovery**: liveness probe is the same endpoint as readiness, so it
|
||||
won't help. You'll see gradually increasing 504s at the edge. Manual
|
||||
intervention:
|
||||
```bash
|
||||
kubectl rollout restart deployment/api -n honeydue
|
||||
```
|
||||
|
||||
#### admin pod crashes
|
||||
|
||||
**Symptom**: 502 at Cloudflare when accessing admin.myhoneydue.com.
|
||||
**Recovery**: k8s auto-restarts. Usually within 10-30s.
|
||||
**Impact**: only admins lose access; user-facing api is unaffected.
|
||||
|
||||
#### worker stops processing jobs
|
||||
|
||||
**Symptom**: emails stop being sent, cron jobs stop firing.
|
||||
**Detection**: no direct alert; need to notice via user feedback or
|
||||
missing daily-digest emails. Or check Redis for queue backlog.
|
||||
**Recovery**:
|
||||
```bash
|
||||
kubectl rollout restart deployment/worker -n honeydue
|
||||
```
|
||||
**If persistent**: check logs for specific error:
|
||||
```bash
|
||||
kubectl logs -n honeydue deploy/worker --tail=100
|
||||
```
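For the Redis backlog check mentioned under Detection, a minimal sketch follows. It assumes Asynq's usual key layout (`asynq:{<queue>}:pending`, stored as a Redis list), a queue named `default`, `redis-cli` being present in the redis image, and no Redis AUTH. None of these are confirmed by this book, and `asynq_pending_key` is a hypothetical helper.

```bash
# Hypothetical helper: build the Redis key Asynq is assumed to use for a
# queue's pending list. The braces are literal (a Redis Cluster hash tag).
asynq_pending_key() {
  printf 'asynq:{%s}:pending' "$1"
}

# Usage (assumptions as stated above):
#   kubectl exec -n honeydue deploy/redis -- \
#     redis-cli LLEN "$(asynq_pending_key default)"
```

A pending count that keeps growing while the worker pod is Running would point at a stuck worker rather than an idle queue.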
|
||||
|
||||
#### redis pod rescheduled to a different node
|
||||
|
||||
**Symptom**: Redis schedules to a new node, but the PVC is on the
|
||||
original node (local-path is per-node). New Redis pod comes up but
|
||||
finds an empty data directory (or can't mount at all).
|
||||
**Recovery**:
|
||||
- If the original node is still alive but Redis pod died: pod comes
|
||||
back up on same node with data intact
|
||||
- If the original node is gone: Redis starts empty. Cache regenerates.
|
||||
Asynq queue state is lost; pending jobs re-queue on retry, cron
|
||||
fires re-schedule on next tick.
|
||||
- Ensure the node label `honeydue/redis=true` is on a healthy node:
|
||||
```bash
|
||||
kubectl label node <new-node> honeydue/redis=true --overwrite
|
||||
kubectl label node <dead-node> honeydue/redis- 2>/dev/null || true
|
||||
```
|
||||
|
||||
### External service failures
|
||||
|
||||
#### Neon Postgres outage
|
||||
|
||||
**Symptom**: api logs fill with "failed to connect to database." All
|
||||
mutating API calls fail. Reads from cache continue (via Redis) but
|
||||
eventually cache expires.
|
||||
**Recovery**: no action from us; Neon's problem. Users will see 5xx
|
||||
until Neon is back.
|
||||
**Mitigation for future**: multi-region Neon read replica, or
|
||||
Postgres-level failover.
|
||||
**Frequency**: Neon has had a handful of hours-scale outages since launch.
|
||||
|
||||
#### Backblaze B2 outage
|
||||
|
||||
**Symptom**: image uploads fail; image downloads fail unless cached by
|
||||
CF.
|
||||
**Recovery**: wait. B2 rarely goes down.
|
||||
**Mitigation**: serve downloads via CF with long cache TTL — most
|
||||
users won't notice brief B2 outages for read traffic.
|
||||
|
||||
#### Fastmail SMTP unreachable
|
||||
|
||||
**Symptom**: `worker` can't send transactional emails. Jobs retry per
|
||||
Asynq's retry policy, eventually giving up and logging an error.
|
||||
**Recovery**: automatic retry; wait for Fastmail to come back.
|
||||
**Manual intervention**: re-enqueue jobs from the Asynq UI (we don't
|
||||
expose it yet — future).
|
||||
|
||||
#### Gitea registry unreachable
|
||||
|
||||
**Symptom**: `kubectl rollout` stuck at "Pulling image" for new pods.
|
||||
Existing pods continue running with their already-pulled images.
|
||||
**Recovery**: wait for Gitea to come back.
|
||||
**Mitigation**: Kubernetes defaults `imagePullPolicy` to `IfNotPresent`
for any tag other than `:latest`, which covers our SHA-tagged images, so
an image isn't re-pulled on restart if the node already has it cached.
|
||||
|
||||
#### Cloudflare DNS failure
|
||||
|
||||
See §CF failures above.
|
||||
|
||||
## Combined failures
|
||||
|
||||
### "Everything is slow"
|
||||
|
||||
Most often this means Neon is slow, either under our own load or because
of a noisy neighbor on its shared infrastructure.
|
||||
- Check `kubectl top pods` (are we CPU-bound?)
|
||||
- Check Neon console for query performance
|
||||
- Check CF analytics for traffic spikes
|
||||
|
||||
### "Some users see 502, others don't"
|
||||
|
||||
Usually one node has an unhealthy Traefik or api. Cloudflare routes
|
||||
some connections to it, others to healthy nodes.
|
||||
- `kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik`
|
||||
- `kubectl get pods -n honeydue -l app.kubernetes.io/name=api`
|
||||
- Check per-pod logs
|
||||
|
||||
### "It worked 5 minutes ago, now it doesn't"
|
||||
|
||||
Something recent changed. Check:
|
||||
- Recent deploys: `kubectl rollout history deployment/api -n honeydue`
|
||||
- Recent cluster events: `kubectl get events -A --sort-by=.lastTimestamp | tail -30`
|
||||
- External: Cloudflare Status page, Neon Status page, Backblaze Status page
|
||||
|
||||
## Planned outages
|
||||
|
||||
### Node upgrades (OS patches)
|
||||
|
||||
```bash
|
||||
# Drain the node (evict pods, block scheduling)
|
||||
kubectl drain ubuntu-8gb-nbg1-1 --ignore-daemonsets --delete-emptydir-data
|
||||
|
||||
# SSH in, upgrade, reboot
|
||||
ssh deploy@hetzner2 "sudo apt update && sudo apt upgrade -y && sudo reboot"
|
||||
|
||||
# Wait for node to come back
|
||||
watch kubectl get nodes
|
||||
|
||||
# Uncordon
|
||||
kubectl uncordon ubuntu-8gb-nbg1-1
|
||||
```
|
||||
|
||||
During the drain, pods from that node reschedule to the survivors.
|
||||
With current workload (api: 3 replicas, everything else: 1), rescheduling
|
||||
1 api pod is fine. Traffic loss: zero.
|
||||
|
||||
Worker pod or Redis pod scheduled on the drained node would be
|
||||
briefly unavailable during reschedule. Acceptable for planned windows.
|
||||
|
||||
### k3s upgrades
|
||||
|
||||
Same per-node drain + upgrade pattern, but with k3s-specific install:
|
||||
|
||||
```bash
|
||||
# On the node
|
||||
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.x+k3s1 sh -s - server
|
||||
|
||||
# k3s detects existing install and upgrades in place
|
||||
```
|
||||
|
||||
Do one node at a time. Verify cluster health between each.
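One way to make the "verify between each" step mechanical is a small gate over the node list. This is a sketch, assuming the default column layout of `kubectl get nodes --no-headers` (STATUS in column 2); `all_nodes_ready` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if every node line on stdin has
# STATUS exactly "Ready". A cordoned node shows "Ready,SchedulingDisabled"
# and therefore fails the check, as does empty input.
all_nodes_ready() {
  awk '$2 != "Ready" { bad = 1 } END { exit (bad || NR == 0) }'
}

# Usage between node upgrades:
#   kubectl get nodes --no-headers | all_nodes_ready && echo "safe to continue"
```

Because a cordoned node fails the check, it also catches a forgotten `kubectl uncordon`.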
|
||||
|
||||
## Disaster recovery
|
||||
|
||||
### Complete cluster loss
|
||||
|
||||
Procedure:
|
||||
1. Provision 3 new Hetzner CX33 nodes (or use existing if healthy)
|
||||
2. Follow bootstrap procedure (Chapter 1 §node hardening)
|
||||
3. Install k3s on each (Chapter 2 §HA architecture)
|
||||
4. Configure kubeconfig
|
||||
5. Apply all manifests:
|
||||
```bash
|
||||
kubectl apply -f deploy-k3s/manifests/namespace.yaml
|
||||
kubectl apply -f deploy-k3s/manifests/rbac.yaml
|
||||
kubectl apply -f deploy-k3s/manifests/traefik-helmchartconfig.yaml
|
||||
# Wait for Traefik to redeploy
|
||||
# ... recreate secrets (see Chapter 10) ...
|
||||
# ... apply rest of manifests ...
|
||||
```
|
||||
6. Update DNS if node IPs changed
|
||||
7. Verify: `curl https://api.myhoneydue.com/api/health/`
|
||||
|
||||
Estimated time: **1-2 hours** if you've done it before. A lot of
|
||||
context-switching between Hetzner console, SSH, kubectl, and CF.
|
||||
|
||||
Neon data is untouched by any of this. B2 data is untouched. Only
|
||||
state that's lost: Redis cache (regenerates) and any in-flight Asynq
|
||||
jobs that were mid-processing.
|
||||
|
||||
## References
|
||||
|
||||
- [Kubernetes pod lifecycle][lifecycle]
|
||||
- [K3s HA recovery][k3s-ha-recovery]
|
||||
- [Hetzner rescue system][hetzner-rescue]
|
||||
|
||||
[lifecycle]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
|
||||
[k3s-ha-recovery]: https://docs.k3s.io/datastore/ha-embedded#new-cluster-with-embedded-db
|
||||
[hetzner-rescue]: https://docs.hetzner.com/cloud/servers/getting-started/enabling-rescue-system/
|
||||
@@ -0,0 +1,369 @@
|
||||
# 17 — Operator Runbook
|
||||
|
||||
## Summary
|
||||
|
||||
Common procedures the operator runs. Each is a numbered sequence of
|
||||
exact commands. If a step is unclear, add a comment; if a procedure
|
||||
fails in an unexpected way, add the symptom + fix to this document.
|
||||
|
||||
## Environment setup
|
||||
|
||||
Every command assumes:
|
||||
|
||||
```bash
|
||||
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
|
||||
cd /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
|
||||
```
|
||||
|
||||
If you see "Unable to connect to the server," first check that `KUBECONFIG` is set in the current shell.
|
||||
|
||||
## 1. Check cluster health
|
||||
|
||||
```bash
|
||||
kubectl get nodes # all 3 Ready?
|
||||
kubectl get pods -A | grep -vE 'Running|Completed' # anything not running?
|
||||
kubectl top nodes # resource usage
|
||||
kubectl get events -A --sort-by=.lastTimestamp | tail -20
|
||||
```
|
||||
|
||||
## 2. Deploy new code
|
||||
|
||||
### Full deploy (all three services)
|
||||
|
||||
```bash
|
||||
SHA=$(git rev-parse --short HEAD)
|
||||
|
||||
# Login
|
||||
set -a; source deploy/registry.env; set +a
|
||||
printf '%s' "$REGISTRY_TOKEN" | \
|
||||
docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
|
||||
|
||||
# Build
|
||||
docker buildx build --platform linux/amd64 --target api \
|
||||
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
|
||||
docker buildx build --platform linux/amd64 --target worker \
|
||||
-t "gitea.treytartt.com/admin/honeydue-worker:${SHA}" --push .
|
||||
docker buildx build --platform linux/amd64 --target admin \
|
||||
-t "gitea.treytartt.com/admin/honeydue-admin:${SHA}" --push .
|
||||
|
||||
# Apply
|
||||
for svc in api worker admin; do
|
||||
kubectl set image deployment/$svc -n honeydue \
|
||||
"$svc=gitea.treytartt.com/admin/honeydue-${svc}:${SHA}"
|
||||
done
|
||||
|
||||
# Watch
|
||||
for svc in api worker admin; do
|
||||
kubectl rollout status -n honeydue deployment/$svc
|
||||
done
|
||||
|
||||
# Logout
|
||||
docker logout gitea.treytartt.com
|
||||
```
|
||||
|
||||
### Single service
|
||||
|
||||
```bash
|
||||
SHA=$(git rev-parse --short HEAD)
|
||||
set -a; source deploy/registry.env; set +a
|
||||
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin
|
||||
docker buildx build --platform linux/amd64 --target api \
|
||||
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
|
||||
kubectl set image deployment/api -n honeydue \
|
||||
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
|
||||
kubectl rollout status -n honeydue deployment/api
|
||||
docker logout "$REGISTRY"
|
||||
```
|
||||
|
||||
## 3. Rollback
|
||||
|
||||
### Last good
|
||||
|
||||
```bash
|
||||
kubectl rollout undo deployment/api -n honeydue
|
||||
kubectl rollout status -n honeydue deployment/api
|
||||
```
|
||||
|
||||
### Specific SHA
|
||||
|
||||
```bash
|
||||
kubectl set image deployment/api -n honeydue \
|
||||
api="gitea.treytartt.com/admin/honeydue-api:<sha>"
|
||||
```
|
||||
|
||||
## 4. Read logs
|
||||
|
||||
```bash
|
||||
# Follow all api pod logs
|
||||
kubectl logs -n honeydue -l app.kubernetes.io/name=api -f --prefix
|
||||
|
||||
# Errors only
|
||||
kubectl logs -n honeydue -l app.kubernetes.io/name=api --tail=1000 | grep -i error
|
||||
|
||||
# Previous pod (before crash/restart)
|
||||
kubectl logs -n honeydue <pod> --previous
|
||||
```
|
||||
|
||||
## 5. Exec into a pod
|
||||
|
||||
```bash
|
||||
kubectl exec -n honeydue -it deploy/api -- /bin/sh
|
||||
# inside:
|
||||
# wget -qO- http://127.0.0.1:8000/api/health/
|
||||
# env | grep DB_
|
||||
# exit
|
||||
```
|
||||
|
||||
## 6. Rotate a secret
|
||||
|
||||
```bash
|
||||
# For honeydue-secrets keys
|
||||
kubectl patch secret honeydue-secrets -n honeydue \
|
||||
--type=merge \
|
||||
-p "{\"data\":{\"SECRET_KEY\":\"$(printf '%s' 'new-value' | base64 | tr -d '\n')\"}}"
|
||||
|
||||
# Update local file to match (keep in sync)
|
||||
printf '%s' 'new-value' > deploy/secrets/secret_key.txt
|
||||
|
||||
# Restart pods so they pick up the new secret
|
||||
kubectl rollout restart -n honeydue deploy/api deploy/worker
|
||||
```
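The patch body is easy to get wrong by hand, so here is a sketch of a helper that builds it. `secret_patch_json` is a hypothetical name, and the helper assumes key names and values that need no JSON escaping.

```bash
# Hypothetical helper: print a merge-patch body for one Secret key.
# Newlines are stripped from the base64 output because GNU base64 wraps
# long values at 76 columns, which would break the JSON string.
secret_patch_json() {
  local key="$1" value="$2" encoded
  encoded=$(printf '%s' "$value" | base64 | tr -d '\n')
  printf '{"data":{"%s":"%s"}}' "$key" "$encoded"
}

# Usage:
#   kubectl patch secret honeydue-secrets -n honeydue --type=merge \
#     -p "$(secret_patch_json SECRET_KEY 'new-value')"
```

Using `printf '%s'` rather than `echo` avoids encoding a trailing newline into the secret on shells where `echo -n` is not honored.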
|
||||
|
||||
## 7. Change a ConfigMap value
|
||||
|
||||
```bash
|
||||
# Edit deploy/prod.env locally
|
||||
# Regenerate the configmap
|
||||
kubectl create configmap honeydue-config -n honeydue \
|
||||
--from-env-file=deploy/prod.env \
|
||||
--dry-run=client -o yaml | kubectl apply -f -
|
||||
|
||||
# Restart to pick up
|
||||
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
|
||||
```
|
||||
|
||||
## 8. Scale a service
|
||||
|
||||
```bash
|
||||
kubectl scale deployment/api -n honeydue --replicas=5
|
||||
# Then wait
|
||||
kubectl rollout status -n honeydue deployment/api
|
||||
```
|
||||
|
||||
**DO NOT** scale worker above 1 until Asynq PeriodicTaskManager is wired.
|
||||
|
||||
## 9. Drain a node for maintenance
|
||||
|
||||
```bash
|
||||
# Prevent new pods, evict existing
|
||||
kubectl drain <node-hostname> --ignore-daemonsets --delete-emptydir-data
|
||||
|
||||
# Do maintenance (apt upgrade, reboot, etc.)
|
||||
ssh deploy@<node> "sudo apt update && sudo apt upgrade -y && sudo reboot"
|
||||
|
||||
# Wait for node to come back
|
||||
watch kubectl get nodes
|
||||
|
||||
# Allow scheduling again
|
||||
kubectl uncordon <node-hostname>
|
||||
```
|
||||
|
||||
Node hostnames (not SSH aliases!):
|
||||
- `ubuntu-8gb-nbg1-1` (hetzner2)
|
||||
- `ubuntu-8gb-nbg1-2` (hetzner1)
|
||||
- `ubuntu-8gb-nbg1-3` (hetzner3)
|
||||
|
||||
## 10. Add a new node
|
||||
|
||||
```bash
|
||||
# 1. Provision CX33 in Hetzner console
|
||||
# 2. SSH in as root, create deploy user + key
|
||||
# 3. Install k3s (the command below joins the node as a server)
|
||||
NODE_TOKEN=$(ssh -i ~/.ssh/hetzner deploy@hetzner1 'sudo cat /var/lib/rancher/k3s/server/node-token')
|
||||
ssh -i ~/.ssh/hetzner root@<new-node-ip> "curl -sfL https://get.k3s.io | K3S_TOKEN=\"$NODE_TOKEN\" INSTALL_K3S_EXEC=\"server --server=https://178.104.247.152:6443 --disable=servicelb --write-kubeconfig-mode=644\" sh -"
|
||||
|
||||
# 4. Add UFW rules for inter-node traffic
|
||||
# (see deploy-k3s/scripts/ for the script)
|
||||
|
||||
# 5. Verify
|
||||
kubectl get nodes
|
||||
```
|
||||
|
||||
## 11. Remove a node
|
||||
|
||||
```bash
|
||||
# Drain first
|
||||
kubectl drain <hostname> --ignore-daemonsets --delete-emptydir-data
|
||||
|
||||
# Tell k3s to leave
|
||||
ssh -i ~/.ssh/hetzner deploy@<node-alias> "sudo systemctl stop k3s && sudo /usr/local/bin/k3s-uninstall.sh"
|
||||
|
||||
# Remove from cluster
|
||||
kubectl delete node <hostname>
|
||||
```
|
||||
|
||||
## 12. Force-restart all pods
|
||||
|
||||
```bash
|
||||
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker deploy/redis
|
||||
```
|
||||
|
||||
Use sparingly. Causes brief downtime per pod.
|
||||
|
||||
## 13. Migrate to a new Neon DB
|
||||
|
||||
```bash
|
||||
# 1. Create a new branch or project on Neon
|
||||
# 2. Update prod.env with new DB_HOST
|
||||
# 3. Apply new ConfigMap
|
||||
kubectl create configmap honeydue-config -n honeydue \
|
||||
--from-env-file=deploy/prod.env \
|
||||
--dry-run=client -o yaml | kubectl apply -f -
|
||||
|
||||
# 4. Rolling restart
|
||||
kubectl rollout restart -n honeydue deploy/api deploy/worker
|
||||
```
|
||||
|
||||
## 14. Rotate Gitea registry PAT
|
||||
|
||||
```bash
|
||||
# 1. Create new PAT in Gitea UI
|
||||
# 2. Update deploy/registry.env locally
|
||||
# 3. Update in-cluster Secret
|
||||
kubectl create secret docker-registry gitea-credentials -n honeydue \
|
||||
--docker-server=gitea.treytartt.com \
|
||||
--docker-username=admin \
|
||||
--docker-password=<new-pat> \
|
||||
--dry-run=client -o yaml | kubectl apply -f -
|
||||
|
||||
# 4. Delete old PAT from Gitea UI
|
||||
|
||||
# 5. Pods don't re-auth with existing images (already pulled), but
|
||||
# new pulls will use new PAT. Test by rolling a pod:
|
||||
kubectl rollout restart -n honeydue deployment/api
|
||||
```
|
||||
|
||||
## 15. Clean up old images in Gitea
|
||||
|
||||
Manual, via Gitea UI:
|
||||
https://gitea.treytartt.com/admin/-/packages
|
||||
|
||||
Keep ~last 30 tags per image; delete older.
|
||||
|
||||
Or via API:
|
||||
```bash
|
||||
GITEA_PAT="$(grep REGISTRY_TOKEN deploy/registry.env | cut -d= -f2-)"
|
||||
# List tags
|
||||
curl -sS -H "Authorization: token $GITEA_PAT" \
|
||||
"https://gitea.treytartt.com/api/v1/packages/admin/container/honeydue-api/versions" | jq .
|
||||
# Delete specific tag
|
||||
curl -X DELETE -H "Authorization: token $GITEA_PAT" \
|
||||
"https://gitea.treytartt.com/api/v1/packages/admin/container/honeydue-api/<tag>"
|
||||
```
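To apply the "keep the last 30" rule without counting by hand, a sketch: given one tag per line, newest first, print everything past the retention count. The ordering of the input is an assumption (sort it yourself if the API returns something else), and `tags_to_prune` is a hypothetical helper name.

```bash
# Hypothetical helper: read tags on stdin (one per line, newest first)
# and print the ones that fall outside the retention window.
#   $1 = number of tags to keep (defaults to 30)
tags_to_prune() {
  local keep="${1:-30}"
  tail -n "+$((keep + 1))"
}

# Usage, with tags.txt prepared newest-first:
#   tags_to_prune 30 < tags.txt
```

Review the printed list before feeding it to the DELETE call; a tag that a running Deployment still references should stay.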
|
||||
|
||||
## 16. Recreate the cluster from scratch
|
||||
|
||||
See [Chapter 16 §Disaster recovery](./16-failure-modes.md#disaster-recovery).
|
||||
|
||||
## 17. Connect to Neon directly
|
||||
|
||||
```bash
|
||||
# Get password
|
||||
PW=$(cat deploy/secrets/postgres_password.txt)
|
||||
|
||||
# Connect
|
||||
PGPASSWORD="$PW" psql \
|
||||
-h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
|
||||
-U neondb_owner \
|
||||
-d honeyDue
|
||||
```
|
||||
|
||||
## 18. Check admin user credentials
|
||||
|
||||
```bash
|
||||
# ADMIN_EMAIL is in the honeydue-secrets Secret
|
||||
kubectl get secret honeydue-secrets -n honeydue \
|
||||
-o jsonpath='{.data.ADMIN_EMAIL}' | base64 -d
|
||||
|
||||
# ADMIN_PASSWORD (ONLY VALID FOR FIRST DEPLOY; may have been changed in UI)
|
||||
kubectl get secret honeydue-secrets -n honeydue \
|
||||
-o jsonpath='{.data.ADMIN_PASSWORD}' | base64 -d
|
||||
```
|
||||
|
||||
If you need to reset admin password because nobody remembers it:
|
||||
|
||||
```bash
|
||||
# Generate a new bcrypt hash
|
||||
NEW_PASSWORD='newpassword'
|
||||
HASH=$(htpasswd -bnBC 10 "" "$NEW_PASSWORD" | tr -d ':\n')
|
||||
|
||||
# Update directly in Postgres
|
||||
PGPASSWORD="$(cat deploy/secrets/postgres_password.txt)" psql \
|
||||
-h ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech \
|
||||
-U neondb_owner -d honeyDue \
|
||||
-c "UPDATE admin_users SET password='$HASH' WHERE email='admin@myhoneydue.com'"
|
||||
```
|
||||
|
||||
## 19. Trigger a Helm chart re-run (Traefik etc.)
|
||||
|
||||
If the Traefik HelmChartConfig was updated but chart didn't reconcile:
|
||||
|
||||
```bash
|
||||
kubectl delete job -n kube-system helm-install-traefik
|
||||
# Helm operator re-runs automatically within ~30 seconds
|
||||
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -w
|
||||
```
|
||||
|
||||
## 20. Smoke test after any change
|
||||
|
||||
```bash
|
||||
# Through Cloudflare
|
||||
for url in "https://api.myhoneydue.com/api/health/" \
|
||||
"https://admin.myhoneydue.com/" \
|
||||
"https://myhoneydue.com/"; do
|
||||
ok=0
|
||||
for i in $(seq 1 20); do
|
||||
[[ "$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" == "200" ]] && ok=$((ok+1))
|
||||
done
|
||||
printf "%-45s %d/20 ok\n" "$url" "$ok"
|
||||
done
|
||||
```
|
||||
|
||||
Expect 20/20 on all three.
|
||||
|
||||
## 21. Kill everything (emergency rollback)
|
||||
|
||||
If the cluster is so broken you need to reset the app layer:
|
||||
|
||||
```bash
|
||||
# Scale everything to 0
|
||||
kubectl scale -n honeydue deploy/api deploy/admin deploy/worker deploy/redis --replicas=0
|
||||
|
||||
# When ready, scale back up
|
||||
kubectl scale -n honeydue deploy/api --replicas=3
|
||||
kubectl scale -n honeydue deploy/admin deploy/worker deploy/redis --replicas=1
|
||||
```
|
||||
|
||||
During the scale-down, CF returns errors to users because no pod is
serving. Scaling back up takes ~5 min before all pods are Ready.
|
||||
|
||||
## 22. Find which pod a user's request hit
|
||||
|
||||
Request log lines don't carry the node or pod name, so there is no
direct lookup. Once request logging includes them, a plain grep through
the logs will work.

Workaround: `stern` prefixes every line with the pod name, so search all
api pods for a unique user identifier in the JSON output:

```bash
stern -n honeydue api | grep -E '"user_id": ?12345[,}]'
```
|
||||
|
||||
## References
|
||||
|
||||
- [kubectl cheat sheet][kubectl-cs]
|
||||
- [K3s docs][k3s-docs]
|
||||
- [Neon connect][neon-connect]
|
||||
|
||||
[kubectl-cs]: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
|
||||
[k3s-docs]: https://docs.k3s.io/
|
||||
[neon-connect]: https://neon.com/docs/connect/connect-from-any-app
|
||||
@@ -0,0 +1,243 @@
|
||||
# 18 — Cost
|
||||
|
||||
## Summary
|
||||
|
||||
Current monthly infrastructure cost is ~$30-40. External SaaS (Fastmail,
|
||||
Apple Developer, Google Play) adds ~$8-17/mo depending on push-enable
|
||||
status. This chapter itemizes every line, projects costs at scale
|
||||
(10k, 100k, 1M users), and shows what dials to turn when we need to
|
||||
save or spend.
|
||||
|
||||
## Current monthly cost
|
||||
|
||||
### Compute (Hetzner)
|
||||
|
||||
| Item | Unit cost | Count | Monthly |
|
||||
|---|---:|---|---:|
|
||||
| CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) | $7.99 | 3 | **$23.97** |
|
||||
| Traffic | $0 (20 TB/mo included per node, well below) | — | $0 |
|
||||
| Hetzner Cloud Firewall | $0 | — | $0 |
|
||||
| IPv4 public address | $0 (included) | 3 | $0 |
|
||||
| **Subtotal** | | | **$23.97** |
|
||||
|
||||
### Database (Neon)
|
||||
|
||||
Neon Launch plan: $0.106/CU-hour + $0.35/GB-month storage, $5 minimum.
|
||||
|
||||
At current usage (low traffic, small schema):
|
||||
- ~10 CU-hours/month × $0.106 ≈ $1
|
||||
- ~1 GB storage × $0.35 ≈ $0.35
|
||||
- Hits the $5 minimum
|
||||
|
||||
| Item | Monthly |
|
||||
|---|---:|
|
||||
| Neon Launch ($5 min + usage) | **~$5** |
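The arithmetic above can be re-run for other usage levels with a short sketch. It uses only the rates quoted in this section and assumes the $5 minimum acts as a floor on the bill; `neon_monthly_usd` is a hypothetical helper name.

```bash
# Hypothetical helper: estimate a Neon Launch bill in USD.
#   $1 = CU-hours per month, $2 = GB stored
# Rates from this section: $0.106/CU-hour, $0.35/GB-month, $5 minimum
# (assumed to be a floor, not an additional fee).
neon_monthly_usd() {
  awk -v cu="$1" -v gb="$2" 'BEGIN {
    usage = cu * 0.106 + gb * 0.35
    printf "%.2f\n", (usage < 5 ? 5 : usage)
  }'
}

neon_monthly_usd 10 1     # current usage: 1.41 of metered cost, billed at 5.00
neon_monthly_usd 100 10   # 10.60 + 3.50 = 14.10
```

At these rates, compute alone crosses the minimum at roughly 47 CU-hours per month.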
|
||||
|
||||
### Object storage (Backblaze B2)
|
||||
|
||||
At current usage (~50 GB stored):
|
||||
|
||||
| Item | Monthly |
|
||||
|---|---:|
|
||||
| Storage ($0.006/GB × 50 GB) | $0.30 |
|
||||
| Egress (effectively $0 — mostly served through CF) | $0 |
|
||||
| **Subtotal** | **~$0.30** |
|
||||
|
||||
### Edge (Cloudflare)
|
||||
|
||||
| Item | Monthly |
|
||||
|---|---:|
|
||||
| Cloudflare Free plan (DNS, TLS, CDN, basic DDoS) | **$0** |
|
||||
|
||||
### Registry (Gitea)
|
||||
|
||||
Self-hosted on the operator's existing Gitea VPS. Not charged to
|
||||
honeyDue.
|
||||
|
||||
| Item | Monthly |
|
||||
|---|---:|
|
||||
| Gitea container registry | **$0** |
|
||||
|
||||
### Total infrastructure
|
||||
|
||||
| Category | Monthly |
|
||||
|---|---:|
|
||||
| Compute | $23.97 |
|
||||
| Database | ~$5 |
|
||||
| Storage | ~$0.30 |
|
||||
| Edge | $0 |
|
||||
| Registry | $0 |
|
||||
| **Total** | **~$30** |
|
||||
|
||||
## External SaaS
|
||||
|
||||
Things not part of the deploy but required for the product:
|
||||
|
||||
| Item | Cost | Notes |
|
||||
|---|---:|---|
|
||||
| Fastmail (SMTP for transactional email) | Part of operator's existing plan | — |
|
||||
| Apple Developer Program | $99/year = $8.25/mo | Required for iOS app + APNs |
|
||||
| Google Play Developer | $25 one-time + $0/mo ongoing | — |
|
||||
| Hetzner Cloud Firewall | $0 | Free; we use UFW instead |
|
||||
|
||||
With push notifications enabled, total monthly run rate is **~$38-42**.
|
||||
|
||||
## Hidden / untracked costs
|
||||
|
||||
- **Operator time**: The biggest cost for a bootstrapped project.
|
||||
Treating ops time at $100/hr, a 4-hour incident = $400.
|
||||
- **Electricity for operator workstation during builds**: trivial.
|
||||
- **Domain registration (myhoneydue.com)**: ~$12/year = $1/mo.
|
||||
|
||||
## Cost drivers
|
||||
|
||||
### 1. Compute (scales with traffic)
|
||||
|
||||
If api gets >70% CPU utilization, HPA will scale from 3 to 6 replicas.
|
||||
Memory at 3 replicas × 512Mi limit = 1.5 GB; nodes have 8 GB each.
|
||||
Plenty of room before needing more nodes.
|
||||
|
||||
Tipping points:
|
||||
- >6 api replicas needed on a sustained basis = bigger CX43 (8 vCPU, 16 GB,
|
||||
~$16/mo each) or more CX33s
|
||||
- Heavy worker throughput = need Asynq PeriodicTaskManager (code
|
||||
change, not infra)
|
||||
|
||||
### 2. Database (scales with query volume + data)
|
||||
|
||||
Neon Launch: pay per CU-hour of compute. If idle time ≫ active time,
|
||||
we stay near $5 min. If the app is busy, CU-hours grow.
|
||||
|
||||
Tipping points:
|
||||
- Consistently >$30/mo at Launch → evaluate Neon Scale plan
|
||||
- DB storage >50 GB → $17.50+/mo just for storage (50 GB × $0.35)
|
||||
- Active query load → consider read replicas (paid feature)
|
||||
|
||||
### 3. Storage (scales with user uploads)
|
||||
|
||||
B2 at $0.006/GB is cheap. 1 TB = $6/mo.
|
||||
|
||||
Tipping points:
|
||||
- >5 TB stored = consider R2 (free egress) if egress becomes a factor
|
||||
- Very high egress = evaluate moving B2 behind CF Workers
|
||||
|
||||
### 4. Edge
|
||||
|
||||
Cloudflare Free is generous. We move to Pro ($20/mo) if:
|
||||
- We need custom WAF rules beyond 5
|
||||
- We need Image Resizing for user uploads
|
||||
- We need custom Page Rules beyond 3
|
||||
|
||||
## Projections
|
||||
|
||||
### 10,000 daily active users
|
||||
|
||||
Assume 50 API requests per user per day = 500k req/day = ~6 req/s avg.
|
||||
Peaks maybe 3-5× = ~25 req/s.
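
The same conversion applies to every projection in this section. A short sketch, assuming the 50 requests per user per day figure holds at all scales and that peaks are 3-5× the average; `avgReqPerSec` is a hypothetical helper.

```go
package main

import "fmt"

const secondsPerDay = 86400

// avgReqPerSec converts daily active users and requests per user per
// day into an average request rate.
func avgReqPerSec(dau, reqPerUserPerDay int) float64 {
	return float64(dau*reqPerUserPerDay) / secondsPerDay
}

func main() {
	for _, dau := range []int{10_000, 100_000, 1_000_000} {
		avg := avgReqPerSec(dau, 50)
		// Peak range uses the 3-5x multiplier assumed in the text.
		fmt.Printf("%9d DAU: avg %.0f req/s, peak %.0f-%.0f req/s\n",
			dau, avg, 3*avg, 5*avg)
	}
}
```

Each tenfold increase in users raises the average rate tenfold, so the 100k tier sits near 58 req/s on average rather than anywhere close to hundreds of thousands.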
|
||||
|
||||
Bottleneck: probably Neon CU-hours on the Launch plan. At 25 req/s with DB calls,
|
||||
we'd burn through CU-hours fast. Neon bill: $15-30/mo.
|
||||
|
||||
Compute: 3 CX33s still handle this comfortably.
|
||||
|
||||
| Category | Projected monthly |
|
||||
|---|---:|
|
||||
| Compute | $24 |
|
||||
| Neon | ~$20 |
|
||||
| Storage | ~$2 |
|
||||
| Cloudflare | $0 |
|
||||
| **Total** | **~$46** |
|
||||
|
||||
### 100,000 daily active users
|
||||
|
||||
Same 50 requests per user per day = 5M req/day = ~58 req/s avg, with
peaks of 3-5× = ~250 req/s. That means multi-node api scaling; HPA kicks in.
|
||||
|
||||
| Category | Projected monthly |
|
||||
|---|---:|
|
||||
| Compute (3x CX33) | $24 |
|
||||
| Plus Hetzner LB | $8.49 |
|
||||
| Neon Scale (pay-as-you-go, higher baseline) | $40-60 |
|
||||
| B2 (200 GB stored, some egress) | $2 |
|
||||
| Cloudflare Pro | $20 |
|
||||
| **Total** | **~$95-115** |
|
||||
|
||||
At this scale, operator time becomes the bigger cost. Adding paid
|
||||
monitoring (Better Stack ~$15/mo) and uptime (Better Stack Uptime $5/mo)
|
||||
becomes reasonable.
|
||||
|
||||
### 1,000,000 daily active users
|
||||
|
||||
Bigger question. We'd be re-evaluating:
|
||||
- More Hetzner nodes or bigger instances
|
||||
- Neon at scale vs. self-hosted Postgres
|
||||
- Maybe Cloudflare Workers to offload traffic
|
||||
|
||||
Ballpark: $300-500/mo. At this scale, the company has revenue to
|
||||
justify an ops hire, and this chapter's assumptions break down.
|
||||
|
||||
## Dials to save money
|
||||
|
||||
### Immediate (reduce $)
|
||||
|
||||
| Lever | Savings | Trade-off |
|
||||
|---|---|---|
|
||||
| Switch 3 CX33 → 3 Netcup VPS1000G11 | ~$4/mo | Less polished provider, slightly worse UX |
|
||||
| Disable Neon Launch, use Supabase free tier | ~$5/mo | Supabase free tier limits |
|
||||
| 2 nodes instead of 3 | ~$8/mo | Lose HA; two-node Raft needs both nodes up for quorum, so it is less available than a single node |
|
||||
| 1 CX23 (2 vCPU, 4 GB) for admin + worker; 2 CX33 for api | ~$5/mo | Complexity; node roles |
|
||||
|
||||
None of these are compelling. Current cost is in the "don't optimize"
|
||||
zone.
|
||||
|
||||
### Dials to spend when it becomes worth it
|
||||
|
||||
| Spend | Return |
|
||||
|---|---|
|
||||
| Upgrade Neon to Scale ($20+) | More CU-hours, connection count room |
|
||||
| Add Hetzner LB ($8.49) | Real active health checks, sub-second failover |
|
||||
| Add monitoring (Better Stack $15) | Proactive detection of issues |
|
||||
| Add uptime monitoring ($5) | Alerts when site is down |
|
||||
| CF Pro ($20) | Better WAF, Image Resizing |
|
||||
| CF Load Balancing ($5) | Multi-region failover, active checks on origins |
|
||||
|
||||
Cumulatively **~$73/mo** (counting Neon Scale at its $20 minimum) takes us to a fully-monitored, fully-alerted,
|
||||
multi-region-failing-over setup. At 100k users, worth it.
|
||||
|
||||
## Historical spend
|
||||
|
||||
**April 2026 MTD**: ~$35 (Hetzner + Neon prorated).
|
||||
|
||||
**April 2026 (projected)**: $30-40.
|
||||
|
||||
**March 2026**: Pre-launch; no user traffic yet. Just node rentals.
|
||||
~$25.
|
||||
|
||||
## Hetzner April 2026 price adjustment
|
||||
|
||||
CX33 went from ~$6.59 → $7.99/mo on 2026-04-01. Our monthly compute
|
||||
cost rose by $4.20 overnight. This is on our budget radar but isn't a
|
||||
forcing function to switch providers.
|
||||
|
||||
If Hetzner keeps raising prices (which they've historically resisted;
|
||||
the 2026 adjustment was their first in several years), reconsider.
|
||||
|
||||
## Budget alerts
|
||||
|
||||
- **B2**: hard-capped via B2 console at $20/mo. If we breach, something
|
||||
is wrong and B2 rejects further writes.
|
||||
- **Neon**: soft limits via Neon alerts. Set threshold at $20 to get
|
||||
email when approaching.
|
||||
- **Hetzner**: no variable cost at our scale, no alerts needed.
|
||||
- **Cloudflare**: Free plan has hard quotas; no surprise bills possible.
|
||||
|
||||
## References
|
||||
|
||||
- [Hetzner Cloud pricing][hetzner-cloud]
|
||||
- [Neon pricing][neon-pricing]
|
||||
- [Backblaze B2 pricing][b2-pricing]
|
||||
- [Cloudflare Free plan][cf-free]
|
||||
|
||||
[hetzner-cloud]: https://www.hetzner.com/cloud/
|
||||
[neon-pricing]: https://neon.com/pricing
|
||||
[b2-pricing]: https://www.backblaze.com/cloud-storage/pricing
|
||||
[cf-free]: https://www.cloudflare.com/plans/free/
|
||||
@@ -0,0 +1,480 @@
|
||||
# 19 — Postmortem: The Swarm Era
|
||||
|
||||
## Summary
|
||||
|
||||
honeyDue launched on Docker Swarm on 2026-04-23. Over the course of a
|
||||
single overnight session we hit **thirteen distinct bugs** before declaring
|
||||
Swarm unfit and migrating to k3s. This chapter is the forensic record:
|
||||
the symptom of each bug, the root cause, the specific fix, and citations
|
||||
where relevant. It's preserved because these lessons are expensive and
|
||||
future-us should not pay them again.
|
||||
|
||||
**TL;DR**: Twelve of the thirteen bugs were recoverable. The exception, bug #10,
|
||||
was a Docker libnetwork ghost-DNS defect ([moby/moby#52265][moby-52265])
|
||||
that is fundamentally incompatible with single-replica services. No
|
||||
amount of clever config fixed it; we had to change orchestrators.
|
||||
|
||||
## Timeline
|
||||
|
||||
**~18:00** — Infrastructure stood up. Docker Swarm initialized. First
|
||||
build + push to Gitea.
|
||||
|
||||
**~19:30** — First deploy runs. Immediate failures.
|
||||
|
||||
**~22:00** — api + admin returning 200 through Cloudflare. Flaky but
|
||||
working.
|
||||
|
||||
**~23:00** — Admin flapping 50%+ through Cloudflare. Ghost DNS record
|
||||
identified. Workarounds begin.
|
||||
|
||||
**~00:30 (next day)** — Ghost DNS survives every non-nuclear
|
||||
intervention. Research confirms it's a known libnetwork bug. Decision
|
||||
to migrate to k3s.
|
||||
|
||||
**~04:30** — k3s cluster up, all services healthy, 150/150 requests
|
||||
green. Postmortem begins.
|
||||
|
||||
The session ran ~10 hours. The migration itself took ~1 hour.
|
||||
|
||||
## The thirteen bugs
|
||||
|
||||
### 1 — Deploy script array expansion under `set -u`
|
||||
|
||||
**File**: `deploy/scripts/deploy_prod.sh`
|
||||
|
||||
**Symptom**:
|
||||
```
|
||||
./deploy/scripts/deploy_prod.sh: line 339: api_extra[@]: unbound variable
|
||||
```
|
||||
|
||||
**Root cause**: Bash arrays expanded with `"${arr[@]}"` under `set -u`
|
||||
fail in Bash older than 4.4 (such as the 3.2 that macOS ships) when the array is empty. Our deploy script initialized empty
|
||||
arrays conditionally but expanded them unconditionally.
|
||||
|
||||
**Fix**: Use the `${arr[@]+"${arr[@]}"}` safe-expansion idiom, or
|
||||
restructure to avoid passing empty arrays:
|
||||
|
||||
```bash
|
||||
build_and_push api "${API_IMAGE}" ${api_extra[@]+"${api_extra[@]}"}
|
||||
```
|
||||
|
||||
Inside the function, same treatment — use `shift` instead of array
|
||||
slicing.
|
||||
|
||||
**Moral**: `set -u` with bash arrays is a known pitfall. The
|
||||
`"${arr[@]}"` expansion isn't safe under strict mode if arrays can be
|
||||
empty.
|
||||
|
||||
### 2 — Dockerfile Go version mismatch
|
||||
|
||||
**File**: `Dockerfile`
|
||||
|
||||
**Symptom**:
|
||||
```
|
||||
go: go.mod requires go >= 1.25 (running go 1.24.13; GOTOOLCHAIN=local)
|
||||
ERROR: failed to build: failed to solve: process "/bin/sh -c go mod download" did not complete successfully: exit code: 1
|
||||
```
|
||||
|
||||
**Root cause**: `go.mod` specifies `go 1.25`, but the Dockerfile's
|
||||
builder stage used `golang:1.24-alpine`.
|
||||
|
||||
**Fix**: Bumped to `golang:1.25-alpine`. One-character change.
|
||||
|
||||
**Moral**: Keep the Dockerfile base image in sync with `go.mod`'s
|
||||
go directive. CI would catch this; we had none.
|
||||
|
||||
### 3 — dev machine arm64 vs node amd64
|
||||
|
||||
**Symptom**: Would have been `exec format error` on the nodes if we'd
|
||||
deployed without fixing. Caught at build config stage.
|
||||
|
||||
**Root cause**: Operator on Apple Silicon (arm64). Hetzner nodes are
|
||||
amd64. Plain `docker build` produces arm64 images.
|
||||
|
||||
**Fix**: Switched deploy script to use `docker buildx build --platform
|
||||
linux/amd64 --push`. This cross-compiles the Go stages (they honor
|
||||
`TARGETARCH`) and uses QEMU emulation for the Node stages.
|
||||
|
||||
**Moral**: Cross-platform builds are routine for Apple Silicon
|
||||
developers. Document it up front, bake it into the deploy script.
|
||||
|
||||
### 4 — Swarm stack `host_ip` rejected
|
||||
|
||||
**File**: `deploy/swarm-stack.prod.yml` (dozzle service)
|
||||
|
||||
**Symptom**:
|
||||
```
|
||||
services.dozzle.ports.0 Additional property host_ip is not allowed
|
||||
```
|
||||
|
||||
**Root cause**: Docker Compose v3.8 schema allows `host_ip` in long-form
|
||||
port spec. Swarm's `docker stack deploy` parser doesn't.
|
||||
|
||||
**Fix**: Use the short form:
|
||||
```yaml
|
||||
ports:
|
||||
- "127.0.0.1:${DOZZLE_PORT}:8080"
|
||||
```
|
||||
|
||||
But then: Swarm's ingress mesh mode silently ignores the `127.0.0.1`
|
||||
binding and listens on `0.0.0.0` anyway. Only way to get true
|
||||
loopback-only binding is `mode: host`, which changes port-publishing
|
||||
semantics.
|
||||
|
||||
**Moral**: Compose-file compatibility between plain Docker and Swarm
|
||||
is imperfect. Check the [Swarm-specific compose reference][swarm-compose]
|
||||
when in doubt.
|
||||
|
||||
### 5 — Stack file secret references
|
||||
|
||||
**Symptom**:
|
||||
```
|
||||
service worker: undefined secret "honeydue_postgres_password_237c6b8-20260423195810"
|
||||
```
|
||||
|
||||
**Root cause**: The original stack file template used
|
||||
`source: ${POSTGRES_PASSWORD_SECRET}` (which expanded to the versioned
|
||||
secret name like `honeydue_postgres_password_<ts>`) under each service's
|
||||
`secrets:` list.
|
||||
|
||||
Swarm expects `source:` to match the **alias** in the top-level
|
||||
`secrets:` block (`postgres_password`), not the actual secret `name:`.
|
||||
|
||||
**Fix**: Changed every `source:` to the alias form:
|
||||
|
||||
```yaml
|
||||
# Was:
- source: ${POSTGRES_PASSWORD_SECRET}
  target: postgres_password

# Now:
- source: postgres_password
  target: postgres_password
|
||||
```
|
||||
|
||||
**Moral**: The original template was clever but subtly wrong. It had
|
||||
never successfully deployed — the earlier Dokku setup used a different
|
||||
secret model. Untested template code fails the first time you actually run it.
|
||||
|
||||
### 6 — API pod crash: `sync.Once` double-unlock
|
||||
|
||||
**File**: `internal/services/cache_service.go:54`
|
||||
|
||||
**Symptom**: api pods completed migrations, started HTTP server, then
|
||||
fataled with:
|
||||
```
|
||||
fatal error: sync: unlock of unlocked mutex
|
||||
goroutine 1 [running]:
|
||||
internal/sync.fatal(...)
|
||||
sync.(*Once).doSlow(...)
|
||||
github.com/treytartt/honeydue-api/internal/services.NewCacheService
|
||||
/app/internal/services/cache_service.go:31
|
||||
```
|
||||
|
||||
**Root cause**: Inside a `sync.Once.Do(func() { ... })` callback, the
|
||||
code did:
|
||||
|
||||
```go
|
||||
cacheOnce.Do(func() {
	// ...
	if err := client.Ping(ctx).Err(); err != nil {
		initErr = fmt.Errorf(...)
		cacheOnce = sync.Once{} // ← THIS LINE
		return
	}
})
|
||||
```
|
||||
|
||||
The intent: "if Redis ping fails, reset the Once so a retry can happen."
The reality: `Do` holds the Once's internal mutex while the callback
runs. Assigning `cacheOnce = sync.Once{}` overwrites that Once in place
with a zero value, which resets its mutex to the unlocked state. When
`Do` then releases the mutex, it unlocks a mutex that is no longer
locked. The runtime treats this as a fatal error, which `recover`
cannot catch, so the process dies.
|
||||
|
||||
**Fix**: Removed the reset. `main.go` already handles the error
|
||||
gracefully (`cache = nil`, continues without caching). Retries happen
|
||||
via pod restart, not in-process.
|
||||
|
||||
```go
|
||||
if err := client.Ping(ctx).Err(); err != nil {
	initErr = fmt.Errorf(...)
	// Don't reassign cacheOnce here: mutating it from inside Do()
	// is a fatal error. Let main.go handle the error.
	return
}
|
||||
```
|
||||
|
||||
**Moral**: `sync.Once` is simpler than it looks. Never reassign an
|
||||
active sync primitive from within its own callback.
|
||||
|
||||
### 7 — `WORKER_REPLICAS=2` with a singleton Asynq scheduler
|
||||
|
||||
**Symptom**: We noticed `WORKER_REPLICAS=2` in `cluster.env` despite
|
||||
the Asynq scheduler being a singleton.
|
||||
|
||||
**Root cause**: Asynq's `Scheduler` is not leader-elected by default.
|
||||
Running >1 replica causes duplicate cron firings — duplicate daily
|
||||
digests, double-welcome emails.
|
||||
|
||||
**Fix**: `WORKER_REPLICAS=1`. Added a comment in `cluster.env.example`
|
||||
explaining why.
|
||||
|
||||
**Moral**: Defaults can be dangerous. Even when a default seems
|
||||
reasonable ("2 replicas for HA"), check against the app's semantics.
|
||||
|
||||
### 8 — `PUSH_LATEST_TAG=true` for prod
|
||||
|
||||
**Symptom**: During a test, we saw `honeydue-api:latest` updating,
|
||||
which would make rollbacks harder.
|
||||
|
||||
**Root cause**: The cluster.env had `PUSH_LATEST_TAG=true` when the
|
||||
design intent was SHA-pinned deploys only.
|
||||
|
||||
**Fix**: `PUSH_LATEST_TAG=false`. SHA tags only.
|
||||
|
||||
**Moral**: Tag-mutable images make rollbacks non-deterministic.
|
||||
Prefer immutable SHA tags.
|
||||
|
||||
### 9 — Neon DB name case sensitivity
|
||||
|
||||
**Symptom**:
|
||||
```
|
||||
server error: ERROR: database "honeydue" does not exist (SQLSTATE 3D000)
|
||||
```
|
||||
|
||||
**Root cause**: Neon's UI created the database as `"honeyDue"` (quoted,
|
||||
camelCase). Postgres treats quoted identifiers case-sensitively at
|
||||
create time. Our `prod.env` had `POSTGRES_DB=honeydue` (lowercase).
|
||||
|
||||
**Fix**: `POSTGRES_DB=honeyDue`.
|
||||
|
||||
**Moral**: Respect Postgres's identifier quoting rules. If something
|
||||
was created with quotes, refer to it with exact case.
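
The folding rule can be modeled in a few lines. This is a simplified illustration of how Postgres stores identifiers written in SQL; `storedName` is a hypothetical helper, and the database name in a connection string is matched literally against the stored name with no folding.

```go
package main

import (
	"fmt"
	"strings"
)

// storedName models how Postgres stores an identifier written in SQL:
// unquoted names fold to lowercase, double-quoted names keep their
// exact case (with "" unescaped to a single quote character).
func storedName(sqlIdent string) string {
	if len(sqlIdent) >= 2 &&
		strings.HasPrefix(sqlIdent, `"`) &&
		strings.HasSuffix(sqlIdent, `"`) {
		inner := sqlIdent[1 : len(sqlIdent)-1]
		return strings.ReplaceAll(inner, `""`, `"`)
	}
	return strings.ToLower(sqlIdent)
}

func main() {
	fmt.Println(storedName(`honeyDue`))   // honeydue
	fmt.Println(storedName(`"honeyDue"`)) // honeyDue
}
```

Because the database was created with the quoted form, its stored name is `honeyDue`, and `POSTGRES_DB` has to match that spelling exactly.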
|
||||
|
||||
### 10 — Admin DNS ghost A-record (the big one)
|
||||
|
||||
**Symptom**: Through Cloudflare, `admin.myhoneydue.com` returned 502 on
|
||||
~50% of requests. The other 50% succeeded. The pattern was stable over
|
||||
hours.
|
||||
|
||||
**Investigation**:
|
||||
|
||||
The admin service had 1 replica, alive on one of three Swarm nodes.
|
||||
Caddy (reverse proxy at the time) resolved `admin` via Swarm's
|
||||
embedded DNS at `127.0.0.11`. `nslookup admin` returned:
|
||||
|
||||
```
|
||||
Name: admin Address: 10.0.1.36 (current task IP)
|
||||
Name: admin Address: 10.0.1.17 (GHOST — what is this?)
|
||||
```
|
||||
|
||||
Two A records for a one-replica service, returned in random order.
|
||||
|
||||
`10.0.1.17` was checked: that IP now belonged to the **dozzle**
|
||||
container on hetzner3. Nothing listens on port 3000 in that container →
|
||||
connection refused → 502.
|
||||
|
||||
The old admin task had run on hetzner3 with IP 10.0.1.17. When it
|
||||
migrated to hetzner1 with IP 10.0.1.36, libnetwork's DNS registration
|
||||
for admin was supposed to update. On hetzner2 and hetzner3, the old
|
||||
10.0.1.17 record never got removed.
|
||||
|
||||
**Things tried, none worked**:
|
||||
|
||||
| Attempt | Result |
|
||||
|---|---|
|
||||
| `endpoint_mode: dnsrr` on admin | DNS still returns both IPs |
|
||||
| Kill + restart Caddy container | DNS still returns both IPs |
|
||||
| Scale admin to 0 and back to 1 | Ghost 10.0.1.17 still in DNS with 0 replicas |
|
||||
| `docker service rm honeydue_admin` | Ghost 10.0.1.17 still in DNS (orphaned) |
|
||||
| Change admin to `mode: global` | Different IPs but ghost remains |
|
||||
| `mode: host` on admin ports + `extra_hosts: host.docker.internal:host-gateway` | `host.docker.internal` resolved to docker0 (172.17.0.1), not reachable from overlay |
|
||||
| Hardcoded 3 node IPs in Caddy + UFW port 3000 node-to-node | ~90% reliable, NAT hairpin issues when Caddy dials its own node |
|
||||
|
||||
**Root cause**: [moby/moby#52265][moby-52265] — Docker libnetwork's
|
||||
overlay network state store doesn't reliably deregister service
|
||||
endpoints when tasks migrate between nodes. Known bug in the 29.x
|
||||
line. Partial fixes in #50236 (29.0) were incomplete; 29.3 still
|
||||
leaks; #52289 is the pending follow-up.
|
||||
|
||||
**Why it hits single-replica services hardest**: With 3 replicas,
|
||||
Caddy's DNS query returns 4 IPs (3 real + 1 ghost). Round-robin
|
||||
succeeds 75% of the time. With 1 replica, 1 real + 1 ghost = 50%
|
||||
failure. More replicas = bug is masked.
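
The masking effect is simple arithmetic. A sketch, assuming the proxy picks uniformly among the returned A records and does not retry; `failureRate` is a hypothetical helper.

```go
package main

import "fmt"

// failureRate is the fraction of requests that land on a ghost record,
// assuming answers are chosen uniformly at random with no retries.
func failureRate(realIPs, ghostIPs int) float64 {
	total := realIPs + ghostIPs
	if total == 0 {
		return 0
	}
	return float64(ghostIPs) / float64(total)
}

func main() {
	for _, replicas := range []int{1, 3, 6} {
		fmt.Printf("%d replica(s) + 1 ghost: %.0f%% of requests fail\n",
			replicas, 100*failureRate(replicas, 1))
	}
}
```

A single ghost record never drops to a 0% failure rate under this model; more replicas only dilute it.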
|
||||
|
||||
**Final fix**: None at the libnetwork level. The ghost survives every
|
||||
non-cluster-recreating operation. The only clean purge is
|
||||
`docker stack rm` + `docker network rm` + full redeploy. Even then,
|
||||
the bug recurs on the next task migration.
|
||||
|
||||
**Decision**: Migrate to k3s. CoreDNS has none of libnetwork's state-
|
||||
store semantics and the bug class doesn't exist. 4 hours of fighting
|
||||
Swarm → 1-hour k3s migration that just worked.
|
||||
|
||||
**Citations**:
|
||||
- [moby/moby#52265 — Overlay ARP stale entries on 29.3.0][moby-52265]
|
||||
- [moby/moby#51491 — DNS broken after swarm init][moby-51491]
|
||||
- [Dokploy#3480 — Traefik stale VIP on Swarm][dokploy-3480]
|
||||
|
||||
### 11 — IPSec ESP + UDP 500 blocked
|
||||
|
||||
**Symptom**: Earlier in the Swarm setup, api 3/3 was working but
|
||||
cross-node overlay traffic was intermittently failing. This turned out
|
||||
to be a separate bug masking #10 earlier in the session.
|
||||
|
||||
**Root cause**: We had encrypted overlay enabled
|
||||
(`driver_opts: encrypted: "true"`). Swarm's encrypted mode uses IPSec
|
||||
ESP (IP protocol 50) + UDP 500 (IKE). Our UFW only allowed UDP 4789
|
||||
(VXLAN) and 7946 (gossip). ESP was blocked by default-deny. Encrypted
|
||||
packets dropped silently on some flows.
|
||||
|
||||
**Fix**: Added UFW rules for each peer node IP:
|
||||
```bash
|
||||
sudo ufw allow from <peer> to any proto esp
|
||||
sudo ufw allow from <peer> to any port 500 proto udp
|
||||
```
|
||||
|
||||
Once applied, cross-node overlay data path became stable.
|
||||
|
||||
**Moral**: Encrypted Swarm overlay requires more than VXLAN to be open.
|
||||
ESP (protocol 50) and UDP 500 (IKE) for IPSec. Official Docker docs
|
||||
mention this but it's easy to miss.
|
||||
|
||||
### 12 — Admin startupProbe path
|
||||
|
||||
**Symptom**: Admin pod kept restarting with startup probe failures.
|
||||
Kubelet reported:
|
||||
```
|
||||
Startup probe failed: HTTP probe failed with statuscode: 404
|
||||
```
|
||||
|
||||
**Root cause**: The k3s scaffold's `admin/deployment.yaml` had:
|
||||
```yaml
|
||||
startupProbe:
  httpGet:
    path: /admin/
    port: 3000
|
||||
```
|
||||
|
||||
But our admin Next.js app serves at `/`, not `/admin/`. Requests to
|
||||
`/admin/` return 404. K8s considered the pod unhealthy and restart-
|
||||
looped.
|
||||
|
||||
**Fix**: Change probe path to `/`. Also bumped `failureThreshold` from
|
||||
12 to 24 (120s grace) for Next.js's slower-than-expected cold boot
|
||||
when the node's already busy.
|
||||
|
||||
**Moral**: Copy-pasted scaffolds can have assumptions that don't match
|
||||
your app. Always verify probes against actual reachable paths.
|
||||
|
||||
### 13 — MigrateWithLock startup probe grace
|
||||
|
||||
**Symptom**: API pods were getting killed by k8s during migration.
|
||||
First replica was OK (fast migration); replicas 2 and 3 waited on
|
||||
the advisory lock too long and healthchecks tripped.
|
||||
|
||||
**Root cause**: Go app's `MigrateWithLock()` uses
|
||||
`pg_advisory_lock()` to serialize migrations across replicas. First
|
||||
replica does real AutoMigrate (~90s cold); subsequent replicas wait
|
||||
on the lock, then run no-op migrations. Total time for 3rd replica
|
||||
can be 3+ minutes.
|
||||
|
||||
K3s scaffold's `api/deployment.yaml` had:
|
||||
```yaml
|
||||
startupProbe:
  failureThreshold: 12
  periodSeconds: 5
|
||||
```
|
||||
|
||||
= 60s grace. Not enough.
|
||||
|
||||
**Fix**: Bumped `failureThreshold` to 48 (= 240s grace). Comment in
|
||||
the manifest explains why. This is *not* a band-aid — the real startup
|
||||
time genuinely is 90-240s depending on lock queue position. The probe
|
||||
should reflect reality, not be optimistic.
|
||||
|
||||
**Moral**: Healthchecks should be realistic, not aspirational. Know
|
||||
what your app actually does at startup.
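
The grace window is just `failureThreshold × periodSeconds`, so the threshold can be derived from a measured worst case instead of guessed. A small sketch with hypothetical helper names, ignoring `initialDelaySeconds`:

```go
package main

import "fmt"

// graceSeconds is how long a startupProbe tolerates failures before
// the kubelet restarts the container (ignoring initialDelaySeconds).
func graceSeconds(failureThreshold, periodSeconds int) int {
	return failureThreshold * periodSeconds
}

// thresholdFor returns the smallest failureThreshold whose grace
// window covers the given worst-case startup time.
func thresholdFor(worstCaseStartupSeconds, periodSeconds int) int {
	return (worstCaseStartupSeconds + periodSeconds - 1) / periodSeconds
}

func main() {
	fmt.Println(graceSeconds(12, 5))  // 60
	fmt.Println(graceSeconds(48, 5))  // 240
	fmt.Println(thresholdFor(240, 5)) // 48
}
```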
|
||||
|
||||
## What we learned
|
||||
|
||||
### Docker Swarm is in a bad place in 2026
|
||||
|
||||
Not dead — Mirantis supports it through 2030 — but **nobody is
|
||||
modernizing libnetwork**. When you hit a DNS or networking bug, you're
|
||||
on your own. The fix churn on #52265 (incomplete 29.0 fix → 29.3
|
||||
regression → pending #52289) is a tell: the code has no champion.
|
||||
|
||||
For new deployments, **don't pick Swarm** unless you're doing something
|
||||
Swarm-shaped (tiny, single-replica, no inter-service traffic). K3s is
|
||||
a strictly better choice for anything approximating what we're doing.
|
||||
|
||||
### Investigate before you work around
|
||||
|
||||
We spent a lot of time on clever workarounds for bug #10 (host-mode
|
||||
ports, host.docker.internal, hardcoded node IPs, UFW routing) before
|
||||
doing the 20-minute research task that revealed the bug was a known
|
||||
libnetwork defect. If we'd searched "Swarm DNS stale record 2026" first,
|
||||
we'd have saved ~3 hours.
|
||||
|
||||
### Scaffolds are starting points, not finishing points
|
||||
|
||||
The k3s scaffold in `deploy-k3s/` was excellent — production-grade
|
||||
RBAC, PDBs, security contexts, network policies, Traefik middleware.
|
||||
But its image references (GHCR), TLS assumptions (CF Full strict), and
|
||||
probe paths (admin's `/admin/`) didn't match our actual setup. Every
|
||||
scaffold needs a read-through against your environment before you
|
||||
`kubectl apply -f`.
|
||||
|
||||
### Keep the old config until the new config is proven
|
||||
|
||||
We kept `deploy/` (Swarm) intact during the k3s migration. That meant
|
||||
if k3s failed, we could `git stash` the k3s work and do a fast Swarm
|
||||
redeploy. `deploy/` stays in the repo until k3s has proven itself; removal is planned.
|
||||
|
||||
## Files affected by tonight's work
|
||||
|
||||
All in `honeyDueAPI-go`:
|
||||
|
||||
- `Dockerfile` — Go 1.24 → 1.25 (bug #2)
|
||||
- `deploy/scripts/deploy_prod.sh` — buildx refactor, array expansion fixes (bugs #1, #3)
|
||||
- `deploy/swarm-stack.prod.yml` — dozzle host_ip, secret source references, multiple iterations trying to fix #10
|
||||
- `deploy/prod.env` — admin seed env vars, POSTGRES_DB case, B2 values, push-disabled placeholders (bug #9)
|
||||
- `deploy/cluster.env` — WORKER_REPLICAS 2 → 1, PUSH_LATEST_TAG (bugs #7, #8)
|
||||
- `deploy/Caddyfile` — multiple iterations (ultimately deleted when we moved to k3s)
|
||||
- `internal/services/cache_service.go` — removed sync.Once reset (bug #6)
|
||||
- `internal/database/database.go` — (no change, MigrateWithLock semantics investigated)
|
||||
- `deploy-k3s/manifests/api/deployment.yaml` — startupProbe grace (bug #13)
|
||||
- `deploy-k3s/manifests/admin/deployment.yaml` — probe path (bug #12)
|
||||
- `deploy-k3s/manifests/worker/deployment.yaml` — replicas 2 → 1
|
||||
- `deploy-k3s/manifests/pod-disruption-budgets.yaml` — worker minAvailable 1 → 0
|
||||
- `deploy-k3s/manifests/traefik-helmchartconfig.yaml` — NEW (DaemonSet + hostNetwork for Traefik)
|
||||
- `deploy-k3s/manifests/ingress/ingress-simple.yaml` — NEW (simple host routing, no TLS)
|
||||
- `deploy-k3s/MIGRATION_NOTES.md` — NEW
|
||||
|
||||
## What was thrown away
|
||||
|
||||
- Swarm stack definitions (still in `deploy/`, planned for removal)
|
||||
- Caddy Caddyfile (k3s uses Traefik instead)
|
||||
- Several hours of work on Caddy `dynamic a` upstream refresh, host-
|
||||
mode ports, and NAT-hairpin workarounds for bug #10 — all moot
|
||||
once we migrated
|
||||
|
||||
## References
|
||||
|
||||
- [moby/moby#52265 — Overlay ARP stale entries][moby-52265]
|
||||
- [moby/moby#51491 — DNS broken after swarm init][moby-51491]
|
||||
- [Dokploy#3480 — Traefik stale VIP][dokploy-3480]
|
||||
- [Mirantis Swarm LTS commitment][mirantis-swarm]
|
||||
- [Kubernetes probe best practices][k8s-probes]
|
||||
- [Asynq scheduler limitations][asynq-sched]
|
||||
|
||||
[moby-52265]: https://github.com/moby/moby/issues/52265
|
||||
[moby-51491]: https://github.com/moby/moby/issues/51491
|
||||
[dokploy-3480]: https://github.com/Dokploy/dokploy/issues/3480
|
||||
[mirantis-swarm]: https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/
|
||||
[k8s-probes]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
|
||||
[asynq-sched]: https://github.com/hibiken/asynq/wiki/Periodic-Tasks
|
||||
[swarm-compose]: https://docs.docker.com/reference/compose-file/legacy-versions/
|
||||
@@ -0,0 +1,318 @@
|
||||
# 20 — Roadmap
|
||||
|
||||
## Summary
|
||||
|
||||
A consolidated list of known gaps, improvements, and scaling triggers.
|
||||
Items are grouped by category and roughly ordered by priority. This is
|
||||
the "if we had more time" list referenced throughout the book.
|
||||
|
||||
## High priority (do soon)
|
||||
|
||||
### Uptime monitoring
|
||||
|
||||
**Why**: Right now we find out the site is down when users complain.
|
||||
|
||||
**How**: Set up Uptime Kuma (self-hosted) or Better Stack Uptime
|
||||
(free tier) to ping `https://api.myhoneydue.com/api/health/` every
|
||||
minute, with Slack/email alerts on failure.
|
||||
|
||||
**Effort**: ~30 min for Uptime Kuma deploy, ~10 min for Better Stack
|
||||
signup.
|
||||
|
||||
### Cloudflare origin IP restriction
|
||||
|
||||
**Why**: UFW allows :80 from anywhere. If node IPs leak, direct-connect
|
||||
attackers bypass CF's WAF/DDoS protection.
|
||||
|
||||
**How**: Replace the anywhere-80 UFW rule with 15 IPv4 + 7 IPv6 CF
|
||||
ranges. See [Chapter 13 §CF IP ranges](./13-cloudflare.md#cloudflare-ip-ranges-used-in-traefik-trustedips).
|
||||
|
||||
Automation: a small script that refreshes the CF IP list monthly and
|
||||
re-applies UFW rules.
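
One possible shape for the rule-generation half of that script, sketched in Go. It only turns a list of CIDR ranges into `ufw` commands; fetching the list and applying the rules are left out. `ufwRules` is a hypothetical helper, and the ranges in `main` are documentation placeholders, not Cloudflare's real ranges.

```go
package main

import (
	"fmt"
	"net"
)

// ufwRules turns CIDR ranges into ufw allow commands for HTTP on
// port 80. An invalid entry aborts the run instead of being skipped,
// so a malformed list never produces a partial rule set.
func ufwRules(cidrs []string) ([]string, error) {
	rules := make([]string, 0, len(cidrs))
	for _, c := range cidrs {
		if _, _, err := net.ParseCIDR(c); err != nil {
			return nil, fmt.Errorf("bad CIDR %q: %w", c, err)
		}
		rules = append(rules,
			fmt.Sprintf("ufw allow from %s to any port 80 proto tcp", c))
	}
	return rules, nil
}

func main() {
	// Placeholder ranges for illustration; the real input would be
	// Cloudflare's published IPv4 and IPv6 lists.
	rules, err := ufwRules([]string{"192.0.2.0/24", "2001:db8::/32"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range rules {
		fmt.Println(r)
	}
}
```

Printing the commands rather than executing them keeps the step reviewable before the anywhere-80 rule is removed.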
|
||||
|
||||
**Effort**: 1 hour.
|
||||
|
||||
### Enable network policies in k3s
|
||||
|
||||
**Why**: Currently pods can freely egress anywhere. A compromised pod
|
||||
could exfiltrate data or move laterally to other services.
|
||||
|
||||
**How**: `kubectl apply -f deploy-k3s/manifests/network-policies.yaml`.
|
||||
The scaffold defines default-deny + explicit allows for:
|
||||
- DNS egress for all pods
|
||||
- Traefik → api (port 8000)
|
||||
- Traefik → admin (port 3000)
|
||||
- api/worker → Redis
|
||||
- api/worker → external services (Postgres, B2, Fastmail)
|
||||
|
||||
Then test that nothing breaks (might need to adjust allow rules).
|
||||
|
||||
**Effort**: 1-2 hours including testing.
|
||||
|
||||
### Apply Traefik security middleware
|
||||
|
||||
**Why**: Our current Ingress has no rate limiting or security headers
|
||||
beyond what Traefik adds by default.
|
||||
|
||||
**How**: Apply `deploy-k3s/manifests/ingress/middleware.yaml`, annotate
|
||||
Ingresses to use them:
|
||||
|
||||
```yaml
|
||||
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: honeydue-security-headers@kubernetescrd,honeydue-rate-limit@kubernetescrd
|
||||
```
|
||||
|
||||
**Effort**: 15 min.
|
||||
|
||||
## Medium priority
|
||||
|
||||
### Upgrade to CF Full (strict) SSL
|
||||
|
||||
**Why**: Currently CF↔origin is plain HTTP. An attacker between CF and
|
||||
Hetzner could read traffic. Full (strict) mode encrypts this leg with
|
||||
a CF-issued origin cert.
|
||||
|
||||
**How**:
|
||||
1. Generate Origin CA cert in CF dashboard → SSL/TLS → Origin Server
|
||||
2. Create `cloudflare-origin-cert` Secret in k8s
|
||||
3. Add `tls:` block to Ingresses
|
||||
4. Switch CF SSL mode to Full (strict)
|
||||
|
||||
**Effort**: 30 min.
|
||||
|
||||
**Citations**: [Cloudflare Origin CA docs][cf-origin-ca]
|
||||
|
||||
### Migration Job for schema changes
|
||||
|
||||
**Why**: Currently every api pod runs `MigrateWithLock()` on startup,
|
||||
serializing on a Postgres advisory lock. Adds 90-240s to cold startup
|
||||
and caused bug #13 in Chapter 19.
|
||||
|
||||
**How**: Create a Kubernetes `Job` resource that runs the api image
|
||||
with a `--migrate-only` flag. Job runs once per deploy, completes when
|
||||
schema is current. api pods get an initContainer that waits for the
|
||||
Job to complete.
|
||||
|
||||
Requires Go code change to support `--migrate-only` flag.
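
A minimal sketch of the flag handling, assuming migrations move entirely into the Job so the normal startup path only serves. `run`, `migrate`, and `serve` are hypothetical names, not functions from the codebase; in the real app `migrate` would wrap `MigrateWithLock()`.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// run parses args and either runs migrations and exits (Job mode) or
// starts the server (normal pod mode).
func run(args []string, migrate, serve func() error) error {
	fs := flag.NewFlagSet("api", flag.ContinueOnError)
	migrateOnly := fs.Bool("migrate-only", false, "run schema migrations and exit")
	if err := fs.Parse(args); err != nil {
		return err
	}
	if *migrateOnly {
		if err := migrate(); err != nil {
			return fmt.Errorf("migrate: %w", err)
		}
		return nil
	}
	return serve()
}

func main() {
	// Placeholders standing in for the real migration and HTTP server.
	migrate := func() error { fmt.Println("migrating (placeholder)"); return nil }
	serve := func() error { fmt.Println("serving (placeholder)"); return nil }
	if err := run(os.Args[1:], migrate, serve); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

A non-zero exit on migration failure matters here: it is what lets the Job report failure instead of completing, so the waiting api pods do not start against a stale schema.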
|
||||
|
||||
**Effort**: 3-4 hours (code + job manifest + testing).

### Redis password

**Why**: Redis runs in the cluster with no auth. Any compromised pod
could read cache or queue state.

**How**: Set `REDIS_PASSWORD` in `honeydue-secrets`, update api/worker
env, update Redis command to include `--requirepass`. Already partially
wired up in the manifests.

**Effort**: 20 min.
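
The Redis side of this change might look like the fragment below. It is a sketch of the container section of the Redis Deployment only, assuming the `redis:7-alpine` image and a `REDIS_PASSWORD` key in `honeydue-secrets`; the existing manifest may already contain part of it.

```yaml
# Fragment of spec.template.spec in the Redis Deployment (sketch only).
containers:
  - name: redis
    image: redis:7-alpine
    env:
      - name: REDIS_PASSWORD
        valueFrom:
          secretKeyRef:
            name: honeydue-secrets
            key: REDIS_PASSWORD
    args:
      - --requirepass
      - $(REDIS_PASSWORD)       # expanded by Kubernetes from the env var above
```

One trade-off of this form is that the password ends up in the container's process arguments; mounting a `redis.conf` from a Secret avoids that at the cost of one more object to manage.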

### Image signing with cosign

**Why**: No guarantee that an image pulled from Gitea is the one we
built. Gitea compromise = arbitrary code execution in cluster.

**How**:
1. Install cosign on build machine
2. Sign images as part of deploy: `cosign sign gitea.treytartt.com/admin/honeydue-api:<sha>`
3. Deploy Kyverno (or Connaisseur) to cluster
4. Apply cluster policy requiring all images have valid cosign signatures

**Effort**: 4-6 hours.
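
If Kyverno is the admission controller chosen in step 3, the policy from step 4 could be sketched as below. This assumes key-based signing (a `cosign.pub` generated alongside the signing key) and the `kyverno.io/v1` ClusterPolicy schema; field names have shifted between Kyverno releases, so check it against the installed version. The policy and rule names are hypothetical.

```yaml
# Sketch only: verify against the Kyverno version actually deployed.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-honeydue-images   # hypothetical name
spec:
  validationFailureAction: Enforce       # reject pods whose images fail verification
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - honeydue
      verifyImages:
        - imageReferences:
            - "gitea.treytartt.com/admin/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <contents of cosign.pub>
                      -----END PUBLIC KEY-----
```

Scoping the rule to the `honeydue` namespace and the Gitea registry keeps system images in `kube-system` out of the policy's reach.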

### etcd encryption at rest

**Why**: Kubernetes Secrets are stored in etcd unencrypted by default.
Node disk compromise = plaintext secrets.

**How**: K3s supports `--secrets-encryption` flag at server install.
Need to recreate cluster or re-install k3s server on each node.

**Effort**: 1 hour.
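
Instead of passing the flag on the install command line, the same setting can be expressed in the K3s config file, where keys mirror the CLI flags without the leading dashes. This fragment assumes the default config path and would be merged into whatever the nodes already have there.

```yaml
# /etc/rancher/k3s/config.yaml on each server node (sketch only)
secrets-encryption: true
```

After the servers come back up, `k3s secrets-encrypt status` on a server node should report whether encryption is active.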

### Automated unattended-upgrades

**Why**: Currently OS patches require manual `apt upgrade`. Security
patches can be delayed.

**How**:
```bash
sudo apt install unattended-upgrades
# Configure /etc/apt/apt.conf.d/50unattended-upgrades for security-only
sudo dpkg-reconfigure -plow unattended-upgrades
```

**Effort**: 30 min per node.

### fail2ban

**Why**: SSH is open to the world. No rate limiting on failed attempts.
Bot noise is constant.

**How**: `sudo apt install fail2ban; sudo systemctl enable --now fail2ban`.
Default config bans IPs after 5 failed attempts for 10 min.

**Effort**: 15 min per node.

### Move SSH off port 22

**Why**: Port 22 attracts constant scanner noise. Moving to a
non-default port cuts >90% of attempts.

**How**:
1. Edit `/etc/ssh/sshd_config` on each node: `Port 2222`
2. UFW rule: `sudo ufw allow 2222/tcp`
3. Update `~/.ssh/config` on operator: `Port 2222`
4. Restart sshd: `sudo systemctl restart ssh`
5. Remove UFW rule for port 22 after verifying that a new session connects on 2222

**Effort**: 30 min (and pray).

## Lower priority

### Prometheus + Grafana

**Why**: Historical metrics, dashboards, alerting.

**How**: `kube-prometheus-stack` Helm chart. Adds ~500 MB RAM across
cluster.

**Effort**: 4-6 hours including dashboard setup.

### Loki log aggregation

**Why**: Cross-pod log queries, longer retention.

**How**: `grafana/loki` + `promtail` DaemonSet. Integrates with existing
Grafana.

**Effort**: 2-3 hours.

### OpenTelemetry tracing

**Why**: Request-level profiling. Show which hop dominates p99 latency.

**How**: Add OpenTelemetry SDK to Go app; export to Jaeger/Tempo.

**Effort**: 8-12 hours including tuning.

### Hetzner private network

**Why**: Currently all inter-node traffic (including Flannel overlay)
goes over public network. Private network = less attack surface, no
bandwidth costs (if metered in future).

**How**: Attach Hetzner vswitch to the 3 nodes, reconfigure Flannel to
advertise private IPs, update UFW rules to allow from private IP range
instead of specific public IPs.

**Effort**: 2-3 hours including testing Flannel reconfig.
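
The Flannel part of this change is mostly two K3s settings per node. The fragment below is a sketch with made-up values: the private address and interface name are assumptions and must be read off each node (for example with `ip addr`) once the private network is attached.

```yaml
# /etc/rancher/k3s/config.yaml on one node (sketch only; values are hypothetical)
node-ip: 10.0.0.2          # this node's private address
flannel-iface: enp7s0      # interface attached to the private network
```

Each node gets its own `node-ip`, and K3s has to be restarted on a node for the new address to be advertised.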

### Move secrets to Vault

**Why**: Kubernetes Secrets are base64-encoded etcd values. Vault is
purpose-built for secret management with audit logs, dynamic secrets,
rotation policies.

**How**: Deploy Vault in the cluster (or external), migrate secret
values, use Vault Agent Injector or External Secrets Operator.

**Effort**: 6-8 hours.

Not high priority until we have multiple engineers who shouldn't see
every secret, or compliance requirements.

### Automated backups to B2

**Why**: Neon's backup is Neon's problem. If Neon-as-a-company
disappeared, we'd lose everything.

**How**: Nightly `pg_dump | gzip | aws s3 cp` (or `s3cmd`, pointed at
B2's S3-compatible endpoint) as a CronJob in the cluster.

**Effort**: 2 hours.
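
One possible shape for that CronJob is sketched below. Everything specific in it is an assumption: the image placeholder stands for any image that ships both `pg_dump` and the AWS CLI, and the Secret `honeydue-backup` with `DATABASE_URL`, `BACKUP_BUCKET`, `B2_S3_ENDPOINT`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY` keys is hypothetical.

```yaml
# Sketch only: names, schedule, and image are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-backup               # hypothetical name
  namespace: honeydue
spec:
  schedule: "0 3 * * *"         # nightly at 03:00
  concurrencyPolicy: Forbid     # never run two dumps at once
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: "<image with pg_dump and the aws cli>"
              envFrom:
                - secretRef:
                    name: honeydue-backup   # hypothetical Secret
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # Plain sh has no pipefail, so a failed pg_dump can still
                  # upload a truncated file; check object sizes when restoring.
                  # $(date ...) is not a defined env var, so Kubernetes leaves
                  # it untouched and the shell evaluates it.
                  set -eu
                  pg_dump "$DATABASE_URL" | gzip \
                    | aws s3 cp - "s3://$BACKUP_BUCKET/pg/$(date +%F).sql.gz" \
                        --endpoint-url "$B2_S3_ENDPOINT"
```

A backup only counts once a restore has been rehearsed, so a periodic `pg_restore` or `psql` replay into a scratch Neon branch is worth pairing with this.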

### Multi-region

**Why**: ~100 ms CF→origin hop could be reduced by having origins in
multiple regions. Not needed at current scale.

**How**: Add 2 more Hetzner nodes in ash (Ashburn, US). Separate k3s
cluster (or one stretched cluster — painful). Cloudflare Load Balancing
for geo-based routing.

**Effort**: Days of work, doubling cost. Don't until traffic justifies.

### CF Workers for static + caching

**Why**: Certain endpoints (the marketing landing page, public API
lookups) could serve from CF Workers with near-zero origin load.

**How**: Move static pages to Cloudflare Pages; cache API responses
with `Cache-Control: public, max-age=300`.

**Effort**: 4-6 hours.

### WireGuard-encrypted overlay

**Why**: Current Flannel VXLAN is plaintext between nodes. An attacker
with Hetzner-internal network access could read pod-to-pod traffic.

**How**: K3s supports `--flannel-backend=wireguard-native`. Reinstall
k3s server on each node with the new backend.

**Effort**: 2-3 hours (requires brief downtime).
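
Expressed as K3s config rather than a command-line flag, the backend switch is a one-line fragment. This sketch assumes the default config path; the nodes' UFW rules would also need to admit the WireGuard UDP port between nodes in place of the VXLAN one.

```yaml
# /etc/rancher/k3s/config.yaml on each server node (sketch only)
flannel-backend: wireguard-native
```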

## Scaling triggers

| Trigger | Action |
|---|---|
| p99 latency > 500ms sustained | Investigate with tracing; consider CF Workers for cached paths |
| API CPU > 70% sustained | HPA already configured; may need more nodes |
| DB connections at Neon limit | Upgrade to Neon Scale or reduce `DB_MAX_OPEN_CONNS` |
| Redis memory > 80% | Scale Redis memory; consider cache sharding |
| B2 storage > 500 GB | Evaluate if R2 (free egress) is cheaper overall |
| Active users > 100k | Evaluate multi-region, CF Pro, paid monitoring |
| Revenue > $5k/mo | Hire ops help; this document assumes solo operator |

## Known gaps we accept

- **No canary deploys**: all-or-nothing rollouts via `kubectl set image`
- **No feature flags** (app-level): code is deployed as-is. Can't toggle
  features without re-deploying
- **No A/B testing infra**: out of scope for current product stage
- **No Windows/tablet-specific CDN rules**: CF serves everyone the same
  responses
- **No explicit blue-green**: rolling updates only

## Stuff to delete when brave

- `deploy/` (the Swarm era) — once we've been on k3s 30 days
- Legacy UFW rules from the Swarm era (2377, 7946, 4789, ESP, 500, 3000)
  — they don't hurt but they're confusing
- `deploy-k3s/manifests/secrets.yaml.example` — we don't use this
  pattern, we create secrets imperatively

## Stuff that could go wrong and we should plan for

- **Hetzner price hike**: 2026-04-01 already happened. If another one
  comes, we could migrate to Netcup or OVH for savings.
- **Neon EOL free tier**: Neon could change pricing policy. Fallback:
  self-host Postgres on a Hetzner box or migrate to Supabase.
- **Cloudflare Free plan changes**: CF could restrict Free features.
  Fallback: BunnyCDN, or raw nodes without CDN.
- **Gitea host outage**: If Gitea is down, deploys can't pull new
  images. Existing pods continue. For long outages, we'd cache images
  locally or temporarily push to Docker Hub.

## Progress tracker

As items are done, mark them here. Think of this as a running changelog.

- [x] k3s migration from Swarm (2026-04-24)
- [x] Traefik DaemonSet + hostNetwork
- [x] Admin seed via ADMIN_EMAIL + ADMIN_PASSWORD
- [x] Documentation book (this doc set)
- [ ] All other items above
@@ -0,0 +1,112 @@
# honeyDue Production Deployment — The Book

This is the complete reference for the honeyDue production deployment as it
exists on **2026-04-24**. It serves two audiences:

1. **A new engineer** learning the system for the first time. Start at
   Chapter 0 (Overview) and read in order. Concepts are built up; nothing is
   assumed beyond "you've deployed web apps before."
2. **The operator** (future-you) needing a specific fact fast. Every chapter
   opens with a one-paragraph summary and has an operator runbook at its end.
   The appendices are a cheat sheet.

The deployment is non-trivial. It's a 3-node HA Kubernetes cluster running
a Go API, a Next.js admin panel, a background worker, Redis, and Traefik —
all fronted by Cloudflare, integrated with Neon Postgres, Backblaze B2, and
a self-hosted Gitea registry. This book explains **why each of those pieces
was chosen** (often over two or three alternatives we tried first), what
they do, and how to operate them.

## Table of Contents

### Part I — The System

- [00 — Overview](./00-overview.md) — what's running, at a glance
- [01 — Infrastructure](./01-infrastructure.md) — Hetzner nodes, specs, cost, region
- [02 — Orchestrator Choice](./02-orchestrator-choice.md) — why k3s (and not Swarm, full k8s, or Nomad)

### Part II — Networking

- [03 — Networking](./03-networking.md) — flannel, CoreDNS, kube-proxy, the overlay story
- [04 — Firewall](./04-firewall.md) — every UFW rule on every node, rationale
- [13 — Cloudflare](./13-cloudflare.md) — DNS, SSL modes, round-robin origin pool

### Part III — Security

- [05 — Security](./05-security.md) — RBAC, Pod Security, secrets, TLS chain
- [06 — Traefik Ingress](./06-traefik-ingress.md) — host-network DaemonSet, cert plan

### Part IV — Workloads

- [07 — Services](./07-services.md) — api, admin, worker, redis per-service deep dive
- [08 — Database](./08-database.md) — Neon Postgres, advisory-lock migrations
- [09 — Storage](./09-storage.md) — Backblaze B2, minio-go client details
- [10 — Secrets & Config](./10-secrets-config.md) — ConfigMap, Secret, env mapping
- [11 — Registry](./11-registry.md) — Gitea container registry, multi-arch builds

### Part V — Operation

- [12 — Data Flow](./12-data-flow.md) — end-to-end request lifecycle
- [14 — Deployment Process](./14-deployment-process.md) — how to roll new code
- [15 — Observability](./15-observability.md) — logs, metrics, tracing
- [16 — Failure Modes](./16-failure-modes.md) — what happens when X dies
- [17 — Runbook](./17-runbook.md) — common ops tasks

### Part VI — Context

- [18 — Cost](./18-cost.md) — what this costs to run, per service
- [19 — Swarm Postmortem](./19-postmortem-swarm.md) — the story of why we migrated from Docker Swarm
- [20 — Roadmap](./20-roadmap.md) — known TODOs and scaling triggers

### Appendices

- [A — Glossary](./appendices/a-glossary.md)
- [B — kubectl Cheat Sheet](./appendices/b-commands.md)
- [C — File Locations](./appendices/c-file-locations.md)
- [D — References & Citations](./appendices/d-references.md)

## Quick Facts

| Field | Value |
|---|---|
| Orchestrator | K3s v1.34.6+k3s1 (3 nodes, HA control plane) |
| Ingress | Traefik v3 (DaemonSet, hostNetwork) |
| Nodes | 3× Hetzner Cloud CX33 (4 vCPU, 8 GB RAM, 80 GB SSD) in `nbg1` (Nuremberg) |
| DNS & Edge | Cloudflare (Free plan), SSL=Flexible, round-robin 3 node A records |
| Database | Neon Postgres, `ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech` |
| Cache + Queue | Redis 7-alpine, in-cluster, 1 replica, PVC-backed, pinned to `nbg1-2` |
| Object Storage | Backblaze B2, `honeyDueProd` bucket, `us-east-005` region |
| Image Registry | Self-hosted Gitea v1.25.5 at `gitea.treytartt.com` |
| Transactional Email | Fastmail SMTP (`smtp.fastmail.com:587`) |
| Domains | `api.myhoneydue.com`, `admin.myhoneydue.com`, `myhoneydue.com` |
| Monthly Cost (current) | ~$30–40 (3× Hetzner + Neon Launch + B2 + Cloudflare Free + Gitea free) |
| kubeconfig | `~/.kube/honeydue-k3s.yaml` on operator workstation |
| Repo | `honeyDueAPI-go/deploy-k3s/` for manifests, `deploy/` is the legacy Swarm config |

## How to Read This Book

- **"Why did we…?"** answers are in the chapter covering that component. Every
  major design choice has an explicit rejection of 1–3 alternatives.
- **Historical bugs** are in Chapter 19. The rest of the book describes the
  current (fixed) state; 19 is the forensic record of what was broken and
  how we figured it out.
- **Operator commands** you'll run regularly are in Appendix B. Chapter 17
  has longer procedures (cert rotation, DB migration, etc.).
- **Citations** throughout use footnote-style links to the canonical source
  (k3s docs, moby issues, Cloudflare docs, etc.). Appendix D collects them.

## Conventions

- Kubernetes namespace for the app is `honeydue`.
- SSH aliases are `hetzner1`, `hetzner2`, `hetzner3` in your `~/.ssh/config`.
- Node hostnames in the cluster are `ubuntu-8gb-nbg1-{1,2,3}` (Hetzner-assigned).
- The mapping is non-obvious because the Hetzner hostname suffix order does
  not match SSH alias order:

| SSH alias | Public IP | Hostname in k3s |
|---|---|---|
| hetzner1 | 178.104.247.152 | `ubuntu-8gb-nbg1-2` |
| hetzner2 | 178.105.32.198 | `ubuntu-8gb-nbg1-1` |
| hetzner3 | 178.104.249.189 | `ubuntu-8gb-nbg1-3` |

When a chapter refers to "hetzner1" it means the box at 178.104.247.152 / `nbg1-2`.
@@ -0,0 +1,207 @@
# Appendix A — Glossary

Alphabetical. Cross-referenced to chapters where each term is used in
detail.

## Kubernetes / k3s

**ClusterIP**: Internal IP of a Kubernetes Service. Stable;
load-balances to backing pods. (Chapter 3)

**ConfigMap**: Kubernetes resource holding non-sensitive config (env
vars). Injected into pods via `envFrom`. (Chapter 10)

**containerd**: Container runtime bundled with k3s. Replaces Docker for
the runtime layer. (Chapter 2)

**CoreDNS**: Cluster-internal DNS resolver. Every pod's
`/etc/resolv.conf` points to the CoreDNS Service. (Chapter 3)

**CRD (Custom Resource Definition)**: Kubernetes extension mechanism
for third-party resource types. Traefik's `IngressRoute` and
`Middleware` are CRDs. (Chapter 6)

**DaemonSet**: Workload that runs exactly one pod per node. We use it
for Traefik so each node has its own ingress pod. (Chapter 6)

**Deployment**: Kubernetes workload for stateless pods. Supports rolling
updates. Most of our services are Deployments. (Chapter 7)

**Endpoints**: The actual pod IPs backing a Service's ClusterIP.
Dynamically updated as pods come and go. (Chapter 3)

**etcd**: Distributed key-value store holding cluster state. K3s
embeds it. Raft-replicated across server nodes. (Chapter 2)

**Flannel**: Kubernetes CNI (Container Network Interface) plugin for
pod-to-pod networking. Uses VXLAN tunneling. (Chapter 3)

**HPA (HorizontalPodAutoscaler)**: K8s resource that scales Deployment
replicas based on CPU/memory usage. Not currently enabled for us.
(Chapter 7)

**Ingress**: K8s resource describing external-to-internal routing rules.
Traefik watches Ingresses and programs itself accordingly. (Chapter 6)

**IPVS**: Linux kernel feature for in-kernel L4 load balancing. Our
kube-proxy uses it. (Chapter 3)

**k3s**: Lightweight Kubernetes distribution by Rancher/SUSE. What we
run. (Chapter 2)

**kubectl**: Kubernetes CLI tool. Runs on operator workstation.
(Chapter 17)

**kubelet**: Agent running on each node, responsible for pod lifecycle.
(Chapter 2)

**kube-proxy**: Service-to-pod routing component. Runs on each node in
IPVS mode. (Chapter 3)

**Namespace**: Kubernetes logical grouping. Our app lives in `honeydue`.
System services in `kube-system`. (Chapter 7)

**NetworkPolicy**: K8s resource defining allowed traffic between pods.
Not currently applied. (Chapter 5)

**Node**: A physical or virtual machine running Kubernetes. We have 3.
(Chapter 1)

**PDB (PodDisruptionBudget)**: Constraint on voluntary pod disruptions
(drain, upgrade). Keeps N replicas available. (Chapter 7)

**Pod**: Smallest Kubernetes unit — one or more containers sharing
network and storage. Our pods are usually one-container. (Chapter 7)

**PVC (PersistentVolumeClaim)**: Request for persistent storage. Redis
uses one. (Chapter 7)

**RBAC**: Role-Based Access Control. Governs who/what can do what via
the Kubernetes API. (Chapter 5)

**ReplicaSet**: Managed by a Deployment; ensures N pods of a template
are running. Each deploy creates a new ReplicaSet. (Chapter 14)

**Secret**: K8s resource holding sensitive values. Base64-encoded;
stored in etcd (unencrypted by default). (Chapter 10)

**Service**: K8s resource providing a stable endpoint (ClusterIP) for
a set of pods. (Chapter 3)

**ServiceAccount**: Identity used by pods to authenticate to the
Kubernetes API. We disable token mounting for our app pods.
(Chapter 5)

**Taint / Toleration**: Mechanism to prevent pods from being scheduled
on certain nodes. Not used in our setup. (Chapter 7)

## Docker / Swarm

**libnetwork**: Docker's networking library. Provides overlay
networking for Swarm. Source of the DNS ghost bug (Chapter 19).

**mode: global**: Swarm deploy mode for services running one task per
node. (Chapter 19)

**mode: host**: Port publishing mode that binds to node's real
interface, bypassing the ingress mesh. (Chapter 4)

**Overlay network**: Encrypted or unencrypted virtual network spanning
Swarm nodes. (Chapter 19)

**Swarm**: Docker's built-in orchestrator. What we used to run.
(Chapter 19)

**VXLAN**: Virtual Extensible LAN. Layer-2 over Layer-3 tunneling.
Used by both Swarm overlay and Kubernetes Flannel. (Chapter 3)

## Cloudflare

**Flexible SSL**: CF SSL mode where CF↔origin is HTTP. Our current
setup. (Chapter 13)

**Full (strict) SSL**: CF SSL mode where CF↔origin is HTTPS with cert
verification. Our target. (Chapter 13)

**Origin CA**: CF-internal certificate authority that issues certs CF's
edge trusts. Used for Full strict mode. (Chapter 13)

**POP (Point of Presence)**: A CF edge location. ~300 globally.
(Chapter 13)

**Proxied (orange cloud)**: DNS record with CF proxying on. Traffic
goes through CF. (Chapter 13)

**Workers**: CF's serverless compute at the edge. We don't use them yet.
(Chapter 20)

## Hetzner

**Cloud Firewall**: Hetzner's provider-level firewall feature. We use
UFW on nodes instead. (Chapter 4)

**CX33**: Hetzner Cloud instance type. 4 vCPU, 8 GB RAM, 80 GB SSD.
(Chapter 1)

**nbg1**: Nuremberg datacenter code. Our region. (Chapter 1)

## Neon

**Branch**: Neon's isolation primitive. Each project can have multiple
branches (prod, staging, dev). (Chapter 8)

**CU (Compute Unit)**: Neon's pricing unit for compute.
(Chapter 8)

**Launch plan**: Neon's entry-level paid plan. $5 min + usage.
(Chapter 8)

**Pooler**: Neon's built-in PgBouncer instance at the `-pooler` hostname
suffix. (Chapter 8)

## Backblaze B2

**App key**: B2's bucket-scoped credential. Not an IAM-flavored role.
(Chapter 9)

**B2**: Backblaze's object storage. What we use for uploads.
(Chapter 9)

**S3-compatible**: API that speaks AWS S3 protocol. B2 supports it.
(Chapter 9)

## Go + Asynq

**Asynq**: Go library for background job queues. Redis-backed.
(Chapter 7)

**AutoMigrate**: GORM function that syncs DB schema to Go structs.
(Chapter 8)

**GORM**: Go ORM we use. (Chapter 8)

**pgx**: Go Postgres driver used by GORM. (Chapter 8)

**sync.Once**: Go stdlib primitive for "run this exactly once." Source
of bug #6 (Chapter 19).

## Other

**advisory lock**: A Postgres lock that doesn't block rows but lets
apps coordinate voluntarily. We use one for migration serialization.
(Chapter 8)

**AOF (Append-Only File)**: Redis persistence mode that logs every
write. (Chapter 7)

**MTU**: Maximum Transmission Unit. Packet size limit. VXLAN reduces
effective MTU by 50 bytes. (Chapter 3)

**Raft**: Consensus algorithm. Used by etcd. (Chapter 2)

**STARTTLS**: SMTP upgrade from plain to TLS. Used for Fastmail.
(Chapter 5)

**UFW**: Uncomplicated Firewall. Frontend for iptables. (Chapter 4)

**VXLAN**: See Docker/Swarm section.
@@ -0,0 +1,305 @@
# Appendix B — kubectl Cheat Sheet

Specific to this deployment. Assumes:

```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
```

## Viewing state

```bash
# All pods in our namespace
kubectl get pods -n honeydue

# With node placement + IPs
kubectl get pods -n honeydue -o wide

# All resources in our namespace
kubectl get all -n honeydue

# Cluster-wide pod overview
kubectl get pods -A

# Node health
kubectl get nodes
kubectl top nodes

# What's using RAM
kubectl top pods -n honeydue --sort-by=memory

# What's using CPU
kubectl top pods -n honeydue --sort-by=cpu
```

## Logs

```bash
# Follow all api pod logs
kubectl logs -n honeydue -l app.kubernetes.io/name=api -f --prefix

# One specific pod
kubectl logs -n honeydue <pod-name>

# Previous pod's logs (after crash)
kubectl logs -n honeydue <pod-name> --previous

# Filtered
kubectl logs -n honeydue deploy/api | grep -i error
kubectl logs -n honeydue deploy/api --since=1h

# stern is nicer for multi-pod (if installed)
stern -n honeydue api
```

## Deploying new code

```bash
SHA=$(git rev-parse --short HEAD)

# Build + push (requires docker login to Gitea first)
docker buildx build --platform linux/amd64 --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .

# Roll it in
kubectl set image deployment/api -n honeydue \
  api="gitea.treytartt.com/admin/honeydue-api:${SHA}"

# Watch
kubectl rollout status -n honeydue deployment/api
```

## Rolling update controls

```bash
# Pause a rollout in progress (new pods stop being created)
kubectl rollout pause deployment/api -n honeydue

# Resume
kubectl rollout resume deployment/api -n honeydue

# Rollback to previous version
kubectl rollout undo deployment/api -n honeydue

# Rollback to specific revision
kubectl rollout history deployment/api -n honeydue
kubectl rollout undo deployment/api -n honeydue --to-revision=3

# Force restart (re-pulls image if digest changed; reloads ConfigMap)
kubectl rollout restart deployment/api -n honeydue
```

## Scaling

```bash
# Scale up
kubectl scale deployment/api -n honeydue --replicas=5

# Scale down
kubectl scale deployment/api -n honeydue --replicas=3

# Kill everything (emergency)
kubectl scale deployment -n honeydue --all --replicas=0

# Bring back
kubectl scale deployment/api -n honeydue --replicas=3
kubectl scale deployment/admin deployment/worker deployment/redis -n honeydue --replicas=1
```

## Debugging a pod

```bash
# Describe = events + state + restart history
kubectl describe pod -n honeydue <pod-name>

# Shell in
kubectl exec -it -n honeydue deploy/api -- /bin/sh

# Inside:
# Test HTTP locally (bypasses Traefik, Service, overlay)
wget -qO- http://127.0.0.1:8000/api/health/

# Test cross-Service DNS
getent hosts redis
getent hosts admin
getent hosts postgres

# Run arbitrary command (one-shot)
kubectl exec -n honeydue deploy/api -- env | grep POSTGRES
```

## Networking checks

```bash
# Resolve a Service from a pod
kubectl exec -n honeydue deploy/api -- nslookup redis

# Check Service endpoints (the actual IPs behind a ClusterIP)
kubectl get endpoints -n honeydue api

# Traffic test via Service (--command makes sh the container command
# regardless of the image's entrypoint)
kubectl run test --rm -it --image=alpine/curl --command -- sh
# then, inside the pod:
# curl http://api.honeydue.svc:8000/api/health/

# List all Ingresses
kubectl get ingress -A
```

## Secret / Config

```bash
# List
kubectl get secrets -n honeydue
kubectl get configmap -n honeydue

# Describe (shows keys, not values)
kubectl describe secret honeydue-secrets -n honeydue

# Read a value (DANGER: plaintext to stdout)
kubectl get secret honeydue-secrets -n honeydue \
  -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d; echo

# Update a single secret key
# (printf avoids a trailing newline; tr strips the line wrapping that
# GNU base64 adds to long values)
kubectl patch secret honeydue-secrets -n honeydue \
  --type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(printf '%s' 'new-val' | base64 | tr -d '\n')\"}}"

# Regenerate ConfigMap from prod.env
kubectl create configmap honeydue-config -n honeydue \
  --from-env-file=deploy/prod.env \
  --dry-run=client -o yaml | kubectl apply -f -

# Edit a ConfigMap interactively (does NOT restart pods)
kubectl edit configmap honeydue-config -n honeydue
```

## Node management

```bash
# Prevent scheduling on a node
kubectl cordon <node-hostname>

# Prevent scheduling + evict existing pods
kubectl drain <node-hostname> --ignore-daemonsets --delete-emptydir-data

# Allow scheduling again
kubectl uncordon <node-hostname>

# Label a node
kubectl label node <node-hostname> honeydue/redis=true --overwrite

# Remove a label
kubectl label node <node-hostname> honeydue/redis-
```

## Events (the timeline)

```bash
# All events, newest last
kubectl get events -A --sort-by=.lastTimestamp

# Watch live
kubectl get events -A --sort-by=.lastTimestamp -w

# Only warnings
kubectl get events -A --field-selector type=Warning

# Events for a specific pod
kubectl describe pod -n honeydue <pod> | awk '/Events:/,0'
```

## Traefik-specific

```bash
# All Traefik pods (DaemonSet, so one per node)
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide

# Restart Traefik across all nodes
kubectl rollout restart daemonset/traefik -n kube-system

# View Traefik config (via ConfigMap)
kubectl get cm -n kube-system traefik -o yaml | less

# See the HelmChartConfig we applied
kubectl get helmchartconfig -n kube-system traefik -o yaml

# Force Helm re-reconcile
kubectl delete job -n kube-system helm-install-traefik
```

## Cluster-wide operations

```bash
# API server health
kubectl cluster-info

# All namespaces
kubectl get namespaces

# All k3s-system pods
kubectl get pods -n kube-system

# All ServiceAccounts in our namespace
kubectl get sa -n honeydue

# Check what an SA can do
kubectl auth can-i --list --as=system:serviceaccount:honeydue:api
```

## Hetzner SSH (not kubectl but oft needed)

```bash
# SSH in
ssh -i ~/.ssh/hetzner deploy@hetzner1

# Check k3s service
ssh -i ~/.ssh/hetzner deploy@hetzner1 'sudo systemctl status k3s'

# Per-node commands, one node at a time (e.g., apt upgrade)
for h in hetzner1 hetzner2 hetzner3; do
  ssh -i ~/.ssh/hetzner "deploy@$h" 'sudo apt update && sudo apt upgrade -y'
done
```

## Emergency: cluster is wedged

```bash
# Check all nodes Ready
kubectl get nodes

# If one is NotReady
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo systemctl restart k3s'

# If still bad, kill k3s on that node and check
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo /usr/local/bin/k3s-killall.sh'
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo systemctl start k3s'

# Last resort: uninstall + rejoin
# ssh -i ~/.ssh/hetzner deploy@<node> 'sudo /usr/local/bin/k3s-uninstall.sh'
# then re-join via the k3s install command
```

## One-liners worth memorizing

```bash
# Heavy smoke test through CF
for url in https://api.myhoneydue.com/api/health/ https://admin.myhoneydue.com/ https://myhoneydue.com/; do
  ok=0
  for i in $(seq 1 20); do
    [[ "$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" == "200" ]] && ok=$((ok+1))
  done
  printf "%-45s %d/20\n" "$url" "$ok"
done

# Pods not ready (with -A the columns are NAMESPACE NAME READY STATUS, so STATUS is $4)
kubectl get pods -A | awk '$4!="Running" && $4!="Completed" && $4!="STATUS"'

# Restart everything in our namespace
for d in api admin worker redis; do
  kubectl rollout restart deploy/$d -n honeydue
done

# Watch all rollouts simultaneously
for d in api admin worker redis; do
  kubectl rollout status deploy/$d -n honeydue &
done; wait
```
@@ -0,0 +1,216 @@
|
||||
# Appendix C — File Locations

Complete map of where every significant file lives: on the operator
workstation, in the git repo, and on the Hetzner nodes.

## Operator workstation

### Kubernetes

| Path | Purpose |
|---|---|
| `~/.kube/honeydue-k3s.yaml` | kubeconfig for the k3s cluster. Contains an admin bearer token. Mode 0600. |
| `~/.kube/config` | Default kubeconfig (points elsewhere, not our cluster). |

Set `KUBECONFIG=~/.kube/honeydue-k3s.yaml` before any `kubectl` command.

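Setting the variable by hand is easy to forget or mistype. A small sketch of a guard, assuming the path listed in the table above; `use_honeydue_kubeconfig` is a hypothetical helper name, not part of the repo. It exports `KUBECONFIG` only when the file is actually readable:

```shell
# Hypothetical helper: export KUBECONFIG only if the file is readable.
# The default path is the one listed in the Kubernetes table above.
use_honeydue_kubeconfig() {
  cfg="${1:-$HOME/.kube/honeydue-k3s.yaml}"
  if [ ! -r "$cfg" ]; then
    echo "kubeconfig not readable: $cfg" >&2
    return 1
  fi
  export KUBECONFIG="$cfg"
}

# Usage:
#   use_honeydue_kubeconfig && kubectl get nodes
```

Failing early avoids silently falling back to `~/.kube/config`, which points at a different cluster.
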
### SSH

| Path | Purpose |
|---|---|
| `~/.ssh/hetzner` | Private key for node SSH (ed25519). Mode 0600. |
| `~/.ssh/hetzner.pub` | Public key corresponding to above. |
| `~/.ssh/config` | Host aliases for hetzner1/hetzner2/hetzner3 → node IPs. |

Public key content:
```
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBU9xTTBD78tYUqHijgyU9PDqtmS4NuM/6uy8XgDzva+ hetzner2@myhoneydue.com
```

### Docker

| Path | Purpose |
|---|---|
| `~/.docker/config.json` | Docker CLI config. After `docker login` to Gitea, contains creds. **Log out after each deploy** so that PATs are not left on disk. |
| `~/Library/Containers/com.docker.docker/` | Docker Desktop state (macOS). |

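One way to confirm the logout actually happened is to check the config file for a leftover registry entry. This is a sketch under assumptions: `registry_creds_present` is a hypothetical helper, and it assumes the registry host appears as a quoted key in `config.json` while logged in. With a credential helper (such as the macOS keychain) the secret itself lives elsewhere, so this detects only the entry, not the secret.

```shell
# Hypothetical helper: succeeds if the Docker config file ($1) still
# mentions the registry host ($2) as a quoted JSON string.
registry_creds_present() {
  grep -qF "\"$2\"" "$1" 2>/dev/null
}

# Usage after a deploy:
#   docker logout gitea.treytartt.com
#   if registry_creds_present "$HOME/.docker/config.json" gitea.treytartt.com; then
#     echo "warning: registry entry still present in ~/.docker/config.json" >&2
#   fi
```
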
## Git repo (`/Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go/`)

### Top-level

| Path | Purpose |
|---|---|
| `CLAUDE.md` | Project-wide instructions for Claude assistant. Never commit secrets here. |
| `Dockerfile` | Multi-stage Docker build: api, worker, admin targets. |
| `go.mod`, `go.sum` | Go module definition. |
| `package.json` (admin-ui/) | Next.js dependencies. |

### Application code

| Path | Purpose |
|---|---|
| `cmd/api/main.go` | API server entry point. |
| `cmd/worker/main.go` | Background worker entry point. |
| `cmd/admin/main.go` | Entry point for the Go admin variant, if present (may not exist). |
| `internal/config/` | Viper configuration loading. |
| `internal/database/` | Postgres connection, migrations. |
| `internal/handlers/` | HTTP handlers (one file per domain). |
| `internal/services/` | Business logic. `cache_service.go` is where the sync.Once bug was (Chapter 19). |
| `internal/repositories/` | GORM repositories. |
| `internal/router/router.go` | Echo routes, including static file serving. CSP is set here. |
| `internal/middleware/` | Echo middleware (auth, logging, etc.). |
| `internal/task/` | Task predicates/scopes/categorization. See `docs/TASK_LOGIC_ARCHITECTURE.md`. |

### Deploy config (Swarm era; still exists, unused)

| Path | Purpose |
|---|---|
| `deploy/` | Legacy Swarm deploy root. |
| `deploy/prod.env` | Non-secret config (ConfigMap source). **Gitignored.** |
| `deploy/registry.env` | Gitea PAT + registry URL. **Gitignored.** |
| `deploy/cluster.env` | Swarm cluster settings. Partly used for k3s too (manager host). **Gitignored.** |
| `deploy/secrets/postgres_password.txt` | Neon password. **Gitignored.** |
| `deploy/secrets/secret_key.txt` | App signing key (≥32 chars). **Gitignored.** |
| `deploy/secrets/email_host_password.txt` | Fastmail password. **Gitignored.** |
| `deploy/secrets/fcm_server_key.txt` | FCM key (placeholder, push off). **Gitignored.** |
| `deploy/secrets/apns_auth_key.p8` | APNs key (placeholder, push off). **Gitignored.** |
| `deploy/swarm-stack.prod.yml` | Swarm stack definition. Unused after migration. |
| `deploy/Caddyfile` | Caddy config. Unused after migration. |
| `deploy/scripts/deploy_prod.sh` | Swarm deploy script. Unused. |
| `deploy/DEPLOYING.md`, `deploy/README.md`, `deploy/shit_deploy_cant_do.md` | Swarm-era docs. Historical reference. |

### Deploy config (k3s)

| Path | Purpose |
|---|---|
| `deploy-k3s/README.md` | k3s deployment README (scaffold version). |
| `deploy-k3s/MIGRATION_NOTES.md` | Notes from Swarm → k3s migration. |
| `deploy-k3s/SECURITY.md` | Security posture doc (scaffold). |
| `deploy-k3s/config.yaml.example` | Template for a unified config.yaml (unused; we kept Swarm's file layout). |
| `deploy-k3s/manifests/namespace.yaml` | Creates `honeydue` namespace. |
| `deploy-k3s/manifests/rbac.yaml` | ServiceAccounts + `automountServiceAccountToken: false`. |
| `deploy-k3s/manifests/pod-disruption-budgets.yaml` | PDBs for api (minAvailable 2 of 3 replicas) and worker (minAvailable 0 of 1; singleton). |
| `deploy-k3s/manifests/network-policies.yaml` | Default-deny + allows. NOT currently applied. |
| `deploy-k3s/manifests/api/deployment.yaml` | api Deployment. |
| `deploy-k3s/manifests/api/service.yaml` | api ClusterIP Service. |
| `deploy-k3s/manifests/api/hpa.yaml` | api HorizontalPodAutoscaler. NOT currently applied. |
| `deploy-k3s/manifests/admin/deployment.yaml` | admin Deployment. |
| `deploy-k3s/manifests/admin/service.yaml` | admin Service. |
| `deploy-k3s/manifests/worker/deployment.yaml` | worker Deployment. |
| `deploy-k3s/manifests/redis/deployment.yaml` | Redis Deployment. |
| `deploy-k3s/manifests/redis/service.yaml` | Redis Service. |
| `deploy-k3s/manifests/redis/pvc.yaml` | Redis PersistentVolumeClaim. |
| `deploy-k3s/manifests/ingress/ingress.yaml` | Full Ingress with TLS + middleware (scaffold; needs CF origin cert). |
| `deploy-k3s/manifests/ingress/ingress-simple.yaml` | Simple Ingress without TLS (what we actually apply). |
| `deploy-k3s/manifests/ingress/middleware.yaml` | Traefik middleware CRDs. Not currently applied. |
| `deploy-k3s/manifests/traefik-helmchartconfig.yaml` | Our DaemonSet + hostNetwork override for Traefik. |
| `deploy-k3s/manifests/secrets.yaml.example` | Template (never deployed). |
| `deploy-k3s/scripts/01-provision-cluster.sh` | hetzner-k3s provisioning (not used; we reused the existing nodes). |
| `deploy-k3s/scripts/02-setup-secrets.sh` | Creates Secrets + ConfigMap (scaffold version; we ran commands manually). |
| `deploy-k3s/scripts/03-deploy.sh` | Applies manifests (unused; we ran kubectl manually). |
| `deploy-k3s/scripts/04-verify.sh` | Post-deploy verification. |
| `deploy-k3s/scripts/rollback.sh` | Rollback helper. |

### Documentation

| Path | Purpose |
|---|---|
| `docs/deployment/` | **This book.** |
| `docs/TASK_LOGIC_ARCHITECTURE.md` | Task logic internals. |
| `docs/PUSH_NOTIFICATIONS.md` | Push notifications setup (for future). |
| `docs/SUBSCRIPTION_WEBHOOKS.md` | Apple/Google subscription webhooks. |
| `docs/Dokku_notes` | Pre-Swarm era deployment notes. Historical. |
| `docs/server_2026_2_24.md` | Earlier architecture doc (predates k3s migration). |

## On the Hetzner nodes

### System

| Path | Purpose |
|---|---|
| `/etc/ssh/sshd_config` | SSH config: `PermitRootLogin no`, `PasswordAuthentication no`, `AllowUsers deploy`. |
| `/etc/sudoers.d/deploy` | `deploy ALL=(ALL) NOPASSWD: ALL`. |
| `/etc/ufw/` | UFW configuration. See Chapter 4 for rule inventory. |
| `/etc/sysctl.d/99-unprivileged-ports.conf` | `net.ipv4.ip_unprivileged_port_start=0` for Traefik. |
| `/home/deploy/.ssh/authorized_keys` | Our hetzner.pub. |

### K3s

| Path | Purpose |
|---|---|
| `/etc/rancher/k3s/k3s.yaml` | Kubeconfig (localhost-scoped; we copied to workstation). |
| `/etc/systemd/system/k3s.service` | systemd service file. |
| `/etc/systemd/system/k3s.service.env` | K3s install args (INSTALL_K3S_EXEC). |
| `/var/lib/rancher/k3s/` | K3s state root (etcd, containerd, PVC storage). |
| `/var/lib/rancher/k3s/server/node-token` | Token for joining additional nodes. |
| `/var/lib/rancher/k3s/storage/` | local-path PVC storage. Redis data lives here. |
| `/var/lib/rancher/k3s/agent/containerd/` | containerd state. |
| `/var/log/containers/` | Container log files. |

### Commands installed

| Path | Purpose |
|---|---|
| `/usr/local/bin/k3s` | The k3s binary. |
| `/usr/local/bin/kubectl` | Symlink to k3s (CLI for this cluster). |
| `/usr/local/bin/crictl` | CRI CLI for inspecting containers and images under containerd. |
| `/usr/local/bin/k3s-killall.sh` | Emergency kill-all-k3s script. |
| `/usr/local/bin/k3s-uninstall.sh` | Clean uninstall script. |

### Docker (legacy; disabled)

| Path | Purpose |
|---|---|
| `/etc/systemd/system/docker.service` | systemd unit (stopped + disabled). |
| `/var/lib/docker/` | Docker state (unused on current cluster). |

## On Cloudflare

Not a filesystem, but worth noting the dashboard hierarchy:

```
Websites → myhoneydue.com
├── DNS → Records (A records for api, admin, @)
├── SSL/TLS → Overview (SSL mode: Flexible)
├── SSL/TLS → Edge Certificates (Always Use HTTPS: On)
├── SSL/TLS → Origin Server (where the Origin CA cert would live if we enabled it)
├── Rules → Overview (where Origin Rules would live if we had them)
├── Rules → Page Rules (none)
├── Security → WAF (managed rules only)
├── Speed → Optimization (default)
└── Analytics & Logs (read-only stats)
```

## On Gitea (`gitea.treytartt.com`)

The image registry lives at:

```
gitea.treytartt.com/admin/-/packages # UI listing of all packages
gitea.treytartt.com/admin/-/packages/container/honeydue-api # API image
gitea.treytartt.com/admin/-/packages/container/honeydue-worker # Worker image
gitea.treytartt.com/admin/-/packages/container/honeydue-admin # Admin image
```

Per-version tags are listed in the UI, each with its `docker pull` command.

PATs are managed at `gitea.treytartt.com/-/user/settings/applications`.

## On Neon

```
console.neon.tech → project → Branches (production branch default)
console.neon.tech → project → Monitoring (CU-hour usage, slow queries)
console.neon.tech → project → Operations (history of schema changes)
```

Connection strings at `console.neon.tech → project → Connection Details`.

## On Backblaze B2

```
secure.backblaze.com/b2_buckets.htm # Buckets list
secure.backblaze.com/b2_app_keys.htm # App keys
```

`honeyDueProd` bucket → Files tab for browsing contents.
@@ -0,0 +1,202 @@
# Appendix D — References & Citations

Every external link cited anywhere in this book, grouped by topic.

## Docker / Moby

- [moby/moby#52265: Overlay ARP stale entries on 29.3.0 regression][moby-52265] (Chapter 19, primary root-cause citation)
- [moby/moby#51491: DNS broken after `docker swarm init` on 29.0.0][moby-51491]
- [Dokploy#3480: Traefik routes intermittently timeout due to stale VIP][dokploy-3480]
- [Mirantis: Commits to Long-Term Support for Swarm Through 2030][mirantis-swarm]
- [Better Stack: Hetzner Cloud Review 2026][bstack-swarm]
- [VirtualizationHowTo: Is Docker Swarm Still Safe in 2026?][vht-swarm]
- [bleevht: Where Docker Swarm Still Fits in 2026][bleevht-swarm]
- [Docker buildx multi-platform builds][buildx]
- [Compose specification][compose-spec]

## Kubernetes / k3s

- [K3s documentation home][k3s-docs]
- [K3s architecture][k3s-arch]
- [K3s requirements (networking ports)][k3s-reqs]
- [K3s advanced config: metrics server][k3s-metrics]
- [K3s HA datastore recovery][k3s-ha-recovery]
- [K3s storage: local-path provisioner][k3s-lp]
- [K3s Helm integration: HelmChartConfig][k3s-helm]
- [K3s Traefik customization][k3s-traefik]
- [K3s secrets encryption][k3s-secrets]
- [Kubernetes concepts: Services & Networking][k8s-net]
- [Kubernetes Ingress][k8s-ingress]
- [Kubernetes Deployments: rolling updates][rolling]
- [kubectl rollout][rollout]
- [kubectl cheat sheet][kubectl-cs]
- [Pod lifecycle + probes][probes]
- [Pod Security Standards][psa]
- [Kubernetes RBAC][rbac]
- [NetworkPolicy][netpol]
- [Ports and Protocols reference][k8s-ports]
- [metrics-server][ms]

## Traefik

- [Traefik v3 documentation][traefik]
- [Traefik Swarm provider][traefik-swarm]
- [Traefik migrate v2 → v3][traefik-v3]

## Cloudflare

- [IP ranges][cf-ips]
- [SSL modes explained][cf-ssl]
- [Origin CA certificates][cf-origin-ca]
- [DNS best practices][cf-dns]
- [Free plan][cf-free]

## Hetzner

- [Hetzner Cloud][hetzner-cloud]
- [Hetzner price adjustment 2026-04-01][hetzner-prices]
- [Hetzner rescue system][hetzner-rescue]
- [hetzner-k3s tool][hetzner-k3s]

## Neon / Postgres

- [Neon docs][neon-docs]
- [Neon pricing][neon-pricing]
- [Neon usage-based pricing announcement][neon-blog]
- [Neon connect from any app][neon-connect]
- [Postgres advisory locks][pg-locks]
- [GORM AutoMigrate][gorm-automigrate]

## Backblaze B2

- [B2 documentation][b2-docs]
- [B2 S3-compatible API][b2-s3]
- [B2 pricing][b2-pricing]
- [minio-go SDK][minio-go]
- [S3 path-style vs virtual-hosted addressing][s3-style]

## Gitea

- [Gitea container registry docs][gitea-cr]

## CNI / Networking

- [Flannel VXLAN backend][flannel-vxlan]
- [CoreDNS Kubernetes plugin][coredns-k8s]
- [IPVS mode for kube-proxy deep dive][ipvs]
- [VXLAN RFC 7348][vxlan-rfc]
- [Kubernetes NetworkPolicy][netpol]

## Security tools

- [cosign (image signing)][cosign]
- [Loki (logs)][loki]
- [Stern (multi-pod log tailing)][stern]
- [fail2ban][fail2ban]

## Asynq

- [Asynq documentation][asynq]
- [Asynq periodic tasks (scheduler limitations)][asynq-sched]

## Miscellaneous

- [Let's Encrypt][le]
- [UFW man page][ufw-man]
- [SSH hardening guide][ssh-guide]
- [pg_dump][pg-dump]

---

## Link definitions

<!-- Docker / Moby -->
[moby-52265]: https://github.com/moby/moby/issues/52265
[moby-51491]: https://github.com/moby/moby/issues/51491
[dokploy-3480]: https://github.com/Dokploy/dokploy/issues/3480
[mirantis-swarm]: https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/
[bstack-swarm]: https://betterstack.com/community/guides/web-servers/hetzner-cloud-review/
[vht-swarm]: https://www.virtualizationhowto.com/2026/03/is-docker-swarm-still-safe-in-2026/
[bleevht-swarm]: https://bleevht.substack.com/p/where-docker-swarm-still-fits-in
[buildx]: https://docs.docker.com/build/buildx/
[compose-spec]: https://docs.docker.com/reference/compose-file/

<!-- Kubernetes / k3s -->
[k3s-docs]: https://docs.k3s.io/
[k3s-arch]: https://docs.k3s.io/architecture
[k3s-reqs]: https://docs.k3s.io/installation/requirements#networking
[k3s-metrics]: https://docs.k3s.io/advanced#enabling-metrics-server
[k3s-ha-recovery]: https://docs.k3s.io/datastore/ha-embedded#new-cluster-with-embedded-db
[k3s-lp]: https://docs.k3s.io/storage#setting-up-the-local-storage-provider
[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
[k3s-secrets]: https://docs.k3s.io/security/secrets-encryption
[k8s-net]: https://kubernetes.io/docs/concepts/services-networking/
[k8s-ingress]: https://kubernetes.io/docs/concepts/services-networking/ingress/
[rolling]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
[rollout]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#rollout
[kubectl-cs]: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
[probes]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifecycle
[psa]: https://kubernetes.io/docs/concepts/security/pod-security-standards/
[rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
[netpol]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
[k8s-ports]: https://kubernetes.io/docs/reference/networking/ports-and-protocols/
[ms]: https://github.com/kubernetes-sigs/metrics-server

<!-- Traefik -->
[traefik]: https://doc.traefik.io/traefik/v3.6/
[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
[traefik-v3]: https://doc.traefik.io/traefik/migrate/v2-to-v3-details/

<!-- Cloudflare -->
[cf-ips]: https://www.cloudflare.com/ips/
[cf-ssl]: https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/
[cf-origin-ca]: https://developers.cloudflare.com/ssl/origin-configuration/origin-ca/
[cf-dns]: https://developers.cloudflare.com/dns/
[cf-free]: https://www.cloudflare.com/plans/free/

<!-- Hetzner -->
[hetzner-cloud]: https://www.hetzner.com/cloud/
[hetzner-prices]: https://docs.hetzner.com/general/infrastructure-and-availability/price-adjustment/
[hetzner-rescue]: https://docs.hetzner.com/cloud/servers/getting-started/enabling-rescue-system/
[hetzner-k3s]: https://github.com/vitobotta/hetzner-k3s

<!-- Neon / Postgres -->
[neon-docs]: https://neon.com/docs/introduction
[neon-pricing]: https://neon.com/pricing
[neon-blog]: https://neon.com/blog/new-usage-based-pricing
[neon-connect]: https://neon.com/docs/connect/connect-from-any-app
[pg-locks]: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS
[gorm-automigrate]: https://gorm.io/docs/migration.html

<!-- B2 -->
[b2-docs]: https://www.backblaze.com/docs/
[b2-s3]: https://www.backblaze.com/docs/cloud-storage-s3-compatible-api
[b2-pricing]: https://www.backblaze.com/cloud-storage/pricing
[minio-go]: https://github.com/minio/minio-go
[s3-style]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html

<!-- Gitea -->
[gitea-cr]: https://docs.gitea.com/usage/packages/container

<!-- CNI -->
[flannel-vxlan]: https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#vxlan
[coredns-k8s]: https://coredns.io/plugins/kubernetes/
[ipvs]: https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/
[vxlan-rfc]: https://datatracker.ietf.org/doc/html/rfc7348

<!-- Security tools -->
[cosign]: https://github.com/sigstore/cosign
[loki]: https://grafana.com/oss/loki/
[stern]: https://github.com/stern/stern
[fail2ban]: https://www.fail2ban.org/

<!-- Asynq -->
[asynq]: https://github.com/hibiken/asynq
[asynq-sched]: https://github.com/hibiken/asynq/wiki/Periodic-Tasks

<!-- Misc -->
[le]: https://letsencrypt.org/
[ufw-man]: https://manpages.ubuntu.com/manpages/noble/en/man8/ufw.8.html
[ssh-guide]: https://linux-audit.com/audit-and-harden-your-ssh-configuration/
[pg-dump]: https://www.postgresql.org/docs/current/app-pgdump.html
@@ -62,13 +62,24 @@ func SetupRouter(deps *Dependencies) *echo.Echo {
 	e.Use(custommiddleware.StructuredLogger())
 
 	// Security headers (X-Frame-Options, X-Content-Type-Options, X-XSS-Protection, etc.)
+	//
+	// CSP is permissive enough to serve the marketing landing page at / (which
+	// loads same-origin CSS/JS/images and Google Fonts over https). JSON API
+	// responses are unaffected; they don't load any assets, so any CSP is fine.
+	// frame-ancestors stays 'none' to block clickjacking.
 	e.Use(middleware.SecureWithConfig(middleware.SecureConfig{
-		XSSProtection: "1; mode=block",
-		ContentTypeNosniff: "nosniff",
-		XFrameOptions: "SAMEORIGIN",
-		HSTSMaxAge: 31536000, // 1 year in seconds
-		ReferrerPolicy: "strict-origin-when-cross-origin",
-		ContentSecurityPolicy: "default-src 'none'; frame-ancestors 'none'",
+		XSSProtection: "1; mode=block",
+		ContentTypeNosniff: "nosniff",
+		XFrameOptions: "SAMEORIGIN",
+		HSTSMaxAge: 31536000,
+		ReferrerPolicy: "strict-origin-when-cross-origin",
+		ContentSecurityPolicy: "default-src 'self'; " +
+			"style-src 'self' https://fonts.googleapis.com; " +
+			"font-src 'self' https://fonts.gstatic.com data:; " +
+			"img-src 'self' data:; " +
+			"script-src 'self'; " +
+			"connect-src 'self'; " +
+			"frame-ancestors 'none'",
 	}))
 	e.Use(middleware.BodyLimitWithConfig(middleware.BodyLimitConfig{
 		Limit: "1M", // 1MB default for JSON payloads

@@ -50,8 +50,12 @@ func NewCacheService(cfg *config.RedisConfig) (*CacheService, error) {
 
 		if err := client.Ping(ctx).Err(); err != nil {
 			initErr = fmt.Errorf("failed to connect to Redis: %w", err)
-			// Reset Once so a retry is possible after transient failures
-			cacheOnce = sync.Once{}
+			// NOTE: Don't reassign `cacheOnce = sync.Once{}` here. Mutating the
+			// Once from within its own Do() callback fatals with "unlock of
+			// unlocked mutex" because Do is holding the inner lock while we
+			// zero it. main.go handles the error (caching disabled, keep running);
+			// a pod restart is the right "retry" path for a transient Redis
+			// outage, not in-process.
 			return
 		}
 