Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes, all servers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after
  Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
  + allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
  --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
  images runnable on x86_64 Hetzner nodes; fix array expansion under
  set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
  without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook
    (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to
  the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-04-24 07:20:21 -05:00
parent 4ec4bbbfe8
commit 6f303dbbaa
46 changed files with 9785 additions and 93 deletions
@@ -0,0 +1,207 @@
# Appendix A — Glossary
Alphabetical. Cross-referenced to chapters where each term is used in
detail.
## Kubernetes / k3s
**ClusterIP**: Internal IP of a Kubernetes Service. Stable; load-
balances to backing pods. (Chapter 3)
**containerd**: Container runtime bundled with k3s. Replaces Docker for
the runtime layer. (Chapter 2)
**ConfigMap**: Kubernetes resource holding non-sensitive config (env
vars). Mounted into pods via `envFrom`. (Chapter 10)
**CoreDNS**: Cluster-internal DNS resolver. Every pod's
`/etc/resolv.conf` points to the CoreDNS Service. (Chapter 3)
**CRD (Custom Resource Definition)**: Kubernetes extension mechanism
for third-party resource types. Traefik's `IngressRoute` and
`Middleware` are CRDs. (Chapter 6)
**DaemonSet**: Workload that runs exactly one pod per node. We use it
for Traefik so each node has its own ingress pod. (Chapter 6)
**Deployment**: Kubernetes workload for stateless pods. Supports rolling
updates. Most of our services are Deployments. (Chapter 7)
**Endpoints**: The actual pod IPs backing a Service's ClusterIP.
Dynamically updated as pods come and go. (Chapter 3)
**etcd**: Distributed key-value store holding cluster state. K3s
embeds it. Raft-replicated across server nodes. (Chapter 2)
**Flannel**: Kubernetes CNI (Container Network Interface) plugin for
pod-to-pod networking. Uses VXLAN tunneling. (Chapter 3)
**HPA (HorizontalPodAutoscaler)**: K8s resource that scales Deployment
replicas based on CPU/memory usage. Not currently enabled for us.
(Chapter 7)
**Ingress**: K8s resource describing external-to-internal routing rules.
Traefik watches Ingresses and programs itself accordingly. (Chapter 6)
**IPVS**: Linux kernel feature for in-kernel L4 load balancing. Our
kube-proxy uses it. (Chapter 3)
**k3s**: Lightweight Kubernetes distribution by Rancher/SUSE. What we
run. (Chapter 2)
**kubectl**: Kubernetes CLI tool. Runs on operator workstation.
(Chapter 17)
**kubelet**: Agent running on each node, responsible for pod lifecycle.
(Chapter 2)
**kube-proxy**: Service-to-pod routing component. Runs on each node in
IPVS mode. (Chapter 3)
**Namespace**: Kubernetes logical grouping. Our app lives in `honeydue`.
System services in `kube-system`. (Chapter 7)
**NetworkPolicy**: K8s resource defining allowed traffic between pods.
Not currently applied. (Chapter 5)
**Node**: A physical or virtual machine running Kubernetes. We have 3.
(Chapter 1)
**PDB (PodDisruptionBudget)**: Constraint on voluntary pod disruptions
(drain, upgrade). Keeps N replicas available. (Chapter 7)
**Pod**: Smallest Kubernetes unit — one or more containers sharing
network and storage. Our pods are usually one-container. (Chapter 7)
**PVC (PersistentVolumeClaim)**: Request for persistent storage. Redis
uses one. (Chapter 7)
**RBAC**: Role-Based Access Control. Governs who/what can do what via
the Kubernetes API. (Chapter 5)
**ReplicaSet**: Managed by a Deployment; ensures N pods of a template
are running. Each deploy creates a new ReplicaSet. (Chapter 14)
**Secret**: K8s resource holding sensitive values. Base64-encoded;
stored in etcd (unencrypted by default). (Chapter 10)
**Service**: K8s resource providing a stable endpoint (ClusterIP) for
a set of pods. (Chapter 3)
**ServiceAccount**: Identity used by pods to authenticate to the
Kubernetes API. We disable token mounting for our app pods.
(Chapter 5)
**Taint / Toleration**: Mechanism to prevent pods from being scheduled
on certain nodes. Not used in our setup. (Chapter 7)
## Docker / Swarm
**libnetwork**: Docker's networking library. Provides overlay
networking for Swarm. Source of the DNS ghost bug (Chapter 19).
**mode: global**: Swarm deploy mode for services running one task
(container) per node. (Chapter 19)
**mode: host**: Port publishing mode that binds to the node's real
interface, bypassing the ingress mesh. (Chapter 4)
**Overlay network**: Encrypted or unencrypted virtual network spanning
Swarm nodes. (Chapter 19)
**Swarm**: Docker's built-in orchestrator. What we used to run.
(Chapter 19)
**VXLAN**: Virtual Extensible LAN. Layer-2 over Layer-3 tunneling.
Used by both Swarm overlay and Kubernetes Flannel. (Chapter 3)
## Cloudflare
**Flexible SSL**: CF SSL mode where CF↔origin is HTTP. Our current
setup. (Chapter 13)
**Full (strict) SSL**: CF SSL mode where CF↔origin is HTTPS with cert
verification. Our target. (Chapter 13)
**Origin CA**: CF-internal certificate authority that issues certs CF's
edge trusts. Used for Full strict mode. (Chapter 13)
**POP (Point of Presence)**: A CF edge location. ~300 globally.
(Chapter 13)
**Proxied (orange cloud)**: DNS record with CF proxying on. Traffic
goes through CF. (Chapter 13)
**Workers**: CF's serverless compute at the edge. We don't use them yet.
(Chapter 20)
## Hetzner
**CX33**: Hetzner Cloud instance type. 4 vCPU, 8 GB RAM, 80 GB SSD.
(Chapter 1)
**Cloud Firewall**: Hetzner's provider-level firewall feature. We use
UFW on nodes instead. (Chapter 4)
**nbg1**: Nuremberg datacenter code. Our region. (Chapter 1)
## Neon
**Branch**: Neon's isolation primitive. Each project can have multiple
branches (prod, staging, dev). (Chapter 8)
**CU (Compute Unit)**: Neon's pricing unit for compute.
(Chapter 8)
**Launch plan**: Neon's entry-level paid plan. $5 min + usage.
(Chapter 8)
**Pooler**: Neon's built-in PgBouncer instance at the `-pooler` hostname
suffix. (Chapter 8)
## Backblaze B2
**B2**: Backblaze's object storage. What we use for uploads.
(Chapter 9)
**App key**: B2's bucket-scoped credential. Not an IAM-flavored role.
(Chapter 9)
**S3-compatible**: API that speaks AWS S3 protocol. B2 supports it.
(Chapter 9)
## Go + Asynq
**AutoMigrate**: GORM function that syncs DB schema to Go structs.
(Chapter 8)
**Asynq**: Go library for background job queues. Redis-backed.
(Chapter 7)
**GORM**: Go ORM we use. (Chapter 8)
**pgx**: Go Postgres driver used by GORM. (Chapter 8)
**sync.Once**: Go stdlib primitive for "run this exactly once." Source
of bug #6 (Chapter 19).
## Other
**advisory lock**: A Postgres lock that doesn't block rows but lets
apps coordinate voluntarily. We use one for migration serialization.
(Chapter 8)
**AOF (Append-Only File)**: Redis persistence mode that logs every
write. (Chapter 7)
**MTU**: Maximum Transmission Unit. Packet size limit. VXLAN reduces
effective MTU by 50 bytes. (Chapter 3)
**Raft**: Consensus algorithm. Used by etcd. (Chapter 2)
**STARTTLS**: SMTP upgrade from plain to TLS. Used for Fastmail.
(Chapter 5)
**UFW**: Uncomplicated Firewall. Frontend for iptables. (Chapter 4)
**VXLAN**: See Docker/Swarm section.
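The 50-byte figure in the MTU entry can be checked by summing the encapsulation headers for VXLAN over IPv4. The following is a minimal sketch; the function names are illustrative and the 1500-byte link MTU is an assumption, so check the real value on the nodes with `ip link`.

```bash
#!/usr/bin/env bash
# VXLAN-over-IPv4 encapsulation overhead, header by header (bytes).
outer_ipv4=20       # outer IP header
outer_udp=8         # outer UDP header
vxlan_header=8      # VXLAN header
inner_ethernet=14   # Ethernet header of the encapsulated frame

# Total bytes the tunnel adds to every packet.
vxlan_overhead() {
  echo $(( outer_ipv4 + outer_udp + vxlan_header + inner_ethernet ))
}

# Effective MTU inside the tunnel for a given link MTU.
effective_mtu() {
  local link_mtu="$1"
  echo $(( link_mtu - $(vxlan_overhead) ))
}

effective_mtu 1500   # assumed link MTU
```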
@@ -0,0 +1,305 @@
# Appendix B — kubectl Cheat Sheet
Specific to this deployment. Assumes:
```bash
export KUBECONFIG=~/.kube/honeydue-k3s.yaml
```
## Viewing state
```bash
# All pods in our namespace
kubectl get pods -n honeydue
# With node placement + IPs
kubectl get pods -n honeydue -o wide
# All resources in our namespace
kubectl get all -n honeydue
# Cluster-wide pod overview
kubectl get pods -A
# Node health
kubectl get nodes
kubectl top nodes
# What's using RAM
kubectl top pods -n honeydue --sort-by=memory
# What's using CPU
kubectl top pods -n honeydue --sort-by=cpu
```
## Logs
```bash
# Follow all api pod logs
kubectl logs -n honeydue -l app.kubernetes.io/name=api -f --prefix
# One specific pod
kubectl logs -n honeydue <pod-name>
# Previous pod's logs (after crash)
kubectl logs -n honeydue <pod-name> --previous
# Filtered
kubectl logs -n honeydue deploy/api | grep -i error
kubectl logs -n honeydue deploy/api --since=1h
# stern is nicer for multi-pod (if installed)
stern -n honeydue api
```
## Deploying new code
```bash
SHA=$(git rev-parse --short HEAD)
# Build + push (requires docker login to Gitea first)
docker buildx build --platform linux/amd64 --target api \
-t "gitea.treytartt.com/admin/honeydue-api:${SHA}" --push .
# Roll it in
kubectl set image deployment/api -n honeydue \
api="gitea.treytartt.com/admin/honeydue-api:${SHA}"
# Watch
kubectl rollout status -n honeydue deployment/api
```
## Rolling update controls
```bash
# Pause a rollout in progress (new pods stop being created)
kubectl rollout pause deployment/api -n honeydue
# Resume
kubectl rollout resume deployment/api -n honeydue
# Rollback to previous version
kubectl rollout undo deployment/api -n honeydue
# Rollback to specific revision
kubectl rollout history deployment/api -n honeydue
kubectl rollout undo deployment/api -n honeydue --to-revision=3
# Force restart (new pods re-read the ConfigMap; the image is re-pulled only if imagePullPolicy is Always)
kubectl rollout restart deployment/api -n honeydue
```
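The `rollout status` and `rollout undo` commands above can be combined so that a rollout which does not finish in time is reverted automatically. This is a sketch only: `rollout_or_undo` is a hypothetical helper that is not in the repo, and the 300s default timeout is an assumption chosen to sit above the 240s worst-case api startup noted in the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: wait for a rollout and undo it if it does not
# complete within the timeout. Defaults mirror this cheat sheet.
rollout_or_undo() {
  local deployment="$1"
  local namespace="${2:-honeydue}"
  local timeout="${3:-300s}"   # assumed value, tune per service

  if kubectl rollout status "deployment/${deployment}" \
       -n "${namespace}" --timeout="${timeout}"; then
    echo "rollout of ${deployment} succeeded"
  else
    echo "rollout of ${deployment} failed; rolling back" >&2
    kubectl rollout undo "deployment/${deployment}" -n "${namespace}"
    return 1
  fi
}

# Usage:
#   kubectl set image deployment/api -n honeydue api="<image>:<tag>"
#   rollout_or_undo api
```

The function returns non-zero after a rollback, so a calling script can stop instead of continuing with the next service.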
## Scaling
```bash
# Scale up
kubectl scale deployment/api -n honeydue --replicas=5
# Scale down
kubectl scale deployment/api -n honeydue --replicas=3
# Kill everything (emergency)
kubectl scale deployment -n honeydue --all --replicas=0
# Bring back
kubectl scale deployment/api -n honeydue --replicas=3
kubectl scale deployment/admin deployment/worker deployment/redis -n honeydue --replicas=1
```
## Debugging a pod
```bash
# Describe = events + state + restart history
kubectl describe pod -n honeydue <pod-name>
# Shell in
kubectl exec -it -n honeydue deploy/api -- /bin/sh
# Inside:
# Test HTTP locally (bypasses Traefik, Service, overlay)
wget -qO- http://127.0.0.1:8000/api/health/
# Test cross-Service DNS
getent hosts redis
getent hosts admin
getent hosts postgres
# Run arbitrary command (one-shot)
kubectl exec -n honeydue deploy/api -- env | grep POSTGRES
```
## Networking checks
```bash
# Resolve a Service from a pod
kubectl exec -n honeydue deploy/api -- nslookup redis
# Check Service endpoints (the actual IPs behind a ClusterIP)
kubectl get endpoints -n honeydue api
# Traffic test via Service
kubectl run test --rm -it --image=alpine/curl -- sh
# curl http://api.honeydue.svc:8000/api/health/
# List all Ingresses
kubectl get ingress -A
```
## Secret / Config
```bash
# List
kubectl get secrets -n honeydue
kubectl get configmap -n honeydue
# Describe (shows keys, not values)
kubectl describe secret honeydue-secrets -n honeydue
# Read a value (DANGER: plaintext to stdout)
kubectl get secret honeydue-secrets -n honeydue \
-o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d; echo
# Update a single secret key
kubectl patch secret honeydue-secrets -n honeydue \
--type=merge -p "{\"data\":{\"SECRET_KEY\":\"$(printf '%s' 'new-val' | base64 | tr -d '\n')\"}}"
# Regenerate ConfigMap from prod.env
kubectl create configmap honeydue-config -n honeydue \
--from-env-file=deploy/prod.env \
--dry-run=client -o yaml | kubectl apply -f -
# Edit a ConfigMap interactively (does NOT restart pods)
kubectl edit configmap honeydue-config -n honeydue
```
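The inline JSON in the patch command above is easy to get wrong by hand. A small helper can build the payload instead; the sketch below assumes the key is a plain identifier with no quotes or backslashes, and `secret_patch_json` is a hypothetical name, not something in the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the merge-patch body for one Secret key.
# Assumes the key is a plain identifier (no quotes or backslashes).
secret_patch_json() {
  local key="$1" value="$2" encoded
  # printf avoids a trailing newline in the value; tr strips the line
  # wrapping that GNU base64 applies to long output.
  encoded="$(printf '%s' "${value}" | base64 | tr -d '\n')"
  printf '{"data":{"%s":"%s"}}' "${key}" "${encoded}"
}

# Usage (the value is a placeholder):
#   kubectl patch secret honeydue-secrets -n honeydue \
#     --type=merge -p "$(secret_patch_json SECRET_KEY 'new-val')"
```

Pods that read the Secret through environment variables still need a `kubectl rollout restart` to pick up the new value.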
## Node management
```bash
# Prevent scheduling on a node
kubectl cordon <node-hostname>
# Prevent scheduling + evict existing pods
kubectl drain <node-hostname> --ignore-daemonsets --delete-emptydir-data
# Allow scheduling again
kubectl uncordon <node-hostname>
# Label a node
kubectl label node <node-hostname> honeydue/redis=true --overwrite
# Remove a label
kubectl label node <node-hostname> honeydue/redis-
```
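Drain and uncordon are normally used as a pair around node maintenance. A hedged sketch of a wrapper that always uncordons afterwards, even when the maintenance command fails, is shown below. `with_node_drained` is a hypothetical helper, and it assumes the Kubernetes node name is known to the caller (it may differ from the SSH alias).

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: drain a node, run a command, then uncordon.
# The node is uncordoned even if the drain or the command fails.
with_node_drained() {
  local node="$1" status=0
  shift
  if ! kubectl drain "${node}" --ignore-daemonsets --delete-emptydir-data; then
    # drain cordons the node before evicting, so undo that on failure
    kubectl uncordon "${node}"
    return 1
  fi
  "$@" || status=$?
  kubectl uncordon "${node}"
  return "${status}"
}

# Usage (node name and SSH alias are placeholders):
#   with_node_drained <node-hostname> \
#     ssh -i ~/.ssh/hetzner deploy@<node> 'sudo apt update && sudo apt upgrade -y'
```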
## Events (the timeline)
```bash
# All events, newest last
kubectl get events -A --sort-by=.lastTimestamp
# Watch live
kubectl get events -A --sort-by=.lastTimestamp -w
# Only warnings
kubectl get events -A --field-selector type=Warning
# Events for a specific pod
kubectl describe pod -n honeydue <pod> | awk '/Events:/,0'
```
## Traefik-specific
```bash
# All Traefik pods (DaemonSet, so one per node)
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide
# Restart Traefik across all nodes
kubectl rollout restart daemonset/traefik -n kube-system
# View Traefik config (via ConfigMap)
kubectl get cm -n kube-system traefik -o yaml | less
# See the HelmChartConfig we applied
kubectl get helmchartconfig -n kube-system traefik -o yaml
# Force Helm re-reconcile
kubectl delete job -n kube-system helm-install-traefik
```
## Cluster-wide operations
```bash
# API server health
kubectl cluster-info
# All namespaces
kubectl get namespaces
# All k3s-system pods
kubectl get pods -n kube-system
# All ServiceAccounts in our namespace
kubectl get sa -n honeydue
# Check what an SA can do
kubectl auth can-i --list --as=system:serviceaccount:honeydue:api -n honeydue
```
## Hetzner SSH (not kubectl but often needed)
```bash
# SSH in
ssh -i ~/.ssh/hetzner deploy@hetzner1
# Check k3s service
ssh -i ~/.ssh/hetzner deploy@hetzner1 'sudo systemctl status k3s'
# Per-node commands, one node at a time (e.g., apt upgrade)
for h in hetzner1 hetzner2 hetzner3; do
  ssh -i ~/.ssh/hetzner "deploy@$h" 'sudo apt update && sudo apt upgrade -y'
done
```
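The loop above visits the nodes one after another, which is the safer order for upgrades: with three etcd members, only one can be down at a time without losing quorum. For read-only checks, a concurrent variant can be sketched as follows. `on_all_nodes` is a hypothetical helper; the host aliases and key path are the ones used in this cheat sheet.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one command on every node concurrently,
# prefix each output line with the host, and fail if any node fails.
NODES=(hetzner1 hetzner2 hetzner3)

on_all_nodes() {
  local cmd="$1" h pid failed=0
  local pids=()
  for h in "${NODES[@]}"; do
    (
      # pipefail makes the subshell report ssh's failure, not sed's success
      set -o pipefail
      ssh -n -i ~/.ssh/hetzner "deploy@${h}" "${cmd}" 2>&1 | sed "s/^/[${h}] /"
    ) &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "${pid}" || failed=1
  done
  return "${failed}"
}

# Usage:
#   on_all_nodes 'sudo systemctl is-active k3s'
```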
## Emergency: cluster is wedged
```bash
# Check all nodes Ready
kubectl get nodes
# If one is NotReady
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo systemctl restart k3s'
# If still bad, kill k3s on that node and check
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo /usr/local/bin/k3s-killall.sh'
ssh -i ~/.ssh/hetzner deploy@<node> 'sudo systemctl start k3s'
# Last resort: uninstall + rejoin
# ssh -i ~/.ssh/hetzner deploy@<node> 'sudo /usr/local/bin/k3s-uninstall.sh'
# then re-join via the k3s install command
```
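The first step above, spotting a NotReady node, can be scripted by filtering the STATUS column of `kubectl get nodes`. A minimal sketch, where `not_ready_nodes` is a hypothetical helper reading the `--no-headers` output on stdin:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of nodes whose STATUS is not Ready.
# Expects `kubectl get nodes --no-headers` output on stdin
# (columns: NAME STATUS ROLES AGE VERSION).
not_ready_nodes() {
  # A cordoned node shows "Ready,SchedulingDisabled", so compare only
  # the first comma-separated part of STATUS.
  awk '{ split($2, s, ","); if (s[1] != "Ready") print $1 }'
}

# Usage:
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node reports Ready, including nodes that are merely cordoned.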
## One-liners worth memorizing
```bash
# Heavy smoke test through CF
for url in https://api.myhoneydue.com/api/health/ https://admin.myhoneydue.com/ https://myhoneydue.com/; do
  ok=0
  for i in $(seq 1 20); do
    [[ "$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" == "200" ]] && ok=$((ok+1))
  done
  printf "%-45s %d/20\n" "$url" "$ok"
done
# Pods not ready (with -A the columns are NAMESPACE NAME READY STATUS, so STATUS is $4)
kubectl get pods -A | awk '$4!="Running" && $4!="Completed" && $4!="STATUS"'
# Restart everything in our namespace
for d in api admin worker redis; do
  kubectl rollout restart deploy/$d -n honeydue
done
# Watch all rollouts simultaneously
for d in api admin worker redis; do
  kubectl rollout status deploy/$d -n honeydue &
done; wait
```
@@ -0,0 +1,216 @@
# Appendix C — File Locations
Complete map of where every significant file lives — on the operator
workstation, in the git repo, and on the Hetzner nodes.
## Operator workstation
### Kubernetes
| Path | Purpose |
|---|---|
| `~/.kube/honeydue-k3s.yaml` | kubeconfig for the k3s cluster. Contains an admin bearer token. Mode 0600. |
| `~/.kube/config` | Default kubeconfig (points elsewhere, not our cluster). |
Set `KUBECONFIG=~/.kube/honeydue-k3s.yaml` before any `kubectl` command.
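Because the default kubeconfig points at a different cluster, a guard at the top of any operator script can catch a forgotten export. The following is a sketch; `require_honeydue_kubeconfig` is a hypothetical helper, not something in the repo.

```bash
#!/usr/bin/env bash
# Hypothetical guard: refuse to continue unless KUBECONFIG points at
# the honeydue k3s kubeconfig.
EXPECTED_KUBECONFIG="${HOME}/.kube/honeydue-k3s.yaml"

require_honeydue_kubeconfig() {
  if [ "${KUBECONFIG:-}" != "${EXPECTED_KUBECONFIG}" ]; then
    echo "KUBECONFIG is '${KUBECONFIG:-unset}', expected '${EXPECTED_KUBECONFIG}'" >&2
    return 1
  fi
}

# Usage:
#   require_honeydue_kubeconfig && kubectl get nodes
```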
### SSH
| Path | Purpose |
|---|---|
| `~/.ssh/hetzner` | Private key for node SSH (ed25519). Mode 0600. |
| `~/.ssh/hetzner.pub` | Public key corresponding to above. |
| `~/.ssh/config` | Host aliases for hetzner1/hetzner2/hetzner3 → node IPs. |
Public key content:
```
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBU9xTTBD78tYUqHijgyU9PDqtmS4NuM/6uy8XgDzva+ hetzner2@myhoneydue.com
```
### Docker
| Path | Purpose |
|---|---|
| `~/.docker/config.json` | Docker CLI config. After `docker login` to Gitea, contains creds. **Log out after each deploy** so PATs are not left on disk. |
| `~/Library/Containers/com.docker.docker/` | Docker Desktop state (macOS). |
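The "log out after each deploy" rule is easier to follow when the login, the push, and the logout are wrapped together. A hedged sketch: `with_registry_login` is a hypothetical helper, and the `GITEA_USER` and `GITEA_PAT` variable names are assumptions (the real values live in the gitignored `deploy/registry.env`).

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: log in to the registry, run a command, and
# always log out afterwards so the PAT does not stay in config.json.
# GITEA_USER and GITEA_PAT are assumed variable names.
REGISTRY="gitea.treytartt.com"

with_registry_login() {
  local status=0
  printf '%s' "${GITEA_PAT:?set GITEA_PAT}" \
    | docker login "${REGISTRY}" -u "${GITEA_USER:?set GITEA_USER}" --password-stdin \
    || return 1
  "$@" || status=$?
  docker logout "${REGISTRY}"
  return "${status}"
}

# Usage (image tag is a placeholder):
#   with_registry_login docker buildx build --platform linux/amd64 \
#     --target api -t "gitea.treytartt.com/admin/honeydue-api:<sha>" --push .
```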
## Git repo (`/Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go/`)
### Top-level
| Path | Purpose |
|---|---|
| `CLAUDE.md` | Project-wide instructions for Claude assistant. Never commit secrets here. |
| `Dockerfile` | Multi-stage Docker build: api, worker, admin targets. |
| `go.mod`, `go.sum` | Go module definition. |
| `package.json` (admin-ui/) | Next.js dependencies. |
### Application code
| Path | Purpose |
|---|---|
| `cmd/api/main.go` | API server entry point. |
| `cmd/worker/main.go` | Background worker entry point. |
| `cmd/admin/main.go` | (may or may not exist for Go admin variant) |
| `internal/config/` | Viper configuration loading. |
| `internal/database/` | Postgres connection, migrations. |
| `internal/handlers/` | HTTP handlers (one file per domain). |
| `internal/services/` | Business logic. `cache_service.go` is where the sync.Once bug was (Chapter 19). |
| `internal/repositories/` | GORM repositories. |
| `internal/router/router.go` | Echo routes, including static file serving. CSP is set here. |
| `internal/middleware/` | Echo middleware (auth, logging, etc.). |
| `internal/task/` | Task predicates/scopes/categorization. See `docs/TASK_LOGIC_ARCHITECTURE.md`. |
### Deploy config (Swarm era — still exists, unused)
| Path | Purpose |
|---|---|
| `deploy/` | Legacy Swarm deploy root. |
| `deploy/prod.env` | Non-secret config (ConfigMap source). **Gitignored.** |
| `deploy/registry.env` | Gitea PAT + registry URL. **Gitignored.** |
| `deploy/cluster.env` | Swarm cluster settings. Partly used for k3s too (manager host). **Gitignored.** |
| `deploy/secrets/postgres_password.txt` | Neon password. **Gitignored.** |
| `deploy/secrets/secret_key.txt` | App signing key (≥32 chars). **Gitignored.** |
| `deploy/secrets/email_host_password.txt` | Fastmail password. **Gitignored.** |
| `deploy/secrets/fcm_server_key.txt` | FCM key (placeholder, push off). **Gitignored.** |
| `deploy/secrets/apns_auth_key.p8` | APNs key (placeholder, push off). **Gitignored.** |
| `deploy/swarm-stack.prod.yml` | Swarm stack definition. Unused after migration. |
| `deploy/Caddyfile` | Caddy config. Unused after migration. |
| `deploy/scripts/deploy_prod.sh` | Swarm deploy script. Unused. |
| `deploy/DEPLOYING.md`, `deploy/README.md`, `deploy/shit_deploy_cant_do.md` | Swarm-era docs. Historical reference. |
### Deploy config (k3s)
| Path | Purpose |
|---|---|
| `deploy-k3s/README.md` | k3s deployment README (scaffold version). |
| `deploy-k3s/MIGRATION_NOTES.md` | Notes from Swarm → k3s migration. |
| `deploy-k3s/SECURITY.md` | Security posture doc (scaffold). |
| `deploy-k3s/config.yaml.example` | Template for a unified config.yaml (unused — we kept Swarm's file layout). |
| `deploy-k3s/manifests/namespace.yaml` | Creates `honeydue` namespace. |
| `deploy-k3s/manifests/rbac.yaml` | ServiceAccounts + `automountServiceAccountToken: false`. |
| `deploy-k3s/manifests/pod-disruption-budgets.yaml` | PDBs for api (2/3) and worker (0/1). |
| `deploy-k3s/manifests/network-policies.yaml` | Default-deny + allows. NOT currently applied. |
| `deploy-k3s/manifests/api/deployment.yaml` | api Deployment. |
| `deploy-k3s/manifests/api/service.yaml` | api ClusterIP Service. |
| `deploy-k3s/manifests/api/hpa.yaml` | api HorizontalPodAutoscaler. NOT currently applied. |
| `deploy-k3s/manifests/admin/deployment.yaml` | admin Deployment. |
| `deploy-k3s/manifests/admin/service.yaml` | admin Service. |
| `deploy-k3s/manifests/worker/deployment.yaml` | worker Deployment. |
| `deploy-k3s/manifests/redis/deployment.yaml` | Redis Deployment. |
| `deploy-k3s/manifests/redis/service.yaml` | Redis Service. |
| `deploy-k3s/manifests/redis/pvc.yaml` | Redis PersistentVolumeClaim. |
| `deploy-k3s/manifests/ingress/ingress.yaml` | Full Ingress with TLS + middleware (scaffold; needs CF origin cert). |
| `deploy-k3s/manifests/ingress/ingress-simple.yaml` | Simple Ingress without TLS (what we actually apply). |
| `deploy-k3s/manifests/ingress/middleware.yaml` | Traefik middleware CRDs. Not currently applied. |
| `deploy-k3s/manifests/traefik-helmchartconfig.yaml` | Our DaemonSet + hostNetwork override for Traefik. |
| `deploy-k3s/manifests/secrets.yaml.example` | Template (never deployed). |
| `deploy-k3s/scripts/01-provision-cluster.sh` | hetzner-k3s provisioning (we didn't use it; existing nodes). |
| `deploy-k3s/scripts/02-setup-secrets.sh` | Creates Secrets + ConfigMap (scaffold version; we ran commands manually). |
| `deploy-k3s/scripts/03-deploy.sh` | Applies manifests (unused; we ran kubectl manually). |
| `deploy-k3s/scripts/04-verify.sh` | Post-deploy verification. |
| `deploy-k3s/scripts/rollback.sh` | Rollback helper. |
### Documentation
| Path | Purpose |
|---|---|
| `docs/deployment/` | **This book.** |
| `docs/TASK_LOGIC_ARCHITECTURE.md` | Task logic internals. |
| `docs/PUSH_NOTIFICATIONS.md` | Push notifications setup (for future). |
| `docs/SUBSCRIPTION_WEBHOOKS.md` | Apple/Google subscription webhooks. |
| `docs/Dokku_notes` | Pre-Swarm era deployment notes. Historical. |
| `docs/server_2026_2_24.md` | Earlier architecture doc (predates k3s migration). |
## On the Hetzner nodes
### System
| Path | Purpose |
|---|---|
| `/etc/ssh/sshd_config` | SSH config — `PermitRootLogin no`, `PasswordAuthentication no`, `AllowUsers deploy`. |
| `/etc/sudoers.d/deploy` | `deploy ALL=(ALL) NOPASSWD: ALL`. |
| `/etc/ufw/` | UFW configuration. See Chapter 4 for rule inventory. |
| `/etc/sysctl.d/99-unprivileged-ports.conf` | `net.ipv4.ip_unprivileged_port_start=0` for Traefik. |
| `/home/deploy/.ssh/authorized_keys` | Our hetzner.pub. |
### K3s
| Path | Purpose |
|---|---|
| `/etc/rancher/k3s/k3s.yaml` | Kubeconfig (localhost-scoped; we copied to workstation). |
| `/etc/systemd/system/k3s.service` | systemd service file. |
| `/etc/systemd/system/k3s.service.env` | K3s install args (INSTALL_K3S_EXEC). |
| `/var/lib/rancher/k3s/` | K3s state root (etcd, containerd, PVC storage). |
| `/var/lib/rancher/k3s/server/node-token` | Token for joining additional nodes. |
| `/var/lib/rancher/k3s/storage/` | local-path PVC storage. Redis data lives here. |
| `/var/lib/rancher/k3s/agent/containerd/` | containerd state. |
| `/var/log/containers/` | Container log files. |
### Commands installed
| Path | Purpose |
|---|---|
| `/usr/local/bin/k3s` | The k3s binary. |
| `/usr/local/bin/kubectl` | Symlink to k3s (CLI for this cluster). |
| `/usr/local/bin/crictl` | CRI CLI for inspecting containers managed by containerd. |
| `/usr/local/bin/k3s-killall.sh` | Emergency kill-all-k3s script. |
| `/usr/local/bin/k3s-uninstall.sh` | Clean uninstall script. |
### Docker (legacy; disabled)
| Path | Purpose |
|---|---|
| `/etc/systemd/system/docker.service` | systemd unit (stopped + disabled). |
| `/var/lib/docker/` | Docker state (unused on current cluster). |
## On Cloudflare
Not a filesystem, but worth noting the dashboard hierarchy:
```
Websites → myhoneydue.com
├── DNS → Records (A records for api, admin, @)
├── SSL/TLS → Overview (SSL mode: Flexible)
├── SSL/TLS → Edge Certificates (Always Use HTTPS: On)
├── SSL/TLS → Origin Server (where the Origin CA cert would live if we enabled it)
├── Rules → Overview (where Origin Rules live if we had them)
├── Rules → Page Rules (none)
├── Security → WAF (managed rules only)
├── Speed → Optimization (default)
└── Analytics & Logs (read-only stats)
```
## On Gitea (`gitea.treytartt.com`)
The image registry lives at:
```
gitea.treytartt.com/admin/-/packages # UI listing of all packages
gitea.treytartt.com/admin/-/packages/container/honeydue-api # API image
gitea.treytartt.com/admin/-/packages/container/honeydue-worker # Worker image
gitea.treytartt.com/admin/-/packages/container/honeydue-admin # Admin image
```
Per-version tags are visible in the UI, along with their `docker pull` commands.
PATs at `gitea.treytartt.com/-/user/settings/applications`.
## On Neon
```
console.neon.tech → project → Branches (production branch default)
console.neon.tech → project → Monitoring (CU-hour usage, slow queries)
console.neon.tech → project → Operations (history of schema changes)
```
Connection strings at `console.neon.tech → project → Connection Details`.
## On Backblaze B2
```
secure.backblaze.com/b2_buckets.htm # Buckets list
secure.backblaze.com/b2_app_keys.htm # App keys
```
`honeyDueProd` bucket → Files tab for browsing contents.
@@ -0,0 +1,202 @@
# Appendix D — References & Citations
Every external link cited anywhere in this book, grouped by topic.
## Docker / Moby
- [moby/moby#52265 — Overlay ARP stale entries on 29.3.0 regression][moby-52265] (Chapter 19, primary root-cause citation)
- [moby/moby#51491 — DNS broken after `docker swarm init` on 29.0.0][moby-51491]
- [Dokploy#3480 — Traefik routes intermittently timeout due to stale VIP][dokploy-3480]
- [Mirantis: Commits to Long-Term Support for Swarm Through 2030][mirantis-swarm]
- [Better Stack: Hetzner Cloud Review 2026][bstack-swarm]
- [VirtualizationHowTo: Is Docker Swarm Still Safe in 2026?][vht-swarm]
- [bleevht: Where Docker Swarm Still Fits in 2026][bleevht-swarm]
- [Docker buildx multi-platform builds][buildx]
- [Compose specification][compose-spec]
## Kubernetes / k3s
- [K3s documentation home][k3s-docs]
- [K3s architecture][k3s-arch]
- [K3s requirements (networking ports)][k3s-reqs]
- [K3s advanced config — metrics server][k3s-metrics]
- [K3s HA datastore recovery][k3s-ha-recovery]
- [K3s storage — local-path provisioner][k3s-lp]
- [K3s Helm integration — HelmChartConfig][k3s-helm]
- [K3s Traefik customization][k3s-traefik]
- [K3s secrets encryption][k3s-secrets]
- [Kubernetes concepts — Services & Networking][k8s-net]
- [Kubernetes Ingress][k8s-ingress]
- [Kubernetes Deployments — rolling updates][rolling]
- [kubectl rollout][rollout]
- [kubectl cheat sheet][kubectl-cs]
- [Pod lifecycle + probes][probes]
- [Pod Security Standards][psa]
- [Kubernetes RBAC][rbac]
- [NetworkPolicy][netpol]
- [Ports and Protocols reference][k8s-ports]
- [metrics-server][ms]
## Traefik
- [Traefik v3 documentation][traefik]
- [Traefik Swarm provider][traefik-swarm]
- [Traefik migrate v2 → v3][traefik-v3]
## Cloudflare
- [IP ranges][cf-ips]
- [SSL modes explained][cf-ssl]
- [Origin CA certificates][cf-origin-ca]
- [DNS best practices][cf-dns]
- [Free plan][cf-free]
## Hetzner
- [Hetzner Cloud][hetzner-cloud]
- [Hetzner price adjustment 2026-04-01][hetzner-prices]
- [Hetzner rescue system][hetzner-rescue]
- [hetzner-k3s tool][hetzner-k3s]
## Neon / Postgres
- [Neon docs][neon-docs]
- [Neon pricing][neon-pricing]
- [Neon usage-based pricing announcement][neon-blog]
- [Neon connect from any app][neon-connect]
- [Postgres advisory locks][pg-locks]
- [GORM AutoMigrate][gorm-automigrate]
## Backblaze B2
- [B2 documentation][b2-docs]
- [B2 S3-compatible API][b2-s3]
- [B2 pricing][b2-pricing]
- [minio-go SDK][minio-go]
- [S3 path-style vs virtual-hosted addressing][s3-style]
## Gitea
- [Gitea container registry docs][gitea-cr]
## CNI / Networking
- [Flannel VXLAN backend][flannel-vxlan]
- [CoreDNS Kubernetes plugin][coredns-k8s]
- [IPVS mode for kube-proxy deep dive][ipvs]
- [VXLAN RFC 7348][vxlan-rfc]
- [Kubernetes NetworkPolicy][netpol]
## Security tools
- [cosign (image signing)][cosign]
- [Loki (logs)][loki]
- [Stern (multi-pod log tailing)][stern]
- [fail2ban][fail2ban]
## Asynq
- [Asynq documentation][asynq]
- [Asynq periodic tasks (scheduler limitations)][asynq-sched]
## Miscellaneous
- [Let's Encrypt][le]
- [UFW man page][ufw-man]
- [SSH hardening guide][ssh-guide]
- [pg_dump][pg-dump]
---
## Link definitions
<!-- Docker / Moby -->
[moby-52265]: https://github.com/moby/moby/issues/52265
[moby-51491]: https://github.com/moby/moby/issues/51491
[dokploy-3480]: https://github.com/Dokploy/dokploy/issues/3480
[mirantis-swarm]: https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/
[bstack-swarm]: https://betterstack.com/community/guides/web-servers/hetzner-cloud-review/
[vht-swarm]: https://www.virtualizationhowto.com/2026/03/is-docker-swarm-still-safe-in-2026/
[bleevht-swarm]: https://bleevht.substack.com/p/where-docker-swarm-still-fits-in
[buildx]: https://docs.docker.com/build/buildx/
[compose-spec]: https://docs.docker.com/reference/compose-file/
<!-- Kubernetes / k3s -->
[k3s-docs]: https://docs.k3s.io/
[k3s-arch]: https://docs.k3s.io/architecture
[k3s-reqs]: https://docs.k3s.io/installation/requirements#networking
[k3s-metrics]: https://docs.k3s.io/advanced#enabling-metrics-server
[k3s-ha-recovery]: https://docs.k3s.io/datastore/ha-embedded#new-cluster-with-embedded-db
[k3s-lp]: https://docs.k3s.io/storage#setting-up-the-local-storage-provider
[k3s-helm]: https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig
[k3s-traefik]: https://docs.k3s.io/networking/networking-services#traefik-ingress-controller
[k3s-secrets]: https://docs.k3s.io/security/secrets-encryption
[k8s-net]: https://kubernetes.io/docs/concepts/services-networking/
[k8s-ingress]: https://kubernetes.io/docs/concepts/services-networking/ingress/
[rolling]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
[rollout]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#rollout
[kubectl-cs]: https://kubernetes.io/docs/reference/kubectl/cheatsheet/
[probes]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifecycle
[psa]: https://kubernetes.io/docs/concepts/security/pod-security-standards/
[rbac]: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
[netpol]: https://kubernetes.io/docs/concepts/services-networking/network-policies/
[k8s-ports]: https://kubernetes.io/docs/reference/networking/ports-and-protocols/
[ms]: https://github.com/kubernetes-sigs/metrics-server
<!-- Traefik -->
[traefik]: https://doc.traefik.io/traefik/v3.6/
[traefik-swarm]: https://doc.traefik.io/traefik/providers/swarm/
[traefik-v3]: https://doc.traefik.io/traefik/migrate/v2-to-v3-details/
<!-- Cloudflare -->
[cf-ips]: https://www.cloudflare.com/ips/
[cf-ssl]: https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/
[cf-origin-ca]: https://developers.cloudflare.com/ssl/origin-configuration/origin-ca/
[cf-dns]: https://developers.cloudflare.com/dns/
[cf-free]: https://www.cloudflare.com/plans/free/
<!-- Hetzner -->
[hetzner-cloud]: https://www.hetzner.com/cloud/
[hetzner-prices]: https://docs.hetzner.com/general/infrastructure-and-availability/price-adjustment/
[hetzner-rescue]: https://docs.hetzner.com/cloud/servers/getting-started/enabling-rescue-system/
[hetzner-k3s]: https://github.com/vitobotta/hetzner-k3s
<!-- Neon / Postgres -->
[neon-docs]: https://neon.com/docs/introduction
[neon-pricing]: https://neon.com/pricing
[neon-blog]: https://neon.com/blog/new-usage-based-pricing
[neon-connect]: https://neon.com/docs/connect/connect-from-any-app
[pg-locks]: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS
[gorm-automigrate]: https://gorm.io/docs/migration.html
<!-- B2 -->
[b2-docs]: https://www.backblaze.com/docs/
[b2-s3]: https://www.backblaze.com/docs/cloud-storage-s3-compatible-api
[b2-pricing]: https://www.backblaze.com/cloud-storage/pricing
[minio-go]: https://github.com/minio/minio-go
[s3-style]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html
<!-- Gitea -->
[gitea-cr]: https://docs.gitea.com/usage/packages/container
<!-- CNI -->
[flannel-vxlan]: https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#vxlan
[coredns-k8s]: https://coredns.io/plugins/kubernetes/
[ipvs]: https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/
[vxlan-rfc]: https://datatracker.ietf.org/doc/html/rfc7348
<!-- Security tools -->
[cosign]: https://github.com/sigstore/cosign
[loki]: https://grafana.com/oss/loki/
[stern]: https://github.com/stern/stern
[fail2ban]: https://www.fail2ban.org/
<!-- Asynq -->
[asynq]: https://github.com/hibiken/asynq
[asynq-sched]: https://github.com/hibiken/asynq/wiki/Periodic-Tasks
<!-- Misc -->
[le]: https://letsencrypt.org/
[ufw-man]: https://manpages.ubuntu.com/manpages/noble/en/man8/ufw.8.html
[ssh-guide]: https://linux-audit.com/audit-and-harden-your-ssh-configuration/
[pg-dump]: https://www.postgresql.org/docs/current/app-pgdump.html