Migrate prod deploy from Swarm to K3s; add full deployment book

Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes, all servers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after
  Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
  + allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
  --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
  images runnable on x86_64 Hetzner nodes; fix array expansion under
  set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '\${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
  without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
  - Part I    Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II   Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III  Security, Traefik ingress (Ch 5-6)
  - Part IV   Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V    Data flow, deploy process, observability, failures, runbook
              (Ch 12, 14-17)
  - Part VI   Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to
  the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,207 @@
# Appendix A — Glossary

Grouped by technology and roughly alphabetical within each group. Each
entry cites the chapter where the term is covered in detail.

## Kubernetes / k3s

**ClusterIP**: Internal IP of a Kubernetes Service. Stable;
load-balances to backing pods. (Chapter 3)

**containerd**: Container runtime bundled with k3s. Replaces Docker for
the runtime layer. (Chapter 2)

**ConfigMap**: Kubernetes resource holding non-sensitive config (env
vars). Injected into pods as environment variables via `envFrom`.
(Chapter 10)

**CoreDNS**: Cluster-internal DNS resolver. Every pod's
`/etc/resolv.conf` points to the CoreDNS Service. (Chapter 3)

**CRD (Custom Resource Definition)**: Kubernetes extension mechanism
for third-party resource types. Traefik's `IngressRoute` and
`Middleware` are CRDs. (Chapter 6)

**DaemonSet**: Workload that runs exactly one pod per node. We use it
for Traefik so each node has its own ingress pod. (Chapter 6)

**Deployment**: Kubernetes workload for stateless pods. Supports rolling
updates. Most of our services are Deployments. (Chapter 7)

**Endpoints**: The actual pod IPs backing a Service's ClusterIP.
Dynamically updated as pods come and go. (Chapter 3)

**etcd**: Distributed key-value store holding cluster state. K3s
embeds it. Raft-replicated across server nodes. (Chapter 2)

**Flannel**: Kubernetes CNI (Container Network Interface) plugin for
pod-to-pod networking. Uses VXLAN tunneling. (Chapter 3)

**HPA (HorizontalPodAutoscaler)**: K8s resource that scales Deployment
replicas based on CPU/memory usage. Not currently enabled for us.
(Chapter 7)

**Ingress**: K8s resource describing external-to-internal routing rules.
Traefik watches Ingresses and programs itself accordingly. (Chapter 6)

**IPVS**: Linux kernel feature for in-kernel L4 load balancing. Our
kube-proxy uses it. (Chapter 3)

**k3s**: Lightweight Kubernetes distribution by Rancher/SUSE. What we
run. (Chapter 2)

**kubectl**: Kubernetes CLI tool. Runs on the operator's workstation.
(Chapter 17)

**kubelet**: Agent running on each node, responsible for pod lifecycle.
(Chapter 2)

**kube-proxy**: Service-to-pod routing component. Runs on each node in
IPVS mode. (Chapter 3)

**Namespace**: Kubernetes logical grouping. Our app lives in `honeydue`;
system services live in `kube-system`. (Chapter 7)

**NetworkPolicy**: K8s resource defining allowed traffic between pods.
Not currently applied. (Chapter 5)

**Node**: A physical or virtual machine running Kubernetes. We have 3.
(Chapter 1)

**PDB (PodDisruptionBudget)**: Constraint on voluntary pod disruptions
(drain, upgrade). Keeps at least N replicas available. (Chapter 7)

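
The arithmetic behind a PDB with an absolute `minAvailable` can be sketched as below. The helper name `allowedDisruptions` is hypothetical and the three-replica `minAvailable: 2` case is an assumed value for illustration; the real controller also handles percentages and `maxUnavailable`, which this sketch ignores.

```go
package main

import "fmt"

// allowedDisruptions returns how many pods may be evicted voluntarily
// while still keeping at least minAvailable healthy pods running.
// Hypothetical helper: it models only an integer minAvailable.
func allowedDisruptions(healthy, minAvailable int) int {
	if d := healthy - minAvailable; d > 0 {
		return d
	}
	return 0
}

func main() {
	// Singleton worker with minAvailable 1: a drain can never evict it.
	fmt.Println(allowedDisruptions(1, 1))
	// Same worker with minAvailable 0: the drain may proceed.
	fmt.Println(allowedDisruptions(1, 0))
	// Three replicas with minAvailable 2 (assumed): one eviction at a time.
	fmt.Println(allowedDisruptions(3, 2))
}
```

This is why a single-replica workload paired with `minAvailable: 1` blocks node drains, and why the singleton worker's budget is set to 0.
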
**Pod**: Smallest Kubernetes unit: one or more containers sharing
network and storage. Our pods are usually single-container. (Chapter 7)

**PVC (PersistentVolumeClaim)**: Request for persistent storage. Redis
uses one. (Chapter 7)

**RBAC**: Role-Based Access Control. Governs who/what can do what via
the Kubernetes API. (Chapter 5)

**ReplicaSet**: Managed by a Deployment; ensures N pods of a template
are running. Each deploy creates a new ReplicaSet. (Chapter 14)

**Secret**: K8s resource holding sensitive values. Base64-encoded;
stored in etcd (unencrypted by default). (Chapter 10)

**Service**: K8s resource providing a stable endpoint (ClusterIP) for
a set of pods. (Chapter 3)

**ServiceAccount**: Identity used by pods to authenticate to the
Kubernetes API. We disable token mounting for our app pods.
(Chapter 5)

**Taint / Toleration**: Mechanism to prevent pods from being scheduled
on certain nodes. Not used in our setup. (Chapter 7)

## Docker / Swarm

**libnetwork**: Docker's networking library. Provides overlay
networking for Swarm. Source of the DNS ghost bug (Chapter 19).

**mode: global**: Swarm deploy mode that runs one task (container) per
node; the Swarm counterpart of a DaemonSet. (Chapter 19)

**mode: host**: Port publishing mode that binds to the node's real
interface, bypassing the ingress mesh. (Chapter 4)

**Overlay network**: Encrypted or unencrypted virtual network spanning
Swarm nodes. (Chapter 19)

**Swarm**: Docker's built-in orchestrator. What we used to run.
(Chapter 19)

**VXLAN**: Virtual Extensible LAN. Layer-2 over Layer-3 tunneling.
Used by both Swarm overlay and Kubernetes Flannel. (Chapter 3)

## Cloudflare

**Flexible SSL**: CF SSL mode where CF↔origin is HTTP. Our current
setup. (Chapter 13)

**Full (strict) SSL**: CF SSL mode where CF↔origin is HTTPS with cert
verification. Our target. (Chapter 13)

**Origin CA**: CF-internal certificate authority that issues certs CF's
edge trusts. Used for Full (strict) mode. (Chapter 13)

**POP (Point of Presence)**: A CF edge location. ~300 globally.
(Chapter 13)

**Proxied (orange cloud)**: DNS record with CF proxying on. Traffic
goes through CF. (Chapter 13)

**Workers**: CF's serverless compute at the edge. We don't use them
yet. (Chapter 20)

## Hetzner

**CX33**: Hetzner Cloud instance type. 4 vCPU, 8 GB RAM, 80 GB SSD.
(Chapter 1)

**Cloud Firewall**: Hetzner's provider-level firewall feature. We use
UFW on nodes instead. (Chapter 4)

**nbg1**: Nuremberg datacenter code. Our region. (Chapter 1)

## Neon

**Branch**: Neon's isolation primitive. Each project can have multiple
branches (prod, staging, dev). (Chapter 8)

**CU (Compute Unit)**: Neon's pricing unit for compute.
(Chapter 8)

**Launch plan**: Neon's entry-level paid plan. $5 min + usage.
(Chapter 8)

**Pooler**: Neon's built-in PgBouncer instance at the `-pooler` hostname
suffix. (Chapter 8)

## Backblaze B2

**B2**: Backblaze's object storage. What we use for uploads.
(Chapter 9)

**App key**: B2's bucket-scoped credential. Not an IAM-flavored role.
(Chapter 9)

**S3-compatible**: API that speaks AWS S3 protocol. B2 supports it.
(Chapter 9)

## Go + Asynq

**AutoMigrate**: GORM function that syncs DB schema to Go structs.
(Chapter 8)

**Asynq**: Go library for background job queues. Redis-backed.
(Chapter 7)

**GORM**: Go ORM we use. (Chapter 8)

**pgx**: Go Postgres driver used by GORM. (Chapter 8)

**sync.Once**: Go stdlib primitive for "run this exactly once." Source
of bug #6 (Chapter 19).

## Other

**advisory lock**: A Postgres lock that doesn't block rows but lets
apps coordinate voluntarily. We use one to serialize migrations.
(Chapter 8)

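
As a rough sketch of the pattern (not the project's `MigrateWithLock`, whose code is not shown here), a session-level advisory lock can be taken with `pg_advisory_lock` and released with `pg_advisory_unlock` on the same connection. The helpers `lockKey` and `withAdvisoryLock`, and the lock name in `main`, are hypothetical; running `withAdvisoryLock` would additionally need a registered Postgres driver and an open `*sql.DB`, which this snippet leaves out.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"hash/fnv"
)

// lockKey derives a stable 64-bit key from a human-readable lock name.
// Hypothetical helper; any agreed-upon integer works as a key.
func lockKey(name string) int64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	return int64(h.Sum64())
}

// withAdvisoryLock runs fn while holding a session-level advisory lock.
// It pins one connection because the lock belongs to the session that
// took it and must be released on that same session.
func withAdvisoryLock(ctx context.Context, db *sql.DB, key int64, fn func(context.Context) error) (err error) {
	conn, err := db.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()

	// Blocks until no other session holds the same key.
	if _, err := conn.ExecContext(ctx, "SELECT pg_advisory_lock($1)", key); err != nil {
		return err
	}
	defer func() {
		// Use a fresh context so a cancelled ctx does not skip the unlock.
		_, uerr := conn.ExecContext(context.Background(), "SELECT pg_advisory_unlock($1)", key)
		if err == nil {
			err = uerr
		}
	}()

	return fn(ctx)
}

func main() {
	// Every replica derives the same key from the same name.
	fmt.Println(lockKey("schema-migrations") == lockKey("schema-migrations"))
}
```

With several replicas starting at once, each would call `withAdvisoryLock` with the same key, so only one runs its migration function at a time while the others wait.
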
**AOF (Append-Only File)**: Redis persistence mode that logs every
write. (Chapter 7)

**MTU**: Maximum Transmission Unit. Packet size limit. VXLAN reduces
effective MTU by 50 bytes. (Chapter 3)

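
The 50-byte figure is the sum of the headers VXLAN wraps around each frame over IPv4. A small sketch of the arithmetic follows; the 1500-byte underlay MTU is an assumed example value, not a measurement of our network, and `overlayMTU` is a hypothetical helper.

```go
package main

import "fmt"

// Header sizes added by VXLAN encapsulation over IPv4, in bytes.
const (
	outerIPv4Header     = 20
	outerUDPHeader      = 8
	vxlanHeader         = 8
	innerEthernetHeader = 14

	vxlanOverhead = outerIPv4Header + outerUDPHeader + vxlanHeader + innerEthernetHeader
)

// overlayMTU returns the largest packet a pod can send through the
// tunnel without fragmentation, given the underlay interface MTU.
func overlayMTU(underlayMTU int) int {
	return underlayMTU - vxlanOverhead
}

func main() {
	fmt.Println(vxlanOverhead)    // 50
	fmt.Println(overlayMTU(1500)) // 1450, assuming a 1500-byte underlay
}
```
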
**Raft**: Consensus algorithm. Used by etcd. (Chapter 2)

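
Raft needs a majority of members to agree before a write commits, which is what determines how many nodes a cluster can lose. The sketch below shows that arithmetic for a few cluster sizes; `quorum` and `faultTolerance` are hypothetical helper names.

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n voting members.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail while a majority
// is still reachable.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("members=%d quorum=%d tolerates=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```

With three members the quorum is two, so the cluster survives the loss of one node. A fourth member raises the quorum to three without raising the tolerance, which is why odd sizes are the usual choice.
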
**STARTTLS**: SMTP upgrade from plain to TLS. Used for Fastmail.
(Chapter 5)

**UFW**: Uncomplicated Firewall. Frontend for iptables. (Chapter 4)

**VXLAN**: See Docker/Swarm section.