Migrate prod deploy from Swarm to K3s; add full deployment book
Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
temporarily for reference
Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
callback (was causing 'unlock of unlocked mutex' fatal after
Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
+ allowlist fonts.googleapis.com so the marketing landing page CSS
actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
--platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
images runnable on x86_64 Hetzner nodes; fix array expansion under
set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
top-level aliases (the '\${X_SECRET}' form never actually resolved);
dozzle ports: long-form host_ip is rejected by Swarm, switched to
short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
(Next.js serves at root; /admin/ returned 404 and killed pods);
startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
and admin/src/app/api/*, hiding legitimate files)
New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log
Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
- Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
- Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
- Part III Security, Traefik ingress (Ch 5-6)
- Part IV Services, DB, storage, secrets, registry (Ch 7-11)
- Part V Data flow, deploy process, observability, failures, runbook
(Ch 12, 14-17)
- Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
- Appendices: glossary, kubectl cheat sheet, file locations,
consolidated citations
- README.md: Production Deployment section replaced with pointer to
the book; Go version bumped to 1.25
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,329 @@
# 11 — Container Registry (Gitea)

## Summary

We host our own container registry on Gitea at `gitea.treytartt.com`.
Every image push and pull goes there, not Docker Hub or GHCR. The Gitea
instance runs outside this k3s cluster (on its own VPS) and is reachable
over public HTTPS at `https://gitea.treytartt.com`. Image pulls are
authenticated with a Personal Access Token (PAT) stored as a Kubernetes
Secret of type `kubernetes.io/dockerconfigjson`.

## Why Gitea

### Decision matrix

| Option | Cost | Auth model | Pros | Cons |
|---|---|---|---|---|
| **Gitea built-in registry** | $0 (already running Gitea) | Gitea PAT | Self-hosted, integrated with code | Another service to maintain |
| GHCR (GitHub Container Registry) | Free for public, $0 for private with paid plan | GitHub PAT | Popular, reliable | Uses GitHub; vendor dependency |
| Docker Hub | Free tier limited; paid $5-7/mo | Docker Hub account | Ubiquitous | Rate limits on anonymous pulls |
| AWS ECR | ~$1/mo for small use | IAM | Integrates with AWS workloads | AWS account required |
| Harbor (self-hosted) | $0 | Many options | Best enterprise features | Heavy to operate |

Gitea won primarily because **the operator was already running Gitea for
code hosting**. The container registry has been built into Gitea since
1.17 as a free feature, so there was one fewer service to set up.

Side benefits:
- Code and images live together (one backup policy, one access model)
- PATs are scoped and rotatable via the same UI
- No external vendor to worry about for this critical piece of the
  deploy pipeline

Rejected alternatives:
- **Docker Hub**: rate limits on unauthenticated pulls would bite us if
  nodes pull the same image repeatedly during rolling updates
- **GHCR**: fine, but adds a GitHub dependency we don't otherwise have
- **Harbor**: massive overkill; we're not a 100-team enterprise

## Layout

Images live under the authenticated user's namespace:

```
gitea.treytartt.com/admin/honeydue-api:237c6b8
gitea.treytartt.com/admin/honeydue-worker:237c6b8
gitea.treytartt.com/admin/honeydue-admin:237c6b8
```

`admin` is the Gitea user that owns the images. Images are **private**
by default.

### Image tagging strategy

Tags are git short SHAs (e.g., `237c6b8`), not `:latest` and not semantic
versions.

Rationale:
- `:latest` is ambiguous (which build?). Rolling updates should roll a
  *specific* tag so rollbacks are deterministic.
- `:v1.2.3` works for released libraries, but our app rolls forward
  continuously; versioning per deploy is unnecessary overhead.
- Git SHAs are unique, immutable, and tie each image to the exact
  commit that built it.

`PUSH_LATEST_TAG=false` is set in `deploy/cluster.env`. When we rebuild
and push, only the SHA tag gets pushed. The `latest` tag is never
created by our deploy pipeline.
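
The tagging rule can be sketched as a small shell helper. This is an illustration only: the function name `image_refs` is hypothetical and is not taken from the deploy scripts. The registry host, the `admin` namespace, the `honeydue-*` image names and the `PUSH_LATEST_TAG` variable are the ones described in this chapter.

```bash
# Sketch only: image_refs is a hypothetical helper, not part of deploy_prod.sh.
REGISTRY="${REGISTRY:-gitea.treytartt.com}"
REGISTRY_NAMESPACE="${REGISTRY_NAMESPACE:-admin}"
PUSH_LATEST_TAG="${PUSH_LATEST_TAG:-false}"

# Print the image references to push for one service (api, worker, admin).
# The SHA tag is always printed; :latest only when PUSH_LATEST_TAG=true.
image_refs() {
  local service="$1" sha="$2"
  local repo="${REGISTRY}/${REGISTRY_NAMESPACE}/honeydue-${service}"
  echo "${repo}:${sha}"
  if [ "${PUSH_LATEST_TAG}" = "true" ]; then
    echo "${repo}:latest"
  fi
}

# Example: image_refs api "$(git rev-parse --short HEAD)"
```

With the default `PUSH_LATEST_TAG=false`, the helper prints exactly one reference per service, which matches the "SHA tag only" policy.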

## Authentication

### Creating the PAT

At <https://gitea.treytartt.com/-/user/settings/applications>, we created
a token with scopes:

- `read:package`
- `write:package`

No other scopes. This token can only interact with the package registry; it
can't read repo contents, create issues, or touch account settings.

### PAT on the operator workstation

Stored in `deploy/registry.env`:

```
REGISTRY=gitea.treytartt.com
REGISTRY_NAMESPACE=admin
REGISTRY_USERNAME=admin
REGISTRY_TOKEN=<pat>
```

This file is ignored via `deploy/.gitignore`. If it ever gets
committed accidentally, rotate the PAT immediately.

### PAT in the cluster

Stored as the `gitea-credentials` Secret (type
`kubernetes.io/dockerconfigjson`) in the `honeydue` namespace. See
Chapter 10.

The kubelet uses the credentials in this Secret when a pod needs to pull
from the Gitea registry.
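
For orientation, the payload inside such a Secret can be sketched in shell. The helper name `dockerconfigjson` is hypothetical and the token in the example is a placeholder; the JSON layout (`auths`, keyed by registry host, with `auth` holding base64 of `username:token`) is the standard Docker config format that `kubectl create secret docker-registry` generates.

```bash
# Sketch only: dockerconfigjson is a hypothetical helper name.
# Prints the JSON payload stored under the Secret's .dockerconfigjson key.
dockerconfigjson() {
  local server="$1" username="$2" token="$3"
  local auth
  # "auth" is base64("username:token"); tr strips any line wrapping.
  auth="$(printf '%s:%s' "$username" "$token" | base64 | tr -d '\n')"
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$username" "$token" "$auth"
}

# Example (the token value is a placeholder):
#   dockerconfigjson gitea.treytartt.com admin 'example-token'
```

In practice there is no need to build this by hand; the `kubectl create secret docker-registry` command in the operator cheat sheet produces the same structure.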

## The build pipeline

### Dockerfile multi-stage

`honeyDueAPI-go/Dockerfile` has three target stages:

- `api`: compiled Go binary + static assets for the HTTP API
- `worker`: compiled Go binary for the background worker
- `admin`: Next.js standalone build of the admin panel

A single Dockerfile keeps build-cache sharing efficient (the Go builder
stage produces binaries for both api and worker; admin reuses its own
Node builder stage).

### Multi-arch cross-compilation

The operator workstation is **arm64** (Apple Silicon). The Hetzner nodes
are **x86_64**. A naive `docker build` on arm64 produces arm64 images
that won't run on the nodes (`exec format error`).

The deploy pipeline uses `docker buildx`:

```bash
docker buildx build \
  --platform linux/amd64 \
  --target api \
  -t gitea.treytartt.com/admin/honeydue-api:$SHA \
  --push \
  /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
```

- **`--platform linux/amd64`**: build for x86_64 regardless of the host
  architecture
- **`--target api`**: which Dockerfile stage to build
- **`--push`**: push directly to the registry (the image is not loaded
  into the local Docker image store)

The Go stages use the `TARGETARCH` build arg to produce the right
architecture binary. Node stages use QEMU emulation (which is slower but
acceptable for our ~1 min admin build).
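
The architecture mismatch can be detected up front with a short check. This is a sketch under stated assumptions: the function names `docker_platform_for` and `needs_cross_build` are hypothetical, and the mapping covers only the two architectures discussed here.

```bash
# Sketch only: both function names are hypothetical.
# Map a `uname -m` value to the matching Docker platform string.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Succeed (exit 0) when the host's native platform differs from the target,
# i.e. when a plain `docker build` would produce the wrong architecture.
needs_cross_build() {
  local host_arch="$1" target_platform="$2"
  [ "$(docker_platform_for "$host_arch")" != "$target_platform" ]
}

# Example: needs_cross_build "$(uname -m)" linux/amd64 && echo "pass --platform"
```

Passing `--platform linux/amd64` unconditionally, as the pipeline does, is the simpler choice; a check like this is only useful as a guard in ad hoc scripts.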

### Buildx builder

We use a named buildx builder to keep state out of Docker's default
environment:

```bash
docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap
```

`honeydue-builder` uses the `docker-container` driver: BuildKit runs in
its own container, which stays around between builds until the builder
is stopped or removed (`docker buildx stop` / `docker buildx rm`). It
supports multi-platform builds and caches layers across builds.
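
Re-running `docker buildx create` with a name that already exists typically fails, so a deploy script may want an idempotent variant. A minimal sketch follows; the helper name `ensure_builder` is hypothetical and is not taken from the deploy scripts.

```bash
# Sketch only: ensure_builder is a hypothetical helper name.
# Create the named builder on first use; afterwards just select it.
ensure_builder() {
  local name="${1:-honeydue-builder}"
  if docker buildx inspect "$name" >/dev/null 2>&1; then
    docker buildx use "$name"
  else
    docker buildx create --name "$name" --use
  fi
  # Start the builder so the first real build does not pay the startup cost.
  docker buildx inspect --bootstrap "$name"
}

# Example: ensure_builder honeydue-builder
```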
## From local file to cluster — the full path

```mermaid
flowchart LR
    subgraph dev[Operator workstation]
        Code[Source code]
        Dockerfile
        Buildx[docker buildx]
    end
    subgraph Gitea[gitea.treytartt.com]
        Reg[Package registry]
    end
    subgraph K8s[k3s cluster]
        Kubelet
        Containerd
        Pod
    end

    Code --> Dockerfile
    Dockerfile --> Buildx
    Buildx -- push --> Reg
    Reg -- pull --> Kubelet
    Kubelet --> Containerd
    Containerd --> Pod
```

### End-to-end

1. **Operator commits code** to `main` locally
2. **Operator builds + pushes image**: `docker buildx build --push ...`
   from the repo root. Build takes 1–3 minutes first time, seconds on
   warm cache.
3. **Image lands in Gitea**: visible at
   `https://gitea.treytartt.com/admin/-/packages/container/honeydue-api`
4. **Operator updates Deployment**:
   `kubectl set image deployment/api api=gitea.treytartt.com/admin/honeydue-api:$NEW_SHA -n honeydue`
5. **K8s begins rolling update**: creates new ReplicaSet with new image
6. **Kubelet on target node** sees a pod with an image it doesn't have
7. **Kubelet calls containerd**: "pull this image using these creds"
8. **Containerd authenticates** to the Gitea registry using the PAT from
   the `gitea-credentials` Secret and downloads the image
9. **Containerd starts the container** with the new image
10. **Readiness probe passes**: new pod joins the Service endpoints
11. **Kubelet tears down** an old pod
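
Steps 4 through 11 can be wrapped in one helper that waits for the rollout and rolls back if it does not complete. This is a sketch, not the project's deploy script: the function name `deploy_image` is hypothetical, the 5-minute timeout is an assumed value, and it assumes each Deployment's container has the same name as the Deployment (as shown for `api` in step 4).

```bash
# Sketch only: deploy_image is a hypothetical helper; the 5-minute timeout is
# an assumed value, not taken from the manifests.
deploy_image() {
  local service="$1" sha="$2"
  local image="gitea.treytartt.com/admin/honeydue-${service}:${sha}"

  # Step 4: point the Deployment's container at the new tag.
  kubectl set image "deployment/${service}" "${service}=${image}" -n honeydue || return 1

  # Steps 5-11: block until the rolling update finishes; roll back on failure.
  if ! kubectl rollout status "deployment/${service}" -n honeydue --timeout=5m; then
    kubectl rollout undo "deployment/${service}" -n honeydue
    return 1
  fi
}

# Example: deploy_image api "$NEW_SHA"
```

Because tags are git SHAs, `kubectl rollout undo` returns the Deployment to the previous, exactly identified image.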

## Pushing manually

If you need to push a one-off image (e.g., testing a fix):

```bash
# Login (once per session)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin

# Build + push
cd honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
docker buildx build \
  --platform linux/amd64 \
  --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
  --push .

# Logout (don't leave creds in ~/.docker/config.json)
docker logout gitea.treytartt.com
```

## Image sizes

Current images:

| Image | Size | Contents |
|---|---|---|
| `honeydue-api` | ~53 MB | Alpine base + Go binary |
| `honeydue-worker` | ~50 MB | Alpine base + Go binary |
| `honeydue-admin` | ~150 MB | Node 20 alpine + Next.js standalone |

The Go binaries are statically compiled with `CGO_ENABLED=0`. Alpine is
the base for the smallest footprint.

## Image retention

Gitea does **not auto-prune** images by default. Every `:<sha>` tag
accumulates forever. The package page at
`https://gitea.treytartt.com/admin/-/packages/container/honeydue-api`
lists them all.

At the current pace (a few deploys per week, images of roughly 50-150 MB
each), this grows by about 10 GB/year. Not critical; an 80 GB node disk
can take years.

**TODO**: Add a monthly cleanup that deletes all but the last 30 tags per
image. This can be a cron job or a manual quarterly cleanup.
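
The cleanup TODO could look roughly like the following. It is a sketch under assumptions: both helper names are hypothetical, the tag list is assumed to arrive on stdin one per line with the newest first (how that list is produced is left open), and `GITEA_PAT` is assumed to hold a token allowed to delete packages. The DELETE URL is the same one used in the operator cheat sheet.

```bash
# Sketch only: both helpers are hypothetical, and the tag list is assumed to
# arrive on stdin one per line, newest first (how it is produced is left open).
tags_to_delete() {
  local keep="${1:-30}"
  # Skip the first $keep lines and print the rest.
  tail -n "+$((keep + 1))"
}

# Delete each tag read from stdin via the package API's DELETE endpoint.
delete_tags() {
  local image="$1" tag
  while IFS= read -r tag; do
    [ -n "$tag" ] || continue
    curl -fsS -X DELETE \
      -H "Authorization: token ${GITEA_PAT}" \
      "https://gitea.treytartt.com/api/v1/packages/admin/container/${image}/${tag}"
  done
}

# Example: some_newest_first_tag_listing | tags_to_delete 30 | delete_tags honeydue-api
```

Splitting selection from deletion makes a dry run trivial: pipe into `tags_to_delete` alone and review the output before deleting anything.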
## Image verification — not yet

We do not sign images or verify signatures. An attacker who compromised
Gitea could push a malicious image under an existing tag (though Gitea
should prevent tag reuse if immutable tags are configured).

**TODO** (Chapter 20): Add [cosign](https://github.com/sigstore/cosign)
for signing at build time, plus a `Kyverno` or `Connaisseur` policy to
verify signatures at admission time, before the image is pulled.

## Gitea registry itself

The Gitea instance runs outside this k3s cluster on its own VPS
(operator's existing infrastructure). It's **not** part of the honeyDue
deployment; it's adjacent infrastructure.

If the Gitea host goes down:
- Currently-running pods keep working (they already pulled their images)
- New deployments/scale-ups fail at the image-pull step on any node that
  does not already have the image cached
- No impact on existing user traffic

This is an acceptable external dependency. The Gitea host has its own
uptime story.

## Cost

**$0/mo.** The registry is included in the Gitea install whose VPS we
already pay for (not counted toward honeyDue's cost).

If we ever switched to GHCR, cost would still be $0 for public images
or bundled with our (nonexistent) GitHub Team subscription.

## What we don't have

- **Image scanning** (Trivy, Snyk): scan images for known CVEs on push
- **Image signing** (cosign)
- **Multi-region replication**: images are hosted in only one place
- **High availability**: Gitea is single-instance

For our scale, none of these are needed. TODO (Chapter 20) if the
operator's appetite increases.

## Operator cheat sheet

```bash
# List packages via API (images are private, so send the token)
curl -sS "https://gitea.treytartt.com/api/v1/packages/admin?type=container" \
  -H "Authorization: token $GITEA_PAT" \
  -H "Accept: application/json" | jq .

# Browse in UI
# https://gitea.treytartt.com/admin/-/packages

# Delete a specific tag via API
curl -X DELETE \
  -H "Authorization: token $GITEA_PAT" \
  "https://gitea.treytartt.com/api/v1/packages/admin/container/honeydue-api/237c6b8"

# Login from kubectl side (refresh the Secret); NEW_PAT holds the rotated token
kubectl create secret docker-registry gitea-credentials -n honeydue \
  --docker-server=gitea.treytartt.com \
  --docker-username=admin \
  --docker-password="$NEW_PAT" \
  --dry-run=client -o yaml | kubectl apply -f -

# After rotating the PAT, restart the workloads. Running pods are unaffected
# by the rotation itself: the Secret is only read when a node pulls an image.
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
```

## References

- [Gitea Container Registry][gitea-cr]
- [Docker buildx multi-platform][buildx]
- [Kubernetes image pull secrets][pull-secrets]
- [cosign][cosign]

[gitea-cr]: https://docs.gitea.com/usage/packages/container
[buildx]: https://docs.docker.com/build/buildx/
[pull-secrets]: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
[cosign]: https://github.com/sigstore/cosign