Trey t 6f303dbbaa
Migrate prod deploy from Swarm to K3s; add full deployment book
Infrastructure:
- Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers)
- Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh
- All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept
  temporarily for reference

Bug fixes surfaced during migration:
- Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25)
- cache_service.go: remove sync.Once reassignment from inside Do()
  callback (was causing 'unlock of unlocked mutex' fatal after
  Redis Ping failure)
- router.go: relax CSP from 'default-src none' to 'default-src self'
  + allowlist fonts.googleapis.com so the marketing landing page CSS
  actually loads in browsers
- deploy/scripts/deploy_prod.sh: use docker buildx with
  --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce
  images runnable on x86_64 Hetzner nodes; fix array expansion under
  set -u
- deploy/swarm-stack.prod.yml: fix secret source references to use
  top-level aliases (the '\${X_SECRET}' form never actually resolved);
  dozzle ports: long-form host_ip is rejected by Swarm, switched to
  short-form (bound to 0.0.0.0 with UFW-based loopback restriction);
  worker replicas 2 -> 1 (Asynq scheduler singleton)
- deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/'
  (Next.js serves at root; /admin/ returned 404 and killed pods);
  startupProbe failureThreshold 12 -> 24
- deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable
  1 -> 0 (singleton)
- deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold
  12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot;
  real startup takes up to 240s)
- .gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/
  and admin/src/app/api/*, hiding legitimate files)

New files:
- deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet +
  hostNetwork override for k3s-bundled Traefik
- deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress
  without TLS (CF Flexible SSL) and without middleware
- deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log

Documentation:
- docs/deployment/ — full deployment book, 26 files, ~42k words:
  - Part I Overview, infrastructure, orchestrator choice (Ch 0-2)
  - Part II Networking, firewall, Cloudflare (Ch 3-4, 13)
  - Part III Security, Traefik ingress (Ch 5-6)
  - Part IV Services, DB, storage, secrets, registry (Ch 7-11)
  - Part V Data flow, deploy process, observability, failures, runbook
    (Ch 12, 14-17)
  - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20)
  - Appendices: glossary, kubectl cheat sheet, file locations,
    consolidated citations
- README.md: Production Deployment section replaced with pointer to
  the book; Go version bumped to 1.25

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 07:20:54 -05:00

# 11 — Container Registry (Gitea)
## Summary
We host our own container registry on Gitea at `gitea.treytartt.com`.
Every image push and pull goes there, not to Docker Hub or GHCR. The Gitea
instance runs outside this k3s cluster (on its own VPS) and is reachable
over public HTTPS at `https://gitea.treytartt.com`. Image pulls are
authenticated with a Personal Access Token (PAT) stored as a Kubernetes
Secret of type `kubernetes.io/dockerconfigjson`.
## Why Gitea
### Decision matrix
| Option | Cost | Auth model | Pros | Cons |
|---|---|---|---|---|
| **Gitea built-in registry** | $0 (already running Gitea) | Gitea PAT | Self-hosted, integrated with code | Another service to maintain |
| GHCR (GitHub Container Registry) | Free for public images; private images included with a paid plan | GitHub PAT | Popular, reliable | Uses GitHub; vendor dependency |
| Docker Hub | Free tier limited; paid $5-7/mo | Docker Hub account | Ubiquitous | Rate limits on anonymous pulls |
| AWS ECR | ~$1/mo for small use | IAM | Integrates with AWS workloads | AWS account required |
| Harbor (self-hosted) | $0 | Many options | Best enterprise features | Heavy to operate |

Gitea won primarily because **the operator was already running Gitea for
code hosting**. The container registry is built into Gitea 1.17+ as a free
feature, so there was one fewer service to set up.

Side benefits:

- Code and images live together (one backup policy, one access model)
- PATs are scoped and rotatable via the same UI
- No external vendor to worry about for this critical piece of the
  deploy pipeline

Rejected alternatives:

- **Docker Hub**: rate limits on unauthenticated pulls would bite us if
  nodes pull the same image repeatedly during rolling updates
- **GHCR**: fine, but adds a GitHub dependency we don't otherwise have
- **Harbor**: massive overkill; we're not a 100-team enterprise

## Layout
Images live under the authenticated user's namespace:
```
gitea.treytartt.com/admin/honeydue-api:237c6b8
gitea.treytartt.com/admin/honeydue-worker:237c6b8
gitea.treytartt.com/admin/honeydue-admin:237c6b8
```
`admin` is the Gitea user that owns the images. Images are **private**
by default.
### Image tagging strategy
Tags are git short SHAs (e.g., `237c6b8`), not `:latest` and not semantic
versions.

Rationale:

- `:latest` is ambiguous (which build?). Rolling updates should roll a
  *specific* tag so rollbacks are deterministic.
- `:v1.2.3` works for released libraries, but our app rolls forward
  continuously; versioning per deploy is unnecessary overhead.
- Git SHAs are unique, immutable, and tie each image to the exact
  commit that built it.

`PUSH_LATEST_TAG=false` is set in `deploy/cluster.env`. When we rebuild
and push, only the SHA tag gets pushed. The `latest` tag is never
created by our deploy pipeline.
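
As a rough sketch of the tagging rule above, the helper below composes an image reference from a component name and a tag, and refuses `latest`. The function name `image_ref` and the `honeydue-<component>` naming pattern are inferred from the examples in this chapter; they are assumptions, not code taken from the deploy scripts.

```bash
# Hypothetical helper (not part of deploy/): build a full image reference.
REGISTRY="${REGISTRY:-gitea.treytartt.com}"
REGISTRY_NAMESPACE="${REGISTRY_NAMESPACE:-admin}"

image_ref() {
  local component="$1"   # api, worker, or admin
  local tag="$2"         # expected to be a git short SHA

  # Refuse empty or mutable tags so every deploy names one specific build.
  if [ -z "$tag" ] || [ "$tag" = "latest" ]; then
    echo "refusing to use tag '${tag}'" >&2
    return 1
  fi

  printf '%s/%s/honeydue-%s:%s\n' \
    "$REGISTRY" "$REGISTRY_NAMESPACE" "$component" "$tag"
}
```

Inside a checkout, `image_ref api "$(git rev-parse --short HEAD)"` would print a reference in the same shape as the ones listed under Layout.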
## Authentication
### Creating the PAT
At <https://gitea.treytartt.com/-/user/settings/applications>, we created
a token with scopes:
- `read:package`
- `write:package`

No other scopes. This token can only interact with the package registry; it
can't read repo contents, create issues, or touch account settings.
### PAT on the operator workstation
Stored in `deploy/registry.env`:
```
REGISTRY=gitea.treytartt.com
REGISTRY_NAMESPACE=admin
REGISTRY_USERNAME=admin
REGISTRY_TOKEN=<pat>
```
This file is ignored by git via `deploy/.gitignore`. If it ever gets
committed accidentally, rotate the PAT immediately.
### PAT in the cluster
Stored as the `gitea-credentials` Secret (type
`kubernetes.io/dockerconfigjson`) in the `honeydue` namespace. See Chapter 10.
The kubelet reads this Secret when a pod needs to pull from the Gitea
registry.
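
For orientation, the payload inside a `kubernetes.io/dockerconfigjson` Secret is a small JSON document keyed by registry host. The sketch below shows roughly how it is assembled. It assumes the token contains no characters that need JSON escaping, and the function name `dockerconfigjson` is illustrative only; in practice `kubectl create secret docker-registry` builds this for you.

```bash
# Hypothetical helper: print the JSON that a dockerconfigjson Secret carries.
dockerconfigjson() {
  local server="$1" user="$2" token="$3" auth

  # The "auth" field is base64("user:token"); strip any line wrapping.
  auth=$(printf '%s:%s' "$user" "$token" | base64 | tr -d '\n')

  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$user" "$token" "$auth"
}
```

Calling `dockerconfigjson gitea.treytartt.com admin "$REGISTRY_TOKEN"` prints the document that ends up under the Secret's `.dockerconfigjson` key.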
## The build pipeline
### Dockerfile multi-stage
`honeyDueAPI-go/Dockerfile` has three target stages:
- `api` — compiled Go binary + static assets for the HTTP API
- `worker` — compiled Go binary for the background worker
- `admin` — Next.js standalone build of the admin panel

A single Dockerfile keeps build-cache sharing efficient: the Go builder
stage produces the binaries for both `api` and `worker`, while `admin`
has its own Node builder stage.
### Multi-arch cross-compilation
The operator workstation is **arm64** (Apple Silicon). The Hetzner nodes
are **x86_64**. A naive `docker build` on arm64 produces arm64 images
that won't run on the nodes (`exec format error`).
The deploy pipeline uses `docker buildx`:
```bash
docker buildx build \
  --platform linux/amd64 \
  --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
  --push \
  /Users/treyt/Desktop/code/honeyDue/honeyDueAPI-go
```
- **`--platform linux/amd64`**: build for x86_64 regardless of the host architecture
- **`--target api`**: which Dockerfile stage to build
- **`--push`**: push the result straight to the registry instead of loading it into the local image store

The Go stages use the `TARGETARCH` build arg (populated by BuildKit from
`--platform`) to produce a binary for the right architecture. Node stages
run under QEMU emulation, which is slower but acceptable for our ~1 min
admin build.
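
To make the architecture mismatch concrete, here is a small sketch that maps a `uname -m` value to a Docker platform string and reports whether it differs from the target. Both function names are hypothetical and exist only for illustration.

```bash
# Hypothetical helper: map a `uname -m` value to a Docker platform string.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *) echo "unknown architecture: $1" >&2; return 1 ;;
  esac
}

# Exit 0 when the host's native platform differs from the target platform,
# 1 when they match, 2 when the host architecture is not recognized.
needs_cross_build() {
  local host target="${2:-linux/amd64}"
  host=$(docker_platform "$1") || return 2
  [ "$host" != "$target" ]
}
```

On the operator workstation, `needs_cross_build "$(uname -m)"` would succeed, which is the case where omitting `--platform linux/amd64` yields images the nodes cannot run.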
### Buildx builder
We use a named buildx builder to keep state out of Docker's default
environment:
```bash
docker buildx create --name honeydue-builder --use
docker buildx inspect --bootstrap
```
`honeydue-builder` uses the `docker-container` driver: BuildKit runs in
its own container, which stays around between builds until it is stopped
or removed (`docker buildx stop` / `docker buildx rm`). This driver
supports multi-platform builds and caches layers across builds.
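
Running `docker buildx create` a second time with the same name fails, so a deploy script would likely want an idempotent variant. A minimal sketch, assuming the builder name used above; the function name `ensure_builder` is hypothetical:

```bash
# Hypothetical helper: create the named builder only if it does not exist yet.
ensure_builder() {
  local name="${1:-honeydue-builder}"

  if docker buildx inspect "$name" >/dev/null 2>&1; then
    # Builder already exists: just select it for subsequent builds.
    docker buildx use "$name" || return 1
  else
    docker buildx create --name "$name" --driver docker-container --use || return 1
  fi

  # Start the BuildKit container now so the first build does not pay for it.
  docker buildx inspect --bootstrap "$name" >/dev/null
}
```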
## From local file to cluster — the full path
```mermaid
flowchart LR
subgraph dev[Operator workstation]
Code[Source code]
Dockerfile
Buildx[docker buildx]
end
subgraph Gitea[gitea.treytartt.com]
Reg[Package registry]
end
subgraph K8s[k3s cluster]
Kubelet
Containerd
Pod
end
Code --> Dockerfile
Dockerfile --> Buildx
Buildx -- push --> Reg
Reg -- pull --> Kubelet
Kubelet --> Containerd
Containerd --> Pod
```
### End-to-end
1. **Operator commits code**: commits to `main` locally
2. **Operator builds + pushes image**: `docker buildx build --push ...`
from the repo root. Build takes 13 minutes first time, seconds on
warm cache.
3. **Image lands in Gitea**: visible at
`https://gitea.treytartt.com/admin/-/packages/container/honeydue-api`
4. **Operator updates Deployment**: `kubectl set image deployment/api
api=gitea.treytartt.com/admin/honeydue-api:$NEW_SHA -n honeydue`
5. **K8s begins rolling update**: creates new ReplicaSet with new image
6. **Kubelet on target node** sees a pod with an image it doesn't have
7. **Kubelet calls containerd**: "pull this image using these creds"
8. **Containerd authenticates** to Gitea registry using the PAT from
`gitea-credentials` Secret, downloads the image
9. **Containerd starts the container** with the new image
10. **Readiness probe passes**: new pod joins the Service endpoints
11. **Old pod is removed**: the Deployment controller scales down the old
    ReplicaSet and the kubelet terminates an old pod
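
Steps 4 through 11 can be driven and observed from one small wrapper. This is a sketch only: the function name `roll_out` is hypothetical, the 10-minute timeout is an arbitrary choice, and it assumes each Deployment's container is named after the component (the chapter confirms this only for `api`).

```bash
# Hypothetical helper: point a Deployment at a new image and wait for the rollout.
roll_out() {
  local component="$1" sha="$2" ns="${3:-honeydue}"
  local image="gitea.treytartt.com/admin/honeydue-${component}:${sha}"

  # Step 4: update the Deployment; this triggers the rolling update.
  kubectl set image "deployment/${component}" "${component}=${image}" -n "$ns" || return 1

  # Steps 5-11: block until the new pods are ready and old ones are gone,
  # or fail once the timeout expires (for example on an image-pull error).
  kubectl rollout status "deployment/${component}" -n "$ns" --timeout=10m
}
```

A non-zero exit from `roll_out api "$NEW_SHA"` is the cue to inspect pod events with `kubectl describe pod` before retrying.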
## Pushing manually
If you need to push a one-off image (e.g., testing a fix):
```bash
# Login (once per session)
set -a; source deploy/registry.env; set +a
printf '%s' "$REGISTRY_TOKEN" | docker login "$REGISTRY" -u "$REGISTRY_USERNAME" --password-stdin

# Build + push
cd honeyDueAPI-go
SHA=$(git rev-parse --short HEAD)
docker buildx build \
  --platform linux/amd64 \
  --target api \
  -t "gitea.treytartt.com/admin/honeydue-api:${SHA}" \
  --push .

# Logout (don't leave creds in ~/.docker/config.json)
docker logout gitea.treytartt.com
```
## Image sizes
Current images:
| Image | Size | Contents |
|---|---|---|
| `honeydue-api` | ~53 MB | Alpine base + Go binary |
| `honeydue-worker` | ~50 MB | Alpine base + Go binary |
| `honeydue-admin` | ~150 MB | Node 20 alpine + Next.js standalone |

The Go binaries are statically linked (`CGO_ENABLED=0`). Alpine is the
base image to keep the footprint small.
## Image retention
Gitea does **not auto-prune** images. Every `:<sha>` tag accumulates
forever. The package page at
`https://gitea.treytartt.com/admin/-/packages/container/honeydue-api`
lists them all.
At the current pace (a few deploys per week, images ~50-150 MB each),
this grows by ~10 GB/year. Not critical; an 80 GB node disk can absorb
that for years.
**TODO**: Add a recurring cleanup that deletes all but the last 30 tags
per image. It can be a monthly cron job or a manual quarterly pass.
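
One possible shape for that cleanup, as a sketch under stated assumptions: nothing here exists in `deploy/` yet, the function names are hypothetical, and producing the newest-first tag list (from the package API or the package page) is deliberately left out. The delete call reuses the endpoint from the operator cheat sheet and expects `GITEA_PAT` to hold a token with `write:package`.

```bash
# Hypothetical sketch of the retention TODO.
KEEP="${KEEP:-30}"

# Reads tags on stdin, newest first, one per line;
# prints every tag after the first N (default: $KEEP).
prune_candidates() {
  tail -n "+$(( ${1:-$KEEP} + 1 ))"
}

# Deletes one tag via the Gitea package API.
delete_tag() {
  curl -fsS -X DELETE \
    -H "Authorization: token ${GITEA_PAT}" \
    "https://gitea.treytartt.com/api/v1/packages/admin/container/$1/$2"
}

# Usage: prune_image honeydue-api < tags-newest-first.txt
prune_image() {
  local image="$1" tag
  prune_candidates "$KEEP" | while IFS= read -r tag; do
    echo "deleting ${image}:${tag}"
    delete_tag "$image" "$tag"
  done
}
```

Before deleting anything, it is worth checking that none of the candidate tags is still referenced by a live Deployment, since a rollback to a pruned tag would fail at the image-pull step.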
## Image verification — not yet
We do not sign images or verify signatures. An attacker who compromised
Gitea could push a malicious image under an existing tag (though Gitea
should prevent tag reuse if immutable tags are configured).
**TODO** (Chapter 20): Add [cosign](https://github.com/sigstore/cosign)
for signing at build time + `Kyverno` or `Connaisseur` policy to verify
at pull time.
## Gitea registry itself
The Gitea instance runs outside this k3s cluster on its own VPS
(operator's existing infrastructure). It's **not** part of the honeyDue
deployment — it's adjacent infrastructure.
If the Gitea host goes down:
- Currently-running pods keep working (they already pulled their images)
- New deployments, scale-ups, and pods rescheduled onto a node that
  lacks the image fail at the image-pull step
- No impact on existing user traffic

This is an acceptable external dependency. The Gitea host has its own
uptime story.
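
Because a registry outage only surfaces at pull time, a deploy script could check reachability before touching the cluster. A minimal sketch, assuming the registry answers the standard `/v2/` endpoint with 200 or 401 when it is up (401 being the usual unauthenticated response); the function name `registry_reachable` is hypothetical.

```bash
# Hypothetical preflight: succeed only if the registry API endpoint responds.
registry_reachable() {
  local host="${1:-gitea.treytartt.com}" code

  # Capture only the HTTP status code; fail on DNS, TLS, or timeout errors.
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "https://${host}/v2/") || return 1

  case "$code" in
    200|401) return 0 ;;   # registry is up (401 just means "log in first")
    *)       return 1 ;;   # gateway errors, maintenance pages, and so on
  esac
}
```

Used as `registry_reachable || { echo "registry down, aborting deploy" >&2; exit 1; }`, this turns a slow `ImagePullBackOff` diagnosis into an immediate failure on the workstation.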
## Cost
**$0/mo.** Gitea registry is included in the Gitea install we already
pay the VPS for (not accounted to honeyDue's cost).
If we ever switched to GHCR, cost would still be $0 for public images
or bundled with our (nonexistent) GitHub Team subscription.
## What we don't have
- **Image scanning** (Trivy, Snyk) — scan images for known CVEs on push
- **Image signing** (cosign)
- **Multi-region replication** — only hosted in one place
- **High availability** — Gitea is single-instance

At our scale, none of these are needed. TODO (Chapter 20) if the
operator's appetite increases.
## Operator cheat sheet
```bash
# List packages via API (packages are private, so send the PAT)
curl -sS "https://gitea.treytartt.com/api/v1/packages/admin?type=container" \
  -H "Authorization: token $GITEA_PAT" \
  -H "Accept: application/json" | jq .

# Browse in UI
# https://gitea.treytartt.com/admin/-/packages

# Delete a specific tag via API
curl -X DELETE \
  -H "Authorization: token $GITEA_PAT" \
  "https://gitea.treytartt.com/api/v1/packages/admin/container/honeydue-api/237c6b8"

# Login from kubectl side (refresh the Secret); NEW_PAT holds the new token
kubectl create secret docker-registry gitea-credentials -n honeydue \
  --docker-server=gitea.treytartt.com \
  --docker-username=admin \
  --docker-password="$NEW_PAT" \
  --dry-run=client -o yaml | kubectl apply -f -

# After rotating PAT, restart pods that use it for pulls
kubectl rollout restart -n honeydue deploy/api deploy/admin deploy/worker
```
## References
- [Gitea Container Registry][gitea-cr]
- [Docker buildx multi-platform][buildx]
- [Kubernetes image pull secrets][pull-secrets]
- [cosign][cosign]
[gitea-cr]: https://docs.gitea.com/usage/packages/container
[buildx]: https://docs.docker.com/build/buildx/
[pull-secrets]: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
[cosign]: https://github.com/sigstore/cosign