07 — Services
Summary
Five workloads run in the honeydue namespace: api (Go REST API, 3
replicas), admin (Next.js admin panel, 1 replica), web (Next.js
customer-facing app, 3 replicas), worker (Go background jobs, 1
replica), and redis (cache + job queue, 1 replica, PVC-backed).
This chapter deep-dives each: container image, resource limits, probes,
volumes, and why each knob is set the way it is.
Overview
| Service | Image | Replicas | Ports | Role |
|---|---|---|---|---|
| api | gitea.treytartt.com/admin/honeydue-api:<sha> | 3 | 8000 | HTTP REST API |
| admin | gitea.treytartt.com/admin/honeydue-admin:<sha> | 1 | 3000 | Next.js admin panel |
| web | gitea.treytartt.com/admin/honeydue-web:<sha> | 3 | 3000 | Next.js customer-facing web client at app.myhoneydue.com |
| worker | gitea.treytartt.com/admin/honeydue-worker:<sha> | 1 | none | Background job processor |
| redis | redis:7-alpine | 1 | 6379 | Cache + Asynq queue |
All five are Kubernetes Deployment workloads (not StatefulSets, not
DaemonSets). They share:
- ServiceAccount with automountServiceAccountToken: false (Chapter 5)
- imagePullSecrets: [gitea-credentials] (Chapter 11)
- envFrom: configMapRef: honeydue-config (Chapter 10)
- Individual env vars wired to honeydue-secrets keys
- Read-only root filesystem with tmp emptyDir mounted at /tmp
Service — web (Next.js customer app)
What it does
Lives at https://app.myhoneydue.com. Next.js 16 standalone build,
served by node server.js inside the container. Sibling repo:
/Users/treyt/Desktop/code/honeyDue/honeyDueAPI-Web/.
Architecture: server-side proxy pattern
Unlike the admin panel (which makes CORS requests directly to
api.myhoneydue.com), the web app uses a proxy pattern:
Browser → https://app.myhoneydue.com/api/proxy/tasks/123/
→ Next.js route handler (src/app/api/proxy/[...path]/route.ts)
→ reads honeydue-token httpOnly cookie
→ attaches Authorization: Token <value>
→ https://api.myhoneydue.com/api/tasks/123/ (server-side fetch)
→ response flows back
Consequences:
- Browser never makes cross-origin requests. No CORS entry needed on the Go API for app.myhoneydue.com.
- Auth tokens live in httpOnly cookies, not localStorage. XSS can't exfiltrate them.
- The web pod needs outbound HTTPS to api.myhoneydue.com, which is covered by the allow-egress-from-web NetworkPolicy (Chapter 5).
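The path mapping in the flow above can be sketched in a few lines of shell. This only illustrates the rewrite; it is not the route handler itself (which lives in src/app/api/proxy/[...path]/route.ts), and the helper names upstream_url and auth_header are hypothetical.

```shell
# Illustrative only: mirrors the URL rewrite and header the proxy applies.
# API_URL is the runtime value listed under "Env vars".
API_URL="https://api.myhoneydue.com/api"

# Map a browser-facing path to the upstream API URL (hypothetical helper).
upstream_url() {
  # Strip the /api/proxy prefix; whatever remains is appended to API_URL.
  printf '%s%s\n' "$API_URL" "${1#/api/proxy}"
}

# Build the header from the honeydue-token cookie value (hypothetical helper).
auth_header() {
  printf 'Authorization: Token %s\n' "$1"
}

upstream_url /api/proxy/tasks/123/
auth_header example-token-value
```

The first call prints the same upstream URL shown in the flow diagram, which is a quick way to sanity-check an API_URL change before redeploying.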
Env vars
Build-time (baked into the client bundle by the Dockerfile ARG):
- NEXT_PUBLIC_API_URL: only used as a fallback; baked for safety
- NEXT_PUBLIC_POSTHOG_KEY: PostHog project API key
- NEXT_PUBLIC_POSTHOG_HOST: https://analytics.88oakapps.com
Runtime (ConfigMap):
- API_URL=https://api.myhoneydue.com/api (consumed by the server-side proxy handlers)
- PORT=3000, HOSTNAME=0.0.0.0
Deployment spec highlights
- 3 replicas, same as api, because this is a production customer surface
- topologySpreadConstraints across kubernetes.io/hostname, so losing one node takes down at most one pod
- readOnlyRootFilesystem: true; emptyDirs at /app/.next/cache (Next.js build cache) and /tmp
- PDB web-pdb with minAvailable: 2
- runAsUser/runAsGroup 1001 (matches the nextjs user created in the Dockerfile)
Why same availability as api
The web client is now the primary user-facing surface. Users hitting
app.myhoneydue.com/login should never see a 502 because a single
node went down. 3 replicas × minAvailable: 2 guarantees at least
two pods stay up through any voluntary disruption.
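The arithmetic behind that guarantee: a PodDisruptionBudget permits a voluntary eviction only while the number of healthy pods stays at or above minAvailable. A minimal sketch, using a hypothetical helper name:

```shell
# allowed_disruptions HEALTHY MIN_AVAILABLE
# Number of pods that may be evicted voluntarily right now (never negative).
allowed_disruptions() {
  n=$(( $1 - $2 ))
  if [ "$n" -lt 0 ]; then n=0; fi
  echo "$n"
}

allowed_disruptions 3 2   # all three web pods healthy
allowed_disruptions 2 2   # one pod already down: further drains must wait
```

On the cluster, kubectl get pdb web-pdb -n honeydue reports the same figure in its ALLOWED DISRUPTIONS column.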
Service 1 — api (Go REST API)
What it does
The Go HTTP API — the heart of the app. Handlers for user auth, residences, tasks, contractors, documents, subscriptions, notifications, etc. Reads/writes to Neon Postgres, reads/writes to Redis cache, reads from Backblaze B2.
Also serves a marketing landing page at / (static HTML + CSS from
/app/static/). This is why the myhoneydue.com apex domain routes to
the api service (Chapter 6).
Deployment spec highlights
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    spec:
      serviceAccountName: api
      imagePullSecrets: [{name: gitea-credentials}]
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: api
          image: gitea.treytartt.com/admin/honeydue-api:237c6b8
          ports: [{containerPort: 8000}]
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities: { drop: [ALL] }
          envFrom: [{configMapRef: {name: honeydue-config}}]
          env:
            - name: POSTGRES_PASSWORD
              valueFrom: { secretKeyRef: {name: honeydue-secrets, key: POSTGRES_PASSWORD} }
            - name: SECRET_KEY
              valueFrom: { secretKeyRef: {name: honeydue-secrets, key: SECRET_KEY} }
            # ... all other secrets
          volumeMounts:
            - { name: apns-key, mountPath: /secrets/apns, readOnly: true }
            - { name: tmp, mountPath: /tmp }
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits: { cpu: 1000m, memory: 512Mi }
          startupProbe: { httpGet: {path: /api/health/, port: 8000}, failureThreshold: 48, periodSeconds: 5 }
          readinessProbe: { httpGet: {path: /api/health/, port: 8000}, initialDelaySeconds: 5, periodSeconds: 10, timeoutSeconds: 5 }
          livenessProbe: { httpGet: {path: /api/health/, port: 8000}, initialDelaySeconds: 30, periodSeconds: 30, timeoutSeconds: 10 }
      volumes:
        - name: apns-key
          secret:
            secretName: honeydue-apns-key
            # One item with both key and path (a single mapping, not two list entries)
            items: [{key: apns_auth_key.p8, path: apns_auth_key.p8}]
        - name: tmp
          emptyDir: {sizeLimit: 64Mi}
Why each setting
replicas: 3 — one per node via anti-affinity rules (not strictly
required but helpful). Three gives us HA (one pod down = two still
serve traffic) and headroom for rolling updates.
maxUnavailable: 0, maxSurge: 1 — during a rollout, start a 4th
pod before killing any old one. Ensures the service stays at 3 live
pods throughout. maxUnavailable: 0 means zero downtime updates — but
depends on readinessProbe being accurate.
runAsUser: 1000 — the app user created in the Dockerfile. Image
doesn't run as root.
readOnlyRootFilesystem: true — prevents any attacker-introduced
file writes to the image layer. Go binary doesn't need to write to /;
only /tmp is mutable.
startupProbe.failureThreshold: 48 (= 48 × 5s = 240s grace) — this
was bumped up from the scaffold default of 12. Reason: on first boot,
the Go app runs MigrateWithLock() which acquires a Postgres advisory
lock and runs AutoMigrate. First replica takes ~90s; subsequent
replicas wait on the lock. With 3 replicas all starting simultaneously
and the lock serializing them, 240s is the right grace. See
Chapter 19 for the detailed story.
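The grace window is simply failureThreshold × periodSeconds. A small sketch (the helper name is made up for illustration) shows why the scaffold default was too tight for a ~90s first-boot migration:

```shell
# startup_grace FAILURE_THRESHOLD PERIOD_SECONDS
# Longest time, in seconds, a container may take before the startupProbe gives up.
startup_grace() {
  echo $(( $1 * $2 ))
}

startup_grace 12 5   # scaffold default
startup_grace 48 5   # current setting
```

The default works out to 60s, less than the ~90s the first replica needs, while the current setting gives the 240s described above.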
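readinessProbe.initialDelaySeconds: 5 is counted from container start,
not from the moment the startupProbe passes. Because the startupProbe
already holds back readiness checks until the app is up, this delay only
matters if startup finishes in under 5s; it is a cheap guard against a
racy initial failure.
livenessProbe.initialDelaySeconds: 30 is likewise counted from container
start, and liveness checks cannot begin until the startupProbe has
passed. In practice the pod is never restarted on a liveness failure in
its first 30s, which avoids cascading restarts from false-negative
liveness checks.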
resources.requests/limits — Kubernetes uses requests for
scheduling (how much a pod "reserves") and limits for enforcement
(max it can use before throttling/OOM). Our api is CPU-bursty for
complex query handling, so we give it 100m baseline with a 1000m ceiling.
512Mi memory ceiling is comfortable — in practice api uses ~100-200Mi.
volumes.apns-key — mounts the honeydue-apns-key Secret as a file
at /secrets/apns/apns_auth_key.p8. The APNS_AUTH_KEY_PATH env var
points to this path. Even though push is currently disabled, the file
must exist because the Go app may try to stat it on startup.
volumes.tmp — emptyDir with sizeLimit: 64Mi. Bounded so a
runaway process can't fill the node's disk.
The Service
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: honeydue
spec:
  type: ClusterIP
  selector: {app.kubernetes.io/name: api}
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
ClusterIP 10.43.167.83. Reachable as api.honeydue.svc.cluster.local or
just api from inside the namespace.
HorizontalPodAutoscaler (not yet enabled)
deploy-k3s/manifests/api/hpa.yaml defines an HPA that would scale api
between 3 and 6 replicas based on CPU (70% util) and memory (80% util).
Not currently applied. metrics-server runs but we haven't run
kubectl apply -f api/hpa.yaml. TODO in Chapter 20.
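For intuition about what the HPA would do once applied: the core of the autoscaler's calculation is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds. The sketch below covers a single metric only and ignores the controller's tolerance band and stabilization window; the helper name is hypothetical. With several metrics configured, the HPA takes the largest result.

```shell
# hpa_desired CURRENT_REPLICAS CURRENT_UTIL_PCT TARGET_UTIL_PCT MIN MAX
hpa_desired() {
  # Integer ceiling of current * util / target.
  want=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$want" -lt "$4" ]; then want=$4; fi
  if [ "$want" -gt "$5" ]; then want=$5; fi
  echo "$want"
}

hpa_desired 3 90 70 3 6    # CPU at 90% against a 70% target
hpa_desired 3 40 70 3 6    # quiet period: stays at the minimum
hpa_desired 3 200 70 3 6   # heavy load: capped at the maximum
```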
Service 2 — admin (Next.js panel)
What it does
Server-rendered admin UI. Authenticates admin users against a
separate admin_users table in Postgres (seeded with ADMIN_EMAIL +
ADMIN_PASSWORD on first migration). Lets operators view/manage
users, residences, tasks, subscriptions, etc.
Built as a Next.js 16 standalone server.
Why 1 replica
Low traffic. It's an internal tool. One pod suffices. If it crashes, Kubernetes restarts it in ~10s. If the hosting node dies, Kubernetes reschedules to another node.
The cost of running 3 replicas is tiny (Next.js is ~128MB per pod) but has no operational benefit. When the admin panel becomes user-facing, revisit.
Deployment highlights
replicas: 1
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
securityContext:
  runAsNonRoot: true
  runAsUser: 1001   # different from api (1000) for isolation
  runAsGroup: 1001
  fsGroup: 1001
containers:
  - image: gitea.treytartt.com/admin/honeydue-admin:<sha>
    ports: [{containerPort: 3000}]
    env:
      - name: PORT
        value: "3000"
      - name: HOSTNAME
        value: "0.0.0.0"
      - name: NEXT_PUBLIC_API_URL
        valueFrom: {configMapKeyRef: {name: honeydue-config, key: NEXT_PUBLIC_API_URL}}
    volumeMounts:
      - {name: nextjs-cache, mountPath: /app/.next/cache}
      - {name: tmp, mountPath: /tmp}
    resources:
      requests: {cpu: 50m, memory: 64Mi}
      limits: {cpu: 500m, memory: 256Mi}
    startupProbe:
      httpGet: {path: /, port: 3000}   # was /admin/, which is wrong for this app (Chapter 19)
      failureThreshold: 24
      periodSeconds: 5
    readinessProbe:
      httpGet: {path: /, port: 3000}
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
Probe path / — Next.js serves at root. /admin/ (scaffold default)
returns 404 and killed the pod repeatedly during initial bring-up.
See Chapter 19 §Admin probe path for the story.
runAsUser: 1001 — different from api's 1000 so that if one
service were compromised, the stolen UID would at least be distinct
from other services' (minor defense-in-depth).
nextjs-cache — emptyDir mount for Next.js's server-side cache.
Without it, the read-only rootfs would prevent Next from caching
server-rendered pages. Not a persistent volume because cache is
regenerable on restart.
The Service
apiVersion: v1
kind: Service
metadata:
  name: admin
spec:
  type: ClusterIP
  selector: {app.kubernetes.io/name: admin}
  ports: [{port: 3000, targetPort: 3000}]
ClusterIP 10.43.136.168.
Service 3 — worker (Go + Asynq)
What it does
Runs scheduled background jobs via Asynq (a Redis-backed job queue for Go):
- Task reminders (14:00 UTC daily) — notify users of upcoming tasks
- Overdue reminders (15:00 UTC daily) — notify users of overdue tasks
- Daily digest (03:00 UTC daily) — summary email per user
- Onboarding emails — multi-step drip campaign for new users
- Cleanup jobs — expired tokens, stale data
Why 1 replica (hard requirement)
Asynq uses a Scheduler component that does cron-like scheduling. The
Scheduler is not leader-elected by default — if you run two, both
fire every cron task. Users get duplicate emails.
The asynq docs cover this: to scale scheduling, migrate to
PeriodicTaskManager + PeriodicTaskConfigProvider which coordinate
via Redis. Not yet done in our codebase.
Until then: replicas: 1 is a hard constraint. See the comment in the
deployment manifest:
spec:
  # Asynq's Scheduler is a singleton — running >1 replica fires every cron
  # task once per replica (duplicate daily digests, onboarding emails, etc.).
  # Keep at 1 until asynq.PeriodicTaskManager with Redis leader election is
  # wired in cmd/worker/main.go.
  replicas: 1
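One way to keep this constraint from regressing is a small pre-deploy guard. The sketch below is hypothetical (it is not part of the repository): it takes a replica count and fails unless the count is exactly 1.

```shell
# check_singleton REPLICAS
# Succeeds only when the worker is configured with exactly one replica.
check_singleton() {
  if [ "$1" -eq 1 ]; then
    echo "ok: worker replicas=1"
  else
    echo "error: worker replicas=$1 (expected 1; extra replicas duplicate every cron task)" >&2
    return 1
  fi
}

check_singleton 1
```

Against the live cluster, and assuming the Deployment is named worker, the count could be read with kubectl get deployment worker -n honeydue -o jsonpath='{.spec.replicas}' and passed to the function.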
What happens if the worker pod dies?
- Asynq schedule state is in Redis (which has AOF persistence)
- When a new worker pod starts, it re-registers the scheduler and picks up where it left off
- Any job that was in-flight (dequeued but not acknowledged) gets retried by Asynq's automatic retry logic (see worker.RetryOptions in the Go code)
- Cron jobs that were supposed to fire during the downtime: fire on the next tick
A 5-minute worker outage = 5 minutes of delayed jobs. Not great but acceptable.
PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  minAvailable: 0
  selector: {matchLabels: {app.kubernetes.io/name: worker}}
minAvailable: 0 means voluntary disruptions (kubectl drain) can take
the worker down. This matches the singleton constraint: there's only one,
it's OK to drain.
No Service
worker doesn't listen on any HTTP port for application traffic — it's a queue consumer, not a web server. So there's no Kubernetes Service for it.
(On Swarm we had the worker expose a health endpoint at :6060/health;
the k3s scaffold doesn't replicate this. Future work.)
Service 4 — redis
What it does
- Caching layer (ETag-based lookups, user session cache)
- Asynq queue backend (job state, scheduled tasks, retry state)
Why 1 replica
Single-instance Redis with AOF persistence. Not replicated, not clustered. Downsides:
- Node outage = Redis outage until the node returns (cache regenerates; queue state is preserved by AOF on the PVC)
- No failover: the PVC is local-path (per-node) and the pod is pinned by nodeSelector, so if the node hosting Redis is lost for good, Redis cannot simply restart elsewhere and the data on that node's disk is gone
For our scale this is acceptable. Redis holds no authoritative state (everything that matters is in Postgres). Cache regenerates on first request; Asynq retries enqueue on failure.
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: local-path
  resources: {requests: {storage: 5Gi}}
Uses k3s' built-in local-path-provisioner. The PVC binds to a local
directory on the node where the Redis pod lands (/var/lib/rancher/k3s/storage/).
ReadWriteOnce means the volume can be mounted read-write by only one node at a time.
Node affinity
nodeSelector:
  honeydue/redis: "true"
We labeled ubuntu-8gb-nbg1-2 (hetzner1) with honeydue/redis=true so
Redis always lands there. This ensures the PVC finds its backing
storage (since PVCs with local-path are per-node).
kubectl label node ubuntu-8gb-nbg1-2 honeydue/redis=true --overwrite
Why not Redis Sentinel / Cluster
Complexity. At our scale (~a few req/s, kilobytes of cache), a single Redis does fine. If Redis becomes critical-path for availability, we'd:
- Use a managed Redis (Upstash, Dragonfly Cloud) — $5-15/mo, their problem
- Or run Redis Sentinel with 3 replicas — manageable but operational work
Neither is needed yet.
Redis config
From the deployment:
command:
  - sh
  - -c
  - |
    ARGS="--appendonly yes --appendfsync everysec --maxmemory 256mb --maxmemory-policy noeviction"
    if [ -n "$REDIS_PASSWORD" ]; then
      ARGS="$ARGS --requirepass $REDIS_PASSWORD"
    fi
    exec redis-server $ARGS
Settings:
- --appendonly yes --appendfsync everysec: AOF persistence, fsync every second. Survives restarts with up to 1 second of data loss.
- --maxmemory 256mb: Redis will refuse new data if it grows past 256 MB. Gives us a safety cap.
- --maxmemory-policy noeviction: we'd rather get errors than silently drop data. This is the right choice when Redis holds queue state (losing a queue item silently = missed job).
The REDIS_PASSWORD env var is optional. Currently empty (no auth). The
Redis pod is only reachable from inside the overlay network, and our
NetworkPolicies (once enabled) would restrict egress further.
Resource summary
Combined requests and limits across all services:
| Service | CPU requests | CPU limits | Memory requests | Memory limits | Replicas |
|---|---|---|---|---|---|
| api | 100m | 1000m | 128Mi | 512Mi | 3 |
| admin | 50m | 500m | 64Mi | 256Mi | 1 |
| worker | 50m | 500m | 64Mi | 256Mi | 1 |
| redis | 100m | 500m | 128Mi | 512Mi | 1 |
| traefik (kube-system) | ~100m | unlimited | ~50Mi | unlimited | 3 |
| Total requests | ~800m | | ~790Mi | | |
Values are per pod; the totals multiply each row by its replica count. The web service is not yet included in this table.
Each node has 4000m CPU + 8192Mi memory. Total cluster capacity is 12000m + 24576Mi. Requests reserve roughly 7% of CPU and 3% of memory, which leaves ample headroom.
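The totals row can be cross-checked by summing per-pod requests × replicas over the table. A sketch, with the numbers copied from the rows above (the traefik figures are the approximate ones shown there, and web is left out because its requests are not listed):

```shell
# Columns: service, CPU request (m), memory request (Mi), replicas.
ROWS="api 100 128 3
admin 50 64 1
worker 50 64 1
redis 100 128 1
traefik 100 50 3"

# Sum request * replicas for CPU and memory across all rows.
total_requests() {
  cpu=0
  mem=0
  while read -r _svc c m r; do
    cpu=$(( cpu + $c * $r ))
    mem=$(( mem + $m * $r ))
  done <<EOF
$ROWS
EOF
  echo "${cpu}m ${mem}Mi"
}

total_requests
```

Adding a row for web once its requests are known keeps the check in step with the table.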
Health check semantics
Kubernetes distinguishes three probe types:
- startupProbe — is the container done starting? Runs until it passes once, then stops. While running, the other probes are disabled. Failing startupProbe = container killed and restarted.
- readinessProbe — is the container ready to serve traffic? A failing pod is removed from Service endpoints (traffic stops flowing to it) but the pod keeps running.
- livenessProbe — is the container healthy? A failing pod is killed and restarted.
Why we tuned startupProbe separately
The api's first-boot migration takes 90–240s. If we relied on a livenessProbe alone, with a typical initialDelaySeconds of 5 and failureThreshold of 3, the pod would be killed before the migration finishes. The startupProbe lets us give generous first-boot grace (240s) without loosening the tighter ongoing readiness/liveness checks.
Probe path design
Each service's /health endpoint should be:
- Cheap (no DB query, no external call)
- Fast (< 100ms)
- Honest (returns 200 iff the process can serve)
Our api's /api/health/ does a trivial check. It does NOT verify Postgres
connectivity (to avoid cascading DB failures tearing down all api pods).
If Postgres is down, api pods stay "ready" and return 5xx for actual
endpoints — that's the right behavior.
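For reference, the kubelet treats an httpGet probe as passing when the response status is at least 200 and below 400. A sketch of that rule (the helper name is hypothetical):

```shell
# probe_passes HTTP_STATUS
# Mirrors the kubelet's httpGet success rule: 200 <= status < 400.
probe_passes() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

for code in 200 404 503; do
  if probe_passes "$code"; then
    echo "$code: pass"
  else
    echo "$code: fail"
  fi
done
```

Where curl is available inside the namespace, a manual equivalent would be curl -s -o /dev/null -w '%{http_code}' http://api:8000/api/health/, with the printed status fed to probe_passes.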
Log routing
All container logs go to stdout/stderr. containerd captures them to
/var/log/containers/ on the node. kubectl logs fetches them through the
API server's /api/v1/namespaces/<namespace>/pods/<pod>/log endpoint,
which relays the request to the kubelet on that node.
We have no log aggregation in the cluster (no Loki, no ELK, no Datadog). For debugging we use:
kubectl logs -n honeydue deploy/api -f --prefix
kubectl logs -n honeydue deploy/api --previous # logs from the previous container instance (after a restart)
See Chapter 15.
Rolling update semantics
When you roll out a new image, either with kubectl set image or with
kubectl apply on a manifest carrying the new tag:
- Kubernetes creates a new ReplicaSet with the new image
- Starts 1 new pod (per maxSurge: 1)
- Waits for it to pass readinessProbe
- Removes 1 pod from the old ReplicaSet
- Repeats until all N pods are on the new ReplicaSet
- Old ReplicaSet stays around (for rollback) with 0 replicas
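The steps above can be traced with a toy simulation. It assumes maxUnavailable: 0 and that every new pod eventually becomes Ready, and it models nothing else (no probes, no timing); the function name is made up for illustration.

```shell
# simulate_rollout REPLICAS MAX_SURGE   (MAX_SURGE must be at least 1)
# Prints the old/new pod counts after each surge and each retirement.
simulate_rollout() {
  old=$1
  new=0
  while [ "$old" -gt 0 ]; do
    # Start up to MAX_SURGE new pods, never more than are still needed.
    step=$2
    if [ "$step" -gt "$old" ]; then step=$old; fi
    new=$(( new + step ))
    echo "surge:  old=$old new=$new total=$(( old + new ))"
    # Once they pass readiness, retire the same number of old pods.
    old=$(( old - step ))
    echo "retire: old=$old new=$new total=$(( old + new ))"
  done
}

simulate_rollout 3 1
```

With the api settings (3 replicas, maxSurge: 1), the total never drops below 3 and never exceeds 4.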
For api (3 replicas): total rollout time is roughly
3 × (pod_startup_time + small_buffer) = ~15 minutes in the cold-boot
case, seconds for warm updates where migrations are no-op.
During the rollout:
- Service endpoint set updates as pods become ready
- kube-proxy IPVS is reprogrammed on each node
- Traefik's connection pool to the Service invalidates gradually
Users see no downtime if the new image is compatible. If it's broken:
kubectl rollout undo deployment/api -n honeydue
Reverts to the previous ReplicaSet. Typically takes 30 seconds to stabilize.
Why no StatefulSet
For Redis (the only stateful thing we run), we use a Deployment + PVC. StatefulSet is designed for:
- Ordered startup (pod-0 before pod-1)
- Stable hostnames (pod-0 gets DNS name redis-0.redis)
- Per-replica PVCs
We have one Redis replica. None of those features matter for a singleton. Deployment + PVC + nodeSelector is simpler and equivalent.
If we ever run Redis Sentinel or Cluster, we'd migrate to StatefulSet.
Operator cheat sheet
# See all pods in honeydue namespace
kubectl get pods -n honeydue -o wide
# Per-service rollout status
kubectl rollout status deployment/api -n honeydue
# Scale a service
kubectl scale deployment/api -n honeydue --replicas=5
# Restart all pods (e.g., to re-read a configmap)
kubectl rollout restart deployment/api -n honeydue
# Exec into a pod
kubectl exec -it -n honeydue deploy/admin -- /bin/sh
# Describe a pod (shows events, probe state, restarts)
kubectl describe pod -n honeydue <pod-name>
# Resource usage
kubectl top pods -n honeydue