# 07 — Services

> **Updated 2026-05-15 (security remediation):** Redis now requires a
> password (`config.yaml` `redis.password` → `honeydue-secrets`), all
> workloads deploy by immutable `@sha256:` digest, and the redis/vmagent
> base images are digest-pinned. `deploy-k3s/SECURITY.md` is the
> authoritative current-state record.

## Summary

Five workloads run in the `honeydue` namespace: **api** (Go REST API, 3 replicas), **admin** (Next.js admin panel, 1 replica), **web** (Next.js customer-facing app, 3 replicas), **worker** (Go background jobs, 1 replica), and **redis** (cache + job queue, 1 replica, PVC-backed). This chapter deep-dives each: container image, resource limits, probes, volumes, and why each knob is set the way it is.

## Overview

| Service | Image | Replicas | Ports | Role |
|---|---|---|---|---|
| `api` | `gitea.treytartt.com/admin/honeydue-api:` | 3 | 8000 | HTTP REST API |
| `admin` | `gitea.treytartt.com/admin/honeydue-admin:` | 1 | 3000 | Next.js admin panel |
| `web` | `gitea.treytartt.com/admin/honeydue-web:` | 3 | 3000 | Next.js customer-facing web client at `app.myhoneydue.com` |
| `worker` | `gitea.treytartt.com/admin/honeydue-worker:` | 1 | — | Background job processor |
| `redis` | `redis:7-alpine` | 1 | 6379 | Cache + Asynq queue |

All five are Kubernetes `Deployment` workloads (not StatefulSets, not DaemonSets). They share:

- ServiceAccount with `automountServiceAccountToken: false` (Chapter 5)
- `imagePullSecrets: [gitea-credentials]` (Chapter 11)
- `envFrom: configMapRef: honeydue-config` (Chapter 10)
- Individual env vars wired to `honeydue-secrets` keys
- Read-only root filesystem with `tmp` emptyDir mounted at `/tmp`

## Service — web (Next.js customer app)

### What it does

Lives at `https://app.myhoneydue.com`. Next.js 16 standalone build, served by `node server.js` inside the container. Sibling repo: `/Users/treyt/Desktop/code/honeyDue/honeyDueAPI-Web/`.

### Architecture: server-side proxy pattern

Unlike the admin panel (which makes CORS requests directly to `api.myhoneydue.com`), the web app uses a proxy pattern:

```
Browser → https://app.myhoneydue.com/api/proxy/tasks/123/
  → Next.js route handler (src/app/api/proxy/[...path]/route.ts)
    → reads honeydue-token httpOnly cookie
    → attaches Authorization: Token
    → https://api.myhoneydue.com/api/tasks/123/ (server-side fetch)
  → response flows back
```

**Consequences:**

- Browser never makes cross-origin requests. No CORS entry needed on the Go API for `app.myhoneydue.com`.
- Auth tokens live in httpOnly cookies, not localStorage. XSS can't exfiltrate them.
- The web pod needs outbound HTTPS to `api.myhoneydue.com` — covered in the `allow-egress-from-web` NetworkPolicy (Chapter 5).

### Env vars

Build-time (baked into the client bundle by the Dockerfile `ARG`):

- `NEXT_PUBLIC_API_URL` — only used as a fallback; baked for safety
- `NEXT_PUBLIC_POSTHOG_KEY` — PostHog project API key
- `NEXT_PUBLIC_POSTHOG_HOST` — `https://analytics.88oakapps.com`

Runtime (ConfigMap):

- `API_URL=https://api.myhoneydue.com/api` — consumed by the server-side proxy handlers
- `PORT=3000`, `HOSTNAME=0.0.0.0`

### Deployment spec highlights

- **3 replicas**, same as api — this is a production customer surface
- `topologySpreadConstraints` across `kubernetes.io/hostname` — evicting one node at most kills one pod
- `readOnlyRootFilesystem: true`; `emptyDir`s at `/app/.next/cache` (Next.js build cache) and `/tmp`
- PDB `web-pdb` with `minAvailable: 2`
- runAsUser/runAsGroup `1001` (matches the `nextjs` user created in the Dockerfile)

### Why same availability as api

The web client is now the primary user-facing surface. Users hitting `app.myhoneydue.com/login` should never see a 502 because a single node went down. 3 replicas × `minAvailable: 2` guarantees at least two pods stay up through any voluntary disruption.

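
The replica and `minAvailable` arithmetic can be checked with a few lines of shell. This is a minimal sketch, assuming every replica is healthy; `allowed_disruptions` is a hypothetical helper for illustration, not something that exists in the repo.

```bash
# Hypothetical helper: how many pods a voluntary disruption (e.g. a node
# drain) may evict at once, given a replica count and a PDB minAvailable.
# Assumes all replicas are currently healthy.
allowed_disruptions() {
  replicas=$1
  min_available=$2
  allowed=$((replicas - min_available))
  # A PDB never allows a negative number of disruptions.
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 3 2   # web: 3 replicas, minAvailable: 2
allowed_disruptions 1 0   # worker: 1 replica, minAvailable: 0

# On a live cluster the PDB status reports the same figure:
#   kubectl get pdb web-pdb -n honeydue -o jsonpath='{.status.disruptionsAllowed}'
```

With 3 replicas and `minAvailable: 2`, only one web pod can be evicted at a time, so a drain has to wait for a replacement pod to become ready before it can evict the next one.
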
## Service 1 — api (Go REST API)

### What it does

The Go HTTP API — the heart of the app. Handlers for user auth, residences, tasks, contractors, documents, subscriptions, notifications, etc. Reads/writes to Neon Postgres, reads/writes to Redis cache, reads from Backblaze B2.

Also serves a marketing landing page at `/` (static HTML + CSS from `/app/static/`). This is why the `myhoneydue.com` apex domain routes to the api service (Chapter 6).

### Deployment spec highlights

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    spec:
      serviceAccountName: api
      imagePullSecrets: [name: gitea-credentials]
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: api
          image: gitea.treytartt.com/admin/honeydue-api:237c6b8
          ports: [containerPort: 8000]
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities: { drop: [ALL] }
          envFrom: [configMapRef: {name: honeydue-config}]
          env:
            - name: POSTGRES_PASSWORD
              valueFrom: { secretKeyRef: {name: honeydue-secrets, key: POSTGRES_PASSWORD} }
            - name: SECRET_KEY
              valueFrom: { secretKeyRef: {name: honeydue-secrets, key: SECRET_KEY} }
            # ... all other secrets
          volumeMounts:
            - { name: apns-key, mountPath: /secrets/apns, readOnly: true }
            - { name: tmp, mountPath: /tmp }
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits: { cpu: 1000m, memory: 512Mi }
          startupProbe: { httpGet: {path: /api/health/, port: 8000}, failureThreshold: 48, periodSeconds: 5 }
          readinessProbe: { httpGet: {path: /api/health/, port: 8000}, initialDelaySeconds: 5, periodSeconds: 10, timeoutSeconds: 5 }
          livenessProbe: { httpGet: {path: /api/health/, port: 8000}, initialDelaySeconds: 30, periodSeconds: 30, timeoutSeconds: 10 }
      volumes:
        - name: apns-key
          secret:
            secretName: honeydue-apns-key
            # one item mapping the Secret key to a file name
            items: [{key: apns_auth_key.p8, path: apns_auth_key.p8}]
        - name: tmp
          emptyDir: {sizeLimit: 64Mi}
```

### Why each setting

**`replicas: 3`** — one per node via anti-affinity rules (not strictly required but helpful). Three gives us HA (one pod down = two still serve traffic) and headroom for rolling updates.

**`maxUnavailable: 0, maxSurge: 1`** — during a rollout, start a 4th pod before killing any old one. Ensures the service stays at 3 live pods throughout. `maxUnavailable: 0` means zero-downtime updates, but only if the readinessProbe is accurate.

**`runAsUser: 1000`** — the `app` user created in the Dockerfile. Image doesn't run as root.

**`readOnlyRootFilesystem: true`** — prevents any attacker-introduced file writes to the image layer. Go binary doesn't need to write to `/`; only `/tmp` is mutable.

**`startupProbe.failureThreshold: 48`** (= 48 × 5s = 240s grace) — historically bumped from the scaffold default of 12 to absorb in-replica migration time. Now that migrations run out-of-band as a Kubernetes Job ([Chapter 8 §Schema management](./08-database.md)), pods boot in seconds and only need a few probe failures of grace, but the budget stays at 240s because cold pods on a fresh Hetzner node still pay ~10s for image pull + startup.
See [Chapter 19 §13](./19-postmortem-swarm.md) for the historical context (the in-replica advisory-lock approach this replaced).

**`readinessProbe.initialDelaySeconds: 5`** — readiness checks never begin sooner than 5s after the container starts. Kubernetes counts this delay from container start, and readiness checks stay suspended until the startupProbe passes, so the startupProbe is the real gate; the delay is a small guard against a racy initial failure.

**`livenessProbe.initialDelaySeconds: 30`** — liveness checks never begin sooner than 30s after the container starts (the delay counts from container start, not from the first readiness success). Avoids cascading failures from false-negative liveness checks.

**`resources.requests/limits`** — Kubernetes uses `requests` for scheduling (how much a pod "reserves") and `limits` for enforcement (max it can use before throttling/OOM). Our api is CPU-bursty for complex query handling, so we give it 100m baseline with a 1000m ceiling. 512Mi memory ceiling is comfortable — in practice api uses ~100-200Mi.

**`volumes.apns-key`** — mounts the `honeydue-apns-key` Secret as a file at `/secrets/apns/apns_auth_key.p8`. The `APNS_AUTH_KEY_PATH` env var points to this path. Even though push is currently disabled, the file must exist because the Go app may try to stat it on startup.

**`volumes.tmp`** — `emptyDir` with `sizeLimit: 64Mi`. Bounded so a runaway process can't fill the node's disk.

### The Service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: honeydue
spec:
  type: ClusterIP
  selector: {app.kubernetes.io/name: api}
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
```

ClusterIP `10.43.167.83`. Reachable as `api.honeydue.svc.cluster.local` or just `api` from inside the namespace.

### HorizontalPodAutoscaler (not yet enabled)

`deploy-k3s/manifests/api/hpa.yaml` defines an HPA that would scale api between 3 and 6 replicas based on CPU (70% util) and memory (80% util). **Not currently applied.** `metrics-server` runs but we haven't run `kubectl apply -f api/hpa.yaml`. TODO in Chapter 20.

## Service 2 — admin (Next.js panel)

### What it does

Server-rendered admin UI.
Authenticates admin users against a separate `admin_users` table in Postgres (seeded with `ADMIN_EMAIL` + `ADMIN_PASSWORD` on first migration). Lets operators view/manage users, residences, tasks, subscriptions, etc.

Built as a Next.js 16 standalone server.

### Why 1 replica

Low traffic. It's an internal tool. One pod suffices. If it crashes, Kubernetes restarts it in ~10s. If the hosting node dies, Kubernetes reschedules to another node.

The cost of running 3 replicas is tiny (Next.js is ~128MB per pod) but has no operational benefit. When the admin panel becomes user-facing, revisit.

### Deployment highlights

```yaml
replicas: 1
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1
securityContext:
  runAsNonRoot: true
  runAsUser: 1001   # different from api (1000) for isolation
  runAsGroup: 1001
  fsGroup: 1001
containers:
  - image: gitea.treytartt.com/admin/honeydue-admin:
    ports: [containerPort: 3000]
    env:
      - name: PORT
        value: "3000"
      - name: HOSTNAME
        value: "0.0.0.0"
      - name: NEXT_PUBLIC_API_URL
        valueFrom: {configMapKeyRef: {name: honeydue-config, key: NEXT_PUBLIC_API_URL}}
    volumeMounts:
      - {name: nextjs-cache, mountPath: /app/.next/cache}
      - {name: tmp, mountPath: /tmp}
    resources:
      requests: {cpu: 50m, memory: 64Mi}
      limits: {cpu: 500m, memory: 256Mi}
    startupProbe:
      httpGet: {path: /, port: 3000}   # was /admin/ — wrong for this app (Chapter 19)
      failureThreshold: 24
      periodSeconds: 5
    readinessProbe:
      httpGet: {path: /, port: 3000}
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
```

**Probe path `/`** — Next.js serves at root. `/admin/` (scaffold default) returns 404 and killed the pod repeatedly during initial bring-up. See Chapter 19 §Admin probe path for the story.

**`runAsUser: 1001`** — different from api's 1000 so that if one service were compromised, the stolen UID would at least be distinct from other services' (minor defense-in-depth).

**`nextjs-cache`** — emptyDir mount for Next.js's server-side cache.
Without it, the read-only rootfs would prevent Next from caching server-rendered pages. Not a persistent volume because cache is regenerable on restart.

### The Service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: admin
spec:
  type: ClusterIP
  selector: {app.kubernetes.io/name: admin}
  # one port entry mapping 3000 -> 3000
  ports: [{port: 3000, targetPort: 3000}]
```

ClusterIP `10.43.136.168`.

## Service 3 — worker (Go + Asynq)

### What it does

Runs scheduled background jobs via [Asynq](https://github.com/hibiken/asynq) (a Redis-backed job queue for Go):

- **Task reminders** (14:00 UTC daily) — notify users of upcoming tasks
- **Overdue reminders** (15:00 UTC daily) — notify users of overdue tasks
- **Daily digest** (03:00 UTC daily) — summary email per user
- **Onboarding emails** — multi-step drip campaign for new users
- **Cleanup jobs** — expired tokens, stale data

### Why 1 replica (hard requirement)

Asynq uses a `Scheduler` component that does cron-like scheduling. The Scheduler is **not leader-elected** by default — if you run two, both fire every cron task. Users get duplicate emails.

The asynq docs cover this: to scale scheduling, migrate to `PeriodicTaskManager` + `PeriodicTaskConfigProvider` which coordinate via Redis. Not yet done in our codebase.

Until then: `replicas: 1` is a hard constraint. See the comment in the deployment manifest:

```yaml
spec:
  # Asynq's Scheduler is a singleton — running >1 replica fires every cron
  # task once per replica (duplicate daily digests, onboarding emails, etc.).
  # Keep at 1 until asynq.PeriodicTaskManager with Redis leader election is
  # wired in cmd/worker/main.go.
  replicas: 1
```

### What happens if the worker pod dies?
- Asynq schedule state is in Redis (which has AOF persistence)
- When a new worker pod starts, it re-registers the scheduler and picks up where it left off
- Any job that was in-flight (dequeued but not acknowledged) gets retried by Asynq's automatic retry logic (see the `worker.RetryOptions` in the Go code)
- Cron jobs that were supposed to fire during the downtime: fire on the next tick

A 5-minute worker outage = 5 minutes of delayed jobs. Not great but acceptable.

### PodDisruptionBudget

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  minAvailable: 0
  selector: {matchLabels: {app.kubernetes.io/name: worker}}
```

`minAvailable: 0` means voluntary disruptions (`kubectl drain`) can take the worker down. This matches the singleton constraint: there's only one, it's OK to drain.

### No Service

The worker doesn't listen on any HTTP port for application traffic — it's a queue consumer, not a web server. So there's **no Kubernetes Service** for it.

(On Swarm we had the worker expose a health endpoint at `:6060/health`; the k3s scaffold doesn't replicate this. Future work.)

## Service 4 — redis

### What it does

- Caching layer (ETag-based lookups, user session cache)
- Asynq queue backend (job state, scheduled tasks, retry state)

### Why 1 replica

Single-instance Redis with AOF persistence. Not replicated, not clustered.

Downsides:

- Node outage = Redis outage (cache regenerates, queue state is preserved by AOF on the PVC)
- No failover: the PVC is local-path (per-node) and the pod is pinned to that node by a `nodeSelector`, so if the node hosting Redis is lost for good, the data is gone. Bringing Redis up on another node means starting from an empty dataset.

For our scale this is acceptable. Redis holds no authoritative state (everything that matters is in Postgres). Cache regenerates on first request; Asynq retries enqueue on failure.

### PVC

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: local-path
  resources: {requests: {storage: 5Gi}}
```

Uses k3s' built-in `local-path-provisioner`. The PVC binds to a local directory on the node where the Redis pod lands (`/var/lib/rancher/k3s/storage/`). `ReadWriteOnce` means the volume can be mounted read-write by only one node at a time.

### Node affinity

```yaml
nodeSelector:
  honeydue/redis: "true"
```

We labeled `ubuntu-8gb-nbg1-2` (hetzner1) with `honeydue/redis=true` so Redis always lands there. This ensures the PVC finds its backing storage (since PVCs with `local-path` are per-node).

```bash
kubectl label node ubuntu-8gb-nbg1-2 honeydue/redis=true --overwrite
```

### Why not Redis Sentinel / Cluster

Complexity. At our scale (~a few req/s, kilobytes of cache), a single Redis does fine. If Redis becomes critical-path for availability, we'd:

- Use a managed Redis (Upstash, Dragonfly Cloud) — $5-15/mo, their problem
- Or run Redis Sentinel with 3 replicas — manageable but operational work

Neither is needed yet.

### Redis config

From the deployment:

```yaml
command:
  - sh
  - -c
  - |
    ARGS="--appendonly yes --appendfsync everysec --maxmemory 256mb --maxmemory-policy noeviction"
    if [ -n "$REDIS_PASSWORD" ]; then
      ARGS="$ARGS --requirepass $REDIS_PASSWORD"
    fi
    exec redis-server $ARGS
```

Settings:

- **`--appendonly yes --appendfsync everysec`** — AOF persistence, fsync every second. Survives restarts with up to 1 second of data loss.
- **`--maxmemory 256mb`** — Redis will refuse new data if it grows past 256 MB. Gives us a safety cap.
- **`--maxmemory-policy noeviction`** — we'd rather get errors than silently drop data. This is the right choice when Redis holds queue state (losing a queue item silently = missed job).

The script treats `REDIS_PASSWORD` as optional, but since the 2026-05-15 security remediation it is set from `honeydue-secrets` (`config.yaml` `redis.password`), so Redis runs with `--requirepass`. The Redis pod is also only reachable from inside the overlay network, and our NetworkPolicies (once enabled) would restrict egress further.

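
To see how the conditional `--requirepass` flag behaves without starting Redis, the argument-building logic can be pulled into a function. This is a minimal sketch: `build_redis_args` is a hypothetical name introduced for illustration, and `example` is a placeholder rather than a real secret.

```bash
# Same argument-building logic as the container command, wrapped in a
# function (hypothetical name) that prints the flags instead of exec'ing
# redis-server.
build_redis_args() {
  password=$1
  args="--appendonly yes --appendfsync everysec --maxmemory 256mb --maxmemory-policy noeviction"
  # Only require auth when a non-empty password was supplied.
  if [ -n "$password" ]; then
    args="$args --requirepass $password"
  fi
  echo "$args"
}

build_redis_args ""          # empty password: --requirepass is omitted
build_redis_args "example"   # placeholder password: --requirepass is appended
```

One thing the sketch makes visible: the flags travel through a single unquoted variable, so a password containing whitespace would be split into several arguments. Keeping the secret free of spaces avoids that.
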
## Resource summary

Per-pod requests and limits for each service; the totals row multiplies each request by its replica count. `web` is not yet listed here:

| Service | CPU requests | CPU limits | Memory requests | Memory limits | Replicas |
|---|---|---|---|---|---|
| api | 100m | 1000m | 128Mi | 512Mi | 3 |
| admin | 50m | 500m | 64Mi | 256Mi | 1 |
| worker | 50m | 500m | 64Mi | 256Mi | 1 |
| redis | 100m | 500m | 128Mi | 512Mi | 1 |
| traefik (kube-system) | ~100m | unlimited | ~50Mi | unlimited | 3 |
| **Total requests** | **~800m** | | **~790Mi** | | |

Each node has 4000m CPU + 8192Mi memory. Total cluster capacity is 12000m + 24576Mi. We're using roughly 7% CPU and 3% memory for requests — tons of headroom.

## Health check semantics

Kubernetes distinguishes three probe types:

- **startupProbe** — is the container done starting? Runs until it passes once, then stops. While running, the other probes are disabled. Failing startupProbe = container killed and restarted.
- **readinessProbe** — is the container ready to serve traffic? A failing pod is removed from Service endpoints (traffic stops flowing to it) but the pod keeps running.
- **livenessProbe** — is the container healthy? A failing container is killed and restarted.

### Why we tuned startupProbe separately

When migrations still ran inside the api pod, first boot took 90–240s. If we had relied only on readiness and liveness probes with a typical `initialDelaySeconds` of 5s and `failureThreshold` of 3, the liveness probe would have killed the pod before the migration finished. startupProbe lets us give generous first-boot grace (240s) without affecting the sharper ongoing readiness/liveness checks.

### Probe path design

Each service's `/health` endpoint should be:

- Cheap (no DB query, no external call)
- Fast (< 100ms)
- Honest (returns 200 iff the process can serve)

Our api's `/api/health/` does a trivial check. It does NOT verify Postgres connectivity (to avoid cascading DB failures tearing down all api pods). If Postgres is down, api pods stay "ready" and return 5xx for actual endpoints — that's the right behavior.

## Log routing

All container logs go to stdout/stderr.
containerd captures them to `/var/log/containers/` on the node. `kubectl logs` fetches them via the kubelet's `/api/v1/pods//log` endpoint.

We have **no log aggregation** in the cluster (no Loki, no ELK, no Datadog). For debugging we use:

```bash
kubectl logs -n honeydue deploy/api -f --prefix
kubectl logs -n honeydue deploy/api --previous   # logs from the previous container instance (before its last restart)
```

See [Chapter 15](./15-observability.md).

## Rolling update semantics

When you push a new image and run `kubectl set image` or `kubectl apply` with a new image tag:

1. Kubernetes creates a new ReplicaSet with the new image
2. Starts 1 new pod (per `maxSurge: 1`)
3. Waits for it to pass readinessProbe
4. Removes 1 pod from the old ReplicaSet
5. Repeats until all N pods are on the new ReplicaSet
6. Old ReplicaSet stays around (for rollback) with 0 replicas

For api (3 replicas): total rollout time is roughly `3 × (pod_startup_time + small_buffer)` = ~15 minutes in the cold-boot case, seconds for warm updates where migrations are no-op.

During the rollout:

- Service endpoint set updates as pods become ready
- kube-proxy IPVS is reprogrammed on each node
- Traefik's connection pool to the Service invalidates gradually

Users see no downtime if the new image is compatible. If it's broken:

```bash
kubectl rollout undo deployment/api -n honeydue
```

Reverts to the previous ReplicaSet. Typically takes 30 seconds to stabilize.

## Why no StatefulSet

For Redis (the only stateful thing we run), we use a Deployment + PVC. StatefulSet is designed for:

- Ordered startup (pod-0 before pod-1)
- Stable hostnames (pod-0 gets DNS name `redis-0.redis`)
- Per-replica PVCs

We have one Redis replica. None of those features matter for a singleton. Deployment + PVC + nodeSelector is simpler and equivalent.

If we ever run Redis Sentinel or Cluster, we'd migrate to StatefulSet.

## Operator cheat sheet

```bash
# See all pods in honeydue namespace
kubectl get pods -n honeydue -o wide

# Per-service rollout status
kubectl rollout status deployment/api -n honeydue

# Scale a service
kubectl scale deployment/api -n honeydue --replicas=5

# Restart all pods (e.g., to re-read a configmap)
kubectl rollout restart deployment/api -n honeydue

# Exec into a pod
kubectl exec -it -n honeydue deploy/admin -- /bin/sh

# Describe a pod (shows events, probe state, restarts)
kubectl describe pod -n honeydue

# Resource usage
kubectl top pods -n honeydue
```

## References

- [Kubernetes Deployments][deploy]
- [Pod lifecycle + probes][probes]
- [Asynq scheduler limitations][asynq-sched]
- [K3s local-path provisioner][k3s-lp]

[deploy]: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
[probes]: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifecycle
[asynq-sched]: https://github.com/hibiken/asynq/wiki/Periodic-Tasks
[k3s-lp]: https://docs.k3s.io/storage#setting-up-the-local-storage-provider