# 12 — Data Flow

## Summary

This chapter follows a user's request end to end, hop by hop. It's the consolidated picture of Chapters 3, 6, 7, 8, and 9 working together. Use this chapter to answer "when X doesn't work, which layer failed?"

## Scenario: User creates a task

A user in Austin opens the mobile app and adds a new task for their property. The client sends `POST https://api.myhoneydue.com/api/tasks/` with a JSON body and an auth token. We trace every hop.

## Hop 1 — Mobile client → Cloudflare edge

```mermaid
sequenceDiagram
    participant App as iOS client
    participant DNS as Local DNS
    participant CFE as Cloudflare edge (DFW)
    App->>DNS: Resolve api.myhoneydue.com
    DNS->>App: 104.21.13.7 (Cloudflare edge IP)
    App->>CFE: TCP SYN :443
    CFE-->>App: TCP SYN+ACK
    App->>CFE: TLS ClientHello
    CFE->>App: TLS ServerHello + cert
    Note over App,CFE: TLS 1.3 handshake<br/>~1 RTT
    App->>CFE: HTTP/2 stream<br/>POST /api/tasks/<br/>Authorization: Token
```

- Client resolves `api.myhoneydue.com` via OS resolver, gets Cloudflare edge IP (not our origin IP)
- Client establishes TLS 1.3 to CF's nearest POP (Dallas for Austin)
- Cert presented by CF is `sni.cloudflaressl.com` or a CF-issued `*.myhoneydue.com` — our origin cert is never seen by the client
- Latency: ~5–15 ms Austin → DFW

## Hop 2 — Cloudflare edge → Origin (hetzner)

```mermaid
sequenceDiagram
    participant CFE as Cloudflare DFW POP
    participant DNS as CF internal DNS
    participant HN as hetzner node (random of 3)
    participant Traefik as Traefik pod<br/>(host network)
    CFE->>DNS: Which origin for api.myhoneydue.com?
    DNS->>CFE: One of 178.104.247.152, 178.105.32.198, 178.104.249.189
    CFE->>HN: TCP SYN :80
    HN-->>CFE: SYN+ACK
    CFE->>HN: HTTP/1.1 POST /api/tasks/<br/>Host: api.myhoneydue.com<br/>X-Forwarded-For: client IP<br/>X-Forwarded-Proto: https<br/>CF-Connecting-IP: client IP
    Note over HN: UFW: allow 80/tcp from<br/>anywhere (for now)
    HN->>Traefik: delivered to listener
```

- CF picks one of the 3 node IPs via DNS round-robin. This is per-connection, not per-request.
- Protocol between CF and origin: **HTTP/1.1 plaintext** (SSL=Flexible). A future Full-strict upgrade would make this HTTPS.
- Latency: ~90–120 ms DFW → Nuremberg
- CF adds headers: `CF-Connecting-IP`, `X-Forwarded-For`, `X-Forwarded-Proto`

## Hop 3 — Traefik → api Service

```mermaid
sequenceDiagram
    participant Traefik as Traefik pod
    participant CoreDNS as CoreDNS (10.43.0.10)
    participant KP as kube-proxy IPVS<br/>(kernel)
    participant APIPod as api pod<br/>(some node)
    Note over Traefik: Match Host: api.myhoneydue.com<br/>→ honeydue-api Ingress<br/>→ backend: api Service :8000
    Traefik->>CoreDNS: Resolve "api"
    CoreDNS->>Traefik: 10.43.167.83 (Service ClusterIP)
    Traefik->>KP: TCP SYN to 10.43.167.83:8000
    KP->>KP: IPVS: pick endpoint<br/>from Service endpoint set
    KP->>APIPod: Rewrite destination<br/>to 10.42.2.6:8000<br/>(Flannel VXLAN if remote node)
```

- Traefik resolves `api` via CoreDNS → gets the Service ClusterIP
- Traefik sends to `10.43.167.83:8000`
- kube-proxy IPVS (running in-kernel on the node where Traefik lives) intercepts, picks a live endpoint, and rewrites the destination
- Destination might be local (same node) or remote (VXLAN tunnel to another node)
- Latency: <3 ms even cross-node

## Hop 4 — api → Postgres (Neon)

```mermaid
sequenceDiagram
    participant API as api pod (Go)
    participant Resolv as Pod resolv.conf
    participant Neon as Neon pooler<br/>AWS us-east-1
    API->>Resolv: Resolve ep-floral-truth-...-pooler.us-east-1.aws.neon.tech
    Note over Resolv: Goes to CoreDNS<br/>which forwards to upstream<br/>(Hetzner's DNS, then public root)
    Resolv->>API: Neon pooler IP (e.g., 34.206.177.121)
    API->>Neon: TCP :5432
    API->>Neon: TLS 1.3 handshake (DB_SSLMODE=require)
    API->>Neon: Postgres startup (user, database)
    API->>Neon: BEGIN<br/>SELECT ... FROM task_task WHERE residence_id = ?<br/>INSERT INTO task_task (...) VALUES (...)<br/>COMMIT
    Neon-->>API: Query results
```

- Go's `database/sql` pool may already have an idle connection. If so, skip the handshake.
- If new connection: ~50 ms TLS handshake + Postgres startup
- Query itself: typically ~5–20 ms (single-row read/write on indexed columns)
- Total for this hop: often <10 ms on a warm connection, ~80 ms cold

## Hop 5 — api → Redis (cache invalidation)

```mermaid
sequenceDiagram
    participant API as api pod
    participant CoreDNS
    participant KP as kube-proxy
    participant Redis as redis pod
    API->>CoreDNS: Resolve "redis"
    CoreDNS->>API: 10.43.7.10
    API->>KP: TCP :6379
    KP->>Redis: rewritten to 10.42.x.y:6379
    API->>Redis: DEL tasks:user: (invalidate cached list)
    Redis-->>API: OK
```

- Redis connection is usually kept alive in the api's pool
- Latency: <1 ms (Redis is on hetzner2, usually a short hop)

## Hop 6 — api → worker (enqueue side effect)

For some task creation events, api enqueues a background job (send-notification, update-lookup-table, etc.):

```mermaid
sequenceDiagram
    participant API as api pod
    participant Redis as redis pod (acting as Asynq queue)
    participant Worker as worker pod
    API->>Redis: RPUSH asynq:queue:default
    Redis-->>API: OK
    Note over API,Worker: (Async, no response blocking)
    Worker->>Redis: BLPOP asynq:queue:default
    Redis-->>Worker: job payload
    Worker->>Worker: Process job<br/>(send email, push, etc.)
```

api returns to the caller without waiting for the job.

## Hop 7 — Response back to user

Reverse the path:

1. api returns JSON response to Traefik
2. Traefik returns to Cloudflare
3. Cloudflare encrypts the response over TLS and returns it to the user
4. User receives response

## End-to-end latency budget

For a typical "create task" operation:

| Hop | Latency |
|---|---|
| User → CF (Austin → DFW) | 5–15 ms |
| CF → hetzner (cross-Atlantic) | 90–120 ms |
| UFW + kernel + Traefik accept | <1 ms |
| Traefik → api (same or cross-node) | 1–3 ms |
| api request parsing, auth validation | 1–3 ms |
| api → Postgres (query) | 20–60 ms |
| api → Redis (invalidate) | <1 ms |
| api response generation | 1–5 ms |
| Return path | same as forward, reversed |

**Total**: ~220–310 ms typical. Dominated by the cross-Atlantic CF→origin hop and the Postgres query round trip.

## Read path (GET /api/tasks/)

Similar but simpler:

```mermaid
sequenceDiagram
    participant App as iOS client
    participant CF as Cloudflare
    participant Traefik
    participant API as api pod
    participant Redis
    participant Neon
    App->>CF: GET /api/tasks/
    CF->>Traefik: (no cache hit)
    Traefik->>API: Route via Service
    API->>Redis: GET tasks:user:
    alt Cache hit
        Redis-->>API: cached JSON
    else Cache miss
        API->>Neon: SELECT ...
        Neon-->>API: rows
        API->>Redis: SET tasks:user: EX 300
    end
    API-->>Traefik: 200 JSON
    Traefik-->>CF: 200
    CF-->>App: 200 (may cache per response headers)
```

## Admin panel data flow

A different dance because the admin is Next.js:

```mermaid
sequenceDiagram
    participant Browser
    participant CF
    participant Traefik
    participant Admin as admin pod (Next.js)
    participant AdminAPI as api pod<br/>(via public URL)
    participant Neon
    Browser->>CF: GET admin.myhoneydue.com/users
    CF->>Traefik: HTTP :80
    Traefik->>Admin: Service /users
    Note over Admin: Next.js SSR:<br/>fetch from NEXT_PUBLIC_API_URL
    Admin->>CF: GET api.myhoneydue.com/api/admin/users/
    CF->>Traefik: (api ingress)
    Traefik->>AdminAPI: Service
    AdminAPI->>Neon: SELECT ... FROM auth_user
    Neon-->>AdminAPI: rows
    AdminAPI-->>Admin: JSON
    Admin->>Admin: Render HTML
    Admin-->>Traefik: HTML
    Traefik-->>CF: HTML
    CF-->>Browser: HTML
```

Notably, the admin pod's calls to api go **back out to Cloudflare** and in through the public URL, not the in-cluster Service IP. This is because `NEXT_PUBLIC_API_URL=https://api.myhoneydue.com` — Next.js builds use the same URL for browser-side and server-side fetches.

This is **suboptimal** — server-side (SSR) calls could use the internal `api.honeydue.svc:8000` URL and skip the CF round-trip. Future optimization: separate `NEXT_PUBLIC_API_URL` (browser) from `API_URL` (server-side).

## Static asset flow

For the marketing landing page at `https://myhoneydue.com/`:

1. CF caches HTML per `Cache-Control` (the Go app sets short TTLs)
2. CF caches CSS / JS / images aggressively (via default CF rules)
3. First request hits origin, subsequent requests are served from the CF edge

The static assets live inside the api container at `/app/static/`. They are served by Echo's static file handler at routes `/css`, `/js`, `/images`.

## Request flow during a rolling update

When a new api image is deployed, some requests will hit old pods and some will hit new pods for a few minutes:

```mermaid
sequenceDiagram
    participant CF
    participant Traefik
    participant OldPod as api pod v1
    participant NewPod as api pod v2 (starting)
    Note over NewPod: kubelet starts new pod
    Note over NewPod: pod connects to Postgres<br/>MigrateWithLock runs (no-op)<br/>HTTP server starts<br/>readinessProbe passes
    Note over NewPod: kube-proxy updates endpoints<br/>NewPod added to Service pool
    CF->>Traefik: request 1
    Traefik->>OldPod: routed (old pod still in pool)
    CF->>Traefik: request 2
    Traefik->>NewPod: routed (new pod now in pool)
    Note over OldPod: Kubelet terminates old pod<br/>(graceful SIGTERM, then SIGKILL after grace)
    CF->>Traefik: request 3
    Traefik->>NewPod: routed (OldPod gone from pool)
```

Both old and new pods handle traffic simultaneously until the rolling update completes. As long as the new code is API-compatible, users don't notice.

## Failure modes in the data path

See [Chapter 16 — Failure Modes](./16-failure-modes.md) for a full catalog. Quick summary:

| Layer fails | User sees | Recovery |
|---|---|---|
| Cloudflare DNS down | Can't resolve api.myhoneydue.com | Manual DNS fallback; extremely rare |
| Cloudflare edge down (single POP) | Slow, CF routes to another POP | Automatic |
| Node NIC fails | Some requests time out (CF routes away) | Cluster reschedules pods |
| UFW misconfig blocks :80 | 521 errors at CF | Re-add rule |
| Traefik pod down on one node | CF routes to other nodes | Automatic |
| kube-proxy broken on one node | Pods on that node can't reach Services | Restart kubelet |
| CoreDNS down | New connections fail DNS | Restart CoreDNS |
| Flannel broken between nodes | Cross-node pod communication fails | Restart flannel or node |
| api pod OOM | 502 to user briefly | kubelet restarts pod |
| Postgres down | 500 errors from api | Neon-side issue; outage |
| Redis down | api serves without cache (degraded) | Restart Redis pod |
| B2 down | Uploads fail, existing content served if cached | Backblaze-side outage |

## References

- [Chapter 3 — Networking](./03-networking.md) for the overlay mechanics
- [Chapter 6 — Traefik](./06-traefik-ingress.md) for routing details
- [Chapter 7 — Services](./07-services.md) for per-service specifics
- [Chapter 16 — Failure Modes](./16-failure-modes.md) for what-if scenarios
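
As a rough cross-check of the end-to-end latency budget table, the per-hop ranges can be summed in a few lines of Go. This is a sketch under stated assumptions, not part of the deployed code: the `Range` type, `forwardPath`, and `Total` are hypothetical names, rows listed as "<1 ms" are modelled as 0–1 ms, and "Return path: same as forward, reversed" is interpreted as counting only the network transit hops a second time.

```go
package main

import "fmt"

// Range is an inclusive latency estimate in milliseconds for one hop.
type Range struct {
	Label    string
	Min, Max float64
	// Network marks transit hops that are assumed to be crossed again
	// on the return path (an assumption of this sketch).
	Network bool
}

// forwardPath mirrors the rows of the latency budget table.
// "<1 ms" rows are modelled as 0–1 ms.
var forwardPath = []Range{
	{"User → CF (Austin → DFW)", 5, 15, true},
	{"CF → hetzner (cross-Atlantic)", 90, 120, true},
	{"UFW + kernel + Traefik accept", 0, 1, false},
	{"Traefik → api", 1, 3, true},
	{"api request parsing, auth validation", 1, 3, false},
	{"api → Postgres (query)", 20, 60, false},
	{"api → Redis (invalidate)", 0, 1, false},
	{"api response generation", 1, 5, false},
}

// Total sums every hop once and adds the network hops a second time
// to account for the response travelling back to the user.
func Total(hops []Range) (lo, hi float64) {
	for _, h := range hops {
		lo += h.Min
		hi += h.Max
		if h.Network {
			lo += h.Min
			hi += h.Max
		}
	}
	return lo, hi
}

func main() {
	lo, hi := Total(forwardPath)
	fmt.Printf("estimated end-to-end: %.0f–%.0f ms\n", lo, hi)
}
```

Under these assumptions the sum brackets the "typical" total quoted in the budget section, and the two `CF → hetzner` crossings alone account for well over half of the lower bound, which is consistent with that hop being called out as dominant.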