docs: capture latency optimizations + new caching invariants
Shipping commit 88fb175 changed the trace shape and added a new caching
layer with required invalidation rules. Updating the operator-facing
docs so they match the running system.
ch08 (database):
- DB_HOST is the -pooler Neon endpoint, not direct compute
- Connection pool: MaxIdleConns 20 (was 10), MaxLifetime 30m (was 10m),
MaxIdleTime 0 (never close idle)
- New \"Pool warm-up at boot\" section documenting the 20-parallel-ping
warm-up in database.Connect
- Replaced the \"Neon regions\" section: explicit RTT numbers, the
optimization stack that minimizes round-trips, when this still matters
ch15 (observability):
- Replaced the 2,473ms/5-span sample trace with the new 229ms/2-span
post-optimization trace; kept the old one underneath for diff context
ch16 (failure modes):
- Added: stale residence-IDs cache (data freshness bug + recovery)
- Added: Redis at maxmemory limit (verify allkeys-lru policy)
- Added: Neon pooler unreachable but direct endpoint up — emergency
switchover procedure
ch17 (runbook):
- §23 Invalidate residence-IDs cache for a user (DEL key + grep for
missing invalidation in new code)
- §24 Verify DB pool warm-up is working (log pattern + impact test)
- §25 Switch DB host between pooler and direct endpoints
observability-plan.md status flipped from \"plan only\" to shipped
with the latency-cut summary.
README links to the new ch08 latency section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -378,6 +378,7 @@ Quick links:
|
||||
- **Deploy process** — [docs/deployment/14-deployment-process.md](./docs/deployment/14-deployment-process.md) — `bash deploy-k3s/scripts/03-deploy.sh` builds → pushes → rolls out
|
||||
- **Observability** — [docs/deployment/15-observability.md](./docs/deployment/15-observability.md) — VictoriaMetrics + Jaeger + Grafana on `obs.88oakapps.com`
|
||||
- **Observability plan** — [docs/observability-plan.md](./docs/observability-plan.md) — design doc and rollout phases
|
||||
- **Database / pool tuning** — [docs/deployment/08-database.md](./docs/deployment/08-database.md) — Neon pooler endpoint, GORM pool, warm-up, RTT budget
|
||||
- **Failure modes** — [docs/deployment/16-failure-modes.md](./docs/deployment/16-failure-modes.md) — what happens when X dies
|
||||
- **Swarm postmortem** — [docs/deployment/19-postmortem-swarm.md](./docs/deployment/19-postmortem-swarm.md) — why we migrated
|
||||
|
||||
|
||||
+100
-28
@@ -32,7 +32,7 @@ Neon Launch won on:
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Hostname | `ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech` |
|
||||
| Hostname | `ep-floral-truth-amttbc5a-pooler.c-5.us-east-1.aws.neon.tech` |
|
||||
| Port | 5432 |
|
||||
| Username | `neondb_owner` |
|
||||
| Database | `honeyDue` (case-sensitive!) |
|
||||
@@ -58,9 +58,19 @@ paid tiers much higher.
|
||||
|
||||
### PgBouncer on Neon
|
||||
|
||||
Neon provides a built-in PgBouncer at `-pooler` subdomain. Our hostname
|
||||
already includes `-pooler` handling in the route, so connections go
|
||||
through PgBouncer transparently.
|
||||
Neon provides a built-in PgBouncer at the `-pooler` subdomain. The
|
||||
non-pooler endpoint (`ep-floral-truth-amttbc5a.c-5.us-east-1...`) is
|
||||
the direct compute endpoint and connects straight to Postgres,
|
||||
paying the full TCP+TLS+startup handshake on every cold connection.
|
||||
The `-pooler` endpoint multiplexes through PgBouncer in Neon's
|
||||
infrastructure.
|
||||
|
||||
**We use the `-pooler` endpoint** because the direct endpoint paid
|
||||
~440ms per cold handshake on a transatlantic link, visible as
|
||||
1500ms-tail spikes in /api/tasks/ traces. The pooler keeps backend
|
||||
Postgres connections warm in Neon's data center, so the only
|
||||
latency our Go pods see is one TCP+TLS to PgBouncer (already
|
||||
warm via our pool) plus one query round-trip.
|
||||
|
||||
Modes PgBouncer supports:
|
||||
- **session** — one server connection held per client session (transparent)
|
||||
@@ -68,26 +78,59 @@ Modes PgBouncer supports:
|
||||
- **statement** — per-statement (most aggressive; breaks many features)
|
||||
|
||||
Neon's pooler runs in **transaction mode**. This is compatible with GORM
|
||||
out of the box (we don't use session-level features like prepared
|
||||
statements or session variables).
|
||||
out of the box (we don't use session-level features like LISTEN/NOTIFY
|
||||
or session-scope advisory locks). Note: `database.MigrateWithLock()`
|
||||
needs the *direct* (non-pooler) endpoint because session-level
|
||||
advisory locks don't survive PgBouncer's per-transaction cycling — but
|
||||
the migration helper opens its own ad-hoc connection bypassing the
|
||||
configured pool, so this happens automatically. See `MigrateWithLock`
|
||||
in `internal/database/database.go`.
|
||||
|
||||
### Connection pool settings
|
||||
|
||||
In `prod.env`:
|
||||
In `config.yaml` (rendered into ConfigMap → env vars):
|
||||
|
||||
```
|
||||
DB_MAX_OPEN_CONNS=25
|
||||
DB_MAX_IDLE_CONNS=10
|
||||
DB_MAX_LIFETIME=600s
|
||||
```yaml
|
||||
database:
|
||||
max_open_conns: 25
|
||||
max_idle_conns: 20
|
||||
max_lifetime: "1800s"
|
||||
max_idle_time: "0s"
|
||||
```
|
||||
|
||||
These are the Go `database/sql` pool settings (GORM uses `database/sql`
|
||||
underneath):
|
||||
These map to Go `database/sql` pool settings:
|
||||
|
||||
- **MaxOpenConns: 25** — at most 25 concurrent connections per replica
|
||||
- **MaxIdleConns: 10** — keep up to 10 warm connections ready to reuse
|
||||
- **MaxLifetime: 600s** — recycle connections after 10 min (prevents
|
||||
stale state in long-lived connections, good for Neon's idle timeout)
|
||||
- **MaxOpenConns: 25** — at most 25 concurrent connections per replica.
|
||||
- **MaxIdleConns: 20** — keep up to 20 warm connections per replica
|
||||
ready to reuse. Bumped from 10 because the pooler tolerates many
|
||||
client connections cheaply, and the cost of a cold handshake (~440ms
|
||||
transatlantic) is far higher than the cost of holding an idle
|
||||
connection.
|
||||
- **MaxLifetime: 1800s** — recycle connections after 30 min. Bumped
|
||||
from 600s; with the pooler keeping things warm, longer lifetime
|
||||
reduces churn.
|
||||
- **MaxIdleTime: 0s** — never close idle connections. Lifetime drives
|
||||
recycling instead.
|
||||
|
||||
### Pool warm-up at boot
|
||||
|
||||
`database.Connect()` issues 20 parallel `PingContext` calls
|
||||
immediately after opening the pool. This pre-establishes
|
||||
`MaxIdleConns` connections to the pooler so the first user request
|
||||
doesn't pay any handshake.
|
||||
|
||||
The warm-up is bounded by *one* round-trip time (~440ms cold), not
|
||||
one round-trip per connection — pings run concurrently. Confirmed
|
||||
in pod logs at boot:
|
||||
|
||||
```
|
||||
{"level":"info","requested":20,"warmed":20,"message":"DB pool warm-up complete"}
|
||||
```
|
||||
|
||||
If warm-up partially fails (e.g., 18/20 succeed), the pod still
|
||||
starts; the pool fills the rest under traffic. Failure to ping at all
|
||||
would be caught by the synchronous `sqlDB.Ping()` immediately before,
|
||||
which is fatal.
|
||||
|
||||
### Worst-case connection count
|
||||
|
||||
@@ -229,17 +272,45 @@ value.
|
||||
## Neon regions
|
||||
|
||||
Neon's default region for new projects is `aws-us-east-1` (Virginia).
|
||||
Our DB is there. Latency from Nuremberg to us-east-1 is **~90-120ms
|
||||
round trip**.
|
||||
Our DB is there. Latency from Nuremberg to us-east-1 is **~108ms one-way**
|
||||
TCP-level (verified by `nc -z -w 5` from `hetzner1`), so **~220ms RTT
|
||||
through Neon's pooler stack**.
|
||||
|
||||
This is the slowest hop in our data flow. Every api request that needs
|
||||
a DB query (most of them) pays this latency at least once.
|
||||
a DB query pays this latency at least once. Sub-millisecond Postgres
|
||||
execution time (verified via `EXPLAIN ANALYZE`: 0.04-0.34 ms on every
|
||||
hot path) means **wall-clock latency = network + Neon proxy overhead**.
|
||||
|
||||
**When this matters**: When we start seeing ~200ms+ response times from
|
||||
complex endpoints, it's likely DB latency dominant. Options:
|
||||
- Migrate Neon to `aws-eu-central-1` (Frankfurt) — shaves ~90ms off
|
||||
- Add Redis caching for hot reads (Chapter 7)
|
||||
- Read replicas (Neon supports them on paid tiers)
|
||||
### Optimizations layered on top to minimize round trips
|
||||
|
||||
We don't move the DB region (yet) but we cut the *number* of RTTs per
|
||||
request via:
|
||||
|
||||
1. **Auth caching** (Chapter 7 §Redis) — token + user lookups served
|
||||
from Redis (1-hour TTL) and per-pod in-memory cache (5-min TTL).
|
||||
On warm cache: 0 SQL round-trips for auth.
|
||||
2. **JOIN consolidation** — two-step
|
||||
`find residence-IDs → find tasks IN ids` collapsed into a single
|
||||
query with a Postgres subquery. One RTT instead of two.
|
||||
3. **Single-query auth** — token + user fetched in one INNER JOIN
|
||||
instead of GORM's two-query Preload pattern.
|
||||
4. **Residence-IDs Redis cache** — cached per user with 5-min TTL,
|
||||
invalidated on Create/Delete/Join/Remove. Saves 1 RTT per
|
||||
`/api/documents/`, `/api/contractors/`, `/api/residences/summary/`
|
||||
request.
|
||||
|
||||
After these, a fully-warm `/api/tasks/` is **1 SQL round-trip total
|
||||
(~220ms wall-clock)**. Verified via Jaeger trace — see Chapter 15.
|
||||
|
||||
### When this still matters
|
||||
|
||||
- Any cold-cache request still pays 2-3 RTTs (~500-700ms).
|
||||
- Pod startup pays 1 RTT × 20 (warm-up), but that runs in parallel:
|
||||
~440ms one-shot.
|
||||
|
||||
Long-term fix: migrate Neon to `aws-eu-central-1` (Frankfurt) — drops
|
||||
RTT to ~5ms and brings warm-cache requests under 50ms. Tracked in
|
||||
`docs/observability-plan.md` and Chapter 18 §migration triggers.
|
||||
|
||||
## Environment variables the app reads
|
||||
|
||||
@@ -247,14 +318,15 @@ From ConfigMap:
|
||||
|
||||
| Var | Purpose |
|
||||
|---|---|
|
||||
| `DB_HOST` | Neon pooler hostname |
|
||||
| `DB_HOST` | Neon pooler hostname (`-pooler` suffix) |
|
||||
| `DB_PORT` | 5432 |
|
||||
| `POSTGRES_USER` | `neondb_owner` |
|
||||
| `POSTGRES_DB` | `honeyDue` |
|
||||
| `DB_SSLMODE` | `require` |
|
||||
| `DB_MAX_OPEN_CONNS` | 25 |
|
||||
| `DB_MAX_IDLE_CONNS` | 10 |
|
||||
| `DB_MAX_LIFETIME` | `600s` |
|
||||
| `DB_MAX_IDLE_CONNS` | 20 |
|
||||
| `DB_MAX_LIFETIME` | `1800s` |
|
||||
| `DB_MAX_IDLE_TIME` | `0s` (never close idle) |
|
||||
|
||||
From Secret (`honeydue-secrets`):
|
||||
|
||||
|
||||
@@ -248,7 +248,22 @@ Migrated services (every public method takes ctx):
|
||||
- `NotificationService` — all 12 methods
|
||||
- `SubscriptionService` — all 12 methods including Apple/Google IAP
|
||||
|
||||
Sample trace for `GET /api/tasks/`:
|
||||
Sample trace for `GET /api/tasks/` (warm cache, post-optimization):
|
||||
|
||||
```
|
||||
GET /api/tasks/ (229ms)
|
||||
└── service: SELECT * FROM task_task WHERE residence_id IN
|
||||
(SELECT id FROM residence_residence WHERE...) (227ms)
|
||||
```
|
||||
|
||||
Two spans total. The auth path runs entirely from Redis + in-memory
|
||||
cache (zero SQL queries) thanks to the 1-hour token TTL and 5-min user
|
||||
TTL. The residence-ID lookup is folded into the tasks query as a
|
||||
Postgres subquery, so a single network round-trip to Neon services the
|
||||
whole request. See Chapter 8 §"Optimizations layered on top" for the
|
||||
optimization stack.
|
||||
|
||||
Earlier trace, before the optimization stack landed (commit 88fb175):
|
||||
|
||||
```
|
||||
GET /api/tasks/ (2473ms)
|
||||
@@ -258,10 +273,12 @@ GET /api/tasks/ (2473ms)
|
||||
└── service: SELECT * FROM task_task WHERE residence_id IN(...) (226ms)
|
||||
```
|
||||
|
||||
Each query labeled with `db.statement`. The transatlantic Hetzner→Neon
|
||||
RTT (~110ms one-way) shows up clearly — that's the perf bottleneck the
|
||||
flame graph makes obvious. See [Chapter 18 — Cost](./18-cost.md) for
|
||||
the planned move to a US-region cluster.
|
||||
10× improvement from 2,473ms to 229ms by cutting query count
|
||||
(5 SQL → 1 SQL on warm cache). The 227ms in the surviving query is
|
||||
**1 transatlantic round-trip** to Neon us-east-1 from Hetzner
|
||||
Nuremberg — the physical floor on the current setup. Eliminated by
|
||||
migrating Neon to a EU region; tracked in [Chapter 18 §migration
|
||||
triggers](./18-cost.md) and `docs/observability-plan.md`.
|
||||
|
||||
**Migration pattern (for any future services or middleware):** add
|
||||
`ctx context.Context` as the first arg, change the handler call site
|
||||
|
||||
@@ -245,12 +245,58 @@ finds an empty data directory (or can't mount at all).
|
||||
- If the original node is gone: Redis starts empty. Cache regenerates.
|
||||
Asynq queue state is lost; pending jobs re-queue on retry, cron
|
||||
fires re-schedule on next tick.
|
||||
- Auth caches (token + residence-IDs) regenerate on first user
|
||||
request — first request per user pays full DB lookup, then warm
|
||||
again. Visible as a brief latency spike in the Grafana RED
|
||||
dashboard, not a functional failure.
|
||||
- Ensure the node label `honeydue/redis=true` is on a healthy node:
|
||||
```bash
|
||||
kubectl label node <new-node> honeydue/redis=true --overwrite
|
||||
kubectl label node <dead-node> honeydue/redis- 2>/dev/null || true
|
||||
```
|
||||
|
||||
#### Stale residence-IDs cache (data freshness bug)
|
||||
|
||||
**Symptom**: a user accepts a share-code or has a residence
|
||||
removed, but `/api/tasks/`, `/api/documents/`, `/api/contractors/`,
|
||||
or `/api/residences/summary/` continues to show the old
|
||||
membership for up to 5 minutes.
|
||||
**Cause**: a residence-membership-mutating code path landed
|
||||
without calling `cache.InvalidateResidenceIDsForUsers(...)`. The
|
||||
cache TTL is 5 min so the issue self-heals, but it's user-visible.
|
||||
**Recovery (immediate)**: flush the affected user's cache key
|
||||
manually. See [Chapter 17 §residence-IDs cache invalidation](./17-runbook.md).
|
||||
**Prevention (permanent)**: every mutation that changes
|
||||
`residence_residence.owner_id`, `residence_residence_users.user_id`,
|
||||
or deletes a residence MUST invalidate. Existing call sites for
|
||||
reference: `CreateResidence` (owner), `DeleteResidence`
|
||||
(all members), `JoinWithCode` (joining user), `RemoveUser`
|
||||
(removed user). The pattern lives in
|
||||
`internal/services/residence_id_cache.go`.
|
||||
|
||||
#### Redis at maxmemory limit
|
||||
|
||||
**Symptom**: Redis logs `OOM command not allowed when used memory > 'maxmemory'`.
|
||||
Should be rare — current production usage is ~2.4 MB against a 256 MB
|
||||
limit and the policy is `allkeys-lru` (cache writes evict cold keys
|
||||
instead of erroring).
|
||||
**Recovery**: confirm the policy is still `allkeys-lru`:
|
||||
```bash
|
||||
kubectl -n honeydue exec deploy/redis -- redis-cli CONFIG GET maxmemory-policy
|
||||
```
|
||||
If it's somehow `noeviction`, set it live:
|
||||
```bash
|
||||
kubectl -n honeydue exec deploy/redis -- redis-cli CONFIG SET maxmemory-policy allkeys-lru
|
||||
```
|
||||
And re-apply the manifest at `deploy-k3s/manifests/redis/deployment.yaml`
|
||||
so the change survives a pod restart.
|
||||
|
||||
If memory usage is genuinely climbing toward the cap, check for
|
||||
runaway keys without TTLs:
|
||||
```bash
|
||||
kubectl -n honeydue exec deploy/redis -- redis-cli --bigkeys
|
||||
```
|
||||
|
||||
### External service failures
|
||||
|
||||
#### Neon Postgres outage
|
||||
@@ -264,6 +310,23 @@ until Neon is back.
|
||||
Postgres-level failover.
|
||||
**Frequency**: Neon has had a handful of hours-scale outages since launch.
|
||||
|
||||
#### Neon pooler endpoint unreachable but direct endpoint up
|
||||
|
||||
**Symptom**: `dial tcp ep-floral-truth-amttbc5a-pooler.c-5...: i/o
|
||||
timeout` in api logs but the direct compute endpoint is reachable.
|
||||
Rare — Neon's pooler runs in their infra alongside compute — but
|
||||
possible during pooler maintenance.
|
||||
**Recovery (emergency)**: switch `DB_HOST` in `config.yaml` from the
|
||||
`-pooler` to the direct hostname (drop the `-pooler` segment),
|
||||
re-apply ConfigMap, rolling-restart api and worker:
|
||||
```bash
|
||||
# Edit deploy-k3s/config.yaml: database.host: ep-floral-truth-amttbc5a.c-5...
|
||||
# Then:
|
||||
KUBECONFIG=~/.kube/honeydue.yaml bash deploy-k3s/scripts/03-deploy.sh --skip-build
|
||||
```
|
||||
Cold-handshake latency goes back up (~440ms first hit) but the API
|
||||
keeps serving. Switch back when the pooler recovers.
|
||||
|
||||
#### Backblaze B2 outage
|
||||
|
||||
**Symptom**: image uploads fail; image downloads fail unless cached by
|
||||
|
||||
@@ -358,6 +358,81 @@ Workaround: in each pod's logs, search for a unique user identifier:
|
||||
stern -n honeydue api | grep "user_id=12345"
|
||||
```
|
||||
|
||||
## 23. Invalidate residence-IDs cache for a user
|
||||
|
||||
Used when a user reports stale data ("I joined a residence but my
|
||||
tasks list still shows the old one"). The cache is keyed on user ID
|
||||
with 5-min TTL — most issues self-heal — but you can flush manually.
|
||||
|
||||
```bash
|
||||
# Single user
|
||||
kubectl -n honeydue exec deploy/redis -- redis-cli DEL "residence_ids_user:7"
|
||||
|
||||
# All users (nuclear; everyone pays one DB lookup on next request)
|
||||
kubectl -n honeydue exec deploy/redis -- redis-cli --scan --pattern "residence_ids_user:*" \
|
||||
| xargs -r -n 100 kubectl -n honeydue exec deploy/redis -- redis-cli DEL
|
||||
```
|
||||
|
||||
Mutation paths that should invalidate this cache automatically (any
|
||||
new code that changes membership must call
|
||||
`cache.InvalidateResidenceIDsForUsers(ctx, userIDs...)`):
|
||||
|
||||
- `ResidenceService.CreateResidence` → owner
|
||||
- `ResidenceService.DeleteResidence` → all members
|
||||
- `ResidenceService.JoinWithCode` → joining user
|
||||
- `ResidenceService.RemoveUser` → removed user
|
||||
|
||||
If a user keeps reporting stale data, grep for missing invalidation:
|
||||
|
||||
```bash
|
||||
grep -rn "residenceRepo.*Add\|RemoveUser\|residence_residence_users" internal/ \
|
||||
| grep -v cache | grep -v _test
|
||||
```
|
||||
|
||||
## 24. Verify DB pool warm-up is working
|
||||
|
||||
After a deploy, check the api pod log for the warm-up confirmation:
|
||||
|
||||
```bash
|
||||
kubectl -n honeydue logs -l app.kubernetes.io/name=api --tail=50 \
|
||||
| grep "DB pool warm-up complete"
|
||||
```
|
||||
|
||||
Expected output (per pod):
|
||||
|
||||
```json
|
||||
{"level":"info","requested":20,"warmed":20,"message":"DB pool warm-up complete"}
|
||||
```
|
||||
|
||||
If `warmed` < `requested`, the pool partially failed at boot — pod
|
||||
still starts, fills from there. If `warmed=0`, something's wrong with
|
||||
either Neon connectivity or auth — check the next log line for the
|
||||
specific error.
|
||||
|
||||
To test impact: hit the api right after a rollout. With warm-up
|
||||
working, the first request should be ~250ms (1 RTT). Without warm-up,
|
||||
the first request is ~700ms (full handshake).
|
||||
|
||||
## 25. Switch DB host between pooler and direct endpoints
|
||||
|
||||
The pooler endpoint (`-pooler` suffix) is the default — it cuts
|
||||
cold-handshake latency by ~3 RTTs. The direct endpoint
|
||||
(`ep-floral-truth-amttbc5a.c-5...`) is the fallback.
|
||||
|
||||
```bash
|
||||
# Edit deploy-k3s/config.yaml — change database.host
|
||||
# To pooler: ep-floral-truth-amttbc5a-pooler.c-5.us-east-1.aws.neon.tech
|
||||
# To direct: ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech
|
||||
|
||||
KUBECONFIG=~/.kube/honeydue.yaml bash deploy-k3s/scripts/03-deploy.sh --skip-build
|
||||
```
|
||||
|
||||
The pooler runs in transaction mode so any session-scope feature
|
||||
(LISTEN/NOTIFY, session advisory locks for migrations) auto-falls
|
||||
through to direct via `MigrateWithLock` opening its own connection.
|
||||
But if you ever add session-level features in the data path, they'll
|
||||
need the direct endpoint.
|
||||
|
||||
## References
|
||||
|
||||
- [kubectl cheat sheet][kubectl-cs]
|
||||
|
||||
@@ -6,7 +6,9 @@
|
||||
|
||||
**Why not in the honeyDue k3s cluster:** Frees ~700 MB across the 3 Hetzner nodes, no PVC plumbing, and no need to expose anything from k3s — everything is push-from-app to a public TLS endpoint.
|
||||
|
||||
**Status:** Plan only — nothing implemented yet.
|
||||
**Status:** Fully shipped. VictoriaMetrics + Jaeger + Grafana on `obs.88oakapps.com`, vmagent in-cluster, OTel SDK and otelgorm wired into the api+worker, every authed endpoint produces nested HTTP→service→SQL flame graphs in Jaeger.
|
||||
|
||||
The first round of traces revealed every visible ms was network/proxy overhead — DB execution itself is sub-millisecond. The follow-up work (`internal/services/residence_id_cache.go`, GORM pool warm-up, auth-query JOIN consolidation, switching `DB_HOST` to Neon's `-pooler` endpoint, bumped cache TTLs) cut warm-cache `/api/tasks/` from 2,473 ms / 5 spans to **229 ms / 2 spans** — see commit `88fb175` and Chapter 8 §"Optimizations layered on top".
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user