docs: capture latency optimizations + new caching invariants
Shipping commit 88fb175 changed the trace shape and added a new caching
layer with required invalidation rules. Updating the operator-facing
docs so they match the running system.
ch08 (database):
- DB_HOST is the -pooler Neon endpoint, not direct compute
- Connection pool: MaxIdleConns 20 (was 10), MaxLifetime 30m (was 10m),
MaxIdleTime 0 (never close idle)
- New "Pool warm-up at boot" section documenting the 20-parallel-ping
warm-up in database.Connect
- Replaced the "Neon regions" section: explicit RTT numbers, the
optimization stack that minimizes round-trips, when this still matters
ch15 (observability):
- Replaced the 2,473ms/5-span sample trace with the new 229ms/2-span
post-optimization trace; kept the old one underneath for diff context
ch16 (failure modes):
- Added: stale residence-IDs cache (data freshness bug + recovery)
- Added: Redis at maxmemory limit (verify allkeys-lru policy)
- Added: Neon pooler unreachable but direct endpoint up — emergency
switchover procedure
ch17 (runbook):
- §23 Invalidate residence-IDs cache for a user (DEL key + grep for
missing invalidation in new code)
- §24 Verify DB pool warm-up is working (log pattern + impact test)
- §25 Switch DB host between pooler and direct endpoints
observability-plan.md status flipped from "plan only" to shipped
with the latency-cut summary.
README links to the new ch08 latency section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -358,6 +358,81 @@ Workaround: in each pod's logs, search for a unique user identifier:
stern -n honeydue api | grep "user_id=12345"
```

## 23. Invalidate residence-IDs cache for a user

Use this when a user reports stale data ("I joined a residence but my
tasks list still shows the old one"). The cache is keyed on user ID
with a 5-minute TTL, so most issues self-heal, but you can flush it
manually.

```bash
# Single user
kubectl -n honeydue exec deploy/redis -- redis-cli DEL "residence_ids_user:7"

# All users (nuclear; everyone pays one DB lookup on next request)
kubectl -n honeydue exec deploy/redis -- redis-cli --scan --pattern "residence_ids_user:*" \
  | xargs -r -n 100 kubectl -n honeydue exec deploy/redis -- redis-cli DEL
```

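When several users report stale data at once, the single-user command can be wrapped in small helpers. The following is a minimal sketch, not part of the shipped tooling: the function names are hypothetical, and it assumes user IDs are plain integers, as in the `residence_ids_user:7` example.

```bash
# Sketch only: helper names are hypothetical. The key format and the
# kubectl target are copied from the commands in this section.
residence_ids_key() {
  # Assumption: user IDs are plain integers; reject anything else.
  case "$1" in
    ''|*[!0-9]*) echo "invalid user id: $1" >&2; return 1 ;;
  esac
  printf 'residence_ids_user:%s\n' "$1"
}

# Delete the cache entry for each user ID passed as an argument.
flush_residence_ids() {
  local uid key
  for uid in "$@"; do
    key=$(residence_ids_key "$uid") || return 1
    kubectl -n honeydue exec deploy/redis -- redis-cli DEL "$key"
  done
}

# Print the remaining TTL in seconds (-2 means the key does not exist).
residence_ids_ttl() {
  local key
  key=$(residence_ids_key "$1") || return 1
  kubectl -n honeydue exec deploy/redis -- redis-cli TTL "$key"
}
```

For example, `flush_residence_ids 7 12345` flushes two users, and `residence_ids_ttl 7` shows how long the entry would take to expire on its own.
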
Mutation paths that should invalidate this cache automatically (any
new code that changes membership must call
`cache.InvalidateResidenceIDsForUsers(ctx, userIDs...)`):

- `ResidenceService.CreateResidence` → owner
- `ResidenceService.DeleteResidence` → all members
- `ResidenceService.JoinWithCode` → joining user
- `ResidenceService.RemoveUser` → removed user

If a user keeps reporting stale data, grep for missing invalidation:

```bash
grep -rn "residenceRepo.*Add\|RemoveUser\|residence_residence_users" internal/ \
  | grep -v cache | grep -v _test
```

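The grep above works line by line. A file-level variant can be easier to triage: it lists files that touch membership but never call the invalidation helper. This is a heuristic sketch under stated assumptions: the function name and the `*.go` filter are not from the repository, and repository-layer files that legitimately leave invalidation to the service layer will show up as false positives.

```bash
# Heuristic sketch: list non-test Go files under a directory that touch
# residence membership but never call the invalidation helper.
# The function name and the *.go filter are assumptions.
files_missing_invalidation() {
  local dir="${1:-internal/}"
  grep -rlE --include='*.go' \
    'residenceRepo.*Add|RemoveUser|residence_residence_users' "$dir" \
    | grep -v '_test\.go$' \
    | while IFS= read -r file; do
        # Print the file only if the invalidation call never appears in it.
        grep -q 'InvalidateResidenceIDsForUsers' "$file" || printf '%s\n' "$file"
      done
}
```

Run it as `files_missing_invalidation internal/` and review each listed file by hand.
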
## 24. Verify DB pool warm-up is working

After a deploy, check the api pod log for the warm-up confirmation:

```bash
kubectl -n honeydue logs -l app.kubernetes.io/name=api --tail=50 \
  | grep "DB pool warm-up complete"
```

Expected output (per pod):

```json
{"level":"info","requested":20,"warmed":20,"message":"DB pool warm-up complete"}
```

If `warmed` is less than `requested`, some warm-up connections failed
at boot. The pod still starts and the pool fills up from there. If
`warmed` is 0, something is wrong with either Neon connectivity or
auth; check the next log line for the specific error.

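The three cases above can be checked mechanically. The sketch below classifies one warm-up log line read from stdin; the function name and the `ok`/`partial`/`failed` labels are illustrative, and it assumes the log line has the JSON shape shown in the expected output.

```bash
# Sketch: classify one "DB pool warm-up complete" log line from stdin.
# Prints ok | partial | failed | unparsed. Labels are illustrative.
warmup_status() {
  local line requested warmed
  IFS= read -r line || [ -n "$line" ] || { echo unparsed; return 1; }
  # Pull the two integer fields out of the JSON line without needing jq.
  requested=$(printf '%s\n' "$line" | sed -n 's/.*"requested":\([0-9][0-9]*\).*/\1/p')
  warmed=$(printf '%s\n' "$line" | sed -n 's/.*"warmed":\([0-9][0-9]*\).*/\1/p')
  if [ -z "$requested" ] || [ -z "$warmed" ]; then
    echo unparsed
    return 1
  fi
  if [ "$warmed" -eq 0 ]; then
    echo failed      # Neon connectivity or auth problem
  elif [ "$warmed" -lt "$requested" ]; then
    echo partial     # pod still starts; pool fills from there
  else
    echo ok
  fi
}
```

Pipe a single matching log line into it, for example by appending `| head -n 1 | warmup_status` to the log command in this section.
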
To test the impact, hit the api right after a rollout. With warm-up
working, the first request should take ~250ms (1 RTT). Without
warm-up, the first request takes ~700ms (full handshake).

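One way to take that measurement is with curl's `time_total` write-out variable. This is a sketch with hypothetical helper names; the URL is supplied by the caller because this section does not name an endpoint to probe.

```bash
# Convert curl's time_total (seconds, e.g. 0.229) to whole milliseconds.
seconds_to_ms() {
  awk -v s="$1" 'BEGIN { printf "%d\n", s * 1000 + 0.5 }'
}

# Time a single request and print the total in milliseconds.
# The URL comes from the caller; no endpoint path is assumed here.
first_request_ms() {
  seconds_to_ms "$(curl -o /dev/null -s -w '%{time_total}' "$1")"
}
```

A result near 250 suggests the warm-up worked; a result near 700 suggests the request paid for a full handshake. Only the first request after the rollout is meaningful, so run it once per rollout rather than in a loop.
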
## 25. Switch DB host between pooler and direct endpoints

The pooler endpoint (`-pooler` suffix) is the default because it cuts
cold-handshake latency by ~3 RTTs. The direct endpoint
(`ep-floral-truth-amttbc5a.c-5...`) is the fallback.

```bash
# Edit deploy-k3s/config.yaml and change database.host
# To pooler: ep-floral-truth-amttbc5a-pooler.c-5.us-east-1.aws.neon.tech
# To direct: ep-floral-truth-amttbc5a.c-5.us-east-1.aws.neon.tech

KUBECONFIG=~/.kube/honeydue.yaml bash deploy-k3s/scripts/03-deploy.sh --skip-build
```

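The two hostnames differ only by the `-pooler` suffix on the first DNS label, so one can be derived from the other instead of retyped during an incident. A minimal sketch, with hypothetical function names, assuming the host contains at least one dot:

```bash
# Sketch: derive one endpoint hostname from the other.
# Assumption: the host has at least one dot and the suffix, if present,
# sits at the end of the first DNS label.
to_pooler_host() {
  local first="${1%%.*}" rest="${1#*.}"
  case "$first" in
    *-pooler) printf '%s\n' "$1" ;;                       # already the pooler
    *)        printf '%s-pooler.%s\n' "$first" "$rest" ;;
  esac
}

to_direct_host() {
  local first="${1%%.*}" rest="${1#*.}"
  # Strip a trailing -pooler from the first label; a no-op if absent.
  printf '%s.%s\n' "${first%-pooler}" "$rest"
}
```

Both functions are idempotent, so applying `to_direct_host` to a host that is already direct returns it unchanged.
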
The pooler runs in transaction mode, so any session-scoped feature
(LISTEN/NOTIFY, session advisory locks for migrations) falls through
to the direct endpoint automatically, via `MigrateWithLock` opening
its own connection. However, if you ever add session-level features in
the data path, they will need the direct endpoint.

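A quick scan can flag candidate session-level usage before it reaches the pooler. The sketch below is a heuristic with a hypothetical function name: it matches only the two feature families named above, looks only at non-test `*.go` files, and will also report the migration code that already opens its own connection.

```bash
# Heuristic sketch: find session-scoped Postgres features in Go sources.
# Matches session advisory locks and LISTEN statements. The
# transaction-scoped pg_advisory_xact_lock is deliberately not matched.
# Returns non-zero when nothing is found (the exit status of grep).
session_scope_hits() {
  local dir="${1:-internal/}"
  grep -rnE --include='*.go' \
    'pg_(try_)?advisory_lock(_shared)?\(|LISTEN[[:space:]]' "$dir" \
    | grep -v '_test\.go:'
}
```

Each hit prints as `path:line:text`. Anything outside the migration path is a candidate for the direct endpoint.
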
## References

- [kubectl cheat sheet][kubectl-cs]