Stack of optimizations against the same Hetzner→Neon transatlantic link.
The trace revealed every visible ms was network/proxy overhead — DB
execution itself is sub-millisecond per query (verified via EXPLAIN
ANALYZE: index scans on every hot path).
Connection layer:
- DB_HOST → Neon pooler endpoint (-pooler suffix). PgBouncer
transaction-mode keeps backend Postgres connections warm so we no
longer pay the ~110ms Postgres-startup RTT on cold queries.
- GORM pool tuned: MaxIdleConns 10→20, MaxLifetime 600s→1800s,
MaxIdleTime added (default 0 = never close idle).
- Eager pool warm-up at boot via parallel pings — first user request
no longer pays the ~440ms TCP+TLS+startup handshake.
- Redis maxmemory-policy noeviction → allkeys-lru. Cache writes will
evict cold keys instead of erroring at the 256MB limit.
Auth layer:
- TokenCacheTTL 5min → 1 hour (Redis token cache).
- UserCacheTTL 30s → 5min (in-memory User cache, per pod).
- UserCache gains a 5,000-entry LRU cap so a flood of unique users
can't blow up pod RSS. ~5MB worst-case per pod.
- Token + user lookup collapsed from 2 GORM Preload queries into a
single INNER JOIN. Saves 1 RTT per cold-cache request.
- Auth middleware's m.db.* now use db.WithContext(ctx) so the SQL
spans nest under the parent HTTP request in Jaeger.
Service layer:
- TaskService.ListTasks: replaced two-step
FindResidenceIDsByUser → GetKanbanDataForMultipleResidences
with a single GetKanbanDataForUser that uses a Postgres subquery
for residence-access. One round-trip instead of two.
- New CacheService residence-IDs cache: \"residence_ids_user:<id>\"
with 5-min TTL. Wired into Task/Residence/Contractor/Document
services for the four hot read paths that need this list.
- Cache invalidation on every relevant mutation: CreateResidence,
DeleteResidence, JoinWithCode, RemoveUser. DeleteResidence
invalidates every member of the residence, not just the owner.
What this stacks up to (Hetzner→Neon, before US migration):
Path Before After (target)
Cache-warm authed read ~800ms ~100-200ms
Cache-cold authed read (1st in 1hr) ~2500ms ~500-700ms
First request after deploy ~2500ms ~700-900ms
The endgame US-region migration on top of this gets us to ~30-50ms
warm-cache, but we're shippable at ~150ms warm right now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>