GET /api/subscription/status/ was the slowest endpoint in the API at
p50≈1750ms / p95≈2425ms — about 12× the floor for our cluster→Neon
geography. Jaeger traces showed seven sequential SQL queries each
costing roughly one transatlantic RTT (~110ms), with the actual queries
running in 0.073ms at the database. Pure network serialization, not slow
SQL.
Three changes, in order of leverage:
1. Cache the assembled SubscriptionStatusResponse per-user in Redis with
a 5-minute TTL. Hot path collapses to a single Redis GET (~5ms) on
warm reads; the TTL is a safety net against missed invalidations.
2. Parallelize the three independent COUNT queries in getUserUsage
(task_task / task_contractor / task_document) via golang.org/x/sync
errgroup. Three RTTs collapse to one. Also dropped the redundant
residence_residence COUNT — len(residenceIDs) from FindResidenceIDsByOwner
is the same number, no need to re-query.
3. Wire explicit invalidation into every mutation that could change a
user's response — residence/task/contractor/document CRUD,
residence membership changes (JoinWithCode, RemoveUser, DeleteResidence),
and every subscription tier flip across the IAP/Stripe/webhook surface.
Residence-scoped invalidations fan out to every user with access via a
new ResidenceRepository.FindUserIDsByResidence helper, so members of a
shared residence don't see stale `usage` numbers when another member
adds a task.
Net effect: warm path goes from ~1350ms to ~5ms (Redis hit). Cold path
goes from ~1350ms to ~250-450ms (5 sequential queries → 2 phases:
residence IDs lookup, then parallel task/contractor/document counts).
Also fixed a pre-existing CheckLimit signature drift in
internal/integration/subscription_is_free_test.go that was blocking the
package build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stack of optimizations against the same Hetzner→Neon transatlantic link.
The trace revealed every visible ms was network/proxy overhead — DB
execution itself is sub-millisecond per query (verified via EXPLAIN
ANALYZE: index scans on every hot path).
Connection layer:
- DB_HOST → Neon pooler endpoint (-pooler suffix). PgBouncer
transaction-mode keeps backend Postgres connections warm so we no
longer pay the ~110ms Postgres-startup RTT on cold queries.
- GORM pool tuned: MaxIdleConns 10→20, MaxLifetime 600s→1800s,
MaxIdleTime added (default 0 = never close idle).
- Eager pool warm-up at boot via parallel pings — first user request
no longer pays the ~440ms TCP+TLS+startup handshake.
- Redis maxmemory-policy noeviction → allkeys-lru. Cache writes will
evict cold keys instead of erroring at the 256MB limit.
Auth layer:
- TokenCacheTTL 5min → 1 hour (Redis token cache).
- UserCacheTTL 30s → 5min (in-memory User cache, per pod).
- UserCache gains a 5,000-entry LRU cap so a flood of unique users
can't blow up pod RSS. ~5MB worst-case per pod.
- Token + user lookup collapsed from 2 GORM Preload queries into a
single INNER JOIN. Saves 1 RTT per cold-cache request.
- Auth middleware's m.db.* now use db.WithContext(ctx) so the SQL
spans nest under the parent HTTP request in Jaeger.
Service layer:
- TaskService.ListTasks: replaced two-step
FindResidenceIDsByUser → GetKanbanDataForMultipleResidences
with a single GetKanbanDataForUser that uses a Postgres subquery
for residence-access. One round-trip instead of two.
- New CacheService residence-IDs cache: \"residence_ids_user:<id>\"
with 5-min TTL. Wired into Task/Residence/Contractor/Document
services for the four hot read paths that need this list.
- Cache invalidation on every relevant mutation: CreateResidence,
DeleteResidence, JoinWithCode, RemoveUser. DeleteResidence
invalidates every member of the residence, not just the owner.
What this stacks up to (Hetzner→Neon, before US migration):
Path Before After (target)
Cache-warm authed read ~800ms ~100-200ms
Cache-cold authed read (1st in 1hr) ~2500ms ~500-700ms
First request after deploy ~2500ms ~700-900ms
The endgame US-region migration on top of this gets us to ~30-50ms
warm-cache, but we're shippable at ~150ms warm right now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every public method on these five services now takes ctx context.Context as
the first arg and routes its repo calls through .WithContext(ctx). With
TaskService and ResidenceService already migrated, this means every
in-process service that touches Postgres now produces a flame graph in
Jaeger where the SQL spans nest under the parent HTTP request span.
Endpoints now fully traced (HTTP → service → SQL):
- /api/auth/login, /register, /logout, /me, /verify-email, /resend-verification
- /api/auth/forgot-password, /verify-reset, /reset-password, /update-profile
- /api/contractors/* (CRUD + favorite + by-residence + tasks)
- /api/documents/* (CRUD + activate/deactivate + image upload/delete)
- /api/notifications/* (list, count, mark-read, prefs, devices)
- /api/subscription/* (status, purchase, cancel, triggers, promotions)
- All previously-migrated /api/tasks/* and /api/residences/* paths
Internal helpers also threaded:
- TaskService.sendTaskCompletedNotification → forwards ctx
- TaskService.UpdateUserTimezone → forwards ctx to NotificationService
- ResidenceService.CreateResidence → forwards ctx to SubscriptionService.CheckLimit
- NotificationService.registerAPNSDevice / registerGCMDevice → both take ctx
~75 method signatures, ~120 handler/test call sites updated. Tests pass
green; the only failure is the pre-existing flaky TaskHandler_QuickComplete
SQLite race that fails ~60% of runs on master.
Step 3 of the observability plan is now genuinely complete: every API
endpoint backed by a Go service emits a per-request flame graph with
HTTP → service → SQL spans, plus B2/APNs/FCM/asynq spans where applicable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Database Indexes (migrations 006-009):
- Add case-insensitive indexes for auth lookups (email, username)
- Add composite indexes for task kanban queries
- Add indexes for notification, document, and completion queries
- Add unique index for active share codes
- Remove redundant idx_share_code_active and idx_notification_user_sent
Repository Optimizations:
- Add FindResidenceIDsByUser() lightweight method (IDs only, no preloads)
- Optimize GetResidenceUsers() with single UNION query (was 2 queries)
- Optimize kanban completion preloads to minimal columns (id, task_id, completed_at)
Service Optimizations:
- Remove Category/Priority/Frequency preloads from task queries
- Remove summary calculations from CRUD responses (client calculates)
- Use lightweight FindResidenceIDsByUser() instead of full FindByUser()
These changes reduce database load and response times for common operations.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add GET /contractors/by-residence/:residence_id/ endpoint
- Create comprehensive Bruno API collection (89 endpoints)
- Collection covers all API endpoints with Local and Dev environments
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Send welcome email to new users who sign up via Apple Sign In
- Create notification preferences (all enabled) when new accounts are created
- Add comprehensive integration tests for contractor sharing:
- Personal contractors only visible to creator
- Residence-tied contractors visible to all users with residence access
- Update/delete access control for shared contractors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Make residence_id nullable in contractor model
- Add created_by_id field to track contractor creator
- Update access control: personal contractors visible only to creator,
residence contractors visible to all residence users
- Add database migration for schema changes
- Update admin panel DTOs and handlers for optional residence
- Fix test utilities for new model structure
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Features:
- PDF service for generating task reports with ReportLab-style formatting
- Storage service for file uploads (local and S3-compatible)
- Admin authentication middleware with JWT support
- Admin user model and repository
Infrastructure:
- Updated Docker configuration for admin panel builds
- Email service enhancements for task notifications
- Updated router with admin and file upload routes
- Environment configuration updates
Tests:
- Unit tests for handlers (auth, residence, task)
- Unit tests for models (user, residence, task)
- Unit tests for repositories (user, residence, task)
- Unit tests for services (residence, task)
- Integration test setup
- Test utilities for mocking database and services
Database:
- Admin user seed data
- Updated test data seeds
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Complete rewrite of Django REST API to Go with:
- Gin web framework for HTTP routing
- GORM for database operations
- GoAdmin for admin panel
- Gorush integration for push notifications
- Redis for caching and job queues
Features implemented:
- User authentication (login, register, logout, password reset)
- Residence management (CRUD, sharing, share codes)
- Task management (CRUD, kanban board, completions)
- Contractor management (CRUD, specialties)
- Document management (CRUD, warranties)
- Notifications (preferences, push notifications)
- Subscription management (tiers, limits)
Infrastructure:
- Docker Compose for local development
- Database migrations and seed data
- Admin panel for data management
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>