perf(subscription-status): cache + parallelize + invalidate on mutations

GET /api/subscription/status/ was the slowest endpoint in the API at
p50≈1750ms / p95≈2425ms — about 12× the floor for our cluster→Neon
geography. Jaeger traces showed seven sequential SQL queries each
costing roughly one transatlantic RTT (~110ms), with the actual queries
running in 0.073ms at the database. Pure network serialization, not slow
SQL.

Three changes, in order of leverage:

1. Cache the assembled SubscriptionStatusResponse per-user in Redis with
   a 5-minute TTL. Hot path collapses to a single Redis GET (~5ms) on
   warm reads; the TTL is a safety net against missed invalidations.

2. Parallelize the three independent COUNT queries in getUserUsage
   (task_task / task_contractor / task_document) via golang.org/x/sync
   errgroup. Three RTTs collapse to one. Also dropped the redundant
   residence_residence COUNT — len(residenceIDs) from FindResidenceIDsByOwner
   is the same number, no need to re-query.

3. Wire explicit invalidation into every mutation that could change a
   user's response — residence/task/contractor/document CRUD,
   residence membership changes (JoinWithCode, RemoveUser, DeleteResidence),
   and every subscription tier flip across the IAP/Stripe/webhook surface.
   Residence-scoped invalidations fan out to every user with access via a
   new ResidenceRepository.FindUserIDsByResidence helper, so members of a
   shared residence don't see stale `usage` numbers when another member
   adds a task.
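
The fan-out in change 3 can be sketched roughly as follows. This is an illustration only: the `members` map is a hypothetical stand-in for whatever `ResidenceRepository.FindUserIDsByResidence` returns (its real signature is not shown in this commit), and `keysToInvalidate` is an invented helper name. Only the `sub_status:user:` key prefix comes from the diff.

```go
package main

import (
	"fmt"
	"sort"
)

// members maps a residence ID to the user IDs with access to it.
// Hypothetical stand-in for ResidenceRepository.FindUserIDsByResidence.
type members map[uint][]uint

// keysToInvalidate returns the deduplicated, sorted cache keys for every
// user with access to any of the given residences.
func keysToInvalidate(m members, residenceIDs ...uint) []string {
	seen := map[uint]struct{}{}
	for _, rid := range residenceIDs {
		for _, uid := range m[rid] {
			seen[uid] = struct{}{}
		}
	}
	ids := make([]uint, 0, len(seen))
	for uid := range seen {
		ids = append(ids, uid)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	keys := make([]string, len(ids))
	for i, uid := range ids {
		keys[i] = fmt.Sprintf("sub_status:user:%d", uid)
	}
	return keys
}

func main() {
	m := members{10: {1, 2}, 11: {2, 3}}
	// A task added to residence 10 should invalidate users 1 and 2.
	fmt.Println(keysToInvalidate(m, 10))
}
```

Deduplicating before building keys matters when one mutation touches several residences that share members, since each user's key only needs to be dropped once.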

Net effect: warm path goes from ~1350ms to ~5ms (Redis hit). Cold path
goes from ~1350ms to ~250-450ms (5 sequential queries → 2 phases:
residence IDs lookup, then parallel task/contractor/document counts).

Also fixed a pre-existing CheckLimit signature drift in
internal/integration/subscription_is_free_test.go that was blocking the
package build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-05-01 11:00:23 -07:00
parent 0798ae8d74
commit 9bee436e86
11 changed files with 286 additions and 34 deletions
@@ -495,3 +495,64 @@ func (c *CacheService) InvalidateSubscriptionSettings(ctx context.Context) error
	}
	return c.Delete(ctx, subscriptionSettingsKey)
}

// === SubscriptionStatus cache (per-user) ===
//
// SubscriptionStatusResponse aggregates subscription tier, all tier limits, and
// per-user usage counts (residences/tasks/contractors/documents). The usage
// part requires 4+ COUNT queries against the transatlantic Neon Postgres at
// ~110ms RTT each — about a second of wall-clock per call before parallelism.
// Caching the assembled response collapses that to a single Redis GET (~5ms).
//
// TTL is short (5 min) so stale state self-heals if any mutation path forgets
// to invalidate. The primary correctness mechanism is explicit invalidation
// via InvalidateSubscriptionStatusForUsers — called from every CRUD on
// residences, tasks, contractors, documents, and subscription itself, fanning
// out to every user with access to the affected residence.
const (
	subscriptionStatusKeyPrefix = "sub_status:user:"
	subscriptionStatusTTL       = 5 * time.Minute
)

// CacheSubscriptionStatus stores the assembled SubscriptionStatusResponse for
// a user. Caller passes any encodable value to keep this package free of
// service-layer types; subscription_service.go marshals/unmarshals.
// Best-effort: Redis errors are returned to the caller, which should treat
// them as non-fatal.
func (c *CacheService) CacheSubscriptionStatus(ctx context.Context, userID uint, status interface{}) error {
	if c == nil {
		return nil
	}
	key := fmt.Sprintf("%s%d", subscriptionStatusKeyPrefix, userID)
	data, err := json.Marshal(status)
	if err != nil {
		return err
	}
	return c.client.Set(ctx, key, data, subscriptionStatusTTL).Err()
}

// GetCachedSubscriptionStatus unmarshals the cached response into dest.
// Returns redis.Nil on cache miss so callers can distinguish a miss from
// genuine errors. A nil CacheService returns a plain "cache not available"
// error instead.
func (c *CacheService) GetCachedSubscriptionStatus(ctx context.Context, userID uint, dest interface{}) error {
	if c == nil {
		return fmt.Errorf("cache not available")
	}
	key := fmt.Sprintf("%s%d", subscriptionStatusKeyPrefix, userID)
	return c.Get(ctx, key, dest)
}

// InvalidateSubscriptionStatusForUsers drops the cached status for one or more
// users. Used by every mutation that could change a user's usage counts —
// residence create/delete/share, task/contractor/document CRUD, subscription
// purchase/cancel/restore. Membership-changing residence ops fan out to every
// user with access to that residence.
func (c *CacheService) InvalidateSubscriptionStatusForUsers(ctx context.Context, userIDs ...uint) error {
	if c == nil || len(userIDs) == 0 {
		return nil
	}
	keys := make([]string, len(userIDs))
	for i, id := range userIDs {
		keys[i] = fmt.Sprintf("%s%d", subscriptionStatusKeyPrefix, id)
	}
	return c.Delete(ctx, keys...)
}
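
How a caller might combine the three methods above into a read-through path is sketched below, under loud assumptions: `memCache` is an in-memory stand-in for the Redis-backed `CacheService` (no TTL, not safe for concurrent use), `errMiss` plays the role of `redis.Nil`, and `status`, `getStatus`, and the `load` callback are hypothetical names rather than the service-layer code from this commit.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errMiss stands in for redis.Nil in this sketch.
var errMiss = errors.New("cache miss")

// memCache is an in-memory stand-in for the Redis-backed CacheService.
type memCache struct{ data map[string][]byte }

func cacheKey(userID uint) string { return fmt.Sprintf("sub_status:user:%d", userID) }

func (m *memCache) set(userID uint, v interface{}) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	m.data[cacheKey(userID)] = b
	return nil
}

func (m *memCache) get(userID uint, dest interface{}) error {
	b, ok := m.data[cacheKey(userID)]
	if !ok {
		return errMiss
	}
	return json.Unmarshal(b, dest)
}

func (m *memCache) invalidate(userIDs ...uint) {
	for _, id := range userIDs {
		delete(m.data, cacheKey(id))
	}
}

// status is a trimmed, hypothetical version of SubscriptionStatusResponse.
type status struct {
	Tier  string `json:"tier"`
	Tasks int    `json:"tasks"`
}

// getStatus returns the cached value on a hit. On a miss or any cache error
// it falls back to load and stores the result best-effort.
func getStatus(c *memCache, userID uint, load func(uint) (status, error)) (status, error) {
	var s status
	if err := c.get(userID, &s); err == nil {
		return s, nil
	}
	s, err := load(userID)
	if err != nil {
		return status{}, err
	}
	_ = c.set(userID, s) // a failed cache write must not fail the request
	return s, nil
}

func main() {
	c := &memCache{data: map[string][]byte{}}
	loads := 0
	load := func(uint) (status, error) { loads++; return status{Tier: "pro", Tasks: 4}, nil }

	getStatus(c, 7, load) // miss: runs load and fills the cache
	getStatus(c, 7, load) // hit: served from the cache
	c.invalidate(7)       // a mutation drops the entry
	getStatus(c, 7, load) // miss again
	fmt.Println("loads:", loads)
}
```

Treating every cache error as a miss keeps the endpoint available when the cache is down, at the cost of falling back to the slower database path.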