Cache SubscriptionSettings + cut monitoring poll noise

Trace data revealed subscription_subscriptionsettings was consuming
1,983s of cumulative DB time per day (180× more than the next-largest
table) for a 32-byte singleton row of admin-toggleable global flags.
The root cause was a 30-second poll loop in monitoring.Service on every
pod, plus uncached reads on every authenticated status check,
CreateResidence call, and Stripe webhook. The fix is layered:

1. Redis cache for SubscriptionSettings — same shape as the
   residence-IDs cache. 30-min TTL, explicit invalidation on admin
   write. New CacheService.{Cache,GetCached,Invalidate}SubscriptionSettings
   plus a cachedSubscriptionSettings helper in services/.

2. SubscriptionService, StripeService, and both admin handlers
   (settings + limitations) now read through the cache. Admin write
   handlers invalidate the cached entry, so toggles propagate
   cluster-wide within milliseconds instead of waiting for the TTL.

3. monitoring.Service.syncSettingsFromDB also reads from Redis first
   (raw redis.Client to avoid a services→monitoring import cycle).
   Polling interval bumped 30s → 5min. Combined with Redis-shared
   cache, cluster-wide DB hits from this poll go from ~480/hour to
   ~2/hour — a 240× reduction.

4. StripeService.CreateCheckoutSession now takes ctx so the cached
   settings span (and the Stripe webhook trace) stay attached to the
   request. Handler call site updated.

5. Admin handlers' direct h.db.First calls switched to
   db.WithContext(ctx) so the previously orphaned SQL spans now nest
   under the admin request span in Jaeger.

Net DB query rate for subscription_subscriptionsettings should drop
from 0.101/sec to ~0/sec with occasional invalidation-driven refills,
and the table's cumulative DB time from 1,983s/day to ~10s/day.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trey t
2026-04-26 21:29:30 -05:00
parent c9ac273dbd
commit b67f7f9e6b
10 changed files with 240 additions and 32 deletions
+14 -5
@@ -1,6 +1,7 @@
 package services
 import (
+	"context"
 	"encoding/json"
 	"fmt"
 	"time"
@@ -24,6 +25,12 @@ type StripeService struct {
 	subscriptionRepo *repositories.SubscriptionRepository
 	userRepo *repositories.UserRepository
 	webhookSecret string
+	cache *CacheService
 }
+// SetCacheService wires Redis caching for SubscriptionSettings reads.
+func (s *StripeService) SetCacheService(cache *CacheService) {
+	s.cache = cache
+}
 // NewStripeService creates a new Stripe service. It initializes the global
@@ -58,7 +65,7 @@ func NewStripeService(
 // CreateCheckoutSession creates a Stripe Checkout Session for a web subscription purchase.
 // It ensures the user has a Stripe customer record and configures the session with a trial
 // period if the user has not used their trial yet.
-func (s *StripeService) CreateCheckoutSession(userID uint, priceID string, successURL string, cancelURL string) (string, error) {
+func (s *StripeService) CreateCheckoutSession(ctx context.Context, userID uint, priceID string, successURL string, cancelURL string) (string, error) {
 	// Get or create the user's subscription record
 	sub, err := s.subscriptionRepo.GetOrCreate(userID)
 	if err != nil {
@@ -94,7 +101,7 @@ func (s *StripeService) CreateCheckoutSession(userID uint, priceID string, succe
 	// Offer a trial period if the user has not used their trial yet
 	if !sub.TrialUsed {
-		trialDays, err := s.getTrialDays()
+		trialDays, err := s.getTrialDays(ctx)
 		if err != nil {
 			log.Warn().Err(err).Msg("Failed to get trial duration from settings, skipping trial")
 		} else if trialDays > 0 {
@@ -444,9 +451,11 @@ func (s *StripeService) findSubscriptionByStripeID(stripeSubID string) (*models.
 	return sub, nil
 }
-// getTrialDays reads the trial duration from SubscriptionSettings.
-func (s *StripeService) getTrialDays() (int, error) {
-	settings, err := s.subscriptionRepo.GetSettings()
+// getTrialDays reads the trial duration from SubscriptionSettings via the
+// shared cache. ctx threads through so the SQL span (on cache miss) attaches
+// to the parent webhook trace.
+func (s *StripeService) getTrialDays(ctx context.Context) (int, error) {
+	settings, err := cachedSubscriptionSettings(ctx, s.cache, s.subscriptionRepo)
 	if err != nil {
 		return 0, err
 	}