Stack of optimizations against the same Hetzner→Neon transatlantic link.
The trace revealed every visible ms was network/proxy overhead — DB
execution itself is sub-millisecond per query (verified via EXPLAIN
ANALYZE: index scans on every hot path).
Connection layer:
- DB_HOST → Neon pooler endpoint (-pooler suffix). PgBouncer
transaction-mode keeps backend Postgres connections warm so we no
longer pay the ~110ms Postgres-startup RTT on cold queries.
- GORM pool tuned: MaxIdleConns 10→20, MaxLifetime 600s→1800s,
MaxIdleTime added (default 0 = never close idle).
- Eager pool warm-up at boot via parallel pings — first user request
no longer pays the ~440ms TCP+TLS+startup handshake.
- Redis maxmemory-policy noeviction → allkeys-lru. Cache writes will
evict cold keys instead of erroring at the 256MB limit.
Auth layer:
- TokenCacheTTL 5min → 1 hour (Redis token cache).
- UserCacheTTL 30s → 5min (in-memory User cache, per pod).
- UserCache gains a 5,000-entry LRU cap so a flood of unique users
can't blow up pod RSS. ~5MB worst-case per pod.
- Token + user lookup collapsed from 2 GORM Preload queries into a
single INNER JOIN. Saves 1 RTT per cold-cache request.
- Auth middleware's m.db.* now use db.WithContext(ctx) so the SQL
spans nest under the parent HTTP request in Jaeger.
Service layer:
- TaskService.ListTasks: replaced two-step
FindResidenceIDsByUser → GetKanbanDataForMultipleResidences
with a single GetKanbanDataForUser that uses a Postgres subquery
for residence-access. One round-trip instead of two.
- New CacheService residence-IDs cache: \"residence_ids_user:<id>\"
with 5-min TTL. Wired into Task/Residence/Contractor/Document
services for the four hot read paths that need this list.
- Cache invalidation on every relevant mutation: CreateResidence,
DeleteResidence, JoinWithCode, RemoveUser. DeleteResidence
invalidates every member of the residence, not just the owner.
What this stacks up to (Hetzner→Neon, before US migration):
Path Before After (target)
Cache-warm authed read ~800ms ~100-200ms
Cache-cold authed read (1st in 1hr) ~2500ms ~500-700ms
First request after deploy ~2500ms ~700-900ms
The endgame US-region migration on top of this gets us to ~30-50ms
warm-cache, but we're shippable at ~150ms warm right now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 1 — OTel SDK: cmd/api and cmd/worker initialize a tracer provider
that exports OTLP/HTTP to obs.88oakapps.com (Jaeger all-in-one). Sampling
is AlwaysSample in dev (DEBUG=true) and TraceIDRatioBased(0.1) in prod,
overridable via OTEL_TRACES_SAMPLER_ARG. Service names are honeydue-api
and honeydue-worker. otelecho.Middleware opens a span per HTTP request.
Step 2 — Manual spans: storage_service.Upload now takes ctx and emits
storage.upload + b2.PutObject spans (size_bytes, key, mime_type, bucket,
result attrs). APNs Send/SendWithCategory and FCM sendOne emit per-token
spans with topic, status_code, reason. Asynq middleware emits
asynq.handle:<task_type> per job with retry/payload attrs and records
asynq_job_duration_seconds.
Step 3 — Database: otelgorm plugin registered in database.Connect, so
any SQL emitted via db.WithContext(ctx) attaches to the request span.
Every repository now exposes WithContext(ctx) *XRepository as the
migration helper. TaskService.ListTasks and GetTasksByResidence are
migrated end-to-end (ctx threaded through handler → service → repo);
remaining services adopt the same pattern incrementally — pre-migration
methods still emit untraced SQL via the unchanged db field.
OBS_TRACES_URL and OBS_INGEST_TOKEN flow from deploy/prod.env →
honeydue-secrets → api+worker Deployments via secretKeyRef (optional).
02-setup-secrets.sh sources them from prod.env on next run; manifests
mark both env vars optional so the deployment rolls without traces if
the secret is absent.
ch15 observability doc now lists what produces spans today vs the
remaining migration work, with the explicit per-method pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds internal/prom package with histograms for HTTP, GORM, B2, APNs, and
FCM, wired into the Echo router (HTTPMiddleware + /metrics) and GORM via
statement-level callbacks (no ctx plumbing needed). Storage and push
clients call ObserveB2Upload / ObserveAPNsSend / ObserveFCMSend at the
network round-trip points.
Existing internal/monitoring metrics move to /metrics/legacy so the
canonical /metrics emits proper histogram buckets for p50/p95/p99 rollups.
deploy-k3s/manifests/observability/vmagent.yaml deploys a single-replica
vmagent in the honeydue namespace that scrapes api Pods on :8000/metrics
every 15s and remote-writes to https://obs.88oakapps.com/api/v1/write
with a bearer token (substituted at deploy time from OBS_INGEST_TOKEN
in deploy/prod.env). NetworkPolicies allow vmagent egress to api Pods
and to the public obs endpoint over :443; the obs side runs
VictoriaMetrics + Jaeger + Grafana on 88oakappsUpdate.
docs/observability-plan.md captures the full plan including resource
budget, instrumentation table, 4-step rollout, and migration triggers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swarm stack
- Resource limits on all services, stop_grace_period 60s on api/worker/admin
- Dozzle bound to manager loopback only (ssh -L required for access)
- Worker health server on :6060, admin /api/health endpoint
- Redis 200M LRU cap, B2/S3 env vars wired through to api service
Deploy script
- DRY_RUN=1 prints plan + exits
- Auto-rollback on failed healthcheck, docker logout at end
- Versioned-secret pruning keeps last SECRET_KEEP_VERSIONS (default 3)
- PUSH_LATEST_TAG default flipped to false
- B2 all-or-none validation before deploy
Code
- cmd/api takes pg_advisory_lock on a dedicated connection before
AutoMigrate, serialising boot-time migrations across replicas
- cmd/worker exposes an HTTP /health endpoint with graceful shutdown
Docs
- deploy/DEPLOYING.md: step-by-step walkthrough for a real deploy
- deploy/shit_deploy_cant_do.md: manual prerequisites + recurring ops
- deploy/README.md updated with storage toggle, worker-replica caveat,
multi-arch recipe, connection-pool tuning, renumbered sections
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a new endpoint GET /api/tasks/templates/by-region/?zip= that resolves
ZIP codes to IECC climate regions and returns relevant home maintenance
task templates. Includes climate region model, region lookup service with
tests, seed data for all 8 climate zones with 50+ templates, and OpenAPI spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add TaskReminderLog to AutoMigrate (table was missing)
- Remove HandleTaskReminder and HandleOverdueReminder registrations
- Register HandleSmartReminder which uses frequency-aware scheduling
and tracks sent reminders to prevent duplicates
The old handlers sent ALL overdue tasks daily regardless of age.
Smart reminder tapers off (daily for 3 days, then every 3 days, stops at 14).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Google OAuth token verification and user lookup/creation
- Add GoogleAuthRequest and GoogleAuthResponse DTOs
- Add GoogleLogin handler in auth_handler.go
- Add google_auth.go service for token verification
- Add FindByGoogleID repository method for user lookup
- Add GoogleID field to User model
- Add Google OAuth configuration (client ID, enabled flag)
- Add i18n translations for Google auth error messages
- Add Google verification email template support
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove task_statuses lookup table and StatusID foreign key
- Add InProgress boolean field to Task model
- Add database migration (005_replace_status_with_in_progress)
- Update all handlers, services, and repositories
- Update admin frontend to display in_progress as checkbox/boolean
- Remove Task Statuses tab from admin lookups page
- Update tests to use InProgress instead of StatusID
- Task categorization now uses InProgress for kanban column assignment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implements automated onboarding emails to encourage user engagement:
- Post-verification welcome email with 5 tips (sent after email verification)
- "No Residence" email (2+ days after registration with no property)
- "No Tasks" email (5+ days after first residence with no tasks)
Key features:
- Each onboarding email type sent only once per user (enforced by unique constraint)
- Email open tracking via tracking pixel endpoint
- Daily scheduled job at 10:00 AM UTC to process eligible users
- Admin panel UI for viewing sent emails, stats, and manual sending
- Admin can send any email type to users from the user detail Testing section
New files:
- internal/models/onboarding_email.go - Database model with tracking
- internal/services/onboarding_email_service.go - Business logic and eligibility queries
- internal/handlers/tracking_handler.go - Email open tracking endpoint
- internal/admin/handlers/onboarding_handler.go - Admin API endpoints
- admin/src/app/(dashboard)/onboarding-emails/ - Admin UI pages
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add TaskTemplate model with category and frequency support
- Add task template repository with CRUD and search operations
- Add task template service layer
- Add public API endpoints for templates (no auth required):
- GET /api/tasks/templates/ - list all templates
- GET /api/tasks/templates/grouped/ - templates grouped by category
- GET /api/tasks/templates/search/?q= - search templates
- GET /api/tasks/templates/by-category/:id/ - templates by category
- GET /api/tasks/templates/:id/ - single template
- Add admin panel for task template management (CRUD)
- Add admin API endpoints for templates
- Add seed file with predefined task templates
- Add i18n translations for template errors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add AppleSocialAuth model to store Apple ID linkages
- Create AppleAuthService for JWT verification with Apple's public keys
- Add AppleSignIn handler and route (POST /auth/apple-sign-in/)
- Implement account linking (links Apple ID to existing accounts by email)
- Add Redis caching for Apple public keys (24-hour TTL)
- Support private relay emails (@privaterelay.appleid.com)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add TaskCompletionImage and DocumentImage models with one-to-many relationships
- Update admin panel to display images for completions and documents
- Add image arrays to API request/response DTOs
- Update repositories with Preload("Images") for eager loading
- Fix seed SQL execution to use raw SQL instead of prepared statements
- Fix table names in seed file (admin_users, push_notifications_*)
- Add comprehensive seed test data with 34 completion images and 24 document images
- Add subscription limitations admin feature with toggle
- Update admin sidebar with limitations link
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Features:
- PDF service for generating task reports with ReportLab-style formatting
- Storage service for file uploads (local and S3-compatible)
- Admin authentication middleware with JWT support
- Admin user model and repository
Infrastructure:
- Updated Docker configuration for admin panel builds
- Email service enhancements for task notifications
- Updated router with admin and file upload routes
- Environment configuration updates
Tests:
- Unit tests for handlers (auth, residence, task)
- Unit tests for models (user, residence, task)
- Unit tests for repositories (user, residence, task)
- Unit tests for services (residence, task)
- Integration test setup
- Test utilities for mocking database and services
Database:
- Admin user seed data
- Updated test data seeds
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix admin login bcrypt hash in database migrations
- Add static data handler (GET /api/static_data/, POST /api/static_data/refresh/)
- Add user handler (list users, get user, list profiles in shared residences)
- Add generate tasks report endpoint for residences
- Remove all placeholder handlers from router
- Add seeding documentation to README
New files:
- internal/handlers/static_data_handler.go
- internal/handlers/user_handler.go
- internal/services/user_service.go
- internal/dto/responses/user.go
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Admin panel accessible at /admin on the same domain
- GoAdmin tables created automatically during startup migrations
- Default credentials: admin / admin
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Complete rewrite of Django REST API to Go with:
- Gin web framework for HTTP routing
- GORM for database operations
- GoAdmin for admin panel
- Gorush integration for push notifications
- Redis for caching and job queues
Features implemented:
- User authentication (login, register, logout, password reset)
- Residence management (CRUD, sharing, share codes)
- Task management (CRUD, kanban board, completions)
- Contractor management (CRUD, specialties)
- Document management (CRUD, warranties)
- Notifications (preferences, push notifications)
- Subscription management (tiers, limits)
Infrastructure:
- Docker Compose for local development
- Database migrations and seed data
- Admin panel for data management
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>