BE-3 observability: expose the worker's Prometheus metrics on :6060/metrics
(apns/fcm/asynq histograms + a new cache_ops_total counter were recorded all
along but never scraped — which is why those dashboard panels read empty); add
the worker containerPort, the vmagent worker scrape job, and two additive
NetworkPolicies. Instrument cache Get/Set hit/miss.
BE-2 retention: three periodic Asynq cleanup crons mirroring the reminder-log
cleanup — notifications (90d), webhook dedup log (180d), audit_log (365d).
BE-1 GDPR data export: POST /api/auth/export/ enqueues a low-priority Asynq job
that gathers all of the user's data (owned residences + their tasks/contractors/
documents/share-codes, plus profile/notifications/prefs/push-tokens/subscription/
audit log), zips one JSON file per category, and emails it as an attachment.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
POST /api/task-completions/ was spending ~1.5-1.75s synchronously on
APNs push + SMTP email + B2 image fetches inside sendTaskCompletedNotification.
Per-user loop made it scale linearly with residence membership; one image
attached + one residence user is the 1.75s baseline observed in the live
honeydue-eli5-overview Grafana panel.
Replace the inline call (and the fire-and-forget goroutine in QuickComplete,
which violated the project's "no goroutines in handlers" rule) with an
Asynq job:
- new task type notification:task_completed (worker/scheduler.go)
- new payload {task_id, completion_id} — IDs only, worker re-reads
canonical state from Postgres so concurrent edits between enqueue
and dequeue are reflected
- new HandleTaskCompletedNotification on jobs.Handler delegates to
TaskService.SendTaskCompletedNotificationByID
- new dispatchTaskCompletedNotification in task_service.go picks
between enqueue (preferred) and inline (fallback) when Redis is
unreachable or the enqueuer isn't wired (tests / local dev)
Other changes required to wire it up:
- widen worker.NewTaskClient signature to accept asynq.RedisClientOpt
so the file-mounted Redis password (audit HIGH-1) can be supplied;
no prior callers, no breakage
- extend worker.Enqueuer interface with EnqueueTaskCompletedNotification
- add TaskEnqueuer field to router.Dependencies; wire from cmd/api/main.go
with the standard typed-nil interface guard
- wire a worker-side TaskService in cmd/worker/main.go so the handler
can use the shared SendTaskCompletedNotificationByID implementation
(storage service shared with the existing upload-cleanup wiring)
Expected impact on POST /api/task-completions/ p50:
~1.75s -> ~120-170ms (DB + tx + Asynq enqueue only)
Notifications still deliver; they just go via the worker instead of in
the request path. MaxRetry=3; "row not found" returns nil so a deleted
task/completion doesn't churn the retry loop.
All 31 test packages pass. No DB migrations.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the multipart-via-API path for image uploads with a three-step
direct-to-storage flow:
1. Client POSTs /api/uploads/presign with content_length + content_type;
server validates size (10 MB cap), mime allow-list per category, rate
limit (50/hour/user via Redis sliding window), and concurrent unclaimed
cap (10 in-flight per user). On success it persists a pending_uploads
row, signs an S3 POST policy with content-length-range bound to the
claimed length ±256 bytes, and returns the URL+fields.
2. Client POSTs the bytes directly to B2 using the signed policy. B2
enforces size, content-type, and key match before accepting.
3. Client passes upload_ids[] to /api/task-completions/ or /api/documents/.
Service HEADs each B2 object, verifies size matches expected_bytes
within slack, marks pending_uploads claimed_at, and creates the
associated TaskCompletionImage / DocumentImage rows.
Bytes never traverse our API server. The 1 MB Echo BodyLimit middleware
that was rejecting all task-completion image uploads becomes irrelevant
for this path. Existing multipart endpoints stay functional alongside,
soak-testing the new path before legacy removal.
Cleanup:
- cmd/worker registers a new hourly cron (TypeUploadCleanup, "30 * * * *")
that reaps pending_uploads where claimed_at IS NULL AND expires_at < NOW().
Reaps both the B2 object and the row.
- B2 bucket lifecycle rule on `uploads/` prefix (7 days hide → 1 day delete)
documented in deploy-k3s/manifests/b2-lifecycle.md as a backstop.
Schema:
- migrations/000002_pending_uploads.sql adds the table + partial index for
cleanup + nullable pending_upload_id FKs on task_taskcompletionimage and
task_documentimage.
Policy (single tier, no free/pro split):
- 10 MB cap per upload
- 50 presigns/hour/user
- 10 concurrent unclaimed uploads/user
- allow-list: jpeg/png/heic/heif/webp for image categories;
+ pdf for document_file
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 1 — OTel SDK: cmd/api and cmd/worker initialize a tracer provider
that exports OTLP/HTTP to obs.88oakapps.com (Jaeger all-in-one). Sampling
is AlwaysSample in dev (DEBUG=true) and TraceIDRatioBased(0.1) in prod,
overridable via OTEL_TRACES_SAMPLER_ARG. Service names are honeydue-api
and honeydue-worker. otelecho.Middleware opens a span per HTTP request.
Step 2 — Manual spans: storage_service.Upload now takes ctx and emits
storage.upload + b2.PutObject spans (size_bytes, key, mime_type, bucket,
result attrs). APNs Send/SendWithCategory and FCM sendOne emit per-token
spans with topic, status_code, reason. Asynq middleware emits
asynq.handle:<task_type> per job with retry/payload attrs and records
asynq_job_duration_seconds.
Step 3 — Database: otelgorm plugin registered in database.Connect, so
any SQL emitted via db.WithContext(ctx) attaches to the request span.
Every repository now exposes WithContext(ctx) *XRepository as the
migration helper. TaskService.ListTasks and GetTasksByResidence are
migrated end-to-end (ctx threaded through handler → service → repo);
remaining services adopt the same pattern incrementally — pre-migration
methods still emit untraced SQL via the unchanged db field.
OBS_TRACES_URL and OBS_INGEST_TOKEN flow from deploy/prod.env →
honeydue-secrets → api+worker Deployments via secretKeyRef (optional).
02-setup-secrets.sh sources them from prod.env on next run; manifests
mark both env vars optional so the deployment rolls without traces if
the secret is absent.
ch15 observability doc now lists what produces spans today vs the
remaining migration work, with the explicit per-method pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swarm stack
- Resource limits on all services, stop_grace_period 60s on api/worker/admin
- Dozzle bound to manager loopback only (ssh -L required for access)
- Worker health server on :6060, admin /api/health endpoint
- Redis 200M LRU cap, B2/S3 env vars wired through to api service
Deploy script
- DRY_RUN=1 prints plan + exits
- Auto-rollback on failed healthcheck, docker logout at end
- Versioned-secret pruning keeps last SECRET_KEEP_VERSIONS (default 3)
- PUSH_LATEST_TAG default flipped to false
- B2 all-or-none validation before deploy
Code
- cmd/api takes pg_advisory_lock on a dedicated connection before
AutoMigrate, serialising boot-time migrations across replicas
- cmd/worker exposes an HTTP /health endpoint with graceful shutdown
Docs
- deploy/DEPLOYING.md: step-by-step walkthrough for a real deploy
- deploy/shit_deploy_cant_do.md: manual prerequisites + recurring ops
- deploy/README.md updated with storage toggle, worker-replica caveat,
multi-arch recipe, connection-pool tuning, renumbered sections
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add TaskReminderLog to AutoMigrate (table was missing)
- Remove HandleTaskReminder and HandleOverdueReminder registrations
- Register HandleSmartReminder which uses frequency-aware scheduling
and tracks sent reminders to prevent duplicates
The old handlers sent ALL overdue tasks daily regardless of age.
Smart reminder tapers off (daily for 3 days, then every 3 days, stops at 14).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements a comprehensive monitoring system for the admin interface:
Backend:
- New monitoring package with Redis ring buffer for log storage
- Zerolog MultiWriter to capture logs to Redis
- System stats collection (CPU, memory, disk, goroutines, GC)
- HTTP metrics middleware (request counts, latency, error rates)
- Asynq queue stats for worker process
- WebSocket endpoint for real-time log streaming
- Admin auth middleware now accepts token in query params (for WebSocket)
Frontend:
- New monitoring page with tabs (Overview, Logs, API Stats, Worker Stats)
- Real-time log viewer with level filtering and search
- System stats cards showing CPU, memory, goroutines, uptime
- HTTP endpoint statistics table
- Asynq queue depth visualization
- Enable/disable monitoring toggle in settings
Memory safeguards:
- Max 200 unique endpoints tracked
- Hourly stats reset to prevent unbounded growth
- Max 1000 log entries in ring buffer
- Max 1000 latency samples for P95 calculation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implements automated onboarding emails to encourage user engagement:
- Post-verification welcome email with 5 tips (sent after email verification)
- "No Residence" email (2+ days after registration with no property)
- "No Tasks" email (5+ days after first residence with no tasks)
Key features:
- Each onboarding email type sent only once per user (enforced by unique constraint)
- Email open tracking via tracking pixel endpoint
- Daily scheduled job at 10:00 AM UTC to process eligible users
- Admin panel UI for viewing sent emails, stats, and manual sending
- Admin can send any email type to users from the user detail Testing section
New files:
- internal/models/onboarding_email.go - Database model with tracking
- internal/services/onboarding_email_service.go - Business logic and eligibility queries
- internal/handlers/tracking_handler.go - Email open tracking endpoint
- internal/admin/handlers/onboarding_handler.go - Admin API endpoints
- admin/src/app/(dashboard)/onboarding-emails/ - Admin UI pages
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add daily_digest boolean and daily_digest_hour fields to NotificationPreference model
- Update HandleDailyDigest to check user preferences and custom notification times
- Change Daily Digest scheduler to run hourly (supports per-user custom times)
- Update notification service DTOs for new fields
- Add Daily Digest toggle and custom time to admin notification prefs page
- Fix notification handlers to only notify users at their designated hour
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Allow users to customize when they receive notification reminders:
- Add task_due_soon_hour, task_overdue_hour, warranty_expiring_hour fields
- Store times in UTC, clients convert to/from local timezone
- Worker runs hourly, queries only users scheduled for that hour
- Early exit optimization when no users need notifications
- Admin UI displays custom notification times
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add POST /api/admin/settings/clear-stuck-jobs endpoint to clear
stuck/failed asynq worker jobs from Redis (retry queue, archived,
orphaned task metadata)
- Add "Clear Stuck Jobs" button to admin settings UI
- Remove TASK_REMINDER_MINUTE config - all jobs now run at minute 0
- Simplify formatCron to only take hour parameter
- Update default notification times to CST-friendly hours
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Gorush push server dependency (now using direct APNs/FCM)
- Update docker-compose.yml to remove gorush service
- Update config.go to remove GORUSH_URL
- Fix worker queries:
- Use auth_user instead of user_user table
- Use completed_at instead of completion_date column
- Add NotificationService to worker handler for actionable notifications
- Add docs/PUSH_NOTIFICATIONS.md with architecture documentation
- Update README.md, DOKKU_SETUP.md, and dev.sh
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add direct APNs client using sideshow/apns2 for iOS push
- Add direct FCM client using legacy HTTP API for Android push
- Remove Gorush dependency (no external push server needed)
- Update all services/handlers to use new push.Client
- Update config for APNS_PRODUCTION flag
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Complete rewrite of Django REST API to Go with:
- Gin web framework for HTTP routing
- GORM for database operations
- GoAdmin for admin panel
- Gorush integration for push notifications
- Redis for caching and job queues
Features implemented:
- User authentication (login, register, logout, password reset)
- Residence management (CRUD, sharing, share codes)
- Task management (CRUD, kanban board, completions)
- Contractor management (CRUD, specialties)
- Document management (CRUD, warranties)
- Notifications (preferences, push notifications)
- Subscription management (tiers, limits)
Infrastructure:
- Docker Compose for local development
- Database migrations and seed data
- Admin panel for data management
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>