POST /api/task-completions/ was spending ~1.5-1.75s synchronously on
APNs push + SMTP email + B2 image fetches inside sendTaskCompletedNotification.
Per-user loop made it scale linearly with residence membership; one image
attached + one residence user is the 1.75s baseline observed in the live
honeydue-eli5-overview Grafana panel.
Replace the inline call (and the fire-and-forget goroutine in QuickComplete,
which violated the project's "no goroutines in handlers" rule) with an
Asynq job:
- new task type notification:task_completed (worker/scheduler.go)
- new payload {task_id, completion_id} — IDs only, worker re-reads
canonical state from Postgres so concurrent edits between enqueue
and dequeue are reflected
- new HandleTaskCompletedNotification on jobs.Handler delegates to
TaskService.SendTaskCompletedNotificationByID
- new dispatchTaskCompletedNotification in task_service.go picks
between enqueue (preferred) and inline (fallback) when Redis is
unreachable or the enqueuer isn't wired (tests / local dev)
Other changes required to wire it up:
- widen worker.NewTaskClient signature to accept asynq.RedisClientOpt
so the file-mounted Redis password (audit HIGH-1) can be supplied;
no prior callers, no breakage
- extend worker.Enqueuer interface with EnqueueTaskCompletedNotification
- add TaskEnqueuer field to router.Dependencies; wire from cmd/api/main.go
with the standard typed-nil interface guard
- wire a worker-side TaskService in cmd/worker/main.go so the handler
can use the shared SendTaskCompletedNotificationByID implementation
(storage service shared with the existing upload-cleanup wiring)
Expected impact on POST /api/task-completions/ p50:
~1.75s -> ~120-170ms (DB + tx + Asynq enqueue only)
Notifications still deliver; they just go via the worker instead of in
the request path. MaxRetry=3; "row not found" returns nil so a deleted
task/completion doesn't churn the retry loop.
All 31 test packages pass. No DB migrations.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the multipart-via-API path for image uploads with a three-step
direct-to-storage flow:
1. Client POSTs /api/uploads/presign with content_length + content_type;
server validates size (10 MB cap), mime allow-list per category, rate
limit (50/hour/user via Redis sliding window), and concurrent unclaimed
cap (10 in-flight per user). On success it persists a pending_uploads
row, signs an S3 POST policy with content-length-range bound to the
claimed length ±256 bytes, and returns the URL+fields.
2. Client POSTs the bytes directly to B2 using the signed policy. B2
enforces size, content-type, and key match before accepting.
3. Client passes upload_ids[] to /api/task-completions/ or /api/documents/.
Service HEADs each B2 object, verifies size matches expected_bytes
within slack, marks pending_uploads claimed_at, and creates the
associated TaskCompletionImage / DocumentImage rows.
Bytes never traverse our API server. The 1 MB Echo BodyLimit middleware
that was rejecting all task-completion image uploads becomes irrelevant
for this path. Existing multipart endpoints stay functional alongside,
soak-testing the new path before legacy removal.
Cleanup:
- cmd/worker registers a new hourly cron (TypeUploadCleanup, "30 * * * *")
that reaps pending_uploads where claimed_at IS NULL AND expires_at < NOW().
Reaps both the B2 object and the row.
- B2 bucket lifecycle rule on `uploads/` prefix (7 days hide → 1 day delete)
documented in deploy-k3s/manifests/b2-lifecycle.md as a backstop.
Schema:
- migrations/000002_pending_uploads.sql adds the table + partial index for
cleanup + nullable pending_upload_id FKs on task_taskcompletionimage and
task_documentimage.
Policy (single tier, no free/pro split):
- 10 MB cap per upload
- 50 presigns/hour/user
- 10 concurrent unclaimed uploads/user
- allow-list: jpeg/png/heic/heif/webp for image categories;
+ pdf for document_file
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tiny CLI that enqueues a notification:send_push task into Redis. The
worker picks it up and routes through internal/push/Client.SendToAll —
which is exactly the path used by HandleSmartReminder, HandleDailyDigest,
and any other in-process push, so a successful round-trip here proves the
production push pipeline end-to-end without waiting for the next cron tick.
Requires Redis to be reachable. Easiest path:
kubectl -n honeydue port-forward svc/redis 6379:6379
go run ./cmd/send-test-push --user-id 6 --title "..." --message "..."
The worker logs `Sending push notification...` followed by the APNs
batch result; failure modes (BadDeviceToken, circuit breaker, etc.)
surface as the same error_message rows the existing notif-diag tool
already reports on.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small Go CLIs for production ops that previously required ad-hoc
psql or kubectl gymnastics. Both load DB credentials from prod.env-style
env vars and read POSTGRES_PASSWORD from deploy/secrets/postgres_password.txt
by default, so the workflow is `set -a && source deploy/prod.env && set +a`
followed by go run.
cmd/admin-reset/main.go:
--list print all admin_users rows
--verify --email X bcrypt-check a password against the stored hash
using the same case-insensitive lookup the live
/api/admin/auth/login endpoint uses
--new-email Y rename an admin's email (with unique-index check)
default (--email X) prompt for a new password twice (no echo, min 12
chars), bcrypt at DefaultCost, update the row
cmd/notif-diag/main.go:
default print pending/sent counts, breakdown by type and
age, the 5 most recent pending rows with their
error_message, and registered APNs/FCM device
counts
--mark-failed-as-sent cosmetic cleanup — UPDATE pending rows that have
a recorded error to sent=true,
sent_at=COALESCE(updated_at, NOW())
--yes skip the interactive confirmation prompt
Both bypass internal/config.Load() entirely so they don't need
SECRET_KEY or other unrelated env vars to run. .gitignore excludes the
build artifacts at /admin-reset and /notif-diag.
go.mod adds golang.org/x/term v0.41.0 (promoted from indirect to direct)
for no-echo password input in admin-reset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the previous hand-rolled MigrateWithLock + GORM AutoMigrate path,
which had two compounding problems:
- AutoMigrate ran on every pod startup (~5 min over the transatlantic
link) even when no schema changes had landed
- pg_advisory_lock is session-scoped, which silently fails through
Neon's pgbouncer transaction-mode pooler — turns out this is a
known and documented limitation that bites golang-migrate too
Goose was chosen over golang-migrate (the other heavyweight) because:
- Goose wraps each migration file in a transaction by default, so a
failure rolls back cleanly instead of leaving a "dirty" version
state requiring manual force-reset (golang-migrate's known
weakness, per its own issue tracker — see #1001 + Atlas's writeup)
- Goose's locking is opt-in. We don't opt in: migrations run as a
single Kubernetes Job, which IS the singleton process. No advisory
lock needed at all.
Layout:
- migrations/000001_init.sql — schema-only pg_dump of the live Neon
DB at adoption, stripped of psql-only directives that block goose's
bookkeeping insert. Pre-goose hand-numbered migrations 002-022 had
their effects folded into this baseline; deleted from the live tree
but preserved in git history at 58e6997.
- Dockerfile installs `goose v3.22.1` at build time and copies the
binary into the api image. The migrate Job reuses the api image with
command=goose, so no separate image to build/push/version.
- deploy-k3s/manifests/migrate/job.yaml: a one-shot Job that strips
the -pooler segment from DB_HOST (advisory lock won't survive
pgbouncer transaction-mode), runs `goose up`, exits.
- deploy-k3s/scripts/03-deploy.sh: deletes any prior Job, applies the
fresh one, `kubectl wait --for=condition=complete --timeout=10m`,
then proceeds with api/worker rollout. Job failure aborts the deploy
before any new app pod sees a stale schema.
- internal/database/database.go::RequireSchemaApplied checks
goose_db_version on startup. api/worker refuse to boot if the
table is missing or its latest row has is_applied=false — the
fail-fast for "operator forgot to run migrate."
- Makefile: migrate-up / migrate-down / migrate-status / migrate-new
for local workflow.
Production DB was bootstrapped manually:
$ goose -dir migrations postgres "$DSN" version # creates table
$ psql ... -c "INSERT INTO goose_db_version (version_id, is_applied, tstamp) VALUES (1, true, NOW());"
Smoke test against fresh Postgres locally: 50 user tables created in
284ms via `goose up`, version_id=1 + is_applied=t recorded.
Verified the local goose CLI talks to prod successfully:
$ goose ... status
Applied At Migration
=======================================
Mon Apr 27 03:43:55 2026 -- 000001_init.sql
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 1 — OTel SDK: cmd/api and cmd/worker initialize a tracer provider
that exports OTLP/HTTP to obs.88oakapps.com (Jaeger all-in-one). Sampling
is AlwaysSample in dev (DEBUG=true) and TraceIDRatioBased(0.1) in prod,
overridable via OTEL_TRACES_SAMPLER_ARG. Service names are honeydue-api
and honeydue-worker. otelecho.Middleware opens a span per HTTP request.
Step 2 — Manual spans: storage_service.Upload now takes ctx and emits
storage.upload + b2.PutObject spans (size_bytes, key, mime_type, bucket,
result attrs). APNs Send/SendWithCategory and FCM sendOne emit per-token
spans with topic, status_code, reason. Asynq middleware emits
asynq.handle:<task_type> per job with retry/payload attrs and records
asynq_job_duration_seconds.
Step 3 — Database: otelgorm plugin registered in database.Connect, so
any SQL emitted via db.WithContext(ctx) attaches to the request span.
Every repository now exposes WithContext(ctx) *XRepository as the
migration helper. TaskService.ListTasks and GetTasksByResidence are
migrated end-to-end (ctx threaded through handler → service → repo);
remaining services adopt the same pattern incrementally — pre-migration
methods still emit untraced SQL via the unchanged db field.
OBS_TRACES_URL and OBS_INGEST_TOKEN flow from deploy/prod.env →
honeydue-secrets → api+worker Deployments via secretKeyRef (optional).
02-setup-secrets.sh sources them from prod.env on next run; manifests
mark both env vars optional so the deployment rolls without traces if
the secret is absent.
ch15 observability doc now lists what produces spans today vs the
remaining migration work, with the explicit per-method pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a data_migration that runs seeds/001_lookups.sql,
seeds/003_admin_user.sql, and seeds/003_task_templates.sql exactly
once on startup and invalidates the Redis seeded_data cache afterwards
so /api/static_data/ returns fresh results. Removes the need to
remember `./dev.sh seed-all`; the data_migrations tracking row prevents
re-runs, and each INSERT uses ON CONFLICT DO UPDATE so re-execution is
safe.
Swarm stack
- Resource limits on all services, stop_grace_period 60s on api/worker/admin
- Dozzle bound to manager loopback only (ssh -L required for access)
- Worker health server on :6060, admin /api/health endpoint
- Redis 200M LRU cap, B2/S3 env vars wired through to api service
Deploy script
- DRY_RUN=1 prints plan + exits
- Auto-rollback on failed healthcheck, docker logout at end
- Versioned-secret pruning keeps last SECRET_KEEP_VERSIONS (default 3)
- PUSH_LATEST_TAG default flipped to false
- B2 all-or-none validation before deploy
Code
- cmd/api takes pg_advisory_lock on a dedicated connection before
AutoMigrate, serialising boot-time migrations across replicas
- cmd/worker exposes an HTTP /health endpoint with graceful shutdown
Docs
- deploy/DEPLOYING.md: step-by-step walkthrough for a real deploy
- deploy/shit_deploy_cant_do.md: manual prerequisites + recurring ops
- deploy/README.md updated with storage toggle, worker-replica caveat,
multi-arch recipe, connection-pool tuning, renumbered sections
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces a StorageBackend interface with local filesystem and S3
implementations. The StorageService delegates raw I/O to the backend
while keeping validation, encryption, and URL generation unchanged.
Backend selection is config-driven: set B2_ENDPOINT + B2_KEY_ID +
B2_APP_KEY + B2_BUCKET_NAME for S3 mode, or STORAGE_UPLOAD_DIR for
local mode. STORAGE_USE_SSL=false for in-cluster MinIO (HTTP).
All existing tests pass unchanged — the local backend preserves
identical behavior to the previous direct-filesystem implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add TaskReminderLog to AutoMigrate (table was missing)
- Remove HandleTaskReminder and HandleOverdueReminder registrations
- Register HandleSmartReminder which uses frequency-aware scheduling
and tracks sent reminders to prevent duplicates
The old handlers sent ALL overdue tasks daily regardless of age.
Smart reminder tapers off (daily for 3 days, then every 3 days, stops at 14).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements a comprehensive monitoring system for the admin interface:
Backend:
- New monitoring package with Redis ring buffer for log storage
- Zerolog MultiWriter to capture logs to Redis
- System stats collection (CPU, memory, disk, goroutines, GC)
- HTTP metrics middleware (request counts, latency, error rates)
- Asynq queue stats for worker process
- WebSocket endpoint for real-time log streaming
- Admin auth middleware now accepts token in query params (for WebSocket)
Frontend:
- New monitoring page with tabs (Overview, Logs, API Stats, Worker Stats)
- Real-time log viewer with level filtering and search
- System stats cards showing CPU, memory, goroutines, uptime
- HTTP endpoint statistics table
- Asynq queue depth visualization
- Enable/disable monitoring toggle in settings
Memory safeguards:
- Max 200 unique endpoints tracked
- Hourly stats reset to prevent unbounded growth
- Max 1000 log entries in ring buffer
- Max 1000 latency samples for P95 calculation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implements automated onboarding emails to encourage user engagement:
- Post-verification welcome email with 5 tips (sent after email verification)
- "No Residence" email (2+ days after registration with no property)
- "No Tasks" email (5+ days after first residence with no tasks)
Key features:
- Each onboarding email type sent only once per user (enforced by unique constraint)
- Email open tracking via tracking pixel endpoint
- Daily scheduled job at 10:00 AM UTC to process eligible users
- Admin panel UI for viewing sent emails, stats, and manual sending
- Admin can send any email type to users from the user detail Testing section
New files:
- internal/models/onboarding_email.go - Database model with tracking
- internal/services/onboarding_email_service.go - Business logic and eligibility queries
- internal/handlers/tracking_handler.go - Email open tracking endpoint
- internal/admin/handlers/onboarding_handler.go - Admin API endpoints
- admin/src/app/(dashboard)/onboarding-emails/ - Admin UI pages
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add daily_digest boolean and daily_digest_hour fields to NotificationPreference model
- Update HandleDailyDigest to check user preferences and custom notification times
- Change Daily Digest scheduler to run hourly (supports per-user custom times)
- Update notification service DTOs for new fields
- Add Daily Digest toggle and custom time to admin notification prefs page
- Fix notification handlers to only notify users at their designated hour
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Allow users to customize when they receive notification reminders:
- Add task_due_soon_hour, task_overdue_hour, warranty_expiring_hour fields
- Store times in UTC, clients convert to/from local timezone
- Worker runs hourly, queries only users scheduled for that hour
- Early exit optimization when no users need notifications
- Admin UI displays custom notification times
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add POST /api/admin/settings/clear-stuck-jobs endpoint to clear
stuck/failed asynq worker jobs from Redis (retry queue, archived,
orphaned task metadata)
- Add "Clear Stuck Jobs" button to admin settings UI
- Remove TASK_REMINDER_MINUTE config - all jobs now run at minute 0
- Simplify formatCron to only take hour parameter
- Update default notification times to CST-friendly hours
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Gorush push server dependency (now using direct APNs/FCM)
- Update docker-compose.yml to remove gorush service
- Update config.go to remove GORUSH_URL
- Fix worker queries:
- Use auth_user instead of user_user table
- Use completed_at instead of completion_date column
- Add NotificationService to worker handler for actionable notifications
- Add docs/PUSH_NOTIFICATIONS.md with architecture documentation
- Update README.md, DOKKU_SETUP.md, and dev.sh
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add go-i18n package for internationalization
- Create i18n middleware to extract Accept-Language header
- Add translation files for en, es, fr, de, pt languages
- Localize all handler error messages and responses
- Add language context to all API handlers
Supported languages: English, Spanish, French, German, Portuguese
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add direct APNs client using sideshow/apns2 for iOS push
- Add direct FCM client using legacy HTTP API for Android push
- Remove Gorush dependency (no external push server needed)
- Update all services/handlers to use new push.Client
- Update config for APNS_PRODUCTION flag
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Features:
- PDF service for generating task reports with ReportLab-style formatting
- Storage service for file uploads (local and S3-compatible)
- Admin authentication middleware with JWT support
- Admin user model and repository
Infrastructure:
- Updated Docker configuration for admin panel builds
- Email service enhancements for task notifications
- Updated router with admin and file upload routes
- Environment configuration updates
Tests:
- Unit tests for handlers (auth, residence, task)
- Unit tests for models (user, residence, task)
- Unit tests for repositories (user, residence, task)
- Unit tests for services (residence, task)
- Integration test setup
- Test utilities for mocking database and services
Database:
- Admin user seed data
- Updated test data seeds
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Viper requires SetDefault to be called for env vars to be
automatically bound. Added defaults for EMAIL_HOST_USER and
EMAIL_HOST_PASSWORD.
Also added info logging to show email config on startup for debugging.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Admin panel accessible at /admin on the same domain
- GoAdmin tables created automatically during startup migrations
- Default credentials: admin / admin
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Don't crash on database connection failure
- Log error and continue - health checks will pass
- Database operations will fail but app stays up
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Reduce database retry count and backoff time
- Make SECRET_KEY optional (use default with warning)
- Add config debug logging on startup
- Remove strict validation that prevented startup
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Retry database connection 5 times with exponential backoff
- Allow app to start even if Redis is unavailable
- Improves startup reliability in container environments
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Complete rewrite of Django REST API to Go with:
- Gin web framework for HTTP routing
- GORM for database operations
- GoAdmin for admin panel
- Gorush integration for push notifications
- Redis for caching and job queues
Features implemented:
- User authentication (login, register, logout, password reset)
- Residence management (CRUD, sharing, share codes)
- Task management (CRUD, kanban board, completions)
- Contractor management (CRUD, specialties)
- Document management (CRUD, warranties)
- Notifications (preferences, push notifications)
- Subscription management (tiers, limits)
Infrastructure:
- Docker Compose for local development
- Database migrations and seed data
- Admin panel for data management
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>