honeyDueAPI

Author	SHA1	Message	Date
Trey t	bc3da007db	Wire OpenTelemetry tracing — HTTP, B2, APNs, FCM, asynq, GORM (partial) Step 1 — OTel SDK: cmd/api and cmd/worker initialize a tracer provider that exports OTLP/HTTP to obs.88oakapps.com (Jaeger all-in-one). Sampling is AlwaysSample in dev (DEBUG=true) and TraceIDRatioBased(0.1) in prod, overridable via OTEL_TRACES_SAMPLER_ARG. Service names are honeydue-api and honeydue-worker. otelecho.Middleware opens a span per HTTP request. Step 2 — Manual spans: storage_service.Upload now takes ctx and emits storage.upload + b2.PutObject spans (size_bytes, key, mime_type, bucket, result attrs). APNs Send/SendWithCategory and FCM sendOne emit per-token spans with topic, status_code, reason. Asynq middleware emits asynq.handle:<task_type> per job with retry/payload attrs and records asynq_job_duration_seconds. Step 3 — Database: otelgorm plugin registered in database.Connect, so any SQL emitted via db.WithContext(ctx) attaches to the request span. Every repository now exposes WithContext(ctx) *XRepository as the migration helper. TaskService.ListTasks and GetTasksByResidence are migrated end-to-end (ctx threaded through handler → service → repo); remaining services adopt the same pattern incrementally — pre-migration methods still emit untraced SQL via the unchanged db field. OBS_TRACES_URL and OBS_INGEST_TOKEN flow from deploy/prod.env → honeydue-secrets → api+worker Deployments via secretKeyRef (optional). 02-setup-secrets.sh sources them from prod.env on next run; manifests mark both env vars optional so the deployment rolls without traces if the secret is absent. 
ch15 observability doc now lists what produces spans today vs the remaining migration work, with the explicit per-method pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 15:28:05 -05:00
Trey t	77cfcc0b27	docs: rewrite ch15 observability + cross-refs for the live obs stack ch15 is now an account of what's actually running, not a roadmap for what we'd add: VictoriaMetrics + Jaeger + Grafana on 88oakappsUpdate fronted by Cloudflare and bearer-gated nginx, vmagent in-cluster, the internal/prom histogram set, the rollout's NetworkPolicy footprint, the obs.88oakapps.com endpoint shape, the ~$0/700MB resource budget, and a token-rotation runbook. The "what we still don't have" section keeps log aggregation, alerting, and full distributed tracing as the honest gap list. Other touched docs: - 00-overview: "deliberately absent" no longer claims we have no metrics — calls out the cross-cluster shape instead. - 14-deployment-process: TL;DR now points at deploy-k3s/scripts/03-deploy.sh (full build + push + apply + obs vmagent), with the manual kubectl-set-image flow kept as the single-service path. Notes the IfNotPresent gotcha that bit us during the rollout. - 16-failure-modes: adds vmagent-can't-reach-obs and Grafana-no-data. - 18-cost: $0 line item for the obs stack on 88oakappsUpdate, with the CX32 migration trigger. - 17/18 README + appendix b: link the new ch15, add the obs cheat sheet block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 15:05:06 -05:00
Trey t	d3708e6c72	Fix /metrics double-gzip + deploy script for amd64 build The Echo gzip middleware was wrapping promhttp's pre-gzipped output, so vmagent received double-compressed bytes that failed the Prometheus parser with binary garbage. Skipping /metrics in the gzip Skipper. Three deploy-script fixes uncovered while shipping this: - _config.sh had backticks around "kubectl get cm" inside the python heredoc, which bash treated as command substitution when KUBECONFIG was set. Quoted the literal instead. - 03-deploy.sh now passes --platform linux/amd64 to all docker builds so arm64 Macs don't push images that fail with "exec format error" on the Hetzner CX nodes. - OBS_INGEST_TOKEN lookup was reading deploy-k3s/prod.env instead of the actual deploy/prod.env at the repo root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:42:15 -05:00
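The gzip fix in d3708e6c72 comes down to a path predicate. The sketch below is a stdlib-only illustration, not the repository's code: `skipGzip` and the `/api/media/` prefix are assumed names (the media skipper is mentioned in b679f28e55, but its exact paths are not shown in this log). In an Echo app, a predicate like this would typically be called from the `Skipper` field of the gzip middleware config with the request path.

```go
package main

import (
	"fmt"
	"strings"
)

// skipGzip reports whether a response for the given request path should
// bypass compression. Hypothetical helper: /metrics is skipped because the
// metrics handler may already gzip its own output; the /api/media/ prefix
// is an assumed example of a "media endpoint" exclusion.
func skipGzip(path string) bool {
	if path == "/metrics" {
		return true
	}
	return strings.HasPrefix(path, "/api/media/")
}

func main() {
	for _, p := range []string{"/metrics", "/api/media/document/42", "/api/tasks/"} {
		fmt.Printf("%-26s skip=%v\n", p, skipGzip(p))
	}
}
```

Keeping the decision in a pure function makes it easy to unit-test separately from the middleware wiring.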
Trey t	372d4d2d37	deploy-k3s: apply observability manifests during 03-deploy vmagent.yaml lives under manifests/observability/; the deploy script now substitutes the OBS_INGEST_TOKEN from deploy/prod.env into the manifest before apply, and waits on the vmagent rollout. Manual kubectl apply is no longer needed after the next deploy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:16:59 -05:00
Trey t	df78d9ccd8	Add Prometheus metrics + vmagent push to obs.88oakapps.com Adds internal/prom package with histograms for HTTP, GORM, B2, APNs, and FCM, wired into the Echo router (HTTPMiddleware + /metrics) and GORM via statement-level callbacks (no ctx plumbing needed). Storage and push clients call ObserveB2Upload / ObserveAPNsSend / ObserveFCMSend at the network round-trip points. Existing internal/monitoring metrics move to /metrics/legacy so the canonical /metrics emits proper histogram buckets for p50/p95/p99 rollups. deploy-k3s/manifests/observability/vmagent.yaml deploys a single-replica vmagent in the honeydue namespace that scrapes api Pods on :8000/metrics every 15s and remote-writes to https://obs.88oakapps.com/api/v1/write with a bearer token (substituted at deploy time from OBS_INGEST_TOKEN in deploy/prod.env). NetworkPolicies allow vmagent egress to api Pods and to the public obs endpoint over :443; the obs side runs VictoriaMetrics + Jaeger + Grafana on 88oakappsUpdate. docs/observability-plan.md captures the full plan including resource budget, instrumentation table, 4-step rollout, and migration triggers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:16:17 -05:00
Trey t	1cd6cafa9d	deploy-k3s: wire B2_KEY_ID/B2_APP_KEY into api Deployment The B2 credentials existed in honeydue-secrets (created by 02-setup-secrets.sh) but were never referenced from the api Deployment, so StorageConfig.IsS3() returned false at runtime → StorageService fell back to local filesystem. With readOnlyRootFilesystem=true on the api container, that local fallback would silently fail on every upload — meaning every photo, document, and task-completion upload was broken in prod since the k3s migration on 2026-04-24. Adding both as secretKeyRef on the api container only (the worker doesn't perform uploads). Verified end-to-end with a registered test user: source PDF (sha256=3af3a645...) → POST /api/uploads/document/ → POST /api/documents/ → GET /api/media/document/:id → byte-identical download. Storage init log now reports "Storage service initialized (S3)". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 00:53:25 -05:00
Trey t	57cef36379	deploy-k3s: align _config.sh::generate_env with live ConfigMap generate_env was missing 5 keys that exist in the live honeydue-config ConfigMap (drift introduced over time by manual kubectl patches): STATIC_DIR, STORAGE_UPLOAD_DIR, STORAGE_BASE_URL, B2_REGION, B2_USE_SSL. Without these, running 03-deploy.sh would silently drop them and break static asset serving + B2 region/TLS. Also: - Move B2_KEY_ID/B2_APP_KEY out of generate_env: they're credentials and belong in honeydue-secrets, not cleartext in the ConfigMap. The api/worker deployments still need to be wired to read them via envFrom: secretRef before B2 uploads will work — pre-existing gap, not caused by this commit. - Use the in-namespace short DNS form for REDIS_URL ('redis:6379') to match what the live cluster has — pods' resolv.conf search path already covers honeydue.svc.cluster.local. - config.yaml.example: add b2_region, b2_use_ssl, upload_dir, base_url, static_dir under storage so a fresh bootstrap sets them correctly. Verified by sourcing _config.sh and diffing generate_env output against `kubectl get cm honeydue-config -o jsonpath='{.data}'`: clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 00:38:37 -05:00
Trey t	9ea058347f	Fix Apple Sign In: update bundle IDs from old com.tt.honeyDue.* to com.myhoneydue.* The iOS app was renamed (MyCrib → Casera → honeyDue) and the bundle ID was updated to com.myhoneydue.honeyDue (release) / .dev (debug), but APPLE_CLIENT_ID and APNS_TOPIC across env templates and k3s configs still pointed at the old com.tt.honeyDue.honeyDueDev value. This made verifyAudience reject every Apple identity token (aud claim mismatch). Updated: - deploy/prod.env.example: bundle ID + comment that empty client_id rejects all tokens with DEBUG=false - .env.example: add Sign in with Apple block (was missing entirely) - deploy-k3s{,-dev}/config.yaml.example: apple_auth.client_id default - deploy-k3s-dev/scripts/00-init.sh: same - docker-compose.dev.yml: APNS_TOPIC fallback - docs/deployment/10-secrets-config.md: doc reference The live deploy/prod.env and local .env are .gitignored — they were edited in place and need to ship via deploy_prod.sh to take effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 23:58:44 -05:00
Trey t	7e77e3bbab	docs/deployment: record security hardening pass + webapp + APNs Mark roadmap items done (network policies, Traefik middleware, CF Full strict, CF IP UFW restriction, webapp deploy, APNs wired up, admin URL-baking fix, admin probe bug). Update Chapter 4 (firewall rule inventory now shows CF-only :443, no :80), Chapter 6 (request flow walks through TLS on :443 and middleware hops), Chapter 13 (CF SSL mode is Full strict, not Flexible; documents the origin cert install), Chapter 7 (adds the web service section — proxy pattern, 3 replicas, PostHog build-args), and Appendix C (web manifests, CF origin cert paths on disk, APNs .p8 path, updated network-policies applied status). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 15:50:59 -05:00
Trey t	ace03d2340	Security hardening: TLS at origin, security headers, network policies, admin probe fix Four related hardening changes made on the live cluster during this session. Each manifest captures the final working state so a fresh `kubectl apply` of the repo reproduces it. 1. Cloudflare Full (strict) TLS — ingresses now carry `tls:` blocks pointing at `cloudflare-origin-cert` secret (installed imperatively from the CF Origin CA PEM). CF SSL mode flipped from Flexible to Full (strict). CF↔origin is now HTTPS; origin serves a CF-issued cert that only CF can validate. 2. Traefik middleware attached to all three ingresses — `rate-limit` (100/min avg, 200 burst) and `security-headers` (frame-deny, nosniff, HSTS, referrer policy, permissions policy). `admin-auth` middleware was also defined in middleware.yaml but is not attached (needs an unset basic-auth secret) and was deleted at runtime. 3. `security-headers` middleware: stripped the Content-Security-Policy entry. The Go API sets its own CSP in internal/router/router.go that permits Google Fonts for the landing page. Two CSP headers combine via intersection (most restrictive wins), which would break the landing page. Next.js apps set their own CSP via middleware. Header kept documentation comments explain this. 4. NetworkPolicies — default-deny + explicit allows, applied. Added missing policies for `web`. Corrected the Traefik ingress rule: the scaffold used `namespaceSelector: kube-system`, but our Traefik runs as a DaemonSet with `hostNetwork: true`, so traffic arrives with the NODE IP as source. Fixed to an `ipBlock` list of the three node IPs plus the cluster pod CIDR (10.42.0.0/16). 5. admin livenessProbe path fix: was hitting /admin/ (404) which caused a 6-hour crashloop cycle (87 restarts) before the bug was caught. Fixed to / — matches the startupProbe and readinessProbe paths that were corrected earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 15:50:47 -05:00
Trey t	15359401fa	Deploy honeyDueAPI-Web to k3s at app.myhoneydue.com The Next.js 16 webapp in sibling repo honeyDueAPI-Web now runs alongside api/worker/admin on the cluster. Uses a server-side proxy pattern: browser hits app.myhoneydue.com, Next.js route handlers forward to the Go API with an httpOnly cookie, so no CORS entry or Allowed-Hosts change is needed on the API side. Availability mirrors api (3 replicas, PDB minAvailable:2, topologySpreadConstraints across nodes). Changes: - deploy-k3s/manifests/web/deployment.yaml: 3 replicas, readOnly root FS, drops all caps, mounts emptyDir for /app/.next/cache and /tmp, reads API_URL from honeydue-config. - deploy-k3s/manifests/web/service.yaml: ClusterIP :3000. - deploy-k3s/manifests/rbac.yaml: ServiceAccount web with automountServiceAccountToken: false. - deploy-k3s/manifests/pod-disruption-budgets.yaml: web-pdb minAvailable: 2. - deploy-k3s/manifests/ingress/ingress-simple.yaml: route app.myhoneydue.com → web:3000. - deploy-k3s/scripts/_config.sh: emit API_URL into the ConfigMap. - deploy-k3s/scripts/03-deploy.sh: build + push + apply the web image alongside api/worker/admin. Reads NEXT_PUBLIC_POSTHOG_KEY and NEXT_PUBLIC_POSTHOG_HOST from the operator shell env (not committed). Also adds the --build-arg NEXT_PUBLIC_API_URL wiring for the admin image that was previously only done manually. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:11:17 -05:00
Trey t	082b5fd3cd	Fix admin URL baking: bake NEXT_PUBLIC_API_URL at Docker build time Next.js bakes NEXT_PUBLIC_* vars into the client JS bundle at build time, not runtime. The admin image was being built with admin/.env.local containing NEXT_PUBLIC_API_URL=http://localhost:8000, hardcoding localhost into the browser bundle. The runtime configMap value had no effect on the already-compiled JS, causing prod admin login to throw CORS errors hitting localhost. Fix: - Dockerfile: admin-builder stage accepts ARG NEXT_PUBLIC_API_URL and strips any committed .env.local/.env.development.local before npm run build. - .dockerignore: explicitly exclude admin/.env.* (root-level .env.* pattern doesn't match nested paths), so a local dev .env.local can never sneak into the build context again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:10:53 -05:00
Trey t	6d39875ef2	README: reflect auto-seed, expand env var reference, link deployment book - Getting Started (all 3 install paths): simplify to one "start the app, first-boot auto-seeds lookups + admin + templates" step + optional dev test-data seed - Seed Data table: mark 001_lookups / 003_admin_user / 003_task_templates as auto-seeded via internal/database/migration_seed_initial_data.go; only 002_test_data.sql is manual now - Environment Variables: split into logical groups (core, server, admin seed, email, push, B2, worker schedules, feature flags, Apple/Google), added ~45 vars that weren't documented. Defer full reference to docs/deployment/10-secrets-config.md - Add ADMIN_EMAIL / ADMIN_PASSWORD to the admin-seed group - Tech Stack: add Backblaze B2 (minio-go), Fastmail/go-mail, Cloudflare, K3s (production orchestrator) - Project Structure: add deploy-k3s/ and docs/deployment/; mark docker-compose.yml as Swarm-era legacy - Docker subsection: clarify compose files are local dev; point at deployment book for prod workflow Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 07:30:55 -05:00
Trey t	6f303dbbaa	Migrate prod deploy from Swarm to K3s; add full deployment book Infrastructure: - Stack now runs on K3s v1.34.6 HA (3 Hetzner CX33 nodes as managers) - Traefik DaemonSet + hostNetwork replaces Caddy + ingress mesh - All manifests in deploy-k3s/manifests/; Swarm config (deploy/) kept temporarily for reference Bug fixes surfaced during migration: - Dockerfile: golang:1.24-alpine -> 1.25-alpine (go.mod requires 1.25) - cache_service.go: remove sync.Once reassignment from inside Do() callback (was causing 'unlock of unlocked mutex' fatal after Redis Ping failure) - router.go: relax CSP from 'default-src none' to 'default-src self' + allowlist fonts.googleapis.com so the marketing landing page CSS actually loads in browsers - deploy/scripts/deploy_prod.sh: use docker buildx with --platform linux/amd64 so arm64 (Apple Silicon) dev machines produce images runnable on x86_64 Hetzner nodes; fix array expansion under set -u - deploy/swarm-stack.prod.yml: fix secret source references to use top-level aliases (the '${X_SECRET}' form never actually resolved); dozzle ports: long-form host_ip is rejected by Swarm, switched to short-form (bound to 0.0.0.0 with UFW-based loopback restriction); worker replicas 2 -> 1 (Asynq scheduler singleton) - deploy-k3s/manifests/admin/deployment.yaml: probe path '/admin/' -> '/' (Next.js serves at root; /admin/ returned 404 and killed pods); startupProbe failureThreshold 12 -> 24 - deploy-k3s/manifests/pod-disruption-budgets.yaml: worker minAvailable 1 -> 0 (singleton) - deploy-k3s/manifests/api/deployment.yaml: startupProbe failureThreshold 12 -> 48 (MigrateWithLock serializes across 3 replicas on first-boot; real startup takes up to 240s) - 
.gitignore: tighten 'api' -> '/api' (was matching deploy-k3s/manifests/api/ and admin/src/app/api/*, hiding legitimate files) New files: - deploy-k3s/manifests/traefik-helmchartconfig.yaml: DaemonSet + hostNetwork override for k3s-bundled Traefik - deploy-k3s/manifests/ingress/ingress-simple.yaml: plain Ingress without TLS (CF Flexible SSL) and without middleware - deploy-k3s/MIGRATION_NOTES.md: operator-facing migration log Documentation: - docs/deployment/ — full deployment book, 26 files, ~42k words: - Part I Overview, infrastructure, orchestrator choice (Ch 0-2) - Part II Networking, firewall, Cloudflare (Ch 3-4, 13) - Part III Security, Traefik ingress (Ch 5-6) - Part IV Services, DB, storage, secrets, registry (Ch 7-11) - Part V Data flow, deploy process, observability, failures, runbook (Ch 12, 14-17) - Part VI Cost, Swarm postmortem, roadmap (Ch 18-20) - Appendices: glossary, kubectl cheat sheet, file locations, consolidated citations - README.md: Production Deployment section replaced with pointer to the book; Go version bumped to 1.25 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 07:20:54 -05:00
Trey T	4ec4bbbfe8	Auto-seed lookups + admin + templates on first API boot Add a data_migration that runs seeds/001_lookups.sql, seeds/003_admin_user.sql, and seeds/003_task_templates.sql exactly once on startup and invalidates the Redis seeded_data cache afterwards so /api/static_data/ returns fresh results. Removes the need to remember `./dev.sh seed-all`; the data_migrations tracking row prevents re-runs, and each INSERT uses ON CONFLICT DO UPDATE so re-execution is safe.	2026-04-15 08:37:55 -05:00
Trey T	58e6997eee	Fix migration numbering collision and bump Dockerfile to Go 1.25 The `000016_task_template_id` and `000017_drop_task_template_regions_join` migrations introduced on gitea collided with the existing unpadded 016/017 migrations (authtoken_created_at, fk_indexes). Renamed them to 021/022 so they extend the shipped sequence instead of replacing real migrations. Also removed the padded 000012-000015 files which were duplicate content of the shipped 012-015 unpadded migrations. Dockerfile builder image bumped from golang:1.24-alpine to 1.25-alpine to match go.mod's `go 1.25` directive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 16:17:23 -05:00
Trey t	237c6b84ee	Onboarding: template backlink, bulk-create endpoint, climate-region scoring Clients that send users through a multi-task onboarding step no longer loop N POST /api/tasks/ calls and no longer create "orphan" tasks with no reference to the TaskTemplate they came from. Task model - New task_template_id column + GORM FK (migration 000016) - CreateTaskRequest.template_id, TaskResponse.template_id - task_service.CreateTask persists the backlink Bulk endpoint - POST /api/tasks/bulk/ — 1-50 tasks in a single transaction, returns every created row + TotalSummary. Single residence access check, per-entry residence_id is overridden with batch value - task_handler.BulkCreateTasks + task_service.BulkCreateTasks using db.Transaction; task_repo.CreateTx + FindByIDTx helpers Climate-region scoring - templateConditions gains ClimateRegionID; suggestion_service scores residence.PostalCode -> ZipToState -> GetClimateRegionIDByState against the template's conditions JSON (no penalty on mismatch / unknown ZIP) - regionMatchBonus 0.35, totalProfileFields 14 -> 15 - Standalone GET /api/tasks/templates/by-region/ removed; legacy task_tasktemplate_regions many-to-many dropped (migration 000017). 
Region affinity now lives entirely in the template's conditions JSON Tests - +11 cases across task_service_test, task_handler_test, suggestion_service_test: template_id persistence, bulk rollback + cap + auth, region match / mismatch / no-ZIP / unknown-ZIP / stacks-with-others Docs - docs/openapi.yaml: /tasks/bulk/ + BulkCreateTasks schemas, template_id on TaskResponse + CreateTaskRequest, /templates/by-region/ removed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:23:57 -05:00
Trey t	33eee812b6	Harden prod deploy: versioned secrets, healthchecks, migration lock, dry-run Swarm stack - Resource limits on all services, stop_grace_period 60s on api/worker/admin - Dozzle bound to manager loopback only (ssh -L required for access) - Worker health server on :6060, admin /api/health endpoint - Redis 200M LRU cap, B2/S3 env vars wired through to api service Deploy script - DRY_RUN=1 prints plan + exits - Auto-rollback on failed healthcheck, docker logout at end - Versioned-secret pruning keeps last SECRET_KEEP_VERSIONS (default 3) - PUSH_LATEST_TAG default flipped to false - B2 all-or-none validation before deploy Code - cmd/api takes pg_advisory_lock on a dedicated connection before AutoMigrate, serialising boot-time migrations across replicas - cmd/worker exposes an HTTP /health endpoint with graceful shutdown Docs - deploy/DEPLOYING.md: step-by-step walkthrough for a real deploy - deploy/shit_deploy_cant_do.md: manual prerequisites + recurring ops - deploy/README.md updated with storage toggle, worker-replica caveat, multi-arch recipe, connection-pool tuning, renumbered sections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:22:43 -05:00
Trey t	ca818e8478	Merge branch 'master' of github.com:akatreyt/MyCribAPI_GO	2026-04-01 20:45:43 -05:00
Trey T	bec880886b	Coverage priorities 1-5: test pure functions, extract interfaces, mock-based handler tests - Priority 1: Test NewSendEmailTask + NewSendPushTask (5 tests) - Priority 2: Test customHTTPErrorHandler — all 15+ branches (21 tests) - Priority 3: Extract Enqueuer interface + payload builders in worker pkg (5 tests) - Priority 4: Extract ClassifyFile/ComputeRelPath in migrate-encrypt (6 tests) - Priority 5: Define Handler interfaces, refactor to accept them, mock-based tests (14 tests) - Fix .gitignore: /worker instead of worker to stop ignoring internal/worker/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-01 20:30:09 -05:00
Trey t	2e10822e5a	Add S3-compatible storage backend (B2, MinIO, AWS S3) Introduces a StorageBackend interface with local filesystem and S3 implementations. The StorageService delegates raw I/O to the backend while keeping validation, encryption, and URL generation unchanged. Backend selection is config-driven: set B2_ENDPOINT + B2_KEY_ID + B2_APP_KEY + B2_BUCKET_NAME for S3 mode, or STORAGE_UPLOAD_DIR for local mode. STORAGE_USE_SSL=false for in-cluster MinIO (HTTP). All existing tests pass unchanged — the local backend preserves identical behavior to the previous direct-filesystem implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 21:31:24 -05:00
Trey t	34553f3bec	Add K3s dev deployment setup for single-node VPS Mirrors the prod deploy-k3s/ setup but runs all services in-cluster on a single node: PostgreSQL (replaces Neon), MinIO S3-compatible storage (replaces B2), Redis, API, worker, and admin. Includes fully automated setup scripts (00-init through 04-verify), server hardening (SSH, fail2ban, ufw), Let's Encrypt TLS via Traefik, network policies, RBAC, and security contexts matching prod. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 21:30:39 -05:00
Trey T	00fd674b56	Remove dead climate region code from suggestion engine Suggestion engine now purely uses home profile features (heating, cooling, pool, etc.) for template matching. Climate region field and matching block removed — ZIP code is no longer collected.	2026-03-30 11:19:04 -05:00
Trey T	cb7080c460	Smart onboarding: residence home profile + suggestion engine 14 new optional residence fields (heating, cooling, water heater, roof, pool, sprinkler, septic, fireplace, garage, basement, attic, exterior, flooring, landscaping) with JSONB conditions on templates. Suggestion engine scores templates against home profile: string match +0.25, bool +0.3, property type +0.15, universal base 0.3. Graceful degradation from minimal to full profile info. GET /api/tasks/suggestions/?residence_id=X returns ranked templates. 54 template conditions across 44 templates in seed data. 8 suggestion service tests.	2026-03-30 09:02:03 -05:00
Trey T	4c9a818bd9	Comprehensive TDD test suite for task logic — ~80 new tests Predicates (20 cases): IsRecurring, IsOneTime, IsDueSoon, HasCompletions, GetCompletionCount, IsUpcoming edge cases Task creation (10): NextDueDate initialization, all frequency types, past dates, all optional fields, access validation One-time completion (8): NextDueDate→nil, InProgress reset, notes/cost/rating, double completion, backdated completed_at Recurring completion (16): Daily/Weekly/BiWeekly/Monthly/Quarterly/Yearly/Custom frequencies, late/early completion timing, multiple sequential completions, no-original-DueDate, CompletedFromColumn capture QuickComplete (5): one-time, recurring, widget notes, 404, 403 State transitions (10): Cancel→Complete, Archive→Complete, InProgress cycles, recurring full lifecycle, Archive→Unarchive column restore Kanban column priority (7): verify chain priority order for all columns Optimistic locking (7): correct/stale version, conflict on complete/cancel/archive/mark-in-progress, rollback verification Deletion (5): single/multi/middle completion deletion, NextDueDate recalculation, InProgress restore behavior documented Edge cases (9): boundary dates, late/early recurring, nil/zero frequency days, custom intervals, version conflicts Handler validation (4): rating bounds, title/description length, custom interval validation All 679 tests pass.	2026-03-26 17:36:50 -05:00
Trey T	7f0300cc95	Add custom_interval_days to TaskResponse DTO Field existed in Task model but was missing from API response. Aligns Go API contract with KMM mobile model.	2026-03-26 17:06:34 -05:00
Trey T	6df27f203b	Add rate limit response headers (X-RateLimit-*, Retry-After) Custom rate limiter replacing Echo built-in, with per-IP token bucket. Every response includes X-RateLimit-Limit, Remaining, Reset headers. 429 responses additionally include Retry-After (seconds). CORS updated to expose rate limit headers to mobile clients. 4 unit tests for header behavior and per-IP isolation.	2026-03-26 14:36:48 -05:00
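The per-IP token bucket described in 6df27f203b can be sketched with the standard library alone. This is an illustration under assumptions, not the repository's limiter: `Limiter`, `Decision`, the refill rate and the burst size are all hypothetical names and values. The fields of `Decision` correspond to what a middleware would copy into X-RateLimit-Limit, X-RateLimit-Remaining and Retry-After.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// demoStart is a fixed timestamp so the demo output is deterministic.
var demoStart = time.Unix(0, 0)

// bucket holds the token state for one client IP.
type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter is a hypothetical per-IP token bucket.
type Limiter struct {
	mu      sync.Mutex
	rate    float64 // tokens refilled per second
	burst   float64 // bucket capacity
	buckets map[string]*bucket
}

func NewLimiter(ratePerSec float64, burst int) *Limiter {
	return &Limiter{rate: ratePerSec, burst: float64(burst), buckets: make(map[string]*bucket)}
}

// Decision carries the values a middleware would write into response headers.
type Decision struct {
	Allowed    bool
	Limit      int
	Remaining  int
	RetryAfter int // whole seconds until one token is available; 0 when allowed
}

// Allow refills the caller's bucket for the elapsed time, then tries to
// take one token from it.
func (l *Limiter) Allow(ip string, now time.Time) Decision {
	l.mu.Lock()
	defer l.mu.Unlock()

	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[ip] = b
	}
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(l.burst, b.tokens+elapsed*l.rate)
		b.last = now
	}

	d := Decision{Limit: int(l.burst)}
	if b.tokens >= 1 {
		b.tokens--
		d.Allowed = true
		d.Remaining = int(b.tokens)
		return d
	}
	d.RetryAfter = int(math.Ceil((1 - b.tokens) / l.rate))
	return d
}

func main() {
	l := NewLimiter(1, 2) // 1 token per second, burst of 2 (illustrative values)
	for i := 0; i < 3; i++ {
		d := l.Allow("203.0.113.7", demoStart)
		fmt.Printf("allowed=%v limit=%d remaining=%d retry_after=%d\n",
			d.Allowed, d.Limit, d.Remaining, d.RetryAfter)
	}
}
```

Passing `now` explicitly instead of calling `time.Now()` inside `Allow` keeps the limiter deterministic in tests. A production version would also need to evict idle buckets, which this sketch omits.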
Trey T	b679f28e55	Production hardening: security, resilience, observability, and compliance Password complexity: custom validator requiring uppercase, lowercase, digit (min 8 chars) Token expiry: 90-day token lifetime with refresh endpoint (60-90 day renewal window) Health check: /api/health/ now pings Postgres + Redis, returns 503 on failure Audit logging: async audit_log table for auth events (login, register, delete, etc.) Circuit breaker: APNs/FCM push sends wrapped with 5-failure threshold, 30s recovery FK indexes: 27 missing foreign key indexes across all tables (migration 017) CSP header: default-src 'none'; frame-ancestors 'none' Gzip compression: level 5 with media endpoint skipper Prometheus metrics: /metrics endpoint using existing monitoring service External timeouts: 15s push, 30s SMTP, context timeouts on all external calls Migrations: 016 (token created_at), 017 (FK indexes), 018 (audit_log) Tests: circuit breaker (15), audit service (8), token refresh (7), health (4), middleware expiry (5), validator (new)	2026-03-26 14:05:28 -05:00
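The circuit breaker in b679f28e55 (5-failure threshold, 30s recovery) might look roughly like the sketch below. It assumes a simple consecutive-failure policy with a single trial call after the recovery window; the commit does not say which policy the real breaker uses, and `Breaker`, `ErrOpen` and `demo` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned while the breaker is rejecting calls.
var ErrOpen = errors.New("circuit open")

// Breaker opens after `threshold` consecutive failures and lets a trial
// call through once `recovery` has elapsed.
type Breaker struct {
	mu        sync.Mutex
	threshold int
	recovery  time.Duration
	failures  int
	open      bool
	openedAt  time.Time
	now       func() time.Time // injectable clock for tests
}

func NewBreaker(threshold int, recovery time.Duration) *Breaker {
	return &Breaker{threshold: threshold, recovery: recovery, now: time.Now}
}

// Do runs fn unless the breaker is open and still inside its recovery window.
func (b *Breaker) Do(fn func() error) error {
	b.mu.Lock()
	if b.open && b.now().Sub(b.openedAt) < b.recovery {
		b.mu.Unlock()
		return ErrOpen
	}
	b.mu.Unlock()

	err := fn() // either closed, or a trial call after the recovery window

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.open || b.failures >= b.threshold {
			b.open = true
			b.openedAt = b.now() // (re)start the recovery window
		}
		return err
	}
	b.failures = 0
	b.open = false
	return nil
}

// demo drives the breaker with a fake clock and records each outcome.
func demo() []string {
	clock := time.Unix(0, 0)
	b := NewBreaker(5, 30*time.Second)
	b.now = func() time.Time { return clock }

	var out []string
	record := func(err error) {
		if err == nil {
			out = append(out, "ok")
			return
		}
		out = append(out, err.Error())
	}
	for i := 0; i < 6; i++ {
		record(b.Do(func() error { return errors.New("send failed") }))
	}
	clock = clock.Add(31 * time.Second) // step past the recovery window
	record(b.Do(func() error { return nil }))
	return out
}

func main() {
	for i, r := range demo() {
		fmt.Printf("call %d: %s\n", i+1, r)
	}
}
```

In the demo the first five calls surface the underlying error, the sixth is rejected without reaching the remote service, and the call made after the recovery window closes the breaker again.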
Trey T	4abc57535e	Add delete account endpoint and file encryption at rest Delete Account (Plan #2): - DELETE /api/auth/account/ with password or "DELETE" confirmation - Cascade delete across 15+ tables in correct FK order - Auth provider detection (email/apple/google) for /auth/me/ - File cleanup after account deletion - Handler + repository tests (12 tests) Encryption at Rest (Plan #3): - AES-256-GCM envelope encryption (per-file DEK wrapped by KEK) - Encrypt on upload, auto-decrypt on serve via StorageService.ReadFile() - MediaHandler serves decrypted files via c.Blob() - TaskService email image loading uses ReadFile() - cmd/migrate-encrypt CLI tool with --dry-run for existing files - Encryption service + storage service tests (18 tests)	2026-03-26 10:41:01 -05:00
Trey T	72866e935e	Disable auth rate limiters in debug mode for UI test suites Rate limiters on login/register/password-reset endpoints cause 429 errors when running parallel UI tests that create many accounts. In debug mode, skip rate limiters entirely so test suites can run without throttling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 15:06:18 -05:00
Trey t	42a5533a56	Fix 113 hardening issues across entire Go backend Security: - Replace all binding: tags with validate: + c.Validate() in admin handlers - Add rate limiting to auth endpoints (login, register, password reset) - Add security headers (HSTS, XSS protection, nosniff, frame options) - Wire Google Pub/Sub token verification into webhook handler - Replace ParseUnverified with proper OIDC/JWKS key verification - Verify inner Apple JWS signatures in webhook handler - Add io.LimitReader (1MB) to all webhook body reads - Add ownership verification to file deletion - Move hardcoded admin credentials to env vars - Add uniqueIndex to User.Email - Hide ConfirmationCode from JSON serialization - Mask confirmation codes in admin responses - Use http.DetectContentType for upload validation - Fix path traversal in storage service - Replace os.Getenv with Viper in stripe service - Sanitize Redis URLs before logging - Separate DEBUG_FIXED_CODES from DEBUG flag - Reject weak SECRET_KEY in production - Add host check on /_next/* proxy routes - Use explicit localhost CORS origins in debug mode - Replace err.Error() with generic messages in all admin error responses Critical fixes: - Rewrite FCM to HTTP v1 API with OAuth 2.0 service account auth - Fix user_customuser -> auth_user table names in raw SQL - Fix dashboard verified query to use UserProfile model - Add escapeLikeWildcards() to prevent SQL wildcard injection Bug fixes: - Add bounds checks for days/expiring_soon query params (1-3650) - Add receipt_data/transaction_id empty-check to RestoreSubscription - Change Active bool -> *bool in device handler - Check all unchecked GORM/FindByIDWithProfile errors - Add validation for notification hour fields (0-23) - Add max=10000 validation on task description updates Transactions & data integrity: - Wrap registration flow in transaction - Wrap QuickComplete in transaction - Move image creation inside completion transaction - Wrap SetSpecialties in transaction - Wrap 
GetOrCreateToken in transaction - Wrap completion+image deletion in transaction Performance: - Batch completion summaries (2 queries vs 2N) - Reuse single http.Client in IAP validation - Cache dashboard counts (30s TTL) - Batch COUNT queries in admin user list - Add Limit(500) to document queries - Add reminder_stage+due_date filters to reminder queries - Parse AllowedTypes once at init - In-memory user cache in auth middleware (30s TTL) - Timezone change detection cache - Optimize P95 with per-endpoint sorted buffers - Replace crypto/md5 with hash/fnv for ETags Code quality: - Add sync.Once to all monitoring Stop()/Close() methods - Replace 8 fmt.Printf with zerolog in auth service - Log previously discarded errors - Standardize delete response shapes - Route hardcoded English through i18n - Remove FileURL from DocumentResponse (keep MediaURL only) - Thread user timezone through kanban board responses - Initialize empty slices to prevent null JSON - Extract shared field map for task Update/UpdateTx - Delete unused SoftDeleteModel, min(), formatCron, legacy handlers Worker & jobs: - Wire Asynq email infrastructure into worker - Register HandleReminderLogCleanup with daily 3AM cron - Use per-user timezone in HandleSmartReminder - Replace direct DB queries with repository calls - Delete legacy reminder handlers (~200 lines) - Delete unused task type constants Dependencies: - Replace archived jung-kurt/gofpdf with go-pdf/fpdf - Replace unmaintained gomail.v2 with wneessen/go-mail - Add TODO for Echo jwt v3 transitive dep removal Test infrastructure: - Fix MakeRequest/SeedLookupData error handling - Replace os.Exit(0) with t.Skip() in scope/consistency tests - Add 11 new FCM v1 tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 23:14:13 -05:00
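Commit 42a5533a56 names an `escapeLikeWildcards()` helper but does not show its body. A plausible sketch, assuming backslash is the `LIKE` escape character (the PostgreSQL default), is below; the implementation is a guess at the technique, not the repository's code.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper escapes the backslash itself plus the two LIKE wildcards.
// strings.Replacer does not rescan its own output, so the inserted
// backslashes are not escaped a second time.
var likeEscaper = strings.NewReplacer(
	`\`, `\\`,
	`%`, `\%`,
	`_`, `\_`,
)

// escapeLikeWildcards makes user input match literally inside a LIKE pattern.
func escapeLikeWildcards(s string) string {
	return likeEscaper.Replace(s)
}

func main() {
	term := "50%_off"
	// The escaped term is still passed as a bound parameter; escaping only
	// stops % and _ in user input from acting as wildcards.
	pattern := "%" + escapeLikeWildcards(term) + "%"
	fmt.Println(pattern)
}
```

With GORM the resulting pattern would typically be bound as a parameter, for example `db.Where("name ILIKE ?", pattern)`, so the helper complements parameter binding rather than replacing it.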
Trey t	3b86d0aae1	Include completion_summary in my-residences list endpoint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 00:14:24 -05:00
Trey t	6803f6ec18	Add honeycomb completion heatmap and data migration framework - Add completion_summary endpoint data to residence detail response - Track completed_from_column on task completions (overdue/due_soon/upcoming) - Add GetCompletionSummary repo method with monthly aggregation - Add one-time data migration framework (data_migrations table + registry) - Add backfill migration to classify historical completions - Add standalone backfill script for manual/dry-run usage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 00:05:10 -05:00
Trey t	739b245ee6	Fix PDF report UTF-8 encoding for residence names and task fields Add UnicodeTranslatorFromDescriptor to convert UTF-8 strings to Windows-1252 for gofpdf built-in fonts. Prevents garbled characters in residence names, task titles, categories, priorities, and statuses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 11:23:44 -05:00
Trey t	7bd2cbabe9	Fix broken email icon by updating old domain references to myhoneydue.com The email icon URL was pointing to honeyDue.treytartt.com which now returns 404. Updated to api.myhoneydue.com along with BASE_URL, FROM_EMAIL, and CORS defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 13:38:55 -06:00
Trey t	bf309f5ff9	Move admin dashboard to admin.myhoneydue.com subdomain - Remove Next.js basePath "/admin" — admin now serves at root - Update all internal links from /admin/xxx to /xxx - Change Go proxy to host-based routing: admin subdomain requests proxy to Next.js, /admin/* redirects to main web app - Update timeout middleware skipper for admin subdomain Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 12:35:31 -06:00
Trey t	1fdc29af1c	Add admin subdomain redirect for admin.myhoneydue.com When ADMIN_HOST is set, redirects root "/" to "/admin/" so admin.myhoneydue.com works without needing the /admin path suffix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 12:25:26 -06:00
Trey t	821a3e452f	Remove docs and marketing files relocated to old_files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 07:09:06 -06:00
Trey t	4976eafc6c	Rebrand from Casera/MyCrib to honeyDue Total rebrand across all Go API source files: - Go module path: casera-api -> honeydue-api - All imports updated (130+ files) - Docker: containers, images, networks renamed - Email templates: support email, noreply, icon URL - Domains: casera.app/mycrib.treytartt.com -> honeyDue.treytartt.com - Bundle IDs: com.tt.casera -> com.tt.honeyDue - IAP product IDs updated - Landing page, admin panel, config defaults - Seeds, CI workflows, Makefile, docs - Database table names preserved (no migration needed) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 06:33:38 -06:00
Trey t	793e50ce52	Add regional task templates API with climate zone lookup Adds a new endpoint GET /api/tasks/templates/by-region/?zip= that resolves ZIP codes to IECC climate regions and returns relevant home maintenance task templates. Includes climate region model, region lookup service with tests, seed data for all 8 climate zones with 50+ templates, and OpenAPI spec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 15:15:30 -06:00
Trey t	72db9050f8	Add Stripe billing, free trials, and cross-platform subscription guards - Stripe integration: add StripeService with checkout sessions, customer portal, and webhook handling for subscription lifecycle events. - Free trials: auto-start configurable trial on first subscription check, with admin-controllable duration and enable/disable toggle. - Cross-platform guard: prevent duplicate subscriptions across iOS, Android, and Stripe by checking existing platform before allowing purchase. - Subscription model: add Stripe fields (customer_id, subscription_id, price_id), trial fields (trial_start, trial_end, trial_used), and SubscriptionSource/IsTrialActive helpers. - API: add trial and source fields to status response, update OpenAPI spec. - Clean up stale migration and audit docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 11:36:14 -06:00
Trey t	d5bb123cd0	Redesign email templates to match web landing page Warm Sage design system Rewrote all 11 email templates to use the Casera web brand: Outfit font via Google Fonts, sage green (#6B8F71) brand stripe, cream (#FAFAF7) background, pill-shaped clay (#C4856A) CTA buttons, icon-badge feature cards, numbered tip cards, linen callout boxes, and refined light footer. Extracted reusable helpers (emailButton, emailCodeBox, emailCalloutBox, emailAlertBox, emailFeatureItem, emailTipCard) for consistent component composition. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:02:41 -06:00
Trey t	6dcf797613	Fix worker healthcheck: use pgrep -f for Alpine busybox compatibility Alpine's busybox pgrep -x doesn't match process names correctly. Use pgrep -f /app/worker to match the full command path instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 20:02:42 -06:00
Trey t	7438dfd9b1	Fix timeout middleware panic on proxy/WebSocket routes and worker healthcheck The TimeoutMiddleware wraps the response writer in *http.timeoutWriter which doesn't implement http.Flusher. When the admin reverse proxy or WebSocket upgrader tries to flush, it panics and crashes the container (502 Bad Gateway). Skip timeout for /admin, /_next, and /ws routes. Also fix the Dockerfile HEALTHCHECK to detect the worker process — the worker has no HTTP server so the curl-based check always failed, marking it unhealthy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 19:56:12 -06:00
Trey t	7690f07a2b	Harden API security: input validation, safe auth extraction, new tests, and deploy config Comprehensive security hardening from audit findings: - Add validation tags to all DTO request structs (max lengths, ranges, enums) - Replace unsafe type assertions with MustGetAuthUser helper across all handlers - Remove query-param token auth from admin middleware (prevents URL token leakage) - Add request validation calls in handlers that were missing c.Validate() - Remove goroutines in handlers (timezone update now synchronous) - Add sanitize middleware and path traversal protection (path_utils) - Stop resetting admin passwords on migration restart - Warn on well-known default SECRET_KEY - Add ~30 new test files covering security regressions, auth safety, repos, and services - Add deploy/ config, audit digests, and AUDIT_FINDINGS documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 09:48:01 -06:00
treyt	56d6fa4514	Add Dozzle log viewer to dev and prod compose files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:39:43 -06:00
treyt	e26116e2cf	Add webhook logging, pagination, middleware, migrations, and prod hardening - Webhook event logging repo and subscription webhook idempotency - Pagination helper (echohelpers) with cursor/offset support - Request ID and structured logging middleware - Push client improvements (FCM HTTP v1, better error handling) - Task model version column, business constraint migrations, targeted indexes - Expanded categorization chain tests - Email service and config hardening - CI workflow updates, .gitignore additions, .env.example updates Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:32:09 -06:00
treyt	806bd07f80	Update README for split dev/prod Docker config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:29:52 -06:00
treyt	f1e39f90c7	Split Docker config for dev/prod and fix arch-agnostic builds - Dockerfile: use --platform=$BUILDPLATFORM + ARG TARGETARCH instead of hardcoded GOARCH=arm64, enabling cross-compilation and native builds on both arm64 (M1) and amd64 (prod server) - docker-compose.yml: rewrite for Docker Swarm — image refs, deploy sections, overlay network, no container_name/depends_on conditions, DB/Redis ports not exposed externally - docker-compose.dev.yml: rewrite as self-contained dev compose with build targets, container_name, depends_on, dev-safe defaults - Makefile: switch to docker compose v2, point dev targets at docker-compose.dev.yml, add docker-build-prod target - Delete stale docker/Dockerfile (Go 1.21) and docker/docker-compose.yml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:27:35 -06:00
treyt	9f8828a503	Fix admin healthcheck: use 127.0.0.1 instead of localhost Alpine Linux resolves localhost to IPv6 ::1, but Next.js binds to IPv4 0.0.0.0 — causing the healthcheck to fail with connection refused. Also update worker env vars from legacy Celery names to current ones. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 21:08:40 -06:00
183 Commits