# Go To Prod Plan

This document is a phased production-readiness plan for the Casera Go API repo.
Execute phases in order. Do not skip exit criteria.

## How To Use This Plan

1. Create an issue/epic per phase.
2. Track each checklist item as a task.
3. Only advance phases after all exit criteria pass in CI and staging.

## Phase 0 - Baseline And Drift Cleanup

Goal: eliminate known repo/config drift before hardening.

### Tasks

1. Fix stale admin build/run targets in [`Makefile`](/Users/treyt/Desktop/code/MyCribAPI_GO/Makefile) that reference `cmd/admin` (non-existent).
2. Align worker env vars in [`docker-compose.yml`](/Users/treyt/Desktop/code/MyCribAPI_GO/docker-compose.yml) with Go config:
   - use `TASK_REMINDER_HOUR`
   - use `OVERDUE_REMINDER_HOUR`
   - use `DAILY_DIGEST_HOUR`
3. Align supported locales in [`internal/i18n/i18n.go`](/Users/treyt/Desktop/code/MyCribAPI_GO/internal/i18n/i18n.go) with translation files in [`internal/i18n/translations`](/Users/treyt/Desktop/code/MyCribAPI_GO/internal/i18n/translations).
4. Remove any committed secrets/keys from repo and history; rotate immediately.

### Validation

1. `go test ./...`
2. `go build ./cmd/api ./cmd/worker`
3. `docker compose config` succeeds.

### Exit Criteria

1. No stale targets or mismatched env keys remain.
2. CI and local boot work with a single source-of-truth config model.

---

## Phase 1 - Non-Negotiable CI Gates

Goal: block regressions by policy.

### Tasks

1. Update [`/.github/workflows/backend-ci.yml`](/Users/treyt/Desktop/code/MyCribAPI_GO/.github/workflows/backend-ci.yml) with required jobs:
   - `lint` (`go vet ./...`, `gofmt -l .`)
   - `test` (`go test -race -count=1 ./...`)
   - `contract` (`go test -v -run "TestRouteSpecContract|TestKMPSpecContract" ./internal/integration/`)
   - `build` (`go build ./cmd/api ./cmd/worker`)
2. Add `govulncheck ./...` job.
3. Add secret scanning (for example, gitleaks).
4. Set branch protection on `main` and `develop`:
   - require PR
   - require all status checks
   - require at least one review
   - dismiss stale reviews on new commits

### Validation

1. Open test PR with intentional formatting error; ensure merge is blocked.
2. Open test PR with OpenAPI/route drift; ensure merge is blocked.

### Exit Criteria

1. No direct merge path exists without passing all gates.

---

## Phase 2 - Contract, Data, And Migration Safety

Goal: guarantee deploy safety for API behavior and schema changes.

### Tasks

1. Keep OpenAPI as source of truth in [`docs/openapi.yaml`](/Users/treyt/Desktop/code/MyCribAPI_GO/docs/openapi.yaml).
2. Require route/schema updates in same PR as handler changes.
3. Add migration checks in CI:
   - migrate up on clean DB
   - migrate down one step
   - migrate up again
4. Add DB constraints for business invariants currently enforced only in service code.
5. Add idempotency protections for webhook/job handlers.

### Validation

1. Run migration smoke test pipeline against ephemeral Postgres.
2. Re-run integration contract tests after each endpoint change.

### Exit Criteria

1. Schema changes are reversible and validated before merge.
2. API contract drift is caught pre-merge.

---

## Phase 3 - Test Hardening For Failure Modes

Goal: increase confidence in edge cases and concurrency.

### Tasks

1. Add table-driven tests for task lifecycle transitions:
   - cancel/uncancel
   - archive/unarchive
   - complete/quick-complete
   - recurring next due date transitions
2. Add timezone boundary tests around midnight and DST.
3. Add concurrency tests for race-prone flows in services/repositories.
4. Add fuzz/property tests for:
   - task categorization predicates
   - reminder schedule logic
5. Add unauthorized-access tests for media/document/task cross-residence access.

### Validation

1. `go test -race -count=1 ./...` stays green.
2. New tests fail when logic is intentionally broken (mutation spot checks).

### Exit Criteria

1. High-risk flows have explicit edge-case coverage.

---

## Phase 4 - Security Hardening

Goal: reduce breach and abuse risk.

### Tasks

1. Add strict request size/time limits for upload and auth endpoints.
2. Add rate limits for:
   - login
   - forgot/reset password
   - verification endpoints
   - webhooks
3. Ensure logs redact secrets/tokens/PII payloads.
4. Enforce least-privilege for runtime creds and service accounts.
5. Enable dependency update cadence with security review.

### Validation

1. Abuse test scripts for brute-force and oversized payload attempts.
2. Verify logs do not expose secrets under failure paths.

### Exit Criteria

1. Security scans pass and abuse protections are enforced in runtime.

---

## Phase 5 - Observability And Operations

Goal: make production behavior measurable and actionable.

### Tasks

1. Standardize request correlation IDs across API and worker logs.
2. Define SLOs:
   - API availability
   - p95 latency for key endpoints
   - worker queue delay
3. Add dashboards + alerts for:
   - 5xx error rate
   - auth failures
   - queue depth/retry spikes
   - DB latency
4. Add dead-letter queue review and replay procedure.
5. Document incident runbooks in [`docs/`](/Users/treyt/Desktop/code/MyCribAPI_GO/docs):
   - DB outage
   - Redis outage
   - push provider outage
   - webhook backlog

### Validation

1. Trigger synthetic failures in staging and confirm alerts fire.
2. Execute at least one incident drill and capture MTTR.

### Exit Criteria

1. Team can detect and recover from common failures quickly.

---

## Phase 6 - Performance And Capacity

Goal: prove headroom before production growth.

### Tasks

1. Define load profiles for hot endpoints:
   - `/api/tasks/`
   - `/api/static_data/`
   - `/api/auth/login/`
2. Run load and soak tests in staging.
3. Capture query plans for slow SQL and add indexes where needed.
4. Validate Redis/cache fallback behavior under cache loss.
5. Tune worker concurrency and queue weights from measured data.

### Validation

1. Meet agreed latency/error SLOs under target load.
2. No sustained queue growth under steady-state load.

### Exit Criteria

1. Capacity plan is documented with clear limits and scaling triggers.

---

## Phase 7 - Release Discipline And Recovery

Goal: safe deployments and verified rollback/recovery.

### Tasks

1. Adopt canary or blue/green deploy strategy.
2. Add automatic rollback triggers based on SLO violations.
3. Add pre-deploy checklist:
   - migrations reviewed
   - CI green
   - queue backlog healthy
   - dependencies healthy
4. Validate backups with restore drills (not just backup existence).
5. Document RPO/RTO targets and current measured reality.

### Validation

1. Perform one full staging rollback rehearsal.
2. Perform one restore-from-backup rehearsal.

### Exit Criteria

1. Deploy and rollback are repeatable, scripted, and tested.

---

## Definition Of Done (Every PR)

1. `go vet ./...`
2. `gofmt -l .` returns no files
3. `go test -race -count=1 ./...`
4. Contract tests pass
5. OpenAPI updated for endpoint changes
6. Migrations added and reversible for schema changes
7. Security impact reviewed for auth/uploads/media/webhooks
8. Observability impact reviewed for new critical paths

---

## Recommended Execution Timeline

1. Week 1: Phase 0 + Phase 1
2. Week 2: Phase 2
3. Week 3-4: Phase 3 + Phase 4
4. Week 5: Phase 5
5. Week 6: Phase 6 + Phase 7 rehearsal

Adjust timeline based on team size and release pressure, but keep ordering.
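
As a rough illustration of the table-driven lifecycle tests called for in Phase 3, the sketch below shows one possible shape for the cancel/uncancel, archive/unarchive, and complete cases. Everything in it is an assumption made for illustration: `TaskState`, the state constants, `Apply`, and `checkTable` are hypothetical names, and the transition rules are placeholders rather than the repo's actual task model. In the real codebase the table would more likely live in a `_test.go` file and run through `t.Run` subtests.

```go
package main

import "fmt"

// TaskState is a hypothetical lifecycle state; the real task model may differ.
type TaskState string

const (
	StateActive    TaskState = "active"
	StateCancelled TaskState = "cancelled"
	StateArchived  TaskState = "archived"
	StateCompleted TaskState = "completed"
)

// Apply returns the state reached by an action, or an error (and the
// unchanged state) when the transition is not allowed. The rules here are
// assumed for the sketch only.
func Apply(s TaskState, action string) (TaskState, error) {
	switch {
	case s == StateActive && action == "cancel":
		return StateCancelled, nil
	case s == StateCancelled && action == "uncancel":
		return StateActive, nil
	case s == StateActive && action == "archive":
		return StateArchived, nil
	case s == StateArchived && action == "unarchive":
		return StateActive, nil
	case s == StateActive && action == "complete":
		return StateCompleted, nil
	}
	return s, fmt.Errorf("invalid transition: %q from %q", action, s)
}

// checkTable runs every case in the table and returns the names of the
// cases whose result or error status did not match expectations.
func checkTable() []string {
	cases := []struct {
		name    string
		from    TaskState
		action  string
		want    TaskState
		wantErr bool
	}{
		{"cancel active", StateActive, "cancel", StateCancelled, false},
		{"uncancel cancelled", StateCancelled, "uncancel", StateActive, false},
		{"archive active", StateActive, "archive", StateArchived, false},
		{"unarchive archived", StateArchived, "unarchive", StateActive, false},
		{"complete active", StateActive, "complete", StateCompleted, false},
		{"complete cancelled is rejected", StateCancelled, "complete", StateCancelled, true},
		{"uncancel active is rejected", StateActive, "uncancel", StateActive, true},
	}
	var failures []string
	for _, tc := range cases {
		got, err := Apply(tc.from, tc.action)
		if got != tc.want || (err != nil) != tc.wantErr {
			failures = append(failures, tc.name)
		}
	}
	return failures
}

func main() {
	if failures := checkTable(); len(failures) > 0 {
		fmt.Println("failing cases:", failures)
		return
	}
	fmt.Println("all transition cases pass")
}
```

Keeping valid and rejected transitions in the same table makes the Phase 3 mutation spot check straightforward: deliberately breaking one branch of `Apply` should cause at least one named case to appear in the failure list.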