docs(06-01): create validation reports phase plan

Phase 6: Validation Reports
- 1 plan created
- 2 tasks defined
- Ready for execution
2026-01-10 10:24:59 -06:00
parent ad7a396704
commit 4266940c8f
1 changed file with 158 additions and 0 deletions
@@ -0,0 +1,158 @@
+---
+phase: 06-validation-reports
+plan: 01
+type: execute
+---
+
+<objective>
+Add comprehensive validation reporting to cloudkit_import.py with data quality metrics, orphan detection, and formatted output.
+
+Purpose: Enable quick data quality assessment before/after sync operations to catch relationship integrity issues and data gaps.
+Output: `--validate` command that generates detailed validation report with counts, gaps, orphans, and relationship checks.
+</objective>
+
+<execution_context>
+~/.claude/get-shit-done/workflows/execute-phase.md
+~/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+
+# Prior phase context:
+@.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md
+@.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md
+
+# Source files:
+@Scripts/cloudkit_import.py
+@Scripts/validate_canonical.py
+
+**Tech stack available:** Python 3, requests, cloudkit server-to-server auth
+**Established patterns:**
+- query_all() for CloudKit pagination
+- compute_diff() for local vs cloud comparison
+- --verify/--verify-deep for sync verification
+- validate_canonical.py for local data validation
+
+**Constraining decisions:**
+- Phase 5: Triple lookup fallback (recordName -> deterministic UUID -> canonicalId query)
+- Phase 5: Location comparison uses 0.0001 tolerance for lat/lng
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Add comprehensive validation command</name>
+  <files>Scripts/cloudkit_import.py</files>
+  <action>
+Add a `--validate` flag and a `validate_all()` function covering:
+
+1. **Local validation** - Call existing validate_canonical.py checks:
+   - Duplicate IDs
+   - Required fields
+   - Team → Stadium references
+   - Game → Team/Stadium references
+   - Cross-sport references
+   - Stadium alias references
+   - Game counts per team
+
+2. **CloudKit relationship validation** - New checks:
+   - Games referencing non-existent teams in CloudKit
+   - Games referencing non-existent stadiums in CloudKit
+   - Teams referencing non-existent stadiums in CloudKit
+   - Aliases referencing non-existent stadiums in CloudKit
+
+3. **Sync status** - Leverage existing compute_diff():
+   - Count of records only in local (not uploaded)
+   - Count of records only in CloudKit (orphans)
+   - Count of records with field differences
+
+4. **Output format**:
+   - Print structured report to console
+   - If `--output FILE` is provided, write the report to FILE as JSON
+
+Import the validate_canonical functions directly rather than invoking the script through a subprocess call. Add the command to the interactive menu as option 16.
+  </action>
+  <verify>
+Run `python cloudkit_import.py --validate --dry-run` and confirm:
+- Local validation results displayed
+- CloudKit relationship checks run (or are skipped gracefully if no credentials are configured)
+- Sync status summary shown
+- No errors/exceptions
+  </verify>
+  <done>
+--validate flag works, shows local validation + CloudKit checks + sync status. Menu option 16 available.
+  </done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Add orphan listing and completeness metrics</name>
+  <files>Scripts/cloudkit_import.py</files>
+  <action>
+Enhance validation report with:
+
+1. **Orphan listing** (non-destructive):
+   - Add a `--list-orphans` flag that lists orphan records (present in CloudKit but not in local data) without deleting them
+   - Group by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
+   - Show first 10 of each type with recordName/canonicalId
+   - Show total count per type
+
+2. **Data completeness metrics**:
+   - Stadiums: % with coordinates, % with capacity, % with year_opened
+   - Teams: % with valid stadium reference
+   - Games: % with resolved home/away teams, % with resolved stadium
+   - Show counts of "unknown" stadiums (stadium_unknown_*)
+
+3. **Report summary**:
+   - Overall health score (0-100 based on error count)
+   - Quick pass/fail for each category
+   - Actionable recommendations for common issues
+
+4. **JSON output enhancement**:
+   - Include all metrics in structured format
+   - Add timestamp and data source versions
+   - Compatible with future dashboard consumption
+
+Add `--list-orphans` to menu as option 17.
+  </action>
+  <verify>
+Run `python cloudkit_import.py --validate --list-orphans` and confirm:
+- Orphan records listed by type (or "No orphans found")
+- Completeness metrics shown (% with coordinates, etc.)
+- Health score calculated
+- JSON output works with --output flag
+  </verify>
+  <done>
+--list-orphans shows orphans without deletion. Completeness metrics calculated. Health score displayed. JSON export includes all data.
+  </done>
+</task>
+
+</tasks>
+
+<verification>
+Before declaring phase complete:
+- [ ] `python cloudkit_import.py --validate` runs without errors
+- [ ] `python cloudkit_import.py --list-orphans` shows orphan summary
+- [ ] `python cloudkit_import.py --validate --output report.json` creates valid JSON
+- [ ] Menu options 16-17 work in interactive mode
+- [ ] Existing functionality (--diff, --verify, --smart-sync) still works
+</verification>
+
+<success_criteria>
+
+- All tasks completed
+- All verification checks pass
+- No errors or warnings introduced
+- Validation report shows meaningful data quality metrics
+- Phase 6 complete (final phase of milestone)
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/06-validation-reports/06-01-SUMMARY.md` with:
+- What validation capabilities were added
+- Sample validation output
+- Health score calculation method
+- Any issues encountered
+</output>
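The CloudKit relationship checks in Task 1 (games, teams, and aliases pointing at records that do not exist) come down to set membership tests. Below is a minimal sketch, assuming the records have already been fetched and flattened into plain dicts. The helper `find_dangling_references` and the field names `id`, `home_team`, and `stadium` are hypothetical and are not taken from `cloudkit_import.py`.

```python
from typing import Iterable


def find_dangling_references(records: Iterable[dict], ref_field: str, known_ids: set) -> list:
    """Return the IDs of records whose ref_field points at an ID not in known_ids."""
    dangling = []
    for record in records:
        ref = record.get(ref_field)
        # A missing reference is a completeness gap, not a dangling pointer, so skip it here.
        if ref is not None and ref not in known_ids:
            dangling.append(record["id"])
    return dangling


# Hypothetical, already-flattened sample records (illustrative data only).
stadiums = [{"id": "stadium_1"}]
teams = [
    {"id": "team_a", "stadium": "stadium_1"},
    {"id": "team_b", "stadium": "stadium_9"},
]
games = [
    {"id": "game_1", "home_team": "team_a", "stadium": "stadium_1"},
    {"id": "game_2", "home_team": "team_x", "stadium": "stadium_9"},
]

stadium_ids = {s["id"] for s in stadiums}
team_ids = {t["id"] for t in teams}

report = {
    "teams_missing_stadium": find_dangling_references(teams, "stadium", stadium_ids),
    "games_missing_team": find_dangling_references(games, "home_team", team_ids),
    "games_missing_stadium": find_dangling_references(games, "stadium", stadium_ids),
}
```

Building the ID sets once keeps each check linear in the number of records, which matters if every record type is pulled through the paginated query first.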
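The data completeness metrics in Task 2 (percentage of stadiums with coordinates, capacity, and year_opened, plus the count of `stadium_unknown_*` placeholders) could be computed along these lines. This is a sketch under assumptions: `percent_with` is a hypothetical helper, the field names `latitude`, `capacity`, and `year_opened` are guesses at the canonical schema, and the sample values are made up.

```python
def percent_with(records, field):
    """Percentage of records with a non-empty value for field (0.0 for an empty list)."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) not in (None, "", []))
    return round(100.0 * present / len(records), 1)


# Made-up sample data; field names are assumptions about the canonical schema.
stadiums = [
    {"id": "stadium_1", "latitude": 41.9, "capacity": 41000, "year_opened": 1914},
    {"id": "stadium_2", "latitude": 40.8, "capacity": None, "year_opened": 2009},
    {"id": "stadium_unknown_1", "latitude": None, "capacity": None, "year_opened": None},
    {"id": "stadium_4", "latitude": 34.0, "capacity": 56000, "year_opened": None},
]

metrics = {
    "pct_with_coordinates": percent_with(stadiums, "latitude"),
    "pct_with_capacity": percent_with(stadiums, "capacity"),
    "pct_with_year_opened": percent_with(stadiums, "year_opened"),
    # Placeholder stadiums are identified by the stadium_unknown_ prefix named in the plan.
    "unknown_stadiums": sum(1 for s in stadiums if s["id"].startswith("stadium_unknown_")),
}
```

Returning 0.0 for an empty list avoids a division by zero when a record type has not been loaded yet.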
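The plan says only that the health score runs 0-100 and is based on error count, and it leaves the formula to be documented in the summary. One possible reading is sketched below. The penalty weights (5 per error, 1 per warning), the category names, and the JSON keys are all assumptions chosen for illustration, not the project's actual method.

```python
import json
from datetime import datetime, timezone


def health_score(error_count, warning_count=0, error_penalty=5, warning_penalty=1):
    """Start at 100 and subtract a fixed penalty per issue, never going below 0.

    The penalty weights are illustrative assumptions, not values from the plan.
    """
    penalty = error_count * error_penalty + warning_count * warning_penalty
    return max(0, 100 - penalty)


def build_report(categories):
    """Build a JSON-serializable report from {name: {"errors": int, "warnings": int}}."""
    total_errors = sum(c["errors"] for c in categories.values())
    total_warnings = sum(c["warnings"] for c in categories.values())
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "health_score": health_score(total_errors, total_warnings),
        # Quick pass/fail per category: any error fails the category.
        "categories": {
            name: {**counts, "status": "pass" if counts["errors"] == 0 else "fail"}
            for name, counts in categories.items()
        },
    }


# Hypothetical category names and counts.
report = build_report({
    "local_validation": {"errors": 0, "warnings": 2},
    "cloudkit_relationships": {"errors": 3, "warnings": 0},
    "sync_status": {"errors": 0, "warnings": 0},
})
report_json = json.dumps(report, indent=2)
```

A linear penalty is easy to explain in the summary, but it saturates at 0 quickly on large datasets. Scaling the penalty by total record count is an alternative worth weighing.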