---
phase: 06-validation-reports
plan: 01
type: execute
---

Add comprehensive validation reporting to cloudkit_import.py with data quality metrics, orphan detection, and formatted output.

Purpose: Enable quick data quality assessment before/after sync operations to catch relationship integrity issues and data gaps.
Output: `--validate` command that generates detailed validation report with counts, gaps, orphans, and relationship checks.

~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

# Prior phase context:
@.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md
@.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md

# Source files:
@Scripts/cloudkit_import.py
@Scripts/validate_canonical.py

**Tech stack available:** Python 3, requests, cloudkit server-to-server auth

**Established patterns:**
- `query_all()` for CloudKit pagination
- `compute_diff()` for local vs cloud comparison
- `--verify`/`--verify-deep` for sync verification
- validate_canonical.py for local data validation

**Constraining decisions:**
- Phase 5: Triple lookup fallback (recordName -> deterministic UUID -> canonicalId query)
- Phase 5: Location comparison uses 0.0001 tolerance for lat/lng

Task 1: Add comprehensive validation command

Scripts/cloudkit_import.py

Add `--validate` flag and `validate_all()` function that:

1. **Local validation** - Call existing validate_canonical.py checks:
   - Duplicate IDs
   - Required fields
   - Team → Stadium references
   - Game → Team/Stadium references
   - Cross-sport references
   - Stadium alias references
   - Game counts per team

2. **CloudKit relationship validation** - New checks:
   - Games referencing non-existent teams in CloudKit
   - Games referencing non-existent stadiums in CloudKit
   - Teams referencing non-existent stadiums in CloudKit
   - Aliases referencing non-existent stadiums in CloudKit

3. **Sync status** - Leverage existing `compute_diff()`:
   - Count of records only in local (not uploaded)
   - Count of records only in CloudKit (orphans)
   - Count of records with field differences

4. **Output format**:
   - Print structured report to console
   - If `--output FILE` provided, write JSON report

Import validate_canonical functions directly rather than subprocess call.
Add to interactive menu as option 16.

Run `python cloudkit_import.py --validate --dry-run` and confirm:
- Local validation results displayed
- CloudKit relationship checks run (or skip gracefully if no credentials)
- Sync status summary shown
- No errors/exceptions

`--validate` flag works, shows local validation + CloudKit checks + sync status. Menu option 16 available.

Task 2: Add orphan listing and completeness metrics

Scripts/cloudkit_import.py

Enhance validation report with:

1. **Orphan listing** (non-destructive):
   - Add `--list-orphans` flag that shows orphan records without deleting
   - Group by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
   - Show first 10 of each type with recordName/canonicalId
   - Show total count per type

2. **Data completeness metrics**:
   - Stadiums: % with coordinates, % with capacity, % with year_opened
   - Teams: % with valid stadium reference
   - Games: % with resolved home/away teams, % with resolved stadium
   - Show counts of "unknown" stadiums (stadium_unknown_*)

3. **Report summary**:
   - Overall health score (0-100 based on error count)
   - Quick pass/fail for each category
   - Actionable recommendations for common issues

4. **JSON output enhancement**:
   - Include all metrics in structured format
   - Add timestamp and data source versions
   - Compatible with future dashboard consumption

Add `--list-orphans` to menu as option 17.

Run `python cloudkit_import.py --validate --list-orphans` and confirm:
- Orphan records listed by type (or "No orphans found")
- Completeness metrics shown (% with coordinates, etc.)
- Health score calculated
- JSON output works with `--output` flag

`--list-orphans` shows orphans without deletion. Completeness metrics calculated. Health score displayed. JSON export includes all data.

Before declaring phase complete:
- [ ] `python cloudkit_import.py --validate` runs without errors
- [ ] `python cloudkit_import.py --list-orphans` shows orphan summary
- [ ] `python cloudkit_import.py --validate --output report.json` creates valid JSON
- [ ] Menu options 16-17 work in interactive mode
- [ ] Existing functionality (`--diff`, `--verify`, `--smart-sync`) still works

- All tasks completed
- All verification checks pass
- No errors or warnings introduced
- Validation report shows meaningful data quality metrics
- Phase 6 complete (final phase of milestone)

After completion, create `.planning/phases/06-validation-reports/06-01-SUMMARY.md` with:
- What validation capabilities were added
- Sample validation output
- Health score calculation method
- Any issues encountered
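
The CloudKit relationship checks in Task 1 (games, teams, and aliases referencing records that do not exist) all reduce to the same set-membership test. The following is a minimal sketch of that idea, not the script's actual implementation: the helper `find_dangling_references`, the field names (`stadium_id`, `home_team_id`, `away_team_id`), and the sample records are hypothetical, and real records would come from `query_all()`.

```python
from collections import defaultdict


def find_dangling_references(records, known_ids, ref_fields):
    """Return {field: [record ids]} for references that point at unknown ids.

    Hypothetical helper: record shape and field names are assumptions.
    """
    dangling = defaultdict(list)
    for record in records:
        for field in ref_fields:
            target = record.get(field)
            # An empty reference is a completeness gap, not a dangling pointer.
            if target and target not in known_ids:
                dangling[field].append(record["id"])
    return dict(dangling)


# Hypothetical sample data standing in for fetched CloudKit records.
stadiums = [{"id": "stadium_1"}]
teams = [
    {"id": "team_a", "stadium_id": "stadium_1"},
    {"id": "team_b", "stadium_id": "stadium_9"},  # stadium_9 does not exist
]
games = [
    {"id": "game_1", "home_team_id": "team_a", "away_team_id": "team_b",
     "stadium_id": "stadium_1"},
    {"id": "game_2", "home_team_id": "team_a", "away_team_id": "team_x",
     "stadium_id": "stadium_2"},  # team_x and stadium_2 do not exist
]

stadium_ids = {s["id"] for s in stadiums}
team_ids = {t["id"] for t in teams}

team_issues = find_dangling_references(teams, stadium_ids, ["stadium_id"])
game_team_issues = find_dangling_references(
    games, team_ids, ["home_team_id", "away_team_id"]
)
game_stadium_issues = find_dangling_references(games, stadium_ids, ["stadium_id"])

print(team_issues, game_team_issues, game_stadium_issues)
```

Building the id sets once and reusing them keeps each check linear in the number of records, which matters when the Game table is much larger than the Team and Stadium tables.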
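
The completeness metrics and health score in Task 2 could be computed along the lines below. This is a sketch under stated assumptions: the plan only says the score is "0-100 based on error count", so the linear penalty of 5 points per error is an invented placeholder, and the helper names, record fields (`latitude`, `longitude`, `capacity`), and sample data are hypothetical. Whatever formula is finally chosen should be recorded in the summary file.

```python
import json
from datetime import datetime, timezone


def percent_where(records, predicate):
    """Percentage (one decimal place) of records satisfying predicate."""
    if not records:
        return 0.0
    matching = sum(1 for r in records if predicate(r))
    return round(100.0 * matching / len(records), 1)


def health_score(error_count, penalty_per_error=5):
    """Assumed scoring rule: start at 100, subtract a fixed penalty per error,
    and never go below 0. The penalty value is a placeholder."""
    return max(0, 100 - penalty_per_error * error_count)


# Hypothetical stadium records; field names are assumptions.
stadiums = [
    {"id": "stadium_1", "latitude": 40.8, "longitude": -73.9,
     "capacity": 46537, "year_opened": 2009},
    {"id": "stadium_2", "latitude": 41.9, "longitude": -87.6,
     "capacity": None, "year_opened": 1914},
    {"id": "stadium_unknown_3", "latitude": None, "longitude": None,
     "capacity": None, "year_opened": None},
    {"id": "stadium_4", "latitude": 34.0, "longitude": -118.2,
     "capacity": 56000, "year_opened": 1962},
]

metrics = {
    "pct_with_coordinates": percent_where(
        stadiums,
        lambda s: s.get("latitude") is not None and s.get("longitude") is not None,
    ),
    "pct_with_capacity": percent_where(
        stadiums, lambda s: s.get("capacity") is not None
    ),
    "pct_with_year_opened": percent_where(
        stadiums, lambda s: s.get("year_opened") is not None
    ),
    # Placeholder stadiums follow the stadium_unknown_* naming from the plan.
    "unknown_stadiums": sum(
        1 for s in stadiums if s["id"].startswith("stadium_unknown_")
    ),
}

report = {
    "generated_at": datetime.now(timezone.utc).isoformat(),
    "stadiums": metrics,
    "health_score": health_score(error_count=3),
}

print(json.dumps(report, indent=2))
```

Keeping the report as a plain dictionary means the same structure can be printed to the console and written to the `--output` file without a separate serialization step.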