Sportstime/.planning/phases/06-validation-reports/06-01-PLAN.md
Trey t 4266940c8f docs(06-01): create validation reports phase plan
Phase 6: Validation Reports
- 1 plan created
- 2 tasks defined
- Ready for execution
2026-01-10 10:24:59 -06:00

| phase | plan | type |
| --- | --- | --- |
| 06-validation-reports | 01 | execute |
Add comprehensive validation reporting to cloudkit_import.py with data quality metrics, orphan detection, and formatted output.

Purpose: Enable quick data quality assessment before/after sync operations to catch relationship integrity issues and data gaps.

Output: A --validate command that generates a detailed validation report with counts, gaps, orphans, and relationship checks.

<execution_context> ~/.claude/get-shit-done/workflows/execute-phase.md ~/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md

Prior phase context:

@.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md @.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md

Source files:

@Scripts/cloudkit_import.py @Scripts/validate_canonical.py

Tech stack available: Python 3, requests, CloudKit server-to-server auth

Established patterns:

  • query_all() for CloudKit pagination
  • compute_diff() for local vs cloud comparison
  • --verify/--verify-deep for sync verification
  • validate_canonical.py for local data validation

Constraining decisions:

  • Phase 5: Triple lookup fallback (recordName -> deterministic UUID -> canonicalId query)
  • Phase 5: Location comparison uses 0.0001 tolerance for lat/lng
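
For reference, the Phase 5 location tolerance can be pictured as an absolute-difference check on each coordinate. This is a minimal sketch, not the code in cloudkit_import.py: the function name `locations_match` and the `latitude`/`longitude` keys are assumptions made for illustration.

```python
import math

LOCATION_TOLERANCE = 0.0001  # degrees; the Phase 5 tolerance for lat/lng


def locations_match(local, cloud, tol=LOCATION_TOLERANCE):
    """Return True when latitude and longitude both agree within tol degrees.

    ASSUMPTION: records are dicts with "latitude" and "longitude" keys.
    """
    return (
        math.isclose(local["latitude"], cloud["latitude"], rel_tol=0.0, abs_tol=tol)
        and math.isclose(local["longitude"], cloud["longitude"], rel_tol=0.0, abs_tol=tol)
    )


local = {"latitude": 41.9484, "longitude": -87.6553}
close_enough = {"latitude": 41.94845, "longitude": -87.65532}
too_far = {"latitude": 41.9490, "longitude": -87.6553}

print(locations_match(local, close_enough))  # True
print(locations_match(local, too_far))       # False
```

Setting `rel_tol=0.0` keeps the comparison purely absolute, so the same 0.0001 threshold applies regardless of the magnitude of the coordinate.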
Task 1: Add comprehensive validation command

Files: Scripts/cloudkit_import.py

Action: Add `--validate` flag and `validate_all()` function that:
  1. Local validation - Call existing validate_canonical.py checks:

    • Duplicate IDs
    • Required fields
    • Team → Stadium references
    • Game → Team/Stadium references
    • Cross-sport references
    • Stadium alias references
    • Game counts per team
  2. CloudKit relationship validation - New checks:

    • Games referencing non-existent teams in CloudKit
    • Games referencing non-existent stadiums in CloudKit
    • Teams referencing non-existent stadiums in CloudKit
    • Aliases referencing non-existent stadiums in CloudKit
  3. Sync status - Leverage existing compute_diff():

    • Count of records only in local (not uploaded)
    • Count of records only in CloudKit (orphans)
    • Count of records with field differences
  4. Output format:

    • Print structured report to console
    • If --output FILE provided, write JSON report
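
The CloudKit relationship checks in step 2 all reduce to the same question: does a referenced ID exist in the set of IDs fetched from CloudKit? A rough sketch follows, assuming records have already been fetched into plain dicts; the helper name `find_dangling_references` and the field names `canonicalId` and `stadiumCanonicalId` are hypothetical and may not match the real record schema.

```python
def find_dangling_references(records, ref_field, known_ids):
    """Return (record id, missing target id) pairs for broken references."""
    known = set(known_ids)
    dangling = []
    for record in records:
        target = record.get(ref_field)
        # An empty reference is a required-field problem, not a dangling one.
        if target and target not in known:
            dangling.append((record["canonicalId"], target))
    return dangling


# Hypothetical CloudKit snapshot, e.g. as collected with query_all().
cloud_stadiums = [{"canonicalId": "stadium_a"}]
cloud_teams = [
    {"canonicalId": "team_a", "stadiumCanonicalId": "stadium_a"},
    {"canonicalId": "team_b", "stadiumCanonicalId": "stadium_missing"},
]

stadium_ids = {s["canonicalId"] for s in cloud_stadiums}
broken = find_dangling_references(cloud_teams, "stadiumCanonicalId", stadium_ids)
print(broken)  # [('team_b', 'stadium_missing')]
```

The same helper could be reused for the game-to-team, game-to-stadium, and alias-to-stadium checks by changing `ref_field` and the ID set.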

Import validate_canonical functions directly rather than via a subprocess call. Add to interactive menu as option 16.

Verify: Run `python cloudkit_import.py --validate --dry-run` and confirm:

  • Local validation results displayed
  • CloudKit relationship checks run (or skip gracefully if no credentials)
  • Sync status summary shown
  • No errors/exceptions

Done: --validate flag works, shows local validation + CloudKit checks + sync status. Menu option 16 available.
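
One way the flag wiring, sync-status summary, and report output for Task 1 could fit together is sketched below. The shape of the diff (a dict of record type to `local_only` / `cloud_only` / `changed` lists) is an assumption, since this plan does not specify what compute_diff() returns, and `summarize_sync` and `emit_report` are hypothetical helper names.

```python
import argparse
import json


def build_parser():
    """Only the flags named in this plan; the real script defines more."""
    parser = argparse.ArgumentParser(prog="cloudkit_import.py")
    parser.add_argument("--validate", action="store_true",
                        help="run local, CloudKit, and sync-status validation")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--output", metavar="FILE",
                        help="also write the report as JSON to FILE")
    return parser


def summarize_sync(diff):
    """Reduce a per-type diff to the three counts the report needs.

    ASSUMPTION: diff maps record type -> dict of
    "local_only" / "cloud_only" / "changed" lists.
    """
    summary = {"local_only": 0, "cloud_only": 0, "changed": 0}
    for buckets in diff.values():
        for key in summary:
            summary[key] += len(buckets.get(key, []))
    return summary


def emit_report(report, output_path=None):
    """Print the report to the console and optionally write it as JSON."""
    for section, values in report.items():
        print(f"{section}: {values}")
    if output_path:
        with open(output_path, "w", encoding="utf-8") as handle:
            json.dump(report, handle, indent=2)


# Explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--validate", "--dry-run"])
sample_diff = {
    "Team": {"local_only": ["team_x"], "cloud_only": [], "changed": ["team_y"]},
    "Game": {"local_only": [], "cloud_only": ["game_1", "game_2"], "changed": []},
}
sync_summary = summarize_sync(sample_diff)
if args.validate:
    emit_report({"sync_status": sync_summary}, args.output)
```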
Task 2: Add orphan listing and completeness metrics

Files: Scripts/cloudkit_import.py

Action: Enhance validation report with:
  1. Orphan listing (non-destructive):

    • Add --list-orphans flag that shows orphan records without deleting
    • Group by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
    • Show first 10 of each type with recordName/canonicalId
    • Show total count per type
  2. Data completeness metrics:

    • Stadiums: % with coordinates, % with capacity, % with year_opened
    • Teams: % with valid stadium reference
    • Games: % with resolved home/away teams, % with resolved stadium
    • Show counts of "unknown" stadiums (stadium_unknown_*)
  3. Report summary:

    • Overall health score (0-100 based on error count)
    • Quick pass/fail for each category
    • Actionable recommendations for common issues
  4. JSON output enhancement:

    • Include all metrics in structured format
    • Add timestamp and data source versions
    • Compatible with future dashboard consumption
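
The completeness metrics and health score above could be computed along these lines. This is a sketch under stated assumptions: the plan only says the score is "0-100 based on error count", so the linear penalty of 5 points per error is a placeholder, and the dict keys (`latitude`, `longitude`, `capacity`, `year_opened`, `canonicalId`) are assumed field names.

```python
from datetime import datetime, timezone


def percent(part, total):
    """Percentage rounded to one decimal; 0.0 when there is nothing to count."""
    return round(100.0 * part / total, 1) if total else 0.0


def stadium_completeness(stadiums):
    """Share of stadiums with coordinates, capacity, and year_opened."""
    total = len(stadiums)
    with_coords = sum(
        1 for s in stadiums
        if s.get("latitude") is not None and s.get("longitude") is not None
    )
    with_capacity = sum(1 for s in stadiums if s.get("capacity"))
    with_year = sum(1 for s in stadiums if s.get("year_opened"))
    unknown = sum(
        1 for s in stadiums
        if s.get("canonicalId", "").startswith("stadium_unknown_")
    )
    return {
        "total": total,
        "pct_with_coordinates": percent(with_coords, total),
        "pct_with_capacity": percent(with_capacity, total),
        "pct_with_year_opened": percent(with_year, total),
        "unknown_stadiums": unknown,
    }


def health_score(error_count, penalty=5):
    """ASSUMPTION: linear penalty per error, clamped to the 0-100 range."""
    return max(0, 100 - penalty * error_count)


def build_summary(stadiums, error_count):
    """Bundle metrics, score, and a UTC timestamp into a JSON-friendly dict."""
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "stadiums": stadium_completeness(stadiums),
        "health_score": health_score(error_count),
    }


sample_stadiums = [
    {"canonicalId": "stadium_a", "latitude": 41.9, "longitude": -87.6,
     "capacity": 41000, "year_opened": 1914},
    {"canonicalId": "stadium_b", "latitude": 40.8, "longitude": -73.9,
     "capacity": 46000},
    {"canonicalId": "stadium_c", "latitude": 34.0, "longitude": -118.2},
    {"canonicalId": "stadium_unknown_1"},
]
summary = build_summary(sample_stadiums, error_count=3)
print(summary["stadiums"], summary["health_score"])
```

Whatever scoring formula is finally chosen should be recorded in the phase summary, since the plan asks for the health score calculation method to be documented.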

Add --list-orphans to menu as option 17.

Verify: Run `python cloudkit_import.py --validate --list-orphans` and confirm:

  • Orphan records listed by type (or "No orphans found")
  • Completeness metrics shown (% with coordinates, etc.)
  • Health score calculated
  • JSON output works with --output flag

Done: --list-orphans shows orphans without deletion. Completeness metrics calculated. Health score displayed. JSON export includes all data.
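
The non-destructive orphan listing from Task 2 might be formatted as in the sketch below: group by record type, show the total per type, and print at most the first 10 entries. The helper name `format_orphans` and the `recordType` / `recordName` / `canonicalId` keys are assumptions for illustration.

```python
from collections import defaultdict

RECORD_TYPES = ("Stadium", "Team", "Game", "StadiumAlias", "TeamAlias")


def format_orphans(orphans, limit=10):
    """Return report lines for orphan records, grouped by record type.

    ASSUMPTION: each orphan is a dict with "recordType", "recordName",
    and "canonicalId" keys. Nothing is deleted; this only formats text.
    """
    if not orphans:
        return ["No orphans found"]

    by_type = defaultdict(list)
    for orphan in orphans:
        by_type[orphan["recordType"]].append(orphan)

    lines = []
    for record_type in RECORD_TYPES:
        group = by_type.get(record_type, [])
        if not group:
            continue
        lines.append(f"{record_type}: {len(group)} orphan(s)")
        for orphan in group[:limit]:
            lines.append(f"  {orphan['recordName']} ({orphan['canonicalId']})")
        if len(group) > limit:
            lines.append(f"  ... and {len(group) - limit} more")
    return lines


sample_orphans = [
    {"recordType": "Game", "recordName": f"rec-{i}", "canonicalId": f"game_{i}"}
    for i in range(12)
]
orphan_lines = format_orphans(sample_orphans)
print("\n".join(orphan_lines))
```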
Before declaring phase complete:

- [ ] `python cloudkit_import.py --validate` runs without errors
- [ ] `python cloudkit_import.py --list-orphans` shows orphan summary
- [ ] `python cloudkit_import.py --validate --output report.json` creates valid JSON
- [ ] Menu options 16-17 work in interactive mode
- [ ] Existing functionality (--diff, --verify, --smart-sync) still works

<success_criteria>

  • All tasks completed
  • All verification checks pass
  • No errors or warnings introduced
  • Validation report shows meaningful data quality metrics
  • Phase 6 complete (final phase of milestone)

</success_criteria>
After completion, create `.planning/phases/06-validation-reports/06-01-SUMMARY.md` with:

- What validation capabilities were added
- Sample validation output
- Health score calculation method
- Any issues encountered