
Trey t 4266940c8f docs(06-01): create validation reports phase plan

Phase 6: Validation Reports
- 1 plan created
- 2 tasks defined
- Ready for execution

2026-01-10 10:24:59 -06:00



phase	plan	type
06-validation-reports	01	execute

Add comprehensive validation reporting to cloudkit_import.py with data quality metrics, orphan detection, and formatted output.

Purpose: Enable quick data quality assessment before and after sync operations to catch relationship integrity issues and data gaps.

Output: A `--validate` command that generates a detailed validation report with counts, gaps, orphans, and relationship checks.

<execution_context> ~/.claude/get-shit-done/workflows/execute-phase.md ~/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md

Prior phase context:

@.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md @.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md

Source files:

@Scripts/cloudkit_import.py @Scripts/validate_canonical.py

Tech stack available: Python 3, requests, CloudKit server-to-server auth

Established patterns:

query_all() for CloudKit pagination
compute_diff() for local vs cloud comparison
--verify/--verify-deep for sync verification
validate_canonical.py for local data validation

Constraining decisions:

Phase 5: Triple lookup fallback (recordName -> deterministic UUID -> canonicalId query)
Phase 5: Location comparison uses 0.0001 tolerance for lat/lng
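
The two Phase 5 decisions above can be illustrated with a minimal sketch. Only the 0.0001 tolerance and the order of the fallback come from this plan; the function names `locations_match` and `first_match` are hypothetical, and the actual lookups in cloudkit_import.py are not shown here.

```python
import math

# Tolerance from the Phase 5 decision; everything else is illustrative.
LAT_LNG_TOLERANCE = 0.0001


def locations_match(local, cloud, tol=LAT_LNG_TOLERANCE):
    """Return True when both (lat, lng) pairs agree within `tol` degrees."""
    return all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
        for a, b in zip(local, cloud)
    )


def first_match(key, lookups):
    """Try each lookup callable in order and return the first non-None result.

    For the triple fallback, `lookups` would hold three callables:
    by recordName, by deterministic UUID, then by canonicalId query.
    """
    for lookup in lookups:
        record = lookup(key)
        if record is not None:
            return record
    return None
```

Using an absolute tolerance (rather than a relative one) keeps the comparison independent of how large the coordinate values are.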

Task 1: Add comprehensive validation command

File: Scripts/cloudkit_import.py

Add a `--validate` flag and a `validate_all()` function that performs the following:

Local validation - Call existing validate_canonical.py checks:
- Duplicate IDs
- Required fields
- Team → Stadium references
- Game → Team/Stadium references
- Cross-sport references
- Stadium alias references
- Game counts per team
CloudKit relationship validation - New checks:
- Games referencing non-existent teams in CloudKit
- Games referencing non-existent stadiums in CloudKit
- Teams referencing non-existent stadiums in CloudKit
- Aliases referencing non-existent stadiums in CloudKit
Sync status - Leverage existing compute_diff():
- Count of records only in local (not uploaded)
- Count of records only in CloudKit (orphans)
- Count of records with field differences
Output format:
- Print structured report to console
- If --output FILE provided, write JSON report
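
The CloudKit relationship checks and the sync status counts listed above might be sketched as follows. This is an illustration under assumptions, not the script's code: records are assumed to be plain dicts keyed by an `id` field, the field name passed as `ref_field` is hypothetical, and `sync_status` is a stand-in for whatever the existing `compute_diff()` returns, since its signature is not shown in this plan.

```python
def find_dangling_references(records, ref_field, valid_ids):
    """Return (record_id, missing_ref) pairs whose reference is not in valid_ids."""
    dangling = []
    for record in records:
        ref = record.get(ref_field)
        if ref is not None and ref not in valid_ids:
            dangling.append((record["id"], ref))
    return dangling


def sync_status(local_by_id, cloud_by_id):
    """Count records only in local, only in CloudKit, and present in both but different."""
    local_ids, cloud_ids = set(local_by_id), set(cloud_by_id)
    shared = local_ids & cloud_ids
    return {
        "local_only": len(local_ids - cloud_ids),   # not uploaded yet
        "cloud_only": len(cloud_ids - local_ids),   # orphans
        "different": sum(1 for i in shared if local_by_id[i] != cloud_by_id[i]),
    }
```

The same `find_dangling_references` helper would cover all four relationship checks by varying the record list, the reference field, and the set of valid IDs.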

Import validate_canonical functions directly rather than calling the script via subprocess. Add the command to the interactive menu as option 16.

Verify: Run `python cloudkit_import.py --validate --dry-run` and confirm:

Local validation results displayed
CloudKit relationship checks run (or skip gracefully if no credentials)
Sync status summary shown
No errors/exceptions

Done when: The --validate flag works and shows local validation, CloudKit checks, and sync status. Menu option 16 is available.
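
One way the flag wiring and the "skip gracefully if no credentials" behavior could look is sketched below. The existing argument parser in cloudkit_import.py is not shown in this plan, so `build_parser` and `plan_sections` are hypothetical names, and treating sync status as CloudKit-dependent is an assumption.

```python
import argparse


def build_parser():
    """Illustrative parser with only the flags this task mentions."""
    parser = argparse.ArgumentParser(prog="cloudkit_import.py")
    parser.add_argument("--validate", action="store_true",
                        help="run the validation report")
    parser.add_argument("--dry-run", action="store_true",
                        help="make no changes to CloudKit")
    parser.add_argument("--output", metavar="FILE",
                        help="also write the report as JSON to FILE")
    return parser


def plan_sections(args, has_credentials):
    """Decide which report sections to run; CloudKit sections need credentials."""
    if not args.validate:
        return []
    sections = ["local"]
    if has_credentials:
        sections += ["cloudkit_relationships", "sync_status"]
    return sections
```

With this shape, a run without credentials still produces the local validation section instead of failing.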

Task 2: Add orphan listing and completeness metrics

File: Scripts/cloudkit_import.py

Enhance the validation report with:

Orphan listing (non-destructive):
- Add --list-orphans flag that shows orphan records without deleting
- Group by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
- Show first 10 of each type with recordName/canonicalId
- Show total count per type
Data completeness metrics:
- Stadiums: % with coordinates, % with capacity, % with year_opened
- Teams: % with valid stadium reference
- Games: % with resolved home/away teams, % with resolved stadium
- Show counts of "unknown" stadiums (stadium_unknown_*)
Report summary:
- Overall health score (0-100 based on error count)
- Quick pass/fail for each category
- Actionable recommendations for common issues
JSON output enhancement:
- Include all metrics in structured format
- Add timestamp and data source versions
- Compatible with future dashboard consumption
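
The completeness metrics, health score, and JSON structure described above could be sketched like this. The plan only says the score is "0-100 based on error count", so the linear penalty of 5 points per error is an assumption, as are the field names (`latitude`, `capacity`, `year_opened`, `id`) and the report keys.

```python
import json
from datetime import datetime, timezone


def percent_with(records, field):
    """Percentage of records where `field` is present and not None."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return round(100.0 * present / len(records), 1)


def count_unknown_stadiums(stadiums):
    """Count placeholder stadiums whose id starts with 'stadium_unknown_'."""
    return sum(1 for s in stadiums if s.get("id", "").startswith("stadium_unknown_"))


def health_score(error_count, penalty_per_error=5):
    """Assumed scoring rule: start at 100, subtract a fixed penalty per error."""
    return max(0, 100 - penalty_per_error * error_count)


def build_report(stadiums, error_count):
    """Assemble a JSON-serializable report with a UTC timestamp."""
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "completeness": {
            "stadiums": {
                "coordinates_pct": percent_with(stadiums, "latitude"),
                "capacity_pct": percent_with(stadiums, "capacity"),
                "year_opened_pct": percent_with(stadiums, "year_opened"),
                "unknown_count": count_unknown_stadiums(stadiums),
            }
        },
        "health_score": health_score(error_count),
    }
```

Passing the result to `json.dumps(report, indent=2)` would give the content to write when `--output FILE` is supplied.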

Add --list-orphans to the interactive menu as option 17.

Verify: Run `python cloudkit_import.py --validate --list-orphans` and confirm:

Orphan records listed by type (or "No orphans found")
Completeness metrics shown (% with coordinates, etc.)
Health score calculated
JSON output works with --output flag

Done when: --list-orphans shows orphans without deletion. Completeness metrics are calculated. Health score is displayed. JSON export includes all data.
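
The non-destructive orphan listing from Task 2 (group by type, first 10 per type, total per type) might look roughly like the sketch below. It assumes each orphan is a dict carrying `recordType` and `recordName` keys; the helper names are hypothetical and nothing here deletes records.

```python
from collections import defaultdict

# Record types named in Task 2, in display order.
RECORD_TYPES = ("Stadium", "Team", "Game", "StadiumAlias", "TeamAlias")


def summarize_orphans(orphans, limit=10):
    """Group orphan record names by type, keeping a total and the first `limit` names."""
    grouped = defaultdict(list)
    for record in orphans:
        grouped[record["recordType"]].append(record["recordName"])
    return {
        t: {"total": len(grouped[t]), "sample": grouped[t][:limit]}
        for t in RECORD_TYPES
        if t in grouped
    }


def format_orphans(summary):
    """Render the summary as console lines, or a single 'No orphans found' line."""
    if not summary:
        return ["No orphans found"]
    lines = []
    for record_type, info in summary.items():
        lines.append(f"{record_type}: {info['total']} orphan(s)")
        lines.extend(f"  - {name}" for name in info["sample"])
        hidden = info["total"] - len(info["sample"])
        if hidden > 0:
            lines.append(f"  ... and {hidden} more")
    return lines
```

Keeping the summary as a plain dict means the same structure can be printed to the console and embedded in the JSON report.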

Before declaring phase complete:
- [ ] `python cloudkit_import.py --validate` runs without errors
- [ ] `python cloudkit_import.py --list-orphans` shows orphan summary
- [ ] `python cloudkit_import.py --validate --output report.json` creates valid JSON
- [ ] Menu options 16-17 work in interactive mode
- [ ] Existing functionality (--diff, --verify, --smart-sync) still works

<success_criteria>

All tasks completed
All verification checks pass
No errors or warnings introduced
Validation report shows meaningful data quality metrics
Phase 6 complete (final phase of milestone) </success_criteria>

After completion, create `.planning/phases/06-validation-reports/06-01-SUMMARY.md` with:
- What validation capabilities were added
- Sample validation output
- Health score calculation method
- Any issues encountered
