| phase | plan | type |
|---|---|---|
| 06-validation-reports | 01 | execute |
Purpose: Enable quick data quality assessment before/after sync operations to catch relationship integrity issues and data gaps.
Output: --validate command that generates detailed validation report with counts, gaps, orphans, and relationship checks.
<execution_context> ~/.claude/get-shit-done/workflows/execute-phase.md ~/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md
Prior phase context:
@.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md @.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md
Source files:
@Scripts/cloudkit_import.py @Scripts/validate_canonical.py
Tech stack available: Python 3, requests, CloudKit server-to-server auth
Established patterns:
- query_all() for CloudKit pagination
- compute_diff() for local vs cloud comparison
- --verify/--verify-deep for sync verification
- validate_canonical.py for local data validation
Constraining decisions:
- Phase 5: Triple lookup fallback (recordName -> deterministic UUID -> canonicalId query)
- Phase 5: Location comparison uses 0.0001 tolerance for lat/lng
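The location tolerance above can be illustrated with a minimal sketch. The names `LOCATION_TOLERANCE` and `locations_match` are hypothetical and are not taken from cloudkit_import.py; only the 0.0001 value comes from the Phase 5 decision.

```python
import math

# Tolerance from the Phase 5 decision; the constant name is hypothetical.
LOCATION_TOLERANCE = 0.0001


def locations_match(local, cloud, tol=LOCATION_TOLERANCE):
    """Return True when both (lat, lng) pairs agree within tol degrees."""
    return all(
        # rel_tol=0.0 forces a purely absolute comparison.
        math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
        for a, b in zip(local, cloud)
    )
```

Using an absolute tolerance keeps small floating-point drift in stored coordinates from being reported as a field difference.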
- Local validation - Call existing validate_canonical.py checks:
  - Duplicate IDs
  - Required fields
  - Team → Stadium references
  - Game → Team/Stadium references
  - Cross-sport references
  - Stadium alias references
  - Game counts per team
- CloudKit relationship validation - New checks:
  - Games referencing non-existent teams in CloudKit
  - Games referencing non-existent stadiums in CloudKit
  - Teams referencing non-existent stadiums in CloudKit
  - Aliases referencing non-existent stadiums in CloudKit
- Sync status - Leverage existing compute_diff():
  - Count of records only in local (not uploaded)
  - Count of records only in CloudKit (orphans)
  - Count of records with field differences
- Output format:
  - Print structured report to console
  - If --output FILE provided, write JSON report
Import validate_canonical functions directly rather than subprocess call. Add to interactive menu as option 16.
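The four CloudKit relationship checks listed above could be sketched roughly as follows. This assumes the records fetched via query_all() have already been flattened into plain dicts; the field names `homeTeamCanonicalId`, `awayTeamCanonicalId`, and `stadiumCanonicalId` and both function names are hypothetical placeholders, not the project's actual schema.

```python
def find_broken_references(records, known_ids, ref_fields):
    """Return (record_id, field, missing_id) for each reference not in known_ids."""
    broken = []
    for record in records:
        for field in ref_fields:
            target = record.get(field)
            # A missing field is a completeness issue, not a broken reference.
            if target is not None and target not in known_ids:
                broken.append((record.get("canonicalId"), field, target))
    return broken


def check_cloud_relationships(games, teams, stadiums, aliases):
    """Run the four relationship checks; field names are assumptions."""
    team_ids = {t["canonicalId"] for t in teams}
    stadium_ids = {s["canonicalId"] for s in stadiums}
    return {
        "games_missing_team": find_broken_references(
            games, team_ids, ["homeTeamCanonicalId", "awayTeamCanonicalId"]
        ),
        "games_missing_stadium": find_broken_references(
            games, stadium_ids, ["stadiumCanonicalId"]
        ),
        "teams_missing_stadium": find_broken_references(
            teams, stadium_ids, ["stadiumCanonicalId"]
        ),
        "aliases_missing_stadium": find_broken_references(
            aliases, stadium_ids, ["stadiumCanonicalId"]
        ),
    }
```

Building the ID sets once keeps each check linear in the number of records, which matters when the game list is large.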
Run python cloudkit_import.py --validate --dry-run and confirm:
- Local validation results displayed
- CloudKit relationship checks run (or skip gracefully if no credentials)
- Sync status summary shown
- No errors/exceptions
--validate flag works, shows local validation + CloudKit checks + sync status. Menu option 16 available.
- Orphan listing (non-destructive):
  - Add --list-orphans flag that shows orphan records without deleting
  - Group by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
  - Show first 10 of each type with recordName/canonicalId
  - Show total count per type
- Data completeness metrics:
  - Stadiums: % with coordinates, % with capacity, % with year_opened
  - Teams: % with valid stadium reference
  - Games: % with resolved home/away teams, % with resolved stadium
  - Show counts of "unknown" stadiums (stadium_unknown_*)
- Report summary:
  - Overall health score (0-100 based on error count)
  - Quick pass/fail for each category
  - Actionable recommendations for common issues
- JSON output enhancement:
  - Include all metrics in structured format
  - Add timestamp and data source versions
  - Compatible with future dashboard consumption
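The stadium completeness metrics above might be computed along these lines. This is a sketch under assumptions: stadiums are plain dicts, and the keys `latitude`, `longitude`, and `capacity` are hypothetical; only `year_opened` and the `stadium_unknown_` prefix appear in the plan itself.

```python
def percent(records, predicate):
    """Percentage (one decimal) of records satisfying predicate; 0.0 if empty."""
    if not records:
        return 0.0
    matched = sum(1 for r in records if predicate(r))
    return round(100.0 * matched / len(records), 1)


def has(*fields):
    """Build a predicate that is true when every named field is not None."""
    return lambda record: all(record.get(f) is not None for f in fields)


def stadium_completeness(stadiums):
    """Completeness metrics for stadium dicts; key names are assumptions."""
    return {
        "pct_with_coordinates": percent(stadiums, has("latitude", "longitude")),
        "pct_with_capacity": percent(stadiums, has("capacity")),
        "pct_with_year_opened": percent(stadiums, has("year_opened")),
        # Placeholder stadiums are identified by the stadium_unknown_ prefix.
        "unknown_count": sum(
            1
            for s in stadiums
            if str(s.get("canonicalId", "")).startswith("stadium_unknown_")
        ),
    }
```

The same `percent` helper would apply to the team and game metrics by swapping in a predicate that checks for a resolved reference.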
Add --list-orphans to menu as option 17.
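A non-destructive orphan listing could look something like the sketch below. It assumes each orphan is a flat dict carrying `recordType`, `recordName`, and `canonicalId`; the function names and the output layout are hypothetical.

```python
from collections import defaultdict

# Record types named in the plan, in display order.
RECORD_TYPES = ("Stadium", "Team", "Game", "StadiumAlias", "TeamAlias")


def group_orphans(orphans, limit=10):
    """Group orphans by record type, keeping a total and the first `limit` items."""
    by_type = defaultdict(list)
    for record in orphans:
        by_type[record["recordType"]].append(record)
    report = {}
    for record_type in RECORD_TYPES:
        items = by_type.get(record_type, [])
        report[record_type] = {
            "total": len(items),
            "sample": [
                {"recordName": r.get("recordName"), "canonicalId": r.get("canonicalId")}
                for r in items[:limit]
            ],
        }
    return report


def format_orphans(report):
    """Render the grouped report as console text; nothing is deleted."""
    if all(entry["total"] == 0 for entry in report.values()):
        return "No orphans found"
    lines = []
    for record_type, entry in report.items():
        lines.append(f"{record_type}: {entry['total']} orphan(s)")
        for item in entry["sample"]:
            lines.append(f"  {item['recordName']} ({item['canonicalId']})")
    return "\n".join(lines)
```

Because the functions only read and summarize the records, they can run safely alongside --dry-run or without delete permissions.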
Run python cloudkit_import.py --validate --list-orphans and confirm:
- Orphan records listed by type (or "No orphans found")
- Completeness metrics shown (% with coordinates, etc.)
- Health score calculated
- JSON output works with --output flag
--list-orphans shows orphans without deletion. Completeness metrics calculated. Health score displayed. JSON export includes all data.
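The plan does not specify how the health score is derived from the error count, so the following is only one possible sketch: a fixed penalty per error, floored at 0. The penalty of 5, the report layout, and all function names are assumptions; data source versions are omitted because their origin is not described here.

```python
import json
from datetime import datetime, timezone


def health_score(error_count, penalty_per_error=5):
    """Map an error count to 0-100; the linear penalty is an assumption."""
    return max(0, 100 - penalty_per_error * error_count)


def build_report(categories):
    """Build a JSON-serializable report from {category: [error strings]}."""
    total_errors = sum(len(errors) for errors in categories.values())
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "health_score": health_score(total_errors),
        "categories": {
            name: {"passed": not errors, "errors": list(errors)}
            for name, errors in categories.items()
        },
    }


def write_report(report, path):
    """Write the report to the file given via --output."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
```

Keeping the per-category pass/fail flag next to the raw error list lets the console summary and a future dashboard read from the same structure.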
<success_criteria>
- All tasks completed
- All verification checks pass
- No errors or warnings introduced
- Validation report shows meaningful data quality metrics
- Phase 6 complete (final phase of milestone) </success_criteria>