From 4266940c8f562e71d329fdf72790e91623cad31f Mon Sep 17 00:00:00 2001
From: Trey t
Date: Sat, 10 Jan 2026 10:24:59 -0600
Subject: [PATCH] docs(06-01): create validation reports phase plan

Phase 6: Validation Reports
- 1 plan created
- 2 tasks defined
- Ready for execution
---
 .../06-validation-reports/06-01-PLAN.md | 158 ++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 .planning/phases/06-validation-reports/06-01-PLAN.md

diff --git a/.planning/phases/06-validation-reports/06-01-PLAN.md b/.planning/phases/06-validation-reports/06-01-PLAN.md
new file mode 100644
index 0000000..852ac91
--- /dev/null
+++ b/.planning/phases/06-validation-reports/06-01-PLAN.md
@@ -0,0 +1,158 @@
+---
+phase: 06-validation-reports
+plan: 01
+type: execute
+---
+
+
+Add comprehensive validation reporting to cloudkit_import.py with data quality metrics, orphan detection, and formatted output.
+
+Purpose: Enable quick data quality assessment before/after sync operations to catch relationship integrity issues and data gaps.
+Output: `--validate` command that generates detailed validation report with counts, gaps, orphans, and relationship checks.
+
+
+
+~/.claude/get-shit-done/workflows/execute-phase.md
+~/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+
+# Prior phase context:
+@.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md
+@.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md
+
+# Source files:
+@Scripts/cloudkit_import.py
+@Scripts/validate_canonical.py
+
+**Tech stack available:** Python 3, requests, cloudkit server-to-server auth
+**Established patterns:**
+- query_all() for CloudKit pagination
+- compute_diff() for local vs cloud comparison
+- --verify/--verify-deep for sync verification
+- validate_canonical.py for local data validation
+
+**Constraining decisions:**
+- Phase 5: Triple lookup fallback (recordName -> deterministic UUID -> canonicalId query)
+- Phase 5: Location comparison uses 0.0001 tolerance for lat/lng
+
+
+
+
+
+  Task 1: Add comprehensive validation command
+  Scripts/cloudkit_import.py
+
+Add `--validate` flag and `validate_all()` function that:
+
+1. **Local validation** - Call existing validate_canonical.py checks:
+   - Duplicate IDs
+   - Required fields
+   - Team → Stadium references
+   - Game → Team/Stadium references
+   - Cross-sport references
+   - Stadium alias references
+   - Game counts per team
+
+2. **CloudKit relationship validation** - New checks:
+   - Games referencing non-existent teams in CloudKit
+   - Games referencing non-existent stadiums in CloudKit
+   - Teams referencing non-existent stadiums in CloudKit
+   - Aliases referencing non-existent stadiums in CloudKit
+
+3. **Sync status** - Leverage existing compute_diff():
+   - Count of records only in local (not uploaded)
+   - Count of records only in CloudKit (orphans)
+   - Count of records with field differences
+
+4. **Output format**:
+   - Print structured report to console
+   - If `--output FILE` provided, write JSON report
+
+Import validate_canonical functions directly rather than subprocess call. Add to interactive menu as option 16.
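The CloudKit relationship checks in item 2 of Task 1 all reduce to finding references whose target record is absent. A minimal sketch of that idea, assuming the records have already been fetched (for example through `query_all()`) and flattened into plain dicts. The helper name `find_dangling_refs` and the field names `canonicalId`, `homeTeamId`, `awayTeamId`, and `stadiumId` are hypothetical and are not taken from cloudkit_import.py:

```python
from typing import Iterable, List, Set, Tuple


def find_dangling_refs(
    records: Iterable[dict],
    ref_fields: Iterable[str],
    valid_ids: Set[str],
) -> List[Tuple[str, str, str]]:
    """Return (record_id, field, missing_target) for each broken reference."""
    fields = list(ref_fields)
    dangling = []
    for record in records:
        for field in fields:
            target = record.get(field)
            # An empty reference is a completeness gap, not a dangling reference.
            if target and target not in valid_ids:
                dangling.append((record["canonicalId"], field, target))
    return dangling


# Hypothetical sample data standing in for CloudKit query results.
teams = [{"canonicalId": "team_a", "stadiumId": "stadium_1"}]
stadiums = [{"canonicalId": "stadium_1"}]
games = [
    {
        "canonicalId": "game_1",
        "homeTeamId": "team_a",
        "awayTeamId": "team_b",  # no such team in the sample data
        "stadiumId": "stadium_1",
    },
]

team_ids = {t["canonicalId"] for t in teams}
stadium_ids = {s["canonicalId"] for s in stadiums}

missing_teams = find_dangling_refs(games, ["homeTeamId", "awayTeamId"], team_ids)
missing_stadiums = find_dangling_refs(games, ["stadiumId"], stadium_ids)
```

The same helper would cover the team-to-stadium and alias-to-stadium checks by passing a different record list and field name, which keeps the four checks in the plan to one code path.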
+
+
+Run `python cloudkit_import.py --validate --dry-run` and confirm:
+- Local validation results displayed
+- CloudKit relationship checks run (or skip gracefully if no credentials)
+- Sync status summary shown
+- No errors/exceptions
+
+
+--validate flag works, shows local validation + CloudKit checks + sync status. Menu option 16 available.
+
+
+
+
+  Task 2: Add orphan listing and completeness metrics
+  Scripts/cloudkit_import.py
+
+Enhance validation report with:
+
+1. **Orphan listing** (non-destructive):
+   - Add `--list-orphans` flag that shows orphan records without deleting
+   - Group by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
+   - Show first 10 of each type with recordName/canonicalId
+   - Show total count per type
+
+2. **Data completeness metrics**:
+   - Stadiums: % with coordinates, % with capacity, % with year_opened
+   - Teams: % with valid stadium reference
+   - Games: % with resolved home/away teams, % with resolved stadium
+   - Show counts of "unknown" stadiums (stadium_unknown_*)
+
+3. **Report summary**:
+   - Overall health score (0-100 based on error count)
+   - Quick pass/fail for each category
+   - Actionable recommendations for common issues
+
+4. **JSON output enhancement**:
+   - Include all metrics in structured format
+   - Add timestamp and data source versions
+   - Compatible with future dashboard consumption
+
+Add `--list-orphans` to menu as option 17.
+
+
+Run `python cloudkit_import.py --validate --list-orphans` and confirm:
+- Orphan records listed by type (or "No orphans found")
+- Completeness metrics shown (% with coordinates, etc.)
+- Health score calculated
+- JSON output works with --output flag
+
+
+--list-orphans shows orphans without deletion. Completeness metrics calculated. Health score displayed. JSON export includes all data.
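The orphan listing and completeness metrics in Task 2 might look roughly like the sketch below. It assumes each orphan is a dict carrying `recordType` and `recordName`, and that local stadium records are dicts keyed by field name. The helper names `group_orphans` and `percent_with` are hypothetical, and the sample data is invented for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_orphans(orphans: Iterable[dict], limit: int = 10) -> Dict[str, dict]:
    """Group orphans by record type, keeping the total and the first `limit` names."""
    by_type: Dict[str, List[str]] = defaultdict(list)
    for record in orphans:
        by_type[record["recordType"]].append(record["recordName"])
    return {
        rtype: {"total": len(names), "sample": names[:limit]}
        for rtype, names in by_type.items()
    }


def percent_with(records: List[dict], field: str) -> float:
    """Percentage of records where `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return round(100.0 * filled / len(records), 1)


# Hypothetical orphans: 12 games and 1 stadium that exist only in CloudKit.
orphans = [{"recordType": "Game", "recordName": f"game_{i}"} for i in range(12)]
orphans.append({"recordType": "Stadium", "recordName": "stadium_x"})
summary = group_orphans(orphans)

# Hypothetical stadium records with some gaps.
stadiums = [
    {"capacity": 40000, "year_opened": 1999},
    {"capacity": None, "year_opened": 2005},
    {"capacity": 52000},
    {"capacity": 18000, "year_opened": None},
]
capacity_pct = percent_with(stadiums, "capacity")
year_pct = percent_with(stadiums, "year_opened")
```

Nothing here deletes or mutates records, which matches the non-destructive requirement for `--list-orphans`. Treating an empty record list as 0% rather than 100% is a design choice that would need confirming against the intended report semantics.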
+
+
+
+
+
+
+Before declaring phase complete:
+- [ ] `python cloudkit_import.py --validate` runs without errors
+- [ ] `python cloudkit_import.py --list-orphans` shows orphan summary
+- [ ] `python cloudkit_import.py --validate --output report.json` creates valid JSON
+- [ ] Menu options 16-17 work in interactive mode
+- [ ] Existing functionality (--diff, --verify, --smart-sync) still works
+
+
+
+- All tasks completed
+- All verification checks pass
+- No errors or warnings introduced
+- Validation report shows meaningful data quality metrics
+- Phase 6 complete (final phase of milestone)
+
+
+
+After completion, create `.planning/phases/06-validation-reports/06-01-SUMMARY.md` with:
+- What validation capabilities were added
+- Sample validation output
+- Health score calculation method
+- Any issues encountered
+
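The plan asks for a 0-100 health score based on error count but does not fix a formula, so the summary will need to document whichever one is chosen. One possible shape, purely as an assumption: subtract a fixed penalty per error and clamp at zero. The function names, the penalty weights, and the report keys below are all hypothetical:

```python
import json
from datetime import datetime, timezone


def health_score(error_count: int, warning_count: int = 0,
                 error_penalty: int = 5, warning_penalty: int = 1) -> int:
    """Map issue counts to a 0-100 score. The weights are illustrative assumptions."""
    penalty = error_count * error_penalty + warning_count * warning_penalty
    return max(0, 100 - penalty)


def build_report(categories: dict) -> dict:
    """Assemble a JSON-serializable report with a timestamp and per-category pass/fail."""
    total_errors = sum(c["errors"] for c in categories.values())
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "health_score": health_score(total_errors),
        "categories": {
            name: {**c, "passed": c["errors"] == 0}
            for name, c in categories.items()
        },
    }


# Hypothetical error counts per validation category.
report = build_report({
    "local": {"errors": 0},
    "relationships": {"errors": 3},
    "sync": {"errors": 1},
})
report_json = json.dumps(report, indent=2)
```

A linear penalty is easy to explain in the summary but saturates quickly on large datasets. Scaling the penalty by total record count would be an alternative worth considering if error counts routinely exceed a few dozen.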