docs(06-01): create validation reports phase plan

Phase 6: Validation Reports
- 1 plan created
- 2 tasks defined
- Ready for execution
Trey t
2026-01-10 10:24:59 -06:00
parent ad7a396704
commit 4266940c8f

@@ -0,0 +1,158 @@
---
phase: 06-validation-reports
plan: 01
type: execute
---
<objective>
Add comprehensive validation reporting to cloudkit_import.py with data quality metrics, orphan detection, and formatted output.
Purpose: Enable quick data quality assessment before/after sync operations to catch relationship integrity issues and data gaps.
Output: A `--validate` command that generates a detailed validation report with counts, gaps, orphans, and relationship checks.
</objective>
<execution_context>
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
# Prior phase context:
@.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md
@.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md
# Source files:
@Scripts/cloudkit_import.py
@Scripts/validate_canonical.py
**Tech stack available:** Python 3, requests, cloudkit server-to-server auth
**Established patterns:**
- query_all() for CloudKit pagination
- compute_diff() for local vs cloud comparison
- --verify/--verify-deep for sync verification
- validate_canonical.py for local data validation
**Constraining decisions:**
- Phase 5: Triple lookup fallback (recordName -> deterministic UUID -> canonicalId query)
- Phase 5: Location comparison uses 0.0001 tolerance for lat/lng
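
The 0.0001 tolerance for lat/lng could be applied as an absolute-tolerance comparison on each axis. This is a minimal sketch, assuming coordinates are plain floats in degrees; `coords_match` and `LOCATION_TOLERANCE` are hypothetical names, not taken from cloudkit_import.py:

```python
import math

# Tolerance in degrees, as stated in the Phase 5 decision.
LOCATION_TOLERANCE = 0.0001


def coords_match(local, cloud, tol=LOCATION_TOLERANCE):
    """Return True when two (lat, lng) pairs agree within tol on both axes.

    Hypothetical helper: the real comparison in cloudkit_import.py may differ.
    """
    # rel_tol=0.0 makes this a pure absolute-difference check: |a - b| <= tol.
    return all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
        for a, b in zip(local, cloud)
    )
```

With this check, tiny floating-point drift between local and cloud copies is not counted as a field difference, while a real relocation still is.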
</context>
<tasks>
<task type="auto">
<name>Task 1: Add comprehensive validation command</name>
<files>Scripts/cloudkit_import.py</files>
<action>
Add `--validate` flag and `validate_all()` function that:
1. **Local validation** - Call existing validate_canonical.py checks:
- Duplicate IDs
- Required fields
- Team → Stadium references
- Game → Team/Stadium references
- Cross-sport references
- Stadium alias references
- Game counts per team
2. **CloudKit relationship validation** - New checks:
- Games referencing non-existent teams in CloudKit
- Games referencing non-existent stadiums in CloudKit
- Teams referencing non-existent stadiums in CloudKit
- Aliases referencing non-existent stadiums in CloudKit
3. **Sync status** - Leverage existing compute_diff():
- Count of records only in local (not uploaded)
- Count of records only in CloudKit (orphans)
- Count of records with field differences
4. **Output format**:
- Print structured report to console
- If `--output FILE` provided, write JSON report
Import the validate_canonical functions directly rather than invoking the script through a subprocess call. Add `--validate` to the interactive menu as option 16.
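
The CloudKit relationship checks in step 2 amount to looking for references whose target ID is not in the fetched set. The sketch below assumes the records have already been fetched (for example via query_all()) and flattened into plain dicts; the field names `id`, `stadium_id`, `home_team_id`, and `away_team_id`, and both function names, are hypothetical and would need to match the real record schema:

```python
def find_dangling_refs(records, ref_field, valid_ids):
    """Return (record_id, missing_target) pairs for references that do not resolve."""
    dangling = []
    for rec in records:
        target = rec.get(ref_field)
        # A missing field is a required-field problem, not a dangling reference.
        if target is not None and target not in valid_ids:
            dangling.append((rec["id"], target))
    return dangling


def check_relationships(stadiums, teams, games, aliases):
    """Run the four relationship checks from the plan over in-memory records."""
    stadium_ids = {s["id"] for s in stadiums}
    team_ids = {t["id"] for t in teams}
    return {
        "game_home_team": find_dangling_refs(games, "home_team_id", team_ids),
        "game_away_team": find_dangling_refs(games, "away_team_id", team_ids),
        "game_stadium": find_dangling_refs(games, "stadium_id", stadium_ids),
        "team_stadium": find_dangling_refs(teams, "stadium_id", stadium_ids),
        "alias_stadium": find_dangling_refs(aliases, "stadium_id", stadium_ids),
    }
```

Building the ID sets once keeps each lookup constant-time, so the checks stay linear in the number of records.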
</action>
<verify>
Run `python cloudkit_import.py --validate --dry-run` and confirm:
- Local validation results displayed
- CloudKit relationship checks run (or skip gracefully if no credentials)
- Sync status summary shown
- No errors/exceptions
</verify>
<done>
--validate flag works, shows local validation + CloudKit checks + sync status. Menu option 16 available.
</done>
</task>
<task type="auto">
<name>Task 2: Add orphan listing and completeness metrics</name>
<files>Scripts/cloudkit_import.py</files>
<action>
Enhance validation report with:
1. **Orphan listing** (non-destructive):
- Add `--list-orphans` flag that shows orphan records without deleting
- Group by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
- Show first 10 of each type with recordName/canonicalId
- Show total count per type
2. **Data completeness metrics**:
- Stadiums: % with coordinates, % with capacity, % with year_opened
- Teams: % with valid stadium reference
- Games: % with resolved home/away teams, % with resolved stadium
- Show counts of "unknown" stadiums (stadium_unknown_*)
3. **Report summary**:
- Overall health score (0-100 based on error count)
- Quick pass/fail for each category
- Actionable recommendations for common issues
4. **JSON output enhancement**:
- Include all metrics in structured format
- Add timestamp and data source versions
- Compatible with future dashboard consumption
Add `--list-orphans` to menu as option 17.
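
The orphan grouping, completeness percentages, and health score above can be sketched as three small helpers. The plan does not define the health score formula, so the linear penalty here is an assumption, as are the function names and the `recordType`/`recordName` dict keys:

```python
from collections import defaultdict


def group_orphans(orphans, limit=10):
    """Group orphan records by type, keeping a total and the first `limit` names."""
    by_type = defaultdict(list)
    for rec in orphans:
        by_type[rec["recordType"]].append(rec["recordName"])
    return {
        rtype: {"total": len(names), "sample": names[:limit]}
        for rtype, names in by_type.items()
    }


def percent_with(records, field):
    """Percentage of records whose `field` is present and not None."""
    if not records:
        return 0.0  # avoid dividing by zero on an empty record set
    have = sum(1 for r in records if r.get(field) is not None)
    return round(100.0 * have / len(records), 1)


def health_score(error_count, penalty=5):
    """Assumed formula: start at 100, subtract `penalty` per error, floor at 0."""
    return max(0, 100 - penalty * error_count)
```

Whatever formula is finally chosen should be recorded in the phase summary, since the score is only meaningful if readers know how errors are weighted.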
</action>
<verify>
Run `python cloudkit_import.py --validate --list-orphans` and confirm:
- Orphan records listed by type (or "No orphans found")
- Completeness metrics shown (% with coordinates, etc.)
- Health score calculated
- JSON output works with --output flag
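
One way to make the JSON check concrete is to write the report and confirm that it parses back. A minimal sketch using only the standard library; `build_report`, `write_report`, `is_valid_json_file`, and the `generated_at`/`metrics` keys are hypothetical, not the script's actual report schema:

```python
import json
from datetime import datetime, timezone


def build_report(metrics):
    """Wrap metrics in an envelope with a UTC ISO 8601 timestamp."""
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }


def write_report(report, path):
    """Write the report as indented JSON with stable key order."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)


def is_valid_json_file(path):
    """Return True if the file at `path` exists and parses as JSON."""
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
        return True
    except (OSError, ValueError):  # ValueError covers json.JSONDecodeError
        return False
```

Sorting keys keeps successive reports easy to diff, which helps when comparing data quality before and after a sync.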
</verify>
<done>
--list-orphans shows orphans without deletion. Completeness metrics calculated. Health score displayed. JSON export includes all data.
</done>
</task>
</tasks>
<verification>
Before declaring phase complete:
- [ ] `python cloudkit_import.py --validate` runs without errors
- [ ] `python cloudkit_import.py --list-orphans` shows orphan summary
- [ ] `python cloudkit_import.py --validate --output report.json` creates valid JSON
- [ ] Menu options 16-17 work in interactive mode
- [ ] Existing functionality (--diff, --verify, --smart-sync) still works
</verification>
<success_criteria>
- All tasks completed
- All verification checks pass
- No errors or warnings introduced
- Validation report shows meaningful data quality metrics
- Phase 6 complete (final phase of milestone)
</success_criteria>
<output>
After completion, create `.planning/phases/06-validation-reports/06-01-SUMMARY.md` with:
- What validation capabilities were added
- Sample validation output
- Health score calculation method
- Any issues encountered
</output>