docs(06-01): create validation reports phase plan
Phase 6: Validation Reports
- 1 plan created
- 2 tasks defined
- Ready for execution
.planning/phases/06-validation-reports/06-01-PLAN.md
@@ -0,0 +1,158 @@
---
phase: 06-validation-reports
plan: 01
type: execute
---

<objective>
Add comprehensive validation reporting to cloudkit_import.py with data quality metrics, orphan detection, and formatted output.

Purpose: Enable quick data quality assessment before and after sync operations to catch relationship integrity issues and data gaps.
Output: A `--validate` command that generates a detailed validation report with counts, gaps, orphans, and relationship checks.
</objective>

<execution_context>
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

# Prior phase context:
@.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md
@.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md

# Source files:
@Scripts/cloudkit_import.py
@Scripts/validate_canonical.py

**Tech stack available:** Python 3, requests, CloudKit server-to-server auth
**Established patterns:**
- query_all() for CloudKit pagination
- compute_diff() for local vs cloud comparison
- --verify/--verify-deep for sync verification
- validate_canonical.py for local data validation

**Constraining decisions:**
- Phase 5: Triple lookup fallback (recordName -> deterministic UUID -> canonicalId query)
- Phase 5: Location comparison uses 0.0001 tolerance for lat/lng
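
For reference, the coordinate tolerance decision could look roughly like the sketch below. This is a minimal illustration only: the function name `coords_match` and the inclusive comparison are assumptions, and the actual Phase 5 code in cloudkit_import.py may differ.

```python
import math

# Tolerance taken from the Phase 5 decision listed above.
COORD_TOLERANCE = 0.0001


def coords_match(local, cloud, tol=COORD_TOLERANCE):
    """Return True when two (lat, lng) pairs differ by at most `tol` on each axis.

    Hypothetical helper: the name and signature are illustrative, not the
    project's actual API.
    """
    return all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
        for a, b in zip(local, cloud)
    )
```

New validation checks that compare stadium locations should reuse the same tolerance so that `--validate` and `--verify` do not disagree about what counts as a field difference.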
</context>

<tasks>

<task type="auto">
<name>Task 1: Add comprehensive validation command</name>
<files>Scripts/cloudkit_import.py</files>
<action>
Add a `--validate` flag and a `validate_all()` function that cover:

1. **Local validation** - Call existing validate_canonical.py checks:
   - Duplicate IDs
   - Required fields
   - Team → Stadium references
   - Game → Team/Stadium references
   - Cross-sport references
   - Stadium alias references
   - Game counts per team

2. **CloudKit relationship validation** - New checks:
   - Games referencing non-existent teams in CloudKit
   - Games referencing non-existent stadiums in CloudKit
   - Teams referencing non-existent stadiums in CloudKit
   - Aliases referencing non-existent stadiums in CloudKit

3. **Sync status** - Leverage existing compute_diff():
   - Count of records only in local (not uploaded)
   - Count of records only in CloudKit (orphans)
   - Count of records with field differences

4. **Output format**:
   - Print structured report to console
   - If `--output FILE` is provided, write JSON report

Import validate_canonical functions directly rather than making a subprocess call. Add to interactive menu as option 16.
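
The CloudKit relationship checks in step 2 could be sketched as follows. This is an illustration under stated assumptions: the `REFERENCE_FIELDS` mapping, the field names (`homeTeamId`, `awayTeamId`, `stadiumId`), the flat `id` key, and the function name are all hypothetical, and the real record shape returned by query_all() may differ.

```python
from collections import defaultdict

# Hypothetical mapping: record type -> {reference field: target record type}.
# The real field names live in cloudkit_import.py and may differ.
REFERENCE_FIELDS = {
    "Game": {"homeTeamId": "Team", "awayTeamId": "Team", "stadiumId": "Stadium"},
    "Team": {"stadiumId": "Stadium"},
    "StadiumAlias": {"stadiumId": "Stadium"},
}


def find_dangling_references(records_by_type):
    """Return {(record_type, field): [record ids]} for references to missing targets.

    `records_by_type` maps a record type to a list of flat dicts, each with an
    "id" key plus the reference fields above (an assumed, simplified shape).
    """
    # Index the ids that actually exist for each record type.
    known_ids = {
        rtype: {rec["id"] for rec in recs}
        for rtype, recs in records_by_type.items()
    }
    dangling = defaultdict(list)
    for rtype, fields in REFERENCE_FIELDS.items():
        for rec in records_by_type.get(rtype, []):
            for field, target_type in fields.items():
                ref = rec.get(field)
                # A missing (None) reference is a completeness issue, not a dangling one.
                if ref is not None and ref not in known_ids.get(target_type, set()):
                    dangling[(rtype, field)].append(rec["id"])
    return dict(dangling)
```

Building the id index once per record type keeps the check linear in the number of records, which matters if every type is fetched in full through query_all().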
</action>
<verify>
Run `python cloudkit_import.py --validate --dry-run` and confirm:
- Local validation results displayed
- CloudKit relationship checks run (or skip gracefully if no credentials)
- Sync status summary shown
- No errors/exceptions
</verify>
<done>
--validate flag works, shows local validation + CloudKit checks + sync status. Menu option 16 available.
</done>
</task>

<task type="auto">
<name>Task 2: Add orphan listing and completeness metrics</name>
<files>Scripts/cloudkit_import.py</files>
<action>
Enhance validation report with:

1. **Orphan listing** (non-destructive):
   - Add `--list-orphans` flag that shows orphan records without deleting
   - Group by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
   - Show first 10 of each type with recordName/canonicalId
   - Show total count per type

2. **Data completeness metrics**:
   - Stadiums: % with coordinates, % with capacity, % with year_opened
   - Teams: % with valid stadium reference
   - Games: % with resolved home/away teams, % with resolved stadium
   - Show counts of "unknown" stadiums (stadium_unknown_*)

3. **Report summary**:
   - Overall health score (0-100 based on error count)
   - Quick pass/fail for each category
   - Actionable recommendations for common issues

4. **JSON output enhancement**:
   - Include all metrics in structured format
   - Add timestamp and data source versions
   - Compatible with future dashboard consumption

Add `--list-orphans` to menu as option 17.
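
A rough sketch of the orphan grouping, completeness percentages, and health score described above. All three helper names are hypothetical, the records are assumed to be flat dicts carrying `recordType` and `recordName` keys, and the health score formula is only one possible reading of "0-100 based on error count"; the plan does not fix a formula.

```python
def percent_with(records, field):
    """Percentage of records whose `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for rec in records if rec.get(field) is not None)
    return round(100.0 * filled / len(records), 1)


def group_orphans(orphans, limit=10):
    """Group orphan records by type, keeping a total and the first `limit` names."""
    grouped = {}
    for rec in orphans:
        entry = grouped.setdefault(rec["recordType"], {"total": 0, "sample": []})
        entry["total"] += 1
        if len(entry["sample"]) < limit:
            entry["sample"].append(rec["recordName"])
    return grouped


def health_score(error_count, total_records):
    """Illustrative 0-100 score: share of records not associated with an error.

    Assumed formula, not taken from the plan. Error counts above the record
    total are clamped so the score never goes below 0.
    """
    if total_records <= 0:
        return 0
    ratio = min(error_count, total_records) / total_records
    return round(100 * (1 - ratio))
```

Whatever formula is finally chosen should be written into the summary file, since the output section asks for the health score calculation method.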
</action>
<verify>
Run `python cloudkit_import.py --validate --list-orphans` and confirm:
- Orphan records listed by type (or "No orphans found")
- Completeness metrics shown (% with coordinates, etc.)
- Health score calculated
- JSON output works with --output flag
</verify>
<done>
--list-orphans shows orphans without deletion. Completeness metrics calculated. Health score displayed. JSON export includes all data.
</done>
</task>

</tasks>

<verification>
Before declaring phase complete:
- [ ] `python cloudkit_import.py --validate` runs without errors
- [ ] `python cloudkit_import.py --list-orphans` shows orphan summary
- [ ] `python cloudkit_import.py --validate --output report.json` creates valid JSON
- [ ] Menu options 16-17 work in interactive mode
- [ ] Existing functionality (--diff, --verify, --smart-sync) still works
</verification>

<success_criteria>

- All tasks completed
- All verification checks pass
- No errors or warnings introduced
- Validation report shows meaningful data quality metrics
- Phase 6 complete (final phase of milestone)
</success_criteria>

<output>
After completion, create `.planning/phases/06-validation-reports/06-01-SUMMARY.md` with:
- What validation capabilities were added
- Sample validation output
- Health score calculation method
- Any issues encountered
</output>