
Trey t baa3dfef0b docs(06-01): complete validation reports plan

Add SUMMARY.md documenting validation capabilities:
- --validate flag with local/CloudKit/sync validation
- --list-orphans flag with completeness metrics and health score
- Menu options 16-17 for interactive mode

Update STATE.md: Phase 6 complete (14/14 plans, 100%)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-10 10:35:49 -06:00

5.2 KiB


06-01 Summary: Validation Reports

What Was Done

Task 1: Comprehensive Validation Command (`--validate`)

Added validate_all() function and --validate flag that performs:

Local Data Validation - Uses existing validate_canonical.py functions:
- Duplicate ID detection
- Required field validation
- Team → Stadium reference validation
- Game → Team/Stadium reference validation
- Cross-sport reference checks
- Stadium alias reference validation
CloudKit Relationship Validation (when connected):
- Games referencing non-existent teams in CloudKit
- Games referencing non-existent stadiums in CloudKit
- Teams referencing non-existent stadiums in CloudKit
- Aliases referencing non-existent stadiums in CloudKit
Sync Status - Leverages existing compute_diff():
- Records only in local (not uploaded)
- Records only in CloudKit (orphans)
- Records in both
Output:
- Structured console report
- JSON export via --output FILE
- Menu option 16 for interactive mode
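The Team → Stadium and Game → Team/Stadium checks come down to one question: does each referenced ID exist in the target collection? Below is a minimal sketch. It assumes records are plain dicts with hypothetical `canonical_id`, `stadium_id`, `home_team_id` and `away_team_id` keys. The actual field names and helper functions in validate_canonical.py are not shown in this summary.

```python
from dataclasses import dataclass


@dataclass
class RefError:
    """One dangling reference: the source record, the field, and the missing target ID."""
    record_type: str
    record_id: str
    field: str
    missing_id: str


def find_dangling_refs(stadiums, teams, games):
    """Return a RefError for every reference whose target ID is unknown.

    Sketch only: the dict keys used here are hypothetical stand-ins for the
    canonical data model's real field names.
    """
    stadium_ids = {s["canonical_id"] for s in stadiums}
    team_ids = {t["canonical_id"] for t in teams}
    errors = []

    # Team -> Stadium
    for team in teams:
        if team["stadium_id"] not in stadium_ids:
            errors.append(RefError("Team", team["canonical_id"], "stadium_id", team["stadium_id"]))

    # Game -> Team (home and away) and Game -> Stadium
    for game in games:
        for field, valid_ids in (
            ("home_team_id", team_ids),
            ("away_team_id", team_ids),
            ("stadium_id", stadium_ids),
        ):
            if game[field] not in valid_ids:
                errors.append(RefError("Game", game["canonical_id"], field, game[field]))
    return errors
```

The function only needs ID sets. The same logic could therefore cover the CloudKit relationship checks, provided fetched records are first mapped into this dict shape. That mapping step is assumed here, not shown.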

Task 2: Orphan Listing and Completeness Metrics (`--list-orphans`)

Added list_orphans() function and --list-orphans flag that shows:

Orphan Listing (non-destructive):
- Groups orphans by type (Stadium, Team, Game, StadiumAlias, TeamAlias)
- Shows the first 10 records of each type with canonicalId and name
- Shows total count per type
Data Completeness Metrics:
- Stadiums: % with coordinates, % with capacity, % with year_opened, count of unknown stadiums
- Teams: % with valid stadium reference
- Games: % with resolved home/away teams, % with resolved stadium
Health Score (0-100):
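- Base: average of six completeness percentages (stadium coordinates, stadium capacity, team stadium ref, game home team, game away team, game stadium)
- Penalty: -2 points per orphan (max -30)
- Penalty: -1 point per unknown stadium (max -10)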
- Status: EXCELLENT (≥90), GOOD (≥70), FAIR (≥50), NEEDS ATTENTION (<50)
Actionable Recommendations:
- Suggests deleting orphans with --smart-sync --delete-orphans
- Identifies missing coordinates/capacity
- Flags unresolved references
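The orphan listing is read-only: it groups, counts and previews records, and deletes nothing. Below is a minimal sketch. It assumes orphans arrive as hypothetical `(record_type, canonical_id, name)` tuples. `group_orphans` and `print_orphan_report` are illustrative names, not the actual internals of `list_orphans()`.

```python
from collections import defaultdict

# Record types named in this summary; any other type found is listed after these.
ORPHAN_TYPES = ("Stadium", "Team", "Game", "StadiumAlias", "TeamAlias")


def group_orphans(orphans, preview=10):
    """Group orphans by type, keeping a total count and the first `preview` entries.

    `orphans` is assumed to be an iterable of (record_type, canonical_id, name)
    tuples. Nothing is deleted; this only builds a report structure.
    """
    grouped = defaultdict(list)
    for record_type, canonical_id, name in orphans:
        grouped[record_type].append((canonical_id, name))
    types = list(ORPHAN_TYPES) + sorted(t for t in grouped if t not in ORPHAN_TYPES)
    return {
        t: {"total": len(grouped.get(t, [])), "preview": grouped.get(t, [])[:preview]}
        for t in types
    }


def print_orphan_report(summary):
    """Print one block per type in the '  Type: N orphan(s)' style of the sample report."""
    for record_type, info in summary.items():
        print(f"  {record_type}: {info['total']} orphan(s)")
        for canonical_id, name in info["preview"]:
            print(f"    - {canonical_id}: {name}")
        hidden = info["total"] - len(info["preview"])
        if hidden > 0:
            print(f"    ... and {hidden} more")
    if all(info["total"] == 0 for info in summary.values()):
        print("\n  ✓ No orphan records found")
```

Every known type is always present in the result, so zero counts are printed explicitly rather than omitted.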

Sample Validation Output

============================================================
Comprehensive Data Validation Report
============================================================

Local data loaded:
  Stadiums: 178
  Teams: 180
  Games: 5760
  Stadium aliases: 259
  Team aliases: 76

------------------------------------------------------------
SECTION 1: Local Data Validation
------------------------------------------------------------
Running validation checks...

  ✓ Local data VALID
  Errors: 0
  Warnings: 0

------------------------------------------------------------
SECTION 2: CloudKit Relationship Validation
------------------------------------------------------------
  [CloudKit checks when connected]

------------------------------------------------------------
SECTION 3: Sync Status
------------------------------------------------------------
  [Comparison when connected]

============================================================
VALIDATION SUMMARY
============================================================

  Local validation: ✓ PASSED
  CloudKit references: ✓ PASSED (or N/A if not connected)
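Section 3 buckets records by where they exist. In the tool this is done by compute_diff(), whose signature is not documented here. The sketch below only illustrates the same three-way split with set operations. It assumes each side is available as a mapping from record type to canonical IDs, which is a hypothetical shape.

```python
def classify_sync(local_ids, cloud_ids):
    """Split IDs into local-only (not uploaded), cloud-only (orphans), and in both.

    Set-difference sketch; not the actual compute_diff() implementation.
    """
    local, cloud = set(local_ids), set(cloud_ids)
    return {
        "local_only": sorted(local - cloud),
        "cloud_only": sorted(cloud - local),
        "both": sorted(local & cloud),
    }


def sync_status(local_by_type, cloud_by_type):
    """Apply classify_sync per record type, including types present on only one side."""
    return {
        record_type: classify_sync(
            local_by_type.get(record_type, ()), cloud_by_type.get(record_type, ())
        )
        for record_type in sorted(set(local_by_type) | set(cloud_by_type))
    }
```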

Sample Orphan Report Output

============================================================
Orphan Records & Data Quality Report
============================================================

------------------------------------------------------------
SECTION 1: Orphan Records (in CloudKit but not in local data)
------------------------------------------------------------

  Stadium: 0 orphan(s)
  Team: 0 orphan(s)
  Game: 0 orphan(s)
  ...

  ✓ No orphan records found

------------------------------------------------------------
SECTION 2: Data Completeness Metrics
------------------------------------------------------------

  Stadiums (178 total):
    With coordinates: 178 (100.0%)
    With capacity: 175 (98.3%)
    With year_opened: 170 (95.5%)
    Unknown stadiums: 0

  Teams (180 total):
    With valid stadium ref: 180 (100.0%)

  Games (5760 total):
    With resolved home team: 5760 (100.0%)
    With resolved away team: 5760 (100.0%)
    With resolved stadium: 5760 (100.0%)

------------------------------------------------------------
SECTION 3: Health Score
------------------------------------------------------------

  Health Score: 98.9/100 ✓ EXCELLENT

  Score breakdown:
    Base completeness: 98.9
    Orphan penalty: -0
    Unknown stadium penalty: -0

  ✓ No recommendations - data is in great shape!
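Each completeness figure in Section 2 is a count divided by a total. Below is a minimal sketch. The stadium field names (`latitude`, `longitude`, `capacity`, `year_opened`) and reference keys are assumptions. So is treating an empty collection as 100% complete.

```python
def pct(count, total):
    """Percentage of `total`; an empty collection counts as 100% (an assumption)."""
    return 100.0 * count / total if total else 100.0


def stadium_completeness(stadiums):
    """Percent of stadiums with coordinates, capacity, and year_opened (hypothetical keys)."""
    n = len(stadiums)
    with_coords = sum(
        1 for s in stadiums
        if s.get("latitude") is not None and s.get("longitude") is not None
    )
    return {
        "coordinates": pct(with_coords, n),
        "capacity": pct(sum(1 for s in stadiums if s.get("capacity") is not None), n),
        "year_opened": pct(sum(1 for s in stadiums if s.get("year_opened") is not None), n),
    }


def team_completeness(teams, stadium_ids):
    """Percent of teams whose stadium reference resolves to a known stadium ID."""
    resolved = sum(1 for t in teams if t.get("stadium_id") in stadium_ids)
    return {"stadium_ref": pct(resolved, len(teams))}


def game_completeness(games, team_ids, stadium_ids):
    """Percent of games whose home team, away team, and stadium each resolve."""
    n = len(games)
    return {
        "home_team": pct(sum(1 for g in games if g.get("home_team_id") in team_ids), n),
        "away_team": pct(sum(1 for g in games if g.get("away_team_id") in team_ids), n),
        "stadium": pct(sum(1 for g in games if g.get("stadium_id") in stadium_ids), n),
    }
```

Formatting with `f"{pct(175, 178):.1f}%"` yields the one-decimal "98.3%" style used in the sample report.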

Health Score Calculation

health_score = avg_completeness - orphan_penalty - unknown_penalty

Where:
- avg_completeness = average of:
  - stadium coordinates %
  - stadium capacity %
  - team stadium ref %
  - game home team %
  - game away team %
  - game stadium %

- orphan_penalty = min(30, total_orphans * 2)
- unknown_penalty = min(10, unknown_stadiums)

Final score clamped to [0, 100]
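The formula above transcribes directly into code. This sketch follows the stated penalties, caps and status thresholds. `health_score` and `health_status` are illustrative names, and how the real script gathers the six percentages is not shown.

```python
def health_score(completeness_pcts, total_orphans, unknown_stadiums):
    """Health score per the formula above, clamped to [0, 100].

    completeness_pcts: the six percentages averaged into avg_completeness
    (stadium coordinates, stadium capacity, team stadium ref,
    game home team, game away team, game stadium).
    """
    if not completeness_pcts:
        raise ValueError("need at least one completeness metric")
    avg_completeness = sum(completeness_pcts) / len(completeness_pcts)
    orphan_penalty = min(30, total_orphans * 2)  # -2 per orphan, capped at 30
    unknown_penalty = min(10, unknown_stadiums)  # -1 per unknown stadium, capped at 10
    return max(0.0, min(100.0, avg_completeness - orphan_penalty - unknown_penalty))


def health_status(score):
    """Map a score to the status labels used in the report."""
    if score >= 90:
        return "EXCELLENT"
    if score >= 70:
        return "GOOD"
    if score >= 50:
        return "FAIR"
    return "NEEDS ATTENTION"
```

With every percentage at 100 and no orphans or unknown stadiums, the score is 100.0. A single orphan alone drops it to 98.0.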

Interactive Menu Options

Option 16: Validate data (local + CloudKit) → --validate
Option 17: List orphan records → --list-orphans
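Menu options 16 and 17 run the same code paths as the two flags. Below is a minimal argparse sketch of that wiring. The flag names come from this summary. `validate_all()` and `list_orphans()` are stubbed because their real signatures are not documented here, and `MENU_ACTIONS` is a hypothetical name.

```python
import argparse


def validate_all(output=None):
    """Stub standing in for the real validate_all(); returns what it was asked to do."""
    return {"action": "validate", "output": output}


def list_orphans():
    """Stub standing in for the real list_orphans()."""
    return {"action": "list_orphans"}


def build_parser():
    parser = argparse.ArgumentParser(description="Validation reports (sketch)")
    parser.add_argument("--validate", action="store_true",
                        help="Run local, CloudKit relationship, and sync validation")
    parser.add_argument("--list-orphans", action="store_true",
                        help="List orphan records, completeness metrics, and health score")
    parser.add_argument("--output", metavar="FILE",
                        help="Also write the validation report as JSON")
    return parser


# Interactive menu choices mapped onto the same functions the flags call.
MENU_ACTIONS = {
    "16": validate_all,
    "17": list_orphans,
}


def run(argv=None):
    """Dispatch on parsed flags; returns None when neither report flag is given."""
    args = build_parser().parse_args(argv)
    if args.validate:
        return validate_all(output=args.output)
    if args.list_orphans:  # argparse maps --list-orphans to list_orphans
        return list_orphans()
    return None
```

Routing both entry points through one function per report keeps the CLI output and the menu output identical.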

Issues Encountered

None. Implementation was straightforward, leveraging existing patterns from validate_canonical.py and the CloudKit sync functions.

Duration

~12 minutes
