Phase 4: Canonical Linking - 1 plan created - 3 tasks defined (game canonicalization, validation, fix issues) - Ready for execution
| phase | plan | type |
|---|---|---|
| 04-canonical-linking | 01 | execute |
Purpose: Complete the data pipeline by resolving raw game data to canonical team/stadium IDs, enabling the iOS app to correctly display game→team→stadium relationships.
Output: games_canonical.json with all games linked to canonical teams and stadiums, validated and ready for CloudKit upload.
<execution_context> ~/.claude/get-shit-done/workflows/execute-phase.md ~/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md

Prior phase summaries (dependency graph):
@.planning/phases/03-alias-systems/03-01-SUMMARY.md @.planning/phases/03-alias-systems/03-02-SUMMARY.md
Key files:
@Scripts/canonicalize_games.py @Scripts/validate_canonical.py
Tech stack available: Python canonicalization pipeline, team/stadium aliases

Established patterns: 3-stage canonicalization (stadiums → teams → games), sport-scoped resolution

Constraining decisions:
- Phase 03-01: Team abbreviation aliases handle relocations and data source variations
- Phase 03-02: All 7 sports (NBA, MLB, NHL, NFL, MLS, WNBA, NWSL) canonicalized with 180 total teams
Current state:
- games.json: 2.2MB raw game data
- games_canonical.json: Empty [] - needs to be generated
- teams_canonical.json: 180 teams across 7 sports
- stadiums_canonical.json: Complete stadium data
- stadium_aliases.json: Historical name aliases
cd Scripts && python canonicalize_games.py --games data/games.json --teams data/teams_canonical.json --aliases data/stadium_aliases.json --output data/ --verbose
This will:
- Load raw games from games.json
- Resolve team abbreviations to canonical IDs using TEAM_ABBREV_ALIASES
- Resolve venues to stadium canonical IDs (preferring home team stadium)
- Generate canonical game IDs with doubleheader handling
- Output games_canonical.json and any warnings
Expected output: ~10,000+ canonical games across all sports with home_team_canonical_id, away_team_canonical_id, and stadium_canonical_id populated.
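The resolution steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of canonicalize_games.py: the raw field names (`sport`, `date`, `home`, `away`), the sample alias entries, the `HOME_STADIUMS` table, and the game ID format are all hypothetical. Only the `(sport, abbrev)` key shape, the three canonical ID field names, and the `stadium_unknown` prefix come from this plan.

```python
from collections import defaultdict

# Hypothetical sample data; the real tables live in the pipeline scripts.
TEAM_ABBREV_ALIASES = {
    ("MLB", "NYY"): "team_mlb_nyy",
    ("MLB", "BOS"): "team_mlb_bos",
}
HOME_STADIUMS = {"team_mlb_nyy": "stadium_mlb_yankee_stadium"}


def canonicalize(raw_games):
    """Link raw games to canonical team/stadium IDs (illustrative only)."""
    seen = defaultdict(int)  # base game ID -> number of games seen so far
    linked, warnings = [], []
    for game in raw_games:
        sport = game["sport"]
        # Sport-scoped lookup: the same abbreviation may exist in two leagues.
        home = TEAM_ABBREV_ALIASES.get((sport, game["home"]))
        away = TEAM_ABBREV_ALIASES.get((sport, game["away"]))
        if home is None or away is None:
            warnings.append(
                f"Unknown home/away team: {sport} {game['away']} @ {game['home']}"
            )
            continue
        # Prefer the home team's stadium; mark the link if none is known.
        stadium = HOME_STADIUMS.get(home, "stadium_unknown")
        base = f"game_{sport.lower()}_{game['date']}_{away}_{home}"
        seen[base] += 1
        # A repeat of the same matchup on the same date gets a numeric suffix.
        game_id = base if seen[base] == 1 else f"{base}_{seen[base]}"
        linked.append(
            {
                "canonical_id": game_id,
                "home_team_canonical_id": home,
                "away_team_canonical_id": away,
                "stadium_canonical_id": stadium,
            }
        )
    return linked, warnings
```

Keying the counter on the base ID keeps the first game of a doubleheader stable if a second game is added to the source data later.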
- games_canonical.json exists and is non-empty
- File size > 1MB (indicates substantial data)
- Sample game has all three canonical ID fields populated

games_canonical.json generated with canonical team/stadium links for all games
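The three checks above could be automated with a short script along these lines. It is a sketch that assumes games_canonical.json is a top-level JSON array of game objects (consistent with the empty `[]` noted earlier); the function name `check_canonical_output` and the `min_bytes` parameter are hypothetical.

```python
import json
import os

REQUIRED_FIELDS = (
    "home_team_canonical_id",
    "away_team_canonical_id",
    "stadium_canonical_id",
)


def check_canonical_output(path, min_bytes=1_000_000):
    """Return a list of problems found; an empty list means all checks passed."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    problems = []
    size = os.path.getsize(path)
    if size <= min_bytes:
        problems.append(f"{path} is only {size} bytes (expected > {min_bytes})")
    with open(path, encoding="utf-8") as handle:
        games = json.load(handle)
    # Assumption: the file holds a JSON array of game objects.
    if not isinstance(games, list) or not games:
        problems.append(f"{path} contains no games")
        return problems
    sample = games[0]
    missing = [field for field in REQUIRED_FIELDS if not sample.get(field)]
    if missing:
        problems.append(f"sample game is missing: {', '.join(missing)}")
    return problems
```

Run from the Scripts directory, `check_canonical_output("data/games_canonical.json")` should return an empty list once canonicalization has succeeded.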
cd Scripts && python validate_canonical.py --data-dir data/ --verbose
Check validation output for:
- ERROR-level issues: Must be zero (blocks CloudKit upload)
- Unknown teams: Any team_canonical_id not found in teams_canonical.json
- Unknown stadiums: Any stadium_canonical_id starting with "stadium_unknown"
- Game count warnings: Teams with unusual game counts per EXPECTED_GAMES config
If validation passes with no errors, the linking is complete.
- validate_canonical.py exits with code 0
- No ERROR-level issues reported
- All teams and stadiums resolve to known entities

All canonical games validated - no broken team or stadium links
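To make the link checks concrete, here is a minimal sketch of the two ERROR-level conditions described above: a team ID that is not in the known set, and a stadium ID that is empty or starts with "stadium_unknown". It is not the logic of validate_canonical.py; the function name `find_broken_links` is hypothetical, and the caller is assumed to have already collected the known team IDs from teams_canonical.json.

```python
TEAM_FIELDS = ("home_team_canonical_id", "away_team_canonical_id")


def find_broken_links(games, known_team_ids):
    """Return one message per broken team or stadium link."""
    errors = []
    for index, game in enumerate(games):
        for field in TEAM_FIELDS:
            team_id = game.get(field)
            if team_id not in known_team_ids:
                errors.append(f"game {index}: unknown team in {field}: {team_id!r}")
        # Treat a missing or empty stadium ID the same as an unknown one.
        stadium_id = game.get("stadium_canonical_id") or ""
        if not stadium_id or stadium_id.startswith("stadium_unknown"):
            errors.append(f"game {index}: unknown stadium: {stadium_id!r}")
    return errors
```

An empty result corresponds to the "no broken team or stadium links" condition; any entry would block the CloudKit upload.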
If "Unknown home/away team" warnings appear:
- Add missing team abbreviation alias to TEAM_ABBREV_ALIASES in canonicalize_games.py
- Format: ('SPORT', 'ABBREV'): 'team_sport_canonical',
If "Unknown stadium" warnings appear:
- Check if venue name needs alias in HISTORICAL_STADIUM_ALIASES in canonicalize_stadiums.py
- Or verify home team has correct stadium_canonical_id in sport module
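When many team warnings appear, it can help to tally which `(sport, abbrev)` keys are missing before editing the alias table. The sketch below assumes raw games expose `sport`, `home`, and `away` fields and that the alias table is a dict keyed by `(sport, abbrev)` tuples as in the format shown above; the function name and sample values are hypothetical.

```python
from collections import Counter


def unresolved_team_keys(raw_games, aliases):
    """Count (sport, abbrev) keys that have no entry in the alias table."""
    missing = Counter()
    for game in raw_games:
        for side in ("home", "away"):
            key = (game["sport"], game[side])
            if key not in aliases:
                missing[key] += 1
    return missing


# Hypothetical example: one alias is present, one abbreviation is not.
sample_aliases = {("NBA", "PHX"): "team_nba_phx"}
sample_games = [
    {"sport": "NBA", "home": "PHX", "away": "PHO"},
    {"sport": "NBA", "home": "PHO", "away": "PHX"},
]
for key, count in unresolved_team_keys(sample_games, sample_aliases).most_common():
    print(key, count)
```

Sorting by frequency puts the aliases that unblock the most games first.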
After fixes:
- Re-run canonicalization: python canonicalize_games.py --verbose
- Re-run validation: python validate_canonical.py --verbose
If no warnings exist, mark this task as complete with "No resolution issues found."
- game_resolution_warnings.json is empty or contains only acceptable warnings
- Re-run canonicalization produces no new warnings

All resolution issues fixed, or no issues found
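The fix-and-rerun loop can be wrapped in a small helper that stops at the first failing step. This is a sketch only: `run_steps` and `STEPS` are hypothetical names, and the commands simply mirror the two re-run invocations listed above.

```python
import subprocess
import sys

# The two re-run commands from this task, launched with the current interpreter.
STEPS = [
    [sys.executable, "canonicalize_games.py", "--verbose"],
    [sys.executable, "validate_canonical.py", "--verbose"],
]


def run_steps(steps, cwd=None):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in steps:
        result = subprocess.run(command, cwd=cwd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_steps(STEPS, cwd="Scripts")` from the repository root returns 0 only when both scripts exit cleanly, which matches the exit-code check used for validation.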
<success_criteria>
- All tasks completed
- All verification checks pass
- games_canonical.json ready for CloudKit upload
- No broken team or stadium links in any game </success_criteria>
---
phase: 04-canonical-linking
plan: 01
subsystem: data-pipeline
tags: [games, canonicalization, linking, validation]
# Dependency graph
requires:
  - phase: 03-alias-systems
    provides: Team/stadium aliases for all 7 sports
provides:
  - Canonical games with resolved team/stadium links
  - Validated game→team→stadium relationships
affects: [05-cloudkit-crud, ios-app-data]
# Tech tracking
tech-stack:
  added: []
  patterns: [game-canonicalization, link-validation]
key-files:
  created:
    - Scripts/data/games_canonical.json
  modified: []
key-decisions:
  - [Any decisions made during execution]
patterns-established:
  - [Any new patterns]
issues-created: []
# Metrics
duration: X min
completed: YYYY-MM-DD
---
# Phase 4 Plan 01: Canonical Linking Summary
**[One-liner: X games canonicalized with Y% resolution rate]**
## Accomplishments
- [Key outcomes]
## Files Created/Modified
- [List with descriptions]
## Decisions Made
- [Or "None"]
## Issues Encountered
- [Or "None"]
## Next Phase Readiness
- Ready for Phase 5: CloudKit CRUD