docs(04-01): create canonical linking plan
Phase 4: Canonical Linking
- 1 plan created
- 3 tasks defined (game canonicalization, validation, fix issues)
- Ready for execution
209
.planning/phases/04-canonical-linking/04-01-PLAN.md
Normal file
@@ -0,0 +1,209 @@
---
phase: 04-canonical-linking
plan: 01
type: execute
---

<objective>
Generate canonical games with correct team and stadium links for all 7 sports.

Purpose: Complete the data pipeline by resolving raw game data to canonical team/stadium IDs, enabling the iOS app to correctly display game→team→stadium relationships.
Output: `games_canonical.json` with all games linked to canonical teams and stadiums, validated and ready for CloudKit upload.
</objective>

<execution_context>
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

# Prior phase summaries (dependency graph):
@.planning/phases/03-alias-systems/03-01-SUMMARY.md
@.planning/phases/03-alias-systems/03-02-SUMMARY.md

# Key files:
@Scripts/canonicalize_games.py
@Scripts/validate_canonical.py

**Tech stack available:** Python canonicalization pipeline, team/stadium aliases
**Established patterns:** 3-stage canonicalization (stadiums → teams → games), sport-scoped resolution
**Constraining decisions:**
- Phase 03-01: Team abbreviation aliases handle relocations and data source variations
- Phase 03-02: All 7 sports (NBA, MLB, NHL, NFL, MLS, WNBA, NWSL) canonicalized with 180 total teams

**Current state:**
- `games.json`: 2.2MB raw game data
- `games_canonical.json`: Empty `[]` - needs to be generated
- `teams_canonical.json`: 180 teams across 7 sports
- `stadiums_canonical.json`: Complete stadium data
- `stadium_aliases.json`: Historical name aliases
</context>

<tasks>

<task type="auto">
<name>Task 1: Run game canonicalization pipeline</name>
<files>Scripts/data/games_canonical.json, Scripts/data/game_resolution_warnings.json</files>
<action>
Run the game canonicalization to generate canonical games:

```bash
cd Scripts && python canonicalize_games.py --games data/games.json --teams data/teams_canonical.json --aliases data/stadium_aliases.json --output data/ --verbose
```

This will:
1. Load raw games from games.json
2. Resolve team abbreviations to canonical IDs using TEAM_ABBREV_ALIASES
3. Resolve venues to stadium canonical IDs (preferring home team stadium)
4. Generate canonical game IDs with doubleheader handling
5. Output games_canonical.json and any warnings

Expected output: ~10,000+ canonical games across all sports with home_team_canonical_id, away_team_canonical_id, and stadium_canonical_id populated.
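
Steps 2 to 4 above can be illustrated with a minimal sketch. It is not the code in `canonicalize_games.py`: the sample alias entry, the `TEAMS` table, the raw game fields (`sport`, `date`, `home`, `away`), the `canonical_id` field, and the game ID format are all assumptions made for illustration. Only the `TEAM_ABBREV_ALIASES` name, the three `*_canonical_id` fields, and the `team_unknown_*` / `stadium_unknown_*` placeholders come from this plan.

```python
from collections import defaultdict

# Hypothetical sample data. The real alias table lives in canonicalize_games.py
# and the real teams come from teams_canonical.json; shapes here are assumed.
TEAM_ABBREV_ALIASES = {("MLB", "CHW"): "team_mlb_cws"}
TEAMS = {
    "team_mlb_cws": {"stadium_canonical_id": "stadium_mlb_rate_field"},
    "team_mlb_det": {"stadium_canonical_id": "stadium_mlb_comerica_park"},
}


def resolve_team(sport, abbrev):
    """Return a canonical team ID, or a team_unknown_* placeholder."""
    alias = TEAM_ABBREV_ALIASES.get((sport, abbrev))
    if alias:
        return alias
    candidate = f"team_{sport.lower()}_{abbrev.lower()}"
    if candidate in TEAMS:
        return candidate
    return f"team_unknown_{sport.lower()}_{abbrev.lower()}"


def short_code(team_id):
    """Last segment of a team ID, used to keep game IDs compact."""
    return team_id.rsplit("_", 1)[-1]


def canonicalize(raw_games):
    """Link each raw game to canonical team and stadium IDs."""
    seen = defaultdict(int)  # games seen so far per (sport, date, away, home)
    canonical = []
    for game in raw_games:
        sport = game["sport"]
        home = resolve_team(sport, game["home"])
        away = resolve_team(sport, game["away"])
        # Prefer the home team's stadium; otherwise emit an unknown placeholder.
        stadium = TEAMS.get(home, {}).get(
            "stadium_canonical_id", f"stadium_unknown_{sport.lower()}"
        )
        key = (sport, game["date"], away, home)
        seen[key] += 1
        game_id = (
            f"game_{sport.lower()}_{game['date']}_"
            f"{short_code(away)}_{short_code(home)}"
        )
        if seen[key] > 1:  # doubleheader: suffix the second and later games
            game_id += f"_{seen[key]}"
        canonical.append(
            {
                "canonical_id": game_id,
                "home_team_canonical_id": home,
                "away_team_canonical_id": away,
                "stadium_canonical_id": stadium,
            }
        )
    return canonical


if __name__ == "__main__":
    raw = [
        {"sport": "MLB", "date": "2025-07-04", "home": "CHW", "away": "DET"},
        {"sport": "MLB", "date": "2025-07-04", "home": "CHW", "away": "DET"},
    ]
    for linked in canonicalize(raw):
        print(linked["canonical_id"], linked["stadium_canonical_id"])
```

Counting games per (sport, date, away, home) key is one simple way to keep doubleheader IDs unique; the actual script may use start times or another discriminator.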
</action>
<verify>
- games_canonical.json exists and is non-empty
- File size > 1MB (indicates substantial data)
- Sample game has all three canonical ID fields populated
</verify>
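
The three verify checks can be scripted rather than inspected by hand. The sketch below is one possible way to do it, assuming the file is a top-level JSON array of game objects (consistent with the current empty `[]`) and treating 1MB as 1,000,000 bytes. The `check_output` helper is hypothetical and not part of the existing scripts.

```python
import json
import os

REQUIRED_FIELDS = (
    "home_team_canonical_id",
    "away_team_canonical_id",
    "stadium_canonical_id",
)


def check_output(path, min_bytes=1_000_000):
    """Return a list of problems found in the canonical games file."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    problems = []
    size = os.path.getsize(path)
    if size <= min_bytes:
        problems.append(
            f"{path} is only {size} bytes (expected more than {min_bytes})"
        )
    with open(path, encoding="utf-8") as handle:
        games = json.load(handle)
    if not games:
        problems.append(f"{path} contains no games")
    else:
        # Spot-check the first game for the three canonical ID fields.
        missing = [name for name in REQUIRED_FIELDS if not games[0].get(name)]
        if missing:
            problems.append("sample game is missing: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    for problem in check_output("data/games_canonical.json"):
        print("FAIL:", problem)
```

An empty result list means all three checks passed.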
<done>games_canonical.json generated with canonical team/stadium links for all games</done>
</task>

<task type="auto">
<name>Task 2: Validate canonical links</name>
<files>Scripts/data/canonicalization_validation.json</files>
<action>
Run validation to ensure all game→team→stadium references resolve:

```bash
cd Scripts && python validate_canonical.py --data-dir data/ --verbose
```

Check validation output for:
1. **ERROR-level issues**: Must be zero (blocks CloudKit upload)
2. **Unknown teams**: Any team_canonical_id not found in teams_canonical.json
3. **Unknown stadiums**: Any stadium_canonical_id starting with "stadium_unknown"
4. **Game count warnings**: Teams with unusual game counts per EXPECTED_GAMES config

If validation passes with no errors, the linking is complete.
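
For orientation, the link checks listed above amount to something like the following sketch. It is not the logic of `validate_canonical.py`: the `validate_links` function, the per-game `sport` field, and the shape of the expected-games mapping (sport to an inclusive min/max range) are assumptions, and the real EXPECTED_GAMES config may be structured differently.

```python
from collections import Counter

TEAM_FIELDS = ("home_team_canonical_id", "away_team_canonical_id")


def validate_links(games, known_team_ids, expected_games):
    """Return (errors, warnings) for a list of canonical game dicts."""
    errors, warnings = [], []
    games_per_team = Counter()
    for index, game in enumerate(games):
        for field in TEAM_FIELDS:
            team_id = game.get(field)
            if team_id in known_team_ids:
                games_per_team[(game["sport"], team_id)] += 1
            else:
                errors.append(f"game {index}: unknown team {team_id!r} in {field}")
        stadium_id = game.get("stadium_canonical_id") or ""
        if not stadium_id or stadium_id.startswith("stadium_unknown"):
            errors.append(f"game {index}: unknown stadium {stadium_id!r}")
    # Game counts outside the expected range are warnings, not errors.
    for (sport, team_id), count in games_per_team.items():
        low, high = expected_games.get(sport, (0, float("inf")))
        if not low <= count <= high:
            warnings.append(f"{team_id}: {count} games, expected {low} to {high}")
    return errors, warnings


if __name__ == "__main__":
    sample = [
        {
            "sport": "NBA",
            "home_team_canonical_id": "team_nba_bos",
            "away_team_canonical_id": "team_nba_xxx",
            "stadium_canonical_id": "stadium_unknown_nba",
        }
    ]
    found_errors, found_warnings = validate_links(
        sample, {"team_nba_bos"}, {"NBA": (2, 3)}
    )
    print(found_errors)
    print(found_warnings)
```

Keeping count mismatches at warning level mirrors the split above, where only ERROR-level issues block the CloudKit upload.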
</action>
<verify>
- validate_canonical.py exits with code 0
- No ERROR-level issues reported
- All teams and stadiums resolve to known entities
</verify>
<done>All canonical games validated - no broken team or stadium links</done>
</task>

<task type="auto">
<name>Task 3: Fix resolution issues (if any)</name>
<files>Scripts/canonicalize_games.py, Scripts/canonicalize_stadiums.py</files>
<action>
Review game_resolution_warnings.json and fix any issues:

**If "Unknown home/away team" warnings:**
- Add missing team abbreviation alias to TEAM_ABBREV_ALIASES in canonicalize_games.py
- Format: `('SPORT', 'ABBREV'): 'team_sport_canonical',`

**If "Unknown stadium" warnings:**
- Check if venue name needs alias in HISTORICAL_STADIUM_ALIASES in canonicalize_stadiums.py
- Or verify home team has correct stadium_canonical_id in sport module

**After fixes:**
1. Re-run canonicalization: `python canonicalize_games.py --verbose`
2. Re-run validation: `python validate_canonical.py --verbose`

If no warnings exist, mark this task as complete with "No resolution issues found."
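
When there are many team warnings, it can help to group them by abbreviation before editing TEAM_ABBREV_ALIASES. The sketch below assumes each warning is a dict with hypothetical `type`, `sport`, and `abbrev` fields; the actual structure of `game_resolution_warnings.json` is not shown in this plan and may differ. The `alias_stub` helper only prints a line in the alias format given above, with the canonical ID left as a TODO to fill in by hand.

```python
from collections import Counter

# Assumed warning type names; adjust to whatever the warnings file really uses.
TEAM_WARNING_TYPES = ("unknown_home_team", "unknown_away_team")


def missing_alias_keys(warnings):
    """Count unresolved (sport, abbreviation) pairs among team warnings."""
    counts = Counter()
    for warning in warnings:
        if warning.get("type") in TEAM_WARNING_TYPES:
            counts[(warning["sport"], warning["abbrev"])] += 1
    return counts


def alias_stub(sport, abbrev):
    """Format a TEAM_ABBREV_ALIASES entry with a placeholder canonical ID."""
    return f"({sport!r}, {abbrev!r}): 'team_{sport.lower()}_TODO',"


if __name__ == "__main__":
    sample_warnings = [
        {"type": "unknown_home_team", "sport": "NBA", "abbrev": "BRK"},
        {"type": "unknown_away_team", "sport": "NBA", "abbrev": "BRK"},
        {"type": "unknown_stadium", "sport": "NBA", "venue": "Example Arena"},
    ]
    # Most frequent unresolved abbreviations first.
    for (sport, abbrev), count in missing_alias_keys(sample_warnings).most_common():
        print(f"{alias_stub(sport, abbrev)}  # {count} games affected")
```

Sorting by frequency puts the aliases that unblock the most games at the top of the list.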
</action>
<verify>
- game_resolution_warnings.json is empty or contains only acceptable warnings
- Re-run canonicalization produces no new warnings
</verify>
<done>All resolution issues fixed, or no issues found</done>
</task>

</tasks>

<verification>
Before declaring phase complete:
- [ ] `games_canonical.json` exists with >1MB of data
- [ ] All games have valid `home_team_canonical_id` (no "team_unknown_*")
- [ ] All games have valid `away_team_canonical_id` (no "team_unknown_*")
- [ ] All games have valid `stadium_canonical_id` (no "stadium_unknown_*")
- [ ] `validate_canonical.py` passes with 0 errors
- [ ] Game counts per team within expected ranges
</verification>

<success_criteria>

- All tasks completed
- All verification checks pass
- games_canonical.json ready for CloudKit upload
- No broken team or stadium links in any game
</success_criteria>

<output>
After completion, create `.planning/phases/04-canonical-linking/04-01-SUMMARY.md` with:

```markdown
---
phase: 04-canonical-linking
plan: 01
subsystem: data-pipeline
tags: [games, canonicalization, linking, validation]

# Dependency graph
requires:
  - phase: 03-alias-systems
    provides: Team/stadium aliases for all 7 sports
provides:
  - Canonical games with resolved team/stadium links
  - Validated game→team→stadium relationships
affects: [05-cloudkit-crud, ios-app-data]

# Tech tracking
tech-stack:
  added: []
  patterns: [game-canonicalization, link-validation]

key-files:
  created:
    - Scripts/data/games_canonical.json
  modified: []

key-decisions:
  - [Any decisions made during execution]

patterns-established:
  - [Any new patterns]

issues-created: []

# Metrics
duration: X min
completed: YYYY-MM-DD
---

# Phase 4 Plan 01: Canonical Linking Summary

**[One-liner: X games canonicalized with Y% resolution rate]**

## Accomplishments
- [Key outcomes]

## Files Created/Modified
- [List with descriptions]

## Decisions Made
- [Or "None"]

## Issues Encountered
- [Or "None"]

## Next Phase Readiness
- Ready for Phase 5: CloudKit CRUD
```
</output>