---
phase: 02-stadium-foundation
plan: 01
type: execute
---
Audit and complete hardcoded stadium data across all 4 sport modules.
Purpose: Ensure all sport modules have complete, accurate stadium data that will flow through the canonicalization pipeline.
Output: All 4 sport modules with complete stadium data (city, state, lat/lng, capacity, year_opened, teams).
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-script-architecture/01-03-SUMMARY.md
**Key files:**
@Scripts/mlb.py
@Scripts/nba.py
@Scripts/nhl.py
@Scripts/nfl.py
**Current state:**
- MLB, NBA, NHL, NFL modules have hardcoded stadium data with city, state, lat/lng, capacity, teams
- Missing field: year_opened (null in all canonical data)
- NFL module created in Phase 1 Plan 03 with 30 hardcoded stadiums
- Bundled stadiums_canonical.json has incomplete data (state="", capacity=0, missing NFL)
**Expected stadium counts:**
- MLB: 30 stadiums (30 teams)
- NBA: 30 stadiums (30 teams)
- NHL: 32 stadiums (32 teams)
- NFL: 30 stadiums (32 teams, 2 shared: SoFi Stadium, MetLife Stadium)
**Stadium data structure:**
Each module has `scrape_{sport}_stadiums_hardcoded()` returning Stadium objects with:
- name, city, state, lat/lng, capacity, teams
- Missing: year_opened, which is needed to filter historical/renamed venues
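As a rough illustration of the structure described above, the sketch below shows what a `Stadium` dataclass with an optional `year_opened` might look like, plus one possible way the field could be used for filtering. The real class lives in core.py and may differ: the `latitude`/`longitude` attribute names, the types, and the `open_in_season` helper are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stadium:
    """Stand-in for the project's Stadium class; field names and types are assumed."""
    name: str
    city: str
    state: str
    latitude: float
    longitude: float
    capacity: int
    teams: List[str] = field(default_factory=list)
    year_opened: Optional[int] = None  # None until the opening year is filled in


def open_in_season(stadiums: List[Stadium], season: int) -> List[Stadium]:
    """Keep venues whose known opening year is on or before the given season."""
    return [s for s in stadiums if s.year_opened is not None and s.year_opened <= season]


# Opening years come from this plan; coordinates and capacities are
# approximate, illustrative values only.
venues = [
    Stadium("Fenway Park", "Boston", "MA", 42.35, -71.10, 37000, ["Red Sox"], 1912),
    Stadium("Globe Life Field", "Arlington", "TX", 32.75, -97.08, 40000, ["Rangers"], 2020),
]
print([s.name for s in open_in_season(venues, 2019)])
```

Defaulting `year_opened` to `None` keeps existing call sites working while the field is being rolled out, and makes unfilled entries easy to detect.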
Task 1: Audit stadium data completeness across all 4 sport modules
Scripts/mlb.py, Scripts/nba.py, Scripts/nhl.py, Scripts/nfl.py
1. Read each sport module's hardcoded stadium function
2. Create an audit report listing, for each sport:
   - Stadium count (should match expected)
   - Fields present/missing
   - Any stadiums with missing lat/lng (should be 0)
   - Any stadiums with missing capacity (should be 0)
3. Identify gaps: stadiums missing from lists, incorrect coordinates, missing teams
Do NOT modify any files in this task - audit only. The goal is to understand current state before making changes.
Print audit summary showing stadium counts per sport and any data quality issues found
Audit report shows MLB:30, NBA:30, NHL:32, NFL:30 stadiums with all required fields documented
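The audit in Task 1 could be sketched roughly as follows. This is a read-only helper that only inspects objects passed to it; the attribute names (`latitude`, `longitude`, and so on) and the report layout are assumptions, and the demo record is made up rather than taken from any module.

```python
from types import SimpleNamespace

# Expected stadium counts, as listed in this plan.
EXPECTED_COUNTS = {"mlb": 30, "nba": 30, "nhl": 32, "nfl": 30}

# Attribute names are assumed; adjust to match the real Stadium class.
AUDITED_FIELDS = ("city", "state", "latitude", "longitude",
                  "capacity", "teams", "year_opened")


def audit_stadiums(sport, stadiums):
    """Summarize the count and list stadium names with empty or absent fields."""
    missing = {}
    for stadium in stadiums:
        for field_name in AUDITED_FIELDS:
            value = getattr(stadium, field_name, None)
            # Treat None, empty string, zero, and empty list as "missing".
            if value is None or value == "" or value == 0 or value == []:
                missing.setdefault(field_name, []).append(stadium.name)
    return {
        "sport": sport,
        "count": len(stadiums),
        "expected": EXPECTED_COUNTS[sport],
        "count_ok": len(stadiums) == EXPECTED_COUNTS[sport],
        "missing": missing,
    }


# Made-up record standing in for one module's output.
demo = [SimpleNamespace(name="Example Park", city="Springfield", state="",
                        latitude=1.0, longitude=2.0, capacity=0,
                        teams=["Example Team"])]
report = audit_stadiums("mlb", demo)
print(report)
```

In practice the list passed in would come from each module's hardcoded stadium function, and the four reports would be printed side by side.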
Task 2: Add year_opened to all hardcoded stadiums
Scripts/mlb.py, Scripts/nba.py, Scripts/nhl.py, Scripts/nfl.py
Add year_opened to each stadium's hardcoded data. Use the actual opening year for each venue:
**MLB stadiums (sample):**
- Fenway Park: 1912
- Wrigley Field: 1914
- Dodger Stadium: 1962
- Globe Life Field: 2020
**NBA arenas (sample):**
- TD Garden: 1995
- Madison Square Garden: 1968
- Chase Center: 2019
- Intuit Dome: 2024
**NHL arenas:** Many are shared with NBA teams. Verify each shared venue and use the same year_opened in both modules.
**NFL stadiums (sample):**
- Lambeau Field: 1957
- SoFi Stadium: 2020
- Allegiant Stadium: 2020
Before editing the modules, verify that the Stadium dataclass in core.py has a year_opened field, and add it if it is missing. Then, for each module:
1. Update the hardcoded dict to include a 'year_opened' key
2. Update Stadium object creation to pass the year_opened parameter
Research actual opening years from Wikipedia if unsure. Use the original opening year, not renovation years.
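Run `python -c "from mlb import scrape_mlb_stadiums; s=scrape_mlb_stadiums(); print(f'MLB: {len(s)} stadiums, year_opened example: {s[0].year_opened if hasattr(s[0], \"year_opened\") else \"MISSING\"}')"` for each sport, substituting the module and function names. Run it from the directory containing the modules so the import resolves. An output of `MISSING` or `None` means year_opened was not populated.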
All 4 sport modules have year_opened in hardcoded data, Stadium objects include year_opened field
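The per-module change in Task 2 might look something like the sketch below: a `year_opened` key in the hardcoded dict, passed through when the Stadium objects are built. The dict layout, the `build_stadiums` and `missing_year_opened` helpers, and the trimmed-down `Stadium` stand-in are hypothetical; only the arena names and opening years are taken from this plan.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Stadium:
    """Minimal stand-in for the real Stadium class in core.py."""
    name: str
    year_opened: Optional[int] = None


# Hypothetical dict layout; the real modules carry more keys per venue.
NBA_ARENAS: Dict[str, dict] = {
    "TD Garden": {"year_opened": 1995},
    "Madison Square Garden": {"year_opened": 1968},
    "Chase Center": {"year_opened": 2019},
    "Intuit Dome": {"year_opened": 2024},
}


def build_stadiums(hardcoded: Dict[str, dict]) -> List[Stadium]:
    """Create Stadium objects, passing year_opened through from the dict."""
    return [
        Stadium(name=name, year_opened=info.get("year_opened"))
        for name, info in hardcoded.items()
    ]


def missing_year_opened(stadiums: List[Stadium]) -> List[str]:
    """Return names of stadiums whose year_opened was never set."""
    return [s.name for s in stadiums if s.year_opened is None]


arenas = build_stadiums(NBA_ARENAS)
print(len(arenas), missing_year_opened(arenas))
```

Using `info.get("year_opened")` rather than `info["year_opened"]` means a forgotten key shows up in `missing_year_opened` instead of raising at import time.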
Before declaring plan complete:
- [ ] Audit confirms expected stadium counts: MLB:30, NBA:30, NHL:32, NFL:30
- [ ] All 4 modules have year_opened in hardcoded stadium data
- [ ] No Python syntax errors in any module
- [ ] Stadium dataclass supports year_opened field
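One way to tick off the syntax-error item in the checklist above without importing (and thereby executing) the modules is to parse each file with the standard-library `ast` module. The sketch below assumes the four paths listed under Key files, relative to the project root; the helper names are made up for this example.

```python
import ast
from pathlib import Path

# Paths as listed in this plan, relative to the project root (assumed).
MODULE_PATHS = ["Scripts/mlb.py", "Scripts/nba.py", "Scripts/nhl.py", "Scripts/nfl.py"]


def syntax_error_in(source, filename="<string>"):
    """Return a short error description, or None if the source parses cleanly."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


def check_modules(paths):
    """Collect problems (missing files or syntax errors) for the given paths."""
    problems = []
    for path in paths:
        file_path = Path(path)
        if not file_path.exists():
            problems.append(f"{path}: file not found")
            continue
        error = syntax_error_in(file_path.read_text(encoding="utf-8"), filename=path)
        if error:
            problems.append(error)
    return problems


if __name__ == "__main__":
    for problem in check_modules(MODULE_PATHS):
        print(problem)
```

A parse-only check catches syntax errors but not import errors, so it complements rather than replaces actually loading each module.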
- Task 1: Audit complete with documented counts and any gaps identified
- Task 2: year_opened added to all hardcoded stadiums in all 4 modules
- No import errors when loading modules
- Ready for Plan 02 (pipeline regeneration)