Sportstime/.planning/phases/02-stadium-foundation/02-02-PLAN.md

---
phase: 02-stadium-foundation
plan: 02
type: execute
---

<objective>
Regenerate canonical stadium data and verify the complete pipeline flow.

Purpose: Ensure hardcoded stadium data flows correctly through canonicalization to bundled JSON.
Output: Updated bundled stadiums_canonical.json with complete data for all 4 sports.
</objective>

<execution_context>
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
~/.claude/get-shit-done/references/checkpoints.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-stadium-foundation/02-01-SUMMARY.md

**Key files:**
@Scripts/scrape_schedules.py
@Scripts/run_canonicalization_pipeline.py
@Scripts/canonicalize_stadiums.py

**Pipeline flow:**
1. `scrape_schedules.py --stadiums-update` calls `scrape_all_stadiums()` → `data/stadiums.json`
2. `run_canonicalization_pipeline.py` reads stadiums.json → canonicalizes → `data/stadiums_canonical.json`
3. Copy to `SportsTime/Resources/stadiums_canonical.json`
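
The three steps above could be chained from a small driver script. This is a minimal sketch, not an existing part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and the working directory each path resolves against is an assumption the caller must supply. Only the script names and the `--stadiums-update` flag come from this plan.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Return the two pipeline commands in the order listed in the plan."""
    return [
        [python, "scrape_schedules.py", "--stadiums-update"],
        [python, "run_canonicalization_pipeline.py"],
    ]


def run_pipeline(scripts_dir: Path, canonical: Path, bundled: Path) -> None:
    """Run steps 1-2 inside scripts_dir, then copy the result (step 3).

    All three paths are assumptions supplied by the caller.
    """
    for cmd in build_commands():
        # check=True raises on the first failing step, so stale data is never copied
        subprocess.run(cmd, cwd=scripts_dir, check=True)
    bundled.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(canonical, bundled)
```

A call might look like `run_pipeline(Path("Scripts"), Path("data/stadiums_canonical.json"), Path("SportsTime/Resources/stadiums_canonical.json"))`, assuming `data/` sits next to `Scripts/` at the repository root.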

**Expected output:**
- stadiums_canonical.json with all fields populated: canonical_id, name, city, state, latitude, longitude, capacity, sport, primary_team_abbrevs, year_opened
- stadium_aliases.json with historical name mappings
- Stadium counts: MLB:30, NBA:30, NHL:32, NFL:30 = 122 core stadiums
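
The expected output can also be written down as data, so a script can compare a generated file against it. The sketch below assumes the canonical file is a JSON array of objects and that `sport` holds league abbreviations such as `"MLB"`; the function names are hypothetical.

```python
from collections import Counter

# Field list and per-sport counts copied from the expected output above
REQUIRED_FIELDS = (
    "canonical_id", "name", "city", "state", "latitude", "longitude",
    "capacity", "sport", "primary_team_abbrevs", "year_opened",
)
EXPECTED_COUNTS = {"MLB": 30, "NBA": 30, "NHL": 32, "NFL": 30}


def missing_fields(stadium: dict) -> list[str]:
    """Names of required fields that are absent from one record."""
    return [f for f in REQUIRED_FIELDS if f not in stadium]


def count_mismatches(stadiums: list[dict]) -> dict[str, tuple[int, int]]:
    """Map sport -> (expected, actual) for every sport whose count is off."""
    actual = Counter(s.get("sport") for s in stadiums)
    return {
        sport: (expected, actual[sport])
        for sport, expected in EXPECTED_COUNTS.items()
        if actual[sport] != expected
    }
```

An empty result from `count_mismatches` together with an empty `missing_fields` list for every record would indicate that the file matches the expectations listed here.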

**Pre-requisites:**
- Plan 02-01 complete (year_opened added to all modules)
</context>

<tasks>

<task type="auto">
  <name>Task 1: Run stadium scraping and canonicalization pipeline</name>
  <files>data/stadiums.json, data/stadiums_canonical.json, data/stadium_aliases.json</files>
  <action>
1. Navigate to Scripts directory
2. Run: `python3 scrape_schedules.py --stadiums-update`
   - This calls scrape_all_stadiums(), which invokes each sport module's scraper
   - Output: data/stadiums.json with raw stadium data
3. Run: `python3 run_canonicalization_pipeline.py --verbose`
   - Alternatively, run canonicalize_stadiums.py directly if the full pipeline is too complex to debug
4. Verify output files exist in data/ directory

If errors occur, debug and fix before proceeding. Common issues:
- Import errors: Check module paths and __init__.py
- Missing fields: Verify Stadium dataclass in core.py
  </action>
  <verify>ls -la data/stadium*.json && python3 -c "import json; from collections import Counter; d=json.load(open('data/stadiums_canonical.json')); print(f'Total: {len(d)}'); print(dict(Counter(s['sport'] for s in d)))"</verify>
  <done>stadiums_canonical.json exists with MLB:30, NBA:30, NHL:32, NFL:30 stadiums</done>
</task>

<task type="auto">
  <name>Task 2: Copy canonical data to bundled resources and verify completeness</name>
  <files>SportsTime/Resources/stadiums_canonical.json, SportsTime/Resources/stadium_aliases.json</files>
  <action>
1. Copy generated canonical files to app bundle:
   - cp data/stadiums_canonical.json SportsTime/Resources/stadiums_canonical.json
   - cp data/stadium_aliases.json SportsTime/Resources/stadium_aliases.json

2. Verify data completeness by checking sample records:
   - All state fields populated (not empty string)
   - All capacity fields > 0
   - All year_opened fields not null
   - All lat/lng reasonable (North American coordinates: lat 24-54, lng -125 to -66; the latitude ceiling must cover Canadian venues such as Edmonton, near lat 53.5)

3. If any fields empty, trace back to source:
   - Check raw stadiums.json has the field
   - Check canonicalize_stadiums.py preserves the field
   - Fix the break in the chain
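
The completeness checks and the trace-back in steps 2 and 3 could be scripted roughly as follows. This is a sketch under assumptions: records are dicts keyed by the field names from the expected output, raw and canonical files use the same field names, and the latitude ceiling defaults to 54 so that Canadian venues such as Edmonton (about 53.5°N) are not flagged. Both function names are hypothetical.

```python
def _in_range(value, low, high):
    """True when value is a number within [low, high]."""
    return isinstance(value, (int, float)) and high >= value >= low


def incomplete_records(stadiums, lat_range=(24.0, 54.0), lng_range=(-125.0, -66.0)):
    """Return (name, problems) pairs for records failing any completeness check."""
    failures = []
    for s in stadiums:
        problems = []
        if not s.get("state"):
            problems.append("state empty")
        capacity = s.get("capacity")
        if not (isinstance(capacity, (int, float)) and capacity > 0):
            problems.append("capacity not positive")
        if s.get("year_opened") is None:
            problems.append("year_opened null")
        if not _in_range(s.get("latitude"), *lat_range):
            problems.append("latitude out of range")
        if not _in_range(s.get("longitude"), *lng_range):
            problems.append("longitude out of range")
        if problems:
            failures.append((s.get("name", "?"), problems))
    return failures


def populated_count(records, field):
    """How many records carry a non-empty value for field."""
    return sum(1 for r in records if r.get(field) not in (None, "", 0))
```

Running `populated_count` for the same field on the raw and on the canonical records gives a quick hint about which stage dropped the value: a lower count in the canonical data points at canonicalization, while a low count in both points at the scraper.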
  </action>
  <verify>cat SportsTime/Resources/stadiums_canonical.json | python3 -c "import json,sys; d=json.load(sys.stdin); empty_state=sum(1 for s in d if not s.get('state')); zero_cap=sum(1 for s in d if not s.get('capacity')); null_year=sum(1 for s in d if s.get('year_opened') is None); print(f'Empty state: {empty_state}, Zero capacity: {zero_cap}, Null year: {null_year}')"</verify>
  <done>Bundled JSON has 0 empty states, 0 zero capacities, 0 null year_opened values</done>
</task>

<task type="checkpoint:human-verify" gate="blocking">
  <what-built>Regenerated and updated bundled stadium data for all 4 core sports</what-built>
  <how-to-verify>
1. Open `SportsTime/Resources/stadiums_canonical.json`
2. Verify stadium counts by sport:
   - MLB: 30 stadiums
   - NBA: 30 stadiums
   - NHL: 32 stadiums
   - NFL: 30 stadiums (32 teams; SoFi and MetLife each host two)
3. Spot check data quality:
   - Pick any stadium, verify state is 2-letter code (e.g., "CA", "NY")
   - Pick any stadium, verify capacity is realistic (15000-100000)
   - Pick any stadium, verify year_opened is reasonable (1900-2025)
4. Verify that no non-core sports (MLS, WNBA, NWSL, CBB) appear in the bundled JSON; if any are present, confirm that their inclusion is intentional
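
The spot checks in steps 3 and 4 could be pre-screened with a short script before the manual review. The sketch below is illustrative only: it assumes `sport` holds league abbreviations and that capacity and year are stored as integers, it reuses the ranges listed above, and every function name is hypothetical. It flags implausible values and does not replace the human check.

```python
import random

CORE_SPORTS = {"MLB", "NBA", "NHL", "NFL"}


def spot_check(stadium: dict) -> list[str]:
    """Return warnings for one record; an empty list means it looks plausible."""
    warnings = []
    state = stadium.get("state")
    if not (isinstance(state, str) and len(state) == 2
            and state.isalpha() and state.isupper()):
        warnings.append(f"state {state!r} is not a 2-letter code")
    capacity = stadium.get("capacity")
    if not (isinstance(capacity, int) and 100_000 >= capacity >= 15_000):
        warnings.append(f"capacity {capacity!r} outside 15000-100000")
    year = stadium.get("year_opened")
    if not (isinstance(year, int) and 2025 >= year >= 1900):
        warnings.append(f"year_opened {year!r} outside 1900-2025")
    return warnings


def non_core_sports(stadiums: list[dict]) -> set:
    """Sport values present in the file that are not one of the four core leagues."""
    return {s.get("sport") for s in stadiums} - CORE_SPORTS


def sample_report(stadiums: list[dict], k: int = 5, seed: int = 0) -> dict[str, list[str]]:
    """Spot check k randomly chosen records; the seed keeps the sample repeatable."""
    rng = random.Random(seed)
    picked = rng.sample(stadiums, min(k, len(stadiums)))
    return {s.get("name", "?"): spot_check(s) for s in picked}
```

A non-empty result from `non_core_sports` is not necessarily an error; it only signals that the intentionality question in step 4 needs an answer.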
  </how-to-verify>
  <resume-signal>Type "approved" if data looks correct, or describe issues to fix</resume-signal>
</task>

</tasks>

<verification>
Before declaring plan complete:
- [ ] Pipeline runs without errors
- [ ] data/stadiums_canonical.json has 122 core sport stadiums
- [ ] Bundled JSON updated with complete data
- [ ] Human verified data quality
</verification>

<success_criteria>

- Pipeline completes successfully
- All stadium fields populated (no empty state, zero capacity, or null year_opened)
- Bundled JSON has correct stadium counts for MLB, NBA, NHL, NFL
- Phase 2 complete: Stadium Foundation established
</success_criteria>

<output>
After completion, create `.planning/phases/02-stadium-foundation/02-02-SUMMARY.md`:

# Phase 2 Plan 02: Pipeline Regeneration & Verification Summary

**[Substantive one-liner]**

## Accomplishments

- [Pipeline executed successfully]
- [Bundled JSON updated with complete data]

## Files Created/Modified

- `data/stadiums.json` - Raw stadium data
- `data/stadiums_canonical.json` - Canonical output
- `data/stadium_aliases.json` - Historical aliases
- `SportsTime/Resources/stadiums_canonical.json` - Bundled canonical data
- `SportsTime/Resources/stadium_aliases.json` - Bundled aliases

## Decisions Made

[Any decisions about included sports, data sources]

## Issues Encountered

[Any pipeline issues and fixes]

## Phase 2 Complete

Phase 2: Stadium Foundation is complete:
- All 4 core sports have complete stadium data
- Data includes: canonical_id, name, city, state, lat/lng, capacity, year_opened, teams
- Historical aliases in place for renamed stadiums
- Ready for Phase 3: Alias Systems
</output>