Sportstime/.planning/phases/02-stadium-foundation/02-02-PLAN.md
Trey t 95861cae40 docs(02): create stadium foundation phase plans
Phase 2: Stadium Foundation
- 2 plans created
- 5 total tasks defined
- Ready for execution

Plan 02-01: Audit & complete hardcoded stadium data
Plan 02-02: Regenerate canonical data and verify pipeline

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 00:24:41 -06:00


phase: 02-stadium-foundation
plan: 02
type: execute
Regenerate canonical stadium data and verify the complete pipeline flow.

Purpose: Ensure hardcoded stadium data flows correctly through canonicalization to bundled JSON.
Output: Updated bundled stadiums_canonical.json with complete data for all 4 sports.

<execution_context> ~/.claude/get-shit-done/workflows/execute-phase.md ~/.claude/get-shit-done/templates/summary.md ~/.claude/get-shit-done/references/checkpoints.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/02-stadium-foundation/02-01-SUMMARY.md

Key files: @Scripts/scrape_schedules.py @Scripts/run_canonicalization_pipeline.py @Scripts/canonicalize_stadiums.py

Pipeline flow:

  1. scrape_schedules.py --stadiums-update calls scrape_all_stadiums() → data/stadiums.json
  2. run_canonicalization_pipeline.py reads stadiums.json → canonicalizes → data/stadiums_canonical.json
  3. Copy to SportsTime/Resources/stadiums_canonical.json
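
The three steps above could be scripted roughly as follows. This is a minimal sketch, not the project's actual tooling: the helper names `pipeline_commands` and `run_pipeline` are hypothetical, and it assumes that the scripts live in Scripts/, that the data/ paths resolve relative to that directory, and that SportsTime/Resources/ sits at the repository root. Only the script names and the `--stadiums-update` and `--verbose` flags come from the plan itself.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def pipeline_commands(python=sys.executable):
    """Return the two pipeline commands from the plan, in execution order."""
    return [
        [python, "scrape_schedules.py", "--stadiums-update"],
        [python, "run_canonicalization_pipeline.py", "--verbose"],
    ]


def run_pipeline(repo_root):
    """Run steps 1-2 from Scripts/, then copy the canonical file (step 3).

    Assumption: data/ is relative to Scripts/ and the bundle lives at
    <repo_root>/SportsTime/Resources/. Both are guesses about the layout.
    """
    scripts_dir = Path(repo_root) / "Scripts"
    for cmd in pipeline_commands():
        # check=True stops at the first failing step instead of copying stale data.
        subprocess.run(cmd, cwd=scripts_dir, check=True)

    source = scripts_dir / "data" / "stadiums_canonical.json"
    target = Path(repo_root) / "SportsTime" / "Resources" / "stadiums_canonical.json"
    shutil.copyfile(source, target)
    return target
```

Stopping at the first failed command keeps a stale or partial stadiums_canonical.json from being copied into the app bundle.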

Expected output:

  • stadiums_canonical.json with all fields populated: canonical_id, name, city, state, latitude, longitude, capacity, sport, primary_team_abbrevs, year_opened
  • stadium_aliases.json with historical name mappings
  • Stadium counts: MLB:30, NBA:30, NHL:32, NFL:30 = 122 core stadiums
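
The expected output above can be turned into a small check. The sketch below assumes the canonical JSON is a top-level list of stadium objects (which the plan's own verification one-liner also assumes) and that sports are encoded as the strings "MLB", "NBA", "NHL" and "NFL"; the function names are hypothetical.

```python
from collections import Counter

# Counts and field names are taken from the plan; the exact sport strings
# ("MLB", "NBA", ...) are an assumption about how the JSON encodes them.
EXPECTED_COUNTS = {"MLB": 30, "NBA": 30, "NHL": 32, "NFL": 30}
REQUIRED_FIELDS = (
    "canonical_id", "name", "city", "state", "latitude", "longitude",
    "capacity", "sport", "primary_team_abbrevs", "year_opened",
)


def count_mismatches(stadiums, expected=EXPECTED_COUNTS):
    """Map sport -> (expected, actual) for every sport whose count is off."""
    actual = Counter(s.get("sport") for s in stadiums)
    return {
        sport: (want, actual[sport])
        for sport, want in expected.items()
        if actual[sport] != want
    }


def missing_fields(stadium):
    """List required keys that are absent from one stadium record."""
    return [field for field in REQUIRED_FIELDS if field not in stadium]
```

An empty dict from `count_mismatches` means all four sports match the 30/30/32/30 split, which sums to the 122 core stadiums.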

Pre-requisites:

  • Plan 02-01 complete (year_opened added to all modules)
Task 1: Run stadium scraping and canonicalization pipeline

Files: data/stadiums.json, data/stadiums_canonical.json, data/stadium_aliases.json

  1. Navigate to the Scripts directory.
  2. Run: `python scrape_schedules.py --stadiums-update`
    • This calls scrape_all_stadiums(), which invokes each sport module's scraper.
    • Output: data/stadiums.json with raw stadium data.
  3. Run: `python run_canonicalization_pipeline.py --verbose`
    • Or run canonicalize_stadiums.py directly if the pipeline is complex.
  4. Verify that the output files exist in the data/ directory.

If errors occur, debug and fix before proceeding. Common issues:

  • Import errors: Check module paths and __init__.py
  • Missing fields: Verify Stadium dataclass in core.py

Verify: ls -la data/stadiums*.json && cat data/stadiums_canonical.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(f'Total: {len(d)}'); sports={}; [sports.__setitem__(s['sport'], sports.get(s['sport'],0)+1) for s in d]; print(sports)"

Done: stadiums_canonical.json exists with MLB:30, NBA:30, NHL:32, NFL:30 stadiums

Task 2: Copy canonical data to bundled resources and verify completeness

Files: SportsTime/Resources/stadiums_canonical.json, SportsTime/Resources/stadium_aliases.json

  1. Copy the generated canonical files to the app bundle:
    • cp data/stadiums_canonical.json SportsTime/Resources/stadiums_canonical.json
    • cp data/stadium_aliases.json SportsTime/Resources/stadium_aliases.json
  2. Verify data completeness by checking sample records:
    • All state fields populated (not empty string)
    • All capacity fields > 0
    • All year_opened fields not null
    • All lat/lng reasonable (US coordinates: lat 24-49, lng -125 to -66)
  3. If any fields are empty, trace back to the source:
    • Check that raw stadiums.json has the field
    • Check that canonicalize_stadiums.py preserves the field
    • Fix the break in the chain

Verify: cat SportsTime/Resources/stadiums_canonical.json | python3 -c "import json,sys; d=json.load(sys.stdin); empty_state=sum(1 for s in d if not s.get('state')); zero_cap=sum(1 for s in d if not s.get('capacity')); null_year=sum(1 for s in d if s.get('year_opened') is None); print(f'Empty state: {empty_state}, Zero capacity: {zero_cap}, Null year: {null_year}')"

Done: Bundled JSON has 0 empty states, 0 zero capacities, 0 null year_opened values

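
The completeness checks in Task 2 include a coordinate range that the verification one-liner does not cover. A minimal sketch of all four checks per record follows, assuming the field names from the expected output (state, capacity, year_opened, latitude, longitude) and a numeric capacity; the function names are hypothetical.

```python
def completeness_problems(stadium, lat_range=(24.0, 49.0), lng_range=(-125.0, -66.0)):
    """Return a list of human-readable problems for one stadium record."""
    problems = []
    if not stadium.get("state"):
        problems.append("empty state")
    if not stadium.get("capacity") or stadium["capacity"] <= 0:
        problems.append("capacity not > 0")
    if stadium.get("year_opened") is None:
        problems.append("null year_opened")

    lat, lng = stadium.get("latitude"), stadium.get("longitude")
    # Default bounds are the US ranges quoted in the plan; override as needed.
    if lat is None or not lat_range[0] <= lat <= lat_range[1]:
        problems.append("latitude out of range")
    if lng is None or not lng_range[0] <= lng <= lng_range[1]:
        problems.append("longitude out of range")
    return problems


def incomplete_records(stadiums):
    """Map canonical_id -> problems, only for records that have any."""
    report = {}
    for stadium in stadiums:
        problems = completeness_problems(stadium)
        if problems:
            report[stadium.get("canonical_id", "<no id>")] = problems
    return report
```

The bounds are parameters because the quoted ranges describe the contiguous US. If the data includes Canadian venues, some will sit north of latitude 49 (Edmonton is at roughly 53.5°N), so a latitude flag on those records is not necessarily a data error.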
Regenerated and updated bundled stadium data for all 4 core sports

  1. Open `SportsTime/Resources/stadiums_canonical.json`
  2. Verify stadium counts by sport:
    • MLB: 30 stadiums
    • NBA: 30 stadiums
    • NHL: 32 stadiums
    • NFL: 30 stadiums (2 shared: SoFi, MetLife)
  3. Spot check data quality:
    • Pick any stadium, verify state is a 2-letter code (e.g., "CA", "NY")
    • Pick any stadium, verify capacity is realistic (15000-100000)
    • Pick any stadium, verify year_opened is reasonable (1900-2025)
  4. Verify no non-core sports are included (MLS, WNBA, NWSL, CBB should NOT be in the bundled JSON; if present, confirm that is intentional)

Type "approved" if data looks correct, or describe issues to fix

Before declaring plan complete:

  - [ ] Pipeline runs without errors
  - [ ] data/stadiums_canonical.json has 122 core sport stadiums
  - [ ] Bundled JSON updated with complete data
  - [ ] Human verified data quality
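
The manual spot checks in the checkpoint above can also be applied to every record before a human looks at the file. This is a sketch under assumptions: capacity and year_opened are stored as integers, sport values use the four league abbreviations, and the names `spot_check` and `non_core_sports` are hypothetical.

```python
import re

CORE_SPORTS = {"MLB", "NBA", "NHL", "NFL"}
STATE_CODE = re.compile(r"[A-Z]{2}")


def spot_check(stadium):
    """Apply the three checkpoint spot checks to one record; True means it passes."""
    state = stadium.get("state")
    capacity = stadium.get("capacity")
    year = stadium.get("year_opened")
    return {
        "state": isinstance(state, str) and STATE_CODE.fullmatch(state) is not None,
        "capacity": isinstance(capacity, int) and 15000 <= capacity <= 100000,
        "year_opened": isinstance(year, int) and 1900 <= year <= 2025,
    }


def non_core_sports(stadiums):
    """Return the set of sport values outside the four core leagues."""
    return {s.get("sport") for s in stadiums} - CORE_SPORTS
```

A non-empty result from `non_core_sports` is not automatically a failure, since the checkpoint allows non-core sports when their presence is intentional.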

<success_criteria>

  • Pipeline completes successfully
  • All stadium fields populated (no empty state, zero capacity, or null year_opened)
  • Bundled JSON has correct stadium counts for MLB, NBA, NHL, NFL
  • Phase 2 complete: Stadium Foundation established

</success_criteria>

After completion, create `.planning/phases/02-stadium-foundation/02-02-SUMMARY.md`:

Phase 2 Plan 02: Pipeline Regeneration & Verification Summary

[Substantive one-liner]

Accomplishments

  • [Pipeline executed successfully]
  • [Bundled JSON updated with complete data]

Files Created/Modified

  • data/stadiums.json - Raw stadium data
  • data/stadiums_canonical.json - Canonical output
  • data/stadium_aliases.json - Historical aliases
  • SportsTime/Resources/stadiums_canonical.json - Bundled canonical data
  • SportsTime/Resources/stadium_aliases.json - Bundled aliases

Decisions Made

[Any decisions about included sports, data sources]

Issues Encountered

[Any pipeline issues and fixes]

Phase 2 Complete

Phase 2: Stadium Foundation is complete:

  • All 4 core sports have complete stadium data
  • Data includes: canonical_id, name, city, state, lat/lng, capacity, year_opened, teams
  • Historical aliases in place for renamed stadiums
  • Ready for Phase 3: Alias Systems