diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 45bed44..8002757 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -37,12 +37,12 @@ Plans:
 ### Phase 2: Stadium Foundation
 **Goal**: Complete stadium database with correct coordinates, names, and venue data for all 4 sports
 **Depends on**: Phase 1
-**Research**: Likely (stadium data sources, geocoding verification)
-**Research topics**: Stadium data sources (Wikipedia, official league sites), geocoding API for coordinate verification, handling relocated/renamed venues
-**Plans**: TBD
+**Research**: No (hardcoded data exists in sport modules, internal pipeline work)
+**Plans**: 2 plans
 
 Plans:
-- [ ] 02-01: TBD
+- [ ] 02-01: Audit & complete hardcoded stadium data in sport modules
+- [ ] 02-02: Regenerate canonical data and verify pipeline
 
 ### Phase 3: Alias Systems
 **Goal**: Implement alias systems for both stadiums and teams to handle name variations across data sources
@@ -89,7 +89,7 @@ Phases execute in numeric order: 1 → 2 → 3 → 4 → 5 → 6
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
 | 1. Script Architecture | 3/3 | Complete | 2026-01-10 |
-| 2. Stadium Foundation | 0/TBD | Not started | - |
+| 2. Stadium Foundation | 0/2 | Planned | - |
 | 3. Alias Systems | 0/TBD | Not started | - |
 | 4. Canonical Linking | 0/TBD | Not started | - |
 | 5. CloudKit CRUD | 0/TBD | Not started | - |
diff --git a/.planning/phases/02-stadium-foundation/02-01-PLAN.md b/.planning/phases/02-stadium-foundation/02-01-PLAN.md
new file mode 100644
index 0000000..29eee5a
--- /dev/null
+++ b/.planning/phases/02-stadium-foundation/02-01-PLAN.md
@@ -0,0 +1,153 @@
+---
+phase: 02-stadium-foundation
+plan: 01
+type: execute
+---
+
+
+Audit and complete hardcoded stadium data across all 4 sport modules.
+
+Purpose: Ensure all sport modules have complete, accurate stadium data that will flow through the canonicalization pipeline.
+Output: All 4 sport modules with complete stadium data (city, state, lat/lng, capacity, year_opened, teams).
+
+
+
+~/.claude/get-shit-done/workflows/execute-phase.md
+~/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/01-script-architecture/01-03-SUMMARY.md
+
+**Key files:**
+@Scripts/mlb.py
+@Scripts/nba.py
+@Scripts/nhl.py
+@Scripts/nfl.py
+
+**Current state:**
+- MLB, NBA, NHL, NFL modules have hardcoded stadium data with city, state, lat/lng, capacity, teams
+- Missing field: year_opened (null in all canonical data)
+- NFL module created in Phase 1 Plan 03 with 30 hardcoded stadiums
+- Bundled stadiums_canonical.json has incomplete data (state="", capacity=0, missing NFL)
+
+**Expected stadium counts:**
+- MLB: 30 stadiums (30 teams)
+- NBA: 30 stadiums (30 teams)
+- NHL: 32 stadiums (32 teams)
+- NFL: 30 stadiums (32 teams, 2 shared: SoFi Stadium, MetLife Stadium)
+
+**Stadium data structure:**
+Each module has `scrape_{sport}_stadiums_hardcoded()` returning Stadium objects with:
+- name, city, state, lat/lng, capacity, teams
+- Missing: year_opened for filtering historical/renamed venues
+
+
+
+
+
+ Task 1: Audit stadium data completeness across all 4 sport modules
+ Scripts/mlb.py, Scripts/nba.py, Scripts/nhl.py, Scripts/nfl.py
+
+1. Read each sport module's hardcoded stadium function
+2. Create audit report listing for each sport:
+   - Stadium count (should match expected)
+   - Fields present/missing
+   - Any stadiums with missing lat/lng (should be 0)
+   - Any stadiums with missing capacity (should be 0)
+3. Identify gaps: stadiums missing from lists, incorrect coordinates, missing teams
+
+Do NOT modify any files in this task - audit only. The goal is to understand current state before making changes.
+
+ Print audit summary showing stadium counts per sport and any data quality issues found
+ Audit report shows MLB:30, NBA:30, NHL:32, NFL:30 stadiums with all required fields documented
+
+
+
+ Task 2: Add year_opened to all hardcoded stadiums
+ Scripts/mlb.py, Scripts/nba.py, Scripts/nhl.py, Scripts/nfl.py
+
+Add year_opened to each stadium's hardcoded data. Use the actual opening year for each venue:
+
+**MLB stadiums (sample):**
+- Fenway Park: 1912
+- Wrigley Field: 1914
+- Dodger Stadium: 1962
+- Globe Life Field: 2020
+
+**NBA arenas (sample):**
+- TD Garden: 1995
+- Madison Square Garden: 1968
+- Chase Center: 2019
+- Intuit Dome: 2024
+
+**NHL arenas:** Many are shared with NBA teams; verify each one and reuse the same year_opened
+
+**NFL stadiums (sample):**
+- Lambeau Field: 1957
+- SoFi Stadium: 2020
+- Allegiant Stadium: 2020
+
+For each module:
+1. First verify that the Stadium dataclass in core.py has a year_opened field (shared by all modules; add it if missing)
+2. Update the hardcoded dict to include a 'year_opened' key
+3. Update Stadium object creation to pass the year_opened parameter
+
+Research actual opening years on Wikipedia if unsure. Use the original opening year, not renovation years.
+
+ From the Scripts directory, run `python -c "from mlb import scrape_mlb_stadiums_hardcoded; s = scrape_mlb_stadiums_hardcoded(); print('MLB:', len(s), 'stadiums, year_opened example:', getattr(s[0], 'year_opened', 'MISSING'))"` for each sport (substituting nba, nhl, nfl)
+ All 4 sport modules have year_opened in hardcoded data, Stadium objects include year_opened field
+
+
+
+
+Before declaring plan complete:
+- [ ] Audit confirms expected stadium counts: MLB:30, NBA:30, NHL:32, NFL:30
+- [ ] All 4 modules have year_opened in hardcoded stadium data
+- [ ] No Python syntax errors in any module
+- [ ] Stadium dataclass supports year_opened field
+
+
+
+
+- Task 1: Audit complete with documented counts and any gaps identified
+- Task 2: year_opened added to all hardcoded stadiums in all 4 modules
+- No import errors when loading modules
+- Ready for Plan 02 (pipeline regeneration)
+
+
+
+After completion, create `.planning/phases/02-stadium-foundation/02-01-SUMMARY.md`:
+
+# Phase 2 Plan 01: Stadium Data Audit & Completion Summary
+
+**[Substantive one-liner]**
+
+## Accomplishments
+
+- [Stadium counts verified]
+- [year_opened added to all modules]
+
+## Files Created/Modified
+
+- `Scripts/mlb.py` - Added year_opened
+- `Scripts/nba.py` - Added year_opened
+- `Scripts/nhl.py` - Added year_opened
+- `Scripts/nfl.py` - Added year_opened
+
+## Decisions Made
+
+[Any gaps found and how resolved]
+
+## Issues Encountered
+
+[Any data issues discovered]
+
+## Next Step
+
+Ready for 02-02-PLAN.md (pipeline regeneration)
+
diff --git a/.planning/phases/02-stadium-foundation/02-02-PLAN.md b/.planning/phases/02-stadium-foundation/02-02-PLAN.md
new file mode 100644
index 0000000..888ae7a
--- /dev/null
+++ b/.planning/phases/02-stadium-foundation/02-02-PLAN.md
@@ -0,0 +1,161 @@
+---
+phase: 02-stadium-foundation
+plan: 02
+type: execute
+---
+
+
+Regenerate canonical stadium data and verify the complete pipeline flow.
+
+Purpose: Ensure hardcoded stadium data flows correctly through canonicalization to bundled JSON.
+Output: Updated bundled stadiums_canonical.json with complete data for all 4 sports.
+
+
+
+~/.claude/get-shit-done/workflows/execute-phase.md
+~/.claude/get-shit-done/templates/summary.md
+~/.claude/get-shit-done/references/checkpoints.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/02-stadium-foundation/02-01-SUMMARY.md
+
+**Key files:**
+@Scripts/scrape_schedules.py
+@Scripts/run_canonicalization_pipeline.py
+@Scripts/canonicalize_stadiums.py
+
+**Pipeline flow:**
+1. `scrape_schedules.py --stadiums-update` calls `scrape_all_stadiums()` → `data/stadiums.json`
+2. `run_canonicalization_pipeline.py` reads stadiums.json → canonicalizes → `data/stadiums_canonical.json`
+3. Copy to `SportsTime/Resources/stadiums_canonical.json`
+
+**Expected output:**
+- stadiums_canonical.json with all fields populated: canonical_id, name, city, state, latitude, longitude, capacity, sport, primary_team_abbrevs, year_opened
+- stadium_aliases.json with historical name mappings
+- Stadium counts: MLB:30, NBA:30, NHL:32, NFL:30 = 122 core stadiums
+
+**Pre-requisites:**
+- Plan 02-01 complete (year_opened added to all modules)
+
+
+
+
+
+ Task 1: Run stadium scraping and canonicalization pipeline
+ data/stadiums.json, data/stadiums_canonical.json, data/stadium_aliases.json
+
+1. Navigate to Scripts directory
+2. Run: `python scrape_schedules.py --stadiums-update`
+   - This calls scrape_all_stadiums(), which invokes each sport module's scraper
+   - Output: data/stadiums.json with raw stadium data
+3. Run: `python run_canonicalization_pipeline.py --verbose`
+   - Alternatively, run canonicalize_stadiums.py directly if the full pipeline is hard to run or debug
+4. Verify output files exist in data/ directory
+
+If errors occur, debug and fix before proceeding. Common issues:
+- Import errors: Check module paths and __init__.py
+- Missing fields: Verify Stadium dataclass in core.py
+
+ ls -la data/stadium*.json && python3 -c "import json; from collections import Counter; d = json.load(open('data/stadiums_canonical.json')); print(f'Total: {len(d)}'); print(dict(Counter(s['sport'] for s in d)))"
+ stadiums_canonical.json exists with MLB:30, NBA:30, NHL:32, NFL:30 stadiums
+
+
+
+ Task 2: Copy canonical data to bundled resources and verify completeness
+ SportsTime/Resources/stadiums_canonical.json, SportsTime/Resources/stadium_aliases.json
+
+1. Copy generated canonical files to app bundle:
+   - cp data/stadiums_canonical.json SportsTime/Resources/stadiums_canonical.json
+   - cp data/stadium_aliases.json SportsTime/Resources/stadium_aliases.json
+
+2. Verify data completeness across all records:
+   - All state fields populated (not empty string)
+   - All capacity fields > 0
+   - All year_opened fields not null
+   - All lat/lng reasonable (US and Canadian venues: lat 24-54, lng -125 to -66; Canadian arenas such as Edmonton's sit near 53.5°N, outside a US-only 24-49 range)
+
+3. If any fields are empty, trace back to the source:
+   - Check raw stadiums.json has the field
+   - Check canonicalize_stadiums.py preserves the field
+   - Fix the break in the chain
+
+ cat SportsTime/Resources/stadiums_canonical.json | python3 -c "import json,sys; d=json.load(sys.stdin); empty_state=sum(1 for s in d if not s.get('state')); zero_cap=sum(1 for s in d if not s.get('capacity')); null_year=sum(1 for s in d if s.get('year_opened') is None); print(f'Empty state: {empty_state}, Zero capacity: {zero_cap}, Null year: {null_year}')"
+ Bundled JSON has 0 empty states, 0 zero capacities, 0 null year_opened values
+
+
+
+ Regenerated and updated bundled stadium data for all 4 core sports
+
+1. Open `SportsTime/Resources/stadiums_canonical.json`
+2. Verify stadium counts by sport:
+   - MLB: 30 stadiums
+   - NBA: 30 stadiums
+   - NHL: 32 stadiums
+   - NFL: 30 stadiums (2 shared: SoFi, MetLife)
+3. Spot check data quality:
+   - Pick any stadium, verify state is a 2-letter state or province code (e.g., "CA", "NY", "ON")
+   - Pick any stadium, verify capacity is realistic (15000-100000)
+   - Pick any stadium, verify year_opened is reasonable (1900-2025)
+4. Check whether non-core sports (MLS, WNBA, NWSL, CBB) appear in the bundled JSON; they are not expected, so confirm that any present were included intentionally
+
+ Type "approved" if data looks correct, or describe issues to fix
+
+
+
+
+
+Before declaring plan complete:
+- [ ] Pipeline runs without errors
+- [ ] data/stadiums_canonical.json has 122 core sport stadiums
+- [ ] Bundled JSON updated with complete data
+- [ ] Human verified data quality
+
+
+
+
+- Pipeline completes successfully
+- All stadium fields populated (no empty state, zero capacity, or null year_opened)
+- Bundled JSON has correct stadium counts for MLB, NBA, NHL, NFL
+- Phase 2 complete: Stadium Foundation established
+
+
+
+After completion, create `.planning/phases/02-stadium-foundation/02-02-SUMMARY.md`:
+
+# Phase 2 Plan 02: Pipeline Regeneration & Verification Summary
+
+**[Substantive one-liner]**
+
+## Accomplishments
+
+- [Pipeline executed successfully]
+- [Bundled JSON updated with complete data]
+
+## Files Created/Modified
+
+- `data/stadiums.json` - Raw stadium data
+- `data/stadiums_canonical.json` - Canonical output
+- `data/stadium_aliases.json` - Historical aliases
+- `SportsTime/Resources/stadiums_canonical.json` - Bundled canonical data
+- `SportsTime/Resources/stadium_aliases.json` - Bundled aliases
+
+## Decisions Made
+
+[Any decisions about included sports, data sources]
+
+## Issues Encountered
+
+[Any pipeline issues and fixes]
+
+## Phase 2 Complete
+
+Phase 2: Stadium Foundation is complete:
+- All 4 core sports have complete stadium data
+- Data includes: canonical_id, name, city, state, lat/lng, capacity, year_opened, teams
+- Historical aliases in place for renamed stadiums
+- Ready for Phase 3: Alias Systems
+
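The read-only audit in 02-01-PLAN.md Task 1 and the dataclass check in Task 2 can be sketched in Python. This is a minimal sketch under stated assumptions. The real `Stadium` class in `Scripts/core.py` is not shown in this diff, so the stand-in class and its field names (`latitude`, `longitude`, `teams`, and so on) are hypothetical. The `audit_stadiums` helper is also illustrative and not part of the existing scripts.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


# Hypothetical stand-in for the Stadium class in Scripts/core.py;
# the real field names and order may differ.
@dataclass
class Stadium:
    name: str
    city: str
    state: str
    latitude: Optional[float]
    longitude: Optional[float]
    capacity: int
    sport: str
    teams: list = field(default_factory=list)
    year_opened: Optional[int] = None  # the field 02-01 Task 2 adds


# Expected counts from the plan (NFL: 32 teams share 30 stadiums).
EXPECTED_COUNTS = {"MLB": 30, "NBA": 30, "NHL": 32, "NFL": 30}


def audit_stadiums(sport, stadiums):
    """Read-only audit for one sport: count plus names of stadiums with gaps."""
    return {
        "sport": sport,
        "count": len(stadiums),
        "expected": EXPECTED_COUNTS.get(sport),
        "missing_coords": [s.name for s in stadiums
                           if s.latitude is None or s.longitude is None],
        "missing_capacity": [s.name for s in stadiums if not s.capacity],
        "missing_year_opened": [s.name for s in stadiums if s.year_opened is None],
        # Confirms the dataclass itself declares year_opened.
        "has_year_opened_field": "year_opened" in {f.name for f in fields(Stadium)},
    }
```

You would call it once per sport on the list returned by that module's `scrape_{sport}_stadiums_hardcoded()` function. This assumes the real objects expose the same attribute names. The helper returns a report and never writes files, which matches the task's "audit only" rule.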
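The per-sport count check in 02-02-PLAN.md Task 1 is easier to reuse as a few small functions built on `collections.Counter`. This sketch assumes that `stadiums_canonical.json` is a JSON array of objects with a `sport` key, as the plan's one-liners already do. It also assumes the sport values are spelled `MLB`/`NBA`/`NHL`/`NFL`; the real file's casing is not shown. The function names are illustrative.

```python
import json
from collections import Counter

# Expected per-sport counts from the plan: 30 + 30 + 32 + 30 = 122.
EXPECTED = {"MLB": 30, "NBA": 30, "NHL": 32, "NFL": 30}


def load_records(path):
    """Load the canonical file; assumed to be a JSON array of stadium objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_by_sport(records):
    """Tally records by their 'sport' value."""
    return Counter(r["sport"] for r in records)


def count_mismatches(counts, expected=EXPECTED):
    """Return {sport: (actual, expected)} for every core sport whose count is off."""
    # Counter returns 0 for a missing key, so an absent sport shows up as (0, n).
    return {
        sport: (counts[sport], want)
        for sport, want in expected.items()
        if counts[sport] != want
    }
```

For example, `count_mismatches(count_by_sport(load_records("data/stadiums_canonical.json")))` returns an empty dict when every core sport matches. Any non-core sports in the file are ignored here, because only the expected keys are compared.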
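The completeness checks in 02-02-PLAN.md Task 2 and the checkpoint's spot checks can be run against every record rather than a sample. This sketch assumes the record keys listed under "Expected output" (`canonical_id`, `state`, `capacity`, `year_opened`, `latitude`, `longitude`). The bounds are heuristics taken from the plan, with latitude widened so Canadian venues pass. The state check accepts any 2-letter code, on the assumption that Canadian venues store a province code in `state`. The helper names are hypothetical.

```python
# Heuristic bounds from the plan's checks, with latitude widened to 54
# so Canadian venues (e.g. Edmonton, about 53.5 N) are not flagged.
LAT_RANGE = (24.0, 54.0)
LNG_RANGE = (-125.0, -66.0)
CAPACITY_RANGE = (15_000, 100_000)
YEAR_RANGE = (1900, 2025)


def _in_range(value, bounds):
    """True when value is a number within the inclusive bounds."""
    return isinstance(value, (int, float)) and bounds[0] <= value <= bounds[1]


def find_issues(record):
    """List the checks one canonical stadium record fails."""
    issues = []
    if len(record.get("state") or "") != 2:
        issues.append("state")
    if not _in_range(record.get("capacity"), CAPACITY_RANGE):
        issues.append("capacity")
    if not _in_range(record.get("year_opened"), YEAR_RANGE):
        issues.append("year_opened")
    if not (_in_range(record.get("latitude"), LAT_RANGE)
            and _in_range(record.get("longitude"), LNG_RANGE)):
        issues.append("coordinates")
    return issues


def summarize(records):
    """Map canonical_id (or name) to its failed checks, skipping clean records."""
    report = {}
    for r in records:
        problems = find_issues(r)
        if problems:
            report[r.get("canonical_id") or r.get("name", "?")] = problems
    return report
```

The capacity and year ranges come from the checkpoint's "realistic" and "reasonable" guidance. A flagged venue is a prompt for a human look rather than proof of bad data.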