docs(02): create stadium foundation phase plans
Phase 2: Stadium Foundation
- 2 plans created
- 5 total tasks defined
- Ready for execution

Plan 02-01: Audit & complete hardcoded stadium data
Plan 02-02: Regenerate canonical data and verify pipeline

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -37,12 +37,12 @@ Plans:
 ### Phase 2: Stadium Foundation
 **Goal**: Complete stadium database with correct coordinates, names, and venue data for all 4 sports
 **Depends on**: Phase 1
-**Research**: Likely (stadium data sources, geocoding verification)
-**Research topics**: Stadium data sources (Wikipedia, official league sites), geocoding API for coordinate verification, handling relocated/renamed venues
-**Plans**: TBD
+**Research**: No (hardcoded data exists in sport modules, internal pipeline work)
+**Plans**: 2 plans

 Plans:
-- [ ] 02-01: TBD
+- [ ] 02-01: Audit & complete hardcoded stadium data in sport modules
+- [ ] 02-02: Regenerate canonical data and verify pipeline

 ### Phase 3: Alias Systems
 **Goal**: Implement alias systems for both stadiums and teams to handle name variations across data sources
@@ -89,7 +89,7 @@ Phases execute in numeric order: 1 → 2 → 3 → 4 → 5 → 6
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
 | 1. Script Architecture | 3/3 | Complete | 2026-01-10 |
-| 2. Stadium Foundation | 0/TBD | Not started | - |
+| 2. Stadium Foundation | 0/2 | Planned | - |
 | 3. Alias Systems | 0/TBD | Not started | - |
 | 4. Canonical Linking | 0/TBD | Not started | - |
 | 5. CloudKit CRUD | 0/TBD | Not started | - |
153
.planning/phases/02-stadium-foundation/02-01-PLAN.md
Normal file
@@ -0,0 +1,153 @@
---
phase: 02-stadium-foundation
plan: 01
type: execute
---

<objective>
Audit and complete hardcoded stadium data across all 4 sport modules.

Purpose: Ensure all sport modules have complete, accurate stadium data that will flow through the canonicalization pipeline.
Output: All 4 sport modules with complete stadium data (city, state, lat/lng, capacity, year_opened, teams).
</objective>

<execution_context>
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-script-architecture/01-03-SUMMARY.md

**Key files:**
@Scripts/mlb.py
@Scripts/nba.py
@Scripts/nhl.py
@Scripts/nfl.py

**Current state:**
- MLB, NBA, NHL, NFL modules have hardcoded stadium data with city, state, lat/lng, capacity, teams
- Missing field: year_opened (null in all canonical data)
- NFL module created in Phase 1 Plan 03 with 30 hardcoded stadiums
- Bundled stadiums_canonical.json has incomplete data (state="", capacity=0, missing NFL)

**Expected stadium counts:**
- MLB: 30 stadiums (30 teams)
- NBA: 30 stadiums (30 teams)
- NHL: 32 stadiums (32 teams)
- NFL: 30 stadiums (32 teams, 2 shared: SoFi Stadium, MetLife Stadium)

**Stadium data structure:**
Each module has `scrape_{sport}_stadiums_hardcoded()` returning Stadium objects with:
- name, city, state, lat/lng, capacity, teams
- Missing: year_opened for filtering historical/renamed venues
</context>

<tasks>

<task type="auto">
<name>Task 1: Audit stadium data completeness across all 4 sport modules</name>
<files>Scripts/mlb.py, Scripts/nba.py, Scripts/nhl.py, Scripts/nfl.py</files>
<action>
1. Read each sport module's hardcoded stadium function
2. Create audit report listing for each sport:
   - Stadium count (should match expected)
   - Fields present/missing
   - Any stadiums with missing lat/lng (should be 0)
   - Any stadiums with missing capacity (should be 0)
3. Identify gaps: stadiums missing from lists, incorrect coordinates, missing teams

Do NOT modify any files in this task - audit only. The goal is to understand current state before making changes.
</action>
<verify>Print audit summary showing stadium counts per sport and any data quality issues found</verify>
<done>Audit report shows MLB:30, NBA:30, NHL:32, NFL:30 stadiums with all required fields documented</done>
</task>
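
A minimal sketch of what the Task 1 audit could compute, assuming each module returns objects that expose `name`, `latitude`, `longitude`, `capacity`, and `teams`. `StadiumRecord` and `audit_stadiums` are hypothetical names used only for illustration; the real `Stadium` class in `core.py` may name its attributes differently. The function only reads data, in keeping with the audit-only rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Expected counts taken from the plan's context section.
EXPECTED_COUNTS = {"MLB": 30, "NBA": 30, "NHL": 32, "NFL": 30}


@dataclass
class StadiumRecord:
    """Hypothetical stand-in for the real Stadium class (attribute names assumed)."""
    name: str
    city: str = ""
    state: str = ""
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    capacity: int = 0
    teams: List[str] = field(default_factory=list)


def audit_stadiums(sport, stadiums):
    """Return a read-only audit summary for one sport; nothing is modified."""
    expected = EXPECTED_COUNTS[sport]
    return {
        "sport": sport,
        "count": len(stadiums),
        "expected": expected,
        "count_ok": len(stadiums) == expected,
        # Names of stadiums lacking either coordinate.
        "missing_coords": [
            s.name for s in stadiums if s.latitude is None or s.longitude is None
        ],
        # Names of stadiums whose capacity is 0 or unset.
        "missing_capacity": [s.name for s in stadiums if not s.capacity],
        # Names of stadiums with no team attached.
        "missing_teams": [s.name for s in stadiums if not s.teams],
    }
```

Calling `audit_stadiums("MLB", stadiums)` once per sport and printing the resulting dictionaries would give the per-sport summary the `<verify>` step asks for.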

<task type="auto">
<name>Task 2: Add year_opened to all hardcoded stadiums</name>
<files>Scripts/mlb.py, Scripts/nba.py, Scripts/nhl.py, Scripts/nfl.py</files>
<action>
Add year_opened to each stadium's hardcoded data. Use the actual opening year for each venue:

**MLB stadiums (sample):**
- Fenway Park: 1912
- Wrigley Field: 1914
- Dodger Stadium: 1962
- Globe Life Field: 2020

**NBA arenas (sample):**
- TD Garden: 1995
- Madison Square Garden: 1968
- Chase Center: 2019
- Intuit Dome: 2024

**NHL arenas:** Many share with NBA - verify and match

**NFL stadiums (sample):**
- Lambeau Field: 1957
- SoFi Stadium: 2020
- Allegiant Stadium: 2020

For each module:
1. Update the hardcoded dict to include 'year_opened' key
2. Update Stadium object creation to include year_opened parameter
3. Ensure Stadium dataclass in core.py has year_opened field (verify first)

Research actual opening years from Wikipedia if unsure. Use the original opening year, not renovation years.
</action>
<verify>Run `python -c "from mlb import scrape_mlb_stadiums; s=scrape_mlb_stadiums(); print(f'MLB: {len(s)} stadiums, year_opened example: {s[0].year_opened if hasattr(s[0], \"year_opened\") else \"MISSING\"}')"` for each sport</verify>
<done>All 4 sport modules have year_opened in hardcoded data, Stadium objects include year_opened field</done>
</task>
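
The three per-module steps in Task 2 might look roughly like the sketch below. It assumes the hardcoded data is a dict keyed by stadium name; `StadiumSketch`, `HARDCODED_SAMPLE`, and `build_stadiums` are hypothetical names, and the real `Stadium` dataclass in `core.py` has more fields than are shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StadiumSketch:
    """Hypothetical, trimmed-down stand-in for the Stadium dataclass in core.py."""
    name: str
    city: str
    state: str
    # Optional with a None default so existing call sites keep working
    # until every module supplies the value.
    year_opened: Optional[int] = None


# Step 1: the hardcoded dict gains a 'year_opened' key (two sample entries only).
HARDCODED_SAMPLE = {
    "Fenway Park": {"city": "Boston", "state": "MA", "year_opened": 1912},
    "Wrigley Field": {"city": "Chicago", "state": "IL", "year_opened": 1914},
}


def build_stadiums(hardcoded):
    """Step 2: pass year_opened through when creating the objects."""
    return [
        StadiumSketch(
            name=name,
            city=info["city"],
            state=info["state"],
            # .get() yields None for entries not yet updated, which the
            # later completeness check can then report.
            year_opened=info.get("year_opened"),
        )
        for name, info in hardcoded.items()
    ]
```

Defaulting the field to `None` is a design choice for the sketch: it lets the modules be updated one at a time without breaking imports, at the cost of needing a separate check that no `None` values remain.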

</tasks>

<verification>
Before declaring plan complete:
- [ ] Audit confirms expected stadium counts: MLB:30, NBA:30, NHL:32, NFL:30
- [ ] All 4 modules have year_opened in hardcoded stadium data
- [ ] No Python syntax errors in any module
- [ ] Stadium dataclass supports year_opened field
</verification>

<success_criteria>

- Task 1: Audit complete with documented counts and any gaps identified
- Task 2: year_opened added to all hardcoded stadiums in all 4 modules
- No import errors when loading modules
- Ready for Plan 02 (pipeline regeneration)
</success_criteria>

<output>
After completion, create `.planning/phases/02-stadium-foundation/02-01-SUMMARY.md`:

# Phase 2 Plan 01: Stadium Data Audit & Completion Summary

**[Substantive one-liner]**

## Accomplishments

- [Stadium counts verified]
- [year_opened added to all modules]

## Files Created/Modified

- `Scripts/mlb.py` - Added year_opened
- `Scripts/nba.py` - Added year_opened
- `Scripts/nhl.py` - Added year_opened
- `Scripts/nfl.py` - Added year_opened

## Decisions Made

[Any gaps found and how resolved]

## Issues Encountered

[Any data issues discovered]

## Next Step

Ready for 02-02-PLAN.md (pipeline regeneration)
</output>
161
.planning/phases/02-stadium-foundation/02-02-PLAN.md
Normal file
@@ -0,0 +1,161 @@
---
phase: 02-stadium-foundation
plan: 02
type: execute
---

<objective>
Regenerate canonical stadium data and verify the complete pipeline flow.

Purpose: Ensure hardcoded stadium data flows correctly through canonicalization to bundled JSON.
Output: Updated bundled stadiums_canonical.json with complete data for all 4 sports.
</objective>

<execution_context>
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
~/.claude/get-shit-done/references/checkpoints.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-stadium-foundation/02-01-SUMMARY.md

**Key files:**
@Scripts/scrape_schedules.py
@Scripts/run_canonicalization_pipeline.py
@Scripts/canonicalize_stadiums.py

**Pipeline flow:**
1. `scrape_schedules.py --stadiums-update` calls `scrape_all_stadiums()` → `data/stadiums.json`
2. `run_canonicalization_pipeline.py` reads stadiums.json → canonicalizes → `data/stadiums_canonical.json`
3. Copy to `SportsTime/Resources/stadiums_canonical.json`
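
The three-step flow above could be driven from one small script, sketched here under stated assumptions. The two commands and their flags are the ones named in this plan; `run_pipeline`, its parameters, and the directory arguments are hypothetical, and the location of `data/` relative to `Scripts/` has to be supplied by the caller because the plan does not pin it down.

```python
import shutil
import subprocess
import sys

# Commands as named in this plan, run with the current interpreter.
PIPELINE_COMMANDS = [
    [sys.executable, "scrape_schedules.py", "--stadiums-update"],
    [sys.executable, "run_canonicalization_pipeline.py", "--verbose"],
]


def run_pipeline(scripts_dir, canonical_src, bundle_dst,
                 run=subprocess.run, copy=shutil.copyfile):
    """Run steps 1 and 2 in order, then copy the canonical JSON (step 3).

    `run` and `copy` are injectable so the sequence can be dry-run or tested
    without touching the real scripts or files.
    """
    for argv in PIPELINE_COMMANDS:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit, so a failed scrape never reaches the copy step.
        run(argv, cwd=scripts_dir, check=True)
    copy(canonical_src, bundle_dst)
```

Stopping at the first failing step matters here: copying a stale `stadiums_canonical.json` into the app bundle after a failed scrape would hide the failure.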

**Expected output:**
- stadiums_canonical.json with all fields populated: canonical_id, name, city, state, latitude, longitude, capacity, sport, primary_team_abbrevs, year_opened
- stadium_aliases.json with historical name mappings
- Stadium counts: MLB:30, NBA:30, NHL:32, NFL:30 = 122 core stadiums

**Pre-requisites:**
- Plan 02-01 complete (year_opened added to all modules)
</context>

<tasks>

<task type="auto">
<name>Task 1: Run stadium scraping and canonicalization pipeline</name>
<files>data/stadiums.json, data/stadiums_canonical.json, data/stadium_aliases.json</files>
<action>
1. Navigate to Scripts directory
2. Run: `python scrape_schedules.py --stadiums-update`
   - This calls scrape_all_stadiums() which invokes each sport module's scraper
   - Output: data/stadiums.json with raw stadium data
3. Run: `python run_canonicalization_pipeline.py --verbose`
   - Or run canonicalize_stadiums.py directly if pipeline is complex
4. Verify output files exist in data/ directory

If errors occur, debug and fix before proceeding. Common issues:
- Import errors: Check module paths and __init__.py
- Missing fields: Verify Stadium dataclass in core.py
</action>
<verify>ls -la data/stadiums*.json && cat data/stadiums_canonical.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(f'Total: {len(d)}'); sports={}; [sports.__setitem__(s['sport'], sports.get(s['sport'],0)+1) for s in d]; print(sports)"</verify>
<done>stadiums_canonical.json exists with MLB:30, NBA:30, NHL:32, NFL:30 stadiums</done>
</task>
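
The per-sport tally in the `<verify>` one-liner can also be written as a small function, which is easier to reuse and to compare against the expected counts. This is a sketch, not part of the pipeline: `count_mismatches` is a hypothetical helper, and it assumes only that each record carries a `sport` key, as the one-liner does.

```python
from collections import Counter

# Expected counts from this plan's context section (122 in total).
EXPECTED_COUNTS = {"MLB": 30, "NBA": 30, "NHL": 32, "NFL": 30}


def count_mismatches(records, expected=EXPECTED_COUNTS):
    """Return {sport: (actual, expected)} for every sport whose count is off."""
    counts = Counter(rec["sport"] for rec in records)
    return {
        sport: (counts.get(sport, 0), want)
        for sport, want in expected.items()
        if counts.get(sport, 0) != want
    }
```

An empty result means all four core sports match. Sports outside `expected` are ignored by this helper, so the presence of non-core sports still needs the separate human check.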

<task type="auto">
<name>Task 2: Copy canonical data to bundled resources and verify completeness</name>
<files>SportsTime/Resources/stadiums_canonical.json, SportsTime/Resources/stadium_aliases.json</files>
<action>
1. Copy generated canonical files to app bundle:
   - cp data/stadiums_canonical.json SportsTime/Resources/stadiums_canonical.json
   - cp data/stadium_aliases.json SportsTime/Resources/stadium_aliases.json

2. Verify data completeness by checking sample records:
   - All state fields populated (not empty string)
   - All capacity fields > 0
   - All year_opened fields not null
   - All lat/lng reasonable (North American coordinates: lat 24-54, lng -125 to -66; the latitude bound must extend past 49 because several NHL venues are in Canada, e.g. Edmonton at roughly 53.5° N)

3. If any fields empty, trace back to source:
   - Check raw stadiums.json has the field
   - Check canonicalize_stadiums.py preserves the field
   - Fix the break in the chain
</action>
<verify>cat SportsTime/Resources/stadiums_canonical.json | python3 -c "import json,sys; d=json.load(sys.stdin); empty_state=sum(1 for s in d if not s.get('state')); zero_cap=sum(1 for s in d if not s.get('capacity')); null_year=sum(1 for s in d if s.get('year_opened') is None); print(f'Empty state: {empty_state}, Zero capacity: {zero_cap}, Null year: {null_year}')"</verify>
<done>Bundled JSON has 0 empty states, 0 zero capacities, 0 null year_opened values</done>
</task>
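
The `<verify>` command above counts empty states, zero capacities, and null years, but it does not name the offending stadiums or check coordinates. A sketch of a fuller check follows, assuming the field names listed under "Expected output" (`state`, `capacity`, `year_opened`, `latitude`, `longitude`). `find_incomplete` is a hypothetical helper, and its default latitude range reaches 54 rather than 49 on the assumption that Canadian venues must pass.

```python
def find_incomplete(records, lat_range=(24.0, 54.0), lng_range=(-125.0, -66.0)):
    """Map stadium name -> list of failed checks; an empty dict means all pass."""
    problems = {}
    for rec in records:
        failed = []
        if not rec.get("state"):            # empty string or missing
            failed.append("state")
        if not rec.get("capacity"):         # 0 or missing
            failed.append("capacity")
        if rec.get("year_opened") is None:  # null or missing
            failed.append("year_opened")
        lat, lng = rec.get("latitude"), rec.get("longitude")
        if lat is None or not lat_range[0] <= lat <= lat_range[1]:
            failed.append("latitude")
        if lng is None or not lng_range[0] <= lng <= lng_range[1]:
            failed.append("longitude")
        if failed:
            problems[rec.get("name", "<unnamed>")] = failed
    return problems
```

Returning names alongside the failed checks makes step 3 of the action (tracing an empty field back to its source) faster, since it shows at once whether a gap is confined to one sport module.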

<task type="checkpoint:human-verify" gate="blocking">
<what-built>Regenerated and updated bundled stadium data for all 4 core sports</what-built>
<how-to-verify>
1. Open `SportsTime/Resources/stadiums_canonical.json`
2. Verify stadium counts by sport:
   - MLB: 30 stadiums
   - NBA: 30 stadiums
   - NHL: 32 stadiums
   - NFL: 30 stadiums (2 shared: SoFi, MetLife)
3. Spot check data quality:
   - Pick any stadium, verify state is 2-letter code (e.g., "CA", "NY")
   - Pick any stadium, verify capacity is realistic (15000-100000)
   - Pick any stadium, verify year_opened is reasonable (1900-2025)
4. Verify that no non-core sports are included (MLS, WNBA, NWSL, and CBB should not appear in the bundled JSON unless their presence is intentional)
</how-to-verify>
<resume-signal>Type "approved" if data looks correct, or describe issues to fix</resume-signal>
</task>

</tasks>

<verification>
Before declaring plan complete:
- [ ] Pipeline runs without errors
- [ ] data/stadiums_canonical.json has 122 core sport stadiums
- [ ] Bundled JSON updated with complete data
- [ ] Human verified data quality
</verification>

<success_criteria>

- Pipeline completes successfully
- All stadium fields populated (no empty state, zero capacity, or null year_opened)
- Bundled JSON has correct stadium counts for MLB, NBA, NHL, NFL
- Phase 2 complete: Stadium Foundation established
</success_criteria>

<output>
After completion, create `.planning/phases/02-stadium-foundation/02-02-SUMMARY.md`:

# Phase 2 Plan 02: Pipeline Regeneration & Verification Summary

**[Substantive one-liner]**

## Accomplishments

- [Pipeline executed successfully]
- [Bundled JSON updated with complete data]

## Files Created/Modified

- `data/stadiums.json` - Raw stadium data
- `data/stadiums_canonical.json` - Canonical output
- `data/stadium_aliases.json` - Historical aliases
- `SportsTime/Resources/stadiums_canonical.json` - Bundled canonical data
- `SportsTime/Resources/stadium_aliases.json` - Bundled aliases

## Decisions Made

[Any decisions about included sports, data sources]

## Issues Encountered

[Any pipeline issues and fixes]

## Phase 2 Complete

Phase 2: Stadium Foundation is complete:
- All 4 core sports have complete stadium data
- Data includes: canonical_id, name, city, state, lat/lng, capacity, year_opened, teams
- Historical aliases in place for renamed stadiums
- Ready for Phase 3: Alias Systems
</output>