---
phase: 2.1-additional-sports-stadiums
plan: 01
type: execute
---

Create MLS sport module with complete hardcoded stadium data.

Purpose: Enable MLS stadium data to flow through the canonicalization pipeline like the core 4 sports.
Output: mls.py module with 30 stadiums including capacity, year_opened, and coordinates.

~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

# Prior phase context:
@.planning/phases/02-stadium-foundation/02-02-SUMMARY.md

# Pattern reference (follow this module structure):
@Scripts/mlb.py
@Scripts/nba.py

# Current MLS data location:
@Scripts/scrape_schedules.py (MLS_TEAMS dict at line 93)
@Scripts/data/stadiums.json (MLS entries have lat/lng but missing capacity/year_opened)

# Core module for imports:
@Scripts/core.py

**Tech stack available:** Python 3, dataclasses, requests
**Established patterns:** Sport module structure (team dict, get_abbrev function, hardcoded stadiums, scraper sources)
**Constraining decisions:**
- Phase 02-02: MLS excluded from bundled JSON due to incomplete data (zero capacity, null year_opened)

Task 1: Create mls.py module with complete stadium data
Scripts/mls.py

Create mls.py following the mlb.py/nba.py pattern:

1. Module docstring and imports (try/except for core imports)
2. __all__ exports list
3. MLS_TEAMS dict (copy from scrape_schedules.py, 30 teams)
4. get_mls_team_abbrev() function
5.
Hardcoded MLS stadiums dict with COMPLETE data:
   - All 30 MLS stadiums
   - Each entry needs: city, state, lat, lng, capacity, teams (list of abbrevs), year_opened
   - Use existing lat/lng from Scripts/data/stadiums.json where available
   - Research capacity and year_opened for each stadium

   Key stadiums to research (capacity/year_opened):
   - Mercedes-Benz Stadium (ATL) - shared with NFL
   - Q2 Stadium (Austin) - MLS-specific, opened 2021
   - Bank of America Stadium (CLT) - shared with NFL
   - Soldier Field (CHI) - shared with NFL
   - TQL Stadium (CIN) - MLS-specific, opened 2021
   - Dick's Sporting Goods Park (COL)
   - Lower.com Field (CLB) - opened 2021
   - Toyota Stadium (DAL)
   - Audi Field (DC) - MLS-specific, opened 2018
   - Shell Energy Stadium (HOU) - MLS-specific
   - Dignity Health Sports Park (LAG)
   - BMO Stadium (LAFC) - opened 2018
   - Chase Stadium (MIA) - MLS-specific
   - Allianz Field (MIN) - opened 2019
   - Stade Saputo (MTL)
   - Geodis Park (NSH) - opened 2022
   - Gillette Stadium (NE) - shared with NFL
   - Yankee Stadium (NYCFC) - shared with MLB
   - Red Bull Arena (NYRB)
   - Inter&Co Stadium (ORL)
   - Subaru Park (PHI)
   - Providence Park (POR)
   - America First Field (RSL)
   - PayPal Park (SJ)
   - Lumen Field (SEA) - shared with NFL
   - Children's Mercy Park (SKC)
   - CityPark (STL) - opened 2023
   - BMO Field (TOR)
   - BC Place (VAN) - shared stadium
   - Snapdragon Stadium (SD) - shared, opened 2022
6. scrape_mls_stadiums_hardcoded() function returning list[Stadium]
7. scrape_mls_stadiums() function with fallback sources
8. MLS_STADIUM_SOURCES configuration

Note: Some stadiums are shared with NFL/MLB. Where the soccer configuration differs from the full venue, use the MLS-specific capacity.
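As a rough illustration of the module shape described in steps 3 to 6, the sketch below shows two entries only. Several things here are assumptions rather than facts about the repository: the `Stadium` dataclass is a stand-in for whatever `Scripts/core.py` actually defines, the `MLS_TEAMS` and `MLS_STADIUMS` dict layouts are guesses at the mlb.py/nba.py pattern, and the capacity and coordinate values are approximate and must be re-checked during the research step. Only the 2018 opening years come from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stadium:
    """Hypothetical stand-in for core.Stadium; real field names may differ."""
    name: str
    city: str
    state: str
    latitude: float
    longitude: float
    capacity: int
    sport: str
    team_abbrevs: list[str] = field(default_factory=list)
    year_opened: Optional[int] = None


# Assumed layout: abbreviation -> team info (the real dict lives in scrape_schedules.py).
MLS_TEAMS = {
    "DC": {"name": "D.C. United", "city": "Washington"},
    "LAFC": {"name": "Los Angeles FC", "city": "Los Angeles"},
}

# Approximate values for illustration only; verify before committing.
MLS_STADIUMS = {
    "Audi Field": {
        "city": "Washington", "state": "DC",
        "lat": 38.87, "lng": -77.01,
        "capacity": 20000, "teams": ["DC"], "year_opened": 2018,
    },
    "BMO Stadium": {
        "city": "Los Angeles", "state": "CA",
        "lat": 34.01, "lng": -118.28,
        "capacity": 22000, "teams": ["LAFC"], "year_opened": 2018,
    },
}


def get_mls_team_abbrev(team_name: str) -> str:
    """Return the abbreviation for a full team name, else a 3-letter fallback."""
    wanted = team_name.strip().lower()
    for abbrev, info in MLS_TEAMS.items():
        if info["name"].lower() == wanted:
            return abbrev
    return team_name.strip()[:3].upper()


def scrape_mls_stadiums_hardcoded() -> list[Stadium]:
    """Convert the hardcoded dict into Stadium objects."""
    return [
        Stadium(
            name=name,
            city=info["city"],
            state=info["state"],
            latitude=info["lat"],
            longitude=info["lng"],
            capacity=info["capacity"],
            sport="MLS",
            team_abbrevs=list(info["teams"]),
            year_opened=info["year_opened"],
        )
        for name, info in MLS_STADIUMS.items()
    ]
```

Keeping the raw dict separate from the `Stadium` conversion means the data can be reviewed and corrected without touching the construction logic.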
python3 -c "from Scripts.mls import MLS_TEAMS, scrape_mls_stadiums_hardcoded; s = scrape_mls_stadiums_hardcoded(); print(f'{len(s)} stadiums'); assert len(s) == 30; assert all(st.capacity > 0 for st in s); assert all(st.year_opened for st in s)"

mls.py exists with 30 teams, 30 stadiums, all with non-zero capacity and year_opened values

Task 2: Integrate MLS module with scrape_schedules.py
Scripts/scrape_schedules.py

Update scrape_schedules.py to use the new mls.py module:

1. Add import at top (with try/except pattern):
   - from mls import MLS_TEAMS, get_mls_team_abbrev, scrape_mls_stadiums, MLS_STADIUM_SOURCES
2. Remove inline MLS_TEAMS dict (lines ~93-124) - now imported from mls.py
3. Update get_team_abbrev() function to use get_mls_team_abbrev() for MLS
4. Update scrape_mls_stadiums_gavinr() to be a secondary source (keep it, but mls.py hardcoded is primary)
5. Update the stadium scraping section to use scrape_mls_stadiums() from mls.py
6. Verify MLS games scraping still works (uses MLS_TEAMS for abbreviation lookup)

Do NOT remove the game scraping functions (scrape_mls_fbref, etc.) - those stay inline for now.
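The primary/secondary ordering in step 4 can be sketched as an ordered list of sources that is tried until one returns data. This is only one possible shape: the tuple layout of `MLS_STADIUM_SOURCES` is an assumption (the real structure should mirror mlb.py/nba.py), both scraper bodies below are stubs, and plain dicts stand in for `Stadium` objects.

```python
from typing import Callable


def scrape_mls_stadiums_hardcoded() -> list[dict]:
    """Stub for the hardcoded primary source (one illustrative entry)."""
    return [{"name": "Audi Field", "year_opened": 2018}]


def scrape_mls_stadiums_gavinr() -> list[dict]:
    """Stub for the secondary source; simulates a network failure."""
    raise ConnectionError("network unavailable in this sketch")


# Assumed layout: (label, callable) pairs in priority order.
MLS_STADIUM_SOURCES: list[tuple[str, Callable[[], list[dict]]]] = [
    ("hardcoded", scrape_mls_stadiums_hardcoded),
    ("gavinr", scrape_mls_stadiums_gavinr),
]


def scrape_mls_stadiums(sources=MLS_STADIUM_SOURCES) -> list[dict]:
    """Return the first non-empty result, skipping sources that fail."""
    for label, fetch in sources:
        try:
            stadiums = fetch()
        except Exception as exc:  # a broken source should not abort the run
            print(f"MLS stadium source {label!r} failed: {exc}")
            continue
        if stadiums:
            return stadiums
    return []
```

With the hardcoded source first, the network-backed scraper is consulted only if the hardcoded list comes back empty, which keeps the pipeline deterministic offline.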
cd Scripts && python3 -c "from scrape_schedules import MLS_TEAMS, get_team_abbrev; print(f'MLS teams: {len(MLS_TEAMS)}'); abbrev = get_team_abbrev('Atlanta United FC', 'MLS'); print(f'ATL United abbrev: {abbrev}'); assert abbrev == 'ATL'"

scrape_schedules.py imports MLS_TEAMS from mls.py, get_team_abbrev works for MLS, inline MLS_TEAMS removed

Before declaring plan complete:
- [ ] mls.py exists with complete module structure
- [ ] All 30 MLS stadiums have capacity > 0 and year_opened values
- [ ] scrape_schedules.py imports from mls.py successfully
- [ ] `python3 Scripts/scrape_schedules.py --stadiums-update` includes MLS stadiums with complete data

- mls.py module created following established pattern
- 30 MLS stadiums with complete data (capacity, year_opened, coordinates)
- scrape_schedules.py integration works
- No import errors when running pipeline

After completion, create `.planning/phases/2.1-add-stadium-data-mls-wnba-nwsl-cbb/02.1-01-SUMMARY.md`