---
phase: 01-script-architecture
plan: 01
type: execute
---

Create shared core module and extract MLB scrapers as the first sport module.

Purpose: Establish the modular pattern that subsequent sports will follow.

Output: `Scripts/core.py` with shared utilities, `Scripts/mlb.py` with MLB scrapers.

@~/.claude/get-shit-done/workflows/execute-phase.md
@~/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

**Source file:**
@Scripts/scrape_schedules.py

**Codebase context:**
@.planning/codebase/CONVENTIONS.md

**Tech stack:** Python 3, requests, beautifulsoup4, pandas, lxml
**Established patterns:** dataclasses, type hints, docstrings

Task 1: Create core.py shared module

Files: `Scripts/core.py`

Create `Scripts/core.py` containing:

1. Imports: argparse, json, time, re, datetime, timedelta, pathlib, dataclasses, typing, requests, BeautifulSoup, pandas

2. Rate limiting utilities:
   - `REQUEST_DELAY` constant (3.0)
   - `last_request_time` dict
   - `rate_limit(domain: str)` function
   - `fetch_page(url: str, domain: str) -> Optional[BeautifulSoup]` function

3. Data classes:
   - `@dataclass Game` with all fields (id, sport, season, date, time, home_team, away_team, etc.)
   - `@dataclass Stadium` with all fields (id, name, city, state, latitude, longitude, etc.)

4. Multi-source fallback system:
   - `@dataclass ScraperSource`
   - `scrape_with_fallback(sport, season, sources, verbose)` function
   - `@dataclass StadiumScraperSource`
   - `scrape_stadiums_with_fallback(sport, sources, verbose)` function

5. ID generation:
   - `assign_stable_ids(games, sport, season)` function

6. Export utilities:
   - `export_to_json(games, stadiums, output_dir)` function
   - `cross_validate_sources(games_by_source)` function

Keep exact function signatures and logic from scrape_schedules.py.
Use `__all__` to explicitly export public API.
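As a rough illustration of the rate limiting utilities in item 2, the sketch below shows one way a per-domain delay could work. It is not the logic from `scrape_schedules.py`, which should be copied as-is; the use of `time.monotonic()` and the `-> None` return type are assumptions made for the example.

```python
import time

REQUEST_DELAY = 3.0  # seconds between requests to one domain (value from the plan)
last_request_time: dict[str, float] = {}  # domain -> time the last request was allowed


def rate_limit(domain: str) -> None:
    """Block until REQUEST_DELAY seconds have passed since the last call for `domain`.

    Assumption: timestamps come from time.monotonic(); the original may differ.
    """
    last = last_request_time.get(domain)
    if last is not None:
        remaining = REQUEST_DELAY - (time.monotonic() - last)
        if remaining > 0:
            time.sleep(remaining)
    # Record the moment this request is allowed to proceed.
    last_request_time[domain] = time.monotonic()
```

Under this reading, `fetch_page(url, domain)` would call `rate_limit(domain)` before issuing the HTTP request, so each domain is throttled independently of the others.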
Verify: `python3 -c "from Scripts.core import Game, Stadium, ScraperSource, rate_limit, fetch_page, scrape_with_fallback, assign_stable_ids, export_to_json; print('OK')"`

Done: core.py exists, imports successfully, exports all shared utilities

Task 2: Create mlb.py sport module

Files: `Scripts/mlb.py`

Create `Scripts/mlb.py` containing:

1. Import from core:
   ```python
   from core import (
       Game, Stadium, ScraperSource, StadiumScraperSource,
       fetch_page, scrape_with_fallback, scrape_stadiums_with_fallback,
   )
   ```

2. MLB game scrapers (copy exact logic):
   - `scrape_mlb_baseball_reference(season: int) -> list[Game]`
   - `scrape_mlb_statsapi(season: int) -> list[Game]`
   - `scrape_mlb_espn(season: int) -> list[Game]`

3. MLB stadium scrapers:
   - `scrape_mlb_stadiums_scorebot() -> list[Stadium]`
   - `scrape_mlb_stadiums_geojson() -> list[Stadium]`
   - `scrape_mlb_stadiums_hardcoded() -> list[Stadium]`
   - `scrape_mlb_stadiums() -> list[Stadium]` (combines above with fallback)

4. Source configurations:
   - `MLB_GAME_SOURCES` list of ScraperSource
   - `MLB_STADIUM_SOURCES` list of StadiumScraperSource

5. Convenience function:
   - `scrape_mlb_games(season: int) -> list[Game]` - uses fallback system

Use `__all__` to export public API.
Keep all team abbreviation mappings, venue name normalizations, and parsing logic intact.
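To make the relationship between the source configurations and the fallback system concrete, here is a minimal sketch that runs without network access. The `ScraperSource` field names (`name`, `scraper_func`, `priority`, `min_games`), the trimmed `Game` fields, the stub scrapers, and `EXAMPLE_GAME_SOURCES` are all hypothetical; the real definitions come from `scrape_schedules.py` and may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Game:
    # Illustrative subset of the fields the plan lists for Game.
    id: str
    sport: str
    season: str
    date: str
    home_team: str
    away_team: str


@dataclass
class ScraperSource:
    # Hypothetical fields; the real definition lives in scrape_schedules.py.
    name: str
    scraper_func: Callable[[int], list[Game]]
    priority: int = 1
    min_games: int = 1


def scrape_with_fallback(sport: str, season: int,
                         sources: list[ScraperSource],
                         verbose: bool = True) -> list[Game]:
    """Try sources in priority order; return the first result that looks usable."""
    for source in sorted(sources, key=lambda s: s.priority):
        try:
            games = source.scraper_func(season)
        except Exception as exc:  # any scraper failure falls through to the next source
            if verbose:
                print(f"[{sport}] {source.name} failed: {exc}")
            continue
        if len(games) >= source.min_games:
            return games
        if verbose:
            print(f"[{sport}] {source.name} returned too few games ({len(games)})")
    return []


# Stand-in scrapers with placeholder data so the sketch is runnable offline.
def _primary_stub(season: int) -> list[Game]:
    raise RuntimeError("simulated outage")


def _secondary_stub(season: int) -> list[Game]:
    return [Game("placeholder_id", "MLB", str(season), "2000-01-01", "HOME", "AWAY")]


EXAMPLE_GAME_SOURCES = [
    ScraperSource("primary", _primary_stub, priority=1),
    ScraperSource("secondary", _secondary_stub, priority=2),
]

# The primary stub fails, so the result comes from the secondary stub.
games = scrape_with_fallback("MLB", 2000, EXAMPLE_GAME_SOURCES, verbose=False)
```

In `mlb.py`, `MLB_GAME_SOURCES` would presumably list the three real game scrapers in the same way, with `scrape_mlb_games(season)` passing that list to `scrape_with_fallback`.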
Verify: `cd Scripts && python3 -c "from mlb import scrape_mlb_games, scrape_mlb_stadiums, MLB_GAME_SOURCES; print('OK')"` (run from `Scripts/` because `mlb.py` uses the bare `from core import ...`, which does not resolve when the module is imported as `Scripts.mlb` from the project root)

Done: mlb.py exists, imports from core.py, exports MLB scrapers and source configs

Before declaring plan complete:
- [ ] `Scripts/core.py` exists and imports cleanly
- [ ] `Scripts/mlb.py` exists and imports from core
- [ ] No syntax errors: `python3 -m py_compile Scripts/core.py Scripts/mlb.py`
- [ ] Type hints present on all public functions

Success criteria:
- core.py contains all shared utilities extracted from scrape_schedules.py
- mlb.py contains all MLB-specific scrapers
- Both files import without errors
- Original scrape_schedules.py unchanged (we're creating new files first)

After completion, create `.planning/phases/01-script-architecture/01-01-SUMMARY.md`
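The checklist item "Type hints present on all public functions" has no command attached. One possible way to check it mechanically is to walk each file's syntax tree with the standard `ast` module. `functions_missing_hints` below is a hypothetical helper, not part of the plan, and it takes a strict reading: every parameter and the return value of each top-level function whose name does not start with an underscore must be annotated.

```python
import ast


def functions_missing_hints(source: str) -> list[str]:
    """Return names of public top-level functions that lack a full set of annotations."""
    missing = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue  # private helpers are outside the checklist item
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        if node.args.vararg is not None:
            params.append(node.args.vararg)
        if node.args.kwarg is not None:
            params.append(node.args.kwarg)
        if node.returns is None or any(p.annotation is None for p in params):
            missing.append(node.name)
    return missing


sample = (
    "def ok(x: int) -> int:\n    return x\n"
    "def bad(x):\n    return x\n"
    "def _private(x):\n    return x\n"
)
print(functions_missing_hints(sample))  # → ['bad']
```

To apply it to the new modules, pass in the file contents, for example `functions_missing_hints(Path("Scripts/core.py").read_text())` with `Path` imported from `pathlib`. Under this strict reading, a function declared as `rate_limit(domain: str)` with no return annotation would be reported, so either add `-> None` or relax the check to parameters only.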