| phase | plan | type |
|---|---|---|
| 01-script-architecture | 01 | execute |
Purpose: Establish the modular pattern that subsequent sports will follow.
Output: Scripts/core.py with shared utilities, Scripts/mlb.py with MLB scrapers.
<execution_context>
@/.claude/get-shit-done/workflows/execute-phase.md
@/.claude/get-shit-done/templates/summary.md
</execution_context>
Source file: @Scripts/scrape_schedules.py
Codebase context: @.planning/codebase/CONVENTIONS.md
Tech stack: Python 3, requests, beautifulsoup4, pandas, lxml
Established patterns: dataclasses, type hints, docstrings
Task 1: Create core.py shared module
Scripts/core.py
Create `Scripts/core.py` containing:
- Imports: argparse, json, time, re, datetime, timedelta, pathlib, dataclasses, typing, requests, BeautifulSoup, pandas
- Rate limiting utilities:
  - `REQUEST_DELAY` constant (3.0)
  - `last_request_time` dict
  - `rate_limit(domain: str)` function
  - `fetch_page(url: str, domain: str) -> Optional[BeautifulSoup]` function
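A minimal sketch of these utilities, assuming a per-domain throttle on a monotonic clock; the timeout, User-Agent string, and lxml parser choice are illustrative, not taken from scrape_schedules.py:

```python
import time
from typing import Optional

import requests
from bs4 import BeautifulSoup

REQUEST_DELAY = 3.0
last_request_time: dict[str, float] = {}


def rate_limit(domain: str) -> None:
    # Sleep until at least REQUEST_DELAY seconds have passed since the
    # previous request to this domain.
    elapsed = time.monotonic() - last_request_time.get(domain, 0.0)
    if elapsed < REQUEST_DELAY:
        time.sleep(REQUEST_DELAY - elapsed)
    last_request_time[domain] = time.monotonic()


def fetch_page(url: str, domain: str) -> Optional[BeautifulSoup]:
    # Fetch a URL politely; return parsed soup, or None on any request failure.
    rate_limit(domain)
    try:
        resp = requests.get(
            url, timeout=30, headers={"User-Agent": "schedule-scraper/1.0"}
        )
        resp.raise_for_status()
    except requests.RequestException:
        return None
    return BeautifulSoup(resp.text, "lxml")
```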
- Data classes:
  - `@dataclass Game` with all fields (id, sport, season, date, time, home_team, away_team, etc.)
  - `@dataclass Stadium` with all fields (id, name, city, state, latitude, longitude, etc.)
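A sketch of the two dataclasses showing only the fields named above; the remaining fields must be copied verbatim from scrape_schedules.py:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Game:
    id: str
    sport: str
    season: int
    date: str                # ISO date string, e.g. "2025-04-01"
    time: Optional[str]      # local start time, if known
    home_team: str
    away_team: str
    # ...remaining fields (venue, source, etc.) per scrape_schedules.py


@dataclass
class Stadium:
    id: str
    name: str
    city: str
    state: str
    latitude: float
    longitude: float
    # ...remaining fields per scrape_schedules.py
```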
- Multi-source fallback system:
  - `@dataclass ScraperSource`
  - `scrape_with_fallback(sport, season, sources, verbose)` function
  - `@dataclass StadiumScraperSource`
  - `scrape_stadiums_with_fallback(sport, sources, verbose)` function
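One plausible shape for the game-side fallback, assuming each source wraps a scraper callable and the first non-empty result wins; the `ScraperSource` fields here are illustrative, not confirmed by the source file:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScraperSource:
    name: str
    scrape: Callable[[int], list]  # given a season, return a list of games


def scrape_with_fallback(sport: str, season: int,
                         sources: list[ScraperSource],
                         verbose: bool = False) -> list:
    # Try each source in priority order; skip failures and empty results,
    # and return the first non-empty list of games.
    for source in sources:
        try:
            games = source.scrape(season)
        except Exception as exc:
            if verbose:
                print(f"{sport} {season}: {source.name} failed: {exc}")
            continue
        if games:
            return games
    return []
```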
- ID generation:
  - `assign_stable_ids(games, sport, season)` function
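The stable-id scheme might hash the fields that uniquely identify a game, for example as below; the key format and hash truncation are assumptions, and plain dicts stand in for the Game dataclass:

```python
import hashlib


def assign_stable_ids(games: list[dict], sport: str, season: int) -> list[dict]:
    # Derive a deterministic id from identifying fields so that
    # re-running the scraper yields identical ids for the same games.
    for game in games:
        key = f"{sport}:{season}:{game['date']}:{game['away_team']}@{game['home_team']}"
        game["id"] = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return games
```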
- Export utilities:
  - `export_to_json(games, stadiums, output_dir)` function
  - `cross_validate_sources(games_by_source)` function
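The export helper could be as simple as the sketch below; the output file names games.json and stadiums.json are assumptions, not taken from scrape_schedules.py:

```python
import json
from pathlib import Path


def export_to_json(games: list[dict], stadiums: list[dict], output_dir: str) -> None:
    # Write one JSON file per dataset into output_dir, creating it if needed.
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "games.json").write_text(json.dumps(games, indent=2), encoding="utf-8")
    (out / "stadiums.json").write_text(json.dumps(stadiums, indent=2), encoding="utf-8")
```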
Keep exact function signatures and logic from `scrape_schedules.py`. Use `__all__` to explicitly export the public API.
```bash
python3 -c "from Scripts.core import Game, Stadium, ScraperSource, rate_limit, fetch_page, scrape_with_fallback, assign_stable_ids, export_to_json; print('OK')"
```
core.py exists, imports successfully, exports all shared utilities
Task 2: Create mlb.py MLB module
Scripts/mlb.py
Create `Scripts/mlb.py` containing:
- Import from core:
  `from core import Game, Stadium, ScraperSource, StadiumScraperSource, fetch_page, scrape_with_fallback, scrape_stadiums_with_fallback`
- MLB game scrapers (copy exact logic):
  - `scrape_mlb_baseball_reference(season: int) -> list[Game]`
  - `scrape_mlb_statsapi(season: int) -> list[Game]`
  - `scrape_mlb_espn(season: int) -> list[Game]`
- MLB stadium scrapers:
  - `scrape_mlb_stadiums_scorebot() -> list[Stadium]`
  - `scrape_mlb_stadiums_geojson() -> list[Stadium]`
  - `scrape_mlb_stadiums_hardcoded() -> list[Stadium]`
  - `scrape_mlb_stadiums() -> list[Stadium]` (combines the above with fallback)
- Source configurations:
  - `MLB_GAME_SOURCES` list of `ScraperSource`
  - `MLB_STADIUM_SOURCES` list of `StadiumScraperSource`
- Convenience function:
  - `scrape_mlb_games(season: int) -> list[Game]` - uses the fallback system
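Taken together, the source configuration and convenience wrapper might be wired as in this self-contained sketch: stub scrapers replace the real ones, a local `ScraperSource` stands in for the one imported from core, and the inline loop stands in for `core.scrape_with_fallback`:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScraperSource:          # stand-in for core.ScraperSource
    name: str
    scrape: Callable[[int], list]


def scrape_mlb_statsapi(season: int) -> list:
    # Stub: the real scraper fetches and parses the MLB Stats API.
    return [{"sport": "mlb", "season": season}]


def scrape_mlb_espn(season: int) -> list:
    # Stub: pretend this source returned nothing, forcing fallback logic.
    return []


# Priority-ordered source configuration consumed by the fallback system.
MLB_GAME_SOURCES = [
    ScraperSource("statsapi", scrape_mlb_statsapi),
    ScraperSource("espn", scrape_mlb_espn),
]


def scrape_mlb_games(season: int) -> list:
    # Thin wrapper: in the real module this delegates to
    # core.scrape_with_fallback("mlb", season, MLB_GAME_SOURCES, ...).
    for source in MLB_GAME_SOURCES:
        games = source.scrape(season)
        if games:
            return games
    return []
```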
Use `__all__` to export the public API. Keep all team abbreviation mappings, venue name normalizations, and parsing logic intact.
```bash
python3 -c "from Scripts.mlb import scrape_mlb_games, scrape_mlb_stadiums, MLB_GAME_SOURCES; print('OK')"
```
mlb.py exists, imports from core.py, exports MLB scrapers and source configs
<success_criteria>
- core.py contains all shared utilities extracted from scrape_schedules.py
- mlb.py contains all MLB-specific scrapers
- Both files import without errors
- Original scrape_schedules.py unchanged (we're creating new files first)
</success_criteria>