Sportstime/.planning/phases/01-script-architecture/01-01-PLAN.md
Trey t 60b450d869 docs: add Phase 1 plans and codebase documentation
- 01-01-PLAN.md: core.py + mlb.py (executed)
- 01-02-PLAN.md: nba.py + nhl.py
- 01-03-PLAN.md: nfl.py + orchestrator refactor
- Codebase documentation for planning context

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 00:00:45 -06:00


phase: 01-script-architecture
plan: 01
type: execute

Create shared core module and extract MLB scrapers as the first sport module.

Purpose: Establish the modular pattern that subsequent sports will follow.

Output: Scripts/core.py with shared utilities, Scripts/mlb.py with MLB scrapers.

<execution_context> @/.claude/get-shit-done/workflows/execute-phase.md @/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md

Source file: @Scripts/scrape_schedules.py

Codebase context: @.planning/codebase/CONVENTIONS.md

Tech stack: Python 3, requests, beautifulsoup4, pandas, lxml

Established patterns: dataclasses, type hints, docstrings

Task 1: Create core.py shared module

Files: Scripts/core.py

Create `Scripts/core.py` containing:
  1. Imports: argparse, json, time, re, datetime, timedelta, pathlib, dataclasses, typing, requests, BeautifulSoup, pandas

  2. Rate limiting utilities:

    • REQUEST_DELAY constant (3.0)
    • last_request_time dict
    • rate_limit(domain: str) function
    • fetch_page(url: str, domain: str) -> Optional[BeautifulSoup] function
  3. Data classes:

    • @dataclass Game with all fields (id, sport, season, date, time, home_team, away_team, etc.)
    • @dataclass Stadium with all fields (id, name, city, state, latitude, longitude, etc.)
  4. Multi-source fallback system:

    • @dataclass ScraperSource
    • scrape_with_fallback(sport, season, sources, verbose) function
    • @dataclass StadiumScraperSource
    • scrape_stadiums_with_fallback(sport, sources, verbose) function
  5. ID generation:

    • assign_stable_ids(games, sport, season) function
  6. Export utilities:

    • export_to_json(games, stadiums, output_dir) function
    • cross_validate_sources(games_by_source) function
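
The rate limiting utilities in item 2 could look roughly like the sketch below. It is not the code in scrape_schedules.py, which the plan says to copy unchanged. Only `REQUEST_DELAY`, `last_request_time`, and the `rate_limit(domain: str)` signature come from the plan; the per-domain bookkeeping and the use of `time.monotonic()` are assumptions.

```python
import time

REQUEST_DELAY = 3.0  # seconds between requests to one domain (value from the plan)
last_request_time: dict[str, float] = {}  # domain -> time of the most recent request


def rate_limit(domain: str) -> None:
    """Sleep until at least REQUEST_DELAY seconds have passed for this domain.

    Assumption: each domain is throttled independently of the others.
    """
    previous = last_request_time.get(domain)
    if previous is not None:
        remaining = REQUEST_DELAY - (time.monotonic() - previous)
        if remaining > 0:
            time.sleep(remaining)
    # Record the time of this request so the next call can measure against it.
    last_request_time[domain] = time.monotonic()
```

`fetch_page(url, domain)` would presumably call `rate_limit(domain)` before issuing its HTTP request, so every scraper that goes through `fetch_page` is throttled without extra code.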

Keep exact function signatures and logic from scrape_schedules.py. Use `__all__` to explicitly export the public API.

Verify: `python3 -c "from Scripts.core import Game, Stadium, ScraperSource, rate_limit, fetch_page, scrape_with_fallback, assign_stable_ids, export_to_json; print('OK')"`

Done when: core.py exists, imports successfully, and exports all shared utilities.
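
To illustrate the multi-source fallback system from item 4, here is a minimal sketch. The `scrape_with_fallback(sport, season, sources, verbose)` signature comes from the plan. The `ScraperSource` fields (`name`, `scrape_func`, `priority`, `min_games`), the reduced `Game` fields, and the "first source that returns enough games wins" rule are assumptions made for illustration; the real definitions must be taken from scrape_schedules.py.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Game:
    # Only a few of the fields named in the plan, to keep the sketch short.
    id: str
    sport: str
    season: int
    home_team: str
    away_team: str


@dataclass
class ScraperSource:
    # All field names here are hypothetical.
    name: str
    scrape_func: Callable[[int], list[Game]]
    priority: int = 1   # lower value is tried first
    min_games: int = 1  # a result with fewer games counts as a failure


def scrape_with_fallback(
    sport: str, season: int, sources: list[ScraperSource], verbose: bool = False
) -> list[Game]:
    """Try each source in priority order and return the first usable result."""
    for source in sorted(sources, key=lambda s: s.priority):
        try:
            games = source.scrape_func(season)
        except Exception as exc:  # a broken source must not stop the run
            if verbose:
                print(f"[{sport}] {source.name} failed: {exc}")
            continue
        if len(games) >= source.min_games:
            if verbose:
                print(f"[{sport}] {source.name} returned {len(games)} games")
            return games
    return []  # every source failed or returned too few games
```

Keeping the sources as plain data means each sport module only has to supply a list, and the retry logic lives in one place.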

Task 2: Create mlb.py sport module

Files: Scripts/mlb.py

Create `Scripts/mlb.py` containing:
  1. Import from core:

    from core import Game, Stadium, ScraperSource, StadiumScraperSource, fetch_page, scrape_with_fallback, scrape_stadiums_with_fallback
    
  2. MLB game scrapers (copy exact logic):

    • scrape_mlb_baseball_reference(season: int) -> list[Game]
    • scrape_mlb_statsapi(season: int) -> list[Game]
    • scrape_mlb_espn(season: int) -> list[Game]
  3. MLB stadium scrapers:

    • scrape_mlb_stadiums_scorebot() -> list[Stadium]
    • scrape_mlb_stadiums_geojson() -> list[Stadium]
    • scrape_mlb_stadiums_hardcoded() -> list[Stadium]
    • scrape_mlb_stadiums() -> list[Stadium] (combines above with fallback)
  4. Source configurations:

    • MLB_GAME_SOURCES list of ScraperSource
    • MLB_STADIUM_SOURCES list of StadiumScraperSource
  5. Convenience function:

    • scrape_mlb_games(season: int) -> list[Game] - uses fallback system

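Use `__all__` to export the public API. Keep all team abbreviation mappings, venue name normalizations, and parsing logic intact.

Verify: `python3 -c "from Scripts.mlb import scrape_mlb_games, scrape_mlb_stadiums, MLB_GAME_SOURCES; print('OK')"`

Note: this check imports `Scripts.mlb` from the repository root, while step 1 uses `from core import ...`. As written, that import resolves only if `Scripts/` is on `sys.path` (for example via `PYTHONPATH=Scripts`).

Done when: mlb.py exists, imports from core.py, and exports the MLB scrapers and source configs.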

Before declaring plan complete:
- [ ] `Scripts/core.py` exists and imports cleanly
- [ ] `Scripts/mlb.py` exists and imports from core
- [ ] No syntax errors: `python3 -m py_compile Scripts/core.py Scripts/mlb.py`
- [ ] Type hints present on all public functions
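
The checklist gives a command for the syntax check but not for the type-hint check. One possible way to automate it is sketched below with the standard `inspect` module. The helper names `missing_type_hints` and `public_functions_missing_hints` are hypothetical, and treating the entries of `__all__` as "public" is an assumption.

```python
import inspect
from typing import Callable


def missing_type_hints(func: Callable) -> list[str]:
    """Return the parameter names (plus 'return') that lack an annotation."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


def public_functions_missing_hints(namespace: dict) -> dict[str, list[str]]:
    """Check every function listed in a module namespace's __all__."""
    report: dict[str, list[str]] = {}
    for name in namespace.get("__all__", []):
        obj = namespace[name]
        if inspect.isfunction(obj):  # skips dataclasses and constants
            missing = missing_type_hints(obj)
            if missing:
                report[name] = missing
    return report
```

After `import core`, calling `public_functions_missing_hints(vars(core))` would return an empty dict when every exported function is fully annotated.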

<success_criteria>

  • core.py contains all shared utilities extracted from scrape_schedules.py
  • mlb.py contains all MLB-specific scrapers
  • Both files import without errors
  • Original scrape_schedules.py unchanged (we're creating new files first)

</success_criteria>

After completion, create `.planning/phases/01-script-architecture/01-01-SUMMARY.md`