---
phase: 01-script-architecture
plan: 01
type: execute
---

<objective>
Create shared core module and extract MLB scrapers as the first sport module.

Purpose: Establish the modular pattern that subsequent sports will follow.
Output: `Scripts/core.py` with shared utilities, `Scripts/mlb.py` with MLB scrapers.
</objective>

<execution_context>
@~/.claude/get-shit-done/workflows/execute-phase.md
@~/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

**Source file:**
@Scripts/scrape_schedules.py

**Codebase context:**
@.planning/codebase/CONVENTIONS.md

**Tech stack:** Python 3, requests, beautifulsoup4, pandas, lxml
**Established patterns:** dataclasses, type hints, docstrings
</context>

<tasks>

<task type="auto">
<name>Task 1: Create core.py shared module</name>
<files>Scripts/core.py</files>
<action>
Create `Scripts/core.py` containing:

1. Imports: argparse, json, time, re, datetime, timedelta, pathlib, dataclasses, typing, requests, BeautifulSoup, pandas

2. Rate limiting utilities:
- `REQUEST_DELAY` constant (3.0)
- `last_request_time` dict
- `rate_limit(domain: str)` function
- `fetch_page(url: str, domain: str) -> Optional[BeautifulSoup]` function

3. Data classes:
- `@dataclass Game` with all fields (id, sport, season, date, time, home_team, away_team, etc.)
- `@dataclass Stadium` with all fields (id, name, city, state, latitude, longitude, etc.)

4. Multi-source fallback system:
- `@dataclass ScraperSource`
- `scrape_with_fallback(sport, season, sources, verbose)` function
- `@dataclass StadiumScraperSource`
- `scrape_stadiums_with_fallback(sport, sources, verbose)` function

5. ID generation:
- `assign_stable_ids(games, sport, season)` function

6. Export utilities:
- `export_to_json(games, stadiums, output_dir)` function
- `cross_validate_sources(games_by_source)` function

Keep exact function signatures and logic from scrape_schedules.py. Use `__all__` to explicitly export public API.
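
For orientation only, the per-domain rate limiter and the multi-source fallback listed above might look roughly like the sketch below. It is not the code in `scrape_schedules.py`, which takes precedence: the `ScraperSource` field names (`name`, `scraper`, `min_games`), the field types on the trimmed `Game`, the default for `verbose`, and the "first source that returns enough games wins" rule are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable

REQUEST_DELAY = 3.0  # assumed to be seconds between requests to one domain
last_request_time: dict[str, float] = {}


def rate_limit(domain: str) -> None:
    """Sleep just long enough to keep REQUEST_DELAY between hits to a domain."""
    previous = last_request_time.get(domain)
    if previous is not None:
        remaining = REQUEST_DELAY - (time.monotonic() - previous)
        if remaining > 0:
            time.sleep(remaining)
    last_request_time[domain] = time.monotonic()


@dataclass
class Game:
    # Trimmed to the fields named in the plan; types are assumptions.
    id: str
    sport: str
    season: str
    date: str
    time: str
    home_team: str
    away_team: str


@dataclass
class ScraperSource:
    # Hypothetical field names; copy the real ones from scrape_schedules.py.
    name: str
    scraper: Callable[[int], list[Game]]
    min_games: int = 1


def scrape_with_fallback(
    sport: str,
    season: int,
    sources: list[ScraperSource],
    verbose: bool = False,
) -> list[Game]:
    """Try each source in order and return the first result that is large enough."""
    for source in sources:
        try:
            games = source.scraper(season)
        except Exception as exc:  # a failing source must not abort the run
            if verbose:
                print(f"[{sport}] {source.name} failed: {exc}")
            continue
        if len(games) >= source.min_games:
            return games
    return []
```

Keeping `last_request_time` keyed by domain means a slow source does not delay requests to a different host, which is presumably why `rate_limit` and `fetch_page` both take a `domain` argument.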
</action>
<verify>python3 -c "from Scripts.core import Game, Stadium, ScraperSource, rate_limit, fetch_page, scrape_with_fallback, assign_stable_ids, export_to_json; print('OK')"</verify>
<done>core.py exists, imports successfully, exports all shared utilities</done>
</task>

<task type="auto">
<name>Task 2: Create mlb.py sport module</name>
<files>Scripts/mlb.py</files>
<action>
Create `Scripts/mlb.py` containing:

1. Import from core:
```python
from core import Game, Stadium, ScraperSource, StadiumScraperSource, fetch_page, scrape_with_fallback, scrape_stadiums_with_fallback
```

2. MLB game scrapers (copy exact logic):
- `scrape_mlb_baseball_reference(season: int) -> list[Game]`
- `scrape_mlb_statsapi(season: int) -> list[Game]`
- `scrape_mlb_espn(season: int) -> list[Game]`

3. MLB stadium scrapers:
- `scrape_mlb_stadiums_scorebot() -> list[Stadium]`
- `scrape_mlb_stadiums_geojson() -> list[Stadium]`
- `scrape_mlb_stadiums_hardcoded() -> list[Stadium]`
- `scrape_mlb_stadiums() -> list[Stadium]` (combines above with fallback)

4. Source configurations:
- `MLB_GAME_SOURCES` list of ScraperSource
- `MLB_STADIUM_SOURCES` list of StadiumScraperSource

5. Convenience function:
- `scrape_mlb_games(season: int) -> list[Game]` - uses fallback system

Use `__all__` to export public API. Keep all team abbreviation mappings, venue name normalizations, and parsing logic intact.
</action>
<verify>cd Scripts && python3 -c "from mlb import scrape_mlb_games, scrape_mlb_stadiums, MLB_GAME_SOURCES; print('OK')"</verify>
<done>mlb.py exists, imports from core.py, exports MLB scrapers and source configs</done>
</task>

</tasks>

<verification>
Before declaring plan complete:
- [ ] `Scripts/core.py` exists and imports cleanly
- [ ] `Scripts/mlb.py` exists and imports from core
- [ ] No syntax errors: `python3 -m py_compile Scripts/core.py Scripts/mlb.py`
- [ ] Type hints present on all public functions
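
The type-hint item in the checklist above can be checked mechanically. The helper below is one possible sketch, not part of the plan's tooling: `missing_type_hints` and `check_module` are hypothetical names, and the check treats "public" as "listed in `__all__`".

```python
import inspect
from typing import Callable


def missing_type_hints(func: Callable) -> list[str]:
    """Return the parameter names (and 'return') that lack annotations."""
    signature = inspect.signature(func)
    missing = [
        name
        for name, param in signature.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if signature.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


def check_module(module) -> dict[str, list[str]]:
    """Map each public function in __all__ to its unannotated names."""
    problems: dict[str, list[str]] = {}
    for name in getattr(module, "__all__", []):
        obj = getattr(module, name)
        if inspect.isfunction(obj):  # skip constants and classes
            gaps = missing_type_hints(obj)
            if gaps:
                problems[name] = gaps
    return problems
```

Passing an imported module to `check_module` returns an empty dict when every function named in its `__all__` is fully annotated.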
</verification>

<success_criteria>
- core.py contains all shared utilities extracted from scrape_schedules.py
- mlb.py contains all MLB-specific scrapers
- Both files import without errors
- Original scrape_schedules.py unchanged (we're creating new files first)
</success_criteria>

<output>
After completion, create `.planning/phases/01-script-architecture/01-01-SUMMARY.md`
</output>