docs: add Phase 1 plans and codebase documentation
- 01-01-PLAN.md: core.py + mlb.py (executed)
- 01-02-PLAN.md: nba.py + nhl.py
- 01-03-PLAN.md: nfl.py + orchestrator refactor
- Codebase documentation for planning context

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
.planning/phases/01-script-architecture/01-03-PLAN.md (Normal file, +147 lines)
@@ -0,0 +1,147 @@
---
phase: 01-script-architecture
plan: 03
type: execute
---

<objective>
Extract NFL scrapers and refactor scrape_schedules.py to be a thin orchestrator.

Purpose: Complete the modular architecture and update the main entry point.
Output: `Scripts/nfl.py` and refactored `Scripts/scrape_schedules.py`.
</objective>

<execution_context>
@~/.claude/get-shit-done/workflows/execute-phase.md
@~/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

**Prior work:**
@.planning/phases/01-script-architecture/01-01-SUMMARY.md
@.planning/phases/01-script-architecture/01-02-SUMMARY.md

**Source files:**
@Scripts/core.py
@Scripts/mlb.py
@Scripts/nba.py
@Scripts/nhl.py
@Scripts/scrape_schedules.py
</context>

<tasks>

<task type="auto">
<name>Task 1: Create nfl.py sport module</name>
<files>Scripts/nfl.py</files>
<action>
Create `Scripts/nfl.py` following the pattern established by mlb.py, nba.py, and nhl.py:

1. Import from core:
   ```python
   from core import Game, Stadium, ScraperSource, StadiumScraperSource, fetch_page, scrape_with_fallback, scrape_stadiums_with_fallback
   ```

2. NFL game scrapers:
   - `scrape_nfl_espn(season: int) -> list[Game]`
   - `scrape_nfl_pro_football_reference(season: int) -> list[Game]`
   - `scrape_nfl_cbssports(season: int) -> list[Game]`

3. NFL stadium scrapers:
   - `scrape_nfl_stadiums_scorebot() -> list[Stadium]`
   - `scrape_nfl_stadiums_geojson() -> list[Stadium]`
   - `scrape_nfl_stadiums_hardcoded() -> list[Stadium]`
   - `scrape_nfl_stadiums() -> list[Stadium]`

4. Source configurations:
   - `NFL_GAME_SOURCES`: list of ScraperSource
   - `NFL_STADIUM_SOURCES`: list of StadiumScraperSource

5. Convenience functions:
   - `scrape_nfl_games(season: int) -> list[Game]`
   - `get_nfl_season_string(season: int) -> str`: returns the season label in "2025-26" format

Copy the parsing logic from scrape_schedules.py verbatim.
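A minimal sketch of how the pieces of `nfl.py` could fit together. It assumes `scrape_with_fallback` tries sources in list order and returns the first non-empty result, and that `get_nfl_season_string` receives the year the season ends (2026 gives "2025-26"); neither behavior is confirmed by core.py. `Game`, `ScraperSource`, and `scrape_with_fallback` below are hypothetical stand-ins defined inline so the sketch runs on its own, and the scraper bodies are placeholders for the logic copied from scrape_schedules.py.

```python
from dataclasses import dataclass
from typing import Callable


# --- Hypothetical stand-ins for core.py; the real Game/ScraperSource fields
# --- and the scrape_with_fallback signature may differ.
@dataclass
class Game:
    sport: str
    season: str
    home_team: str
    away_team: str


@dataclass
class ScraperSource:
    name: str
    scraper_func: Callable[[int], list[Game]]


def scrape_with_fallback(sport: str, season: int, sources: list[ScraperSource]) -> list[Game]:
    """Try sources in list order and return the first non-empty result."""
    for source in sources:
        try:
            games = source.scraper_func(season)
        except Exception as exc:  # one failing site should not abort the chain
            print(f"[{sport}] {source.name} failed: {exc}")
            continue
        if games:
            return games
    return []


# --- nfl.py shape: scrapers, source list, convenience functions ---
def scrape_nfl_espn(season: int) -> list[Game]:
    return []  # placeholder for the ESPN parsing logic from scrape_schedules.py


def scrape_nfl_pro_football_reference(season: int) -> list[Game]:
    return []  # placeholder


def scrape_nfl_cbssports(season: int) -> list[Game]:
    return []  # placeholder


# Assumed: list order doubles as priority (ESPN first, CBS Sports last).
NFL_GAME_SOURCES = [
    ScraperSource("ESPN", scrape_nfl_espn),
    ScraperSource("Pro-Football-Reference", scrape_nfl_pro_football_reference),
    ScraperSource("CBS Sports", scrape_nfl_cbssports),
]


def scrape_nfl_games(season: int) -> list[Game]:
    return scrape_with_fallback("NFL", season, NFL_GAME_SOURCES)


def get_nfl_season_string(season: int) -> str:
    # Assumes `season` is the year the season ends: 2026 -> "2025-26".
    return f"{season - 1}-{season % 100:02d}"
```

Keeping the source list as data means a broken site is handled by reordering or removing one entry rather than editing control flow.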
</action>
<verify>cd Scripts && python3 -c "from nfl import scrape_nfl_games, NFL_GAME_SOURCES; print('OK')"</verify>
<done>nfl.py exists, imports from core.py, exports NFL scrapers</done>
</task>

<task type="auto">
<name>Task 2: Refactor scrape_schedules.py to orchestrator</name>
<files>Scripts/scrape_schedules.py</files>
<action>
Rewrite `Scripts/scrape_schedules.py` as a thin orchestrator:

1. Replace inline scrapers with imports:
   ```python
   from core import Game, Stadium, assign_stable_ids, export_to_json
   from mlb import scrape_mlb_games, scrape_mlb_stadiums, MLB_GAME_SOURCES
   from nba import scrape_nba_games, scrape_nba_stadiums, NBA_GAME_SOURCES, get_nba_season_string
   from nhl import scrape_nhl_games, scrape_nhl_stadiums, NHL_GAME_SOURCES, get_nhl_season_string
   from nfl import scrape_nfl_games, scrape_nfl_stadiums, NFL_GAME_SOURCES, get_nfl_season_string
   ```

2. Keep the `main()` function and its argparse CLI.

3. Update the sport scraping blocks to use the new imports:
   - `if args.sport in ['nba', 'all']:` uses `scrape_nba_games(season)`
   - `if args.sport in ['mlb', 'all']:` uses `scrape_mlb_games(season)`
   - `nhl` and `nfl` follow the same pattern with `scrape_nhl_games(season)` and `scrape_nfl_games(season)`

4. Keep stadium scraping, now calling the imported `scrape_*_stadiums()` functions.

5. For non-core sports (WNBA, MLS, NWSL, CBB), keep the scrapers inline for now with a `# TODO: Extract to separate modules` comment.

6. Update the file header docstring to explain the modular structure:
   ```python
   """
   Sports Schedule Scraper Orchestrator

   This script coordinates scraping across sport-specific modules:
   - core.py: Shared utilities, data classes, fallback system
   - mlb.py: MLB scrapers
   - nba.py: NBA scrapers
   - nhl.py: NHL scrapers
   - nfl.py: NFL scrapers

   Usage:
   python scrape_schedules.py --sport nba --season 2026
   python scrape_schedules.py --sport all --season 2026
   """
   ```

Target: ~500 lines for the orchestrator (down from 3359), with sport-specific logic living in the modules.
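A sketch of the orchestrator's core-sport dispatch, assuming the four core sports can share one lookup table. The stub `scrape_*_games` functions stand in for the real imports, `GAME_SCRAPERS`, `build_parser`, and `run_core_sports` are hypothetical names, and the `--sport`/`--season` defaults are placeholders: the existing argparse definitions should be kept as they are so the CLI stays unchanged.

```python
import argparse
from typing import Callable, Optional


# Hypothetical stand-ins: in the real orchestrator these are imported,
# e.g. `from nba import scrape_nba_games`.
def scrape_nba_games(season: int) -> list:
    return []


def scrape_mlb_games(season: int) -> list:
    return []


def scrape_nhl_games(season: int) -> list:
    return []


def scrape_nfl_games(season: int) -> list:
    return []


# Core sports dispatch through one table instead of four near-identical blocks.
GAME_SCRAPERS: dict[str, Callable[[int], list]] = {
    "nba": scrape_nba_games,
    "mlb": scrape_mlb_games,
    "nhl": scrape_nhl_games,
    "nfl": scrape_nfl_games,
}


def build_parser() -> argparse.ArgumentParser:
    # Placeholder flags; keep the existing parser definition for backward compatibility.
    parser = argparse.ArgumentParser(description="Sports Schedule Scraper Orchestrator")
    parser.add_argument("--sport", default="all")
    parser.add_argument("--season", type=int, default=2026)
    return parser


def run_core_sports(sport: str, season: int) -> dict[str, list]:
    # Same selection rule as `if args.sport in ['nba', 'all']:`, applied per sport.
    return {
        name: scrape(season)
        for name, scrape in GAME_SCRAPERS.items()
        if sport in (name, "all")
    }


def main(argv: Optional[list[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    for name, games in run_core_sports(args.sport, args.season).items():
        print(f"{name.upper()}: {len(games)} games")
    # Non-core sports (WNBA, MLS, NWSL, CBB) stay inline here for now.
    # TODO: Extract to separate modules


if __name__ == "__main__":
    main()
```

A sport name that is not in the table (for example `wnba`) selects nothing here and falls through to the inline non-core blocks, so the table only has to cover the extracted modules.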
</action>
<verify>cd Scripts && python3 scrape_schedules.py --help</verify>
<done>scrape_schedules.py is a thin orchestrator, imports from sport modules, --help works</done>
</task>

</tasks>

<verification>
Before declaring the phase complete:
- [ ] All modules exist: core.py, mlb.py, nba.py, nhl.py, nfl.py
- [ ] `python3 -m py_compile Scripts/*.py` passes for all files
- [ ] `cd Scripts && python3 scrape_schedules.py --help` shows usage
- [ ] scrape_schedules.py is significantly smaller (~500 lines vs 3359)
- [ ] No circular imports between modules
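The last checkbox can be automated with a static check. This sketch assumes a layering rule stricter than "no cycles": `core` imports no project module, each sport module imports only `core`, and only `scrape_schedules` may import the sport modules; any import graph that passes is acyclic. The helper names are hypothetical, and the check parses sources with `ast` instead of importing them, so no scraper code runs.

```python
import ast
from pathlib import Path

SPORT_MODULES = {"mlb", "nba", "nhl", "nfl"}
LOCAL_MODULES = SPORT_MODULES | {"core", "scrape_schedules"}


def local_imports(source: str) -> set[str]:
    """Top-level names of project modules imported anywhere in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        found.update(n.split(".")[0] for n in names if n.split(".")[0] in LOCAL_MODULES)
    return found


def allowed_deps(module: str) -> set[str]:
    if module == "core":
        return set()  # core sits at the bottom of the import graph
    if module in SPORT_MODULES:
        return {"core"}  # assumed: sport modules never import each other
    return LOCAL_MODULES - {module}  # the orchestrator may import everything else


def layering_violations(sources: dict[str, str]) -> list[str]:
    """Map of module name -> source text in, list of forbidden imports out."""
    problems = []
    for module, source in sources.items():
        for dep in sorted(local_imports(source) - allowed_deps(module)):
            problems.append(f"{module} imports {dep}")
    return problems


def check_scripts_dir(scripts_dir: str = "Scripts") -> list[str]:
    sources = {
        path.stem: path.read_text(encoding="utf-8")
        for path in Path(scripts_dir).glob("*.py")
        if path.stem in LOCAL_MODULES
    }
    return layering_violations(sources)
```

Run from the repository root, `check_scripts_dir("Scripts")` returns an empty list when the layering holds.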
</verification>

<success_criteria>
- Phase 1: Script Architecture complete
- All 4 core sports have dedicated modules
- Shared utilities in core.py
- scrape_schedules.py is a thin orchestrator
- CLI unchanged (backward compatible)
</success_criteria>

<output>
After completion, create `.planning/phases/01-script-architecture/01-03-SUMMARY.md` with:
- Phase 1 complete
- Ready for Phase 2: Stadium Foundation
</output>