docs(01-01): complete core.py + mlb.py plan

Tasks completed: 2/2
- Create core.py shared module
- Create mlb.py sport module

SUMMARY: .planning/phases/01-script-architecture/01-01-SUMMARY.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Trey t
2026-01-10 00:00:33 -06:00
parent cdf4c775ff
commit 504187059f
3 changed files with 248 additions and 0 deletions

.planning/ROADMAP.md (normal file, 96 additions)

@@ -0,0 +1,96 @@
# Roadmap: SportsTime Data Pipeline
## Overview
Transform the monolithic data scraping scripts into a maintainable, sport-organized pipeline that ensures every game correctly links to its teams and stadium. Starting with script restructuring, we'll complete the stadium database, add alias systems for name variations, establish correct canonical linking, implement full CloudKit CRUD operations, and finish with comprehensive validation reports.
## Domain Expertise
None
## Phases
**Phase Numbering:**
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
- [ ] **Phase 1: Script Architecture** - Split monolithic scripts into sport-specific modules (1/3 plans)
- [ ] **Phase 2: Stadium Foundation** - Complete stadium database with coordinates and names
- [ ] **Phase 3: Alias Systems** - Stadium and team alias systems for name variations
- [ ] **Phase 4: Canonical Linking** - Correct game→team→stadium relationships
- [ ] **Phase 5: CloudKit CRUD** - Full create, read, update, delete operations
- [ ] **Phase 6: Validation Reports** - Reports showing counts, gaps, orphan records
## Phase Details
### Phase 1: Script Architecture
**Goal**: Reorganize monolithic scraping scripts into sport-specific modules (MLB, NBA, NHL, NFL) for easier debugging and maintenance
**Depends on**: Nothing (first phase)
**Research**: Unlikely (internal refactoring, Python module patterns)
**Plans**: 3 plans
Plans:
- [x] 01-01: Create core.py shared module + mlb.py sport module
- [ ] 01-02: Create nba.py + nhl.py sport modules
- [ ] 01-03: Create nfl.py + refactor scrape_schedules.py orchestrator
### Phase 2: Stadium Foundation
**Goal**: Complete stadium database with correct coordinates, names, and venue data for all 4 sports
**Depends on**: Phase 1
**Research**: Likely (stadium data sources, geocoding verification)
**Research topics**: Stadium data sources (Wikipedia, official league sites), geocoding API for coordinate verification, handling relocated/renamed venues
**Plans**: TBD
Plans:
- [ ] 02-01: TBD
### Phase 3: Alias Systems
**Goal**: Implement alias systems for both stadiums and teams to handle name variations across data sources
**Depends on**: Phase 2
**Research**: Unlikely (internal mapping logic)
**Plans**: TBD
Plans:
- [ ] 03-01: TBD
### Phase 4: Canonical Linking
**Goal**: Ensure every game correctly links to its home/away teams and stadium via canonical IDs
**Depends on**: Phase 3
**Research**: Unlikely (existing model relationships)
**Plans**: TBD
Plans:
- [ ] 04-01: TBD
### Phase 5: CloudKit CRUD
**Goal**: Implement full create, read, update, delete operations for CloudKit management
**Depends on**: Phase 4
**Research**: Likely (CloudKit server-to-server API)
**Research topics**: CloudKit server-to-server authentication, record modification operations, batch operations, conflict resolution
**Plans**: TBD
Plans:
- [ ] 05-01: TBD
### Phase 6: Validation Reports
**Goal**: Generate validation reports showing record counts, data gaps, orphan records, and relationship integrity
**Depends on**: Phase 5
**Research**: Unlikely (internal reporting logic)
**Plans**: TBD
Plans:
- [ ] 06-01: TBD
## Progress
**Execution Order:**
Phases execute in numeric order: 1 → 2 → 3 → 4 → 5 → 6
| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Script Architecture | 1/3 | In progress | - |
| 2. Stadium Foundation | 0/TBD | Not started | - |
| 3. Alias Systems | 0/TBD | Not started | - |
| 4. Canonical Linking | 0/TBD | Not started | - |
| 5. CloudKit CRUD | 0/TBD | Not started | - |
| 6. Validation Reports | 0/TBD | Not started | - |

.planning/STATE.md (normal file, 58 additions)

@@ -0,0 +1,58 @@
# Project State
## Project Reference
See: .planning/PROJECT.md (updated 2026-01-09)
**Core value:** Every game must correctly link to its teams and stadium — a game at the wrong venue or with broken team links ruins trip planning.
**Current focus:** Phase 1 — Script Architecture
## Current Position
Phase: 1 of 6 (Script Architecture)
Plan: 1 of 3 in current phase
Status: In progress
Last activity: 2026-01-10 — Completed 01-01-PLAN.md
Progress: █░░░░░░░░░ 10%
## Performance Metrics
**Velocity:**
- Total plans completed: 1
- Average duration: 5 min
- Total execution time: 5 min
**By Phase:**
| Phase | Plans | Total | Avg/Plan |
|-------|-------|-------|----------|
| 1. Script Architecture | 1/3 | 5 min | 5 min |
**Recent Trend:**
- Last 5 plans: 01-01 (5 min)
- Trend: —
## Accumulated Context
### Decisions
Decisions are logged in PROJECT.md Key Decisions table.
Recent decisions affecting current work:
- **01-01**: Each sport module has its own `get_{sport}_team_abbrev()` function for independence
- **01-01**: Import fallback pattern (try/except) for running from Scripts/ or project root
### Deferred Issues
None yet.
### Blockers/Concerns
None yet.
## Session Continuity
Last session: 2026-01-10
Stopped at: Completed 01-01-PLAN.md
Resume file: None


@@ -0,0 +1,94 @@
---
phase: 01-script-architecture
plan: 01
subsystem: data-pipeline
tags: [python, scrapers, modular-architecture, dataclasses]
# Dependency graph
requires: []
provides:
- core.py shared utilities module
- mlb.py MLB-specific scrapers
- Multi-source fallback pattern for scrapers
affects: [01-02, 01-03]
# Tech tracking
tech-stack:
  added: []
  patterns:
    - "Sport-specific modules import from core.py"
    - "ScraperSource/StadiumScraperSource for fallback configuration"
    - "get_{sport}_team_abbrev() in each sport module"
key-files:
  created:
    - Scripts/core.py
    - Scripts/mlb.py
  modified: []
key-decisions:
- "Each sport module has its own get_{sport}_team_abbrev() function for independence"
- "Import fallback pattern (try/except) for running from Scripts/ or project root"
patterns-established:
- "core.py exports shared utilities via __all__"
- "Sport modules import from core, define team mappings, scrapers, source configs"
issues-created: []
# Metrics
duration: 5min
completed: 2026-01-10
---
# Phase 1 Plan 01: Core + MLB Modules Summary
**Created core.py shared utilities and mlb.py as the first sport module, establishing the modular pattern for subsequent sports**
## Performance
- **Duration:** 5 min
- **Started:** 2026-01-10T05:53:50Z
- **Completed:** 2026-01-10T05:59:10Z
- **Tasks:** 2
- **Files created/modified:** 2
## Accomplishments
- Created `Scripts/core.py` with all shared utilities (rate limiting, dataclasses, fallback system, export)
- Created `Scripts/mlb.py` with MLB_TEAMS, 3 game scrapers, 3 stadium scrapers, and source configs
- Established modular pattern that NBA, NHL, NFL will follow
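The shared utilities listed above (rate limiting, dataclasses, ID generation, export) might look roughly like the sketch below. This is an illustration under assumptions, not the contents of `Scripts/core.py`: the `Game` field names, the ID format, the `RateLimiter` class, and `export_games` are all hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class Game:
    # Field names are illustrative assumptions, not the real core.py schema.
    sport: str
    date: str        # ISO date string, e.g. "2026-04-01"
    home_team: str   # team abbreviation
    away_team: str
    stadium: str

    @property
    def id(self) -> str:
        # Hypothetical deterministic ID: the same inputs always give the same key.
        return f"{self.sport}_{self.date}_{self.away_team}_{self.home_team}".lower()


class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call = None

    def wait(self) -> None:
        # Sleep only for the part of the interval that has not yet elapsed.
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


def export_games(games: list[Game]) -> str:
    """Serialize games to JSON, including the derived ID."""
    return json.dumps([{"id": g.id, **asdict(g)} for g in games], indent=2)
```

A scraper would call `wait()` on a shared `RateLimiter` before each HTTP request, build `Game` records from the response, and pass them to `export_games`.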
## Task Commits
Each task was committed atomically:
1. **Task 1: Create core.py shared module** - `edbb5db` (feat)
2. **Task 2: Create mlb.py sport module** - `cdf4c77` (feat)
## Files Created/Modified
- `Scripts/core.py` - Shared utilities: rate limiting, Game/Stadium dataclasses, fallback system, ID generation, export
- `Scripts/mlb.py` - MLB team mappings, Baseball-Reference/Stats API/ESPN scrapers, stadium scrapers, source configs
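A minimal sketch of how a multi-source fallback could work is shown below. Only the name `ScraperSource` comes from this summary. Its fields, the `scrape_with_fallback` function, and the stub scrapers are assumptions made for illustration, so the real signatures in `core.py` may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScraperSource:
    # Only the class name appears in the summary; these fields are assumed.
    name: str
    scraper: Callable[[int], list]  # takes a season year, returns game records
    priority: int = 1               # lower numbers are tried first
    min_games: int = 1              # fewer results than this counts as a failure


def scrape_with_fallback(
    sources: list[ScraperSource], season: int
) -> tuple[Optional[str], list]:
    """Try sources in priority order and return the first adequate result."""
    for source in sorted(sources, key=lambda s: s.priority):
        try:
            games = source.scraper(season)
        except Exception as exc:  # a real scraper would catch narrower errors
            print(f"{source.name} failed: {exc}")
            continue
        if len(games) >= source.min_games:
            return source.name, games
        print(f"{source.name} returned too few games ({len(games)})")
    return None, []


# Stub scrapers standing in for real network calls (hypothetical).
def failing_scraper(season: int) -> list:
    raise RuntimeError("HTTP 503")


def stub_scraper(season: int) -> list:
    return [{"season": season, "home": "BOS", "away": "NYY"}]


MLB_GAME_SOURCES = [
    ScraperSource("Baseball-Reference", failing_scraper, priority=1),
    ScraperSource("MLB Stats API", stub_scraper, priority=2),
]
used_source, games = scrape_with_fallback(MLB_GAME_SOURCES, 2026)
```

Keeping the source list as data means a sport module can reorder or disable sources without touching the fallback loop itself.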
## Decisions Made
- Each sport module defines its own `get_{sport}_team_abbrev()` function rather than a shared one — keeps modules independent
- Used try/except import pattern to support both direct execution (`python mlb.py`) and import from project root
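The two decisions above can be sketched as follows. This is a hedged illustration, not an excerpt from `Scripts/mlb.py`: the import of `Game` from `core`, the shape of `MLB_TEAMS`, the two sample entries, and the three-letter fallback are assumptions. The final `Game = None` stand-in exists only so the sketch runs without `core.py` present.

```python
try:
    # Works when running from inside Scripts/ (e.g. `python mlb.py`).
    from core import Game
except ImportError:
    try:
        # Works when the module is imported from the project root.
        from Scripts.core import Game
    except ImportError:
        Game = None  # stand-in so this sketch runs on its own

# Assumed structure: abbreviation mapped to team details (two sample entries).
MLB_TEAMS = {
    "NYY": {"name": "New York Yankees", "city": "New York"},
    "BOS": {"name": "Boston Red Sox", "city": "Boston"},
}


def get_mlb_team_abbrev(team_name: str) -> str:
    """Map a full team name to its abbreviation, ignoring case and padding."""
    wanted = team_name.strip().lower()
    for abbrev, info in MLB_TEAMS.items():
        if info["name"].lower() == wanted:
            return abbrev
    # Fallback behaviour is an assumption: first three letters, upper-cased.
    return team_name.strip()[:3].upper()
```

Because each sport module carries its own lookup function and team table, a module such as `nba.py` could be added or debugged without changing `mlb.py`.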
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None
## Next Phase Readiness
- core.py and mlb.py pattern established
- Ready for 01-02-PLAN.md (nba.py + nhl.py modules)
---
*Phase: 01-script-architecture*
*Completed: 2026-01-10*