# Roadmap: SportsTime Data Pipeline

## Overview
Transform the monolithic data scraping scripts into a maintainable, sport-organized pipeline that ensures every game correctly links to its teams and stadium. The work starts with script restructuring, then completes the stadium database, adds alias systems for name variations, establishes canonical linking, implements full CloudKit CRUD operations, and finishes with validation reports.

## Domain Expertise

None

## Phases
**Phase Numbering:**
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

- [ ] **Phase 1: Script Architecture** - Split monolithic scripts into sport-specific modules (1/3 plans)
- [ ] **Phase 2: Stadium Foundation** - Complete stadium database with coordinates and names
- [ ] **Phase 3: Alias Systems** - Stadium and team alias systems for name variations
- [ ] **Phase 4: Canonical Linking** - Correct game→team→stadium relationships
- [ ] **Phase 5: CloudKit CRUD** - Full create, read, update, delete operations
- [ ] **Phase 6: Validation Reports** - Reports showing counts, gaps, orphan records
## Phase Details

### Phase 1: Script Architecture
**Goal**: Reorganize monolithic scraping scripts into sport-specific modules (MLB, NBA, NHL, NFL) for easier debugging and maintenance
**Depends on**: Nothing (first phase)
**Research**: Unlikely (internal refactoring, Python module patterns)
**Plans**: 3 plans

Plans:
- [x] 01-01: Create core.py shared module + mlb.py sport module
- [ ] 01-02: Create nba.py + nhl.py sport modules
- [ ] 01-03: Create nfl.py + refactor scrape_schedules.py orchestrator
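One possible shape for the Phase 1 split is sketched below, collapsed into a single file so it runs on its own. It is only an illustration: the `Game` fields, the `SCRAPERS` registry, the `register` decorator, and `scrape_all` are hypothetical names, not the actual contents of `core.py`, `mlb.py`, or `scrape_schedules.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Game:
    """Shared record type that could live in core.py (field names are assumptions)."""
    sport: str
    home_team: str
    away_team: str
    stadium: str
    date: str  # ISO 8601 date string


# Hypothetical registry mapping a sport code to its scraper function.
SCRAPERS: Dict[str, Callable[[int], List[Game]]] = {}


def register(sport: str):
    """Decorator a sport module could use to register its scraper."""
    def wrap(fn: Callable[[int], List[Game]]) -> Callable[[int], List[Game]]:
        SCRAPERS[sport] = fn
        return fn
    return wrap


@register("MLB")
def scrape_mlb(season: int) -> List[Game]:
    # Placeholder data; a real mlb.py would fetch and parse a schedule here.
    return [Game("MLB", "Home Team", "Away Team", "Example Park", f"{season}-04-01")]


def scrape_all(season: int) -> List[Game]:
    """Orchestrator loop: call every registered sport scraper in a stable order."""
    games: List[Game] = []
    for sport in sorted(SCRAPERS):
        games.extend(SCRAPERS[sport](season))
    return games
```

With a registry like this, adding `nba.py`, `nhl.py`, or `nfl.py` would mean registering one more function, and a failure in one sport can be debugged without touching the others.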
### Phase 2: Stadium Foundation
**Goal**: Complete stadium database with correct coordinates, names, and venue data for all 4 sports
**Depends on**: Phase 1
**Research**: Likely (stadium data sources, geocoding verification)
**Research topics**: Stadium data sources (Wikipedia, official league sites), geocoding API for coordinate verification, handling relocated/renamed venues
**Plans**: TBD

Plans:
- [ ] 02-01: TBD
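Coordinate verification could be as simple as comparing a stored coordinate against a second source and flagging large disagreements. The sketch below uses the haversine great-circle distance; the function names and the 1 km tolerance are assumptions for illustration, and the reference coordinate would come from whichever geocoding source the research step selects.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coordinates_agree(
    stored: Tuple[float, float],
    reference: Tuple[float, float],
    tolerance_km: float = 1.0,  # assumed threshold, tune per data source
) -> bool:
    """True when the stored coordinate is within tolerance of the reference one."""
    return haversine_km(*stored, *reference) <= tolerance_km
```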
### Phase 3: Alias Systems
**Goal**: Implement alias systems for both stadiums and teams to handle name variations across data sources
**Depends on**: Phase 2
**Research**: Unlikely (internal mapping logic)
**Plans**: TBD

Plans:
- [ ] 03-01: TBD
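A minimal sketch of what an alias lookup might look like, assuming aliases are matched case-insensitively after whitespace normalization. The `AliasResolver` class, the `normalize` helper, and the canonical ID strings are hypothetical and stand in for whatever the Phase 3 plans define.

```python
from typing import Dict, Optional


def normalize(name: str) -> str:
    """Case-fold and collapse runs of whitespace so near-identical names match."""
    return " ".join(name.casefold().split())


class AliasResolver:
    """Maps any known name variation to one canonical ID."""

    def __init__(self) -> None:
        self._canonical_by_alias: Dict[str, str] = {}

    def add(self, canonical_id: str, *names: str) -> None:
        """Register one or more names that all refer to canonical_id."""
        for name in names:
            self._canonical_by_alias[normalize(name)] = canonical_id

    def resolve(self, name: str) -> Optional[str]:
        """Return the canonical ID for a name, or None when it is unknown."""
        return self._canonical_by_alias.get(normalize(name))


# Usage with made-up IDs and names:
stadiums = AliasResolver()
stadiums.add("stadium_example_park", "Example Park", "Example Field", "Old Example Stadium")
```

Returning `None` for unknown names, rather than guessing, keeps unmatched variations visible so they can be added as aliases or reported later.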
### Phase 4: Canonical Linking
**Goal**: Ensure every game correctly links to its home/away teams and stadium via canonical IDs
**Depends on**: Phase 3
**Research**: Unlikely (existing model relationships)
**Plans**: TBD

Plans:
- [ ] 04-01: TBD
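The linking goal can be read as an invariant: every ID a game references must exist in the team or stadium set. The check below is a sketch under that reading; `GameRecord`, its field names, and `find_broken_links` are assumptions, not the project's existing model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class GameRecord:
    """Hypothetical game record that references teams and a stadium by canonical ID."""
    game_id: str
    home_team_id: str
    away_team_id: str
    stadium_id: str


def find_broken_links(
    games: Iterable[GameRecord],
    team_ids: Set[str],
    stadium_ids: Set[str],
) -> Dict[str, List[str]]:
    """Return {game_id: [field names whose ID does not resolve]} for bad games only."""
    broken: Dict[str, List[str]] = {}
    for game in games:
        problems: List[str] = []
        if game.home_team_id not in team_ids:
            problems.append("home_team_id")
        if game.away_team_id not in team_ids:
            problems.append("away_team_id")
        if game.stadium_id not in stadium_ids:
            problems.append("stadium_id")
        if problems:
            broken[game.game_id] = problems
    return broken
```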
### Phase 5: CloudKit CRUD
**Goal**: Implement full create, read, update, delete operations for CloudKit management
**Depends on**: Phase 4
**Research**: Likely (CloudKit server-to-server API)
**Research topics**: CloudKit server-to-server authentication, record modification operations, batch operations, conflict resolution
**Plans**: TBD

Plans:
- [ ] 05-01: TBD
### Phase 6: Validation Reports
**Goal**: Generate validation reports showing record counts, data gaps, orphan records, and relationship integrity
**Depends on**: Phase 5
**Research**: Unlikely (internal reporting logic)
**Plans**: TBD

Plans:
- [ ] 06-01: TBD
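As a rough illustration of the counts and orphan checks named in the goal, the sketch below treats an orphan as a team or stadium that no game references. The dictionary keys (`sport`, `home`, `away`, `stadium`) and the `build_report` function are hypothetical; the real report format is still to be planned.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_report(
    games: List[Dict[str, str]],
    team_ids: Iterable[str],
    stadium_ids: Iterable[str],
) -> Dict[str, object]:
    """Summarize game counts per sport and list teams/stadiums no game refers to."""
    used_teams = {g["home"] for g in games} | {g["away"] for g in games}
    used_stadiums = {g["stadium"] for g in games}
    return {
        "game_count_by_sport": dict(Counter(g["sport"] for g in games)),
        "orphan_teams": sorted(set(team_ids) - used_teams),
        "orphan_stadiums": sorted(set(stadium_ids) - used_stadiums),
    }
```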
## Progress

**Execution Order:**
Phases execute in numeric order: 1 → 2 → 3 → 4 → 5 → 6

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Script Architecture | 1/3 | In progress | - |
| 2. Stadium Foundation | 0/TBD | Not started | - |
| 3. Alias Systems | 0/TBD | Not started | - |
| 4. Canonical Linking | 0/TBD | Not started | - |
| 5. CloudKit CRUD | 0/TBD | Not started | - |
| 6. Validation Reports | 0/TBD | Not started | - |