docs: initialize SportsTime Data Pipeline
Fix data quality issues across MLB, NBA, NHL, NFL with correct game→team→stadium canonical linking.

Creates PROJECT.md with requirements and constraints.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
71
.planning/PROJECT.md
Normal file
@@ -0,0 +1,71 @@
# SportsTime Data Pipeline

## What This Is

A Python data pipeline that scrapes, canonicalizes, and syncs sports schedule data to CloudKit for the SportsTime iOS app. The pipeline ensures every game correctly links to its home/away teams and stadium with complete, accurate data across MLB, NBA, NHL, and NFL.

## Core Value

Every game must correctly link to its teams and stadium — a game at the wrong venue or with broken team links ruins trip planning.

## Requirements

### Validated

- ✓ Basic schedule scraping for MLB, NBA, NHL, NFL — existing
- ✓ Canonical data models (stadiums, teams, games) — existing
- ✓ CloudKit import capability — existing
- ✓ Bundled JSON generation for offline-first — existing

### Active

- [ ] Split scripts by sport (MLB, NBA, NHL, NFL as separate modules)
- [ ] Complete stadium database with correct coordinates and names
- [ ] Stadium alias system for name variations across sources
- [ ] Correct game→team→stadium canonical linking for all sports
- [ ] Full CRUD CloudKit management (create, read, update, delete)
- [ ] Validation reports showing counts, gaps, and orphan records
- [ ] Team alias system for name variations across sources
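
The alias requirements above could be approached roughly as follows. This is a minimal sketch, not the pipeline's actual code: the alias table, the stadium names and IDs, and the `normalize` and `resolve_stadium` helpers are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical alias table: normalized name variant -> canonical stadium ID.
# Names and IDs are placeholders, not real pipeline data.
STADIUM_ALIASES: Dict[str, str] = {
    "example park": "stadium_example_park",
    "example field": "stadium_example_park",  # older name for the same venue
}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def resolve_stadium(name: str, aliases: Dict[str, str] = STADIUM_ALIASES) -> Optional[str]:
    """Return the canonical stadium ID for a scraped name, or None if unknown."""
    return aliases.get(normalize(name))
```

Returning `None` for an unknown name, rather than guessing, would let a validation report list unresolved venues instead of silently linking a game to the wrong stadium. The same shape would apply to a team alias table.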

### Out of Scope

- Real-time scores — this is schedule data, not live game tracking
- Adding new sports (MLS, WNBA, etc.) — stabilize current 4 first
- iOS app changes — this is purely backend/script work

## Context

**Current State:**
- Data quality issues exist across all sports (wrong stadiums, missing games, broken team links)
- Stadium problems include: missing venues, wrong coordinates, name mismatches between sources
- Single large script files that are hard to debug and maintain
- Existing CloudKit import works but lacks verification and CRUD operations

**Existing Infrastructure:**
- Python 3 with requests, beautifulsoup4, pandas, lxml
- CloudKit server-to-server auth via cryptography package
- Bundled JSON in `SportsTime/Resources/` for offline bootstrap
- Data sources: Basketball-Reference, Baseball-Reference, Hockey-Reference, official APIs

**iOS App Dependency:**
- `AppDataProvider.shared` is single source of truth
- SwiftData models: `CanonicalStadium`, `CanonicalTeam`, `CanonicalGame`
- Domain models expect correct relationships via canonical IDs
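
Because the app relies on relationships expressed through canonical IDs, an orphan check before upload is one way to catch broken links. The sketch below assumes a simplified `Game` record. Its field names are hypothetical and are not taken from the Swift models, whose fields this document does not show.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Game:
    # Hypothetical fields; the real canonical game schema may differ.
    id: str
    home_team_id: str
    away_team_id: str
    stadium_id: str


def find_orphans(
    games: Iterable[Game], team_ids: Set[str], stadium_ids: Set[str]
) -> Dict[str, List[str]]:
    """Map each game ID to the dangling references found on that game."""
    orphans: Dict[str, List[str]] = {}
    for game in games:
        problems = []
        if game.home_team_id not in team_ids:
            problems.append(f"unknown home team {game.home_team_id}")
        if game.away_team_id not in team_ids:
            problems.append(f"unknown away team {game.away_team_id}")
        if game.stadium_id not in stadium_ids:
            problems.append(f"unknown stadium {game.stadium_id}")
        if problems:
            orphans[game.id] = problems
    return orphans
```

An empty result means every game resolves to known teams and a known stadium. A non-empty result could feed the orphan section of a validation report.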

## Constraints

- **Tech Stack**: Must remain Python (existing tooling, team familiarity)
- **Data Sources**: Free/public APIs and sites only (no paid subscriptions)
- **CloudKit**: Must use existing container (`iCloud.com.sportstime.app`)
- **Compatibility**: Output must match existing Swift model expectations

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Split by sport, not function | User preference for organization | — Pending |
| Validation reports over automated tests | Faster feedback, easier debugging | — Pending |
| Full CRUD over upload-only | Enable data corrections without full rebuild | — Pending |

---

*Last updated: 2026-01-09 after initialization*
18
.planning/config.json
Normal file
@@ -0,0 +1,18 @@
{
  "mode": "yolo",
  "depth": "standard",
  "gates": {
    "confirm_project": false,
    "confirm_phases": false,
    "confirm_roadmap": false,
    "confirm_breakdown": false,
    "confirm_plan": false,
    "execute_next_plan": false,
    "issues_review": false,
    "confirm_transition": false
  },
  "safety": {
    "always_confirm_destructive": true,
    "always_confirm_external_services": true
  }
}