diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
new file mode 100644
index 0000000..b9f5297
--- /dev/null
+++ b/.planning/PROJECT.md
@@ -0,0 +1,71 @@
+# SportsTime Data Pipeline
+
+## What This Is
+
+A Python data pipeline that scrapes, canonicalizes, and syncs sports schedule data to CloudKit for the SportsTime iOS app. The pipeline ensures every game correctly links to its home/away teams and stadium with complete, accurate data across MLB, NBA, NHL, and NFL.
+
+## Core Value
+
+Every game must correctly link to its teams and stadium — a game at the wrong venue or with broken team links ruins trip planning.
+
+## Requirements
+
+### Validated
+
+- ✓ Basic schedule scraping for MLB, NBA, NHL, NFL — existing
+- ✓ Canonical data models (stadiums, teams, games) — existing
+- ✓ CloudKit import capability — existing
+- ✓ Bundled JSON generation for offline-first — existing
+
+### Active
+
+- [ ] Split scripts by sport (MLB, NBA, NHL, NFL as separate modules)
+- [ ] Complete stadium database with correct coordinates and names
+- [ ] Stadium alias system for name variations across sources
+- [ ] Correct game→team→stadium canonical linking for all sports
+- [ ] Full CRUD CloudKit management (create, read, update, delete)
+- [ ] Validation reports showing counts, gaps, and orphan records
+- [ ] Team alias system for name variations across sources
+
+### Out of Scope
+
+- Real-time scores — this is schedule data, not live game tracking
+- Adding new sports (MLS, WNBA, etc.) — stabilize current 4 first
+- iOS app changes — this is purely backend/script work
+
+## Context
+
+**Current State:**
+- Data quality issues exist across all sports (wrong stadiums, missing games, broken team links)
+- Stadium problems include: missing venues, wrong coordinates, name mismatches between sources
+- Single large script files that are hard to debug and maintain
+- Existing CloudKit import works but lacks verification and CRUD operations
+
+**Existing Infrastructure:**
+- Python 3 with requests, beautifulsoup4, pandas, lxml
+- CloudKit server-to-server auth via cryptography package
+- Bundled JSON in `SportsTime/Resources/` for offline bootstrap
+- Data sources: Basketball-Reference, Baseball-Reference, Hockey-Reference, official APIs
+
+**iOS App Dependency:**
+- `AppDataProvider.shared` is single source of truth
+- SwiftData models: `CanonicalStadium`, `CanonicalTeam`, `CanonicalGame`
+- Domain models expect correct relationships via canonical IDs
+
+## Constraints
+
+- **Tech Stack**: Must remain Python (existing tooling, team familiarity)
+- **Data Sources**: Free/public APIs and sites only (no paid subscriptions)
+- **CloudKit**: Must use existing container (`iCloud.com.sportstime.app`)
+- **Compatibility**: Output must match existing Swift model expectations
+
+## Key Decisions
+
+| Decision | Rationale | Outcome |
+|----------|-----------|---------|
+| Split by sport, not function | User preference for organization | — Pending |
+| Validation reports over automated tests | Faster feedback, easier debugging | — Pending |
+| Full CRUD over upload-only | Enable data corrections without full rebuild | — Pending |
+
+---
+*Last updated: 2026-01-09 after initialization*
diff --git a/.planning/config.json b/.planning/config.json
new file mode 100644
index 0000000..ecb5f9e
--- /dev/null
+++ b/.planning/config.json
@@ -0,0 +1,18 @@
+{
+  "mode": "yolo",
+  "depth": "standard",
+  "gates": {
+    "confirm_project": false,
+    "confirm_phases": false,
+    "confirm_roadmap": false,
+    "confirm_breakdown": false,
+    "confirm_plan": false,
+    "execute_next_plan": false,
+    "issues_review": false,
+    "confirm_transition": false
+  },
+  "safety": {
+    "always_confirm_destructive": true,
+    "always_confirm_external_services": true
+  }
+}