Trey t
2026-01-19 22:12:53 -06:00
parent 11c0ae70d2
commit a8b0491571
19 changed files with 1328 additions and 525 deletions

docs/DATA_ARCHITECTURE.md
# SportsTime Data Architecture
> A plain-English guide for understanding how data flows through the app, from source to screen.
## Quick Summary
**The Big Picture:**
1. **Python scripts** scrape schedules from sports websites
2. **JSON files** get bundled into the app (so it works offline on day one)
3. **CloudKit** syncs updates in the background (so users get fresh data)
4. **AppDataProvider.shared** is the single source of truth the app uses
---
## Part 1: What Data Do We Have?
### Core Data Types
| Data Type | Description | Count | Update Frequency |
|-----------|-------------|-------|------------------|
| **Stadiums** | Venues where games are played | ~178 total | Rarely (new stadium every few years) |
| **Teams** | Professional sports teams | ~180 total | Rarely (expansion, relocation) |
| **Games** | Scheduled matches | ~5,000/season | Daily during season |
| **League Structure** | Conferences, divisions | ~50 entries | Rarely (realignment) |
| **Aliases** | Historical names (old stadium names, team relocations) | ~100 | As needed |
### Data by Sport
| Sport | Teams | Stadiums | Divisions | Conferences |
|-------|-------|----------|-----------|-------------|
| MLB | 30 | 30 | 6 | 2 |
| NBA | 30 | 30 | 6 | 2 |
| NHL | 32 | 32 | 4 | 2 |
| NFL | 32 | 30* | 8 | 2 |
| MLS | 30 | 30 | 0 | 2 |
| WNBA | 13 | 13 | 0 | 2 |
| NWSL | 13 | 13 | 0 | 0 |

\*NFL: Giants and Jets share MetLife Stadium; Rams and Chargers share SoFi Stadium.

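
The per-sport counts also work as a quick sanity check on the generated team file. Below is a minimal Python sketch (the data pipeline is already Python). It assumes `teams_canonical.json` is a JSON array whose records carry a `sport` field like the stadium record in Part 3; the team schema itself isn't shown in this guide. `count_mismatches` and `check_teams_file` are hypothetical helpers, not existing scripts:

```python
import json
from collections import Counter

# Expected team counts, copied from the "Data by Sport" table.
EXPECTED_TEAMS = {
    "MLB": 30, "NBA": 30, "NHL": 32, "NFL": 32,
    "MLS": 30, "WNBA": 13, "NWSL": 13,
}


def count_mismatches(records, expected):
    """Return {sport: (expected, actual)} for every sport whose count differs."""
    actual = Counter(r["sport"] for r in records)
    return {
        sport: (expected.get(sport, 0), actual.get(sport, 0))
        for sport in sorted(set(expected) | set(actual))
        if expected.get(sport, 0) != actual.get(sport, 0)
    }


def check_teams_file(path="SportsTime/Resources/teams_canonical.json"):
    """Print one line per mismatch; assumes the file is a JSON array of teams."""
    with open(path, encoding="utf-8") as f:
        teams = json.load(f)
    for sport, (want, got) in count_mismatches(teams, EXPECTED_TEAMS).items():
        print(f"{sport}: expected {want}, found {got}")
```
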
---
## Part 2: Where Does Data Live?
Data exists in **four places**, each serving a different purpose:
```
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  1. BUNDLED JSON FILES (App Bundle)                             │
│     └─ Ships with app, works offline on first launch            │
│                                                                 │
│  2. SWIFTDATA (Local Database)                                  │
│     └─ Fast local storage, persists between launches            │
│                                                                 │
│  3. CLOUDKIT (Apple's Cloud)                                    │
│     └─ Remote sync, shared across all users                     │
│                                                                 │
│  4. APPDATAPROVIDER (In-Memory Cache)                           │
│     └─ What the app actually uses at runtime                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### How They Work Together
```
                   ┌──────────────────┐
                   │  Python Scripts  │
                   │  (scrape data)   │
                   └────────┬─────────┘
                            │
             ┌──────────────┴──────────────┐
             │                             │
             ▼                             ▼
     ┌────────────────┐            ┌────────────────┐
     │   JSON Files   │            │    CloudKit    │
     │  (bundled in   │            │    (remote     │
     │      app)      │            │     sync)      │
     └───────┬────────┘            └───────┬────────┘
             │                             │
             │ First launch                │ Background sync
             │ bootstrap                   │ (ongoing)
             ▼                             ▼
     ┌──────────────────────────────────────────────┐
     │                  SwiftData                   │
     │               (local database)               │
     └───────────────────────┬───────────────────────┘
                            │ App reads from here
                            ▼
     ┌──────────────────────────────────────────────┐
     │            AppDataProvider.shared            │
     │           (single source of truth)           │
     └───────────────────────┬───────────────────────┘
                            │
                            ▼
     ┌──────────────────────────────────────────────┐
     │               Features & Views               │
     │   (Trip planning, Progress tracking, etc.)   │
     └──────────────────────────────────────────────┘
```
---
## Part 3: The JSON Files (What Ships With the App)
Located in: `SportsTime/Resources/`
### File Inventory
| File | Contains | Updated By |
|------|----------|------------|
| `stadiums_canonical.json` | All stadium data | `generate_missing_data.py` |
| `teams_canonical.json` | All team data | `generate_missing_data.py` |
| `games_canonical.json` | Game schedules | `scrape_schedules.py` |
| `league_structure.json` | Conferences & divisions | Manual edit |
| `stadium_aliases.json` | Historical stadium names | Manual edit |
| `team_aliases.json` | Historical team names | Manual edit |
### Why Bundle JSON?
1. **Offline-first**: App works immediately, even without internet
2. **Fast startup**: No network delay on first launch
3. **Fallback**: If CloudKit is down, app still functions
4. **Testing**: Deterministic data for development
### Example: Stadium Record
```json
{
  "canonical_id": "stadium_mlb_fenway_park",
  "name": "Fenway Park",
  "city": "Boston",
  "state": "Massachusetts",
  "latitude": 42.3467,
  "longitude": -71.0972,
  "capacity": 37755,
  "sport": "MLB",
  "year_opened": 1912,
  "primary_team_abbrevs": ["BOS"]
}
```
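
When scripts load these records, mapping them onto a typed object catches schema drift early. The sketch below uses a frozen dataclass. The class name and the coordinate checks are illustrative additions, not part of the existing scripts:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stadium:
    canonical_id: str
    name: str
    city: str
    state: str
    latitude: float
    longitude: float
    capacity: int
    sport: str
    year_opened: int
    primary_team_abbrevs: Tuple[str, ...]

    def __post_init__(self):
        # Cheap guards against swapped or mistyped coordinates.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"{self.canonical_id}: latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"{self.canonical_id}: longitude out of range")

    @classmethod
    def from_json(cls, raw: dict) -> "Stadium":
        # JSON keys map one-to-one onto fields, so a missing or extra key
        # raises TypeError. The list becomes a tuple so instances stay hashable.
        return cls(**{**raw, "primary_team_abbrevs": tuple(raw["primary_team_abbrevs"])})
```

`from_json` passes keys straight through. A renamed or extra JSON key therefore fails loudly instead of being silently ignored.
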
**Key concept: Canonical IDs**
- Every entity has a unique, permanent ID
- Format: `{type}_{sport}_{identifier}`
- Examples: `team_nba_lal`, `stadium_nfl_sofi_stadium`, `game_mlb_2026_20260401_bos_nyy`
- IDs never change, even if names do (that's what aliases are for)
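
The ID format is regular enough to build and split mechanically. In the Python sketch below, the slug rule is inferred from the examples above: lowercase, with runs of non-alphanumeric characters collapsed to `_`. It may not match exactly what the real scripts do:

```python
import re
from typing import NamedTuple


class CanonicalID(NamedTuple):
    entity_type: str   # "team", "stadium", "game", ...
    sport: str         # lowercase league code, e.g. "nba"
    identifier: str    # may itself contain underscores


def slugify(text: str) -> str:
    """Assumed slug rule: lowercase, runs of non-alphanumerics become one "_"."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def make_canonical_id(entity_type: str, sport: str, name: str) -> str:
    return f"{entity_type.lower()}_{sport.lower()}_{slugify(name)}"


def parse_canonical_id(cid: str) -> CanonicalID:
    # maxsplit=2: only the first two underscores are separators, so
    # "game_mlb_2026_20260401_bos_nyy" keeps "2026_20260401_bos_nyy" intact.
    parts = cid.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a canonical ID: {cid!r}")
    return CanonicalID(*parts)
```

For example, `make_canonical_id("stadium", "MLB", "Fenway Park")` gives `stadium_mlb_fenway_park`. IDs are permanent, so a generator like this should only run when an entity is first created. Re-deriving an ID from a renamed entity would break every reference to it.
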
---
## Part 4: The Python Scripts (How Data Gets Updated)
Located in: `Scripts/`
### Script Overview
| Script | Purpose | When to Use |
|--------|---------|-------------|
| `scrape_schedules.py` | Fetch game schedules from sports-reference sites | New season starts |
| `generate_missing_data.py` | Add teams/stadiums for new sports | Adding new league |
| `sportstime_parser/cli.py` | Full pipeline: scrape → normalize → upload | Production updates |
### Common Workflows
#### 1. Adding a New Season's Schedule
```bash
cd Scripts
# Scrape all sports for 2026 season
python scrape_schedules.py --sport all --season 2026
# Or one sport at a time
python scrape_schedules.py --sport nba --season 2026
python scrape_schedules.py --sport mlb --season 2026
```
**Output:** Updates `games_canonical.json` with new games
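
Each game has a stable canonical ID, so re-scraping a season can be treated as an upsert keyed on that ID rather than a blind append. The minimal sketch below assumes `games_canonical.json` is a JSON array of objects with a `canonical_id` field. How `scrape_schedules.py` actually merges is not documented here:

```python
from typing import List


def upsert_games(existing: List[dict], scraped: List[dict]) -> List[dict]:
    """Merge scraped games into existing ones, keyed by canonical_id.

    A scraped record replaces the existing record with the same ID (say, a
    moved start time). Unseen IDs are added, and games missing from the
    scrape are left alone. Output is sorted by ID so diffs stay small.
    """
    by_id = {g["canonical_id"]: g for g in existing}
    for game in scraped:
        by_id[game["canonical_id"]] = game
    return sorted(by_id.values(), key=lambda g: g["canonical_id"])
```

This sketch deliberately keeps games that are absent from a new scrape, so a scraper hiccup can't silently drop real games. Cancellations would go through the soft-delete flag instead.
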
#### 2. Adding a New Sport or League
```bash
cd Scripts
# Generate missing teams/stadiums
python generate_missing_data.py
# This will:
# 1. Check existing teams_canonical.json
# 2. Add missing teams with proper IDs
# 3. Add missing stadiums
# 4. Update conference/division assignments
```
**After running:** Copy output files to Resources folder:
```bash
cp output/teams_canonical.json ../SportsTime/Resources/
cp output/stadiums_canonical.json ../SportsTime/Resources/
```
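
A plain `cp` will happily copy a truncated or malformed file into the app bundle. The sketch below is a safer copy step: it parses every file first and copies nothing if any file fails. `copy_validated` is a hypothetical helper, and the "non-empty JSON array" check assumes that is the shape of these files:

```python
import json
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_validated(src_dir: Path, dest_dir: Path, names: Iterable[str]) -> List[Path]:
    """Copy the named JSON files only if every one parses as a non-empty array.

    All files are checked before any is copied, so a bad file is caught
    before anything in the destination is overwritten.
    """
    sources = [Path(src_dir) / name for name in names]
    for path in sources:
        with path.open(encoding="utf-8") as f:
            data = json.load(f)  # raises json.JSONDecodeError on malformed JSON
        if not isinstance(data, list) or not data:
            raise ValueError(f"{path} is not a non-empty JSON array")
    return [Path(shutil.copy2(path, Path(dest_dir) / path.name)) for path in sources]
```

From `Scripts/`, the call would be `copy_validated(Path("output"), Path("../SportsTime/Resources"), ["teams_canonical.json", "stadiums_canonical.json"])`.
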
#### 3. Updating League Structure (Manual)
Edit `SportsTime/Resources/league_structure.json` directly:
```json
{
  "id": "nfl_afc_east",
  "sport": "NFL",
  "type": "division",
  "name": "AFC East",
  "abbreviation": "AFC East",
  "parent_id": "nfl_afc",
  "display_order": 1
}
```
**Also update:** `SportsTime/Core/Models/Domain/Division.swift` to match
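
`parent_id` links divisions to conferences, so a broken link is easy to detect before it ships. The minimal checker below assumes `league_structure.json` is a JSON array of entries shaped like the one above, and that top-level conferences have no `parent_id` (or `null`). It does not check `Division.swift`; keeping that file in sync remains manual:

```python
from collections import Counter
from typing import List


def validate_structure(entries: List[dict]) -> List[str]:
    """Return a list of problems; an empty list means the links look consistent."""
    id_counts = Counter(e["id"] for e in entries)
    problems = [f"duplicate id {i!r}" for i, n in id_counts.items() if n > 1]
    by_id = {e["id"]: e for e in entries}
    for e in entries:
        parent_id = e.get("parent_id")
        if parent_id is None:
            continue  # assumed: top-level conferences carry no parent
        parent = by_id.get(parent_id)
        if parent is None:
            problems.append(f"{e['id']}: unknown parent {parent_id!r}")
        elif parent["sport"] != e["sport"]:
            problems.append(f"{e['id']}: parent {parent_id!r} belongs to {parent['sport']}")
        elif e["type"] == "division" and parent["type"] != "conference":
            problems.append(f"{e['id']}: a division's parent must be a conference")
    return problems
```
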
#### 4. Uploading to CloudKit (Production)
```bash
cd Scripts
# Upload all canonical data to CloudKit
python -m sportstime_parser upload all
# Or specific entity type
python -m sportstime_parser upload games
python -m sportstime_parser upload teams
```
**Requires:** CloudKit credentials configured in `sportstime_parser/config.py`

---
## Part 5: How the App Uses This Data
### Startup Flow
```
1. App launches
2. Check: Is this first launch?
3. YES → BootstrapService loads bundled JSON into SwiftData
   NO  → Skip to step 4
4. AppDataProvider.configure(modelContext)
5. AppDataProvider.loadInitialData()
   - Reads from SwiftData
   - Populates in-memory cache
6. App ready! (works offline)
7. Background: CanonicalSyncService.syncAll()
   - Fetches updates from CloudKit
   - Merges into SwiftData
   - Refreshes AppDataProvider cache
```
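
The branching in steps 2 to 7 can be sketched in Python, with the services replaced by plain callables. This mirrors the flow above, not the Swift code. Reading "first launch" as "`bootstrapCompleted` is false" is an assumption based on the SyncState fields in Part 8:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyncState:
    bootstrap_completed: bool = False


def launch(state: SyncState,
           bootstrap: Callable[[], None],
           load_cache: Callable[[], None],
           start_background_sync: Callable[[], None]) -> None:
    # Steps 2-3: load bundled JSON into the local store only if that has
    # never completed before.
    if not state.bootstrap_completed:
        bootstrap()
        state.bootstrap_completed = True  # only reached if bootstrap() returned
    # Steps 4-5: the in-memory cache is always filled from the local store.
    load_cache()
    # Step 6: the app is usable at this point, before any network call.
    # Step 7: hand off to the sync; this callable is expected to return
    # immediately (e.g. after scheduling a background task).
    start_background_sync()
```

The flag is set only after `bootstrap()` returns. If a launch crashes mid-bootstrap, the next launch simply retries it.
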
### Accessing Data in Code
```swift
// CORRECT: always read canonical data through AppDataProvider
let stadiums = AppDataProvider.shared.stadiums
let teams = AppDataProvider.shared.teams
let start = Date()
let end = Calendar.current.date(byAdding: .day, value: 30, to: start)!
let games = try await AppDataProvider.shared.filterGames(
    sports: [.nba, .mlb],
    startDate: start,
    endDate: end
)

// WRONG: never bypass AppDataProvider (commented out here; it would
// also redeclare `stadiums`)
// let stadiums = try await CloudKitService.shared.fetchStadiums()
```
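
This guide doesn't spell out the exact semantics of `filterGames`, such as whether `endDate` is inclusive or which time zone dates use. As an illustration only, here is a Python sketch of one plausible reading: sport membership plus a half-open `[start, end)` window, sorted by start time. The `Game` type is a stand-in, not the app's `Game.swift`:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Game:
    canonical_id: str
    sport: str        # e.g. "MLB"
    start: datetime


def filter_games(games: Iterable[Game], sports: Iterable[str],
                 start: datetime, end: datetime) -> List[Game]:
    """Games in any of `sports` with start <= game.start < end, earliest first."""
    wanted = {s.upper() for s in sports}
    return sorted(
        (g for g in games if g.sport.upper() in wanted and start <= g.start < end),
        key=lambda g: g.start,
    )
```

A half-open window lets consecutive 30-day queries tile without counting a game twice.
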
---
## Part 6: What Can Be Updated and How
### Update Matrix
| What | Bundled JSON | CloudKit | Swift Code | Requires App Update |
|------|-------------|----------|------------|---------------------|
| Add new games | ✅ | ✅ | No | No (CloudKit sync) |
| Change stadium name | ✅ + alias | ✅ | No | No (CloudKit sync) |
| Add new team | ✅ | ✅ | No | No (CloudKit sync) |
| Add new sport | ✅ | ✅ | ✅ Division.swift | **YES** |
| Add division | ✅ | ✅ | ✅ Division.swift | **YES** |
| Add achievements | N/A | N/A | ✅ AchievementDefinitions.swift | **YES** |
| Change team colors | ✅ | ✅ | No | No (CloudKit sync) |
### Why Some Changes Need App Updates
**CloudKit CAN handle:**
- New records (games, teams, stadiums)
- Updated records (name changes, etc.)
- Soft deletes (deprecation flags)
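
A delta sync only ever has to apply these three kinds of change. The merge-step sketch below assumes each incoming record carries `canonical_id`, a `modified_at` timestamp and an optional `is_deprecated` flag. Those field names are hypothetical; the real record types live in `CKModels.swift`:

```python
from datetime import datetime
from typing import Dict, List, Optional


def apply_delta(local: Dict[str, dict], changes: List[dict],
                last_sync: Optional[datetime]) -> Optional[datetime]:
    """Upsert changed records into `local` (canonical_id -> record), in place.

    Returns the newest `modified_at` seen, to store as the next sync cursor.
    """
    cursor = last_sync
    for rec in changes:
        # Re-applying a record is harmless, so only strictly older ones are skipped.
        if last_sync is not None and rec["modified_at"] < last_sync:
            continue
        # New and updated records are the same operation. A soft-deleted record
        # is stored too, flag and all, so old references to its ID still resolve.
        local[rec["canonical_id"]] = rec
        if cursor is None or rec["modified_at"] > cursor:
            cursor = rec["modified_at"]
    return cursor


def active_records(local: Dict[str, dict]) -> Dict[str, dict]:
    """What the UI would show: everything not flagged as deprecated."""
    return {cid: rec for cid, rec in local.items() if not rec.get("is_deprecated", False)}
```
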
**CloudKit CANNOT handle:**
- New enum cases in Swift (Sport enum)
- New achievement definitions (compiled into app)
- UI for new sports (views reference Sport enum)
- Division/Conference structure (static in Division.swift)
### The "New Sport" Problem
If you add a new sport via CloudKit only:
1. ❌ App won't recognize the sport enum value
2. ❌ No achievements defined for that sport
3. ❌ UI filters won't show the sport
4. ❌ Division.swift won't have the structure
**Solution:** New sports require:
1. Add Sport enum case
2. Add Division.swift entries
3. Add AchievementDefinitions entries
4. Bundle JSON with initial data
5. Ship app update
6. THEN CloudKit can sync ongoing changes
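
Point 1 is the root of the problem. A compiled enum has a fixed set of cases, so a raw value it has never seen can only be dropped or mapped to a fallback. The Swift `Sport` enum isn't reproduced in this guide, so the sketch below shows the same behaviour with a Python `Enum`. The case list and the skip-unknown policy are assumptions:

```python
from enum import Enum
from typing import List, Tuple


class Sport(Enum):
    # Illustrative case list matching the leagues in Part 1.
    MLB = "MLB"
    NBA = "NBA"
    NHL = "NHL"
    NFL = "NFL"
    MLS = "MLS"
    WNBA = "WNBA"
    NWSL = "NWSL"


def decode_records(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split synced records into (usable, skipped) by whether `sport` decodes."""
    usable, skipped = [], []
    for rec in records:
        try:
            sport = Sport(rec["sport"])
        except ValueError:
            # Counterpart of Sport(rawValue:) returning nil, assuming the Swift
            # enum is String-backed: this build has no case for the value.
            skipped.append(rec)
            continue
        usable.append({**rec, "sport": sport})
    return usable, skipped
```

Syncing new CloudKit data can't grow the case list. Only an app update can.
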
---
## Part 7: Aliases (Handling Historical Changes)
### Stadium Aliases
When a stadium is renamed (e.g., "Candlestick Park" → "3Com Park" → "Monster Park"), its earlier names are recorded in `stadium_aliases.json` with the date range each was in use:
```json
[
  {
    "alias_name": "Candlestick Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1960-01-01",
    "valid_until": "1995-12-31"
  },
  {
    "alias_name": "3Com Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1996-01-01",
    "valid_until": "2002-12-31"
  }
]
```
**Why this matters:**
- Old photos tagged "Candlestick Park" still resolve to correct stadium
- Historical game data uses name from that era
- User visits tracked correctly regardless of current name
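
Resolving a name therefore needs a date as well as a string. The minimal lookup sketch below works over records shaped like `stadium_aliases.json`. `current_names` (today's name mapped to its canonical ID, for instance built from `stadiums_canonical.json`) is a hypothetical input. Treating both date bounds as inclusive is an assumption that fits the contiguous ranges above:

```python
from datetime import date
from typing import Dict, List, Optional


def resolve_stadium(name: str, on: date, aliases: List[dict],
                    current_names: Dict[str, str]) -> Optional[str]:
    """Map a stadium name as it was used on date `on` to a canonical ID, or None."""
    key = name.casefold()
    for alias in aliases:
        if alias["alias_name"].casefold() != key:
            continue
        start = date.fromisoformat(alias["valid_from"])
        # Assumption: a missing valid_until means the alias is still current.
        end = date.fromisoformat(alias["valid_until"]) if alias.get("valid_until") else date.max
        if start <= on <= end:
            return alias["stadium_canonical_id"]
    # No alias covers that name on that date: fall back to today's names.
    for current, canonical_id in current_names.items():
        if current.casefold() == key:
            return canonical_id
    return None
```

Team aliases can be resolved the same way, matching on `alias_value` and returning `team_canonical_id`.
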
### Team Aliases
When teams relocate or rebrand, the former name is recorded in `team_aliases.json`:
```json
{
  "id": "alias_nba_sea_supersonics",
  "team_canonical_id": "team_nba_okc",
  "alias_type": "name",
  "alias_value": "Seattle SuperSonics",
  "valid_from": "1967-01-01",
  "valid_until": "2008-06-30"
}
```
---
## Part 8: Sync State & Offline Behavior
### SyncState Tracking
The app tracks sync status in SwiftData:
```swift
import Foundation

struct SyncState {
    var bootstrapCompleted: Bool     // Has initial JSON been loaded?
    var lastSuccessfulSync: Date?    // When did CloudKit last succeed?
    var syncInProgress: Bool         // Is sync running now?
    var consecutiveFailures: Int     // How many failures in a row?
}
```
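
This guide doesn't say how `consecutiveFailures` is used. One common policy is exponential backoff with a cap, with the counter reset after a successful sync. The sketch below shows that policy purely as a hypothetical; the 5-minute base and 12-hour cap are made-up tuning values:

```python
from datetime import datetime, timedelta

BASE_DELAY = timedelta(minutes=5)   # hypothetical tuning values
MAX_DELAY = timedelta(hours=12)


def next_sync_allowed(last_attempt: datetime, consecutive_failures: int) -> datetime:
    """Earliest retry time: no delay after a success, then 5, 10, 20, ... minutes, capped."""
    if consecutive_failures <= 0:
        return last_attempt
    # Clamp the exponent so the multiplication can't overflow timedelta.
    exponent = min(consecutive_failures - 1, 16)
    return last_attempt + min(BASE_DELAY * 2 ** exponent, MAX_DELAY)


def should_sync(now: datetime, last_attempt: datetime,
                consecutive_failures: int, sync_in_progress: bool) -> bool:
    """Skip if a sync is already running or the backoff window hasn't passed."""
    return not sync_in_progress and now >= next_sync_allowed(last_attempt, consecutive_failures)
```
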
### Offline Scenarios
| Scenario | Behavior |
|----------|----------|
| First launch, no internet | Bootstrap from bundled JSON, app works |
| Returning user, no internet | Uses last synced SwiftData, app works |
| CloudKit partially fails | Partial sync saved, retry later |
| CloudKit down for days | App continues with local data |
| User deletes app, reinstalls | Fresh bootstrap from bundled JSON |
---
## Part 9: Quick Reference - File Locations
### JSON Data Files
```
SportsTime/Resources/
├── stadiums_canonical.json # Stadium metadata
├── teams_canonical.json # Team metadata
├── games_canonical.json # Game schedules
├── league_structure.json # Divisions & conferences
├── stadium_aliases.json # Historical stadium names
└── team_aliases.json # Historical team names
```
### Swift Code
```
SportsTime/Core/
├── Models/
│   ├── Domain/
│   │   ├── Stadium.swift                 # Stadium struct
│   │   ├── Team.swift                    # Team struct
│   │   ├── Game.swift                    # Game struct
│   │   ├── Division.swift                # LeagueStructure + static data
│   │   └── AchievementDefinitions.swift  # Achievement registry
│   ├── Local/
│   │   └── CanonicalModels.swift         # SwiftData models
│   └── CloudKit/
│       └── CKModels.swift                # CloudKit record types
└── Services/
    ├── DataProvider.swift                # AppDataProvider (source of truth)
    ├── BootstrapService.swift            # First-launch JSON → SwiftData
    ├── CanonicalSyncService.swift        # CloudKit → SwiftData sync
    └── CloudKitService.swift             # CloudKit API wrapper
```
### Python Scripts
```
Scripts/
├── scrape_schedules.py # Game schedule scraper
├── generate_missing_data.py # Team/stadium generator
├── sportstime_parser/
│ ├── cli.py # Main CLI
│ ├── scrapers/ # Sport-specific scrapers
│ ├── normalizers/ # Data standardization
│ └── uploaders/ # CloudKit upload
└── output/ # Generated JSON files
```
---
## Part 10: Common Tasks Checklist
### Before a New Season
- [ ] Run `scrape_schedules.py --sport all --season YYYY`
- [ ] Verify `games_canonical.json` has expected game count
- [ ] Copy to Resources folder
- [ ] Test app locally
- [ ] Upload to CloudKit: `python -m sportstime_parser upload games`
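
The game-count check can be scripted. The sketch below counts games per sport for one season and flags duplicate IDs. It assumes game IDs follow the `game_mlb_2026_20260401_bos_nyy` pattern, with the season as the third segment. What counts as "expected" is league-specific; for reference, a full MLB regular season (30 teams, 162 games each) is 2,430 games:

```python
from collections import Counter
from typing import Dict, List, Tuple


def season_report(games: List[dict], season: int) -> Tuple[Dict[str, int], List[str]]:
    """Per-sport game counts for `season`, plus any canonical IDs that appear twice."""
    ids = [g["canonical_id"] for g in games]
    duplicates = sorted(cid for cid, n in Counter(ids).items() if n > 1)
    per_sport = Counter()
    for cid in set(ids):
        # Assumed layout: game_{sport}_{season}_{rest of identifier}
        parts = cid.split("_", 3)
        if len(parts) == 4 and parts[0] == "game" and parts[2] == str(season):
            per_sport[parts[1].upper()] += 1
    return dict(per_sport), duplicates
```
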
### Adding a New Stadium
1. [ ] Add to `stadiums_canonical.json`
2. [ ] Add team reference in `teams_canonical.json`
3. [ ] Copy both to Resources folder
4. [ ] Upload to CloudKit
### Stadium Renamed
1. [ ] Add alias to `stadium_aliases.json` with date range
2. [ ] Update stadium name in `stadiums_canonical.json`
3. [ ] Copy both to Resources folder
4. [ ] Upload to CloudKit
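
Steps 1 and 2 are easy to get out of sync by hand. The sketch below performs both in one place on the parsed JSON lists. `rename_stadium` is hypothetical. The caller supplies the date the old name came into use, because this guide doesn't say where that date comes from:

```python
from datetime import date, timedelta
from typing import List


def rename_stadium(stadiums: List[dict], aliases: List[dict], canonical_id: str,
                   new_name: str, effective: date, old_name_since: date) -> None:
    """Apply checklist steps 1-2 in place: alias the old name, then rename."""
    matches = [s for s in stadiums if s["canonical_id"] == canonical_id]
    if not matches:
        raise KeyError(canonical_id)
    if effective <= old_name_since:
        raise ValueError("rename must take effect after the old name started")
    stadium = matches[0]
    aliases.append({
        "alias_name": stadium["name"],
        "stadium_canonical_id": canonical_id,   # the ID itself never changes
        "valid_from": old_name_since.isoformat(),
        # The old name stops applying the day before the new one takes effect.
        "valid_until": (effective - timedelta(days=1)).isoformat(),
    })
    stadium["name"] = new_name
```

For example, renaming Candlestick Park to 3Com Park effective 1996-01-01, with `old_name_since=date(1960, 1, 1)`, produces the first alias record shown in Part 7.
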
### Adding a New Sport (App Update Required)
1. [ ] Add Sport enum case in `Sport.swift`
2. [ ] Add divisions/conferences to `Division.swift`
3. [ ] Add achievements to `AchievementDefinitions.swift`
4. [ ] Add teams to `teams_canonical.json`
5. [ ] Add stadiums to `stadiums_canonical.json`
6. [ ] Add league structure to `league_structure.json`
7. [ ] Run `generate_missing_data.py` to validate
8. [ ] Copy all JSON to Resources folder
9. [ ] Build and test
10. [ ] Ship app update
11. [ ] Upload data to CloudKit for sync
---
## Glossary
| Term | Definition |
|------|------------|
| **Canonical ID** | Permanent, unique identifier for an entity (e.g., `stadium_mlb_fenway_park`) |
| **Bootstrap** | First-launch process that loads bundled JSON into SwiftData |
| **Delta sync** | Only fetching changes since last sync (not full data) |
| **AppDataProvider** | The single source of truth for all canonical data in the app |
| **SwiftData** | Apple's Swift-native local persistence framework (introduced in iOS 17; a modern alternative to Core Data) |
| **CloudKit** | Apple's cloud database service |
| **Alias** | Historical name mapping (old name → canonical ID) |
| **Soft delete** | Marking record as deprecated instead of actually deleting |
---
*Last updated: January 2026*