# SportsTime Data Architecture

> A plain-English guide for understanding how data flows through the app, from source to screen.

## Quick Summary

**The Big Picture:**
1. **Python scripts** scrape schedules from sports websites
2. **JSON files** get bundled into the app (so it works offline on day one)
3. **CloudKit** syncs updates in the background (so users get fresh data)
4. **AppDataProvider.shared** is the single source of truth the app uses

---

## Part 1: What Data Do We Have?

### Core Data Types

| Data Type | Description | Count | Update Frequency |
|-----------|-------------|-------|------------------|
| **Stadiums** | Venues where games are played | ~178 total | Rarely (new stadium every few years) |
| **Teams** | Professional sports teams | ~180 total | Rarely (expansion, relocation) |
| **Games** | Scheduled matches | ~5,000/season | Daily during season |
| **League Structure** | Conferences, divisions | ~50 entries | Rarely (realignment) |
| **Aliases** | Historical names (old stadium names, team relocations) | ~100 | As needed |

### Data by Sport

| Sport | Teams | Stadiums | Divisions | Conferences |
|-------|-------|----------|-----------|-------------|
| MLB | 30 | 30 | 6 | 2 |
| NBA | 30 | 30 | 6 | 2 |
| NHL | 32 | 32 | 4 | 2 |
| NFL | 32 | 30* | 8 | 2 |
| MLS | 30 | 30 | 0 | 2 |
| WNBA | 13 | 13 | 0 | 2 |
| NWSL | 13 | 13 | 0 | 0 |

*NFL: Giants/Jets share MetLife Stadium; Rams/Chargers share SoFi Stadium

---

## Part 2: Where Does Data Live?

Data exists in **four places**, each serving a different purpose:

```
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  1. BUNDLED JSON FILES (App Bundle)                             │
│     └─ Ships with app, works offline on first launch            │
│                                                                 │
│  2. SWIFTDATA (Local Database)                                  │
│     └─ Fast local storage, persists between launches            │
│                                                                 │
│  3. CLOUDKIT (Apple's Cloud)                                    │
│     └─ Remote sync, shared across all users                     │
│                                                                 │
│  4. APPDATAPROVIDER (In-Memory Cache)                           │
│     └─ What the app actually uses at runtime                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### How They Work Together

```
              ┌──────────────────┐
              │  Python Scripts  │
              │  (scrape data)   │
              └────────┬─────────┘
                       │
        ┌──────────────┴──────────────┐
        │                             │
        ▼                             ▼
┌────────────────┐            ┌────────────────┐
│   JSON Files   │            │    CloudKit    │
│  (bundled in   │            │    (remote     │
│      app)      │            │     sync)      │
└───────┬────────┘            └───────┬────────┘
        │                             │
        │  First launch               │  Background sync
        │  bootstrap                  │  (ongoing)
        ▼                             ▼
┌─────────────────────────────────────────────┐
│                  SwiftData                  │
│              (local database)               │
└────────────────────┬────────────────────────┘
                     │
                     │  App reads from here
                     ▼
┌─────────────────────────────────────────────┐
│           AppDataProvider.shared            │
│          (single source of truth)           │
└────────────────────┬────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────┐
│              Features & Views               │
│  (Trip planning, Progress tracking, etc.)   │
└─────────────────────────────────────────────┘
```

---

## Part 3: The JSON Files (What Ships With the App)

Located in: `SportsTime/Resources/`

### File Inventory

| File | Contains | Updated By |
|------|----------|------------|
| `stadiums_canonical.json` | All stadium data | `generate_missing_data.py` |
| `teams_canonical.json` | All team data | `generate_missing_data.py` |
| `games_canonical.json` | Game schedules | `scrape_schedules.py` |
| `league_structure.json` | Conferences & divisions | Manual edit |
| `stadium_aliases.json` | Historical stadium names | Manual edit |
| `team_aliases.json` | Historical team names | Manual edit |

### Why Bundle JSON?

1. **Offline-first**: App works immediately, even without internet
2. **Fast startup**: No network delay on first launch
3. **Fallback**: If CloudKit is down, app still functions
4. **Testing**: Deterministic data for development

### Example: Stadium Record

```json
{
  "canonical_id": "stadium_mlb_fenway_park",
  "name": "Fenway Park",
  "city": "Boston",
  "state": "Massachusetts",
  "latitude": 42.3467,
  "longitude": -71.0972,
  "capacity": 37755,
  "sport": "MLB",
  "year_opened": 1912,
  "primary_team_abbrevs": ["BOS"]
}
```
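
A record like this can be sanity-checked before it is bundled. The following is a minimal sketch, assuming only the field names shown in the example; `validate_stadium` and the specific checks are hypothetical and are not part of the existing scripts.

```python
import json

# Field names come from the example record; the type expectations are assumptions.
REQUIRED_FIELDS = {
    "canonical_id": str,
    "name": str,
    "city": str,
    "state": str,
    "latitude": (int, float),
    "longitude": (int, float),
    "capacity": int,
    "sport": str,
    "year_opened": int,
    "primary_team_abbrevs": list,
}


def validate_stadium(record: dict) -> list:
    """Return a list of problems found in one stadium record (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    # Coordinates must be valid degrees of latitude and longitude.
    lat, lon = record.get("latitude"), record.get("longitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    # The ID prefix should agree with the record's sport.
    cid, sport = record.get("canonical_id"), record.get("sport")
    if isinstance(cid, str) and isinstance(sport, str):
        if not cid.startswith(f"stadium_{sport.lower()}_"):
            problems.append("canonical_id does not match sport")
    return problems


fenway = json.loads("""
{
  "canonical_id": "stadium_mlb_fenway_park",
  "name": "Fenway Park",
  "city": "Boston",
  "state": "Massachusetts",
  "latitude": 42.3467,
  "longitude": -71.0972,
  "capacity": 37755,
  "sport": "MLB",
  "year_opened": 1912,
  "primary_team_abbrevs": ["BOS"]
}
""")
print(validate_stadium(fenway))  # prints []
```

A check of this kind could run before files are copied into `SportsTime/Resources/`, so a malformed record is caught before it ships.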

**Key concept: Canonical IDs**
- Every entity has a unique, permanent ID
- Format: `{type}_{sport}_{identifier}`
- Examples: `team_nba_lal`, `stadium_nfl_sofi_stadium`, `game_mlb_2026_20260401_bos_nyy`
- IDs never change, even if names do (that's what aliases are for)
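
The `{type}_{sport}_{identifier}` format can be taken apart mechanically. The sketch below is one way to do it; `CanonicalID` and `parse_canonical_id` are hypothetical names, not functions from the existing scripts. It relies on the identifier being the only part that may itself contain underscores, as in the game ID example.

```python
from typing import NamedTuple


class CanonicalID(NamedTuple):
    entity_type: str
    sport: str
    identifier: str


def parse_canonical_id(canonical_id: str) -> CanonicalID:
    """Split '{type}_{sport}_{identifier}' into its three parts."""
    # maxsplit=2 keeps any underscores inside the identifier intact.
    parts = canonical_id.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed canonical ID: {canonical_id!r}")
    return CanonicalID(*parts)


for cid in ("team_nba_lal", "stadium_nfl_sofi_stadium", "game_mlb_2026_20260401_bos_nyy"):
    print(parse_canonical_id(cid))
```

Splitting only on the first two underscores is the design choice that matters here: a plain `split("_")` would break `sofi_stadium` into two pieces.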

---

## Part 4: The Python Scripts (How Data Gets Updated)

Located in: `Scripts/`

### Script Overview

| Script | Purpose | When to Use |
|--------|---------|-------------|
| `scrape_schedules.py` | Fetch game schedules from sports-reference sites | New season starts |
| `generate_missing_data.py` | Add teams/stadiums for new sports | Adding new league |
| `sportstime_parser/cli.py` | Full pipeline: scrape → normalize → upload | Production updates |

### Common Workflows

#### 1. Adding a New Season's Schedule

```bash
cd Scripts

# Scrape all sports for 2026 season
python scrape_schedules.py --sport all --season 2026

# Or one sport at a time
python scrape_schedules.py --sport nba --season 2026
python scrape_schedules.py --sport mlb --season 2026
```

**Output:** Updates `games_canonical.json` with new games

#### 2. Adding a New Sport or League

```bash
cd Scripts

# Generate missing teams/stadiums
python generate_missing_data.py

# This will:
# 1. Check existing teams_canonical.json
# 2. Add missing teams with proper IDs
# 3. Add missing stadiums
# 4. Update conference/division assignments
```

**After running:** Copy output files to Resources folder:
```bash
cp output/teams_canonical.json ../SportsTime/Resources/
cp output/stadiums_canonical.json ../SportsTime/Resources/
```

#### 3. Updating League Structure (Manual)

Edit `SportsTime/Resources/league_structure.json` directly:

```json
{
  "id": "nfl_afc_east",
  "sport": "NFL",
  "type": "division",
  "name": "AFC East",
  "abbreviation": "AFC East",
  "parent_id": "nfl_afc",
  "display_order": 1
}
```

**Also update:** `SportsTime/Core/Models/Domain/Division.swift` to match
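
Because each entry points at its parent through `parent_id`, a manual edit can leave a division attached to a conference that does not exist. The sketch below groups entries by parent and flags such orphans. Only the division entry's shape appears in this guide; the conference entry (with `"type": "conference"` and a null `parent_id`), the second division, and both function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data modeled on the entry shown above.
entries = [
    {"id": "nfl_afc_north", "sport": "NFL", "type": "division",
     "name": "AFC North", "parent_id": "nfl_afc", "display_order": 2},
    {"id": "nfl_afc_east", "sport": "NFL", "type": "division",
     "name": "AFC East", "parent_id": "nfl_afc", "display_order": 1},
    {"id": "nfl_afc", "sport": "NFL", "type": "conference",
     "name": "AFC", "parent_id": None, "display_order": 1},
]


def children_by_parent(entries):
    """Group entries under their parent_id, each group sorted by display_order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.get("parent_id")].append(entry)
    for siblings in grouped.values():
        siblings.sort(key=lambda e: e["display_order"])
    return grouped


def find_orphans(entries):
    """Return IDs whose parent_id does not match any entry's id."""
    valid_parents = {e["id"] for e in entries} | {None}
    return [e["id"] for e in entries if e.get("parent_id") not in valid_parents]


tree = children_by_parent(entries)
print([e["name"] for e in tree["nfl_afc"]])
print(find_orphans(entries))
```

Running a check like `find_orphans` after a hand edit is cheap insurance, since nothing else validates `league_structure.json` in the workflow described here.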

#### 4. Uploading to CloudKit (Production)

```bash
cd Scripts

# Upload all canonical data to CloudKit
python -m sportstime_parser upload all

# Or specific entity type
python -m sportstime_parser upload games
python -m sportstime_parser upload teams
```

**Requires:** CloudKit credentials configured in `sportstime_parser/config.py`

---

## Part 5: How the App Uses This Data

### Startup Flow

```
1. App launches
       ↓
2. Check: Is this first launch?
       ↓
3. YES → BootstrapService loads bundled JSON into SwiftData
   NO  → Skip to step 4
       ↓
4. AppDataProvider.configure(modelContext)
       ↓
5. AppDataProvider.loadInitialData()
   - Reads from SwiftData
   - Populates in-memory cache
       ↓
6. App ready! (works offline)
       ↓
7. Background: CanonicalSyncService.syncAll()
   - Fetches updates from CloudKit
   - Merges into SwiftData
   - Refreshes AppDataProvider cache
```
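
Step 7 is essentially an upsert keyed by canonical ID followed by a cache rebuild. The app itself is written in Swift and its merge code is not shown in this guide, so the following Python sketch only illustrates the idea. It assumes the remote record always replaces the local one, and it uses a hypothetical `is_deprecated` flag to stand in for soft deletes; the second stadium record is invented sample data.

```python
def merge_updates(local: dict, updates: list) -> dict:
    """Upsert remote records into a copy of the local store, keyed by canonical_id."""
    merged = dict(local)
    for record in updates:
        # Assumption: the remote copy wins whenever both sides have the record.
        merged[record["canonical_id"]] = record
    return merged


def build_cache(store: dict) -> dict:
    """Rebuild the in-memory view, hiding records flagged as deprecated."""
    return {
        cid: record
        for cid, record in store.items()
        if not record.get("is_deprecated", False)  # hypothetical soft-delete flag
    }


local_store = {
    "stadium_mlb_fenway_park": {"canonical_id": "stadium_mlb_fenway_park", "name": "Fenway Park"},
    "stadium_mlb_example_field": {"canonical_id": "stadium_mlb_example_field", "name": "Example Field"},
}
remote_updates = [
    {"canonical_id": "stadium_mlb_example_field", "name": "Example Field", "is_deprecated": True},
    {"canonical_id": "team_nba_lal", "name": "Los Angeles Lakers"},
]

local_store = merge_updates(local_store, remote_updates)
cache = build_cache(local_store)
print(sorted(cache))  # prints ['stadium_mlb_fenway_park', 'team_nba_lal']
```

Keeping the deprecated record in the store while hiding it from the cache mirrors the soft-delete idea: nothing that references the old ID breaks, but the record no longer appears in the app.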

### Accessing Data in Code

```swift
// ✅ CORRECT - Always use AppDataProvider
let stadiums = AppDataProvider.shared.stadiums
let teams = AppDataProvider.shared.teams
let games = try await AppDataProvider.shared.filterGames(
    sports: [.nba, .mlb],
    startDate: Date(),
    endDate: Date().addingTimeInterval(86400 * 30)  // 30 days from now
)

// ❌ WRONG - Never bypass AppDataProvider
let remoteStadiums = try await CloudKitService.shared.fetchStadiums()
```

---

## Part 6: What Can Be Updated and How

### Update Matrix

| What | Bundled JSON | CloudKit | Swift Code | Requires App Update |
|------|-------------|----------|------------|---------------------|
| Add new games | ✅ | ✅ | No | No (CloudKit sync) |
| Change stadium name | ✅ + alias | ✅ | No | No (CloudKit sync) |
| Add new team | ✅ | ✅ | No | No (CloudKit sync) |
| Add new sport | ✅ | ✅ | ✅ Division.swift | **YES** |
| Add division | ✅ | ✅ | ✅ Division.swift | **YES** |
| Add achievements | N/A | N/A | ✅ AchievementDefinitions.swift | **YES** |
| Change team colors | ✅ | ✅ | No | No (CloudKit sync) |

### Why Some Changes Need App Updates

**CloudKit CAN handle:**
- New records (games, teams, stadiums)
- Updated records (name changes, etc.)
- Soft deletes (deprecation flags)

**CloudKit CANNOT handle:**
- New enum cases in Swift (Sport enum)
- New achievement definitions (compiled into app)
- UI for new sports (views reference Sport enum)
- Division/Conference structure (static in Division.swift)

### The "New Sport" Problem

If you add a new sport via CloudKit only:
1. ❌ App won't recognize the sport enum value
2. ❌ No achievements defined for that sport
3. ❌ UI filters won't show the sport
4. ❌ Division.swift won't have the structure

**Solution:** New sports require:
1. Add Sport enum case
2. Add Division.swift entries
3. Add AchievementDefinitions entries
4. Bundle JSON with initial data
5. Ship app update
6. THEN CloudKit can sync ongoing changes
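
The enum limitation is easy to see in miniature. The Python analogy below is not the app's Swift code: it mirrors the seven sports from Part 1 in an enum and shows that a record carrying an unlisted value cannot be mapped until the enum itself changes. The `XYZ` sport, the sample IDs, and `partition_by_known_sport` are hypothetical, and whether the real app skips or rejects such records is not documented here.

```python
from enum import Enum


class Sport(Enum):
    # Mirrors the sports listed in Part 1; the Swift enum is not shown in this guide.
    MLB = "MLB"
    NBA = "NBA"
    NHL = "NHL"
    NFL = "NFL"
    MLS = "MLS"
    WNBA = "WNBA"
    NWSL = "NWSL"


def partition_by_known_sport(records):
    """Split records into those the compiled enum understands and those it does not."""
    known, unknown = [], []
    for record in records:
        try:
            Sport(record["sport"])  # raises ValueError for an unlisted value
        except ValueError:
            unknown.append(record)
        else:
            known.append(record)
    return known, unknown


synced = [
    {"canonical_id": "team_nba_lal", "sport": "NBA"},
    {"canonical_id": "team_xyz_abc", "sport": "XYZ"},  # sport added via CloudKit only
]
known, unknown = partition_by_known_sport(synced)
print(len(known), len(unknown))  # prints 1 1
```

No amount of synced data can add a member to `Sport` at runtime, which is why the enum case has to ship in an app update before CloudKit data for that sport becomes usable.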

---

## Part 7: Aliases (Handling Historical Changes)

### Stadium Aliases

When a stadium is renamed (e.g., "Candlestick Park" → "3Com Park" → "Monster Park"), each former name gets its own dated entry in `stadium_aliases.json`:

```json
[
  {
    "alias_name": "Candlestick Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1960-01-01",
    "valid_until": "1995-12-31"
  },
  {
    "alias_name": "3Com Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1996-01-01",
    "valid_until": "2002-12-31"
  }
]
```

**Why this matters:**
- Old photos tagged "Candlestick Park" still resolve to the correct stadium
- Historical game data uses the name from that era
- User visits are tracked correctly regardless of the current name
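
Resolving a historical name then comes down to matching the name and checking that the date falls inside the alias's validity window. A minimal sketch, using the two entries above: `resolve_stadium` is a hypothetical helper, and the case-insensitive match, the inclusive date range, and the treatment of a missing `valid_until` as open-ended are all assumptions.

```python
from datetime import date

ALIASES = [
    {"alias_name": "Candlestick Park",
     "stadium_canonical_id": "stadium_nfl_candlestick_park",
     "valid_from": "1960-01-01", "valid_until": "1995-12-31"},
    {"alias_name": "3Com Park",
     "stadium_canonical_id": "stadium_nfl_candlestick_park",
     "valid_from": "1996-01-01", "valid_until": "2002-12-31"},
]


def resolve_stadium(name, on, aliases=ALIASES):
    """Return the canonical ID for a stadium name as used on a given date, or None."""
    for alias in aliases:
        if alias["alias_name"].lower() != name.lower():
            continue
        start = date.fromisoformat(alias["valid_from"])
        # Assumption: a missing or null valid_until means the alias is still current.
        until = alias.get("valid_until")
        end = date.fromisoformat(until) if until else date.max
        if start <= on <= end:
            return alias["stadium_canonical_id"]
    return None


print(resolve_stadium("3Com Park", date(1999, 10, 3)))
print(resolve_stadium("3Com Park", date(1990, 1, 1)))
```

The first call returns `stadium_nfl_candlestick_park`; the second returns `None` because the venue did not carry that name in 1990.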

### Team Aliases

When teams relocate or rebrand, `team_aliases.json` records the former name against the current team's canonical ID:

```json
{
  "id": "alias_nba_sea_supersonics",
  "team_canonical_id": "team_nba_okc",
  "alias_type": "name",
  "alias_value": "Seattle SuperSonics",
  "valid_from": "1967-01-01",
  "valid_until": "2008-06-30"
}
```

---

## Part 8: Sync State & Offline Behavior

### SyncState Tracking

The app tracks sync status in SwiftData:

```swift
struct SyncState {
    var bootstrapCompleted: Bool      // Has initial JSON been loaded?
    var lastSuccessfulSync: Date?     // When did CloudKit last succeed?
    var syncInProgress: Bool          // Is sync running now?
    var consecutiveFailures: Int      // How many failures in a row?
}
```

### Offline Scenarios

| Scenario | Behavior |
|----------|----------|
| First launch, no internet | Bootstrap from bundled JSON, app works |
| Returning user, no internet | Uses last synced SwiftData, app works |
| CloudKit partially fails | Partial sync saved, retry later |
| CloudKit down for days | App continues with local data |
| User deletes app, reinstalls | Fresh bootstrap from bundled JSON |
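
These scenarios follow from how the four `SyncState` fields change over a sync attempt. The Python sketch below mirrors the Swift fields and models the transitions. The method names and the exact rules (for example, that a failure leaves `last_successful_sync` untouched) are assumptions chosen to match the table, not the app's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SyncState:
    bootstrap_completed: bool = False
    last_successful_sync: Optional[datetime] = None
    sync_in_progress: bool = False
    consecutive_failures: int = 0

    def begin(self) -> bool:
        """Start a sync unless one is already running."""
        if self.sync_in_progress:
            return False
        self.sync_in_progress = True
        return True

    def succeed(self, now: datetime) -> None:
        """Record a completed sync and clear the failure streak."""
        self.sync_in_progress = False
        self.last_successful_sync = now
        self.consecutive_failures = 0

    def fail(self) -> None:
        """Record a failed sync; local data and the last success time are kept."""
        self.sync_in_progress = False
        self.consecutive_failures += 1


state = SyncState(bootstrap_completed=True)
for _ in range(3):  # CloudKit unreachable three times in a row
    state.begin()
    state.fail()
print(state.consecutive_failures)  # prints 3

state.begin()
state.succeed(datetime(2026, 1, 15, tzinfo=timezone.utc))
print(state.consecutive_failures)  # prints 0
```

Because a failure changes only the counters, the app keeps serving whatever is already in local storage, which matches the "CloudKit down for days" row.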

---

## Part 9: Quick Reference - File Locations

### JSON Data Files
```
SportsTime/Resources/
├── stadiums_canonical.json    # Stadium metadata
├── teams_canonical.json       # Team metadata
├── games_canonical.json       # Game schedules
├── league_structure.json      # Divisions & conferences
├── stadium_aliases.json       # Historical stadium names
└── team_aliases.json          # Historical team names
```

### Swift Code
```
SportsTime/Core/
├── Models/
│   ├── Domain/
│   │   ├── Stadium.swift                  # Stadium struct
│   │   ├── Team.swift                     # Team struct
│   │   ├── Game.swift                     # Game struct
│   │   ├── Division.swift                 # LeagueStructure + static data
│   │   └── AchievementDefinitions.swift   # Achievement registry
│   ├── Local/
│   │   └── CanonicalModels.swift          # SwiftData models
│   └── CloudKit/
│       └── CKModels.swift                 # CloudKit record types
└── Services/
    ├── DataProvider.swift                 # AppDataProvider (source of truth)
    ├── BootstrapService.swift             # First-launch JSON → SwiftData
    ├── CanonicalSyncService.swift         # CloudKit → SwiftData sync
    └── CloudKitService.swift              # CloudKit API wrapper
```

### Python Scripts
```
Scripts/
├── scrape_schedules.py         # Game schedule scraper
├── generate_missing_data.py    # Team/stadium generator
├── sportstime_parser/
│   ├── cli.py                  # Main CLI
│   ├── scrapers/               # Sport-specific scrapers
│   ├── normalizers/            # Data standardization
│   └── uploaders/              # CloudKit upload
└── output/                     # Generated JSON files
```

---

## Part 10: Common Tasks Checklist

### Before a New Season

- [ ] Run `scrape_schedules.py --sport all --season YYYY`
- [ ] Verify `games_canonical.json` has expected game count
- [ ] Copy to Resources folder
- [ ] Test app locally
- [ ] Upload to CloudKit: `python -m sportstime_parser upload games`
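
The game-count check in this list can be scripted. The sketch below assumes `games_canonical.json` is a JSON array whose records each carry a `sport` field. The game record schema is not shown in this guide, so treat that field, both function names, and the sample minimums as hypothetical.

```python
import json
from collections import Counter


def count_games_by_sport(games):
    """Tally game records per sport."""
    return Counter(game["sport"] for game in games)


def below_minimum(counts, minimums):
    """Return {sport: actual_count} for every sport under its expected minimum."""
    return {
        sport: counts.get(sport, 0)
        for sport, minimum in minimums.items()
        if counts.get(sport, 0) < minimum
    }


# Against the real file, the records would be loaded like this:
# with open("SportsTime/Resources/games_canonical.json") as f:
#     games = json.load(f)
games = json.loads("""
[
  {"canonical_id": "game_mlb_2026_20260401_bos_nyy", "sport": "MLB"},
  {"canonical_id": "game_mlb_2026_20260402_bos_nyy", "sport": "MLB"},
  {"canonical_id": "game_nba_2026_20260401_lal_bos", "sport": "NBA"}
]
""")

counts = count_games_by_sport(games)
print(below_minimum(counts, {"MLB": 2, "NBA": 2, "NHL": 1}))  # prints {'NBA': 1, 'NHL': 0}
```

The minimums are left to the caller on purpose, since the expected count differs per league and per season.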

### Adding a New Stadium

1. [ ] Add to `stadiums_canonical.json`
2. [ ] Add team reference in `teams_canonical.json`
3. [ ] Copy both to Resources folder
4. [ ] Upload to CloudKit

### Stadium Renamed

1. [ ] Add alias to `stadium_aliases.json` with date range
2. [ ] Update stadium name in `stadiums_canonical.json`
3. [ ] Copy both to Resources folder
4. [ ] Upload to CloudKit

### Adding a New Sport (App Update Required)

1. [ ] Add Sport enum case in `Sport.swift`
2. [ ] Add divisions/conferences to `Division.swift`
3. [ ] Add achievements to `AchievementDefinitions.swift`
4. [ ] Add teams to `teams_canonical.json`
5. [ ] Add stadiums to `stadiums_canonical.json`
6. [ ] Add league structure to `league_structure.json`
7. [ ] Run `generate_missing_data.py` to validate
8. [ ] Copy all JSON to Resources folder
9. [ ] Build and test
10. [ ] Ship app update
11. [ ] Upload data to CloudKit for sync

---

## Glossary

| Term | Definition |
|------|------------|
| **Canonical ID** | Permanent, unique identifier for an entity (e.g., `stadium_mlb_fenway_park`) |
| **Bootstrap** | First-launch process that loads bundled JSON into SwiftData |
| **Delta sync** | Fetching only the changes made since the last sync, rather than the full data set |
| **AppDataProvider** | The single source of truth for all canonical data in the app |
| **SwiftData** | Apple's local persistence framework (the successor to Core Data, built on top of it) |
| **CloudKit** | Apple's cloud database service |
| **Alias** | Historical name mapping (old name → canonical ID) |
| **Soft delete** | Marking a record as deprecated instead of actually deleting it |
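
As an illustration of the delta sync entry, the sketch below keeps only records changed after the last successful sync. It is a simplified model rather than the app's CloudKit code: the `modified_at` field and `changed_since` are hypothetical, and a first sync with no prior timestamp is assumed to fetch everything.

```python
from datetime import datetime, timezone


def changed_since(records, last_sync):
    """Return records modified after last_sync, or all records if never synced."""
    if last_sync is None:
        return list(records)
    return [
        record for record in records
        # modified_at is a hypothetical ISO 8601 timestamp on each record.
        if datetime.fromisoformat(record["modified_at"]) > last_sync
    ]


remote = [
    {"canonical_id": "team_nba_lal", "modified_at": "2026-01-02T08:00:00+00:00"},
    {"canonical_id": "stadium_mlb_fenway_park", "modified_at": "2026-01-10T12:30:00+00:00"},
]
last_sync = datetime(2026, 1, 5, tzinfo=timezone.utc)

delta = changed_since(remote, last_sync)
print([record["canonical_id"] for record in delta])  # prints ['stadium_mlb_fenway_park']
```

Combined with soft deletes, this keeps each sync small: a removed record arrives as one changed row carrying a deprecation flag instead of requiring a full comparison of both data sets.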

---

*Last updated: January 2026*