SportsTime Data Architecture
A plain-English guide for understanding how data flows through the app, from source to screen.
Quick Summary
The Big Picture:
- Python scripts scrape schedules from sports websites
- JSON files get bundled into the app (so it works offline on day one)
- CloudKit syncs updates in the background (so users get fresh data)
- AppDataProvider.shared is the single source of truth the app uses
Part 1: What Data Do We Have?
Core Data Types
| Data Type | Description | Count | Update Frequency |
|---|---|---|---|
| Stadiums | Venues where games are played | ~178 total | Rarely (new stadium every few years) |
| Teams | Professional sports teams | ~180 total | Rarely (expansion, relocation) |
| Games | Scheduled matches | ~5,000/season | Daily during season |
| League Structure | Conferences, divisions | ~50 entries | Rarely (realignment) |
| Aliases | Historical names (old stadium names, team relocations) | ~100 | As needed |
Data by Sport
| Sport | Teams | Stadiums | Divisions | Conferences |
|---|---|---|---|---|
| MLB | 30 | 30 | 6 | 2 |
| NBA | 30 | 30 | 6 | 2 |
| NHL | 32 | 32 | 4 | 2 |
| NFL | 32 | 30* | 8 | 2 |
| MLS | 30 | 30 | 0 | 2 |
| WNBA | 13 | 13 | 0 | 2 |
| NWSL | 13 | 13 | 0 | 0 |
*NFL: Giants/Jets share MetLife Stadium; Rams/Chargers share SoFi Stadium
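The totals in the Core Data Types table can be cross-checked against the per-sport rows. The short sketch below simply re-adds the numbers from the table above; the dictionary name is illustrative and not part of the project.

```python
# Per-sport counts copied from the "Data by Sport" table: (teams, stadiums).
COUNTS = {
    "MLB": (30, 30),
    "NBA": (30, 30),
    "NHL": (32, 32),
    "NFL": (32, 30),  # two stadiums are shared by two teams each
    "MLS": (30, 30),
    "WNBA": (13, 13),
    "NWSL": (13, 13),
}

total_teams = sum(teams for teams, _ in COUNTS.values())
total_stadiums = sum(stadiums for _, stadiums in COUNTS.values())

print(total_teams, total_stadiums)  # prints "180 178"
```

These match the "~180 total" and "~178 total" figures quoted for teams and stadiums.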
Part 2: Where Does Data Live?
Data exists in four places, each serving a different purpose:
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  1. BUNDLED JSON FILES (App Bundle)                             │
│     └─ Ships with app, works offline on first launch            │
│                                                                 │
│  2. SWIFTDATA (Local Database)                                  │
│     └─ Fast local storage, persists between launches            │
│                                                                 │
│  3. CLOUDKIT (Apple's Cloud)                                    │
│     └─ Remote sync, shared across all users                     │
│                                                                 │
│  4. APPDATAPROVIDER (In-Memory Cache)                           │
│     └─ What the app actually uses at runtime                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
How They Work Together
              ┌──────────────────┐
              │  Python Scripts  │
              │  (scrape data)   │
              └────────┬─────────┘
                       │
        ┌──────────────┴──────────────┐
        │                             │
        ▼                             ▼
┌────────────────┐            ┌────────────────┐
│   JSON Files   │            │    CloudKit    │
│  (bundled in   │            │    (remote     │
│      app)      │            │     sync)      │
└───────┬────────┘            └───────┬────────┘
        │                             │
        │ First launch                │ Background sync
        │ bootstrap                   │ (ongoing)
        ▼                             ▼
┌─────────────────────────────────────────────┐
│                  SwiftData                  │
│              (local database)               │
└────────────────────┬────────────────────────┘
                     │
                     │ App reads from here
                     ▼
┌─────────────────────────────────────────────┐
│           AppDataProvider.shared            │
│          (single source of truth)           │
└────────────────────┬────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────┐
│              Features & Views               │
│  (Trip planning, Progress tracking, etc.)   │
└─────────────────────────────────────────────┘
Part 3: The JSON Files (What Ships With the App)
Located in: SportsTime/Resources/
File Inventory
| File | Contains | Updated By |
|---|---|---|
| stadiums_canonical.json | All stadium data | generate_missing_data.py |
| teams_canonical.json | All team data | generate_missing_data.py |
| games_canonical.json | Game schedules | scrape_schedules.py |
| league_structure.json | Conferences & divisions | Manual edit |
| stadium_aliases.json | Historical stadium names | Manual edit |
| team_aliases.json | Historical team names | Manual edit |
Why Bundle JSON?
- Offline-first: App works immediately, even without internet
- Fast startup: No network delay on first launch
- Fallback: If CloudKit is down, app still functions
- Testing: Deterministic data for development
Example: Stadium Record
{
  "canonical_id": "stadium_mlb_fenway_park",
  "name": "Fenway Park",
  "city": "Boston",
  "state": "Massachusetts",
  "latitude": 42.3467,
  "longitude": -71.0972,
  "capacity": 37755,
  "sport": "MLB",
  "year_opened": 1912,
  "primary_team_abbrevs": ["BOS"]
}
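A record like this can be checked before it is bundled. The sketch below is not taken from the project's scripts: the Stadium class and parse_stadium function are hypothetical names, and the field list assumes every stadium record has exactly the keys shown in the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stadium:
    # Field names mirror the keys of the example record (assumed complete).
    canonical_id: str
    name: str
    city: str
    state: str
    latitude: float
    longitude: float
    capacity: int
    sport: str
    year_opened: int
    primary_team_abbrevs: List[str]


def parse_stadium(record: dict) -> Stadium:
    """Build a Stadium from one JSON object, rejecting malformed records."""
    stadium = Stadium(**record)  # TypeError on missing or unknown keys
    if not stadium.canonical_id.startswith("stadium_"):
        raise ValueError(f"unexpected ID prefix: {stadium.canonical_id}")
    if not (-90 <= stadium.latitude <= 90 and -180 <= stadium.longitude <= 180):
        raise ValueError(f"coordinates out of range: {stadium.canonical_id}")
    return stadium


fenway = parse_stadium({
    "canonical_id": "stadium_mlb_fenway_park",
    "name": "Fenway Park",
    "city": "Boston",
    "state": "Massachusetts",
    "latitude": 42.3467,
    "longitude": -71.0972,
    "capacity": 37755,
    "sport": "MLB",
    "year_opened": 1912,
    "primary_team_abbrevs": ["BOS"],
})
print(fenway.name, fenway.capacity)
```

Failing on unknown keys is a deliberate choice here: a typo in a field name surfaces at build time rather than as a silently missing value in the app.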
Key concept: Canonical IDs
- Every entity has a unique, permanent ID
- Format: {type}_{sport}_{identifier}
- Examples: team_nba_lal, stadium_nfl_sofi_stadium, game_mlb_2026_20260401_bos_nyy
- IDs never change, even if names do (that's what aliases are for)
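Because the identifier part can itself contain underscores, a canonical ID should be split on the first two underscores only. The sketch below shows one way to do that; CanonicalID and parse_canonical_id are hypothetical names, not functions from the project.

```python
from typing import NamedTuple


class CanonicalID(NamedTuple):
    entity_type: str
    sport: str
    identifier: str


def parse_canonical_id(raw: str) -> CanonicalID:
    """Split '{type}_{sport}_{identifier}' into its three parts."""
    # maxsplit=2 keeps underscores inside the identifier intact.
    parts = raw.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a canonical ID: {raw!r}")
    return CanonicalID(*parts)


for example in (
    "team_nba_lal",
    "stadium_nfl_sofi_stadium",
    "game_mlb_2026_20260401_bos_nyy",
):
    print(parse_canonical_id(example))
```

For the game example, the identifier comes back as 2026_20260401_bos_nyy, so the season, date, and team codes stay together in one field.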
Part 4: The Python Scripts (How Data Gets Updated)
Located in: Scripts/
Script Overview
| Script | Purpose | When to Use |
|---|---|---|
| scrape_schedules.py | Fetch game schedules from sports-reference sites | New season starts |
| generate_missing_data.py | Add teams/stadiums for new sports | Adding new league |
| sportstime_parser/cli.py | Full pipeline: scrape → normalize → upload | Production updates |
Common Workflows
1. Adding a New Season's Schedule
cd Scripts
# Scrape all sports for 2026 season
python scrape_schedules.py --sport all --season 2026
# Or one sport at a time
python scrape_schedules.py --sport nba --season 2026
python scrape_schedules.py --sport mlb --season 2026
Output: Updates games_canonical.json with new games
2. Adding a New Sport or League
cd Scripts
# Generate missing teams/stadiums
python generate_missing_data.py
# This will:
# 1. Check existing teams_canonical.json
# 2. Add missing teams with proper IDs
# 3. Add missing stadiums
# 4. Update conference/division assignments
After running: Copy output files to Resources folder:
cp output/teams_canonical.json ../SportsTime/Resources/
cp output/stadiums_canonical.json ../SportsTime/Resources/
3. Updating League Structure (Manual)
Edit SportsTime/Resources/league_structure.json directly:
{
  "id": "nfl_afc_east",
  "sport": "NFL",
  "type": "division",
  "name": "AFC East",
  "abbreviation": "AFC East",
  "parent_id": "nfl_afc",
  "display_order": 1
}
Also update: SportsTime/Core/Models/Domain/Division.swift to match
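Since this file is edited by hand, a quick consistency check can catch a parent_id that points at nothing. The sketch below assumes the file is a JSON array of entries shaped like the one above and that top-level entries (conferences) have a null or missing parent_id; find_orphans and the sample conference entry are hypothetical.

```python
from typing import Dict, List, Optional

Entry = Dict[str, Optional[object]]


def find_orphans(entries: List[Entry]) -> List[str]:
    """Return IDs of entries whose parent_id does not match any entry's id."""
    known_ids = {entry["id"] for entry in entries}
    return [
        entry["id"]
        for entry in entries
        if entry.get("parent_id") is not None
        and entry["parent_id"] not in known_ids
    ]


afc_east = {
    "id": "nfl_afc_east",
    "sport": "NFL",
    "type": "division",
    "name": "AFC East",
    "abbreviation": "AFC East",
    "parent_id": "nfl_afc",
    "display_order": 1,
}
# Hypothetical conference entry, shaped by analogy with the division above.
afc = {
    "id": "nfl_afc",
    "sport": "NFL",
    "type": "conference",
    "name": "AFC",
    "abbreviation": "AFC",
    "parent_id": None,
    "display_order": 1,
}

print(find_orphans([afc_east]))       # the division has no parent entry
print(find_orphans([afc, afc_east]))  # parent present, nothing reported
```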
4. Uploading to CloudKit (Production)
cd Scripts
# Upload all canonical data to CloudKit
python -m sportstime_parser upload all
# Or specific entity type
python -m sportstime_parser upload games
python -m sportstime_parser upload teams
Requires: CloudKit credentials configured in sportstime_parser/config.py
Part 5: How the App Uses This Data
Startup Flow
1. App launches
       ↓
2. Check: Is this first launch?
       ↓
3. YES → BootstrapService loads bundled JSON into SwiftData
   NO  → Skip to step 4
       ↓
4. AppDataProvider.configure(modelContext)
       ↓
5. AppDataProvider.loadInitialData()
   - Reads from SwiftData
   - Populates in-memory cache
       ↓
6. App ready! (works offline)
       ↓
7. Background: CanonicalSyncService.syncAll()
   - Fetches updates from CloudKit
   - Merges into SwiftData
   - Refreshes AppDataProvider cache
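The flow above can be modelled in a few lines to make the ordering concrete. This is a toy Python model, not the app's Swift code: LocalStore stands in for SwiftData, the returned dictionary stands in for the AppDataProvider cache, and the "newest record replaces the old one" merge rule is an assumption.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class LocalStore:
    """Stand-in for SwiftData: records keyed by canonical ID."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}
        self.bootstrap_completed = False

    def upsert(self, records: List[Record]) -> None:
        # Assumed merge rule: an incoming record replaces the stored one.
        for record in records:
            self.records[record["canonical_id"]] = record


def launch(
    store: LocalStore,
    bundled: List[Record],
    fetch_remote: Callable[[], List[Record]],
) -> Dict[str, Record]:
    # Steps 2-3: bootstrap from bundled JSON on first launch only.
    if not store.bootstrap_completed:
        store.upsert(bundled)
        store.bootstrap_completed = True

    # Steps 4-6: populate the in-memory cache; the app is usable from here.
    cache = dict(store.records)

    # Step 7: sync; a network failure leaves the local data untouched.
    try:
        store.upsert(fetch_remote())
    except ConnectionError:
        return cache
    return dict(store.records)  # refreshed cache
```

In this model the sync step runs inline for simplicity; in the app it runs in the background after the cache is already available.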
Accessing Data in Code
// ✅ CORRECT - Always use AppDataProvider
let stadiums = AppDataProvider.shared.stadiums
let teams = AppDataProvider.shared.teams
let games = try await AppDataProvider.shared.filterGames(
    sports: [.nba, .mlb],
    startDate: Date(),
    endDate: Date().addingTimeInterval(86400 * 30)
)
// ❌ WRONG - Never bypass AppDataProvider
let stadiums = try await CloudKitService.shared.fetchStadiums()
Part 6: What Can Be Updated and How
Update Matrix
| What | Bundled JSON | CloudKit | Swift Code | Requires App Update |
|---|---|---|---|---|
| Add new games | ✅ | ✅ | No | No (CloudKit sync) |
| Change stadium name | ✅ + alias | ✅ | No | No (CloudKit sync) |
| Add new team | ✅ | ✅ | No | No (CloudKit sync) |
| Add new sport | ✅ | ✅ | ✅ Sport.swift, Division.swift, AchievementDefinitions.swift | YES |
| Add division | ✅ | ✅ | ✅ Division.swift | YES |
| Add achievements | N/A | N/A | ✅ AchievementDefinitions.swift | YES |
| Change team colors | ✅ | ✅ | No | No (CloudKit sync) |
Why Some Changes Need App Updates
CloudKit CAN handle:
- New records (games, teams, stadiums)
- Updated records (name changes, etc.)
- Soft deletes (deprecation flags)
CloudKit CANNOT handle:
- New enum cases in Swift (Sport enum)
- New achievement definitions (compiled into app)
- UI for new sports (views reference Sport enum)
- Division/Conference structure (static in Division.swift)
The "New Sport" Problem
If you add a new sport via CloudKit only:
- ❌ App won't recognize the sport enum value
- ❌ No achievements defined for that sport
- ❌ UI filters won't show the sport
- ❌ Division.swift won't have the structure
Solution: New sports require:
- Add Sport enum case
- Add Division.swift entries
- Add AchievementDefinitions entries
- Bundle JSON with initial data
- Ship app update
- THEN CloudKit can sync ongoing changes
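The enum limitation is easy to see in miniature. The sketch below uses a Python Enum as an analogue of the Swift Sport enum; the case list is taken from the Data by Sport table, and "PWHL" is only a hypothetical example of a league the shipped app does not know about.

```python
from enum import Enum
from typing import Optional


class Sport(Enum):
    # Cases compiled into the shipped app (Python analogue of the Swift enum).
    MLB = "MLB"
    NBA = "NBA"
    NHL = "NHL"
    NFL = "NFL"
    MLS = "MLS"
    WNBA = "WNBA"
    NWSL = "NWSL"


def decode_sport(raw: str) -> Optional[Sport]:
    """Return the matching enum case, or None for a value the app predates."""
    try:
        return Sport(raw)
    except ValueError:
        return None


print(decode_sport("NBA"))   # a known case
print(decode_sport("PWHL"))  # None: synced record, but no enum case for it
```

A record for the new league can arrive through sync, but code that switches on the enum has no case to route it to until an app update adds one.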
Part 7: Aliases (Handling Historical Changes)
Stadium Aliases
When a stadium is renamed (e.g., "Candlestick Park" → "3Com Park" → "Monster Park"), each former name is stored as an alias with the date range in which it was in use:
// stadium_aliases.json
[
  {
    "alias_name": "Candlestick Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1960-01-01",
    "valid_until": "1995-12-31"
  },
  {
    "alias_name": "3Com Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1996-01-01",
    "valid_until": "2002-12-31"
  }
]
Why this matters:
- Old photos tagged "Candlestick Park" still resolve to correct stadium
- Historical game data uses name from that era
- User visits tracked correctly regardless of current name
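Resolving a historical name then comes down to a lookup by name and date. The sketch below uses the two alias records shown above; resolve_stadium is a hypothetical helper, and treating valid_until as inclusive is an assumption.

```python
from datetime import date
from typing import Optional

ALIASES = [
    {
        "alias_name": "Candlestick Park",
        "stadium_canonical_id": "stadium_nfl_candlestick_park",
        "valid_from": "1960-01-01",
        "valid_until": "1995-12-31",
    },
    {
        "alias_name": "3Com Park",
        "stadium_canonical_id": "stadium_nfl_candlestick_park",
        "valid_from": "1996-01-01",
        "valid_until": "2002-12-31",
    },
]


def resolve_stadium(name: str, on: date) -> Optional[str]:
    """Return the canonical ID for a stadium name as used on a given date."""
    for alias in ALIASES:
        start = date.fromisoformat(alias["valid_from"])
        end = date.fromisoformat(alias["valid_until"])
        # Assumption: both ends of the range are inclusive.
        if alias["alias_name"] == name and start <= on <= end:
            return alias["stadium_canonical_id"]
    return None


print(resolve_stadium("3Com Park", date(1999, 10, 3)))
print(resolve_stadium("Candlestick Park", date(1999, 10, 3)))
```

The first call resolves to stadium_nfl_candlestick_park; the second returns None because that name's date range ended in 1995.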
Team Aliases
When teams relocate or rebrand:
// team_aliases.json
{
  "id": "alias_nba_sea_supersonics",
  "team_canonical_id": "team_nba_okc",
  "alias_type": "name",
  "alias_value": "Seattle SuperSonics",
  "valid_from": "1967-01-01",
  "valid_until": "2008-06-30"
}
Part 8: Sync State & Offline Behavior
SyncState Tracking
The app tracks sync status in SwiftData:
struct SyncState {
    var bootstrapCompleted: Bool      // Has initial JSON been loaded?
    var lastSuccessfulSync: Date?     // When did CloudKit last succeed?
    var syncInProgress: Bool          // Is sync running now?
    var consecutiveFailures: Int      // How many failures in a row?
}
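How these fields might change over a sync attempt can be sketched as a small state machine. The Python model below mirrors the four fields above; the begin and finish methods and their exact rules are assumptions for illustration, not the app's actual logic.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SyncState:
    bootstrap_completed: bool = False
    last_successful_sync: Optional[datetime] = None
    sync_in_progress: bool = False
    consecutive_failures: int = 0

    def begin(self) -> bool:
        """Mark a sync as started; return False if one is already running."""
        if self.sync_in_progress:
            return False
        self.sync_in_progress = True
        return True

    def finish(self, succeeded: bool, now: Optional[datetime] = None) -> None:
        """Record the outcome of the sync that was in progress."""
        self.sync_in_progress = False
        if succeeded:
            self.last_successful_sync = now or datetime.now(timezone.utc)
            self.consecutive_failures = 0  # a success resets the streak
        else:
            self.consecutive_failures += 1


state = SyncState(bootstrap_completed=True)
state.begin()
state.finish(succeeded=False)
print(state.consecutive_failures, state.last_successful_sync)
```

Keeping the failure count separate from the last-success timestamp lets the app tell "never synced" apart from "synced once, failing since".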
Offline Scenarios
| Scenario | Behavior |
|---|---|
| First launch, no internet | Bootstrap from bundled JSON, app works |
| Returning user, no internet | Uses last synced SwiftData, app works |
| CloudKit partially fails | Partial sync saved, retry later |
| CloudKit down for days | App continues with local data |
| User deletes app, reinstalls | Fresh bootstrap from bundled JSON |
Part 9: Quick Reference - File Locations
JSON Data Files
SportsTime/Resources/
├── stadiums_canonical.json # Stadium metadata
├── teams_canonical.json # Team metadata
├── games_canonical.json # Game schedules
├── league_structure.json # Divisions & conferences
├── stadium_aliases.json # Historical stadium names
└── team_aliases.json # Historical team names
Swift Code
SportsTime/Core/
├── Models/
│   ├── Domain/
│   │   ├── Stadium.swift  # Stadium struct
│   │   ├── Team.swift  # Team struct
│   │   ├── Game.swift  # Game struct
│   │   ├── Division.swift  # LeagueStructure + static data
│   │   └── AchievementDefinitions.swift  # Achievement registry
│   ├── Local/
│   │   └── CanonicalModels.swift  # SwiftData models
│   └── CloudKit/
│       └── CKModels.swift  # CloudKit record types
└── Services/
    ├── DataProvider.swift  # AppDataProvider (source of truth)
    ├── BootstrapService.swift  # First-launch JSON → SwiftData
    ├── CanonicalSyncService.swift  # CloudKit → SwiftData sync
    └── CloudKitService.swift  # CloudKit API wrapper
Python Scripts
Scripts/
├── scrape_schedules.py # Game schedule scraper
├── generate_missing_data.py # Team/stadium generator
├── sportstime_parser/
│   ├── cli.py  # Main CLI
│   ├── scrapers/  # Sport-specific scrapers
│   ├── normalizers/  # Data standardization
│   └── uploaders/  # CloudKit upload
└── output/ # Generated JSON files
Part 10: Common Tasks Checklist
Before a New Season
- Run scrape_schedules.py --sport all --season YYYY
- Verify games_canonical.json has expected game count
- Copy to Resources folder
- Test app locally
- Upload to CloudKit: python -m sportstime_parser upload games
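The "expected game count" check can be scripted. The sketch below assumes games_canonical.json is a JSON array whose records carry a canonical_id in the documented game_{sport}_... format; games_per_sport and missing_sports are hypothetical helpers, and the sample records are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def games_per_sport(games: Iterable[Dict[str, str]]) -> Counter:
    """Count games by the sport segment of each canonical ID."""
    # "game_mlb_2026_..." -> "mlb" (second underscore-separated segment)
    return Counter(game["canonical_id"].split("_")[1] for game in games)


def missing_sports(games: Iterable[Dict[str, str]], expected: List[str]) -> List[str]:
    """Return the expected sports that have no games at all."""
    counts = games_per_sport(games)
    return sorted(sport for sport in expected if counts[sport] == 0)


# Made-up sample records; a real run would json.load() the bundled file.
sample = [
    {"canonical_id": "game_mlb_2026_20260401_bos_nyy"},
    {"canonical_id": "game_mlb_2026_20260402_bos_nyy"},
    {"canonical_id": "game_nba_2026_20260110_lal_bos"},
]

print(games_per_sport(sample))
print(missing_sports(sample, ["mlb", "nba", "nhl"]))
```

A sport that comes back with zero games usually means its scrape failed silently, which is worth catching before the file is copied into Resources.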
Adding a New Stadium
- Add to stadiums_canonical.json
- Add team reference in teams_canonical.json
- Copy both to Resources folder
- Upload to CloudKit
Stadium Renamed
- Add alias to stadium_aliases.json with date range
- Update stadium name in stadiums_canonical.json
- Copy both to Resources folder
- Upload to CloudKit
Adding a New Sport (App Update Required)
- Add Sport enum case in Sport.swift
- Add divisions/conferences to Division.swift
- Add achievements to AchievementDefinitions.swift
- Add teams to teams_canonical.json
- Add stadiums to stadiums_canonical.json
- Add league structure to league_structure.json
- Run generate_missing_data.py to validate
- Copy all JSON to Resources folder
- Build and test
- Ship app update
- Upload data to CloudKit for sync
Glossary
| Term | Definition |
|---|---|
| Canonical ID | Permanent, unique identifier for an entity (e.g., stadium_mlb_fenway_park) |
| Bootstrap | First-launch process that loads bundled JSON into SwiftData |
| Delta sync | Only fetching changes since last sync (not full data) |
| AppDataProvider | The single source of truth for all canonical data in the app |
| SwiftData | Apple's local persistence framework (the Swift-native successor to Core Data) |
| CloudKit | Apple's cloud database service |
| Alias | Historical name mapping (old name → canonical ID) |
| Soft delete | Marking a record as deprecated instead of deleting it |
Last updated: January 2026