# SportsTime Data Architecture

> A plain-English guide for understanding how data flows through the app, from source to screen.

## Quick Summary

**The Big Picture:**

1. **Python scripts** scrape schedules from sports websites
2. **JSON files** get bundled into the app (so it works offline on day one)
3. **CloudKit** syncs updates in the background (so users get fresh data)
4. **AppDataProvider.shared** is the single source of truth the app uses

---

## Part 1: What Data Do We Have?

### Core Data Types

| Data Type | Description | Count | Update Frequency |
|-----------|-------------|-------|------------------|
| **Stadiums** | Venues where games are played | ~178 total | Rarely (new stadium every few years) |
| **Teams** | Professional sports teams | ~180 total | Rarely (expansion, relocation) |
| **Games** | Scheduled matches | ~5,000/season | Daily during season |
| **League Structure** | Conferences, divisions | ~50 entries | Rarely (realignment) |
| **Aliases** | Historical names (old stadium names, team relocations) | ~100 | As needed |

### Data by Sport

| Sport | Teams | Stadiums | Divisions | Conferences |
|-------|-------|----------|-----------|-------------|
| MLB | 30 | 30 | 6 | 2 |
| NBA | 30 | 30 | 6 | 2 |
| NHL | 32 | 32 | 4 | 2 |
| NFL | 32 | 30* | 8 | 2 |
| MLS | 30 | 30 | 0 | 2 |
| WNBA | 13 | 13 | 0 | 2 |
| NWSL | 13 | 13 | 0 | 0 |

*NFL: Giants/Jets share MetLife Stadium; Rams/Chargers share SoFi Stadium

---

## Part 2: Where Does Data Live?

Data exists in **four places**, each serving a different purpose:

```
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  1. BUNDLED JSON FILES (App Bundle)                             │
│     └─ Ships with app, works offline on first launch            │
│                                                                 │
│  2. SWIFTDATA (Local Database)                                  │
│     └─ Fast local storage, persists between launches            │
│                                                                 │
│  3. CLOUDKIT (Apple's Cloud)                                    │
│     └─ Remote sync, shared across all users                     │
│                                                                 │
│  4. APPDATAPROVIDER (In-Memory Cache)                           │
│     └─ What the app actually uses at runtime                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### How They Work Together

```
                   ┌──────────────────┐
                   │  Python Scripts  │
                   │  (scrape data)   │
                   └────────┬─────────┘
                            │
             ┌──────────────┴──────────────┐
             │                             │
             ▼                             ▼
     ┌────────────────┐            ┌────────────────┐
     │   JSON Files   │            │    CloudKit    │
     │  (bundled in   │            │    (remote     │
     │      app)      │            │     sync)      │
     └───────┬────────┘            └───────┬────────┘
             │                             │
             │  First launch               │  Background sync
             │  bootstrap                  │  (ongoing)
             ▼                             ▼
     ┌─────────────────────────────────────────────┐
     │                  SwiftData                  │
     │              (local database)               │
     └────────────────────┬────────────────────────┘
                          │
                          │  App reads from here
                          ▼
     ┌─────────────────────────────────────────────┐
     │           AppDataProvider.shared            │
     │          (single source of truth)           │
     └────────────────────┬────────────────────────┘
                          │
                          ▼
     ┌─────────────────────────────────────────────┐
     │              Features & Views               │
     │  (Trip planning, Progress tracking, etc.)   │
     └─────────────────────────────────────────────┘
```

---

## Part 3: The JSON Files (What Ships With the App)

Located in: `SportsTime/Resources/`

### File Inventory

| File | Contains | Updated By |
|------|----------|------------|
| `stadiums_canonical.json` | All stadium data | `generate_missing_data.py` |
| `teams_canonical.json` | All team data | `generate_missing_data.py` |
| `games_canonical.json` | Game schedules | `scrape_schedules.py` |
| `league_structure.json` | Conferences & divisions | Manual edit |
| `stadium_aliases.json` | Historical stadium names | Manual edit |
| `team_aliases.json` | Historical team names | Manual edit |

### Why Bundle JSON?

1. **Offline-first**: App works immediately, even without internet
2. **Fast startup**: No network delay on first launch
3. **Fallback**: If CloudKit is down, app still functions
4. **Testing**: Deterministic data for development

### Example: Stadium Record

```json
{
  "canonical_id": "stadium_mlb_fenway_park",
  "name": "Fenway Park",
  "city": "Boston",
  "state": "Massachusetts",
  "latitude": 42.3467,
  "longitude": -71.0972,
  "capacity": 37755,
  "sport": "MLB",
  "year_opened": 1912,
  "primary_team_abbrevs": ["BOS"]
}
```

**Key concept: Canonical IDs**

- Every entity has a unique, permanent ID
- Format: `{type}_{sport}_{identifier}`
- Examples: `team_nba_lal`, `stadium_nfl_sofi_stadium`, `game_mlb_2026_20260401_bos_nyy`
- IDs never change, even if names do (that's what aliases are for)

---

## Part 4: The Python Scripts (How Data Gets Updated)

Located in: `Scripts/`

### Script Overview

| Script | Purpose | When to Use |
|--------|---------|-------------|
| `scrape_schedules.py` | Fetch game schedules from sports-reference sites | New season starts |
| `generate_missing_data.py` | Add teams/stadiums for new sports | Adding new league |
| `sportstime_parser/cli.py` | Full pipeline: scrape → normalize → upload | Production updates |

### Common Workflows

#### 1. Adding a New Season's Schedule

```bash
cd Scripts

# Scrape all sports for 2026 season
python scrape_schedules.py --sport all --season 2026

# Or one sport at a time
python scrape_schedules.py --sport nba --season 2026
python scrape_schedules.py --sport mlb --season 2026
```

**Output:** Updates `games_canonical.json` with new games

#### 2. Adding a New Sport or League

```bash
cd Scripts

# Generate missing teams/stadiums
python generate_missing_data.py

# This will:
# 1. Check existing teams_canonical.json
# 2. Add missing teams with proper IDs
# 3. Add missing stadiums
# 4. Update conference/division assignments
```

**After running:** Copy output files to Resources folder:

```bash
cp output/teams_canonical.json ../SportsTime/Resources/
cp output/stadiums_canonical.json ../SportsTime/Resources/
```

#### 3. Updating League Structure (Manual)

Edit `SportsTime/Resources/league_structure.json` directly:

```json
{
  "id": "nfl_afc_east",
  "sport": "NFL",
  "type": "division",
  "name": "AFC East",
  "abbreviation": "AFC East",
  "parent_id": "nfl_afc",
  "display_order": 1
}
```

**Also update:** `SportsTime/Core/Models/Domain/Division.swift` to match

#### 4. Uploading to CloudKit (Production)

```bash
cd Scripts

# Upload all canonical data to CloudKit
python -m sportstime_parser upload all

# Or specific entity type
python -m sportstime_parser upload games
python -m sportstime_parser upload teams
```

**Requires:** CloudKit credentials configured in `sportstime_parser/config.py`

---

## Part 5: How the App Uses This Data

### Startup Flow

```
1. App launches
       ↓
2. Check: Is this first launch?
       ↓
3. YES → BootstrapService loads bundled JSON into SwiftData
   NO  → Skip to step 4
       ↓
4. AppDataProvider.configure(modelContext)
       ↓
5. AppDataProvider.loadInitialData()
   - Reads from SwiftData
   - Populates in-memory cache
       ↓
6. App ready! (works offline)
       ↓
7. Background: CanonicalSyncService.syncAll()
   - Fetches updates from CloudKit
   - Merges into SwiftData
   - Refreshes AppDataProvider cache
```

### Accessing Data in Code

```swift
// ✅ CORRECT - Always use AppDataProvider
let stadiums = AppDataProvider.shared.stadiums
let teams = AppDataProvider.shared.teams
let games = try await AppDataProvider.shared.filterGames(
    sports: [.nba, .mlb],
    startDate: Date(),
    endDate: Date().addingTimeInterval(86400 * 30)  // 30 days from now
)

// ❌ WRONG - Never bypass AppDataProvider
let remoteStadiums = try await CloudKitService.shared.fetchStadiums()
```

---

## Part 6: What Can Be Updated and How

### Update Matrix

| What | Bundled JSON | CloudKit | Swift Code | Requires App Update |
|------|-------------|----------|------------|---------------------|
| Add new games | ✅ | ✅ | No | No (CloudKit sync) |
| Change stadium name | ✅ + alias | ✅ | No | No (CloudKit sync) |
| Add new team | ✅ | ✅ | No | No (CloudKit sync) |
| Add new sport | ✅ | ✅ | ✅ Division.swift | **YES** |
| Add division | ✅ | ✅ | ✅ Division.swift | **YES** |
| Add achievements | N/A | N/A | ✅ AchievementDefinitions.swift | **YES** |
| Change team colors | ✅ | ✅ | No | No (CloudKit sync) |

### Why Some Changes Need App Updates

**CloudKit CAN handle:**

- New records (games, teams, stadiums)
- Updated records (name changes, etc.)
- Soft deletes (deprecation flags)

**CloudKit CANNOT handle:**

- New enum cases in Swift (Sport enum)
- New achievement definitions (compiled into app)
- UI for new sports (views reference Sport enum)
- Division/Conference structure (static in Division.swift)

### The "New Sport" Problem

If you add a new sport via CloudKit only:

1. ❌ App won't recognize the sport enum value
2. ❌ No achievements defined for that sport
3. ❌ UI filters won't show the sport
4. ❌ Division.swift won't have the structure

**Solution:** New sports require:

1. Add Sport enum case
2. Add Division.swift entries
3. Add AchievementDefinitions entries
4. Bundle JSON with initial data
5. Ship app update
6. THEN CloudKit can sync ongoing changes

---

## Part 7: Aliases (Handling Historical Changes)

### Stadium Aliases

When a stadium is renamed (e.g., "Candlestick Park" → "3Com Park" → "Monster Park"), each former name gets an alias entry with the date range in which it was valid. Example entries from `stadium_aliases.json`:

```json
[
  {
    "alias_name": "Candlestick Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1960-01-01",
    "valid_until": "1995-12-31"
  },
  {
    "alias_name": "3Com Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1996-01-01",
    "valid_until": "2002-12-31"
  }
]
```

**Why this matters:**

- Old photos tagged "Candlestick Park" still resolve to the correct stadium
- Historical game data uses the name from that era
- User visits are tracked correctly regardless of the current name

### Team Aliases

When teams relocate or rebrand, the old identity is recorded the same way. Example entry from `team_aliases.json`:

```json
{
  "id": "alias_nba_sea_supersonics",
  "team_canonical_id": "team_nba_okc",
  "alias_type": "name",
  "alias_value": "Seattle SuperSonics",
  "valid_from": "1967-01-01",
  "valid_until": "2008-06-30"
}
```

---

## Part 8: Sync State & Offline Behavior

### SyncState Tracking

The app tracks sync status in SwiftData:

```swift
struct SyncState {
    var bootstrapCompleted: Bool      // Has initial JSON been loaded?
    var lastSuccessfulSync: Date?     // When did CloudKit last succeed?
    var syncInProgress: Bool          // Is sync running now?
    var consecutiveFailures: Int      // How many failures in a row?
}
```

### Offline Scenarios

| Scenario | Behavior |
|----------|----------|
| First launch, no internet | Bootstrap from bundled JSON, app works |
| Returning user, no internet | Uses last synced SwiftData, app works |
| CloudKit partially fails | Partial sync saved, retry later |
| CloudKit down for days | App continues with local data |
| User deletes app, reinstalls | Fresh bootstrap from bundled JSON |

---

## Part 9: Quick Reference - File Locations

### JSON Data Files

```
SportsTime/Resources/
├── stadiums_canonical.json    # Stadium metadata
├── teams_canonical.json       # Team metadata
├── games_canonical.json       # Game schedules
├── league_structure.json      # Divisions & conferences
├── stadium_aliases.json       # Historical stadium names
└── team_aliases.json          # Historical team names
```

### Swift Code

```
SportsTime/Core/
├── Models/
│   ├── Domain/
│   │   ├── Stadium.swift                  # Stadium struct
│   │   ├── Team.swift                     # Team struct
│   │   ├── Game.swift                     # Game struct
│   │   ├── Division.swift                 # LeagueStructure + static data
│   │   └── AchievementDefinitions.swift   # Achievement registry
│   ├── Local/
│   │   └── CanonicalModels.swift          # SwiftData models
│   └── CloudKit/
│       └── CKModels.swift                 # CloudKit record types
└── Services/
    ├── DataProvider.swift                 # AppDataProvider (source of truth)
    ├── BootstrapService.swift             # First-launch JSON → SwiftData
    ├── CanonicalSyncService.swift         # CloudKit → SwiftData sync
    └── CloudKitService.swift              # CloudKit API wrapper
```

### Python Scripts

```
Scripts/
├── scrape_schedules.py         # Game schedule scraper
├── generate_missing_data.py    # Team/stadium generator
├── sportstime_parser/
│   ├── cli.py                  # Main CLI
│   ├── scrapers/               # Sport-specific scrapers
│   ├── normalizers/            # Data standardization
│   └── uploaders/              # CloudKit upload
└── output/                     # Generated JSON files
```

---

## Part 10: Common Tasks Checklist

### Before a New Season

- [ ] Run `scrape_schedules.py --sport all --season YYYY`
- [ ] Verify `games_canonical.json` has expected game count
- [ ] Copy to Resources folder
- [ ] Test app locally
- [ ] Upload to CloudKit: `python -m sportstime_parser upload games`

### Adding a New Stadium

1. [ ] Add to `stadiums_canonical.json`
2. [ ] Add team reference in `teams_canonical.json`
3. [ ] Copy both to Resources folder
4. [ ] Upload to CloudKit

### Stadium Renamed

1. [ ] Add alias to `stadium_aliases.json` with date range
2. [ ] Update stadium name in `stadiums_canonical.json`
3. [ ] Copy both to Resources folder
4. [ ] Upload to CloudKit

### Adding a New Sport (App Update Required)

1. [ ] Add Sport enum case in `Sport.swift`
2. [ ] Add divisions/conferences to `Division.swift`
3. [ ] Add achievements to `AchievementDefinitions.swift`
4. [ ] Add teams to `teams_canonical.json`
5. [ ] Add stadiums to `stadiums_canonical.json`
6. [ ] Add league structure to `league_structure.json`
7. [ ] Run `generate_missing_data.py` to validate
8. [ ] Copy all JSON to Resources folder
9. [ ] Build and test
10. [ ] Ship app update
11. [ ] Upload data to CloudKit for sync

---

## Glossary

| Term | Definition |
|------|------------|
| **Canonical ID** | Permanent, unique identifier for an entity (e.g., `stadium_mlb_fenway_park`) |
| **Bootstrap** | First-launch process that loads bundled JSON into SwiftData |
| **Delta sync** | Fetching only the changes since the last sync (not the full data set) |
| **AppDataProvider** | The single source of truth for all canonical data in the app |
| **SwiftData** | Apple's local persistence framework (the Swift-native successor to Core Data) |
| **CloudKit** | Apple's cloud database service |
| **Alias** | Historical name mapping (old name → canonical ID) |
| **Soft delete** | Marking a record as deprecated instead of actually deleting it |

---

*Last updated: January 2026*
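
To make the **Canonical ID** and **Alias** glossary entries concrete, here is a minimal Python sketch of how an ID in the `{type}_{sport}_{identifier}` format could be split, and how an old stadium name could be resolved to its canonical ID for a given date. The helper names (`parse_canonical_id`, `resolve_stadium_alias`) are hypothetical and not part of `sportstime_parser`. The alias records are the two example entries from Part 7, and treating `valid_from`/`valid_until` as an inclusive range is an assumption.

```python
from datetime import date
from typing import NamedTuple, Optional


class CanonicalID(NamedTuple):
    entity_type: str  # e.g. "team", "stadium", "game"
    sport: str        # e.g. "nba", "mlb"
    identifier: str   # everything after the sport segment


def parse_canonical_id(raw: str) -> CanonicalID:
    """Split `{type}_{sport}_{identifier}` on the first two underscores only."""
    parts = raw.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a canonical ID: {raw!r}")
    return CanonicalID(*parts)


# The two example alias entries from Part 7 (illustrative data).
STADIUM_ALIASES = [
    {
        "alias_name": "Candlestick Park",
        "stadium_canonical_id": "stadium_nfl_candlestick_park",
        "valid_from": "1960-01-01",
        "valid_until": "1995-12-31",
    },
    {
        "alias_name": "3Com Park",
        "stadium_canonical_id": "stadium_nfl_candlestick_park",
        "valid_from": "1996-01-01",
        "valid_until": "2002-12-31",
    },
]


def resolve_stadium_alias(
    name: str, on: date, aliases: list = STADIUM_ALIASES
) -> Optional[str]:
    """Return the canonical ID for `name` if an alias covers the date `on`.

    Assumption: both ends of the validity range are inclusive.
    """
    for alias in aliases:
        start = date.fromisoformat(alias["valid_from"])
        end = date.fromisoformat(alias["valid_until"])
        if alias["alias_name"] == name and start <= on <= end:
            return alias["stadium_canonical_id"]
    return None


game_id = parse_canonical_id("game_mlb_2026_20260401_bos_nyy")
print(game_id.entity_type, game_id.sport, game_id.identifier)
print(resolve_stadium_alias("3Com Park", date(1999, 9, 12)))
```

Splitting on only the first two underscores keeps multi-part identifiers such as `2026_20260401_bos_nyy` intact, which a plain `split("_")` would break apart.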