Sportstime/docs/DATA_ARCHITECTURE.md
Trey t a8b0491571 wip
2026-01-19 22:12:53 -06:00

SportsTime Data Architecture

A plain-English guide for understanding how data flows through the app, from source to screen.

Quick Summary

The Big Picture:

  1. Python scripts scrape schedules from sports websites
  2. JSON files get bundled into the app (so it works offline on day one)
  3. CloudKit syncs updates in the background (so users get fresh data)
  4. AppDataProvider.shared is the single source of truth the app uses

Part 1: What Data Do We Have?

Core Data Types

| Data Type | Description | Count | Update Frequency |
|---|---|---|---|
| Stadiums | Venues where games are played | ~178 total | Rarely (new stadium every few years) |
| Teams | Professional sports teams | ~180 total | Rarely (expansion, relocation) |
| Games | Scheduled matches | ~5,000/season | Daily during season |
| League Structure | Conferences, divisions | ~50 entries | Rarely (realignment) |
| Aliases | Historical names (old stadium names, team relocations) | ~100 | As needed |

Data by Sport

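| Sport | Teams | Stadiums | Divisions | Conferences |
|---|---|---|---|---|
| MLB | 30 | 30 | 6 | 2 |
| NBA | 30 | 30 | 6 | 2 |
| NHL | 32 | 32 | 4 | 2 |
| NFL | 32 | 30* | 8 | 2 |
| MLS | 30 | 30 | 0 | 2 |
| WNBA | 13 | 13 | 0 | 2 |
| NWSL | 13 | 13 | 0 | 0 |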

*NFL: Giants/Jets share MetLife Stadium; Rams/Chargers share SoFi Stadium
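
As a quick consistency check, the per-sport rows can be summed and compared with the approximate totals listed under Core Data Types. The sketch below simply transcribes the table into a dictionary; it does not read the app's data files, and the names `COUNTS` and `totals` are made up for illustration.

```python
# Per-sport counts transcribed by hand from the "Data by Sport" table.
COUNTS = {
    "MLB": {"teams": 30, "stadiums": 30},
    "NBA": {"teams": 30, "stadiums": 30},
    "NHL": {"teams": 32, "stadiums": 32},
    "NFL": {"teams": 32, "stadiums": 30},  # two pairs of teams share a stadium
    "MLS": {"teams": 30, "stadiums": 30},
    "WNBA": {"teams": 13, "stadiums": 13},
    "NWSL": {"teams": 13, "stadiums": 13},
}


def totals(counts):
    """Sum teams and stadiums across all sports."""
    total_teams = sum(row["teams"] for row in counts.values())
    total_stadiums = sum(row["stadiums"] for row in counts.values())
    return total_teams, total_stadiums


print(totals(COUNTS))  # (180, 178)
```

The result matches the "~180 total" teams and "~178 total" stadiums quoted earlier.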


Part 2: Where Does Data Live?

Data exists in four places, each serving a different purpose:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  1. BUNDLED JSON FILES (App Bundle)                             │
│     └─ Ships with app, works offline on first launch            │
│                                                                 │
│  2. SWIFTDATA (Local Database)                                  │
│     └─ Fast local storage, persists between launches            │
│                                                                 │
│  3. CLOUDKIT (Apple's Cloud)                                    │
│     └─ Remote sync, shared across all users                     │
│                                                                 │
│  4. APPDATAPROVIDER (In-Memory Cache)                           │
│     └─ What the app actually uses at runtime                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

How They Work Together

                    ┌──────────────────┐
                    │  Python Scripts  │
                    │  (scrape data)   │
                    └────────┬─────────┘
                             │
              ┌──────────────┴──────────────┐
              │                             │
              ▼                             ▼
     ┌────────────────┐           ┌────────────────┐
     │  JSON Files    │           │   CloudKit     │
     │  (bundled in   │           │   (remote      │
     │   app)         │           │    sync)       │
     └───────┬────────┘           └───────┬────────┘
             │                            │
             │ First launch               │ Background sync
             │ bootstrap                  │ (ongoing)
             ▼                            ▼
     ┌─────────────────────────────────────────────┐
     │              SwiftData                      │
     │         (local database)                    │
     └────────────────────┬────────────────────────┘
                          │
                          │ App reads from here
                          ▼
     ┌─────────────────────────────────────────────┐
     │        AppDataProvider.shared               │
     │     (single source of truth)                │
     └────────────────────┬────────────────────────┘
                          │
                          ▼
     ┌─────────────────────────────────────────────┐
     │           Features & Views                  │
     │  (Trip planning, Progress tracking, etc.)   │
     └─────────────────────────────────────────────┘

Part 3: The JSON Files (What Ships With the App)

Located in: SportsTime/Resources/

File Inventory

| File | Contains | Updated By |
|---|---|---|
| stadiums_canonical.json | All stadium data | generate_missing_data.py |
| teams_canonical.json | All team data | generate_missing_data.py |
| games_canonical.json | Game schedules | scrape_schedules.py |
| league_structure.json | Conferences & divisions | Manual edit |
| stadium_aliases.json | Historical stadium names | Manual edit |
| team_aliases.json | Historical team names | Manual edit |

Why Bundle JSON?

  1. Offline-first: App works immediately, even without internet
  2. Fast startup: No network delay on first launch
  3. Fallback: If CloudKit is down, app still functions
  4. Testing: Deterministic data for development

Example: Stadium Record

{
  "canonical_id": "stadium_mlb_fenway_park",
  "name": "Fenway Park",
  "city": "Boston",
  "state": "Massachusetts",
  "latitude": 42.3467,
  "longitude": -71.0972,
  "capacity": 37755,
  "sport": "MLB",
  "year_opened": 1912,
  "primary_team_abbrevs": ["BOS"]
}
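
A record like this can be sanity-checked before it is bundled. The following is a minimal sketch, assuming the ten fields in the example are all required and that coordinates are plain decimal degrees; the function name `validate_stadium` and the specific checks are illustrative and are not taken from the project's scripts.

```python
import json

NUMBER = (int, float)

# Field names come from the example record; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = {
    "canonical_id": str,
    "name": str,
    "city": str,
    "state": str,
    "latitude": NUMBER,
    "longitude": NUMBER,
    "capacity": int,
    "sport": str,
    "year_opened": int,
    "primary_team_abbrevs": list,
}


def validate_stadium(record):
    """Return a list of problems found in one stadium record (empty if none)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for field: {field}")
    lat, lon = record.get("latitude"), record.get("longitude")
    if isinstance(lat, NUMBER) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, NUMBER) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems


fenway = json.loads("""
{
  "canonical_id": "stadium_mlb_fenway_park",
  "name": "Fenway Park",
  "city": "Boston",
  "state": "Massachusetts",
  "latitude": 42.3467,
  "longitude": -71.0972,
  "capacity": 37755,
  "sport": "MLB",
  "year_opened": 1912,
  "primary_team_abbrevs": ["BOS"]
}
""")
print(validate_stadium(fenway))  # []
```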

Key concept: Canonical IDs

  • Every entity has a unique, permanent ID
  • Format: {type}_{sport}_{identifier}
  • Examples: team_nba_lal, stadium_nfl_sofi_stadium, game_mlb_2026_20260401_bos_nyy
  • IDs never change, even if names do (that's what aliases are for)
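
The `{type}_{sport}_{identifier}` format can be split mechanically, because only the first two underscores act as separators and the identifier may contain more. This is a small sketch; `CanonicalID` and `parse_canonical_id` are hypothetical names, not functions from the project.

```python
from typing import NamedTuple


class CanonicalID(NamedTuple):
    entity_type: str
    sport: str
    identifier: str


def parse_canonical_id(value):
    """Split a canonical ID into (type, sport, identifier).

    Only the first two underscores are separators; the identifier part
    keeps any further underscores.
    """
    parts = value.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a canonical ID: {value!r}")
    return CanonicalID(*parts)


for example in ("team_nba_lal", "stadium_nfl_sofi_stadium",
                "game_mlb_2026_20260401_bos_nyy"):
    parsed = parse_canonical_id(example)
    print(parsed.entity_type, parsed.sport, parsed.identifier)
```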

Part 4: The Python Scripts (How Data Gets Updated)

Located in: Scripts/

Script Overview

| Script | Purpose | When to Use |
|---|---|---|
| scrape_schedules.py | Fetch game schedules from sports-reference sites | New season starts |
| generate_missing_data.py | Add teams/stadiums for new sports | Adding new league |
| sportstime_parser/cli.py | Full pipeline: scrape → normalize → upload | Production updates |

Common Workflows

1. Adding a New Season's Schedule

cd Scripts

# Scrape all sports for 2026 season
python scrape_schedules.py --sport all --season 2026

# Or one sport at a time
python scrape_schedules.py --sport nba --season 2026
python scrape_schedules.py --sport mlb --season 2026

Output: Updates games_canonical.json with new games

2. Adding a New Sport or League

cd Scripts

# Generate missing teams/stadiums
python generate_missing_data.py

# This will:
# 1. Check existing teams_canonical.json
# 2. Add missing teams with proper IDs
# 3. Add missing stadiums
# 4. Update conference/division assignments

After running: Copy output files to Resources folder:

cp output/teams_canonical.json ../SportsTime/Resources/
cp output/stadiums_canonical.json ../SportsTime/Resources/

3. Updating League Structure (Manual)

Edit SportsTime/Resources/league_structure.json directly:

{
  "id": "nfl_afc_east",
  "sport": "NFL",
  "type": "division",
  "name": "AFC East",
  "abbreviation": "AFC East",
  "parent_id": "nfl_afc",
  "display_order": 1
}

Also update: SportsTime/Core/Models/Domain/Division.swift to match
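
Because this file is edited by hand, a mistyped `parent_id` is an easy error to make. The sketch below checks that every `parent_id` points at an existing entry. It assumes the file holds a list of records shaped like the one above and that top-level entries carry a null `parent_id`; the conference record in the sample data is hypothetical.

```python
def find_orphans(entries):
    """Return the ids of entries whose parent_id matches no entry's id."""
    known_ids = {entry["id"] for entry in entries}
    return [
        entry["id"]
        for entry in entries
        if entry.get("parent_id") is not None
        and entry["parent_id"] not in known_ids
    ]


entries = [
    # Hypothetical conference record, assumed to have no parent.
    {"id": "nfl_afc", "sport": "NFL", "type": "conference",
     "name": "AFC", "parent_id": None},
    # Division record from the example above.
    {"id": "nfl_afc_east", "sport": "NFL", "type": "division",
     "name": "AFC East", "parent_id": "nfl_afc"},
]
print(find_orphans(entries))  # []
```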

4. Uploading to CloudKit (Production)

cd Scripts

# Upload all canonical data to CloudKit
python -m sportstime_parser upload all

# Or specific entity type
python -m sportstime_parser upload games
python -m sportstime_parser upload teams

Requires: CloudKit credentials configured in sportstime_parser/config.py


Part 5: How the App Uses This Data

Startup Flow

1. App launches
    ↓
2. Check: Is this first launch?
    ↓
3. YES → BootstrapService loads bundled JSON into SwiftData
   NO  → Skip to step 4
    ↓
4. AppDataProvider.configure(modelContext)
    ↓
5. AppDataProvider.loadInitialData()
   - Reads from SwiftData
   - Populates in-memory cache
    ↓
6. App ready! (works offline)
    ↓
7. Background: CanonicalSyncService.syncAll()
   - Fetches updates from CloudKit
   - Merges into SwiftData
   - Refreshes AppDataProvider cache
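
The ordering in this flow can be modeled in a few lines. The sketch below is a language-neutral illustration written in Python, not the app's Swift code: `start_app`, `local_db`, and `fetch_remote` are invented names, plain dictionaries stand in for SwiftData and the in-memory cache, and the sync runs inline rather than in the background.

```python
def start_app(local_db, bundled_json, fetch_remote):
    """Model the startup order: bootstrap, load cache, then sync."""
    # Steps 2-3: on first launch, copy bundled data into the local store.
    if not local_db.get("bootstrap_completed"):
        local_db["records"] = dict(bundled_json)
        local_db["bootstrap_completed"] = True

    # Steps 4-6: build the in-memory cache; the app is usable offline here.
    cache = dict(local_db["records"])

    # Step 7: try to sync; fetch_remote stands in for the CloudKit fetch.
    try:
        updates = fetch_remote()
    except ConnectionError:
        return cache  # offline: keep serving local data

    local_db["records"].update(updates)  # merge into the local store
    return dict(local_db["records"])     # refreshed cache


def no_network():
    raise ConnectionError("offline")


bundled = {"stadium_mlb_fenway_park": "Fenway Park"}
print(start_app({}, bundled, no_network))
```

Even when the fetch fails on first launch, the returned cache still contains the bundled records, which is the point of bootstrapping before syncing.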

Accessing Data in Code

// ✅ CORRECT - Always use AppDataProvider
let stadiums = AppDataProvider.shared.stadiums
let teams = AppDataProvider.shared.teams
let games = try await AppDataProvider.shared.filterGames(
    sports: [.nba, .mlb],
    startDate: Date(),
    endDate: Date().addingTimeInterval(86400 * 30)  // 30 days from now
)

// ❌ WRONG - Never bypass AppDataProvider
// let stadiums = try await CloudKitService.shared.fetchStadiums()

Part 6: What Can Be Updated and How

Update Matrix

What Bundled JSON CloudKit Swift Code Requires App Update
Add new games No No (CloudKit sync)
Change stadium name + alias No No (CloudKit sync)
Add new team No No (CloudKit sync)
Add new sport Division.swift YES
Add division Division.swift YES
Add achievements N/A N/A AchievementDefinitions.swift YES
Change team colors No No (CloudKit sync)

Why Some Changes Need App Updates

CloudKit CAN handle:

  • New records (games, teams, stadiums)
  • Updated records (name changes, etc.)
  • Soft deletes (deprecation flags)

CloudKit CANNOT handle:

  • New enum cases in Swift (Sport enum)
  • New achievement definitions (compiled into app)
  • UI for new sports (views reference Sport enum)
  • Division/Conference structure (static in Division.swift)

The "New Sport" Problem

If you add a new sport via CloudKit only:

  1. App won't recognize the sport enum value
  2. No achievements defined for that sport
  3. UI filters won't show the sport
  4. Division.swift won't have the structure

Solution: New sports require:

  1. Add Sport enum case
  2. Add Division.swift entries
  3. Add AchievementDefinitions entries
  4. Bundle JSON with initial data
  5. Ship app update
  6. THEN CloudKit can sync ongoing changes
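
The first failure in that list is easy to reproduce in miniature. The sketch below uses a Python `Enum` as a stand-in for the Swift `Sport` enum: a value that was not compiled into the build cannot be decoded, no matter what the server sends. The `decode_sport` helper and the "PWHL" value are hypothetical.

```python
from enum import Enum


class Sport(Enum):
    """Stand-in for the Swift Sport enum; cases mirror the sports listed earlier."""
    MLB = "MLB"
    NBA = "NBA"
    NHL = "NHL"
    NFL = "NFL"
    MLS = "MLS"
    WNBA = "WNBA"
    NWSL = "NWSL"


def decode_sport(raw):
    """Return the matching Sport, or None when this build does not know the value."""
    try:
        return Sport(raw)
    except ValueError:
        return None


print(decode_sport("NBA") is Sport.NBA)  # True
print(decode_sport("PWHL"))              # None
```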

Part 7: Aliases (Handling Historical Changes)

Stadium Aliases

When a stadium is renamed (e.g., "Candlestick Park" → "3Com Park" → "Monster Park"), each former name is recorded as an alias with the date range in which it was valid:

// stadium_aliases.json
[
  {
    "alias_name": "Candlestick Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1960-01-01",
    "valid_until": "1995-12-31"
  },
  {
    "alias_name": "3Com Park",
    "stadium_canonical_id": "stadium_nfl_candlestick_park",
    "valid_from": "1996-01-01",
    "valid_until": "2002-12-31"
  }
]

Why this matters:

  • Old photos tagged "Candlestick Park" still resolve to correct stadium
  • Historical game data uses name from that era
  • User visits tracked correctly regardless of current name
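
Resolving a historical name then comes down to matching the name and checking the date window. This is a minimal sketch using the two alias records shown above. It assumes both dates are always present and that `valid_until` is inclusive; `resolve_stadium` is an illustrative name, not the app's API.

```python
from datetime import date

# The two alias records from the stadium_aliases.json example.
ALIASES = [
    {"alias_name": "Candlestick Park",
     "stadium_canonical_id": "stadium_nfl_candlestick_park",
     "valid_from": "1960-01-01", "valid_until": "1995-12-31"},
    {"alias_name": "3Com Park",
     "stadium_canonical_id": "stadium_nfl_candlestick_park",
     "valid_from": "1996-01-01", "valid_until": "2002-12-31"},
]


def resolve_stadium(name, on, aliases=ALIASES):
    """Return the canonical ID for a name as used on a given date, or None."""
    for alias in aliases:
        start = date.fromisoformat(alias["valid_from"])
        end = date.fromisoformat(alias["valid_until"])
        if alias["alias_name"] == name and start <= on <= end:
            return alias["stadium_canonical_id"]
    return None


print(resolve_stadium("3Com Park", date(1999, 9, 12)))
# stadium_nfl_candlestick_park
print(resolve_stadium("3Com Park", date(1990, 9, 12)))
# None
```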

Team Aliases

When a team relocates or rebrands, its former name is recorded as an alias that points to the current team's canonical ID:

// team_aliases.json
{
  "id": "alias_nba_sea_supersonics",
  "team_canonical_id": "team_nba_okc",
  "alias_type": "name",
  "alias_value": "Seattle SuperSonics",
  "valid_from": "1967-01-01",
  "valid_until": "2008-06-30"
}

Part 8: Sync State & Offline Behavior

SyncState Tracking

The app tracks sync status in SwiftData:

struct SyncState {
    var bootstrapCompleted: Bool      // Has initial JSON been loaded?
    var lastSuccessfulSync: Date?     // When did CloudKit last succeed?
    var syncInProgress: Bool          // Is sync running now?
    var consecutiveFailures: Int      // How many failures in a row?
}
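
One plausible way these fields could change after each sync attempt is sketched below. This is an assumption about the bookkeeping, written in Python for illustration; the app's actual logic lives in its Swift sync service and may differ. `record_sync_result` is a made-up helper name.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SyncState:
    """Python mirror of the SyncState fields listed above."""
    bootstrap_completed: bool = False
    last_successful_sync: Optional[datetime] = None
    sync_in_progress: bool = False
    consecutive_failures: int = 0


def record_sync_result(state, succeeded, now):
    """Update the state after a sync attempt finishes (assumed behavior)."""
    state.sync_in_progress = False
    if succeeded:
        state.last_successful_sync = now
        state.consecutive_failures = 0  # a success resets the failure streak
    else:
        state.consecutive_failures += 1
    return state


state = SyncState(bootstrap_completed=True, sync_in_progress=True)
record_sync_result(state, succeeded=False, now=datetime(2026, 1, 19, 22, 0))
print(state.consecutive_failures)  # 1
```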

Offline Scenarios

| Scenario | Behavior |
|---|---|
| First launch, no internet | Bootstrap from bundled JSON, app works |
| Returning user, no internet | Uses last synced SwiftData, app works |
| CloudKit partially fails | Partial sync saved, retry later |
| CloudKit down for days | App continues with local data |
| User deletes app, reinstalls | Fresh bootstrap from bundled JSON |

Part 9: Quick Reference - File Locations

JSON Data Files

SportsTime/Resources/
├── stadiums_canonical.json   # Stadium metadata
├── teams_canonical.json      # Team metadata
├── games_canonical.json      # Game schedules
├── league_structure.json     # Divisions & conferences
├── stadium_aliases.json      # Historical stadium names
└── team_aliases.json         # Historical team names

Swift Code

SportsTime/Core/
├── Models/
│   ├── Domain/
│   │   ├── Stadium.swift          # Stadium struct
│   │   ├── Team.swift             # Team struct
│   │   ├── Game.swift             # Game struct
│   │   ├── Division.swift         # LeagueStructure + static data
│   │   └── AchievementDefinitions.swift  # Achievement registry
│   ├── Local/
│   │   └── CanonicalModels.swift  # SwiftData models
│   └── CloudKit/
│       └── CKModels.swift         # CloudKit record types
└── Services/
    ├── DataProvider.swift         # AppDataProvider (source of truth)
    ├── BootstrapService.swift     # First-launch JSON → SwiftData
    ├── CanonicalSyncService.swift # CloudKit → SwiftData sync
    └── CloudKitService.swift      # CloudKit API wrapper

Python Scripts

Scripts/
├── scrape_schedules.py           # Game schedule scraper
├── generate_missing_data.py      # Team/stadium generator
├── sportstime_parser/
│   ├── cli.py                    # Main CLI
│   ├── scrapers/                 # Sport-specific scrapers
│   ├── normalizers/              # Data standardization
│   └── uploaders/                # CloudKit upload
└── output/                       # Generated JSON files

Part 10: Common Tasks Checklist

Before a New Season

  • Run scrape_schedules.py --sport all --season YYYY
  • Verify games_canonical.json has expected game count
  • Copy games_canonical.json to SportsTime/Resources/
  • Test app locally
  • Upload to CloudKit: python -m sportstime_parser upload games
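
The game-count check in this list can be partly automated by counting games per sport. The sketch below assumes each game record carries a `canonical_id` in the `game_{sport}_...` form described in Part 3; the rest of the game schema is not shown in this guide, so only that field is used. All sample IDs except the first are hypothetical.

```python
from collections import Counter


def count_games_by_sport(games):
    """Count games per sport using the second segment of each canonical ID."""
    return Counter(game["canonical_id"].split("_")[1] for game in games)


games = [
    {"canonical_id": "game_mlb_2026_20260401_bos_nyy"},
    {"canonical_id": "game_mlb_2026_20260402_bos_nyy"},  # hypothetical
    {"canonical_id": "game_nba_2026_20261020_lal_bos"},  # hypothetical
]
counts = count_games_by_sport(games)
print(counts["mlb"], counts["nba"], counts["nhl"])  # 2 1 0
```

Against the real file, the list would come from `json.load` on games_canonical.json. A sport with a count of zero, or one far below its usual season size, is a sign that the scrape failed for that sport.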

Adding a New Stadium

  1. Add to stadiums_canonical.json
  2. Add team reference in teams_canonical.json
  3. Copy both to Resources folder
  4. Upload to CloudKit

Stadium Renamed

  1. Add alias to stadium_aliases.json with date range
  2. Update stadium name in stadiums_canonical.json
  3. Copy both to Resources folder
  4. Upload to CloudKit
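
Steps 1 and 2 can be done together so the alias and the new name never get out of sync. This is a hedged sketch: `rename_stadium` is a hypothetical helper, it works on in-memory lists rather than the JSON files, and the alias fields follow the stadium alias example in Part 7.

```python
def rename_stadium(stadiums, aliases, canonical_id, new_name,
                   old_valid_from, old_valid_until):
    """Record the current name as an alias, then apply the new name.

    Dates are ISO strings (YYYY-MM-DD) describing when the old name was in use.
    """
    for stadium in stadiums:
        if stadium["canonical_id"] == canonical_id:
            aliases.append({
                "alias_name": stadium["name"],
                "stadium_canonical_id": canonical_id,
                "valid_from": old_valid_from,
                "valid_until": old_valid_until,
            })
            stadium["name"] = new_name  # the canonical ID stays unchanged
            return stadium
    raise KeyError(canonical_id)


stadiums = [{"canonical_id": "stadium_nfl_candlestick_park",
             "name": "Candlestick Park"}]
aliases = []
rename_stadium(stadiums, aliases, "stadium_nfl_candlestick_park",
               "3Com Park", "1960-01-01", "1995-12-31")
print(stadiums[0]["name"])       # 3Com Park
print(aliases[0]["alias_name"])  # Candlestick Park
```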

Adding a New Sport (App Update Required)

  1. Add Sport enum case in Sport.swift
  2. Add divisions/conferences to Division.swift
  3. Add achievements to AchievementDefinitions.swift
  4. Add teams to teams_canonical.json
  5. Add stadiums to stadiums_canonical.json
  6. Add league structure to league_structure.json
  7. Run generate_missing_data.py to validate
  8. Copy all JSON to Resources folder
  9. Build and test
  10. Ship app update
  11. Upload data to CloudKit for sync

Glossary

| Term | Definition |
|---|---|
| Canonical ID | Permanent, unique identifier for an entity (e.g., stadium_mlb_fenway_park) |
| Bootstrap | First-launch process that loads bundled JSON into SwiftData |
| Delta sync | Fetching only the changes since the last sync (not the full data set) |
| AppDataProvider | The single source of truth for all canonical data in the app |
| SwiftData | Apple's local persistence framework (the successor to Core Data, which it is built on) |
| CloudKit | Apple's cloud database service |
| Alias | Historical name mapping (old name → canonical ID) |
| Soft delete | Marking a record as deprecated instead of actually deleting it |

Last updated: January 2026