# SportsTime Data Audit Report

**Generated:** 2026-01-20

**Scope:** NBA, MLB, NFL, NHL, MLS, WNBA, NWSL

**Data Pipeline:** Scripts → CloudKit → iOS App

---

## Executive Summary

The data audit identified **15 issues** across the SportsTime data pipeline, with significant gaps in source reliability, stadium resolution, and iOS data freshness.

| Severity | Count | Description |
|----------|-------|-------------|
| **Critical** | 1 | iOS bundled data severely outdated |
| **High** | 4 | Single-source sports, fallback limit, NHL stadium data, NBA naming rights |
| **Medium** | 6 | Alias gaps, outdated config, silent game exclusion |
| **Low** | 4 | Minor configuration and coverage issues |

### Key Findings

**Data Pipeline Health:**
- ✅ **Canonical ID system**: 100% format compliance across 7,186 IDs
- ✅ **Team mappings**: All 183 teams correctly mapped with current abbreviations
- ✅ **Referential integrity**: Zero orphan references (no games point to non-existent teams or stadiums)
- ⚠️ **Stadium resolution**: 1,466 games (21.6%) have unresolved stadiums

**Critical Risks:**
1. **ESPN single point of failure** for WNBA, NWSL, and MLS: if ESPN changes or fails, 3 sports lose all data
2. **NHL is missing 100% of stadiums**: Hockey Reference provides no venue data
3. **iOS bundled data is 27% behind**: 1,820 games are missing from the first-launch experience

**Root Causes:**
- Stadium naming-rights changes (2024-2025) outpaced alias updates
- The fallback source limit (`max_sources_to_try = 2`) prevents a third source from ever being tried
- Hockey Reference provides no venue info, and the NHL fallback sources are never consulted
- iOS bundled JSON has not been updated with the latest pipeline output

---

## Phase Status Tracking

| Phase | Status | Issues Found |
|-------|--------|--------------|
| 1. Hardcoded Mapping Audit | ✅ COMPLETE | 1 Low |
| 2. Alias File Completeness | ✅ COMPLETE | 1 Medium, 1 Low |
| 3. Scraper Source Reliability | ✅ COMPLETE | 2 High, 1 Medium |
| 4. Game Count & Coverage | ✅ COMPLETE | 2 High, 2 Medium, 1 Low |
| 5. Canonical ID Consistency | ✅ COMPLETE | 0 issues |
| 6. Referential Integrity | ✅ COMPLETE | 1 Medium (NHL source) |
| 7. iOS Data Reception | ✅ COMPLETE | 1 Critical, 1 Medium, 1 Low |

---

## Phase 1 Results: Hardcoded Mapping Audit

**Files Audited:**
- `sportstime_parser/normalizers/team_resolver.py` (TEAM_MAPPINGS)
- `sportstime_parser/normalizers/stadium_resolver.py` (STADIUM_MAPPINGS)

### Team Counts

| Sport | Hardcoded | Expected | Abbreviations | Status |
|-------|-----------|----------|---------------|--------|
| NBA | 30 | 30 | 38 | ✅ |
| MLB | 30 | 30 | 38 | ✅ |
| NFL | 32 | 32 | 40 | ✅ |
| NHL | 32 | 32 | 41 | ✅ |
| MLS | 30 | 30* | 32 | ✅ |
| WNBA | 13 | 13 | 13 | ✅ |
| NWSL | 16 | 16 | 24 | ✅ |

*MLS: 29 existing teams + San Diego FC (2025 expansion) = 30

### Stadium Counts

| Sport | Hardcoded | Notes | Status |
|-------|-----------|-------|--------|
| NBA | 30 | 1 per team | ✅ |
| MLB | 57 | 30 regular + 18 spring training + 9 special venues | ✅ |
| NFL | 30 | Includes shared venues (SoFi Stadium: LAR+LAC, MetLife: NYG+NYJ) | ✅ |
| NHL | 32 | 1 per team | ✅ |
| MLS | 30 | 1 per team | ✅ |
| WNBA | 13 | 1 per team | ✅ |
| NWSL | 19 | 14 current + 5 expansion team venues (Boston/Denver) | ✅ |

### Recent Updates Verification

| Update | Type | Status | Notes |
|--------|------|--------|-------|
| Utah Hockey Club (NHL) | Relocation | ✅ Present | ARI + UTA abbreviations both map to `team_nhl_ari` |
| Golden State Valkyries (WNBA) | Expansion 2025 | ✅ Present | `team_wnba_gsv` with Chase Center venue |
| Boston Legacy FC (NWSL) | Expansion 2026 | ✅ Present | `team_nwsl_bos` with Gillette Stadium |
| Denver Summit FC (NWSL) | Expansion 2026 | ✅ Present | `team_nwsl_den` with Dick's Sporting Goods Park |
| Oakland A's → Sacramento | Temporary relocation | ✅ Present | `stadium_mlb_sutter_health_park` |
| San Diego FC (MLS) | Expansion 2025 | ✅ Present | `team_mls_sd` with Snapdragon Stadium |
| FedExField → Northwest Stadium | Naming rights | ✅ Present | `stadium_nfl_northwest_stadium` |

### NFL Stadium Sharing

| Stadium | Teams | Status |
|---------|-------|--------|
| SoFi Stadium | LAR, LAC | ✅ Correct |
| MetLife Stadium | NYG, NYJ | ✅ Correct |

### Issues Found

| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 1 | WNBA single abbreviations | Low | All 13 WNBA teams have only 1 abbreviation each. Additional abbreviations may be needed for source compatibility. |

### Phase 1 Summary

**Result: PASS** - All team and stadium mappings are complete and up to date with 2025-2026 changes.
- ✅ All 7 sports have correct team counts
- ✅ All stadium counts are appropriate (including spring training and special venues)
- ✅ Recent franchise moves/expansions are reflected
- ✅ Stadium sharing is handled correctly
- ✅ Naming rights updates are current

---

## Phase 2 Results: Alias File Completeness

**Files Audited:**
- `Scripts/team_aliases.json`
- `Scripts/stadium_aliases.json`

### Team Aliases Summary

| Sport | Entries | Coverage | Status |
|-------|---------|----------|--------|
| MLB | 23 | Historical relocations/renames | ✅ |
| NBA | 29 | Historical relocations/renames | ✅ |
| NHL | 24 | Historical relocations/renames | ✅ |
| NFL | 0 | **No aliases** | ⚠️ |
| MLS | 0 | No aliases (newer league) | ✅ |
| WNBA | 0 | No aliases (newer league) | ✅ |
| NWSL | 0 | No aliases (newer league) | ✅ |
| **Total** | **76** | | |

- All 76 entries have valid date ranges
- No orphan references (all canonical IDs exist in mappings)

### Stadium Aliases Summary

| Sport | Entries | Coverage | Status |
|-------|---------|----------|--------|
| MLB | 109 | Regular + spring training + special venues | ✅ |
| NFL | 65 | Naming rights history | ✅ |
| NBA | 44 | Naming rights history | ✅ |
| NHL | 39 | Naming rights history | ✅ |
| MLS | 35 | Current + naming variants | ✅ |
| WNBA | 15 | Current + naming variants | ✅ |
| NWSL | 14 | Current + naming variants | ✅ |
| **Total** | **321** | | |

- 65 entries have date ranges (historical naming rights)
- 256 entries are permanent aliases (no date restrictions)

### Orphan Reference Check

| Type | Count | Status |
|------|-------|--------|
| Team aliases with invalid references | 0 | ✅ |
| Stadium aliases with invalid references | **5** | ❌ |

**Orphan Stadium References Found:**

| Alias Name | References (Invalid) | Correct ID |
|------------|---------------------|------------|
| Broncos Stadium at Mile High | `stadium_nfl_empower_field_at_mile_high` | `stadium_nfl_empower_field` |
| Sports Authority Field at Mile High | `stadium_nfl_empower_field_at_mile_high` | `stadium_nfl_empower_field` |
| Invesco Field at Mile High | `stadium_nfl_empower_field_at_mile_high` | `stadium_nfl_empower_field` |
| Mile High Stadium | `stadium_nfl_empower_field_at_mile_high` | `stadium_nfl_empower_field` |
| Arrowhead Stadium | `stadium_nfl_geha_field_at_arrowhead_stadium` | `stadium_nfl_arrowhead_stadium` |

### Historical Changes Coverage

| Historical Name | Current Team | In Aliases? |
|-----------------|--------------|-------------|
| Montreal Expos | Washington Nationals | ✅ |
| Seattle SuperSonics | Oklahoma City Thunder | ✅ |
| Arizona Coyotes | Utah Hockey Club | ✅ |
| Cleveland Indians | Cleveland Guardians | ✅ |
| Hartford Whalers | Carolina Hurricanes | ✅ |
| Quebec Nordiques | Colorado Avalanche | ✅ |
| Vancouver Grizzlies | Memphis Grizzlies | ✅ |
| Washington Redskins | Washington Commanders | ❌ Missing |
| Washington Football Team | Washington Commanders | ❌ Missing |
| Brooklyn Dodgers | Los Angeles Dodgers | ❌ Missing |

### Issues Found

| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 2 | Orphan stadium alias references | Medium | 5 stadium aliases point to non-existent canonical IDs (`stadium_nfl_empower_field_at_mile_high`, `stadium_nfl_geha_field_at_arrowhead_stadium`), causing resolution failures for historical Denver/KC stadium names. |
| 3 | No NFL team aliases | Low | The Washington Redskins/Football Team historical names are missing, which limits historical game matching for NFL. |

### Phase 2 Summary

**Result: PASS with issues** - Alias files cover most historical changes but have referential integrity bugs.
- ✅ Team aliases cover MLB/NBA/NHL historical changes
- ✅ Stadium aliases cover naming rights changes across all sports
- ✅ No date range validation errors
- ❌ 5 orphan stadium references need fixing
- ⚠️ No NFL team aliases (Washington Redskins/Football Team missing)

---

## Phase 3 Results: Scraper Source Reliability

**Files Audited:**
- `sportstime_parser/scrapers/base.py` (fallback logic)
- `sportstime_parser/scrapers/nba.py`, `mlb.py`, `nfl.py`, `nhl.py`, `mls.py`, `wnba.py`, `nwsl.py`

### Source Dependency Matrix

| Sport | Primary | Status | Fallback 1 | Status | Fallback 2 | Status | Risk |
|-------|---------|--------|------------|--------|------------|--------|------|
| NBA | basketball_reference | ✅ | espn | ✅ | cbs | ❌ NOT IMPL | Medium |
| MLB | mlb_api | ✅ | espn | ✅ | baseball_reference | ✅ | Low |
| NFL | espn | ✅ | pro_football_reference | ✅ | cbs | ❌ NOT IMPL | Medium |
| NHL | hockey_reference | ✅ | nhl_api | ✅ | espn | ✅ | Low |
| MLS | espn | ✅ | fbref | ❌ NOT IMPL | - | - | **HIGH** |
| WNBA | espn | ✅ | - | - | - | - | **HIGH** |
| NWSL | espn | ✅ | - | - | - | - | **HIGH** |

### Unimplemented Sources

| Sport | Source | Location | Behavior |
|-------|--------|----------|----------|
| NBA | cbs | `nba.py:421` | `raise NotImplementedError("CBS scraper not implemented")` |
| NFL | cbs | `nfl.py:386` | `raise NotImplementedError("CBS scraper not implemented")` |
| MLS | fbref | `mls.py:214` | `raise NotImplementedError("FBref scraper not implemented")` |

### Fallback Logic Analysis

**File:** `base.py:189`

```python
max_sources_to_try = 2  # Don't try all sources if first few return nothing
```

**Impact:**
- Even if 3 sources are declared, only 2 are tried
- If sources 1 and 2 fail, source 3 is never attempted
- This limits resilience for NBA, MLB, NFL, and NHL, which each declare 3 sources

### International Game Filtering

| Sport | Hardcoded Locations | Notes |
|-------|---------------------|-------|
| NFL | London, Mexico City, Frankfurt, Munich, São Paulo | ✅ Complete for 2025 |
| NHL | Prague, Stockholm, Helsinki, Tampere, Gothenburg | ✅ Complete for 2025 |
| NBA | None | ⚠️ No international filtering (Abu Dhabi games?) |
| MLB | None | ⚠️ No international filtering (Mexico City games?) |
| MLS | None | N/A (domestic only) |
| WNBA | None | N/A (domestic only) |
| NWSL | None | N/A (domestic only) |

### Single Point of Failure Risk

| Sport | Source Chain | If ESPN Fails... | Risk Level |
|-------|--------------|------------------|------------|
| WNBA | ESPN only | **Complete data loss** | Critical |
| NWSL | ESPN only | **Complete data loss** | Critical |
| MLS | ESPN only (fbref not impl) | **Complete data loss** | Critical |
| NBA | Basketball-Ref → ESPN | ESPN fallback available | Low |
| NFL | ESPN → Pro-Football-Ref | Fallback available | Low |
| NHL | Hockey-Ref → NHL API → ESPN | Two fallbacks | Very Low |
| MLB | MLB API → ESPN → B-Ref | Two fallbacks | Very Low |

### Issues Found

| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 4 | WNBA/NWSL/MLS single source | High | ESPN is the only working source for 3 sports. If ESPN changes or fails, data collection stops completely. |
| 5 | `max_sources_to_try = 2` | High | The third fallback source is never tried, even when available. Reduces resilience for NBA/MLB/NFL/NHL. |
| 6 | CBS/FBref not implemented | Medium | Declared fallback sources raise `NotImplementedError`: they appear functional in config but fail at runtime. |

### Phase 3 Summary

**Result: FAIL** - Critical single point of failure for 3 sports.
- ❌ WNBA, NWSL, MLS have only ESPN (no resilience)
- ❌ Fallback limit of 2 prevents a third source from being tried
- ⚠️ CBS and FBref declared but not implemented
- ✅ MLB and NHL have full fallback chains
- ✅ International game filtering present for NFL/NHL

---

## Phase 4 Results: Game Count & Coverage

**Files Audited:**
- `Scripts/output/games_*.json` (all 2025 season files)
- `Scripts/output/validation_*.md` (all validation reports)
- `sportstime_parser/config.py` (EXPECTED_GAME_COUNTS)

### Coverage Summary

| Sport | Scraped | Expected | Coverage | Status |
|-------|---------|----------|----------|--------|
| NBA | 1,231 | 1,230 | 100.1% | ✅ |
| MLB | 2,866 | 2,430 | 117.9% | ⚠️ Includes spring training |
| NFL | 330 | 272 | 121.3% | ⚠️ Includes preseason/playoffs |
| NHL | 1,312 | 1,312 | 100.0% | ✅ |
| MLS | 542 | 493 | 109.9% | ✅ Includes playoffs |
| WNBA | 322 | 220 | **146.4%** | ⚠️ Expected count outdated |
| NWSL | 189 | 182 | 103.8% | ✅ |

### Date Range Analysis

| Sport | Start Date | End Date | Notes |
|-------|------------|----------|-------|
| NBA | 2025-10-21 | 2026-04-12 | Regular season only |
| MLB | 2025-03-01 | 2025-11-02 | Includes spring training (417 games in March) |
| NFL | 2025-08-01 | 2026-01-25 | Includes preseason (49 in Aug) + playoffs (28 in Jan) |
| NHL | 2025-10-07 | 2026-04-16 | Regular season only |
| MLS | 2025-02-22 | 2025-11-30 | Regular season + playoffs |
| WNBA | 2025-05-02 | 2025-10-11 | Regular season + playoffs |
| NWSL | 2025-03-15 | 2025-11-23 | Regular season + playoffs |

### Game Status Distribution

All games across all sports have status `unknown`: game status is not being parsed from the sources.
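
Issue #11 asks for statuses to be parsed as final/scheduled/postponed. A minimal sketch of a normalizer the scrapers could share, assuming each source exposes some raw status text. The raw strings in `RAW_STATUS_MAP` and the `normalize_status` name are hypothetical and have not been verified against ESPN, Hockey Reference, or the other sources.

```python
from typing import Optional

# Normalized vocabulary named in Issue #11, plus the current fallback value.
FINAL, SCHEDULED, POSTPONED, UNKNOWN = "final", "scheduled", "postponed", "unknown"

# Illustrative raw strings (assumptions, not verified scraper output),
# keyed in lowercase so lookups are case-insensitive.
RAW_STATUS_MAP = {
    "final": FINAL,
    "status_final": FINAL,
    "scheduled": SCHEDULED,
    "status_scheduled": SCHEDULED,
    "postponed": POSTPONED,
    "status_postponed": POSTPONED,
}


def normalize_status(raw: Optional[str]) -> str:
    """Map a raw source status onto final/scheduled/postponed, else 'unknown'."""
    if not raw:
        return UNKNOWN
    key = raw.strip().lower()
    if key in RAW_STATUS_MAP:
        return RAW_STATUS_MAP[key]
    # Treat variants such as "Final/OT" or "Final (10)" as completed games.
    if key.startswith("final"):
        return FINAL
    return UNKNOWN


for raw in ["Final", "Final/OT", "STATUS_POSTPONED", None, "Delayed"]:
    print(raw, "->", normalize_status(raw))
```

Unrecognized strings fall back to `unknown` on purpose. A new source therefore cannot silently mislabel games, and the map grows only as each source's real values are confirmed.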
### Duplicate Game Detection

| Sport | Duplicates Found | Details |
|-------|-----------------|---------|
| NBA | 0 | ✅ |
| MLB | 1 | `game_mlb_2025_20250508_det_col_1` appears twice (doubleheader handling issue) |
| NFL | 0 | ✅ |
| NHL | 0 | ✅ |
| MLS | 0 | ✅ |
| WNBA | 0 | ✅ |
| NWSL | 0 | ✅ |

### Validation Report Analysis

| Sport | Total Games | Unresolved Teams | Unresolved Stadiums | Manual Review Items |
|-------|-------------|------------------|---------------------|---------------------|
| NBA | 1,231 | 0 | **131** | 131 |
| MLB | 2,866 | 12 | 4 | 20 |
| NFL | 330 | 1 | 5 | 11 |
| NHL | 1,312 | 0 | 0 | **1,312** (all missing stadiums) |
| MLS | 542 | 1 | **64** | 129 |
| WNBA | 322 | 5 | **65** | 135 |
| NWSL | 189 | 0 | **16** | 32 |

### Top Unresolved Stadium Names (Recent Naming Rights)

| Stadium Name | Occurrences | Actual Venue | Issue |
|--------------|-------------|--------------|-------|
| Sports Illustrated Stadium | 11 | MLS expansion venue | New venue, missing alias |
| Mortgage Matchup Center | 8 | Rocket Mortgage FieldHouse (CLE) | 2025 naming rights change |
| ScottsMiracle-Gro Field | 4 | MLS Columbus Crew | Missing alias |
| Energizer Park | 3 | MLS CITY SC (STL?) | Missing alias |
| Xfinity Mobile Arena | 3 | Intuit Dome (LAC) | 2025 naming rights change |
| Rocket Arena | 3 | Toyota Center (HOU) | Potential name change |
| CareFirst Arena | 2 | Washington Mystics venue | New WNBA venue name |

### Unresolved Teams (Exhibition/International)

| Team Name | Sport | Type | Games |
|-----------|-------|------|-------|
| BRAZIL | WNBA | International exhibition | 2 |
| Toyota Antelopes | WNBA | Japanese team | 2 |
| TEAM CLARK | WNBA | All-Star Game | 1 |
| (Various MLB) | MLB | International teams | 12 |
| (MLS international) | MLS | CCL/exhibition | 1 |
| (NFL preseason) | NFL | Pre-season exhibition | 1 |

### NHL Stadium Data Issue

**Critical:** Hockey Reference does not provide stadium data. All 1,312 NHL games have `raw_stadium: None`, so 100% of NHL games are missing stadium IDs. The NHL fallback sources (NHL API, ESPN) should provide this data, but they are never consulted. Fallbacks run only when a source returns nothing, and Hockey Reference returns every game successfully, just without venues. The `max_sources_to_try = 2` limit would also keep ESPN, the third NHL source, from ever being tried.

### Expected Count Updates Needed

| Sport | Current Expected | Recommended | Reason |
|-------|------------------|-------------|--------|
| WNBA | 220 | **286** | 13 teams × 44 games / 2 (expanded with Golden State Valkyries) |
| NFL | 272 | 272 (filter preseason) | Or document that 330 includes preseason |
| MLB | 2,430 | 2,430 (filter spring training) | Or document that 2,866 includes spring training |

### Issues Found

| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 7 | NHL has no stadium data | High | Hockey Reference provides no venue info. All 1,312 games are missing `stadium_id`. Fallback sources are not tried. |
| 8 | 131 NBA stadium resolution failures | High | Recent naming rights changes ("Mortgage Matchup Center", "Xfinity Mobile Arena") are not in aliases. |
| 9 | Outdated WNBA expected count | Medium | Config says 220, but the WNBA expanded to 13 teams in 2025. The actual count is 322 (286 expected regular-season games plus playoffs and exhibitions). |
| 10 | MLS/WNBA stadium alias gaps | Medium | 64 MLS + 65 WNBA games have unresolved stadiums from new/renamed venues. |
| 11 | Game status not parsed | Low | All games have status `unknown` instead of final/scheduled/postponed. |

### Phase 4 Summary

**Result: FAIL** - Significant stadium resolution failures across multiple sports.
- ❌ 131 NBA games missing stadium (naming rights changes)
- ❌ 1,312 NHL games missing stadium (source doesn't provide data)
- ❌ 64 MLS + 65 WNBA games with unresolved stadiums (new/renamed venues)
- ⚠️ WNBA expected count severely outdated (220 vs 322 actual)
- ⚠️ MLB/NFL include preseason/spring training games
- ✅ No significant duplicate games (1 MLB doubleheader edge case)
- ✅ All teams resolved except exhibition/international games

---

## Phase 5 Results: Canonical ID Consistency

**Files Audited:**
- `sportstime_parser/normalizers/canonical_id.py` (Python ID generation)
- `SportsTime/Core/Models/Local/CanonicalModels.swift` (iOS models)
- `SportsTime/Core/Services/BootstrapService.swift` (iOS JSON parsing)
- All `Scripts/output/*.json` files (generated IDs)

### Format Validation

| Type | Total IDs | Valid | Invalid | Pass Rate |
|------|-----------|-------|---------|-----------|
| Team | 183 | 183 | 0 | 100.0% ✅ |
| Stadium | 211 | 211 | 0 | 100.0% ✅ |
| Game | 6,792 | 6,792 | 0 | 100.0% ✅ |

### ID Format Patterns (all validated)

```
Teams:    team_{sport}_{abbrev} → team_nba_lal
Stadiums: stadium_{sport}_{normalized_name} → stadium_nba_cryptocom_arena
Games:    game_{sport}_{season}_{YYYYMMDD}_{away}_{home}[_{#}] → game_nba_2025_20251021_hou_okc
```

### Normalization Quality

| Check | Result |
|-------|--------|
| Double underscores (`__`) | 0 found ✅ |
| Leading/trailing underscores | 0 found ✅ |
| Uppercase letters | 0 found ✅ |
| Special characters | 0 found ✅ |

### Abbreviation Lengths (Teams)

| Length | Count |
|--------|-------|
| 2 chars | 21 |
| 3 chars | 161 |
| 4 chars | 1 |

### Stadium ID Lengths

- Minimum: 8 characters
- Maximum: 29 characters
- Average: 16.2 characters

### iOS Cross-Compatibility

| Aspect | Status | Notes |
|--------|--------|-------|
| Field naming convention | ✅ Compatible | Python uses snake_case; iOS `BootstrapService` uses matching Codable structs |
| Deterministic UUID generation | ✅ Compatible | iOS uses a SHA256 hash of `canonical_id`, which works for any valid ID string |
| Schema version | ✅ Compatible | Both use version 1 |
| Required fields | ✅ Present | All iOS-required fields are present in the JSON output |

### Field Mapping (Python → iOS)

| Python Field | iOS Field | Notes |
|--------------|-----------|-------|
| `canonical_id` | `canonicalId` | Mapped via `JSONCanonicalStadium.canonical_id` → `CanonicalStadium.canonicalId` |
| `home_team_canonical_id` | `homeTeamCanonicalId` | Explicit mapping in BootstrapService |
| `away_team_canonical_id` | `awayTeamCanonicalId` | Explicit mapping in BootstrapService |
| `stadium_canonical_id` | `stadiumCanonicalId` | Explicit mapping in BootstrapService |
| `game_datetime_utc` | `dateTime` | ISO 8601 parsing with fallback to legacy format |

### Issues Found

**No issues found.** All canonical IDs are:
- Correctly formatted according to the defined patterns
- Properly normalized (lowercase, no special characters)
- Deterministic (same input produces same output)
- Compatible with iOS parsing

### Phase 5 Summary

**Result: PASS** - All canonical IDs are consistent and iOS-compatible.
- ✅ 100% format validation pass rate across 7,186 IDs
- ✅ No normalization issues found
- ✅ iOS BootstrapService explicitly handles snake_case → camelCase mapping
- ✅ Deterministic UUID generation using SHA256 hash

---

## Phase 6 Results: Referential Integrity

**Files Audited:**
- `Scripts/output/games_*_2025.json`
- `Scripts/output/teams_*.json`
- `Scripts/output/stadiums_*.json`

### Game → Team References

| Sport | Total Games | Valid Home | Valid Away | Orphan Home | Orphan Away | Status |
|-------|-------------|------------|------------|-------------|-------------|--------|
| NBA | 1,231 | 1,231 | 1,231 | 0 | 0 | ✅ |
| MLB | 2,866 | 2,866 | 2,866 | 0 | 0 | ✅ |
| NFL | 330 | 330 | 330 | 0 | 0 | ✅ |
| NHL | 1,312 | 1,312 | 1,312 | 0 | 0 | ✅ |
| MLS | 542 | 542 | 542 | 0 | 0 | ✅ |
| WNBA | 322 | 322 | 322 | 0 | 0 | ✅ |
| NWSL | 189 | 189 | 189 | 0 | 0 | ✅ |

**Result:** 100% valid team references across all 6,792 games.

### Game → Stadium References

| Sport | Total Games | Valid | Missing | Percentage Missing |
|-------|-------------|-------|---------|-------------------|
| NBA | 1,231 | 1,231 | 0 | 0.0% ✅ |
| MLB | 2,866 | 2,862 | 4 | 0.1% ✅ |
| NFL | 330 | 325 | 5 | 1.5% ✅ |
| NHL | 1,312 | 0 | **1,312** | **100%** ❌ |
| MLS | 542 | 478 | 64 | 11.8% ⚠️ |
| WNBA | 322 | 257 | 65 | 20.2% ⚠️ |
| NWSL | 189 | 173 | 16 | 8.5% ⚠️ |

**Note:** "Missing" means `stadium_canonical_id` is empty (resolution failed at scrape time). These are not orphan references to non-existent stadiums.

### Team → Stadium References

| Sport | Teams | Valid Stadium | Invalid | Status |
|-------|-------|---------------|---------|--------|
| NBA | 30 | 30 | 0 | ✅ |
| MLB | 30 | 30 | 0 | ✅ |
| NFL | 32 | 32 | 0 | ✅ |
| NHL | 32 | 32 | 0 | ✅ |
| MLS | 30 | 30 | 0 | ✅ |
| WNBA | 13 | 13 | 0 | ✅ |
| NWSL | 16 | 16 | 0 | ✅ |

**Result:** 100% valid team → stadium references.

### Cross-Sport Stadium Check

✅ No stadiums are duplicated across sports. Each `stadium_{sport}_*` ID is unique to its sport.
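
Phase 6 separates two failure modes: an empty `stadium_canonical_id` (resolution failed) and a non-empty ID that names nothing (an orphan). The following is a minimal sketch of a check that reports both buckets for every reference field. It assumes games are dicts carrying the snake_case fields listed in Phase 5. The function name `audit_references` and all sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set

# Reference fields on each game record (snake_case names from the Phase 5 field mapping).
REF_FIELDS = ("home_team_canonical_id", "away_team_canonical_id", "stadium_canonical_id")


def audit_references(
    games: Iterable[Mapping[str, str]],
    team_ids: Set[str],
    stadium_ids: Set[str],
) -> Dict[str, List[str]]:
    """Group game IDs by reference problem.

    missing_<field>: the field is empty (resolution failed at scrape time).
    orphan_<field>:  the field is set but names an ID that does not exist.
    """
    problems: Dict[str, List[str]] = defaultdict(list)
    for game in games:
        for field in REF_FIELDS:
            ref = game.get(field) or ""
            known = stadium_ids if field == "stadium_canonical_id" else team_ids
            label = field.replace("_canonical_id", "")
            if not ref:
                problems[f"missing_{label}"].append(game["canonical_id"])
            elif ref not in known:
                problems[f"orphan_{label}"].append(game["canonical_id"])
    return dict(problems)


# Hypothetical sample: one empty stadium ID, one stadium ID that does not exist.
teams = {"team_nhl_chi", "team_nhl_fla", "team_nfl_den", "team_nfl_kc"}
stadiums = {"stadium_nfl_empower_field"}
games = [
    {
        "canonical_id": "game_nhl_2025_20251007_chi_fla",
        "home_team_canonical_id": "team_nhl_fla",
        "away_team_canonical_id": "team_nhl_chi",
        "stadium_canonical_id": "",
    },
    {
        "canonical_id": "game_nfl_2025_20250907_kc_den",
        "home_team_canonical_id": "team_nfl_den",
        "away_team_canonical_id": "team_nfl_kc",
        "stadium_canonical_id": "stadium_nfl_empower_field_at_mile_high",
    },
]
print(audit_references(games, teams, stadiums))
```

Keeping the two buckets separate matches the report's distinction. Missing stadiums point to alias or source gaps, while any orphan points to a bug in ID generation or in the alias files.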
### Missing Stadium Root Causes

| Sport | Missing | Root Cause |
|-------|---------|------------|
| NHL | 1,312 | **Hockey Reference provides no venue data** - source limitation |
| MLS | 64 | New/renamed stadiums not in aliases (see Phase 4) |
| WNBA | 65 | New venue names not in aliases (see Phase 4) |
| NWSL | 16 | Expansion team venues + alternate venues |
| NFL | 5 | International games not in stadium mappings |
| MLB | 4 | Exhibition/international games |

### Orphan Reference Summary

| Reference Type | Total Checked | Orphans Found |
|----------------|---------------|---------------|
| Game → Home Team | 6,792 | 0 ✅ |
| Game → Away Team | 6,792 | 0 ✅ |
| Game → Stadium | 6,792 | 0 ✅ |
| Team → Stadium | 183 | 0 ✅ |

**Note:** There are zero orphan references. All "missing" stadiums are resolution failures (empty string), not references to non-existent canonical IDs.

### Issues Found

| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 12 | NHL games have no stadium data | Medium | Hockey Reference doesn't provide venue information, so all 1,312 NHL games have an empty `stadium_canonical_id`. Fallback sources could provide this data but are never consulted (see Issue #7). |

### Phase 6 Summary

**Result: PASS with known limitations** - No orphan references exist; missing stadiums are resolution failures.
- ✅ 100% valid team references (home and away)
- ✅ 100% valid team → stadium references
- ✅ No orphan references to non-existent canonical IDs
- ⚠️ 1,466 games (21.6%) have an empty `stadium_canonical_id` (resolution failures, not orphans)
- ⚠️ NHL accounts for 90% of missing stadium data (source limitation)

---

## Phase 7 Results: iOS Data Reception

**Files Audited:**
- `SportsTime/Core/Services/BootstrapService.swift` (JSON parsing)
- `SportsTime/Core/Services/CanonicalSyncService.swift` (CloudKit sync)
- `SportsTime/Core/Services/DataProvider.swift` (data access)
- `SportsTime/Core/Models/Local/CanonicalModels.swift` (SwiftData models)
- `SportsTime/Resources/*_canonical.json` (bundled data files)

### Bundled Data Comparison

| Data Type | iOS Bundled | Scripts Output | Difference | Status |
|-----------|-------------|----------------|------------|--------|
| Teams | 148 | 183 | **-35** (19%) | ❌ STALE |
| Stadiums | 122 | 211 | **-89** (42%) | ❌ STALE |
| Games | 4,972 | 6,792 | **-1,820** (27%) | ❌ STALE |

**iOS bundled data is significantly outdated compared to Scripts output.**

### Field Mapping Verification

| Python Field | iOS JSON Struct | iOS Model | Type Match | Status |
|--------------|-----------------|-----------|------------|--------|
| `canonical_id` | `canonical_id` | `canonicalId` | String ✅ | ✅ |
| `name` | `name` | `name` | String ✅ | ✅ |
| `game_datetime_utc` | `game_datetime_utc` | `dateTime` | ISO 8601 → Date ✅ | ✅ |
| `date` + `time` (legacy) | `date`, `time` | `dateTime` | Fallback parsing ✅ | ✅ |
| `home_team_canonical_id` | `home_team_canonical_id` | `homeTeamCanonicalId` | String ✅ | ✅ |
| `away_team_canonical_id` | `away_team_canonical_id` | `awayTeamCanonicalId` | String ✅ | ✅ |
| `stadium_canonical_id` | `stadium_canonical_id` | `stadiumCanonicalId` | String ✅ | ✅ |
| `sport` | `sport` | `sport` | String ✅ | ✅ |
| `season` | `season` | `season` | String ✅ | ✅ |
| `is_playoff` | `is_playoff` | `isPlayoff` | Bool ✅ | ✅ |
| `broadcast_info` | `broadcast_info` | `broadcastInfo` | String? ✅ | ✅ |

**Result:** All field mappings are correct and compatible.

### Date Parsing Compatibility

iOS `BootstrapService` supports both formats:

```swift
// New canonical format (preferred)
let game_datetime_utc: String?  // ISO 8601

// Legacy format (fallback)
let date: String?  // "YYYY-MM-DD"
let time: String?  // "HH:mm" or "TBD"
```

**Current iOS bundled games use the legacy format.** After the bundled data is updated, the new `game_datetime_utc` format will be used.

### Missing Reference Handling

**`DataProvider.filterRichGames()` behavior:**

```swift
return games.compactMap { game in
    guard let homeTeam = teamsById[game.homeTeamId],
          let awayTeam = teamsById[game.awayTeamId],
          let stadium = stadiumsById[game.stadiumId] else {
        return nil  // ⚠️ Silently drops game
    }
    return RichGame(...)
}
```

**Impact:**
- Games with missing stadium IDs are **silently excluded** from RichGame queries
- No error logging or fallback behavior
- Users see fewer games than expected, with no explanation

### Deduplication Logic

**Bootstrap:** No explicit deduplication. If the bundled JSON contains duplicate canonical IDs, both records would be inserted into SwiftData, potentially causing query issues.

**CloudKit Sync:** Uses an upsert pattern with the canonical ID as the unique key, so duplicates would overwrite each other.

### Schema Version Compatibility

| Component | Schema Version | Status |
|-----------|----------------|--------|
| Scripts output | 1 | ✅ |
| iOS CanonicalModels | 1 | ✅ |
| iOS BootstrapService | Expects 1 | ✅ |

**Compatible.** Schema version mismatch protection exists in `CanonicalSyncService`:

```swift
case .schemaVersionTooNew(let version):
    return "Data requires app version supporting schema \(version). Please update the app."
```

### Bootstrap Order Validation

iOS bootstraps in the correct dependency order:

1. Stadiums (no dependencies)
2. Stadium aliases (depends on stadiums)
3. League structure (no dependencies)
4. Teams (depends on stadiums)
5. Team aliases (depends on teams)
6. Games (depends on teams + stadiums)

**Correct - prevents orphan references during bootstrap.**

### CloudKit Sync Validation

`CanonicalSyncService` syncs in the same dependency order and tracks:
- Per-entity sync timestamps
- Skipped records (incompatible schema version)
- Skipped records (older than local)
- Sync duration and cancellation

**Well-designed sync infrastructure.**

### Issues Found

| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 13 | iOS bundled data severely outdated | **Critical** | Missing 35 teams (19%), 89 stadiums (42%), and 1,820 games (27%). The first-launch experience shows incomplete data until CloudKit sync completes. |
| 14 | Silent game exclusion in RichGame queries | Medium | `filterRichGames()` silently drops games with missing team/stadium references. Users see fewer games without explanation. |
| 15 | No bootstrap deduplication | Low | Duplicate game IDs in the bundled JSON would create duplicate SwiftData records. Low risk if the JSON is generated correctly, though Phase 4 found one duplicate MLB game ID in the current output. |

### Phase 7 Summary

**Result: FAIL** - iOS bundled data is critically outdated.
- ❌ iOS bundled data missing 35 teams, 89 stadiums, 1,820 games
- ⚠️ Games with unresolved references silently dropped from RichGame queries
- ✅ Field mapping between Python and iOS is correct
- ✅ Date parsing supports both legacy and new formats
- ✅ Schema versions are compatible
- ✅ Bootstrap/sync order handles dependencies correctly

---

## Prioritized Issue List

| # | Issue | Severity | Phase | Root Cause | Remediation |
|---|-------|----------|-------|------------|-------------|
| 13 | iOS bundled data severely outdated | **Critical** | 7 | Bundled JSON not updated after pipeline runs | Copy `Scripts/output/*_canonical.json` to iOS `Resources/` and rebuild |
| 4 | WNBA/NWSL/MLS ESPN-only source | **High** | 3 | No implemented fallback sources | Implement alternative scrapers (FBref for MLS, WNBA League Pass) |
| 5 | `max_sources_to_try = 2` limits fallback | **High** | 3 | Hardcoded limit in `base.py:189` | Increase to 3, or remove the limit for sports with 3+ sources |
| 7 | NHL has no stadium data from primary source | **High** | 4 | Hockey Reference doesn't provide venue info | Force NHL to use NHL API or ESPN as primary (they provide venues) |
| 8 | 131 NBA stadium resolution failures | **High** | 4 | 2024-2025 naming rights not in aliases | Add aliases: "Mortgage Matchup Center" → Rocket Mortgage FieldHouse, "Xfinity Mobile Arena" → Intuit Dome |
| 2 | Orphan stadium alias references | **Medium** | 2 | Wrong canonical IDs in `stadium_aliases.json` | Fix 5 Denver/KC stadium aliases pointing to non-existent IDs |
| 6 | CBS/FBref scrapers declared but not implemented | **Medium** | 3 | `NotImplementedError` at runtime | Either implement them or remove them from source lists to avoid confusion |
| 9 | Outdated WNBA expected count | **Medium** | 4 | WNBA expanded to 13 teams in 2025 | Update `config.py` `EXPECTED_GAME_COUNTS["wnba"]` from 220 to 286 |
| 10 | MLS/WNBA stadium alias gaps | **Medium** | 4 | New/renamed venues missing from aliases | Add aliases for the venue names behind the 129 unresolved games (64 MLS + 65 WNBA) |
| 12 | NHL games have no stadium data | **Medium** | 6 | Same as Issue #7 | See Issue #7 remediation |
| 14 | Silent game exclusion in RichGame queries | **Medium** | 7 | `compactMap` silently drops games | Log dropped games or return a partial RichGame with a placeholder stadium |
| 1 | WNBA single abbreviations | **Low** | 1 | Only 1 abbreviation per team | Add alternative abbreviations for source compatibility |
| 3 | No NFL team aliases | **Low** | 2 | Missing Washington Redskins/Football Team | Add historical Washington team name aliases |
| 11 | Game status not parsed | **Low** | 4 | Status field always "unknown" | Parse game status from source data (final, scheduled, postponed) |
| 15 | No bootstrap deduplication | **Low** | 7 | No explicit duplicate check during bootstrap | Add a deduplication check in `bootstrapGames()` |

---

## Recommended Next Steps

### Immediate (Before Next Release)

1. **Update iOS bundled data** (Issue #13)
   ```bash
   # Copy the combined *_canonical.json outputs named in Issue #13. A glob such as
   # stadiums_*.json can match several per-sport files, which cp cannot copy onto one file.
   cp Scripts/output/stadiums_canonical.json SportsTime/Resources/stadiums_canonical.json
   cp Scripts/output/teams_canonical.json SportsTime/Resources/teams_canonical.json
   cp Scripts/output/games_canonical.json SportsTime/Resources/games_canonical.json
   ```
2. **Fix NHL stadium data** (Issues #7, #12)
   - Change the NHL primary source from Hockey Reference to NHL API
   - Or: Increase `max_sources_to_try` to 3 and try the fallbacks when the primary returns games without venues
3. **Add critical stadium aliases** (Issues #8, #10)
   - "Mortgage Matchup Center" → `stadium_nba_rocket_mortgage_fieldhouse`
   - "Xfinity Mobile Arena" → `stadium_nba_intuit_dome`
   - Run the validation report to identify all unresolved venue names

### Short-term (This Quarter)

4. **Implement MLS fallback source** (Issue #4)
   - FBref has MLS data with venue information
   - Reduces ESPN single-point-of-failure risk
5. **Fix orphan alias references** (Issue #2)
   - Correct 5 NFL stadium aliases pointing to wrong canonical IDs
   - Add a validation check to prevent future orphan references
6. **Update expected game counts** (Issue #9)
   - WNBA: 220 → 286 (13 teams × 44 games / 2)

### Long-term (Next Quarter)

7. **Implement WNBA/NWSL fallback sources** (Issue #4)
   - Consider WNBA League Pass API or other sources
   - NWSL has limited data availability; it may be necessary to accept ESPN-only
8. **Add RichGame partial loading** (Issue #14)
   - Log games dropped due to missing references
   - Consider returning games with placeholder stadiums for NHL
9. **Parse game status** (Issue #11)
   - Extract final/scheduled/postponed from source data
   - Enables filtering by game state

---

## Verification Checklist

After implementing fixes, verify:

- [ ] Run `python -m sportstime_parser scrape --sport all --season 2025`
- [ ] Check that validation reports show <5% unresolved stadiums per sport
- [ ] Copy output JSON to iOS `Resources/`
- [ ] Build the iOS app and verify data loads at startup
- [ ] Query RichGames and verify the game count matches expectations
- [ ] Run CloudKit sync and verify there are no errors
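
The `<5%` unresolved-stadium bar can also be checked directly against the games output rather than read off the validation reports. This is a minimal sketch under several assumptions:

- Each game record carries `sport` and `stadium_canonical_id` fields, as in the Phase 7 field mapping, with lowercase sport values (assumed).
- An empty stadium ID means a failed resolution, as defined in Phase 6.

The functions `unresolved_stadium_rates`, `sports_over_threshold`, and `synthetic_games` are hypothetical. The sample records are synthesized from the Phase 6 counts for MLB, NWSL, and WNBA.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def unresolved_stadium_rates(games: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Fraction of games per sport whose stadium_canonical_id is empty."""
    totals: Counter = Counter()
    unresolved: Counter = Counter()
    for game in games:
        sport = game["sport"]
        totals[sport] += 1
        if not game.get("stadium_canonical_id"):
            unresolved[sport] += 1
    return {sport: unresolved[sport] / totals[sport] for sport in totals}


def sports_over_threshold(rates: Dict[str, float], threshold: float = 0.05) -> List[str]:
    """Sports that fail the checklist's '<5% unresolved' bar, sorted by name."""
    return sorted(sport for sport, rate in rates.items() if rate >= threshold)


def synthetic_games(sport: str, total: int, missing: int) -> List[Dict[str, str]]:
    """Hypothetical helper: build `total` records, the first `missing` with no stadium."""
    return [
        {"sport": sport, "stadium_canonical_id": "" if i < missing else f"stadium_{sport}_placeholder"}
        for i in range(total)
    ]


# Counts taken from the Phase 6 "Game → Stadium References" table.
games = (
    synthetic_games("mlb", 2866, 4)
    + synthetic_games("nwsl", 189, 16)
    + synthetic_games("wnba", 322, 65)
)
rates = unresolved_stadium_rates(games)
for sport in sports_over_threshold(rates):
    print(f"{sport}: {rates[sport]:.1%} unresolved")
```

Applied to all seven sports' Phase 6 figures, the 5% bar would flag NHL, MLS, WNBA, and NWSL, while NBA, MLB, and NFL pass.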