Scripts changes:
- Add WNBA abbreviation aliases to team_resolver.py
- Fix NHL stadium coordinates in stadium_resolver.py
- Add validate_aliases.py script for orphan detection
- Update scrapers with improved error handling
- Add DATA_AUDIT.md and REMEDIATION_PLAN.md documentation
- Update alias JSON files with new mappings

iOS bundle updates:
- Update games_canonical.json with latest scraped data
- Update teams_canonical.json and stadiums_canonical.json
- Sync alias files with Scripts versions

All 5 remediation phases complete.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SportsTime Data Audit Report
Generated: 2026-01-20
Scope: NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
Data Pipeline: Scripts → CloudKit → iOS App
Executive Summary
The data audit identified 15 issues across the SportsTime data pipeline, with significant gaps in source reliability, stadium resolution, and iOS data freshness.
| Severity | Count | Description |
|---|---|---|
| Critical | 1 | iOS bundled data severely outdated |
| High | 4 | Single-source sports, NHL stadium data, NBA naming rights |
| Medium | 6 | Alias gaps, outdated config, silent game exclusion |
| Low | 4 | Minor configuration and coverage issues |
Key Findings
Data Pipeline Health:
- ✅ Canonical ID system: 100% format compliance across 7,186 IDs
- ✅ Team mappings: All 183 teams correctly mapped with current abbreviations
- ✅ Referential integrity: Zero orphan references (0 games pointing to non-existent teams/stadiums)
- ⚠️ Stadium resolution: 1,466 games (21.6%) have unresolved stadiums
Critical Risks:
- ESPN single-point-of-failure for WNBA, NWSL, MLS - if ESPN changes, 3 sports lose all data
- NHL has 100% missing stadiums - Hockey Reference provides no venue data
- iOS bundled data 27% behind - 1,820 games missing from first-launch experience
Root Causes:
- Stadium naming rights changed faster than alias updates (2024-2025)
- Fallback source limit (max_sources_to_try = 2) prevents third source from being tried
- Hockey Reference source limitation (no venue info) combined with fallback limit
- iOS bundled JSON not updated with latest pipeline output
Phase Status Tracking
| Phase | Status | Issues Found |
|---|---|---|
| 1. Hardcoded Mapping Audit | ✅ COMPLETE | 1 Low |
| 2. Alias File Completeness | ✅ COMPLETE | 1 Medium, 1 Low |
| 3. Scraper Source Reliability | ✅ COMPLETE | 2 High, 1 Medium |
| 4. Game Count & Coverage | ✅ COMPLETE | 2 High, 2 Medium, 1 Low |
| 5. Canonical ID Consistency | ✅ COMPLETE | 0 issues |
| 6. Referential Integrity | ✅ COMPLETE | 1 Medium (NHL source) |
| 7. iOS Data Reception | ✅ COMPLETE | 1 Critical, 1 Medium, 1 Low |
Phase 1 Results: Hardcoded Mapping Audit
Files Audited:
- sportstime_parser/normalizers/team_resolver.py (TEAM_MAPPINGS)
- sportstime_parser/normalizers/stadium_resolver.py (STADIUM_MAPPINGS)
Team Counts
| Sport | Hardcoded | Expected | Abbreviations | Status |
|---|---|---|---|---|
| NBA | 30 | 30 | 38 | ✅ |
| MLB | 30 | 30 | 38 | ✅ |
| NFL | 32 | 32 | 40 | ✅ |
| NHL | 32 | 32 | 41 | ✅ |
| MLS | 30 | 30* | 32 | ✅ |
| WNBA | 13 | 13 | 13 | ✅ |
| NWSL | 16 | 16 | 24 | ✅ |
*MLS: 29 original teams + San Diego FC (2025 expansion) = 30
Stadium Counts
| Sport | Hardcoded | Notes | Status |
|---|---|---|---|
| NBA | 30 | 1 per team | ✅ |
| MLB | 57 | 30 regular + 18 spring training + 9 special venues | ✅ |
| NFL | 30 | Includes shared venues (SoFi Stadium: LAR+LAC, MetLife: NYG+NYJ) | ✅ |
| NHL | 32 | 1 per team | ✅ |
| MLS | 30 | 1 per team | ✅ |
| WNBA | 13 | 1 per team | ✅ |
| NWSL | 19 | 14 current + 5 expansion team venues (Boston/Denver) | ✅ |
Recent Updates Verification
| Update | Type | Status | Notes |
|---|---|---|---|
| Utah Hockey Club (NHL) | Relocation | ✅ Present | ARI + UTA abbreviations both map to team_nhl_ari |
| Golden State Valkyries (WNBA) | Expansion 2025 | ✅ Present | team_wnba_gsv with Chase Center venue |
| Boston Legacy FC (NWSL) | Expansion 2026 | ✅ Present | team_nwsl_bos with Gillette Stadium |
| Denver Summit FC (NWSL) | Expansion 2026 | ✅ Present | team_nwsl_den with Dick's Sporting Goods Park |
| Oakland A's → Sacramento | Temporary relocation | ✅ Present | stadium_mlb_sutter_health_park |
| San Diego FC (MLS) | Expansion 2025 | ✅ Present | team_mls_sd with Snapdragon Stadium |
| FedExField → Northwest Stadium | Naming rights | ✅ Present | stadium_nfl_northwest_stadium |
NFL Stadium Sharing
| Stadium | Teams | Status |
|---|---|---|
| SoFi Stadium | LAR, LAC | ✅ Correct |
| MetLife Stadium | NYG, NYJ | ✅ Correct |
Issues Found
| # | Issue | Severity | Description |
|---|---|---|---|
| 1 | WNBA single abbreviations | Low | All 13 WNBA teams have only 1 abbreviation each. May need additional abbreviations for source compatibility. |
Phase 1 Summary
Result: PASS - All team and stadium mappings are complete and up-to-date with 2025-2026 changes.
- ✅ All 7 sports have correct team counts
- ✅ All stadium counts are appropriate (including spring training, special venues)
- ✅ Recent franchise moves/expansions are reflected
- ✅ Stadium sharing is correctly handled
- ✅ Naming rights updates are current
Phase 2 Results: Alias File Completeness
Files Audited:
- Scripts/team_aliases.json
- Scripts/stadium_aliases.json
Team Aliases Summary
| Sport | Entries | Coverage | Status |
|---|---|---|---|
| MLB | 23 | Historical relocations/renames | ✅ |
| NBA | 29 | Historical relocations/renames | ✅ |
| NHL | 24 | Historical relocations/renames | ✅ |
| NFL | 0 | No aliases | ⚠️ |
| MLS | 0 | No aliases (newer league) | ✅ |
| WNBA | 0 | No aliases (newer league) | ✅ |
| NWSL | 0 | No aliases (newer league) | ✅ |
| Total | 76 | | |
- All 76 entries have valid date ranges
- No orphan references (all canonical IDs exist in mappings)
Stadium Aliases Summary
| Sport | Entries | Coverage | Status |
|---|---|---|---|
| MLB | 109 | Regular + spring training + special venues | ✅ |
| NFL | 65 | Naming rights history | ✅ |
| NBA | 44 | Naming rights history | ✅ |
| NHL | 39 | Naming rights history | ✅ |
| MLS | 35 | Current + naming variants | ✅ |
| WNBA | 15 | Current + naming variants | ✅ |
| NWSL | 14 | Current + naming variants | ✅ |
| Total | 321 | | |
- 65 entries have date ranges (historical naming rights)
- 256 entries are permanent aliases (no date restrictions)
Orphan Reference Check
| Type | Count | Status |
|---|---|---|
| Team aliases with invalid references | 0 | ✅ |
| Stadium aliases with invalid references | 5 | ❌ |
Orphan Stadium References Found:
| Alias Name | References (Invalid) | Correct ID |
|---|---|---|
| Broncos Stadium at Mile High | stadium_nfl_empower_field_at_mile_high | stadium_nfl_empower_field |
| Sports Authority Field at Mile High | stadium_nfl_empower_field_at_mile_high | stadium_nfl_empower_field |
| Invesco Field at Mile High | stadium_nfl_empower_field_at_mile_high | stadium_nfl_empower_field |
| Mile High Stadium | stadium_nfl_empower_field_at_mile_high | stadium_nfl_empower_field |
| Arrowhead Stadium | stadium_nfl_geha_field_at_arrowhead_stadium | stadium_nfl_arrowhead_stadium |
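The orphan check behind this table can be sketched as a set-membership test. This is a minimal illustration, not the project's validate_aliases.py: the field names `alias_name` and `stadium_canonical_id`, the function name, and the sample "Empower Field" entry are assumptions made for the example.

```python
from typing import Dict, List, Set


def find_orphan_aliases(
    aliases: List[Dict[str, str]], known_ids: Set[str]
) -> List[Dict[str, str]]:
    """Return alias entries whose target is not a known canonical stadium ID."""
    return [a for a in aliases if a["stadium_canonical_id"] not in known_ids]


# Canonical IDs that exist, per the "Correct ID" column above.
known_ids = {"stadium_nfl_empower_field", "stadium_nfl_arrowhead_stadium"}

# Sample alias entries; the field names are hypothetical.
aliases = [
    {"alias_name": "Mile High Stadium",
     "stadium_canonical_id": "stadium_nfl_empower_field_at_mile_high"},
    {"alias_name": "Arrowhead Stadium",
     "stadium_canonical_id": "stadium_nfl_geha_field_at_arrowhead_stadium"},
    {"alias_name": "Empower Field",  # hypothetical valid alias
     "stadium_canonical_id": "stadium_nfl_empower_field"},
]

orphans = find_orphan_aliases(aliases, known_ids)
for entry in orphans:
    print(f"orphan: {entry['alias_name']} -> {entry['stadium_canonical_id']}")
```

Running a check like this on every alias file change would catch the Denver and Kansas City mismatches before they reach the resolver.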
Historical Changes Coverage
| Historical Name | Current Team | In Aliases? |
|---|---|---|
| Montreal Expos | Washington Nationals | ✅ |
| Seattle SuperSonics | Oklahoma City Thunder | ✅ |
| Arizona Coyotes | Utah Hockey Club | ✅ |
| Cleveland Indians | Cleveland Guardians | ✅ |
| Hartford Whalers | Carolina Hurricanes | ✅ |
| Quebec Nordiques | Colorado Avalanche | ✅ |
| Vancouver Grizzlies | Memphis Grizzlies | ✅ |
| Washington Redskins | Washington Commanders | ❌ Missing |
| Washington Football Team | Washington Commanders | ❌ Missing |
| Brooklyn Dodgers | Los Angeles Dodgers | ❌ Missing |
Issues Found
| # | Issue | Severity | Description |
|---|---|---|---|
| 2 | Orphan stadium alias references | Medium | 5 stadium aliases point to non-existent canonical IDs (stadium_nfl_empower_field_at_mile_high, stadium_nfl_geha_field_at_arrowhead_stadium). Causes resolution failures for historical Denver/KC stadium names. |
| 3 | No NFL team aliases | Low | Missing Washington Redskins/Football Team historical names. Limits historical game matching for NFL. |
Phase 2 Summary
Result: PASS with issues - Alias files cover most historical changes but have referential integrity bugs.
- ✅ Team aliases cover MLB/NBA/NHL historical changes
- ✅ Stadium aliases cover naming rights changes across all sports
- ✅ No date range validation errors
- ❌ 5 orphan stadium references need fixing
- ⚠️ No NFL team aliases (Washington Redskins/Football Team missing)
Phase 3 Results: Scraper Source Reliability
Files Audited:
- sportstime_parser/scrapers/base.py (fallback logic)
- sportstime_parser/scrapers/nba.py, mlb.py, nfl.py, nhl.py, mls.py, wnba.py, nwsl.py
Source Dependency Matrix
| Sport | Primary | Status | Fallback 1 | Status | Fallback 2 | Status | Risk |
|---|---|---|---|---|---|---|---|
| NBA | basketball_reference | ✅ | espn | ✅ | cbs | ❌ NOT IMPL | Medium |
| MLB | mlb_api | ✅ | espn | ✅ | baseball_reference | ✅ | Low |
| NFL | espn | ✅ | pro_football_reference | ✅ | cbs | ❌ NOT IMPL | Medium |
| NHL | hockey_reference | ✅ | nhl_api | ✅ | espn | ✅ | Low |
| MLS | espn | ✅ | fbref | ❌ NOT IMPL | - | - | HIGH |
| WNBA | espn | ✅ | - | - | - | - | HIGH |
| NWSL | espn | ✅ | - | - | - | - | HIGH |
Unimplemented Sources
| Sport | Source | Line | Status |
|---|---|---|---|
| NBA | cbs | nba.py:421 | raise NotImplementedError("CBS scraper not implemented") |
| NFL | cbs | nfl.py:386 | raise NotImplementedError("CBS scraper not implemented") |
| MLS | fbref | mls.py:214 | raise NotImplementedError("FBref scraper not implemented") |
Fallback Logic Analysis
File: base.py:189
max_sources_to_try = 2 # Don't try all sources if first few return nothing
Impact:
- Even if 3 sources are declared, only 2 are tried
- If sources 1 and 2 fail, source 3 is never attempted
- This limits resilience for NBA, MLB, NFL, NHL which have 3 sources
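The effect of the cap can be reproduced with a small stand-in for the fallback loop. The real code in base.py is not shown in this report, so the function below, its signature, and the fake sources are assumptions; only the name `max_sources_to_try` and the source names come from the audit.

```python
from typing import Callable, List, Optional, Tuple

Source = Tuple[str, Callable[[], List[dict]]]


def scrape_with_fallback(
    sources: List[Source], max_sources_to_try: int = 2
) -> Tuple[Optional[str], List[dict], List[str]]:
    """Try sources in order and stop at the first one that returns games.

    Returns (winning source name or None, games, names of sources attempted).
    """
    attempted: List[str] = []
    for name, fetch in sources[:max_sources_to_try]:
        attempted.append(name)
        try:
            games = fetch()
        except Exception:
            continue  # any failure means "move on to the next source"
        if games:
            return name, games, attempted
    return None, [], attempted


def failing() -> List[dict]:
    raise RuntimeError("source unavailable")


# Simulated outage: the first two sources fail or return nothing.
sources: List[Source] = [
    ("mlb_api", failing),
    ("espn", lambda: []),
    ("baseball_reference", lambda: [{"canonical_id": "game_mlb_2025_20250508_det_col_1"}]),
]

capped = scrape_with_fallback(sources, max_sources_to_try=2)
uncapped = scrape_with_fallback(sources, max_sources_to_try=len(sources))
print(capped)
print(uncapped)
```

With the cap at 2 the third source is never called even though it would have returned data; with the cap equal to the number of declared sources it is reached.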
International Game Filtering
| Sport | Hardcoded Locations | Notes |
|---|---|---|
| NFL | London, Mexico City, Frankfurt, Munich, São Paulo | ✅ Complete for 2025 |
| NHL | Prague, Stockholm, Helsinki, Tampere, Gothenburg | ✅ Complete for 2025 |
| NBA | None | ⚠️ No international filtering (Abu Dhabi games?) |
| MLB | None | ⚠️ No international filtering (Mexico City games?) |
| MLS | None | N/A (domestic only) |
| WNBA | None | N/A (domestic only) |
| NWSL | None | N/A (domestic only) |
Single Point of Failure Risk
| Sport | Primary Source | If ESPN Fails... | Risk Level |
|---|---|---|---|
| WNBA | ESPN only | Complete data loss | Critical |
| NWSL | ESPN only | Complete data loss | Critical |
| MLS | ESPN only (fbref not impl) | Complete data loss | Critical |
| NBA | Basketball-Ref → ESPN | ESPN fallback available | Low |
| NFL | ESPN → Pro-Football-Ref | Fallback available | Low |
| NHL | Hockey-Ref → NHL API → ESPN | Two fallbacks | Very Low |
| MLB | MLB API → ESPN → B-Ref | Two fallbacks | Very Low |
Issues Found
| # | Issue | Severity | Description |
|---|---|---|---|
| 4 | WNBA/NWSL/MLS single source | High | ESPN is the only working source for 3 sports. If ESPN changes or fails, data collection completely stops. |
| 5 | max_sources_to_try = 2 | High | Third fallback source never tried even if available. Reduces resilience for NBA/MLB/NFL/NHL. |
| 6 | CBS/FBref not implemented | Medium | Declared fallback sources raise NotImplementedError. Appears functional in config but fails at runtime. |
Phase 3 Summary
Result: FAIL - Critical single-point-of-failure for 3 sports.
- ❌ WNBA, NWSL, MLS have only ESPN (no resilience)
- ❌ Fallback limit of 2 prevents third source from being tried
- ⚠️ CBS and FBref declared but not implemented
- ✅ MLB and NHL have full fallback chains
- ✅ International game filtering present for NFL/NHL
Phase 4 Results: Game Count & Coverage
Files Audited:
- Scripts/output/games_*.json (all 2025 season files)
- Scripts/output/validation_*.md (all validation reports)
- sportstime_parser/config.py (EXPECTED_GAME_COUNTS)
Coverage Summary
| Sport | Scraped | Expected | Coverage | Status |
|---|---|---|---|---|
| NBA | 1,231 | 1,230 | 100.1% | ✅ |
| MLB | 2,866 | 2,430 | 117.9% | ⚠️ Includes spring training |
| NFL | 330 | 272 | 121.3% | ⚠️ Includes preseason/playoffs |
| NHL | 1,312 | 1,312 | 100.0% | ✅ |
| MLS | 542 | 493 | 109.9% | ✅ Includes playoffs |
| WNBA | 322 | 220 | 146.4% | ⚠️ Expected count outdated |
| NWSL | 189 | 182 | 103.8% | ✅ |
Date Range Analysis
| Sport | Start Date | End Date | Notes |
|---|---|---|---|
| NBA | 2025-10-21 | 2026-04-12 | Regular season only |
| MLB | 2025-03-01 | 2025-11-02 | Includes spring training (417 games in March) |
| NFL | 2025-08-01 | 2026-01-25 | Includes preseason (49 in Aug) + playoffs (28 in Jan) |
| NHL | 2025-10-07 | 2026-04-16 | Regular season only |
| MLS | 2025-02-22 | 2025-11-30 | Regular season + playoffs |
| WNBA | 2025-05-02 | 2025-10-11 | Regular season + playoffs |
| NWSL | 2025-03-15 | 2025-11-23 | Regular season + playoffs |
Game Status Distribution
All games across all sports have status unknown - game status is not being properly parsed from sources.
Duplicate Game Detection
| Sport | Duplicates Found | Details |
|---|---|---|
| NBA | 0 | ✅ |
| MLB | 1 | game_mlb_2025_20250508_det_col_1 appears twice (doubleheader handling issue) |
| NFL | 0 | ✅ |
| NHL | 0 | ✅ |
| MLS | 0 | ✅ |
| WNBA | 0 | ✅ |
| NWSL | 0 | ✅ |
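A duplicate check of this kind can be sketched by counting canonical IDs. The function name and the `canonical_id` key on game records are assumptions for the example; the duplicated ID is the one reported above, and the `_2` record is a made-up second game of the doubleheader.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_ids(games: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Map each canonical ID that occurs more than once to its occurrence count."""
    counts = Counter(game["canonical_id"] for game in games)
    return {game_id: n for game_id, n in counts.items() if n > 1}


games = [
    {"canonical_id": "game_mlb_2025_20250508_det_col_1"},
    {"canonical_id": "game_mlb_2025_20250508_det_col_1"},  # same suffix reused
    {"canonical_id": "game_mlb_2025_20250508_det_col_2"},  # hypothetical second game
]

duplicates = find_duplicate_ids(games)
print(duplicates)
```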
Validation Report Analysis
| Sport | Total Games | Unresolved Teams | Unresolved Stadiums | Manual Review Items |
|---|---|---|---|---|
| NBA | 1,231 | 0 | 131 | 131 |
| MLB | 2,866 | 12 | 4 | 20 |
| NFL | 330 | 1 | 5 | 11 |
| NHL | 1,312 | 0 | 0 | 1,312 (all missing stadiums) |
| MLS | 542 | 1 | 64 | 129 |
| WNBA | 322 | 5 | 65 | 135 |
| NWSL | 189 | 0 | 16 | 32 |
Top Unresolved Stadium Names (Recent Naming Rights)
| Stadium Name | Occurrences | Actual Venue | Issue |
|---|---|---|---|
| Sports Illustrated Stadium | 11 | Red Bull Arena (New York Red Bulls) | 2024 naming rights change, missing alias |
| Mortgage Matchup Center | 8 | Phoenix Suns arena, formerly Footprint Center (PHX) | 2025 naming rights change |
| ScottsMiracle-Gro Field | 4 | Lower.com Field (Columbus Crew) | 2025 naming rights change, missing alias |
| Energizer Park | 3 | CITYPARK (St. Louis CITY SC) | 2025 naming rights change, missing alias |
| Xfinity Mobile Arena | 3 | Wells Fargo Center (PHI) | 2025 naming rights change |
| Rocket Arena | 3 | Rocket Mortgage FieldHouse (CLE) | 2025 naming rights change |
| CareFirst Arena | 2 | Washington Mystics venue | New WNBA venue name |
Unresolved Teams (Exhibition/International)
| Team Name | Sport | Type | Games |
|---|---|---|---|
| BRAZIL | WNBA | International exhibition | 2 |
| Toyota Antelopes | WNBA | Japanese team | 2 |
| TEAM CLARK | WNBA | All-Star Game | 1 |
| (Various MLB) | MLB | International teams | 12 |
| (MLS international) | MLS | CCL/exhibition | 1 |
| (NFL preseason) | NFL | Pre-season exhibition | 1 |
NHL Stadium Data Issue
Critical: Hockey Reference does not provide stadium data. All 1,312 NHL games have raw_stadium: None, causing 100% of games to have missing stadium IDs. The NHL fallback sources (NHL API, ESPN) should provide this data, but the max_sources_to_try = 2 limit combined with Hockey Reference success means fallbacks are never attempted.
Expected Count Updates Needed
| Sport | Current Expected | Recommended | Reason |
|---|---|---|---|
| WNBA | 220 | 286 | 13 teams × 44 games / 2 (expanded with Golden State Valkyries) |
| NFL | 272 | 272 (filter preseason) | Or document that 330 includes preseason |
| MLB | 2,430 | 2,430 (filter spring training) | Or document that 2,866 includes spring training |
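The recommended WNBA figure follows from the fact that every game involves two teams, so the league total is teams × games per team / 2. A short sketch of that arithmetic and of the coverage percentages, with hypothetical helper names:

```python
def expected_regular_season_games(teams: int, games_per_team: int) -> int:
    """League-wide game count: each game is shared by two teams."""
    total, remainder = divmod(teams * games_per_team, 2)
    if remainder:
        raise ValueError("teams * games_per_team must be even")
    return total


def coverage_pct(scraped: int, expected: int) -> float:
    """Scraped games as a percentage of the expected count, to one decimal."""
    return round(100 * scraped / expected, 1)


wnba_expected = expected_regular_season_games(teams=13, games_per_team=44)
coverage_old = coverage_pct(322, 220)            # against the outdated config value
coverage_new = coverage_pct(322, wnba_expected)  # against the recommended value

print(wnba_expected, coverage_old, coverage_new)
```

Even against the updated count the scraped total stays above 100%, because the 322 scraped games also include playoffs and exhibition games.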
Issues Found
| # | Issue | Severity | Description |
|---|---|---|---|
| 7 | NHL has no stadium data | High | Hockey Reference provides no venue info. All 1,312 games missing stadium_id. Fallback sources not tried. |
| 8 | 131 NBA stadium resolution failures | High | Recent naming rights changes ("Mortgage Matchup Center", "Xfinity Mobile Arena") not in aliases. |
| 9 | Outdated WNBA expected count | Medium | Config says 220 but WNBA expanded to 13 teams in 2025; actual is 322 (286 regular + playoffs). |
| 10 | MLS/WNBA stadium alias gaps | Medium | 64 MLS + 65 WNBA unresolved stadiums from new/renamed venues. |
| 11 | Game status not parsed | Low | All games have status unknown instead of final/scheduled/postponed. |
Phase 4 Summary
Result: FAIL - Significant stadium resolution failures across multiple sports.
- ❌ 131 NBA games missing stadium (naming rights changes)
- ❌ 1,312 NHL games missing stadium (source doesn't provide data)
- ❌ 64 MLS + 65 WNBA stadiums unresolved (new/renamed venues)
- ⚠️ WNBA expected count severely outdated (220 vs 322 actual)
- ⚠️ MLB/NFL include preseason/spring training games
- ✅ No significant duplicate games (1 MLB doubleheader edge case)
- ✅ All teams resolved except exhibition/international games
Phase 5 Results: Canonical ID Consistency
Files Audited:
- sportstime_parser/normalizers/canonical_id.py (Python ID generation)
- SportsTime/Core/Models/Local/CanonicalModels.swift (iOS models)
- SportsTime/Core/Services/BootstrapService.swift (iOS JSON parsing)
- All Scripts/output/*.json files (generated IDs)
Format Validation
| Type | Total IDs | Valid | Invalid | Pass Rate |
|---|---|---|---|---|
| Team | 183 | 183 | 0 | 100.0% ✅ |
| Stadium | 211 | 211 | 0 | 100.0% ✅ |
| Game | 6,792 | 6,792 | 0 | 100.0% ✅ |
ID Format Patterns (all validated)
Teams: team_{sport}_{abbrev} → team_nba_lal
Stadiums: stadium_{sport}_{normalized_name} → stadium_nba_cryptocom_arena
Games: game_{sport}_{season}_{YYYYMMDD}_{away}_{home}[_{#}]
→ game_nba_2025_20251021_hou_okc
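The three patterns can be checked with regular expressions. The expressions below are one possible reading of the formats above, not the rules in canonical_id.py: the 2 to 4 character abbreviation length is taken from the abbreviation table in this phase, while the four-digit season and the character classes are assumptions.

```python
import re
from typing import Optional

SPORT = r"(?:nba|mlb|nfl|nhl|mls|wnba|nwsl)"

# Assumed character classes: lowercase letters and digits, single underscores.
PATTERNS = {
    "team": re.compile(rf"team_{SPORT}_[a-z0-9]{{2,4}}"),
    "stadium": re.compile(rf"stadium_{SPORT}_[a-z0-9]+(?:_[a-z0-9]+)*"),
    "game": re.compile(
        rf"game_{SPORT}_\d{{4}}_\d{{8}}_[a-z0-9]{{2,4}}_[a-z0-9]{{2,4}}(?:_\d+)?"
    ),
}


def id_kind(canonical_id: str) -> Optional[str]:
    """Return 'team', 'stadium' or 'game' if the ID matches a pattern, else None."""
    for kind, pattern in PATTERNS.items():
        if pattern.fullmatch(canonical_id):
            return kind
    return None


for sample in (
    "team_nba_lal",
    "stadium_nba_cryptocom_arena",
    "game_nba_2025_20251021_hou_okc",
    "game_mlb_2025_20250508_det_col_1",
    "team_nba_LAL",        # uppercase is rejected
    "stadium_nba__arena",  # double underscore is rejected
):
    print(sample, "->", id_kind(sample))
```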
Normalization Quality
| Check | Result |
|---|---|
| Double underscores (__) | 0 found ✅ |
| Leading/trailing underscores | 0 found ✅ |
| Uppercase letters | 0 found ✅ |
| Special characters | 0 found ✅ |
Abbreviation Lengths (Teams)
| Length | Count |
|---|---|
| 2 chars | 21 |
| 3 chars | 161 |
| 4 chars | 1 |
Stadium ID Lengths
- Minimum: 8 characters
- Maximum: 29 characters
- Average: 16.2 characters
iOS Cross-Compatibility
| Aspect | Status | Notes |
|---|---|---|
| Field naming convention | ✅ Compatible | Python uses snake_case; iOS BootstrapService uses matching Codable structs |
| Deterministic UUID generation | ✅ Compatible | iOS uses SHA256 hash of canonical_id - matches any valid string |
| Schema version | ✅ Compatible | Both use version 1 |
| Required fields | ✅ Present | All iOS-required fields present in JSON output |
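The deterministic UUID row describes hashing the canonical ID with SHA256. The iOS implementation is not reproduced in this report, so the Python sketch below only illustrates the idea; truncating the digest to its first 16 bytes is an assumption, and the resulting values are not claimed to match what the app produces.

```python
import hashlib
import uuid


def deterministic_uuid(canonical_id: str) -> uuid.UUID:
    """Derive a stable UUID from a canonical ID.

    Assumption: the first 16 bytes of the SHA-256 digest are used as the
    UUID bytes. The app's actual byte layout may differ.
    """
    digest = hashlib.sha256(canonical_id.encode("utf-8")).digest()
    return uuid.UUID(bytes=digest[:16])


first = deterministic_uuid("team_nba_lal")
second = deterministic_uuid("team_nba_lal")
other = deterministic_uuid("team_nhl_ari")

print(first == second)  # same input, same UUID
print(first == other)   # different input, different UUID
```

Because the hash accepts any string, no change to the ID format on the Python side can make UUID derivation fail, which is why this row is marked compatible.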
Field Mapping (Python → iOS)
| Python Field | iOS Field | Notes |
|---|---|---|
| canonical_id | canonicalId | Mapped via JSONCanonicalStadium.canonical_id → CanonicalStadium.canonicalId |
| home_team_canonical_id | homeTeamCanonicalId | Explicit mapping in BootstrapService |
| away_team_canonical_id | awayTeamCanonicalId | Explicit mapping in BootstrapService |
| stadium_canonical_id | stadiumCanonicalId | Explicit mapping in BootstrapService |
| game_datetime_utc | dateTime | ISO 8601 parsing with fallback to legacy format |
Issues Found
No issues found. All canonical IDs are:
- Correctly formatted according to defined patterns
- Properly normalized (lowercase, no special characters)
- Deterministic (same input produces same output)
- Compatible with iOS parsing
Phase 5 Summary
Result: PASS - All canonical IDs are consistent and iOS-compatible.
- ✅ 100% format validation pass rate across 7,186 IDs
- ✅ No normalization issues found
- ✅ iOS BootstrapService explicitly handles snake_case → camelCase mapping
- ✅ Deterministic UUID generation using SHA256 hash
Phase 6 Results: Referential Integrity
Files Audited:
- Scripts/output/games_*_2025.json
- Scripts/output/teams_*.json
- Scripts/output/stadiums_*.json
Game → Team References
| Sport | Total Games | Valid Home | Valid Away | Orphan Home | Orphan Away | Status |
|---|---|---|---|---|---|---|
| NBA | 1,231 | 1,231 | 1,231 | 0 | 0 | ✅ |
| MLB | 2,866 | 2,866 | 2,866 | 0 | 0 | ✅ |
| NFL | 330 | 330 | 330 | 0 | 0 | ✅ |
| NHL | 1,312 | 1,312 | 1,312 | 0 | 0 | ✅ |
| MLS | 542 | 542 | 542 | 0 | 0 | ✅ |
| WNBA | 322 | 322 | 322 | 0 | 0 | ✅ |
| NWSL | 189 | 189 | 189 | 0 | 0 | ✅ |
Result: 100% valid team references across all 6,792 games.
Game → Stadium References
| Sport | Total Games | Valid | Missing | Percentage Missing |
|---|---|---|---|---|
| NBA | 1,231 | 1,231 | 0 | 0.0% ✅ |
| MLB | 2,866 | 2,862 | 4 | 0.1% ✅ |
| NFL | 330 | 325 | 5 | 1.5% ✅ |
| NHL | 1,312 | 0 | 1,312 | 100% ❌ |
| MLS | 542 | 478 | 64 | 11.8% ⚠️ |
| WNBA | 322 | 257 | 65 | 20.2% ⚠️ |
| NWSL | 189 | 173 | 16 | 8.5% ⚠️ |
Note: "Missing" means stadium_canonical_id is empty (resolution failed at scrape time). This is NOT orphan references to non-existent stadiums.
Team → Stadium References
| Sport | Teams | Valid Stadium | Invalid | Status |
|---|---|---|---|---|
| NBA | 30 | 30 | 0 | ✅ |
| MLB | 30 | 30 | 0 | ✅ |
| NFL | 32 | 32 | 0 | ✅ |
| NHL | 32 | 32 | 0 | ✅ |
| MLS | 30 | 30 | 0 | ✅ |
| WNBA | 13 | 13 | 0 | ✅ |
| NWSL | 16 | 16 | 0 | ✅ |
Result: 100% valid team → stadium references.
Cross-Sport Stadium Check
✅ No stadiums are duplicated across sports. Each stadium_{sport}_* ID is unique to its sport.
Missing Stadium Root Causes
| Sport | Missing | Root Cause |
|---|---|---|
| NHL | 1,312 | Hockey Reference provides no venue data - source limitation |
| MLS | 64 | New/renamed stadiums not in aliases (see Phase 4) |
| WNBA | 65 | New venue names not in aliases (see Phase 4) |
| NWSL | 16 | Expansion team venues + alternate venues |
| NFL | 5 | International games not in stadium mappings |
| MLB | 4 | Exhibition/international games |
Orphan Reference Summary
| Reference Type | Total Checked | Orphans Found |
|---|---|---|
| Game → Home Team | 6,792 | 0 ✅ |
| Game → Away Team | 6,792 | 0 ✅ |
| Game → Stadium | 6,792 | 0 ✅ |
| Team → Stadium | 183 | 0 ✅ |
Note: Zero orphan references. All "missing" stadiums are resolution failures (empty string), not references to non-existent canonical IDs.
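The distinction between a missing reference and an orphan reference can be made explicit in code. This is a sketch with an assumed record shape (`stadium_canonical_id` as a possibly empty string) and made-up sample records; the last ID is the non-existent one from Phase 2.

```python
from typing import Dict, Iterable, Set


def classify_stadium_refs(
    games: Iterable[Dict[str, str]], stadium_ids: Set[str]
) -> Dict[str, int]:
    """Count valid, missing (empty) and orphan (unknown ID) stadium references."""
    counts = {"valid": 0, "missing": 0, "orphan": 0}
    for game in games:
        ref = game.get("stadium_canonical_id") or ""
        if not ref:
            counts["missing"] += 1  # resolution failed at scrape time
        elif ref in stadium_ids:
            counts["valid"] += 1
        else:
            counts["orphan"] += 1   # points at a stadium that does not exist
    return counts


stadium_ids = {"stadium_nba_cryptocom_arena", "stadium_nfl_empower_field"}
games = [
    {"stadium_canonical_id": "stadium_nba_cryptocom_arena"},
    {"stadium_canonical_id": ""},
    {},
    {"stadium_canonical_id": "stadium_nfl_empower_field_at_mile_high"},
]

counts = classify_stadium_refs(games, stadium_ids)
print(counts)
```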
Issues Found
| # | Issue | Severity | Description |
|---|---|---|---|
| 12 | NHL games have no stadium data | Medium | Hockey Reference source doesn't provide venue information. All 1,312 NHL games have empty stadium_canonical_id. Fallback sources could provide this data but are limited by max_sources_to_try = 2. |
Phase 6 Summary
Result: PASS with known limitations - No orphan references exist; missing stadiums are resolution failures.
- ✅ 100% valid team references (home and away)
- ✅ 100% valid team → stadium references
- ✅ No orphan references to non-existent canonical IDs
- ⚠️ 1,466 games (21.6%) have empty stadium_canonical_id (resolution failures, not orphans)
- ⚠️ NHL accounts for 90% of missing stadium data (source limitation)
Phase 7 Results: iOS Data Reception
Files Audited:
- SportsTime/Core/Services/BootstrapService.swift (JSON parsing)
- SportsTime/Core/Services/CanonicalSyncService.swift (CloudKit sync)
- SportsTime/Core/Services/DataProvider.swift (data access)
- SportsTime/Core/Models/Local/CanonicalModels.swift (SwiftData models)
- SportsTime/Resources/*_canonical.json (bundled data files)
Bundled Data Comparison
| Data Type | iOS Bundled | Scripts Output | Difference | Status |
|---|---|---|---|---|
| Teams | 148 | 183 | -35 (19%) | ❌ STALE |
| Stadiums | 122 | 211 | -89 (42%) | ❌ STALE |
| Games | 4,972 | 6,792 | -1,820 (27%) | ❌ STALE |
iOS bundled data is significantly outdated compared to Scripts output.
Field Mapping Verification
| Python Field | iOS JSON Struct | iOS Model | Type Match | Status |
|---|---|---|---|---|
| canonical_id | canonical_id | canonicalId | String ✅ | ✅ |
| name | name | name | String ✅ | ✅ |
| game_datetime_utc | game_datetime_utc | dateTime | ISO 8601 → Date ✅ | ✅ |
| date + time (legacy) | date, time | dateTime | Fallback parsing ✅ | ✅ |
| home_team_canonical_id | home_team_canonical_id | homeTeamCanonicalId | String ✅ | ✅ |
| away_team_canonical_id | away_team_canonical_id | awayTeamCanonicalId | String ✅ | ✅ |
| stadium_canonical_id | stadium_canonical_id | stadiumCanonicalId | String ✅ | ✅ |
| sport | sport | sport | String ✅ | ✅ |
| season | season | season | String ✅ | ✅ |
| is_playoff | is_playoff | isPlayoff | Bool ✅ | ✅ |
| broadcast_info | broadcast_info | broadcastInfo | String? ✅ | ✅ |
Result: All field mappings are correct and compatible.
Date Parsing Compatibility
iOS BootstrapService supports both formats:
// New canonical format (preferred)
let game_datetime_utc: String? // ISO 8601
// Legacy format (fallback)
let date: String? // "YYYY-MM-DD"
let time: String? // "HH:mm" or "TBD"
Current iOS bundled games use legacy format. After updating bundled data, new game_datetime_utc format will be used.
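The same two-format fallback can be exercised on the pipeline side when validating JSON before it is bundled. The sketch below is written in Python and is not a port of BootstrapService: treating legacy times as UTC and mapping "TBD" to midnight are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_game_datetime(record: dict) -> Optional[datetime]:
    """Prefer game_datetime_utc (ISO 8601); fall back to legacy date + time."""
    iso = record.get("game_datetime_utc")
    if iso:
        # Older Python versions reject a trailing "Z", so normalise it first.
        return datetime.fromisoformat(iso.replace("Z", "+00:00"))

    date, time = record.get("date"), record.get("time")
    if not date:
        return None
    if not time or time == "TBD":
        time = "00:00"  # assumption: placeholder for an unannounced start time
    parsed = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M")
    return parsed.replace(tzinfo=timezone.utc)  # assumption: legacy times are UTC


new_style = parse_game_datetime({"game_datetime_utc": "2025-10-21T23:30:00Z"})
legacy = parse_game_datetime({"date": "2025-10-21", "time": "19:30"})
legacy_tbd = parse_game_datetime({"date": "2025-10-21", "time": "TBD"})
nothing = parse_game_datetime({})

print(new_style, legacy, legacy_tbd, nothing)
```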
Missing Reference Handling
DataProvider.filterRichGames() behavior:
return games.compactMap { game in
guard let homeTeam = teamsById[game.homeTeamId],
let awayTeam = teamsById[game.awayTeamId],
let stadium = stadiumsById[game.stadiumId] else {
return nil // ⚠️ Silently drops game
}
return RichGame(...)
}
Impact:
- Games with missing stadium IDs are silently excluded from RichGame queries
- No error logging or fallback behavior
- User sees fewer games than expected without explanation
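One way to make the exclusion visible is to record which reference failed for each dropped game. The sketch below expresses that idea in Python, for consistency with the pipeline scripts, rather than in Swift; it is not the DataProvider code, and the function name, record shape and sample records are illustrative.

```python
import logging
from typing import Dict, List, Tuple

logger = logging.getLogger("rich_games")

REFERENCE_FIELDS = (
    ("home_team_canonical_id", "teams"),
    ("away_team_canonical_id", "teams"),
    ("stadium_canonical_id", "stadiums"),
)


def build_rich_games(
    games: List[dict], teams_by_id: Dict[str, dict], stadiums_by_id: Dict[str, dict]
) -> Tuple[List[dict], List[Tuple[str, List[str]]]]:
    """Return (rich games, dropped games with the fields that failed to resolve)."""
    lookup = {"teams": teams_by_id, "stadiums": stadiums_by_id}
    rich, dropped = [], []
    for game in games:
        unresolved = [
            field for field, table in REFERENCE_FIELDS
            if game.get(field) not in lookup[table]
        ]
        if unresolved:
            # Log instead of dropping silently, and keep a record for callers.
            logger.warning("dropping %s: unresolved %s",
                           game["canonical_id"], ", ".join(unresolved))
            dropped.append((game["canonical_id"], unresolved))
            continue
        rich.append({
            "game": game,
            "home_team": teams_by_id[game["home_team_canonical_id"]],
            "away_team": teams_by_id[game["away_team_canonical_id"]],
            "stadium": stadiums_by_id[game["stadium_canonical_id"]],
        })
    return rich, dropped


# Illustrative records only; the pairings are not real schedule data.
teams = {"team_nba_hou": {"name": "HOU"}, "team_nba_okc": {"name": "OKC"}}
stadiums = {"stadium_nba_cryptocom_arena": {"name": "Crypto.com Arena"}}
games = [
    {"canonical_id": "game_nba_2025_20251021_hou_okc",
     "home_team_canonical_id": "team_nba_okc",
     "away_team_canonical_id": "team_nba_hou",
     "stadium_canonical_id": "stadium_nba_cryptocom_arena"},
    {"canonical_id": "game_sample_without_stadium",
     "home_team_canonical_id": "team_nba_okc",
     "away_team_canonical_id": "team_nba_hou",
     "stadium_canonical_id": ""},
]

rich, dropped = build_rich_games(games, teams, stadiums)
print(len(rich), dropped)
```

Returning the dropped list alongside the results lets the caller decide whether to surface a count to the user or substitute a placeholder stadium.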
Deduplication Logic
Bootstrap: No explicit deduplication. If bundled JSON contains duplicate canonical IDs, both would be inserted into SwiftData (leading to potential query issues).
CloudKit Sync: Uses upsert pattern with canonical ID as unique key - duplicates would overwrite.
Schema Version Compatibility
| Component | Schema Version | Status |
|---|---|---|
| Scripts output | 1 | ✅ |
| iOS CanonicalModels | 1 | ✅ |
| iOS BootstrapService | Expects 1 | ✅ |
Compatible. Schema version mismatch protection exists in CanonicalSyncService:
case .schemaVersionTooNew(let version):
return "Data requires app version supporting schema \(version). Please update the app."
Bootstrap Order Validation
iOS bootstraps in the correct dependency order:
1. Stadiums (no dependencies)
2. Stadium aliases (depends on stadiums)
3. League structure (no dependencies)
4. Teams (depends on stadiums)
5. Team aliases (depends on teams)
6. Games (depends on teams + stadiums)
Correct - prevents orphan references during bootstrap.
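The ordering claim can be verified mechanically by encoding the stated dependencies as a graph. This sketch uses Python's standard-library `graphlib`; the entity keys are shorthand for the items listed above, and the checker function is hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each entity maps to the entities it depends on, as listed above.
DEPENDENCIES: Dict[str, Set[str]] = {
    "stadiums": set(),
    "stadium_aliases": {"stadiums"},
    "league_structure": set(),
    "teams": {"stadiums"},
    "team_aliases": {"teams"},
    "games": {"teams", "stadiums"},
}


def is_valid_bootstrap_order(order: List[str]) -> bool:
    """True if every entity appears after all of its dependencies."""
    seen: Set[str] = set()
    for entity in order:
        if not DEPENDENCIES[entity] <= seen:
            return False
        seen.add(entity)
    return seen == set(DEPENDENCIES)


audited_order = [
    "stadiums", "stadium_aliases", "league_structure",
    "teams", "team_aliases", "games",
]
computed_order = list(TopologicalSorter(DEPENDENCIES).static_order())

print(is_valid_bootstrap_order(audited_order))
print(is_valid_bootstrap_order(computed_order))
```

Any order produced by the topological sort is valid, so the audited order is one of several acceptable sequences rather than the only one.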
CloudKit Sync Validation
CanonicalSyncService syncs in the same dependency order and tracks:
- Per-entity sync timestamps
- Skipped records (incompatible schema version)
- Skipped records (older than local)
- Sync duration and cancellation
Well-designed sync infrastructure.
Issues Found
| # | Issue | Severity | Description |
|---|---|---|---|
| 13 | iOS bundled data severely outdated | Critical | Missing 35 teams (19%), 89 stadiums (42%), 1,820 games (27%). First-launch experience shows incomplete data until CloudKit sync completes. |
| 14 | Silent game exclusion in RichGame queries | Medium | filterRichGames() silently drops games with missing team/stadium references. Users see fewer games without explanation. |
| 15 | No bootstrap deduplication | Low | Duplicate game IDs in bundled JSON would create duplicate SwiftData records. Low risk since JSON is generated correctly. |
Phase 7 Summary
Result: FAIL - iOS bundled data is critically outdated.
- ❌ iOS bundled data missing 35 teams, 89 stadiums, 1,820 games
- ⚠️ Games with unresolved references silently dropped from RichGame queries
- ✅ Field mapping between Python and iOS is correct
- ✅ Date parsing supports both legacy and new formats
- ✅ Schema versions are compatible
- ✅ Bootstrap/sync order handles dependencies correctly
Prioritized Issue List
| # | Issue | Severity | Phase | Root Cause | Remediation |
|---|---|---|---|---|---|
| 13 | iOS bundled data severely outdated | Critical | 7 | Bundled JSON not updated after pipeline runs | Copy Scripts/output/*_canonical.json to iOS Resources/ and rebuild |
| 4 | WNBA/NWSL/MLS ESPN-only source | High | 3 | No implemented fallback sources | Implement alternative scrapers (FBref for MLS, WNBA League Pass) |
| 5 | max_sources_to_try = 2 limits fallback | High | 3 | Hardcoded limit in base.py:189 | Increase to 3 or remove limit for sports with 3+ sources |
| 7 | NHL has no stadium data from primary source | High | 4 | Hockey Reference doesn't provide venue info | Force NHL to use NHL API or ESPN as primary (they provide venues) |
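| 8 | 131 NBA stadium resolution failures | High | 4 | 2024-2025 naming rights not in aliases | Add aliases: "Mortgage Matchup Center" → Phoenix Suns arena (formerly Footprint Center), "Xfinity Mobile Arena" → Philadelphia 76ers arena (formerly Wells Fargo Center), "Rocket Arena" → Rocket Mortgage FieldHouse |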
| 2 | Orphan stadium alias references | Medium | 2 | Wrong canonical IDs in stadium_aliases.json | Fix 5 Denver/KC stadium aliases pointing to non-existent IDs |
| 6 | CBS/FBref scrapers declared but not implemented | Medium | 3 | NotImplementedError at runtime | Either implement or remove from source lists to avoid confusion |
| 9 | Outdated WNBA expected count | Medium | 4 | WNBA expanded to 13 teams in 2025 | Update config.py EXPECTED_GAME_COUNTS["wnba"] from 220 to 286 |
| 10 | MLS/WNBA stadium alias gaps | Medium | 4 | New/renamed venues missing from aliases | Add 129 missing stadium aliases (64 MLS + 65 WNBA) |
| 12 | NHL games have no stadium data | Medium | 6 | Same as Issue #7 | See Issue #7 remediation |
| 14 | Silent game exclusion in RichGame queries | Medium | 7 | compactMap silently drops games | Log dropped games or return partial RichGame with placeholder stadium |
| 1 | WNBA single abbreviations | Low | 1 | Only 1 abbreviation per team | Add alternative abbreviations for source compatibility |
| 3 | No NFL team aliases | Low | 2 | Missing Washington Redskins/Football Team | Add historical Washington team name aliases |
| 11 | Game status not parsed | Low | 4 | Status field always "unknown" | Parse game status from source data (final, scheduled, postponed) |
| 15 | No bootstrap deduplication | Low | 7 | No explicit duplicate check during bootstrap | Add deduplication check in bootstrapGames() |
Recommended Next Steps
Immediate (Before Next Release)
1. Update iOS bundled data (Issue #13)

       cp Scripts/output/stadiums_*.json SportsTime/Resources/stadiums_canonical.json
       cp Scripts/output/teams_*.json SportsTime/Resources/teams_canonical.json
       cp Scripts/output/games_*.json SportsTime/Resources/games_canonical.json

2. Fix NHL stadium data (Issues #7, #12)
   - Change NHL primary source from Hockey Reference to NHL API
   - Or: Increase max_sources_to_try to 3 so fallbacks are attempted
3. Add critical stadium aliases (Issues #8, #10)
   - "Mortgage Matchup Center" → canonical ID of the Phoenix Suns arena (formerly Footprint Center)
   - "Xfinity Mobile Arena" → canonical ID of the Philadelphia 76ers arena (formerly Wells Fargo Center)
   - "Rocket Arena" → stadium_nba_rocket_mortgage_fieldhouse
   - Run validation report to identify all unresolved venue names
Short-term (This Quarter)
4. Implement MLS fallback source (Issue #4)
   - FBref has MLS data with venue information
   - Reduces ESPN single-point-of-failure risk
5. Fix orphan alias references (Issue #2)
   - Correct 5 NFL stadium aliases pointing to wrong canonical IDs
   - Add validation check to prevent future orphan references
6. Update expected game counts (Issue #9)
   - WNBA: 220 → 286 (13 teams × 44 games / 2)
Long-term (Next Quarter)
7. Implement WNBA/NWSL fallback sources (Issue #4)
   - Consider WNBA League Pass API or other sources
   - NWSL has limited data availability - may need to accept ESPN-only
8. Add RichGame partial loading (Issue #14)
   - Log games dropped due to missing references
   - Consider returning games with placeholder stadiums for NHL
9. Parse game status (Issue #11)
   - Extract final/scheduled/postponed from source data
   - Enables filtering by game state
Verification Checklist
After implementing fixes, verify:
- Run python -m sportstime_parser scrape --sport all --season 2025
- Check validation reports show <5% unresolved stadiums per sport
- Copy output JSON to iOS Resources/
- Build iOS app and verify data loads at startup
- Query RichGames and verify game count matches expectations
- Run CloudKit sync and verify no errors
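The 5% threshold in the checklist can be checked with a few lines over the per-sport totals. The sketch below uses the Phase 6 figures as input; the function name and the (total, missing) tuple layout are assumptions, and a real check would read the counts from the validation reports instead of hardcoding them.

```python
from typing import Dict, Tuple


def sports_over_threshold(
    stats: Dict[str, Tuple[int, int]], threshold_pct: float = 5.0
) -> Dict[str, float]:
    """Return sports whose unresolved-stadium rate is at or above the threshold."""
    failing = {}
    for sport, (total, missing) in stats.items():
        if total == 0:
            continue  # nothing scraped, nothing to rate
        pct = 100 * missing / total
        if pct >= threshold_pct:
            failing[sport] = round(pct, 1)
    return failing


# (total games, games with empty stadium_canonical_id), from Phase 6.
phase6_stats = {
    "NBA": (1231, 0),
    "MLB": (2866, 4),
    "NFL": (330, 5),
    "NHL": (1312, 1312),
    "MLS": (542, 64),
    "WNBA": (322, 65),
    "NWSL": (189, 16),
}

failing = sports_over_threshold(phase6_stats)
print(failing)
```

On the audited data four sports fail the check, so the checklist item is only expected to pass once the NHL source change and the alias additions are in place.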