# Sports Data Sources

## Schedule Data Sources (by league)

### NBA Schedule

| Source | URL Pattern | Data Available | Notes |
|--------|-------------|----------------|-------|
| Basketball-Reference | `https://www.basketball-reference.com/leagues/NBA_{YEAR}_games-{month}.html` | Date, Time, Teams, Arena, Attendance | Monthly pages (october, november, etc.) |
| ESPN | `https://www.espn.com/nba/schedule/_/date/{YYYYMMDD}` | Date, Time, Teams, TV | Daily schedule |
| NBA.com API | `https://cdn.nba.com/static/json/staticData/scheduleLeagueV2.json` | Full season JSON | Official source |
| FixtureDownload | `https://fixturedownload.com/download/nba-{year}-UTC.csv` | CSV download | Easy format |

### MLB Schedule

| Source | URL Pattern | Data Available | Notes |
|--------|-------------|----------------|-------|
| Baseball-Reference | `https://www.baseball-reference.com/leagues/majors/{YEAR}-schedule.shtml` | Date, Teams, Score, Attendance | Full season page |
| ESPN | `https://www.espn.com/mlb/schedule/_/date/{YYYYMMDD}` | Date, Time, Teams, TV | Daily schedule |
| MLB Stats API | `https://statsapi.mlb.com/api/v1/schedule?sportId=1&season={YEAR}` | Full season JSON | Official API |
| FixtureDownload | `https://fixturedownload.com/download/mlb-{year}-UTC.csv` | CSV download | Easy format |

### NHL Schedule

| Source | URL Pattern | Data Available | Notes |
|--------|-------------|----------------|-------|
| Hockey-Reference | `https://www.hockey-reference.com/leagues/NHL_{YEAR}_games.html` | Date, Teams, Score, Arena, Attendance | Full season page |
| ESPN | `https://www.espn.com/nhl/schedule/_/date/{YYYYMMDD}` | Date, Time, Teams, TV | Daily schedule |
| NHL API | `https://api-web.nhle.com/v1/schedule/{YYYY-MM-DD}` | Daily JSON | Official API |
| FixtureDownload | `https://fixturedownload.com/download/nhl-{year}-UTC.csv` | CSV download | Easy format |

---

## Stadium/Arena Data Sources

| Source | URL/Method | Data Available | Notes |
|--------|------------|----------------|-------|
| Wikipedia | Team pages | Name, City, Capacity, Coordinates | Manual or scrape |
| HIFLD Open Data | `https://hifld-geoplatform.opendata.arcgis.com/datasets/major-sport-venues` | GeoJSON with coordinates | US Government data |
| ESPN Team Pages | `https://www.espn.com/{sport}/team/_/name/{abbrev}` | Arena name, location | Per-team |
| Sports-Reference | Team pages | Arena name, capacity | In schedule data |
| OpenStreetMap | Nominatim API | Coordinates from address | For geocoding |

---

## Data Validation Strategy

### Cross-Reference Points

1. **Game Count**: Each team's regular-season game total should match the league's schedule length (82 NBA, 162 MLB, 82 NHL)
2. **Home/Away Balance**: Each team should have an equal number of home and away games
3. **Date Alignment**: The same game should appear on the same date in every source
4. **Team Names**: Map abbreviations across sources (NYK vs NY vs Knicks)
5. **Venue Names**: The same stadium may appear under different names (sponsorship changes)

### Discrepancy Handling

- If sources disagree on game time: prefer the official API (NBA.com, MLB.com, NHL.com)
- If sources disagree on venue: prefer Sports-Reference (most accurate historically)
- Log all discrepancies for manual review

---

## Rate Limiting Guidelines

| Source | Limit | Recommended Delay |
|--------|-------|-------------------|
| Sports-Reference sites | 20 req/min | 3 seconds between requests |
| ESPN | Unknown | 1 second between requests |
| Official APIs | Varies | 0.5 seconds between requests |
| Wikipedia | Polite | 1 second between requests |

---

## Team Abbreviation Mappings

See `team_mappings.json` for canonical mappings between sources.
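
As a rough illustration of how the mappings feed the game-count and home/away checks under Cross-Reference Points, here is a minimal Python sketch. The `ALIASES` table, the `canonical` and `check_schedule` helpers, and the `(home, away)` input shape are all assumptions made for this example; the actual layout of `team_mappings.json` is not described here and may differ.

```python
from collections import Counter

# Hypothetical alias table for illustration only; the real
# team_mappings.json may use a different structure.
ALIASES = {
    "NYK": "NYK", "NY": "NYK", "Knicks": "NYK",
    "BOS": "BOS", "Celtics": "BOS",
}

# Regular-season games per team, as listed under Cross-Reference Points.
EXPECTED_GAMES = {"NBA": 82, "MLB": 162, "NHL": 82}


def canonical(name, aliases=ALIASES):
    """Return the canonical abbreviation; raises KeyError for unknown names."""
    return aliases[name.strip()]


def check_schedule(games, league, aliases=ALIASES):
    """Check game count and home/away balance for each team.

    games: iterable of (home, away) team names from any one source.
    Returns a list of discrepancy messages for manual review.
    """
    home, away = Counter(), Counter()
    for h, a in games:
        home[canonical(h, aliases)] += 1
        away[canonical(a, aliases)] += 1

    expected = EXPECTED_GAMES[league]
    problems = []
    for team in sorted(set(home) | set(away)):
        total = home[team] + away[team]
        if total != expected:
            problems.append(f"{team}: {total} games, expected {expected}")
        if home[team] != away[team]:
            problems.append(f"{team}: {home[team]} home vs {away[team]} away")
    return problems
```

Calling `check_schedule(rows, "NBA")` on rows parsed from one source returns an empty list when every team passes both checks. Unknown team names raise `KeyError` rather than being skipped, so gaps in the mapping surface immediately instead of silently distorting the counts.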