WNBA, MLS, and NWSL Implementation Guide
Complete end-to-end implementation for adding WNBA, MLS, and NWSL to SportsTime.
1. League Overview
WNBA (Women's National Basketball Association)
- Teams: 13 (expanding to 15 by 2026)
- Season: May - September (regular season), September - October (playoffs)
- Game Cadence: ~40 games per team, 3-4 games per week
- Special Considerations:
  - Many teams share arenas with NBA teams (key for stadium handling)
  - Olympic break in summer every 4 years
  - Commissioner's Cup midseason tournament
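Shared Venues (WNBA/NBA), where "(different)" marks a WNBA team that plays in its own arena rather than the NBA team's:
| WNBA Team | NBA Team | Arena |
|---|---|---|
| Atlanta Dream | Hawks | Gateway Center Arena (different) |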
| Chicago Sky | Bulls | Wintrust Arena (different) |
| Dallas Wings | Mavericks | College Park Center (different) |
| Indiana Fever | Pacers | Gainbridge Fieldhouse |
| Los Angeles Sparks | Lakers | Crypto.com Arena |
| Minnesota Lynx | Timberwolves | Target Center |
| New York Liberty | Nets | Barclays Center |
| Phoenix Mercury | Suns | Footprint Center |
| Washington Mystics | Wizards | Entertainment & Sports Arena (different) |
MLS (Major League Soccer)
- Teams: 29 teams (2024), expanding
- Season: February/March - October (regular season), October - December (playoffs)
- Game Cadence: 34 games per team, 1-2 games per week
- Special Considerations:
  - Some teams share NFL stadiums (Atlanta, Seattle, New England)
  - Midweek matches (Wednesday/Thursday) are common
  - US Open Cup adds additional games
  - Canadian teams (Toronto, Vancouver, Montreal) require timezone handling
Shared Venues (MLS/NFL):
| MLS Team | NFL Team | Stadium |
|---|---|---|
| Atlanta United | Falcons | Mercedes-Benz Stadium |
| Seattle Sounders | Seahawks | Lumen Field |
| New England Revolution | Patriots | Gillette Stadium |
NWSL (National Women's Soccer League)
- Teams: 14 teams (2024)
- Season: March - November (regular season + playoffs)
- Game Cadence: 26 games per team, 1-2 games per week
- Special Considerations:
  - Some share MLS stadiums (Portland, Orlando, Houston)
  - Many use smaller soccer-specific venues
  - Expansion teams are added frequently
2. Schedule & Data Sources
WNBA Data Sources
Primary: Basketball-Reference (Women)
URL Pattern: https://www.basketball-reference.com/wnba/years/{YEAR}_games.html
Example: https://www.basketball-reference.com/wnba/years/2025_games.html
HTML Structure:
<table id="schedule">
<tbody>
<tr>
<th data-stat="date_game">Fri, May 17, 2024</th>
<td data-stat="game_start_time">7:30p</td>
<td data-stat="visitor_team_name">Dallas Wings</td>
<td data-stat="home_team_name">Atlanta Dream</td>
<td data-stat="arena_name">Gateway Center Arena</td>
</tr>
</tbody>
</table>
Fields Available: date, time, home_team, away_team, arena, attendance, box_score_link
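The `data-stat` attributes in the structure above are what the scraper keys on. As a minimal sketch of that idea using only the standard library (the guide's scrapers use `fetch_page` and BeautifulSoup-style `find` calls instead; `DataStatRowParser` is a hypothetical name introduced here), each table row can be collected into a dict keyed by `data-stat`:

```python
from html.parser import HTMLParser


class DataStatRowParser(HTMLParser):
    """Collect one dict per <tr>, keyed by each cell's data-stat attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being built
        self._stat = None   # data-stat of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag in ("th", "td") and self._row is not None:
            self._stat = dict(attrs).get("data-stat")
            if self._stat:
                self._row[self._stat] = ""

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) is appended to the open cell
        if self._row is not None and self._stat:
            self._row[self._stat] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._stat = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append({k: v.strip() for k, v in self._row.items()})
            self._row = None


SAMPLE = """
<table id="schedule"><tbody><tr>
<th data-stat="date_game">Fri, May 17, 2024</th>
<td data-stat="game_start_time">7:30p</td>
<td data-stat="visitor_team_name">Dallas Wings</td>
<td data-stat="home_team_name">Atlanta Dream</td>
<td data-stat="arena_name">Gateway Center Arena</td>
</tr></tbody></table>
"""

parser = DataStatRowParser()
parser.feed(SAMPLE)
row = parser.rows[0]
print(row["visitor_team_name"], "at", row["home_team_name"])
```

Keying on `data-stat` rather than column position keeps the parse stable if the site reorders columns.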
Secondary: ESPN WNBA
URL Pattern: https://www.espn.com/wnba/schedule/_/date/{YYYYMMDD}
MLS Data Sources
Primary: FBref (Football Reference)
URL Pattern: https://fbref.com/en/comps/22/{YEAR}/schedule/{YEAR}-Major-League-Soccer-Scores-and-Fixtures
Example: https://fbref.com/en/comps/22/2024/schedule/2024-Major-League-Soccer-Scores-and-Fixtures
HTML Structure:
<table id="sched_2024_22_1">
<tbody>
<tr>
<td data-stat="date">2024-02-24</td>
<td data-stat="time">19:30</td>
<td data-stat="home_team">LA Galaxy</td>
<td data-stat="away_team">Inter Miami</td>
<td data-stat="venue">Dignity Health Sports Park</td>
</tr>
</tbody>
</table>
Fields Available: date, time (24hr), home_team, away_team, venue, score, attendance
Secondary: MLS Official
URL Pattern: https://www.mlssoccer.com/schedule/scores
API Endpoint: https://sportapi.mlssoccer.com/api/matches?culture=en-us&dateFrom={date}&dateTo={date}
NWSL Data Sources
Primary: FBref (NWSL)
URL Pattern: https://fbref.com/en/comps/182/{YEAR}/schedule/{YEAR}-NWSL-Scores-and-Fixtures
Example: https://fbref.com/en/comps/182/2024/schedule/2024-NWSL-Scores-and-Fixtures
HTML Structure: Same as MLS (FBref standard format)
Secondary: NWSL Official
URL Pattern: https://www.nwslsoccer.com/schedule
3. Schedule Parser Changes
File: Scripts/scrape_schedules.py
3.1 Add Team Mappings (after NHL_TEAMS ~line 180)
# =============================================================================
# WNBA TEAMS
# =============================================================================
WNBA_TEAMS = {
'ATL': {'name': 'Atlanta Dream', 'city': 'Atlanta', 'arena': 'Gateway Center Arena'},
'CHI': {'name': 'Chicago Sky', 'city': 'Chicago', 'arena': 'Wintrust Arena'},
'CON': {'name': 'Connecticut Sun', 'city': 'Uncasville', 'arena': 'Mohegan Sun Arena'},
'DAL': {'name': 'Dallas Wings', 'city': 'Arlington', 'arena': 'College Park Center'},
'IND': {'name': 'Indiana Fever', 'city': 'Indianapolis', 'arena': 'Gainbridge Fieldhouse'},
'LVA': {'name': 'Las Vegas Aces', 'city': 'Las Vegas', 'arena': 'Michelob Ultra Arena'},
'LAS': {'name': 'Los Angeles Sparks', 'city': 'Los Angeles', 'arena': 'Crypto.com Arena'},
'MIN': {'name': 'Minnesota Lynx', 'city': 'Minneapolis', 'arena': 'Target Center'},
'NYL': {'name': 'New York Liberty', 'city': 'Brooklyn', 'arena': 'Barclays Center'},
'PHO': {'name': 'Phoenix Mercury', 'city': 'Phoenix', 'arena': 'Footprint Center'},
'SEA': {'name': 'Seattle Storm', 'city': 'Seattle', 'arena': 'Climate Pledge Arena'},
'WAS': {'name': 'Washington Mystics', 'city': 'Washington', 'arena': 'Entertainment & Sports Arena'},
# Expansion teams (add as announced)
'GSV': {'name': 'Golden State Valkyries', 'city': 'San Francisco', 'arena': 'Chase Center'},
'POR': {'name': 'Portland Expansion', 'city': 'Portland', 'arena': 'TBD'},
'TOR': {'name': 'Toronto Expansion', 'city': 'Toronto', 'arena': 'TBD'},
}
# =============================================================================
# MLS TEAMS
# =============================================================================
MLS_TEAMS = {
'ATL': {'name': 'Atlanta United FC', 'city': 'Atlanta', 'stadium': 'Mercedes-Benz Stadium'},
'AUS': {'name': 'Austin FC', 'city': 'Austin', 'stadium': 'Q2 Stadium'},
'CHI': {'name': 'Chicago Fire FC', 'city': 'Chicago', 'stadium': 'Soldier Field'},
'CIN': {'name': 'FC Cincinnati', 'city': 'Cincinnati', 'stadium': 'TQL Stadium'},
'CLB': {'name': 'Columbus Crew', 'city': 'Columbus', 'stadium': 'Lower.com Field'},
'CLT': {'name': 'Charlotte FC', 'city': 'Charlotte', 'stadium': 'Bank of America Stadium'},
'COL': {'name': 'Colorado Rapids', 'city': 'Commerce City', 'stadium': 'Dick\'s Sporting Goods Park'},
'DAL': {'name': 'FC Dallas', 'city': 'Frisco', 'stadium': 'Toyota Stadium'},
'DCU': {'name': 'D.C. United', 'city': 'Washington', 'stadium': 'Audi Field'},
'HOU': {'name': 'Houston Dynamo FC', 'city': 'Houston', 'stadium': 'Shell Energy Stadium'},
'LAG': {'name': 'LA Galaxy', 'city': 'Carson', 'stadium': 'Dignity Health Sports Park'},
'LAF': {'name': 'Los Angeles FC', 'city': 'Los Angeles', 'stadium': 'BMO Stadium'},
'MIA': {'name': 'Inter Miami CF', 'city': 'Fort Lauderdale', 'stadium': 'Chase Stadium'},
'MIN': {'name': 'Minnesota United FC', 'city': 'Saint Paul', 'stadium': 'Allianz Field'},
'MTL': {'name': 'CF Montréal', 'city': 'Montreal', 'stadium': 'Stade Saputo'},
'NSH': {'name': 'Nashville SC', 'city': 'Nashville', 'stadium': 'Geodis Park'},
'NER': {'name': 'New England Revolution', 'city': 'Foxborough', 'stadium': 'Gillette Stadium'},
'NYC': {'name': 'New York City FC', 'city': 'New York', 'stadium': 'Yankee Stadium'},
'NYR': {'name': 'New York Red Bulls', 'city': 'Harrison', 'stadium': 'Red Bull Arena'},
'ORL': {'name': 'Orlando City SC', 'city': 'Orlando', 'stadium': 'Exploria Stadium'},
'PHI': {'name': 'Philadelphia Union', 'city': 'Chester', 'stadium': 'Subaru Park'},
'POR': {'name': 'Portland Timbers', 'city': 'Portland', 'stadium': 'Providence Park'},
'RSL': {'name': 'Real Salt Lake', 'city': 'Sandy', 'stadium': 'America First Field'},
'SJE': {'name': 'San Jose Earthquakes', 'city': 'San Jose', 'stadium': 'PayPal Park'},
'SEA': {'name': 'Seattle Sounders FC', 'city': 'Seattle', 'stadium': 'Lumen Field'},
'SKC': {'name': 'Sporting Kansas City', 'city': 'Kansas City', 'stadium': 'Children\'s Mercy Park'},
'STL': {'name': 'St. Louis City SC', 'city': 'St. Louis', 'stadium': 'CityPark'},
'TOR': {'name': 'Toronto FC', 'city': 'Toronto', 'stadium': 'BMO Field'},
'VAN': {'name': 'Vancouver Whitecaps FC', 'city': 'Vancouver', 'stadium': 'BC Place'},
'SDG': {'name': 'San Diego FC', 'city': 'San Diego', 'stadium': 'Snapdragon Stadium'}, # 2025 expansion
}
# =============================================================================
# NWSL TEAMS
# =============================================================================
NWSL_TEAMS = {
'ANG': {'name': 'Angel City FC', 'city': 'Los Angeles', 'stadium': 'BMO Stadium'},
'CHI': {'name': 'Chicago Red Stars', 'city': 'Chicago', 'stadium': 'SeatGeek Stadium'},
'HOU': {'name': 'Houston Dash', 'city': 'Houston', 'stadium': 'Shell Energy Stadium'},
'KCC': {'name': 'Kansas City Current', 'city': 'Kansas City', 'stadium': 'CPKC Stadium'},
'LOU': {'name': 'Racing Louisville FC', 'city': 'Louisville', 'stadium': 'Lynn Family Stadium'},
'NCC': {'name': 'North Carolina Courage', 'city': 'Cary', 'stadium': 'WakeMed Soccer Park'},
'NJG': {'name': 'NJ/NY Gotham FC', 'city': 'Harrison', 'stadium': 'Red Bull Arena'},
'ORL': {'name': 'Orlando Pride', 'city': 'Orlando', 'stadium': 'Exploria Stadium'},
'POR': {'name': 'Portland Thorns FC', 'city': 'Portland', 'stadium': 'Providence Park'},
'SDW': {'name': 'San Diego Wave FC', 'city': 'San Diego', 'stadium': 'Snapdragon Stadium'},
'SEA': {'name': 'Seattle Reign FC', 'city': 'Seattle', 'stadium': 'Lumen Field'},
'UTA': {'name': 'Utah Royals FC', 'city': 'Sandy', 'stadium': 'America First Field'},
'WAS': {'name': 'Washington Spirit', 'city': 'Washington', 'stadium': 'Audi Field'},
'BAY': {'name': 'Bay FC', 'city': 'San Jose', 'stadium': 'PayPal Park'}, # 2024 expansion
}
3.2 Update get_team_abbrev() Function
def get_team_abbrev(team_name: str, sport: str) -> str:
    """Get team abbreviation from full name."""
    team_maps = {
        'NBA': NBA_TEAMS,
        'MLB': MLB_TEAMS,
        'NHL': NHL_TEAMS,
        'WNBA': WNBA_TEAMS,
        'MLS': MLS_TEAMS,
        'NWSL': NWSL_TEAMS,
    }
    teams = team_maps.get(sport, {})
    name = team_name.strip().lower()
    # Exact match on the full team name takes priority over any partial match
    for abbrev, data in teams.items():
        if name == data['name'].lower():
            return abbrev
    # Partial match (e.g., "Hawks" matches "Atlanta Hawks").
    # Skip empty input, which would otherwise match every team.
    if name:
        for abbrev, data in teams.items():
            if name in data['name'].lower():
                return abbrev
    # Fallback: first 3 characters
    return team_name.strip()[:3].upper()
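Source sites do not always spell team names the way the mappings do, and a three-character fallback can silently produce an abbreviation that matches no team. One option is an alias lookup ahead of the name matching. The sketch below is self-contained and illustrative only: `resolve_abbrev` and `ALIASES` are hypothetical names, and the alias spellings are assumptions that should be checked against real scraped output.

```python
TEAMS = {  # small subset of MLS_TEAMS, names only
    'MIA': {'name': 'Inter Miami CF'},
    'NYC': {'name': 'New York City FC'},
    'NYR': {'name': 'New York Red Bulls'},
}

# Hypothetical alias spellings; confirm against real scraped output before use.
ALIASES = {
    'ny red bulls': 'NYR',
    'nycfc': 'NYC',
}


def resolve_abbrev(team_name, teams=TEAMS, aliases=ALIASES):
    """Return an abbreviation, or None when the name cannot be mapped."""
    name = team_name.strip().lower()
    if not name:
        return None
    # 1. Known alternate spellings
    if name in aliases:
        return aliases[name]
    # 2. Exact match on the full name
    for abbrev, data in teams.items():
        if name == data['name'].lower():
            return abbrev
    # 3. Partial match, e.g. "Inter Miami" inside "Inter Miami CF"
    for abbrev, data in teams.items():
        if name in data['name'].lower():
            return abbrev
    return None  # caller decides how to report an unmapped team


for scraped in ('NY Red Bulls', 'Inter Miami', 'Unknown Team'):
    print(scraped, '->', resolve_abbrev(scraped))
```

Returning `None` instead of guessing makes unmapped names visible in the pipeline report rather than hiding them behind a made-up abbreviation.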
3.3 Add WNBA Scraper
def scrape_wnba_basketball_reference(season: int) -> list[Game]:
    """
    Scrape WNBA schedule from Basketball-Reference.
    URL: https://www.basketball-reference.com/wnba/years/{YEAR}_games.html
    Season year is the calendar year (e.g., 2025 for 2025 season)
    """
    games = []
    url = f"https://www.basketball-reference.com/wnba/years/{season}_games.html"
    print(f"Scraping WNBA {season} from Basketball-Reference...")
    soup = fetch_page(url, 'basketball-reference.com')
    if not soup:
        return games
    table = soup.find('table', {'id': 'schedule'})
    if not table:
        print(" No schedule table found")
        return games
    tbody = table.find('tbody')
    if not tbody:
        return games
    for row in tbody.find_all('tr'):
        # Skip header rows repeated inside the table body
        if row.get('class') and 'thead' in row.get('class'):
            continue
        try:
            # Parse date
            date_cell = row.find('th', {'data-stat': 'date_game'})
            if not date_cell:
                continue
            date_link = date_cell.find('a')
            date_str = date_link.text if date_link else date_cell.text
            # Parse time
            time_cell = row.find('td', {'data-stat': 'game_start_time'})
            time_str = time_cell.text.strip() if time_cell else None
            # Parse teams
            visitor_cell = row.find('td', {'data-stat': 'visitor_team_name'})
            home_cell = row.find('td', {'data-stat': 'home_team_name'})
            if not visitor_cell or not home_cell:
                continue
            away_team = visitor_cell.find('a').text if visitor_cell.find('a') else visitor_cell.text
            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            # Parse arena
            arena_cell = row.find('td', {'data-stat': 'arena_name'})
            arena = arena_cell.text.strip() if arena_cell else ''
            # Convert date (format: "Sat, May 18, 2024")
            try:
                parsed_date = datetime.strptime(date_str.strip(), '%a, %b %d, %Y')
                date_formatted = parsed_date.strftime('%Y-%m-%d')
            except ValueError:
                # Row does not carry a parseable date; skip it
                continue
            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'WNBA')
            away_abbrev = get_team_abbrev(away_team, 'WNBA')
            game_id = f"wnba_{date_formatted}_{away_abbrev}_{home_abbrev}".lower()
            game = Game(
                id=game_id,
                sport='WNBA',
                season=str(season),
                date=date_formatted,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=arena,
                source='basketball-reference.com'
            )
            games.append(game)
        except Exception as e:
            print(f" Error parsing row: {e}")
            continue
    print(f" Found {len(games)} games from Basketball-Reference")
    return games
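The WNBA scraper stores the start time exactly as scraped (for example `7:30p`), while the FBref pages list 24-hour times. If a single normalized form is wanted downstream, a small helper along these lines could convert the short form. This is a sketch under assumptions: `parse_short_time` is a hypothetical name, and it assumes times always end in `a` or `p`.

```python
from datetime import datetime


def parse_short_time(text):
    """Convert '7:30p' to '19:30'; return None when blank or unparseable."""
    text = (text or "").strip().lower()
    if not text:
        return None
    # Basketball-Reference abbreviates am/pm to a single letter
    if text[-1] in ("a", "p"):
        text += "m"
    try:
        return datetime.strptime(text, "%I:%M%p").strftime("%H:%M")
    except ValueError:
        return None


print(parse_short_time("7:30p"))
```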
3.4 Add MLS Scraper
def scrape_mls_fbref(season: int) -> list[Game]:
    """
    Scrape MLS schedule from FBref.
    URL: https://fbref.com/en/comps/22/{YEAR}/schedule/{YEAR}-Major-League-Soccer-Scores-and-Fixtures
    """
    games = []
    url = f"https://fbref.com/en/comps/22/{season}/schedule/{season}-Major-League-Soccer-Scores-and-Fixtures"
    print(f"Scraping MLS {season} from FBref...")
    soup = fetch_page(url, 'fbref.com')
    if not soup:
        return games
    # FBref uses table with id like sched_{year}_22_1
    table = soup.find('table', {'id': lambda x: x and 'sched_' in x})
    if not table:
        print(" No schedule table found")
        return games
    tbody = table.find('tbody')
    if not tbody:
        return games
    for row in tbody.find_all('tr'):
        try:
            # Parse date (format: 2024-02-24)
            date_cell = row.find('td', {'data-stat': 'date'})
            if not date_cell:
                continue
            date_str = date_cell.text.strip()
            # Parse time (24hr format: 19:30)
            time_cell = row.find('td', {'data-stat': 'time'})
            time_str = time_cell.text.strip() if time_cell else None
            # Convert 24hr to 12hr format for consistency
            if time_str:
                try:
                    t = datetime.strptime(time_str, '%H:%M')
                    time_str = t.strftime('%I:%M%p').lstrip('0').lower()
                except ValueError:
                    # Keep the raw string if it is not in HH:MM form
                    pass
            # Parse teams
            home_cell = row.find('td', {'data-stat': 'home_team'})
            away_cell = row.find('td', {'data-stat': 'away_team'})
            if not home_cell or not away_cell:
                continue
            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            away_team = away_cell.find('a').text if away_cell.find('a') else away_cell.text
            home_team = home_team.strip()
            away_team = away_team.strip()
            if not home_team or not away_team:
                continue
            # Parse venue
            venue_cell = row.find('td', {'data-stat': 'venue'})
            venue = venue_cell.text.strip() if venue_cell else ''
            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'MLS')
            away_abbrev = get_team_abbrev(away_team, 'MLS')
            game_id = f"mls_{date_str}_{away_abbrev}_{home_abbrev}".lower()
            game = Game(
                id=game_id,
                sport='MLS',
                season=str(season),
                date=date_str,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=venue,
                source='fbref.com'
            )
            games.append(game)
        except Exception as e:
            print(f" Error parsing row: {e}")
            continue
    print(f" Found {len(games)} games from FBref")
    return games
3.5 Add NWSL Scraper
def scrape_nwsl_fbref(season: int) -> list[Game]:
    """
    Scrape NWSL schedule from FBref.
    URL: https://fbref.com/en/comps/182/{YEAR}/schedule/{YEAR}-NWSL-Scores-and-Fixtures
    """
    games = []
    url = f"https://fbref.com/en/comps/182/{season}/schedule/{season}-NWSL-Scores-and-Fixtures"
    print(f"Scraping NWSL {season} from FBref...")
    soup = fetch_page(url, 'fbref.com')
    if not soup:
        return games
    table = soup.find('table', {'id': lambda x: x and 'sched_' in x})
    if not table:
        print(" No schedule table found")
        return games
    tbody = table.find('tbody')
    if not tbody:
        return games
    for row in tbody.find_all('tr'):
        try:
            # Parse date (format: 2024-03-16)
            date_cell = row.find('td', {'data-stat': 'date'})
            if not date_cell:
                continue
            date_str = date_cell.text.strip()
            # Parse time and convert 24hr to 12hr format, as in the MLS scraper
            time_cell = row.find('td', {'data-stat': 'time'})
            time_str = time_cell.text.strip() if time_cell else None
            if time_str:
                try:
                    t = datetime.strptime(time_str, '%H:%M')
                    time_str = t.strftime('%I:%M%p').lstrip('0').lower()
                except ValueError:
                    # Keep the raw string if it is not in HH:MM form
                    pass
            # Parse teams
            home_cell = row.find('td', {'data-stat': 'home_team'})
            away_cell = row.find('td', {'data-stat': 'away_team'})
            if not home_cell or not away_cell:
                continue
            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            away_team = away_cell.find('a').text if away_cell.find('a') else away_cell.text
            home_team = home_team.strip()
            away_team = away_team.strip()
            if not home_team or not away_team:
                continue
            # Parse venue
            venue_cell = row.find('td', {'data-stat': 'venue'})
            venue = venue_cell.text.strip() if venue_cell else ''
            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'NWSL')
            away_abbrev = get_team_abbrev(away_team, 'NWSL')
            game_id = f"nwsl_{date_str}_{away_abbrev}_{home_abbrev}".lower()
            game = Game(
                id=game_id,
                sport='NWSL',
                season=str(season),
                date=date_str,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=venue,
                source='fbref.com'
            )
            games.append(game)
        except Exception as e:
            print(f" Error parsing row: {e}")
            continue
    print(f" Found {len(games)} games from FBref")
    return games
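The MLS and NWSL scrapers differ only in the FBref competition ID, the URL slug, and the sport label. A possible way to keep those differences in one place is a small lookup table plus a URL builder. `FBREF_COMPS` and `fbref_schedule_url` are hypothetical names for this sketch; the IDs and slugs come from the URL patterns in section 2.

```python
# sport -> (FBref competition id, URL slug), taken from the URL patterns above
FBREF_COMPS = {
    "MLS": (22, "Major-League-Soccer"),
    "NWSL": (182, "NWSL"),
}


def fbref_schedule_url(sport: str, season: int) -> str:
    """Build the FBref schedule URL for a supported soccer league."""
    comp_id, slug = FBREF_COMPS[sport]
    return (
        f"https://fbref.com/en/comps/{comp_id}/{season}/schedule/"
        f"{season}-{slug}-Scores-and-Fixtures"
    )


print(fbref_schedule_url("MLS", 2024))
print(fbref_schedule_url("NWSL", 2024))
```

With a table like this, a single shared row-parsing function could serve both leagues, leaving only the sport label as a parameter.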
4. Stadium & Team Canonicalization
4.1 Canonical ID Patterns
Stadiums (per-sport, even for shared venues):
stadium_{sport}_{normalized_name}
Examples:
- stadium_wnba_barclays_center (WNBA Liberty)
- stadium_nba_barclays_center (NBA Nets)
- stadium_mls_mercedes_benz_stadium
- stadium_nwsl_providence_park
Teams:
team_{sport}_{abbrev}
Examples:
- team_wnba_nyl (New York Liberty)
- team_mls_atl (Atlanta United)
- team_nwsl_por (Portland Thorns)
Games:
game_{sport}_{season}_{date}_{away}_{home}
Examples:
- game_wnba_2025_20250518_dal_atl
- game_mls_2025_20250301_mia_lag
- game_nwsl_2025_20250315_por_ang
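The three ID patterns can be sketched as small builder functions. The guide calls a `normalize_name` helper elsewhere but does not show it, so the version below is an assumption (lowercase, with runs of non-alphanumeric characters collapsed to underscores); `stadium_id`, `team_id`, and `game_id` are hypothetical names.

```python
import re


def normalize_name(name: str) -> str:
    """Assumed normalization: lowercase, non-alphanumerics become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def stadium_id(sport: str, venue: str) -> str:
    return f"stadium_{sport.lower()}_{normalize_name(venue)}"


def team_id(sport: str, abbrev: str) -> str:
    return f"team_{sport.lower()}_{abbrev.lower()}"


def game_id(sport: str, season: int, date: str, away: str, home: str) -> str:
    """date is 'YYYY-MM-DD'; the canonical ID uses it without dashes."""
    compact_date = date.replace("-", "")
    return f"game_{sport.lower()}_{season}_{compact_date}_{away.lower()}_{home.lower()}"


print(stadium_id("MLS", "Mercedes-Benz Stadium"))
print(team_id("WNBA", "NYL"))
print(game_id("WNBA", 2025, "2025-05-18", "DAL", "ATL"))
```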
4.2 Shared Venue Handling
Critical Rule: Stadiums are per-sport entities. A physical venue shared between sports creates MULTIPLE canonical stadium records.
Example: Barclays Center
// Stadium for NBA Nets
{
"canonical_id": "stadium_nba_barclays_center",
"name": "Barclays Center",
"city": "Brooklyn",
"sport": "NBA",
"primary_team_abbrevs": ["BRK"]
}
// Stadium for WNBA Liberty
{
"canonical_id": "stadium_wnba_barclays_center",
"name": "Barclays Center",
"city": "Brooklyn",
"sport": "WNBA",
"primary_team_abbrevs": ["NYL"]
}
Rationale: Trip planning needs sport-specific filtering. A user planning an NBA trip shouldn't see WNBA games unless explicitly requested.
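Because one building can have several canonical records, anything that reasons about physical places (for example, a single stop on a mixed NBA and WNBA trip) may need to group records back together. A minimal sketch, assuming name plus city is a good enough key for a physical venue; `group_by_physical_venue` is a hypothetical helper, not part of the existing scripts:

```python
from collections import defaultdict


def group_by_physical_venue(stadiums):
    """Map (name, city) to the canonical IDs that share that building."""
    groups = defaultdict(list)
    for stadium in stadiums:
        key = (stadium["name"].lower(), stadium["city"].lower())
        groups[key].append(stadium["canonical_id"])
    return dict(groups)


records = [
    {"canonical_id": "stadium_nba_barclays_center", "name": "Barclays Center",
     "city": "Brooklyn", "sport": "NBA"},
    {"canonical_id": "stadium_wnba_barclays_center", "name": "Barclays Center",
     "city": "Brooklyn", "sport": "WNBA"},
    {"canonical_id": "stadium_wnba_target_center", "name": "Target Center",
     "city": "Minneapolis", "sport": "WNBA"},
]

venues = group_by_physical_venue(records)
print(venues[("barclays center", "brooklyn")])
```

Comparing coordinates would be a sturdier key than names if venue naming ever differs between sources.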
4.3 Update canonicalize_stadiums.py
Add to generate_stadiums_from_teams():
def generate_stadiums_from_teams() -> list[Stadium]:
    """Generate stadium entries from team mappings."""
    stadiums = []
    # Existing: NBA, MLB, NHL
    for abbrev, data in NBA_TEAMS.items():
        stadiums.append(create_stadium(data, 'NBA', [abbrev]))
    # ... existing MLB, NHL
    # NEW: WNBA
    for abbrev, data in WNBA_TEAMS.items():
        stadiums.append(Stadium(
            id=f"wnba_{normalize_name(data['arena'])}",
            name=data['arena'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,  # Geocoded later
            longitude=0.0,
            capacity=0,
            sport='WNBA',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))
    # NEW: MLS
    for abbrev, data in MLS_TEAMS.items():
        stadiums.append(Stadium(
            id=f"mls_{normalize_name(data['stadium'])}",
            name=data['stadium'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,
            longitude=0.0,
            capacity=0,
            sport='MLS',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))
    # NEW: NWSL
    for abbrev, data in NWSL_TEAMS.items():
        stadiums.append(Stadium(
            id=f"nwsl_{normalize_name(data['stadium'])}",
            name=data['stadium'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,
            longitude=0.0,
            capacity=0,
            sport='NWSL',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))
    return stadiums
4.4 Update canonicalize_teams.py
Add league structure mappings:
# WNBA has no conferences/divisions in traditional sense
WNBA_DIVISIONS = {abbrev: (None, None) for abbrev in WNBA_TEAMS}
# MLS Conferences
MLS_DIVISIONS = {
# Eastern Conference
'ATL': ('mls_eastern', None),
'CHI': ('mls_eastern', None),
'CIN': ('mls_eastern', None),
'CLB': ('mls_eastern', None),
'CLT': ('mls_eastern', None),
'DCU': ('mls_eastern', None),
'MIA': ('mls_eastern', None),
'MTL': ('mls_eastern', None),
'NSH': ('mls_eastern', None),
'NER': ('mls_eastern', None),
'NYC': ('mls_eastern', None),
'NYR': ('mls_eastern', None),
'ORL': ('mls_eastern', None),
'PHI': ('mls_eastern', None),
'TOR': ('mls_eastern', None),
# Western Conference
'AUS': ('mls_western', None),
'COL': ('mls_western', None),
'DAL': ('mls_western', None),
'HOU': ('mls_western', None),
'LAG': ('mls_western', None),
'LAF': ('mls_western', None),
'MIN': ('mls_western', None),
'POR': ('mls_western', None),
'RSL': ('mls_western', None),
'SJE': ('mls_western', None),
'SEA': ('mls_western', None),
'SKC': ('mls_western', None),
'STL': ('mls_western', None),
'VAN': ('mls_western', None),
'SDG': ('mls_western', None),
}
# NWSL has no conferences
NWSL_DIVISIONS = {abbrev: (None, None) for abbrev in NWSL_TEAMS}
5. Local Canonical JSON Updates
5.1 stadiums_canonical.json
New entries follow existing format:
{
"canonical_id": "stadium_wnba_barclays_center",
"name": "Barclays Center",
"city": "Brooklyn",
"state": "NY",
"latitude": 40.6826,
"longitude": -73.9754,
"capacity": 17732,
"sport": "WNBA",
"primary_team_abbrevs": ["NYL"],
"year_opened": 2012
}
5.2 teams_canonical.json
{
"canonical_id": "team_wnba_nyl",
"name": "New York Liberty",
"abbreviation": "NYL",
"sport": "WNBA",
"city": "Brooklyn",
"stadium_canonical_id": "stadium_wnba_barclays_center",
"conference_id": null,
"division_id": null,
"primary_color": "#6ECEB2",
"secondary_color": "#000000"
}
5.3 games_canonical.json
{
"canonical_id": "game_wnba_2025_20250518_dal_atl",
"sport": "WNBA",
"season": "2025",
"date": "2025-05-18",
"time": "7:30p",
"home_team_canonical_id": "team_wnba_atl",
"away_team_canonical_id": "team_wnba_dal",
"stadium_canonical_id": "stadium_wnba_gateway_center_arena",
"is_playoff": false,
"broadcast": null
}
5.4 Validation Rules
Update validate_canonical.py:
VALID_SPORTS = {'NBA', 'MLB', 'NHL', 'WNBA', 'MLS', 'NWSL'}
def validate_sport_field(sport: str) -> list[str]:
    """Validate sport is one of the supported values."""
    errors = []
    if sport not in VALID_SPORTS:
        errors.append(f"Invalid sport: {sport}. Must be one of {VALID_SPORTS}")
    return errors
6. CloudKit Integration
6.1 Record Types (Already Exist)
No new record types needed. Existing types support new sports:
- Stadium - add records with sport="WNBA"/"MLS"/"NWSL"
- Team - add records with sport="WNBA"/"MLS"/"NWSL"
- Game - add records with sport="WNBA"/"MLS"/"NWSL"
- StadiumAlias - unchanged
- TeamAlias - unchanged
- LeagueStructure - add new entries for MLS conferences
6.2 Field Mapping (Unchanged)
Stadium Record:
recordName: canonical_id (e.g., "stadium_wnba_barclays_center")
fields:
- uuid: STRING (deterministic from canonical_id)
- name: STRING
- city: STRING
- state: STRING
- latitude: DOUBLE
- longitude: DOUBLE
- capacity: INT64
- sport: STRING ("WNBA", "MLS", "NWSL")
- yearOpened: INT64
- imageURL: STRING (optional)
- lastModified: TIMESTAMP
- schemaVersion: INT64
Team Record:
recordName: canonical_id (e.g., "team_wnba_nyl")
fields:
- uuid: STRING
- name: STRING
- abbreviation: STRING
- sport: STRING
- city: STRING
- stadiumCanonicalId: STRING (reference by canonical_id)
- conferenceId: STRING (optional)
- divisionId: STRING (optional)
- primaryColor: STRING
- secondaryColor: STRING
- lastModified: TIMESTAMP
- schemaVersion: INT64
Game Record:
recordName: canonical_id (e.g., "game_wnba_2025_20250518_dal_atl")
fields:
- uuid: STRING
- sport: STRING
- season: STRING
- dateTime: TIMESTAMP
- homeTeamCanonicalId: STRING
- awayTeamCanonicalId: STRING
- stadiumCanonicalId: STRING
- isPlayoff: INT64 (0 or 1)
- broadcastInfo: STRING (optional)
- lastModified: TIMESTAMP
- schemaVersion: INT64
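The `uuid` field is described as deterministic from `canonical_id`. The guide does not say how it is derived, so the following is only one plausible approach: a name-based UUID (version 5) under a fixed namespace. Both the namespace string and the helper name `deterministic_uuid` are assumptions made for illustration.

```python
import uuid

# Hypothetical namespace; the real project may derive its UUIDs differently.
SPORTSTIME_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "sportstime.example")


def deterministic_uuid(canonical_id: str) -> str:
    """Same canonical_id in, same UUID string out, on every run and machine."""
    return str(uuid.uuid5(SPORTSTIME_NAMESPACE, canonical_id))


first = deterministic_uuid("stadium_wnba_barclays_center")
second = deterministic_uuid("stadium_wnba_barclays_center")
other = deterministic_uuid("stadium_nba_barclays_center")
print(first == second, first == other)
```

A derivation like this lets the import script be re-run without creating duplicate records, since the identifier never depends on run order or timestamps.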
6.3 Index Requirements
Ensure CloudKit has indexes for:
- Game: sport (sortable), dateTime (sortable, queryable)
- Team: sport (queryable)
- Stadium: sport (queryable)
6.4 Import Script Updates
Update cloudkit_import.py to handle new sports in validation:
VALID_SPORTS = {'NBA', 'MLB', 'NHL', 'WNBA', 'MLS', 'NWSL'}
def validate_game_record(game: dict) -> list[str]:
    errors = []
    if game.get('sport') not in VALID_SPORTS:
        errors.append(f"Invalid sport: {game.get('sport')}")
    return errors
7. App-Side Integration (SwiftUI)
7.1 Update Sport Enum
File: SportsTime/Core/Models/Domain/Sport.swift
import SwiftUI  // needed for Color

enum Sport: String, Codable, CaseIterable, Identifiable {
case mlb = "MLB"
case nba = "NBA"
case nhl = "NHL"
case nfl = "NFL"
case mls = "MLS"
case wnba = "WNBA"
case nwsl = "NWSL"
var id: String { rawValue }
var displayName: String {
switch self {
case .mlb: return "Major League Baseball"
case .nba: return "National Basketball Association"
case .nhl: return "National Hockey League"
case .nfl: return "National Football League"
case .mls: return "Major League Soccer"
case .wnba: return "Women's National Basketball Association"
case .nwsl: return "National Women's Soccer League"
}
}
var iconName: String {
switch self {
case .mlb: return "baseball.fill"
case .nba: return "basketball.fill"
case .nhl: return "hockey.puck.fill"
case .nfl: return "football.fill"
case .mls: return "soccerball"
case .wnba: return "basketball.fill"
case .nwsl: return "soccerball"
}
}
var color: Color {
switch self {
case .mlb: return .red
case .nba: return .orange
case .nhl: return .blue
case .nfl: return .brown
case .mls: return .green
case .wnba: return .purple
case .nwsl: return .pink
}
}
var seasonMonths: (start: Int, end: Int) {
switch self {
case .mlb: return (3, 10) // March - October
case .nba: return (10, 6) // October - June (wraps)
case .nhl: return (10, 6) // October - June (wraps)
case .nfl: return (9, 2) // September - February (wraps)
case .mls: return (2, 12) // February - December
case .wnba: return (5, 10) // May - October
case .nwsl: return (3, 11) // March - November
}
}
/// Currently supported sports
static var supported: [Sport] {
[.mlb, .nba, .nhl, .wnba, .mls, .nwsl]
}
}
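Several `seasonMonths` ranges wrap past December (NBA, NHL, NFL), so a plain `start <= month <= end` check would reject every month for those sports. The logic is sketched here in Python to match the pipeline scripts; the Swift app would need the equivalent check, and `month_in_season` is a hypothetical helper name.

```python
def month_in_season(month: int, start: int, end: int) -> bool:
    """True when month (1-12) falls inside a season that may wrap the year."""
    if start <= end:
        # Season sits inside one calendar year, e.g. WNBA (5, 10)
        return start <= month <= end
    # Season wraps the new year, e.g. NBA (10, 6)
    return month >= start or month <= end


print(month_in_season(1, 10, 6))   # January during an October-June season
print(month_in_season(8, 10, 6))   # August during an October-June season
```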
7.2 Trip Planner - No Changes Required
The trip planner uses Sport enum and fetches games by sport. New sports automatically work because:
- DataProvider.fetchGames(sports:startDate:endDate:) queries by sport string
- Games are filtered by sportStrings.contains(canonical.sport)
- Route planning is sport-agnostic (uses stadium coordinates)
7.3 Stadium Tracker - No Changes Required
Stadium progress uses Stadium.sport field. New sports automatically appear in:
- Stadium list filtering by sport
- Progress tracking per sport
7.4 UI Considerations
Sport Selection Chips: The SportSelectionChip already uses Sport.allCases. Adding new cases automatically adds them to the UI.
Filter Sections: Update default selections if desired:
// In TripCreationViewModel
var selectedSports: Set<Sport> = [.mlb, .nba, .nhl] // Consider adding new sports
8. Testing & Validation
8.1 Data Integrity Checks
Python validation queries (add to validate_canonical.py):
def validate_new_sports(stadiums, teams, games):
    """Validate WNBA, MLS, NWSL data integrity."""
    errors = []
    # Check all sports have stadiums, teams, and games
    for sport in ['WNBA', 'MLS', 'NWSL']:
        sport_stadiums = [s for s in stadiums if s['sport'] == sport]
        if not sport_stadiums:
            errors.append(f"No stadiums for {sport}")
        sport_teams = [t for t in teams if t['sport'] == sport]
        if not sport_teams:
            errors.append(f"No teams for {sport}")
        sport_games = [g for g in games if g['sport'] == sport]
        if not sport_games:
            errors.append(f"No games for {sport}")
    # Check team->stadium references
    stadium_ids = {s['canonical_id'] for s in stadiums}
    for team in teams:
        if team['stadium_canonical_id'] not in stadium_ids:
            errors.append(f"Team {team['canonical_id']} references unknown stadium {team['stadium_canonical_id']}")
    # Check game->team and game->stadium references
    team_ids = {t['canonical_id'] for t in teams}
    for game in games:
        if game['home_team_canonical_id'] not in team_ids:
            errors.append(f"Game {game['canonical_id']} references unknown home team")
        if game['away_team_canonical_id'] not in team_ids:
            errors.append(f"Game {game['canonical_id']} references unknown away team")
        if game['stadium_canonical_id'] not in stadium_ids:
            errors.append(f"Game {game['canonical_id']} references unknown stadium")
    return errors
8.2 App Smoke Tests
1. Sport Selection:
   - Open Trip Creation
   - Verify WNBA, MLS, NWSL chips appear
   - Select each new sport
   - Verify games load for date range
2. Trip Planning:
   - Select WNBA + dates during WNBA season
   - Verify trip results show WNBA games
   - Verify stadium locations are correct
3. Stadium Progress:
   - Navigate to Progress tab
   - Filter by WNBA/MLS/NWSL
   - Verify stadium list shows correct venues
4. Mixed Sport Trips:
   - Select NBA + WNBA (they share arenas)
   - Verify trips correctly handle both sports
   - Verify no duplicate stadiums in single stop
8.3 Edge Case Tests
1. Shared Venues:
   - Create trip with MLS Atlanta United + NFL Falcons (same venue)
   - Verify games at Mercedes-Benz Stadium appear for both sports
2. Canadian Teams (MLS):
   - Create trip including Toronto FC
   - Verify timezone handling is correct
3. Midweek Matches (MLS):
   - Verify Wednesday/Thursday games don't break route planning
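For the timezone check, it helps to have a known-good conversion to compare the app against. The sketch below assumes scraped kickoff times are local to the home venue, which the guide does not state and should be verified per source. `VENUE_TIMEZONES` and `kickoff_utc` are hypothetical names, and only the three Canadian MLS clubs are listed.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# IANA zones for the Canadian MLS clubs (Montreal shares Toronto's zone rules)
VENUE_TIMEZONES = {
    "TOR": "America/Toronto",
    "MTL": "America/Toronto",
    "VAN": "America/Vancouver",
}


def kickoff_utc(date_str: str, time_24h: str, home_abbrev: str) -> datetime:
    """Interpret a venue-local kickoff and return it as an aware UTC datetime."""
    local = datetime.strptime(f"{date_str} {time_24h}", "%Y-%m-%d %H:%M")
    local = local.replace(tzinfo=ZoneInfo(VENUE_TIMEZONES[home_abbrev]))
    return local.astimezone(timezone.utc)


toronto = kickoff_utc("2025-07-01", "19:30", "TOR")
vancouver = kickoff_utc("2025-07-01", "19:30", "VAN")
print(toronto.isoformat(), vancouver.isoformat())
```

The same local clock time in Vancouver lands on the next UTC day, which is the kind of case the smoke test should cover.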
9. Pipeline Update Summary
run_canonicalization_pipeline.py Changes
# In run_pipeline():
# STAGE 1: SCRAPING
# ... existing NBA, MLB, NHL ...
# NEW: WNBA
print_section(f"WNBA {season}")
wnba_games = scrape_wnba_basketball_reference(season)
wnba_games = assign_stable_ids(wnba_games, 'WNBA', str(season))
all_games.extend(wnba_games)
print(f" Scraped {len(wnba_games)} WNBA games")
# NEW: MLS
print_section(f"MLS {season}")
mls_games = scrape_mls_fbref(season)
mls_games = assign_stable_ids(mls_games, 'MLS', str(season))
all_games.extend(mls_games)
print(f" Scraped {len(mls_games)} MLS games")
# NEW: NWSL
print_section(f"NWSL {season}")
nwsl_games = scrape_nwsl_fbref(season)
nwsl_games = assign_stable_ids(nwsl_games, 'NWSL', str(season))
all_games.extend(nwsl_games)
print(f" Scraped {len(nwsl_games)} NWSL games")
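The three new stages repeat the same scrape, assign IDs, and extend steps. If more leagues are expected, a registry-driven loop is one way to avoid copying the stage again. This is a design sketch, not the pipeline's actual structure: `run_scrapers` is a hypothetical function, and the demo scrapers are stand-ins so the example runs without network access.

```python
from typing import Callable


def run_scrapers(season: int,
                 scrapers: dict[str, Callable[[int], list]],
                 assign_ids: Callable[[list, str, str], list]) -> list:
    """Run each registered scraper and collect its games in one list."""
    all_games = []
    for sport, scrape in scrapers.items():
        games = assign_ids(scrape(season), sport, str(season))
        all_games.extend(games)
        print(f"  Scraped {len(games)} {sport} games")
    return all_games


# Stand-in scrapers so the sketch runs on its own; the real pipeline would
# register scrape_wnba_basketball_reference, scrape_mls_fbref, and so on,
# and pass assign_stable_ids as assign_ids.
demo_scrapers = {
    "WNBA": lambda season: [f"wnba-{season}-1", f"wnba-{season}-2"],
    "MLS": lambda season: [f"mls-{season}-1"],
}
demo_games = run_scrapers(2025, demo_scrapers, lambda games, sport, season: games)
print(demo_games)
```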
10. Checklist
Definition of Done
- Scraping: WNBA, MLS, NWSL scrapers added and tested
- Team Mappings: All current teams with correct abbreviations
- Stadiums: All venues canonicalized with coordinates
- Canonicalization: Pipeline runs without errors for new sports
- Validation: All integrity checks pass
- CloudKit: Records uploaded successfully
- Swift Enum: Sport cases added with correct metadata
- Trip Planning: New sports can be planned into trips
- Stadium Tracking: New stadiums appear in progress
- No Regressions: Existing MLB/NBA/NHL functionality unchanged
Files Modified
| File | Changes |
|---|---|
| Scripts/scrape_schedules.py | Add team mappings, scrapers |
| Scripts/canonicalize_stadiums.py | Generate new sport stadiums |
| Scripts/canonicalize_teams.py | Add league structure mappings |
| Scripts/run_canonicalization_pipeline.py | Add scraping calls |
| Scripts/validate_canonical.py | Add new sport validation |
| Scripts/cloudkit_import.py | Add sport validation |
| SportsTime/Core/Models/Domain/Sport.swift | Add enum cases |
| SportsTime/Resources/stadiums_canonical.json | New venue records |
| SportsTime/Resources/teams_canonical.json | New team records |
| SportsTime/Resources/games_canonical.json | New game records |