# WNBA, MLS, and NWSL Implementation Guide

Complete end-to-end implementation for adding WNBA, MLS, and NWSL to SportsTime.

---

## 1. League Overview

### WNBA (Women's National Basketball Association)

- **Teams**: 13 (expanding to 15 by 2026)
- **Season**: May - September (regular season), September - October (playoffs)
- **Game Cadence**: ~40 games per team, 3-4 games per week
- **Special Considerations**:
  - Many teams share arenas with NBA teams (key for stadium handling)
  - Olympic break in summer every 4 years
  - Commissioner's Cup midseason tournament

**Shared Venues (WNBA/NBA)** (rows marked "different" list the WNBA team's own arena, which is not shared with the NBA team):

| WNBA Team | NBA Team | Arena |
|-----------|----------|-------|
| Atlanta Dream | Hawks | Gateway Center Arena (different) |
| Chicago Sky | Bulls | Wintrust Arena (different) |
| Dallas Wings | Mavericks | College Park Center (different) |
| Indiana Fever | Pacers | Gainbridge Fieldhouse |
| Los Angeles Sparks | Lakers/Clippers | Crypto.com Arena |
| Minnesota Lynx | Timberwolves | Target Center |
| New York Liberty | Nets | Barclays Center |
| Phoenix Mercury | Suns | Footprint Center |
| Washington Mystics | Wizards | Entertainment & Sports Arena (different) |

### MLS (Major League Soccer)

- **Teams**: 29 teams (2024), expanding
- **Season**: February/March - October (regular season), October - December (playoffs)
- **Game Cadence**: 34 games per team, 1-2 games per week
- **Special Considerations**:
  - Some teams share NFL stadiums (Atlanta, Seattle, New England)
  - Midweek matches (Wednesday/Thursday) common
  - US Open Cup adds additional games
  - Canadian teams (Toronto, Vancouver, Montreal) - timezone handling

**Shared Venues (MLS/NFL)**:

| MLS Team | NFL Team | Stadium |
|----------|----------|---------|
| Atlanta United | Falcons | Mercedes-Benz Stadium |
| Seattle Sounders | Seahawks | Lumen Field |
| New England Revolution | Patriots | Gillette Stadium |

### NWSL (National Women's Soccer League)

- **Teams**: 14 teams (2024)
- **Season**: March - November (regular season + playoffs)
- **Game Cadence**: 26 games per team, 1-2 games per week
- **Special Considerations**:
  - Some share MLS stadiums (Portland, Orlando, Houston)
  - Many use smaller soccer-specific venues
  - Expansion teams frequently added

---

## 2. Schedule & Data Sources

### WNBA Data Sources

**Primary: Basketball-Reference (Women)**

```
URL Pattern: https://www.basketball-reference.com/wnba/years/{YEAR}_games.html
Example: https://www.basketball-reference.com/wnba/years/2025_games.html
```

**HTML Structure**:

```html
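<tr>
  <th data-stat="date_game">Fri, May 17, 2024</th>
  <td data-stat="game_start_time">7:30p</td>
  <td data-stat="visitor_team_name">Dallas Wings</td>
  <td data-stat="home_team_name">Atlanta Dream</td>
  <td data-stat="arena_name">Gateway Center Arena</td>
</tr>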
```

**Fields Available**: date, time, home_team, away_team, arena, attendance, box_score_link

**Secondary: ESPN WNBA**

```
URL Pattern: https://www.espn.com/wnba/schedule/_/date/{YYYYMMDD}
```

### MLS Data Sources

**Primary: FBref (Football Reference)**

```
URL Pattern: https://fbref.com/en/comps/22/{YEAR}/schedule/{YEAR}-Major-League-Soccer-Scores-and-Fixtures
Example: https://fbref.com/en/comps/22/2024/schedule/2024-Major-League-Soccer-Scores-and-Fixtures
```

**HTML Structure**:

```html
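<tr>
  <td data-stat="date">2024-02-24</td>
  <td data-stat="time">19:30</td>
  <td data-stat="home_team">LA Galaxy</td>
  <td data-stat="away_team">Inter Miami</td>
  <td data-stat="venue">Dignity Health Sports Park</td>
</tr>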
```

**Fields Available**: date, time (24hr), home_team, away_team, venue, score, attendance

**Secondary: MLS Official**

```
URL Pattern: https://www.mlssoccer.com/schedule/scores
API Endpoint: https://sportapi.mlssoccer.com/api/matches?culture=en-us&dateFrom={date}&dateTo={date}
```

### NWSL Data Sources

**Primary: FBref (NWSL)**

```
URL Pattern: https://fbref.com/en/comps/182/{YEAR}/schedule/{YEAR}-NWSL-Scores-and-Fixtures
Example: https://fbref.com/en/comps/182/2024/schedule/2024-NWSL-Scores-and-Fixtures
```

**HTML Structure**: Same as MLS (FBref standard format)

**Secondary: NWSL Official**

```
URL Pattern: https://www.nwslsoccer.com/schedule
```

---

## 3. Schedule Parser Changes

### File: `Scripts/scrape_schedules.py`

#### 3.1 Add Team Mappings (after NHL_TEAMS ~line 180)

```python
# =============================================================================
# WNBA TEAMS
# =============================================================================
WNBA_TEAMS = {
    'ATL': {'name': 'Atlanta Dream', 'city': 'Atlanta', 'arena': 'Gateway Center Arena'},
    'CHI': {'name': 'Chicago Sky', 'city': 'Chicago', 'arena': 'Wintrust Arena'},
    'CON': {'name': 'Connecticut Sun', 'city': 'Uncasville', 'arena': 'Mohegan Sun Arena'},
    'DAL': {'name': 'Dallas Wings', 'city': 'Arlington', 'arena': 'College Park Center'},
    'IND': {'name': 'Indiana Fever', 'city': 'Indianapolis', 'arena': 'Gainbridge Fieldhouse'},
    'LVA': {'name': 'Las Vegas Aces', 'city': 'Las Vegas', 'arena': 'Michelob Ultra Arena'},
    'LAS': {'name': 'Los Angeles Sparks', 'city': 'Los Angeles', 'arena': 'Crypto.com Arena'},
    'MIN': {'name': 'Minnesota Lynx', 'city': 'Minneapolis', 'arena': 'Target Center'},
    'NYL': {'name': 'New York Liberty', 'city': 'Brooklyn', 'arena': 'Barclays Center'},
    'PHO': {'name': 'Phoenix Mercury', 'city': 'Phoenix', 'arena': 'Footprint Center'},
    'SEA': {'name': 'Seattle Storm', 'city': 'Seattle', 'arena': 'Climate Pledge Arena'},
    'WAS': {'name': 'Washington Mystics', 'city': 'Washington',
            'arena': 'Entertainment & Sports Arena'},
    # Expansion teams (add as announced)
    'GSV': {'name': 'Golden State Valkyries', 'city': 'San Francisco', 'arena': 'Chase Center'},
    'POR': {'name': 'Portland Expansion', 'city': 'Portland', 'arena': 'TBD'},
    'TOR': {'name': 'Toronto Expansion', 'city': 'Toronto', 'arena': 'TBD'},
}

# =============================================================================
# MLS TEAMS
# =============================================================================
MLS_TEAMS = {
    'ATL': {'name': 'Atlanta United FC', 'city': 'Atlanta', 'stadium': 'Mercedes-Benz Stadium'},
    'AUS': {'name': 'Austin FC', 'city': 'Austin', 'stadium': 'Q2 Stadium'},
    'CLT': {'name': 'Charlotte FC', 'city': 'Charlotte', 'stadium': 'Bank of America Stadium'},
    'CHI': {'name': 'Chicago Fire FC', 'city': 'Chicago', 'stadium': 'Soldier Field'},
    'CIN': {'name': 'FC Cincinnati', 'city': 'Cincinnati', 'stadium': 'TQL Stadium'},
    'CLB': {'name': 'Columbus Crew', 'city': 'Columbus', 'stadium': 'Lower.com Field'},
    'COL': {'name': 'Colorado Rapids', 'city': 'Commerce City', 'stadium': 'Dick\'s Sporting Goods Park'},
    'DAL': {'name': 'FC Dallas', 'city': 'Frisco', 'stadium': 'Toyota Stadium'},
    'DCU': {'name': 'D.C. United', 'city': 'Washington', 'stadium': 'Audi Field'},
    'HOU': {'name': 'Houston Dynamo FC', 'city': 'Houston', 'stadium': 'Shell Energy Stadium'},
    'LAG': {'name': 'LA Galaxy', 'city': 'Carson', 'stadium': 'Dignity Health Sports Park'},
    'LAF': {'name': 'Los Angeles FC', 'city': 'Los Angeles', 'stadium': 'BMO Stadium'},
    'MIA': {'name': 'Inter Miami CF', 'city': 'Fort Lauderdale', 'stadium': 'Chase Stadium'},
    'MIN': {'name': 'Minnesota United FC', 'city': 'Saint Paul', 'stadium': 'Allianz Field'},
    'MTL': {'name': 'CF Montréal', 'city': 'Montreal', 'stadium': 'Stade Saputo'},
    'NSH': {'name': 'Nashville SC', 'city': 'Nashville', 'stadium': 'Geodis Park'},
    'NER': {'name': 'New England Revolution', 'city': 'Foxborough', 'stadium': 'Gillette Stadium'},
    'NYC': {'name': 'New York City FC', 'city': 'New York', 'stadium': 'Yankee Stadium'},
    'NYR': {'name': 'New York Red Bulls', 'city': 'Harrison', 'stadium': 'Red Bull Arena'},
    'ORL': {'name': 'Orlando City SC', 'city': 'Orlando', 'stadium': 'Exploria Stadium'},
    'PHI': {'name': 'Philadelphia Union', 'city': 'Chester', 'stadium': 'Subaru Park'},
    'POR': {'name': 'Portland Timbers', 'city': 'Portland', 'stadium': 'Providence Park'},
    'RSL': {'name': 'Real Salt Lake', 'city': 'Sandy', 'stadium': 'America First Field'},
    'SJE': {'name': 'San Jose Earthquakes', 'city': 'San Jose', 'stadium': 'PayPal Park'},
    'SEA': {'name': 'Seattle Sounders FC', 'city': 'Seattle', 'stadium': 'Lumen Field'},
    'SKC': {'name': 'Sporting Kansas City', 'city': 'Kansas City', 'stadium': 'Children\'s Mercy Park'},
    'STL': {'name': 'St. Louis City SC', 'city': 'St. Louis', 'stadium': 'CityPark'},
    'TOR': {'name': 'Toronto FC', 'city': 'Toronto', 'stadium': 'BMO Field'},
    'VAN': {'name': 'Vancouver Whitecaps FC', 'city': 'Vancouver', 'stadium': 'BC Place'},
    'SDG': {'name': 'San Diego FC', 'city': 'San Diego', 'stadium': 'Snapdragon Stadium'},  # 2025 expansion
}

# =============================================================================
# NWSL TEAMS
# =============================================================================
NWSL_TEAMS = {
    'ANG': {'name': 'Angel City FC', 'city': 'Los Angeles', 'stadium': 'BMO Stadium'},
    'CHI': {'name': 'Chicago Red Stars', 'city': 'Chicago', 'stadium': 'SeatGeek Stadium'},
    'HOU': {'name': 'Houston Dash', 'city': 'Houston', 'stadium': 'Shell Energy Stadium'},
    'KCC': {'name': 'Kansas City Current', 'city': 'Kansas City', 'stadium': 'CPKC Stadium'},
    'LOU': {'name': 'Racing Louisville FC', 'city': 'Louisville', 'stadium': 'Lynn Family Stadium'},
    'NCC': {'name': 'North Carolina Courage', 'city': 'Cary', 'stadium': 'WakeMed Soccer Park'},
    'NJG': {'name': 'NJ/NY Gotham FC', 'city': 'Harrison', 'stadium': 'Red Bull Arena'},
    'ORL': {'name': 'Orlando Pride', 'city': 'Orlando', 'stadium': 'Exploria Stadium'},
    'POR': {'name': 'Portland Thorns FC', 'city': 'Portland', 'stadium': 'Providence Park'},
    'SDW': {'name': 'San Diego Wave FC', 'city': 'San Diego', 'stadium': 'Snapdragon Stadium'},
    'SEA': {'name': 'Seattle Reign FC', 'city': 'Seattle', 'stadium': 'Lumen Field'},
    'UTA': {'name': 'Utah Royals FC', 'city': 'Sandy', 'stadium': 'America First Field'},
    'WAS': {'name': 'Washington Spirit', 'city': 'Washington', 'stadium': 'Audi Field'},
    # PayPal Park is located in San Jose
    'BAY': {'name': 'Bay FC', 'city': 'San Jose', 'stadium': 'PayPal Park'},  # 2024 expansion
}
```

#### 3.2 Update `get_team_abbrev()` Function

```python
def get_team_abbrev(team_name: str, sport: str) -> str:
    """Get team abbreviation from full name."""
    team_maps = {
        'NBA': NBA_TEAMS,
        'MLB': MLB_TEAMS,
        'NHL': NHL_TEAMS,
        'WNBA': WNBA_TEAMS,
        'MLS': MLS_TEAMS,
        'NWSL': NWSL_TEAMS,
    }
    teams = team_maps.get(sport, {})
    name = team_name.strip().lower()

    # Exact match on full team name, checked across all teams first so a
    # partial match on an earlier entry cannot shadow it
    for abbrev, data in teams.items():
        if name == data['name'].lower():
            return abbrev

    # Partial match (e.g., "Hawks" matches "Atlanta Hawks");
    # skipped for empty names, which would otherwise match every team
    if name:
        for abbrev, data in teams.items():
            if name in data['name'].lower():
                return abbrev

    # Fallback: first 3 characters
    return team_name[:3].upper()
```

#### 3.3 Add WNBA Scraper

```python
def scrape_wnba_basketball_reference(season: int) -> list[Game]:
    """
    Scrape WNBA schedule from Basketball-Reference.
    URL: https://www.basketball-reference.com/wnba/years/{YEAR}_games.html
    Season year is the calendar year (e.g., 2025 for 2025 season)
    """
    games = []
    url = f"https://www.basketball-reference.com/wnba/years/{season}_games.html"
    print(f"Scraping WNBA {season} from Basketball-Reference...")

    soup = fetch_page(url, 'basketball-reference.com')
    if not soup:
        return games

    table = soup.find('table', {'id': 'schedule'})
    if not table:
        print(" No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        # Skip repeated header rows inside the table body
        if row.get('class') and 'thead' in row.get('class'):
            continue
        try:
            # Parse date
            date_cell = row.find('th', {'data-stat': 'date_game'})
            if not date_cell:
                continue
            date_link = date_cell.find('a')
            date_str = date_link.text if date_link else date_cell.text

            # Parse time
            time_cell = row.find('td', {'data-stat': 'game_start_time'})
            time_str = time_cell.text.strip() if time_cell else None

            # Parse teams
            visitor_cell = row.find('td', {'data-stat': 'visitor_team_name'})
            home_cell = row.find('td', {'data-stat': 'home_team_name'})
            if not visitor_cell or not home_cell:
                continue
            away_team = visitor_cell.find('a').text if visitor_cell.find('a') else visitor_cell.text
            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text

            # Parse arena
            arena_cell = row.find('td', {'data-stat': 'arena_name'})
            arena = arena_cell.text.strip() if arena_cell else ''

            # Convert date (format: "Sat, May 18, 2024")
            try:
                parsed_date = datetime.strptime(date_str.strip(), '%a, %b %d, %Y')
                date_formatted = parsed_date.strftime('%Y-%m-%d')
            except ValueError:
                continue

            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'WNBA')
            away_abbrev = get_team_abbrev(away_team, 'WNBA')
            game_id = f"wnba_{date_formatted}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='WNBA',
                season=str(season),
                date=date_formatted,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=arena,
                source='basketball-reference.com'
            )
            games.append(game)
        except Exception as e:
            print(f" Error parsing row: {e}")
            continue

    print(f" Found {len(games)} games from Basketball-Reference")
    return games
```

#### 3.4 Add MLS Scraper

```python
def scrape_mls_fbref(season: int) -> list[Game]:
    """
    Scrape MLS schedule from FBref.
    URL: https://fbref.com/en/comps/22/{YEAR}/schedule/{YEAR}-Major-League-Soccer-Scores-and-Fixtures
    """
    games = []
    url = f"https://fbref.com/en/comps/22/{season}/schedule/{season}-Major-League-Soccer-Scores-and-Fixtures"
    print(f"Scraping MLS {season} from FBref...")

    soup = fetch_page(url, 'fbref.com')
    if not soup:
        return games

    # FBref uses table with id like sched_{year}_22_1
    table = soup.find('table', {'id': lambda x: x and 'sched_' in x})
    if not table:
        print(" No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        try:
            # Parse date (format: 2024-02-24)
            date_cell = row.find('td', {'data-stat': 'date'})
            if not date_cell:
                continue
            date_str = date_cell.text.strip()

            # Parse time (24hr format: 19:30)
            time_cell = row.find('td', {'data-stat': 'time'})
            time_str = time_cell.text.strip() if time_cell else None

            # Convert 24hr to 12hr format for consistency (e.g. "19:30" -> "7:30pm")
            if time_str:
                try:
                    t = datetime.strptime(time_str, '%H:%M')
                    time_str = t.strftime('%I:%M%p').lstrip('0').lower()
                except ValueError:
                    pass  # Keep the raw value if it is not HH:MM

            # Parse teams
            home_cell = row.find('td', {'data-stat': 'home_team'})
            away_cell = row.find('td',
                                 {'data-stat': 'away_team'})
            if not home_cell or not away_cell:
                continue
            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            away_team = away_cell.find('a').text if away_cell.find('a') else away_cell.text
            home_team = home_team.strip()
            away_team = away_team.strip()
            if not home_team or not away_team:
                continue

            # Parse venue
            venue_cell = row.find('td', {'data-stat': 'venue'})
            venue = venue_cell.text.strip() if venue_cell else ''

            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'MLS')
            away_abbrev = get_team_abbrev(away_team, 'MLS')
            game_id = f"mls_{date_str}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='MLS',
                season=str(season),
                date=date_str,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=venue,
                source='fbref.com'
            )
            games.append(game)
        except Exception as e:
            print(f" Error parsing row: {e}")
            continue

    print(f" Found {len(games)} games from FBref")
    return games
```

#### 3.5 Add NWSL Scraper

```python
def scrape_nwsl_fbref(season: int) -> list[Game]:
    """
    Scrape NWSL schedule from FBref.
    URL: https://fbref.com/en/comps/182/{YEAR}/schedule/{YEAR}-NWSL-Scores-and-Fixtures
    """
    games = []
    url = f"https://fbref.com/en/comps/182/{season}/schedule/{season}-NWSL-Scores-and-Fixtures"
    print(f"Scraping NWSL {season} from FBref...")

    soup = fetch_page(url, 'fbref.com')
    if not soup:
        return games

    table = soup.find('table', {'id': lambda x: x and 'sched_' in x})
    if not table:
        print(" No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        try:
            date_cell = row.find('td', {'data-stat': 'date'})
            if not date_cell:
                continue
            date_str = date_cell.text.strip()

            time_cell = row.find('td', {'data-stat': 'time'})
            time_str = time_cell.text.strip() if time_cell else None
            if time_str:
                try:
                    t = datetime.strptime(time_str, '%H:%M')
                    time_str = t.strftime('%I:%M%p').lstrip('0').lower()
                except ValueError:
                    pass  # Keep the raw value if it is not HH:MM

            home_cell = row.find('td', {'data-stat': 'home_team'})
            away_cell = row.find('td', {'data-stat': 'away_team'})
            if not home_cell or not away_cell:
                continue
            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            away_team = away_cell.find('a').text if away_cell.find('a') else away_cell.text
            home_team = home_team.strip()
            away_team = away_team.strip()
            if not home_team or not away_team:
                continue

            venue_cell = row.find('td', {'data-stat': 'venue'})
            venue = venue_cell.text.strip() if venue_cell else ''

            home_abbrev = get_team_abbrev(home_team, 'NWSL')
            away_abbrev = get_team_abbrev(away_team, 'NWSL')
            game_id = f"nwsl_{date_str}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='NWSL',
                season=str(season),
                date=date_str,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=venue,
                source='fbref.com'
            )
            games.append(game)
        except Exception as e:
            print(f" Error parsing row: {e}")
            continue

    print(f" Found {len(games)} games from FBref")
    return games
```

---

## 4. Stadium & Team Canonicalization

### 4.1 Canonical ID Patterns

**Stadiums** (per-sport, even for shared venues):

```
stadium_{sport}_{normalized_name}
```

Examples:
- `stadium_wnba_barclays_center` (WNBA Liberty)
- `stadium_nba_barclays_center` (NBA Nets)
- `stadium_mls_mercedes_benz_stadium`
- `stadium_nwsl_providence_park`

**Teams**:

```
team_{sport}_{abbrev}
```

Examples:
- `team_wnba_nyl` (New York Liberty)
- `team_mls_atl` (Atlanta United)
- `team_nwsl_por` (Portland Thorns)

**Games**:

```
game_{sport}_{season}_{date}_{away}_{home}
```

Examples:
- `game_wnba_2025_20250518_dal_atl`
- `game_mls_2025_20250301_mia_lag`
- `game_nwsl_2025_20250315_por_ang`

### 4.2 Shared Venue Handling

**Critical Rule**: Stadiums are per-sport entities. A physical venue shared between sports creates MULTIPLE canonical stadium records.

**Example: Barclays Center**

```json
// Stadium for NBA Nets
{
  "canonical_id": "stadium_nba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "sport": "NBA",
  "primary_team_abbrevs": ["BRK"]
}

// Stadium for WNBA Liberty
{
  "canonical_id": "stadium_wnba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "sport": "WNBA",
  "primary_team_abbrevs": ["NYL"]
}
```

**Rationale**: Trip planning needs sport-specific filtering. A user planning an NBA trip shouldn't see WNBA games unless explicitly requested.

### 4.3 Update `canonicalize_stadiums.py`

Add to `generate_stadiums_from_teams()`:

```python
def generate_stadiums_from_teams() -> list[Stadium]:
    """Generate stadium entries from team mappings."""
    stadiums = []

    # Existing: NBA, MLB, NHL
    for abbrev, data in NBA_TEAMS.items():
        stadiums.append(create_stadium(data, 'NBA', [abbrev]))
    # ... existing MLB, NHL

    # NEW: WNBA
    for abbrev, data in WNBA_TEAMS.items():
        # Skip placeholder expansion teams; otherwise every 'TBD' arena
        # would produce the same stadium ID
        if data['arena'] == 'TBD':
            continue
        stadiums.append(Stadium(
            id=f"wnba_{normalize_name(data['arena'])}",
            name=data['arena'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,  # Geocoded later
            longitude=0.0,
            capacity=0,
            sport='WNBA',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    # NEW: MLS
    for abbrev, data in MLS_TEAMS.items():
        stadiums.append(Stadium(
            id=f"mls_{normalize_name(data['stadium'])}",
            name=data['stadium'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,
            longitude=0.0,
            capacity=0,
            sport='MLS',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    # NEW: NWSL
    for abbrev, data in NWSL_TEAMS.items():
        stadiums.append(Stadium(
            id=f"nwsl_{normalize_name(data['stadium'])}",
            name=data['stadium'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,
            longitude=0.0,
            capacity=0,
            sport='NWSL',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    return stadiums
```

### 4.4 Update `canonicalize_teams.py`

Add league structure mappings:

```python
# WNBA has no conferences/divisions in traditional sense
WNBA_DIVISIONS = {abbrev: (None, None) for abbrev in WNBA_TEAMS}

# MLS Conferences
MLS_DIVISIONS = {
    # Eastern Conference
    'ATL': ('mls_eastern', None), 'CLT': ('mls_eastern', None), 'CHI': ('mls_eastern', None),
    'CIN': ('mls_eastern', None), 'CLB': ('mls_eastern', None), 'DCU': ('mls_eastern', None),
    'MIA': ('mls_eastern', None), 'MTL': ('mls_eastern', None), 'NSH': ('mls_eastern', None),
    'NER': ('mls_eastern', None), 'NYC': ('mls_eastern', None), 'NYR': ('mls_eastern', None),
    'ORL': ('mls_eastern', None), 'PHI': ('mls_eastern', None), 'TOR': ('mls_eastern', None),
    # Western Conference
    'AUS': ('mls_western', None), 'COL': ('mls_western', None), 'DAL': ('mls_western', None),
    'HOU': ('mls_western', None), 'LAG': ('mls_western', None), 'LAF': ('mls_western', None),
    'MIN': ('mls_western', None), 'POR': ('mls_western', None), 'RSL': ('mls_western', None),
    'SJE': ('mls_western', None), 'SEA': ('mls_western', None),
    'SKC': ('mls_western', None), 'STL': ('mls_western', None), 'VAN': ('mls_western', None),
    'SDG': ('mls_western', None),
}

# NWSL has no conferences
NWSL_DIVISIONS = {abbrev: (None, None) for abbrev in NWSL_TEAMS}
```

---

## 5. Local Canonical JSON Updates

### 5.1 stadiums_canonical.json

New entries follow existing format:

```json
{
  "canonical_id": "stadium_wnba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "state": "NY",
  "latitude": 40.6826,
  "longitude": -73.9754,
  "capacity": 17732,
  "sport": "WNBA",
  "primary_team_abbrevs": ["NYL"],
  "year_opened": 2012
}
```

### 5.2 teams_canonical.json

```json
{
  "canonical_id": "team_wnba_nyl",
  "name": "New York Liberty",
  "abbreviation": "NYL",
  "sport": "WNBA",
  "city": "Brooklyn",
  "stadium_canonical_id": "stadium_wnba_barclays_center",
  "conference_id": null,
  "division_id": null,
  "primary_color": "#6ECEB2",
  "secondary_color": "#000000"
}
```

### 5.3 games_canonical.json

```json
{
  "canonical_id": "game_wnba_2025_20250518_dal_atl",
  "sport": "WNBA",
  "season": "2025",
  "date": "2025-05-18",
  "time": "7:30p",
  "home_team_canonical_id": "team_wnba_atl",
  "away_team_canonical_id": "team_wnba_dal",
  "stadium_canonical_id": "stadium_wnba_gateway_center_arena",
  "is_playoff": false,
  "broadcast": null
}
```

### 5.4 Validation Rules

Update `validate_canonical.py`:

```python
VALID_SPORTS = {'NBA', 'MLB', 'NHL', 'WNBA', 'MLS', 'NWSL'}


def validate_sport_field(sport: str) -> list[str]:
    """Validate sport is one of the supported values."""
    errors = []
    if sport not in VALID_SPORTS:
        errors.append(f"Invalid sport: {sport}. Must be one of {VALID_SPORTS}")
    return errors
```

---

## 6. CloudKit Integration

### 6.1 Record Types (Already Exist)

No new record types needed.
Existing types support new sports:

- `Stadium` - add records with sport="WNBA"/"MLS"/"NWSL"
- `Team` - add records with sport="WNBA"/"MLS"/"NWSL"
- `Game` - add records with sport="WNBA"/"MLS"/"NWSL"
- `StadiumAlias` - unchanged
- `TeamAlias` - unchanged
- `LeagueStructure` - add new entries for MLS conferences

### 6.2 Field Mapping (Unchanged)

**Stadium Record**:

```
recordName: canonical_id (e.g., "stadium_wnba_barclays_center")
fields:
  - uuid: STRING (deterministic from canonical_id)
  - name: STRING
  - city: STRING
  - state: STRING
  - latitude: DOUBLE
  - longitude: DOUBLE
  - capacity: INT64
  - sport: STRING ("WNBA", "MLS", "NWSL")
  - yearOpened: INT64
  - imageURL: STRING (optional)
  - lastModified: TIMESTAMP
  - schemaVersion: INT64
```

**Team Record**:

```
recordName: canonical_id (e.g., "team_wnba_nyl")
fields:
  - uuid: STRING
  - name: STRING
  - abbreviation: STRING
  - sport: STRING
  - city: STRING
  - stadiumCanonicalId: STRING (reference by canonical_id)
  - conferenceId: STRING (optional)
  - divisionId: STRING (optional)
  - primaryColor: STRING
  - secondaryColor: STRING
  - lastModified: TIMESTAMP
  - schemaVersion: INT64
```

**Game Record**:

```
recordName: canonical_id (e.g., "game_wnba_2025_20250518_dal_atl")
fields:
  - uuid: STRING
  - sport: STRING
  - season: STRING
  - dateTime: TIMESTAMP
  - homeTeamCanonicalId: STRING
  - awayTeamCanonicalId: STRING
  - stadiumCanonicalId: STRING
  - isPlayoff: INT64 (0 or 1)
  - broadcastInfo: STRING (optional)
  - lastModified: TIMESTAMP
  - schemaVersion: INT64
```

### 6.3 Index Requirements

Ensure CloudKit has indexes for:

- `Game`: `sport` (queryable), `dateTime` (sortable, queryable)
- `Team`: `sport` (queryable)
- `Stadium`: `sport` (queryable)

### 6.4 Import Script Updates

Update `cloudkit_import.py` to handle new sports in validation:

```python
VALID_SPORTS = {'NBA', 'MLB', 'NHL', 'WNBA', 'MLS', 'NWSL'}


def validate_game_record(game: dict) -> list[str]:
    errors = []
    if game.get('sport') not in VALID_SPORTS:
        errors.append(f"Invalid sport: {game.get('sport')}")
    return errors
```

---

## 7. App-Side Integration (SwiftUI)

### 7.1 Update Sport Enum

**File**: `SportsTime/Core/Models/Domain/Sport.swift`

```swift
import SwiftUI  // Needed for Color

enum Sport: String, Codable, CaseIterable, Identifiable {
    case mlb = "MLB"
    case nba = "NBA"
    case nhl = "NHL"
    case nfl = "NFL"
    case mls = "MLS"
    case wnba = "WNBA"
    case nwsl = "NWSL"

    var id: String { rawValue }

    var displayName: String {
        switch self {
        case .mlb: return "Major League Baseball"
        case .nba: return "National Basketball Association"
        case .nhl: return "National Hockey League"
        case .nfl: return "National Football League"
        case .mls: return "Major League Soccer"
        case .wnba: return "Women's National Basketball Association"
        case .nwsl: return "National Women's Soccer League"
        }
    }

    var iconName: String {
        switch self {
        case .mlb: return "baseball.fill"
        case .nba: return "basketball.fill"
        case .nhl: return "hockey.puck.fill"
        case .nfl: return "football.fill"
        case .mls: return "soccerball"
        case .wnba: return "basketball.fill"
        case .nwsl: return "soccerball"
        }
    }

    var color: Color {
        switch self {
        case .mlb: return .red
        case .nba: return .orange
        case .nhl: return .blue
        case .nfl: return .brown
        case .mls: return .green
        case .wnba: return .purple
        case .nwsl: return .pink
        }
    }

    var seasonMonths: (start: Int, end: Int) {
        switch self {
        case .mlb: return (3, 10)   // March - October
        case .nba: return (10, 6)   // October - June (wraps)
        case .nhl: return (10, 6)   // October - June (wraps)
        case .nfl: return (9, 2)    // September - February (wraps)
        case .mls: return (2, 12)   // February - December
        case .wnba: return (5, 10)  // May - October
        case .nwsl: return (3, 11)  // March - November
        }
    }

    /// Currently supported sports
    static var supported: [Sport] {
        [.mlb, .nba, .nhl, .wnba, .mls, .nwsl]
    }
}
```

### 7.2 Trip Planner - No Changes Required

The trip planner uses `Sport` enum and fetches games by sport. New sports automatically work because:

1. `DataProvider.fetchGames(sports:startDate:endDate:)` queries by sport string
2. Games are filtered by `sportStrings.contains(canonical.sport)`
3. Route planning is sport-agnostic (uses stadium coordinates)

### 7.3 Stadium Tracker - No Changes Required

Stadium progress uses `Stadium.sport` field. New sports automatically appear in:

- Stadium list filtering by sport
- Progress tracking per sport

### 7.4 UI Considerations

**Sport Selection Chips**: The `SportSelectionChip` already uses `Sport.allCases`. Adding new cases automatically adds them to the UI.

**Filter Sections**: Update default selections if desired:

```swift
// In TripCreationViewModel
var selectedSports: Set<Sport> = [.mlb, .nba, .nhl]  // Consider adding new sports
```

---

## 8. Testing & Validation

### 8.1 Data Integrity Checks

**Python validation queries** (add to `validate_canonical.py`):

```python
def validate_new_sports(stadiums, teams, games):
    """Validate WNBA, MLS, NWSL data integrity."""
    errors = []

    # Check all sports have stadiums, teams, and games
    for sport in ['WNBA', 'MLS', 'NWSL']:
        sport_stadiums = [s for s in stadiums if s['sport'] == sport]
        if not sport_stadiums:
            errors.append(f"No stadiums for {sport}")

        sport_teams = [t for t in teams if t['sport'] == sport]
        if not sport_teams:
            errors.append(f"No teams for {sport}")

        sport_games = [g for g in games if g['sport'] == sport]
        if not sport_games:
            errors.append(f"No games for {sport}")

    # Check team->stadium references
    stadium_ids = {s['canonical_id'] for s in stadiums}
    for team in teams:
        if team['stadium_canonical_id'] not in stadium_ids:
            errors.append(f"Team {team['canonical_id']} references unknown stadium {team['stadium_canonical_id']}")

    # Check game->team and game->stadium references
    team_ids = {t['canonical_id'] for t in teams}
    for game in games:
        if game['home_team_canonical_id'] not in team_ids:
            errors.append(f"Game {game['canonical_id']} references unknown home team")
        if game['away_team_canonical_id'] not in team_ids:
            errors.append(f"Game {game['canonical_id']} references unknown away team")
        if game['stadium_canonical_id'] not in stadium_ids:
            errors.append(f"Game {game['canonical_id']} references unknown stadium")

    return errors
```

### 8.2 App Smoke Tests

1. **Sport Selection**:
   - Open Trip Creation
   - Verify WNBA, MLS, NWSL chips appear
   - Select each new sport
   - Verify games load for date range
2. **Trip Planning**:
   - Select WNBA + dates during WNBA season
   - Verify trip results show WNBA games
   - Verify stadium locations are correct
3. **Stadium Progress**:
   - Navigate to Progress tab
   - Filter by WNBA/MLS/NWSL
   - Verify stadium list shows correct venues
4. **Mixed Sport Trips**:
   - Select NBA + WNBA (they share arenas)
   - Verify trips correctly handle both sports
   - Verify no duplicate stadiums in single stop

### 8.3 Edge Case Tests

1. **Shared Venues**:
   - Create trip with MLS Atlanta United + NFL Falcons (same venue)
   - Verify games at Mercedes-Benz Stadium appear for both sports
2. **Canadian Teams** (MLS):
   - Create trip including Toronto FC
   - Verify timezone handling is correct
3. **Midweek Matches** (MLS):
   - Verify Wednesday/Thursday games don't break route planning

---

## 9. Pipeline Update Summary

### run_canonicalization_pipeline.py Changes

```python
# In run_pipeline():

# STAGE 1: SCRAPING
# ... existing NBA, MLB, NHL ...

# NEW: WNBA
print_section(f"WNBA {season}")
wnba_games = scrape_wnba_basketball_reference(season)
wnba_games = assign_stable_ids(wnba_games, 'WNBA', str(season))
all_games.extend(wnba_games)
print(f" Scraped {len(wnba_games)} WNBA games")

# NEW: MLS
print_section(f"MLS {season}")
mls_games = scrape_mls_fbref(season)
mls_games = assign_stable_ids(mls_games, 'MLS', str(season))
all_games.extend(mls_games)
print(f" Scraped {len(mls_games)} MLS games")

# NEW: NWSL
print_section(f"NWSL {season}")
nwsl_games = scrape_nwsl_fbref(season)
nwsl_games = assign_stable_ids(nwsl_games, 'NWSL', str(season))
all_games.extend(nwsl_games)
print(f" Scraped {len(nwsl_games)} NWSL games")
```

---

## 10. Checklist

### Definition of Done

- [ ] **Scraping**: WNBA, MLS, NWSL scrapers added and tested
- [ ] **Team Mappings**: All current teams with correct abbreviations
- [ ] **Stadiums**: All venues canonicalized with coordinates
- [ ] **Canonicalization**: Pipeline runs without errors for new sports
- [ ] **Validation**: All integrity checks pass
- [ ] **CloudKit**: Records uploaded successfully
- [ ] **Swift Enum**: Sport cases added with correct metadata
- [ ] **Trip Planning**: New sports can be planned into trips
- [ ] **Stadium Tracking**: New stadiums appear in progress
- [ ] **No Regressions**: Existing MLB/NBA/NHL functionality unchanged

### Files Modified

| File | Changes |
|------|---------|
| `Scripts/scrape_schedules.py` | Add team mappings, scrapers |
| `Scripts/canonicalize_stadiums.py` | Generate new sport stadiums |
| `Scripts/canonicalize_teams.py` | Add league structure mappings |
| `Scripts/run_canonicalization_pipeline.py` | Add scraping calls |
| `Scripts/validate_canonical.py` | Add new sport validation |
| `Scripts/cloudkit_import.py` | Add sport validation |
| `SportsTime/Core/Models/Domain/Sport.swift` | Add enum cases |
| `SportsTime/Resources/stadiums_canonical.json` | New venue records |
| `SportsTime/Resources/teams_canonical.json` | New team records |
| `SportsTime/Resources/games_canonical.json` | New game records |
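
### Appendix: Canonical ID Sketch

When working through the checklist, it can help to confirm that generated records follow the ID patterns from section 4.1. The following is a minimal sketch under stated assumptions, not the project's implementation: `normalize_name` here is a hypothetical stand-in for the helper of the same name used in `canonicalize_stadiums.py` (whose real behavior is not shown in this guide), and `stadium_id`, `team_id`, and `game_id` are illustrative names.

```python
import re
from datetime import date


def normalize_name(name: str) -> str:
    """Hypothetical normalizer: lowercase, runs of non-alphanumerics -> '_'."""
    return re.sub(r'[^a-z0-9]+', '_', name.lower()).strip('_')


def stadium_id(sport: str, venue_name: str) -> str:
    """Build an ID of the form stadium_{sport}_{normalized_name}."""
    return f"stadium_{sport.lower()}_{normalize_name(venue_name)}"


def team_id(sport: str, abbrev: str) -> str:
    """Build an ID of the form team_{sport}_{abbrev}."""
    return f"team_{sport.lower()}_{abbrev.lower()}"


def game_id(sport: str, season: int, game_date: date, away: str, home: str) -> str:
    """Build an ID of the form game_{sport}_{season}_{date}_{away}_{home}."""
    return (f"game_{sport.lower()}_{season}_{game_date:%Y%m%d}"
            f"_{away.lower()}_{home.lower()}")


if __name__ == "__main__":
    print(stadium_id('MLS', 'Mercedes-Benz Stadium'))
    print(team_id('WNBA', 'NYL'))
    print(game_id('WNBA', 2025, date(2025, 5, 18), 'DAL', 'ATL'))
```

Because the sport is part of every ID, a shared venue such as Barclays Center yields distinct stadium IDs for NBA and WNBA, which matches the per-sport rule in section 4.2.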