Sportstime/docs/WNBA_MLS_NWSL_IMPLEMENTATION.md
Trey t 8790d2ad73 Remove CFB/NASCAR/PGA and streamline to 8 supported sports
- Remove College Football, NASCAR, and PGA from scraper and app
- Clean all data files (stadiums, games, pipeline reports)
- Update Sport.swift enum and all UI components
- Add sportstime.py CLI tool for pipeline management
- Add DATA_SCRAPING.md documentation
- Add WNBA/MLS/NWSL implementation documentation
- Scraper now supports: NBA, MLB, NHL, NFL, WNBA, MLS, NWSL, CBB

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 23:22:13 -06:00

WNBA, MLS, and NWSL Implementation Guide

Complete end-to-end implementation for adding WNBA, MLS, and NWSL to SportsTime.


1. League Overview

WNBA (Women's National Basketball Association)

  • Teams: 13 (expanding to 15 by 2026)
  • Season: May - September (regular season), September - October (playoffs)
  • Game Cadence: ~40 games per team, 3-4 games per week
  • Special Considerations:
    • Many teams share arenas with NBA teams (key for stadium handling)
    • Olympic break in summer every 4 years
    • Commissioner's Cup midseason tournament

Shared Venues (WNBA/NBA):

| WNBA Team | NBA Team | Arena |
| --- | --- | --- |
| Atlanta Dream | Hawks | Gateway Center Arena (different) |
| Chicago Sky | Bulls | Wintrust Arena (different) |
| Dallas Wings | Mavericks | College Park Center (different) |
| Indiana Fever | Pacers | Gainbridge Fieldhouse |
| Los Angeles Sparks | Lakers | Crypto.com Arena |
| Minnesota Lynx | Timberwolves | Target Center |
| New York Liberty | Nets | Barclays Center |
| Phoenix Mercury | Suns | Footprint Center |
| Washington Mystics | Wizards | Entertainment & Sports Arena (different) |

MLS (Major League Soccer)

  • Teams: 29 (2024), 30 from 2025 with San Diego FC
  • Season: February/March - October (regular season), October - December (playoffs)
  • Game Cadence: 34 games per team, 1-2 games per week
  • Special Considerations:
    • Some teams share NFL stadiums (Atlanta, Seattle, New England)
    • Midweek matches (Wednesday/Thursday) common
    • US Open Cup adds additional games
    • Canadian teams (Toronto, Vancouver, Montreal) - timezone handling

Shared Venues (MLS/NFL):

| MLS Team | NFL Team | Stadium |
| --- | --- | --- |
| Atlanta United | Falcons | Mercedes-Benz Stadium |
| Seattle Sounders | Seahawks | Lumen Field |
| New England Revolution | Patriots | Gillette Stadium |

NWSL (National Women's Soccer League)

  • Teams: 14 teams (2024)
  • Season: March - November (regular season + playoffs)
  • Game Cadence: 26 games per team, 1-2 games per week
  • Special Considerations:
    • Some share MLS stadiums (Portland, Orlando, Houston)
    • Many use smaller soccer-specific venues
    • Expansion teams frequently added

2. Schedule & Data Sources

WNBA Data Sources

Primary: Basketball-Reference (Women)

URL Pattern: https://www.basketball-reference.com/wnba/years/{YEAR}_games.html
Example: https://www.basketball-reference.com/wnba/years/2025_games.html

HTML Structure:

<table id="schedule">
  <tbody>
    <tr>
      <th data-stat="date_game">Fri, May 17, 2024</th>
      <td data-stat="game_start_time">7:30p</td>
      <td data-stat="visitor_team_name">Dallas Wings</td>
      <td data-stat="home_team_name">Atlanta Dream</td>
      <td data-stat="arena_name">Gateway Center Arena</td>
    </tr>
  </tbody>
</table>

Fields Available: date, time, home_team, away_team, arena, attendance, box_score_link
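
As a rough illustration of how the data-stat attributes map to fields, the sketch below parses the sample row above using only the standard library. The scrapers in this guide use fetch_page and BeautifulSoup-style find calls instead; the DataStatParser class here is a hypothetical stand-in, not part of the codebase.

```python
from html.parser import HTMLParser

SAMPLE = """
<table id="schedule"><tbody><tr>
  <th data-stat="date_game">Fri, May 17, 2024</th>
  <td data-stat="game_start_time">7:30p</td>
  <td data-stat="visitor_team_name">Dallas Wings</td>
  <td data-stat="home_team_name">Atlanta Dream</td>
  <td data-stat="arena_name">Gateway Center Arena</td>
</tr></tbody></table>
"""


class DataStatParser(HTMLParser):
    """Collects one dict per <tr>, keyed by each cell's data-stat attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # dict for the row being read, or None outside a row
        self._stat = None  # data-stat of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag in ("th", "td") and self._row is not None:
            self._stat = dict(attrs).get("data-stat")
            if self._stat:
                self._row[self._stat] = ""

    def handle_data(self, data):
        # Text inside nested tags (such as <a>) is appended to the current cell
        if self._row is not None and self._stat:
            self._row[self._stat] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            if self._stat and self._row is not None:
                self._row[self._stat] = self._row[self._stat].strip()
            self._stat = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = DataStatParser()
parser.feed(SAMPLE)
rows = parser.rows
```

Each entry in rows is a plain dict, so rows[0]["home_team_name"] gives the home team for the sample game.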

Secondary: ESPN WNBA

URL Pattern: https://www.espn.com/wnba/schedule/_/date/{YYYYMMDD}

MLS Data Sources

Primary: FBref (Football Reference)

URL Pattern: https://fbref.com/en/comps/22/{YEAR}/schedule/{YEAR}-Major-League-Soccer-Scores-and-Fixtures
Example: https://fbref.com/en/comps/22/2024/schedule/2024-Major-League-Soccer-Scores-and-Fixtures

HTML Structure:

<table id="sched_2024_22_1">
  <tbody>
    <tr>
      <td data-stat="date">2024-02-24</td>
      <td data-stat="time">19:30</td>
      <td data-stat="home_team">LA Galaxy</td>
      <td data-stat="away_team">Inter Miami</td>
      <td data-stat="venue">Dignity Health Sports Park</td>
    </tr>
  </tbody>
</table>

Fields Available: date, time (24hr), home_team, away_team, venue, score, attendance
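
The two primary sources report start times differently: Basketball-Reference uses a 12-hour form such as 7:30p, while FBref uses 24-hour 19:30. One way to compare them is to normalize both to 24-hour HH:MM. The helper below is a hypothetical sketch (to_24h is not an existing function in the scraper) and assumes only those input shapes.

```python
from datetime import datetime
from typing import Optional


def to_24h(time_str: Optional[str]) -> Optional[str]:
    """Return 'HH:MM' for inputs like '7:30p', '7:30 PM', or '19:30'.

    Returns None when the input is empty or cannot be parsed.
    """
    if not time_str:
        return None
    text = time_str.strip().lower().replace(" ", "")
    # Expand the one-letter suffix: '7:30p' -> '7:30pm'
    if text.endswith(("a", "p")):
        text += "m"
    for fmt in ("%I:%M%p", "%H:%M"):
        try:
            return datetime.strptime(text, fmt).strftime("%H:%M")
        except ValueError:
            continue
    return None
```

Returning None for unparseable input leaves it to the caller to decide whether a missing start time should drop the game or only be logged.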

Secondary: MLS Official

URL Pattern: https://www.mlssoccer.com/schedule/scores
API Endpoint: https://sportapi.mlssoccer.com/api/matches?culture=en-us&dateFrom={date}&dateTo={date}

NWSL Data Sources

Primary: FBref (NWSL)

URL Pattern: https://fbref.com/en/comps/182/{YEAR}/schedule/{YEAR}-NWSL-Scores-and-Fixtures
Example: https://fbref.com/en/comps/182/2024/schedule/2024-NWSL-Scores-and-Fixtures

HTML Structure: Same as MLS (FBref standard format)

Secondary: NWSL Official

URL Pattern: https://www.nwslsoccer.com/schedule

3. Schedule Parser Changes

File: Scripts/scrape_schedules.py

3.1 Add Team Mappings (after NHL_TEAMS ~line 180)

# =============================================================================
# WNBA TEAMS
# =============================================================================

WNBA_TEAMS = {
    'ATL': {'name': 'Atlanta Dream', 'city': 'Atlanta', 'arena': 'Gateway Center Arena'},
    'CHI': {'name': 'Chicago Sky', 'city': 'Chicago', 'arena': 'Wintrust Arena'},
    'CON': {'name': 'Connecticut Sun', 'city': 'Uncasville', 'arena': 'Mohegan Sun Arena'},
    'DAL': {'name': 'Dallas Wings', 'city': 'Arlington', 'arena': 'College Park Center'},
    'IND': {'name': 'Indiana Fever', 'city': 'Indianapolis', 'arena': 'Gainbridge Fieldhouse'},
    'LVA': {'name': 'Las Vegas Aces', 'city': 'Las Vegas', 'arena': 'Michelob Ultra Arena'},
    'LAS': {'name': 'Los Angeles Sparks', 'city': 'Los Angeles', 'arena': 'Crypto.com Arena'},
    'MIN': {'name': 'Minnesota Lynx', 'city': 'Minneapolis', 'arena': 'Target Center'},
    'NYL': {'name': 'New York Liberty', 'city': 'Brooklyn', 'arena': 'Barclays Center'},
    'PHO': {'name': 'Phoenix Mercury', 'city': 'Phoenix', 'arena': 'Footprint Center'},
    'SEA': {'name': 'Seattle Storm', 'city': 'Seattle', 'arena': 'Climate Pledge Arena'},
    'WAS': {'name': 'Washington Mystics', 'city': 'Washington', 'arena': 'Entertainment & Sports Arena'},
    # Expansion teams (add as announced)
    'GSV': {'name': 'Golden State Valkyries', 'city': 'San Francisco', 'arena': 'Chase Center'},
    'POR': {'name': 'Portland Expansion', 'city': 'Portland', 'arena': 'TBD'},
    'TOR': {'name': 'Toronto Expansion', 'city': 'Toronto', 'arena': 'TBD'},
}

# =============================================================================
# MLS TEAMS
# =============================================================================

MLS_TEAMS = {
    'ATL': {'name': 'Atlanta United FC', 'city': 'Atlanta', 'stadium': 'Mercedes-Benz Stadium'},
    'AUS': {'name': 'Austin FC', 'city': 'Austin', 'stadium': 'Q2 Stadium'},
    'CHI': {'name': 'Chicago Fire FC', 'city': 'Chicago', 'stadium': 'Soldier Field'},
    'CIN': {'name': 'FC Cincinnati', 'city': 'Cincinnati', 'stadium': 'TQL Stadium'},
    'CLB': {'name': 'Columbus Crew', 'city': 'Columbus', 'stadium': 'Lower.com Field'},
    'CLT': {'name': 'Charlotte FC', 'city': 'Charlotte', 'stadium': 'Bank of America Stadium'},
    'COL': {'name': 'Colorado Rapids', 'city': 'Commerce City', 'stadium': "Dick's Sporting Goods Park"},
    'DAL': {'name': 'FC Dallas', 'city': 'Frisco', 'stadium': 'Toyota Stadium'},
    'DCU': {'name': 'D.C. United', 'city': 'Washington', 'stadium': 'Audi Field'},
    'HOU': {'name': 'Houston Dynamo FC', 'city': 'Houston', 'stadium': 'Shell Energy Stadium'},
    'LAG': {'name': 'LA Galaxy', 'city': 'Carson', 'stadium': 'Dignity Health Sports Park'},
    'LAF': {'name': 'Los Angeles FC', 'city': 'Los Angeles', 'stadium': 'BMO Stadium'},
    'MIA': {'name': 'Inter Miami CF', 'city': 'Fort Lauderdale', 'stadium': 'Chase Stadium'},
    'MIN': {'name': 'Minnesota United FC', 'city': 'Saint Paul', 'stadium': 'Allianz Field'},
    'MTL': {'name': 'CF Montréal', 'city': 'Montreal', 'stadium': 'Stade Saputo'},
    'NSH': {'name': 'Nashville SC', 'city': 'Nashville', 'stadium': 'Geodis Park'},
    'NER': {'name': 'New England Revolution', 'city': 'Foxborough', 'stadium': 'Gillette Stadium'},
    'NYC': {'name': 'New York City FC', 'city': 'New York', 'stadium': 'Yankee Stadium'},
    'NYR': {'name': 'New York Red Bulls', 'city': 'Harrison', 'stadium': 'Red Bull Arena'},
    'ORL': {'name': 'Orlando City SC', 'city': 'Orlando', 'stadium': 'Exploria Stadium'},
    'PHI': {'name': 'Philadelphia Union', 'city': 'Chester', 'stadium': 'Subaru Park'},
    'POR': {'name': 'Portland Timbers', 'city': 'Portland', 'stadium': 'Providence Park'},
    'RSL': {'name': 'Real Salt Lake', 'city': 'Sandy', 'stadium': 'America First Field'},
    'SJE': {'name': 'San Jose Earthquakes', 'city': 'San Jose', 'stadium': 'PayPal Park'},
    'SEA': {'name': 'Seattle Sounders FC', 'city': 'Seattle', 'stadium': 'Lumen Field'},
    'SKC': {'name': 'Sporting Kansas City', 'city': 'Kansas City', 'stadium': "Children's Mercy Park"},
    'STL': {'name': 'St. Louis City SC', 'city': 'St. Louis', 'stadium': 'CityPark'},
    'TOR': {'name': 'Toronto FC', 'city': 'Toronto', 'stadium': 'BMO Field'},
    'VAN': {'name': 'Vancouver Whitecaps FC', 'city': 'Vancouver', 'stadium': 'BC Place'},
    'SDG': {'name': 'San Diego FC', 'city': 'San Diego', 'stadium': 'Snapdragon Stadium'},  # 2025 expansion
}

# =============================================================================
# NWSL TEAMS
# =============================================================================

NWSL_TEAMS = {
    'ANG': {'name': 'Angel City FC', 'city': 'Los Angeles', 'stadium': 'BMO Stadium'},
    'CHI': {'name': 'Chicago Red Stars', 'city': 'Chicago', 'stadium': 'SeatGeek Stadium'},
    'HOU': {'name': 'Houston Dash', 'city': 'Houston', 'stadium': 'Shell Energy Stadium'},
    'KCC': {'name': 'Kansas City Current', 'city': 'Kansas City', 'stadium': 'CPKC Stadium'},
    'LOU': {'name': 'Racing Louisville FC', 'city': 'Louisville', 'stadium': 'Lynn Family Stadium'},
    'NCC': {'name': 'North Carolina Courage', 'city': 'Cary', 'stadium': 'WakeMed Soccer Park'},
    'NJG': {'name': 'NJ/NY Gotham FC', 'city': 'Harrison', 'stadium': 'Red Bull Arena'},
    'ORL': {'name': 'Orlando Pride', 'city': 'Orlando', 'stadium': 'Exploria Stadium'},
    'POR': {'name': 'Portland Thorns FC', 'city': 'Portland', 'stadium': 'Providence Park'},
    'SDW': {'name': 'San Diego Wave FC', 'city': 'San Diego', 'stadium': 'Snapdragon Stadium'},
    'SEA': {'name': 'Seattle Reign FC', 'city': 'Seattle', 'stadium': 'Lumen Field'},
    'UTA': {'name': 'Utah Royals FC', 'city': 'Sandy', 'stadium': 'America First Field'},
    'WAS': {'name': 'Washington Spirit', 'city': 'Washington', 'stadium': 'Audi Field'},
    'BAY': {'name': 'Bay FC', 'city': 'San Jose', 'stadium': 'PayPal Park'},  # 2024 expansion
}

3.2 Update get_team_abbrev() Function

def get_team_abbrev(team_name: str, sport: str) -> str:
    """Get team abbreviation from full name."""
    team_maps = {
        'NBA': NBA_TEAMS,
        'MLB': MLB_TEAMS,
        'NHL': NHL_TEAMS,
        'WNBA': WNBA_TEAMS,
        'MLS': MLS_TEAMS,
        'NWSL': NWSL_TEAMS,
    }

    teams = team_maps.get(sport, {})

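    name = team_name.strip().lower()

    # Pass 1: exact match on the full team name
    for abbrev, data in teams.items():
        if name == data['name'].lower():
            return abbrev

    # Pass 2: partial match (e.g., "Hawks" matches "Atlanta Hawks").
    # Run only after every exact match has failed, so a substring hit on an
    # earlier entry cannot shadow an exact hit on a later one.
    if name:
        for abbrev, data in teams.items():
            if name in data['name'].lower():
                return abbrev

    # Fallback: first 3 characters (may not correspond to a known team)
    return team_name.strip()[:3].upper()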
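
Source sites do not always use the full names stored in the team dictionaries, and the three-character fallback can then produce an abbreviation that matches no known team. One possible mitigation is a per-sport alias table consulted before the name search. Everything below is illustrative: TEAM_ALIASES, resolve_abbrev, and the alias strings themselves are assumptions, not values verified against either site.

```python
from typing import Optional

# Small excerpt of the MLS mapping, reduced to what the example needs.
MLS_TEAMS = {
    'NYR': {'name': 'New York Red Bulls'},
    'NYC': {'name': 'New York City FC'},
    'MIA': {'name': 'Inter Miami CF'},
}

# Hypothetical alias table: lowercase source spelling -> abbreviation.
TEAM_ALIASES = {
    'MLS': {'ny red bulls': 'NYR', 'nycfc': 'NYC'},
}


def resolve_abbrev(team_name: str, sport: str, teams: dict) -> Optional[str]:
    """Resolve a scraped name to an abbreviation, or None if nothing matches."""
    key = team_name.strip().lower()
    if not key:
        return None
    # 1. Explicit alias for this sport
    alias = TEAM_ALIASES.get(sport, {}).get(key)
    if alias:
        return alias
    # 2. Exact full-name match
    for abbrev, data in teams.items():
        if key == data['name'].lower():
            return abbrev
    # 3. Substring match, accepted only when exactly one team matches
    hits = [a for a, d in teams.items() if key in d['name'].lower()]
    return hits[0] if len(hits) == 1 else None
```

Returning None instead of a guessed abbreviation lets the caller log the unmapped name, which makes gaps in the alias table visible rather than hiding them inside game IDs.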

3.3 Add WNBA Scraper

def scrape_wnba_basketball_reference(season: int) -> list[Game]:
    """
    Scrape WNBA schedule from Basketball-Reference.
    URL: https://www.basketball-reference.com/wnba/years/{YEAR}_games.html
    Season year is the calendar year (e.g., 2025 for 2025 season)
    """
    games = []
    url = f"https://www.basketball-reference.com/wnba/years/{season}_games.html"

    print(f"Scraping WNBA {season} from Basketball-Reference...")
    soup = fetch_page(url, 'basketball-reference.com')

    if not soup:
        return games

    table = soup.find('table', {'id': 'schedule'})
    if not table:
        print("  No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        if row.get('class') and 'thead' in row.get('class'):
            continue

        try:
            # Parse date
            date_cell = row.find('th', {'data-stat': 'date_game'})
            if not date_cell:
                continue
            date_link = date_cell.find('a')
            date_str = date_link.text if date_link else date_cell.text

            # Parse time
            time_cell = row.find('td', {'data-stat': 'game_start_time'})
            time_str = time_cell.text.strip() if time_cell else None

            # Parse teams
            visitor_cell = row.find('td', {'data-stat': 'visitor_team_name'})
            home_cell = row.find('td', {'data-stat': 'home_team_name'})

            if not visitor_cell or not home_cell:
                continue

            away_team = visitor_cell.find('a').text if visitor_cell.find('a') else visitor_cell.text
            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text

            # Parse arena
            arena_cell = row.find('td', {'data-stat': 'arena_name'})
            arena = arena_cell.text.strip() if arena_cell else ''

            # Convert date (format: "Sat, May 18, 2024")
            try:
                parsed_date = datetime.strptime(date_str.strip(), '%a, %b %d, %Y')
                date_formatted = parsed_date.strftime('%Y-%m-%d')
            except ValueError:
                continue

            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'WNBA')
            away_abbrev = get_team_abbrev(away_team, 'WNBA')
            game_id = f"wnba_{date_formatted}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='WNBA',
                season=str(season),
                date=date_formatted,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=arena,
                source='basketball-reference.com'
            )
            games.append(game)

        except Exception as e:
            print(f"  Error parsing row: {e}")
            continue

    print(f"  Found {len(games)} games from Basketball-Reference")
    return games

3.4 Add MLS Scraper

def scrape_mls_fbref(season: int) -> list[Game]:
    """
    Scrape MLS schedule from FBref.
    URL: https://fbref.com/en/comps/22/{YEAR}/schedule/{YEAR}-Major-League-Soccer-Scores-and-Fixtures
    """
    games = []
    url = f"https://fbref.com/en/comps/22/{season}/schedule/{season}-Major-League-Soccer-Scores-and-Fixtures"

    print(f"Scraping MLS {season} from FBref...")
    soup = fetch_page(url, 'fbref.com')

    if not soup:
        return games

    # FBref uses table with id like sched_{year}_22_1
    table = soup.find('table', {'id': lambda x: x and 'sched_' in x})
    if not table:
        print("  No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        try:
            # Parse date (format: 2024-02-24)
            date_cell = row.find('td', {'data-stat': 'date'})
            if not date_cell:
                continue
            date_str = date_cell.text.strip()

            # Parse time (24hr format: 19:30)
            time_cell = row.find('td', {'data-stat': 'time'})
            time_str = time_cell.text.strip() if time_cell else None

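            # Convert 24hr to 12hr format for consistency with the
            # Basketball-Reference style (e.g., "19:30" -> "7:30p")
            if time_str:
                try:
                    t = datetime.strptime(time_str, '%H:%M')
                    time_str = t.strftime('%I:%M%p').lstrip('0').lower().rstrip('m')
                except ValueError:
                    pass  # keep the raw string if it is not HH:MM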

            # Parse teams
            home_cell = row.find('td', {'data-stat': 'home_team'})
            away_cell = row.find('td', {'data-stat': 'away_team'})

            if not home_cell or not away_cell:
                continue

            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            away_team = away_cell.find('a').text if away_cell.find('a') else away_cell.text

            home_team = home_team.strip()
            away_team = away_team.strip()

            if not home_team or not away_team:
                continue

            # Parse venue
            venue_cell = row.find('td', {'data-stat': 'venue'})
            venue = venue_cell.text.strip() if venue_cell else ''

            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'MLS')
            away_abbrev = get_team_abbrev(away_team, 'MLS')
            game_id = f"mls_{date_str}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='MLS',
                season=str(season),
                date=date_str,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=venue,
                source='fbref.com'
            )
            games.append(game)

        except Exception as e:
            print(f"  Error parsing row: {e}")
            continue

    print(f"  Found {len(games)} games from FBref")
    return games

3.5 Add NWSL Scraper

def scrape_nwsl_fbref(season: int) -> list[Game]:
    """
    Scrape NWSL schedule from FBref.
    URL: https://fbref.com/en/comps/182/{YEAR}/schedule/{YEAR}-NWSL-Scores-and-Fixtures
    """
    games = []
    url = f"https://fbref.com/en/comps/182/{season}/schedule/{season}-NWSL-Scores-and-Fixtures"

    print(f"Scraping NWSL {season} from FBref...")
    soup = fetch_page(url, 'fbref.com')

    if not soup:
        return games

    table = soup.find('table', {'id': lambda x: x and 'sched_' in x})
    if not table:
        print("  No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        try:
            date_cell = row.find('td', {'data-stat': 'date'})
            if not date_cell:
                continue
            date_str = date_cell.text.strip()

            time_cell = row.find('td', {'data-stat': 'time'})
            time_str = time_cell.text.strip() if time_cell else None

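            if time_str:
                try:
                    t = datetime.strptime(time_str, '%H:%M')
                    # 12hr style used by Basketball-Reference (e.g., "19:30" -> "7:30p")
                    time_str = t.strftime('%I:%M%p').lstrip('0').lower().rstrip('m')
                except ValueError:
                    pass  # keep the raw string if it is not HH:MM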

            home_cell = row.find('td', {'data-stat': 'home_team'})
            away_cell = row.find('td', {'data-stat': 'away_team'})

            if not home_cell or not away_cell:
                continue

            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            away_team = away_cell.find('a').text if away_cell.find('a') else away_cell.text

            home_team = home_team.strip()
            away_team = away_team.strip()

            if not home_team or not away_team:
                continue

            venue_cell = row.find('td', {'data-stat': 'venue'})
            venue = venue_cell.text.strip() if venue_cell else ''

            home_abbrev = get_team_abbrev(home_team, 'NWSL')
            away_abbrev = get_team_abbrev(away_team, 'NWSL')
            game_id = f"nwsl_{date_str}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='NWSL',
                season=str(season),
                date=date_str,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=venue,
                source='fbref.com'
            )
            games.append(game)

        except Exception as e:
            print(f"  Error parsing row: {e}")
            continue

    print(f"  Found {len(games)} games from FBref")
    return games

4. Stadium & Team Canonicalization

4.1 Canonical ID Patterns

Stadiums (per-sport, even for shared venues):

stadium_{sport}_{normalized_name}

Examples:

  • stadium_wnba_barclays_center (WNBA Liberty)
  • stadium_nba_barclays_center (NBA Nets)
  • stadium_mls_mercedes_benz_stadium
  • stadium_nwsl_providence_park

Teams:

team_{sport}_{abbrev}

Examples:

  • team_wnba_nyl (New York Liberty)
  • team_mls_atl (Atlanta United)
  • team_nwsl_por (Portland Thorns)

Games:

game_{sport}_{season}_{date}_{away}_{home}

Examples:

  • game_wnba_2025_20250518_dal_atl
  • game_mls_2025_20250301_mia_lag
  • game_nwsl_2025_20250315_por_ang
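
The three patterns above can be sketched as small builder functions. The normalize_name shown here is an assumption about how names are normalized (lowercase, non-alphanumerics collapsed to underscores); the real helper in canonicalize_stadiums.py may differ, and stadium_id, team_id, and game_id are hypothetical names.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '_', trim the edges."""
    return re.sub(r'[^a-z0-9]+', '_', name.lower()).strip('_')


def stadium_id(sport: str, venue: str) -> str:
    return f"stadium_{sport.lower()}_{normalize_name(venue)}"


def team_id(sport: str, abbrev: str) -> str:
    return f"team_{sport.lower()}_{abbrev.lower()}"


def game_id(sport: str, season: str, date_iso: str, away: str, home: str) -> str:
    # date_iso is 'YYYY-MM-DD'; the canonical form drops the dashes
    date_compact = date_iso.replace('-', '')
    return f"game_{sport.lower()}_{season}_{date_compact}_{away.lower()}_{home.lower()}"
```

With this normalization, stadium_id('MLS', 'Mercedes-Benz Stadium') yields stadium_mls_mercedes_benz_stadium, matching the example above.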

4.2 Shared Venue Handling

Critical Rule: Stadiums are per-sport entities. A physical venue shared between sports creates MULTIPLE canonical stadium records.

Example: Barclays Center

// Stadium for NBA Nets
{
  "canonical_id": "stadium_nba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "sport": "NBA",
  "primary_team_abbrevs": ["BRK"]
}

// Stadium for WNBA Liberty
{
  "canonical_id": "stadium_wnba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "sport": "WNBA",
  "primary_team_abbrevs": ["NYL"]
}

Rationale: Trip planning needs sport-specific filtering. A user planning an NBA trip shouldn't see WNBA games unless explicitly requested.
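
Because a shared building appears once per sport, any feature that needs the physical venue (for example, avoiding duplicate stops at one address) has to regroup the records. A minimal sketch, assuming records shaped like the JSON above and treating (name, city) as the physical-venue key; group_by_venue and stadiums_for are hypothetical helpers:

```python
from collections import defaultdict

stadiums = [
    {"canonical_id": "stadium_nba_barclays_center", "name": "Barclays Center",
     "city": "Brooklyn", "sport": "NBA"},
    {"canonical_id": "stadium_wnba_barclays_center", "name": "Barclays Center",
     "city": "Brooklyn", "sport": "WNBA"},
    {"canonical_id": "stadium_mls_mercedes_benz_stadium", "name": "Mercedes-Benz Stadium",
     "city": "Atlanta", "sport": "MLS"},
]


def group_by_venue(records):
    """Map (name, city) -> sorted list of sports with a record for that building."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["name"], rec["city"])].append(rec["sport"])
    return {venue: sorted(sports) for venue, sports in groups.items()}


def stadiums_for(records, sports):
    """Sport-specific filtering: keep only records for the requested sports."""
    return [rec for rec in records if rec["sport"] in sports]


# Buildings that host more than one sport
shared = {v: s for v, s in group_by_venue(stadiums).items() if len(s) > 1}
```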

4.3 Update canonicalize_stadiums.py

Add to generate_stadiums_from_teams():

def generate_stadiums_from_teams() -> list[Stadium]:
    """Generate stadium entries from team mappings."""
    stadiums = []

    # Existing: NBA, MLB, NHL
    for abbrev, data in NBA_TEAMS.items():
        stadiums.append(create_stadium(data, 'NBA', [abbrev]))
    # ... existing MLB, NHL

    # NEW: WNBA
    for abbrev, data in WNBA_TEAMS.items():
        if data['arena'] == 'TBD':
            continue  # expansion teams without a venue would all share one "tbd" ID
        stadiums.append(Stadium(
            id=f"wnba_{normalize_name(data['arena'])}",
            name=data['arena'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,  # Geocoded later
            longitude=0.0,
            capacity=0,
            sport='WNBA',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    # NEW: MLS
    for abbrev, data in MLS_TEAMS.items():
        stadiums.append(Stadium(
            id=f"mls_{normalize_name(data['stadium'])}",
            name=data['stadium'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,
            longitude=0.0,
            capacity=0,
            sport='MLS',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    # NEW: NWSL
    for abbrev, data in NWSL_TEAMS.items():
        stadiums.append(Stadium(
            id=f"nwsl_{normalize_name(data['stadium'])}",
            name=data['stadium'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,
            longitude=0.0,
            capacity=0,
            sport='NWSL',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    return stadiums

4.4 Update canonicalize_teams.py

Add league structure mappings:

# WNBA has Eastern/Western conferences, but playoff seeding ignores them; not modeled here
WNBA_DIVISIONS = {abbrev: (None, None) for abbrev in WNBA_TEAMS}

# MLS Conferences
MLS_DIVISIONS = {
    # Eastern Conference
    'ATL': ('mls_eastern', None),
    'CHI': ('mls_eastern', None),
    'CIN': ('mls_eastern', None),
    'CLB': ('mls_eastern', None),
    'CLT': ('mls_eastern', None),
    'DCU': ('mls_eastern', None),
    'MIA': ('mls_eastern', None),
    'MTL': ('mls_eastern', None),
    'NSH': ('mls_eastern', None),
    'NER': ('mls_eastern', None),
    'NYC': ('mls_eastern', None),
    'NYR': ('mls_eastern', None),
    'ORL': ('mls_eastern', None),
    'PHI': ('mls_eastern', None),
    'TOR': ('mls_eastern', None),
    # Western Conference
    'AUS': ('mls_western', None),
    'COL': ('mls_western', None),
    'DAL': ('mls_western', None),
    'HOU': ('mls_western', None),
    'LAG': ('mls_western', None),
    'LAF': ('mls_western', None),
    'MIN': ('mls_western', None),
    'POR': ('mls_western', None),
    'RSL': ('mls_western', None),
    'SJE': ('mls_western', None),
    'SEA': ('mls_western', None),
    'SKC': ('mls_western', None),
    'STL': ('mls_western', None),
    'VAN': ('mls_western', None),
    'SDG': ('mls_western', None),
}

# NWSL has no conferences
NWSL_DIVISIONS = {abbrev: (None, None) for abbrev in NWSL_TEAMS}
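
Hand-maintained mappings such as MLS_DIVISIONS can drift from the team dictionaries when a club is added to one and not the other. A quick key comparison catches this. The helper name division_coverage and the example mappings are made up for illustration:

```python
def division_coverage(teams: dict, divisions: dict) -> dict:
    """Compare the keys of a team mapping and its division mapping."""
    return {
        # Teams that have no division entry
        "missing": sorted(set(teams) - set(divisions)),
        # Division entries that correspond to no known team
        "unknown": sorted(set(divisions) - set(teams)),
    }


# Tiny made-up mappings, not real league data
EXAMPLE_TEAMS = {'AAA': {}, 'BBB': {}, 'CCC': {}}
EXAMPLE_DIVISIONS = {
    'AAA': ('east', None),
    'BBB': ('west', None),
    'ZZZ': ('west', None),
}

report = division_coverage(EXAMPLE_TEAMS, EXAMPLE_DIVISIONS)
```

In the pipeline, the same call could be made with MLS_TEAMS and MLS_DIVISIONS, failing the run if either list is non-empty.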

5. Local Canonical JSON Updates

5.1 stadiums_canonical.json

New entries follow existing format:

{
  "canonical_id": "stadium_wnba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "state": "NY",
  "latitude": 40.6826,
  "longitude": -73.9754,
  "capacity": 17732,
  "sport": "WNBA",
  "primary_team_abbrevs": ["NYL"],
  "year_opened": 2012
}

5.2 teams_canonical.json

{
  "canonical_id": "team_wnba_nyl",
  "name": "New York Liberty",
  "abbreviation": "NYL",
  "sport": "WNBA",
  "city": "Brooklyn",
  "stadium_canonical_id": "stadium_wnba_barclays_center",
  "conference_id": null,
  "division_id": null,
  "primary_color": "#6ECEB2",
  "secondary_color": "#000000"
}

5.3 games_canonical.json

{
  "canonical_id": "game_wnba_2025_20250518_dal_atl",
  "sport": "WNBA",
  "season": "2025",
  "date": "2025-05-18",
  "time": "7:30p",
  "home_team_canonical_id": "team_wnba_atl",
  "away_team_canonical_id": "team_wnba_dal",
  "stadium_canonical_id": "stadium_wnba_gateway_center_arena",
  "is_playoff": false,
  "broadcast": null
}

5.4 Validation Rules

Update validate_canonical.py:

VALID_SPORTS = {'NBA', 'MLB', 'NHL', 'WNBA', 'MLS', 'NWSL'}

def validate_sport_field(sport: str) -> list[str]:
    """Validate sport is one of the supported values."""
    errors = []
    if sport not in VALID_SPORTS:
        errors.append(f"Invalid sport: {sport}. Must be one of {sorted(VALID_SPORTS)}")
    return errors

6. CloudKit Integration

6.1 Record Types (Already Exist)

No new record types needed. Existing types support new sports:

  • Stadium - add records with sport="WNBA"/"MLS"/"NWSL"
  • Team - add records with sport="WNBA"/"MLS"/"NWSL"
  • Game - add records with sport="WNBA"/"MLS"/"NWSL"
  • StadiumAlias - unchanged
  • TeamAlias - unchanged
  • LeagueStructure - add new entries for MLS conferences

6.2 Field Mapping (Unchanged)

Stadium Record:

recordName: canonical_id (e.g., "stadium_wnba_barclays_center")
fields:
  - uuid: STRING (deterministic from canonical_id)
  - name: STRING
  - city: STRING
  - state: STRING
  - latitude: DOUBLE
  - longitude: DOUBLE
  - capacity: INT64
  - sport: STRING ("WNBA", "MLS", "NWSL")
  - yearOpened: INT64
  - imageURL: STRING (optional)
  - lastModified: TIMESTAMP
  - schemaVersion: INT64

Team Record:

recordName: canonical_id (e.g., "team_wnba_nyl")
fields:
  - uuid: STRING
  - name: STRING
  - abbreviation: STRING
  - sport: STRING
  - city: STRING
  - stadiumCanonicalId: STRING (reference by canonical_id)
  - conferenceId: STRING (optional)
  - divisionId: STRING (optional)
  - primaryColor: STRING
  - secondaryColor: STRING
  - lastModified: TIMESTAMP
  - schemaVersion: INT64

Game Record:

recordName: canonical_id (e.g., "game_wnba_2025_20250518_dal_atl")
fields:
  - uuid: STRING
  - sport: STRING
  - season: STRING
  - dateTime: TIMESTAMP
  - homeTeamCanonicalId: STRING
  - awayTeamCanonicalId: STRING
  - stadiumCanonicalId: STRING
  - isPlayoff: INT64 (0 or 1)
  - broadcastInfo: STRING (optional)
  - lastModified: TIMESTAMP
  - schemaVersion: INT64
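
The field lists describe uuid as deterministic from canonical_id and isPlayoff as a 0/1 integer. This guide does not say how the UUID is derived; one common approach is a name-based UUID (version 5). The sketch below assumes that approach and uses a made-up namespace constant (SPORTSTIME_NAMESPACE); the actual import script may do something different. The dateTime field is left out because it depends on timezone handling not covered here.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
SPORTSTIME_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "sportstime.example")


def deterministic_uuid(canonical_id: str) -> str:
    """Same canonical_id in, same UUID string out, on every run and machine."""
    return str(uuid.uuid5(SPORTSTIME_NAMESPACE, canonical_id))


def game_fields(game: dict) -> dict:
    """Map a games_canonical.json entry to Game record fields (dateTime omitted)."""
    return {
        "uuid": deterministic_uuid(game["canonical_id"]),
        "sport": game["sport"],
        "season": game["season"],
        "homeTeamCanonicalId": game["home_team_canonical_id"],
        "awayTeamCanonicalId": game["away_team_canonical_id"],
        "stadiumCanonicalId": game["stadium_canonical_id"],
        "isPlayoff": 1 if game.get("is_playoff") else 0,  # INT64 0 or 1
    }


example = game_fields({
    "canonical_id": "game_wnba_2025_20250518_dal_atl",
    "sport": "WNBA",
    "season": "2025",
    "home_team_canonical_id": "team_wnba_atl",
    "away_team_canonical_id": "team_wnba_dal",
    "stadium_canonical_id": "stadium_wnba_gateway_center_arena",
    "is_playoff": False,
})
```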

6.3 Index Requirements

Ensure CloudKit has indexes for:

  • Game: sport (queryable), dateTime (sortable, queryable)
  • Team: sport (queryable)
  • Stadium: sport (queryable)

6.4 Import Script Updates

Update cloudkit_import.py to handle new sports in validation:

VALID_SPORTS = {'NBA', 'MLB', 'NHL', 'WNBA', 'MLS', 'NWSL'}

def validate_game_record(game: dict) -> list[str]:
    errors = []
    if game.get('sport') not in VALID_SPORTS:
        errors.append(f"Invalid sport: {game.get('sport')}")
    return errors

7. App-Side Integration (SwiftUI)

7.1 Update Sport Enum

File: SportsTime/Core/Models/Domain/Sport.swift

import SwiftUI  // provides Color

enum Sport: String, Codable, CaseIterable, Identifiable {
    case mlb = "MLB"
    case nba = "NBA"
    case nhl = "NHL"
    case nfl = "NFL"
    case mls = "MLS"
    case wnba = "WNBA"
    case nwsl = "NWSL"

    var id: String { rawValue }

    var displayName: String {
        switch self {
        case .mlb: return "Major League Baseball"
        case .nba: return "National Basketball Association"
        case .nhl: return "National Hockey League"
        case .nfl: return "National Football League"
        case .mls: return "Major League Soccer"
        case .wnba: return "Women's National Basketball Association"
        case .nwsl: return "National Women's Soccer League"
        }
    }

    var iconName: String {
        switch self {
        case .mlb: return "baseball.fill"
        case .nba: return "basketball.fill"
        case .nhl: return "hockey.puck.fill"
        case .nfl: return "football.fill"
        case .mls: return "soccerball"
        case .wnba: return "basketball.fill"
        case .nwsl: return "soccerball"
        }
    }

    var color: Color {
        switch self {
        case .mlb: return .red
        case .nba: return .orange
        case .nhl: return .blue
        case .nfl: return .brown
        case .mls: return .green
        case .wnba: return .purple
        case .nwsl: return .pink
        }
    }

    var seasonMonths: (start: Int, end: Int) {
        switch self {
        case .mlb: return (3, 10)   // March - October
        case .nba: return (10, 6)   // October - June (wraps)
        case .nhl: return (10, 6)   // October - June (wraps)
        case .nfl: return (9, 2)    // September - February (wraps)
        case .mls: return (2, 12)   // February - December
        case .wnba: return (5, 10)  // May - October
        case .nwsl: return (3, 11)  // March - November
        }
    }

    /// Currently supported sports
    static var supported: [Sport] {
        [.mlb, .nba, .nhl, .wnba, .mls, .nwsl]
    }
}
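
Several seasonMonths ranges wrap past December (for example (10, 6)), so a plain start <= month <= end test is not enough. The logic is sketched below in Python, to match the pipeline scripts; in_season is a hypothetical helper, and the app may already handle this elsewhere.

```python
def in_season(month: int, season: tuple[int, int]) -> bool:
    """True if month (1-12) falls inside an inclusive (start, end) month range.

    Ranges where start > end are treated as wrapping over the new year.
    """
    start, end = season
    if start <= end:
        return start <= month <= end
    return month >= start or month <= end


# Ranges copied from the seasonMonths values above
NBA_SEASON = (10, 6)
WNBA_SEASON = (5, 10)
```

For the NBA range, December and January are in season while July is not; for the WNBA range, the check reduces to the ordinary comparison.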

7.2 Trip Planner - No Changes Required

The trip planner uses the Sport enum and fetches games by sport. New sports work automatically because:

  1. DataProvider.fetchGames(sports:startDate:endDate:) queries by sport string
  2. Games are filtered by sportStrings.contains(canonical.sport)
  3. Route planning is sport-agnostic (uses stadium coordinates)

7.3 Stadium Tracker - No Changes Required

Stadium progress uses the Stadium.sport field. New sports automatically appear in:

  • Stadium list filtering by sport
  • Progress tracking per sport

7.4 UI Considerations

Sport Selection Chips: The SportSelectionChip already uses Sport.allCases. Adding new cases automatically adds them to the UI.

Filter Sections: Update default selections if desired:

// In TripCreationViewModel
var selectedSports: Set<Sport> = [.mlb, .nba, .nhl]  // Consider adding new sports

8. Testing & Validation

8.1 Data Integrity Checks

Python validation queries (add to validate_canonical.py):

def validate_new_sports(stadiums, teams, games):
    """Validate WNBA, MLS, NWSL data integrity."""
    errors = []

    # Check all sports have stadiums
    for sport in ['WNBA', 'MLS', 'NWSL']:
        sport_stadiums = [s for s in stadiums if s['sport'] == sport]
        if not sport_stadiums:
            errors.append(f"No stadiums for {sport}")

        sport_teams = [t for t in teams if t['sport'] == sport]
        if not sport_teams:
            errors.append(f"No teams for {sport}")

        sport_games = [g for g in games if g['sport'] == sport]
        if not sport_games:
            errors.append(f"No games for {sport}")

    # Check team->stadium references
    stadium_ids = {s['canonical_id'] for s in stadiums}
    for team in teams:
        if team['stadium_canonical_id'] not in stadium_ids:
            errors.append(f"Team {team['canonical_id']} references unknown stadium {team['stadium_canonical_id']}")

    # Check game->team and game->stadium references
    team_ids = {t['canonical_id'] for t in teams}
    for game in games:
        if game['home_team_canonical_id'] not in team_ids:
            errors.append(f"Game {game['canonical_id']} references unknown home team")
        if game['away_team_canonical_id'] not in team_ids:
            errors.append(f"Game {game['canonical_id']} references unknown away team")
        if game['stadium_canonical_id'] not in stadium_ids:
            errors.append(f"Game {game['canonical_id']} references unknown stadium")

    return errors
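
The reference checks above build sets of canonical IDs, and a set silently collapses duplicates. A separate uniqueness check can catch cases such as two placeholder venues normalizing to the same stadium ID. The function name find_duplicate_ids and the sample records are hypothetical:

```python
from collections import Counter


def find_duplicate_ids(records: list[dict]) -> list[str]:
    """Return every canonical_id that appears more than once, sorted."""
    counts = Counter(rec["canonical_id"] for rec in records)
    return sorted(cid for cid, n in counts.items() if n > 1)


# Illustrative records: two placeholder venues collide on one ID
sample_stadiums = [
    {"canonical_id": "stadium_wnba_barclays_center"},
    {"canonical_id": "stadium_wnba_tbd"},
    {"canonical_id": "stadium_wnba_tbd"},
]

duplicates = find_duplicate_ids(sample_stadiums)
```

Running the same check over stadiums, teams, and games before upload keeps one record from overwriting another that shares its recordName.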

8.2 App Smoke Tests

  1. Sport Selection:

    • Open Trip Creation
    • Verify WNBA, MLS, NWSL chips appear
    • Select each new sport
    • Verify games load for date range
  2. Trip Planning:

    • Select WNBA + dates during WNBA season
    • Verify trip results show WNBA games
    • Verify stadium locations are correct
  3. Stadium Progress:

    • Navigate to Progress tab
    • Filter by WNBA/MLS/NWSL
    • Verify stadium list shows correct venues
  4. Mixed Sport Trips:

    • Select NBA + WNBA (they share arenas)
    • Verify trips correctly handle both sports
    • Verify no duplicate stadiums in single stop

8.3 Edge Case Tests

  1. Shared Venues:

    • Create trip with MLS Atlanta United + NFL Falcons (same venue)
    • Verify games at Mercedes-Benz Stadium appear for both sports
  2. Canadian Teams (MLS):

    • Create trip including Toronto FC
    • Verify timezone handling is correct
  3. Midweek Matches (MLS):

    • Verify Wednesday/Thursday games don't break route planning

9. Pipeline Update Summary

run_canonicalization_pipeline.py Changes

# In run_pipeline():

# STAGE 1: SCRAPING
# ... existing NBA, MLB, NHL ...

# NEW: WNBA
print_section(f"WNBA {season}")
wnba_games = scrape_wnba_basketball_reference(season)
wnba_games = assign_stable_ids(wnba_games, 'WNBA', str(season))
all_games.extend(wnba_games)
print(f"  Scraped {len(wnba_games)} WNBA games")

# NEW: MLS
print_section(f"MLS {season}")
mls_games = scrape_mls_fbref(season)
mls_games = assign_stable_ids(mls_games, 'MLS', str(season))
all_games.extend(mls_games)
print(f"  Scraped {len(mls_games)} MLS games")

# NEW: NWSL
print_section(f"NWSL {season}")
nwsl_games = scrape_nwsl_fbref(season)
nwsl_games = assign_stable_ids(nwsl_games, 'NWSL', str(season))
all_games.extend(nwsl_games)
print(f"  Scraped {len(nwsl_games)} NWSL games")
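
The three blocks above differ only in the sport name and the scraper function, so they could be driven from a table. The sketch below uses stand-in scrapers and a simplified assign_stable_ids so that it runs on its own; the real functions live in the pipeline scripts, and their signatures are assumed from the calls above. The placeholder IDs it produces are not the canonical format.

```python
from typing import Callable


# Stand-in so the example is self-contained; the real scrapers return Game objects.
def fake_scraper(sport: str) -> Callable[[int], list[dict]]:
    def scrape(season: int) -> list[dict]:
        return [{"sport": sport, "season": str(season), "id": None}]
    return scrape


def assign_stable_ids(games: list[dict], sport: str, season: str) -> list[dict]:
    # Simplified placeholder for the pipeline's ID assignment.
    for i, game in enumerate(games):
        game["id"] = f"game_{sport.lower()}_{season}_{i}"
    return games


NEW_SPORT_SCRAPERS: dict[str, Callable[[int], list[dict]]] = {
    "WNBA": fake_scraper("WNBA"),  # scrape_wnba_basketball_reference in the pipeline
    "MLS": fake_scraper("MLS"),    # scrape_mls_fbref
    "NWSL": fake_scraper("NWSL"),  # scrape_nwsl_fbref
}


def scrape_new_sports(season: int) -> list[dict]:
    all_games = []
    for sport, scraper in NEW_SPORT_SCRAPERS.items():
        try:
            games = assign_stable_ids(scraper(season), sport, str(season))
        except Exception as exc:  # one failing source should not stop the others
            print(f"  {sport} scrape failed: {exc}")
            continue
        all_games.extend(games)
        print(f"  Scraped {len(games)} {sport} games")
    return all_games


all_games = scrape_new_sports(2025)
```

Adding another league then means adding one dictionary entry rather than another copy of the block.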

10. Checklist

Definition of Done

  • Scraping: WNBA, MLS, NWSL scrapers added and tested
  • Team Mappings: All current teams with correct abbreviations
  • Stadiums: All venues canonicalized with coordinates
  • Canonicalization: Pipeline runs without errors for new sports
  • Validation: All integrity checks pass
  • CloudKit: Records uploaded successfully
  • Swift Enum: Sport cases added with correct metadata
  • Trip Planning: New sports can be planned into trips
  • Stadium Tracking: New stadiums appear in progress
  • No Regressions: Existing MLB/NBA/NHL functionality unchanged

Files Modified

| File | Changes |
| --- | --- |
| Scripts/scrape_schedules.py | Add team mappings, scrapers |
| Scripts/canonicalize_stadiums.py | Generate new sport stadiums |
| Scripts/canonicalize_teams.py | Add league structure mappings |
| Scripts/run_canonicalization_pipeline.py | Add scraping calls |
| Scripts/validate_canonical.py | Add new sport validation |
| Scripts/cloudkit_import.py | Add sport validation |
| SportsTime/Core/Models/Domain/Sport.swift | Add enum cases |
| SportsTime/Resources/stadiums_canonical.json | New venue records |
| SportsTime/Resources/teams_canonical.json | New team records |
| SportsTime/Resources/games_canonical.json | New game records |