# WNBA, MLS, and NWSL Implementation Guide

Complete end-to-end implementation for adding WNBA, MLS, and NWSL to SportsTime.

---

## 1. League Overview

### WNBA (Women's National Basketball Association)
- **Teams**: 13 (expanding to 15 by 2026)
- **Season**: May - September (regular season), September - October (playoffs)
- **Game Cadence**: ~40 games per team, 3-4 games per week
- **Special Considerations**:
  - Many teams share arenas with NBA teams (key for stadium handling)
  - Olympic break in summer every 4 years
  - Commissioner's Cup midseason tournament

**Shared Venues (WNBA/NBA)**:

| WNBA Team | NBA Team | Arena |
|-----------|----------|-------|
| Atlanta Dream | Hawks | Gateway Center Arena (different) |
| Chicago Sky | Bulls | Wintrust Arena (different) |
| Dallas Wings | Mavericks | College Park Center (different) |
| Indiana Fever | Pacers | Gainbridge Fieldhouse |
| Los Angeles Sparks | Lakers | Crypto.com Arena |
| Minnesota Lynx | Timberwolves | Target Center |
| New York Liberty | Nets | Barclays Center |
| Phoenix Mercury | Suns | Footprint Center |
| Washington Mystics | Wizards | Entertainment & Sports Arena (different) |

### MLS (Major League Soccer)
- **Teams**: 29 teams (2024), expanding
- **Season**: February/March - October (regular season), October - December (playoffs)
- **Game Cadence**: 34 games per team, 1-2 games per week
- **Special Considerations**:
  - Some teams share NFL stadiums (Atlanta, Seattle, New England)
  - Midweek matches (Wednesday/Thursday) common
  - US Open Cup adds additional games
  - Canadian teams (Toronto, Vancouver, Montreal) - timezone handling

**Shared Venues (MLS/NFL)**:

| MLS Team | NFL Team | Stadium |
|----------|----------|---------|
| Atlanta United | Falcons | Mercedes-Benz Stadium |
| Seattle Sounders | Seahawks | Lumen Field |
| New England Revolution | Patriots | Gillette Stadium |

### NWSL (National Women's Soccer League)
- **Teams**: 14 teams (2024)
- **Season**: March - November (regular season + playoffs)
- **Game Cadence**: 26 games per team, 1-2 games per week
- **Special Considerations**:
  - Some share MLS stadiums (Portland, Orlando, Houston)
  - Many use smaller soccer-specific venues
  - Expansion teams frequently added

---

## 2. Schedule & Data Sources

### WNBA Data Sources

**Primary: Basketball-Reference (Women)**
```
URL Pattern: https://www.basketball-reference.com/wnba/years/{YEAR}_games.html
Example: https://www.basketball-reference.com/wnba/years/2025_games.html
```

**HTML Structure**:
```html
<table id="schedule">
  <tbody>
    <tr>
      <th data-stat="date_game">Fri, May 17, 2024</th>
      <td data-stat="game_start_time">7:30p</td>
      <td data-stat="visitor_team_name">Dallas Wings</td>
      <td data-stat="home_team_name">Atlanta Dream</td>
      <td data-stat="arena_name">Gateway Center Arena</td>
    </tr>
  </tbody>
</table>
```

**Fields Available**: date, time, home_team, away_team, arena, attendance, box_score_link

**Secondary: ESPN WNBA**
```
URL Pattern: https://www.espn.com/wnba/schedule/_/date/{YYYYMMDD}
```

### MLS Data Sources

**Primary: FBref (Football Reference)**
```
URL Pattern: https://fbref.com/en/comps/22/{YEAR}/schedule/{YEAR}-Major-League-Soccer-Scores-and-Fixtures
Example: https://fbref.com/en/comps/22/2024/schedule/2024-Major-League-Soccer-Scores-and-Fixtures
```

**HTML Structure**:
```html
<table id="sched_2024_22_1">
  <tbody>
    <tr>
      <td data-stat="date">2024-02-24</td>
      <td data-stat="time">19:30</td>
      <td data-stat="home_team">LA Galaxy</td>
      <td data-stat="away_team">Inter Miami</td>
      <td data-stat="venue">Dignity Health Sports Park</td>
    </tr>
  </tbody>
</table>
```

**Fields Available**: date, time (24hr), home_team, away_team, venue, score, attendance

**Secondary: MLS Official**
```
URL Pattern: https://www.mlssoccer.com/schedule/scores
API Endpoint: https://sportapi.mlssoccer.com/api/matches?culture=en-us&dateFrom={date}&dateTo={date}
```

### NWSL Data Sources

**Primary: FBref (NWSL)**
```
URL Pattern: https://fbref.com/en/comps/182/{YEAR}/schedule/{YEAR}-NWSL-Scores-and-Fixtures
Example: https://fbref.com/en/comps/182/2024/schedule/2024-NWSL-Scores-and-Fixtures
```

**HTML Structure**: Same as MLS (FBref standard format)

**Secondary: NWSL Official**
```
URL Pattern: https://www.nwslsoccer.com/schedule
```

---

## 3. Schedule Parser Changes

### File: `Scripts/scrape_schedules.py`

#### 3.1 Add Team Mappings (after NHL_TEAMS ~line 180)

```python
# =============================================================================
# WNBA TEAMS
# =============================================================================

WNBA_TEAMS = {
    'ATL': {'name': 'Atlanta Dream', 'city': 'Atlanta', 'arena': 'Gateway Center Arena'},
    'CHI': {'name': 'Chicago Sky', 'city': 'Chicago', 'arena': 'Wintrust Arena'},
    'CON': {'name': 'Connecticut Sun', 'city': 'Uncasville', 'arena': 'Mohegan Sun Arena'},
    'DAL': {'name': 'Dallas Wings', 'city': 'Arlington', 'arena': 'College Park Center'},
    'IND': {'name': 'Indiana Fever', 'city': 'Indianapolis', 'arena': 'Gainbridge Fieldhouse'},
    'LVA': {'name': 'Las Vegas Aces', 'city': 'Las Vegas', 'arena': 'Michelob Ultra Arena'},
    'LAS': {'name': 'Los Angeles Sparks', 'city': 'Los Angeles', 'arena': 'Crypto.com Arena'},
    'MIN': {'name': 'Minnesota Lynx', 'city': 'Minneapolis', 'arena': 'Target Center'},
    'NYL': {'name': 'New York Liberty', 'city': 'Brooklyn', 'arena': 'Barclays Center'},
    'PHO': {'name': 'Phoenix Mercury', 'city': 'Phoenix', 'arena': 'Footprint Center'},
    'SEA': {'name': 'Seattle Storm', 'city': 'Seattle', 'arena': 'Climate Pledge Arena'},
    'WAS': {'name': 'Washington Mystics', 'city': 'Washington', 'arena': 'Entertainment & Sports Arena'},
    # Expansion teams (add as announced)
    'GSV': {'name': 'Golden State Valkyries', 'city': 'San Francisco', 'arena': 'Chase Center'},
    'POR': {'name': 'Portland Expansion', 'city': 'Portland', 'arena': 'TBD'},
    'TOR': {'name': 'Toronto Expansion', 'city': 'Toronto', 'arena': 'TBD'},
}

# =============================================================================
# MLS TEAMS
# =============================================================================

MLS_TEAMS = {
    'ATL': {'name': 'Atlanta United FC', 'city': 'Atlanta', 'stadium': 'Mercedes-Benz Stadium'},
    'AUS': {'name': 'Austin FC', 'city': 'Austin', 'stadium': 'Q2 Stadium'},
    'CHI': {'name': 'Chicago Fire FC', 'city': 'Chicago', 'stadium': 'Soldier Field'},
    'CIN': {'name': 'FC Cincinnati', 'city': 'Cincinnati', 'stadium': 'TQL Stadium'},
    'CLB': {'name': 'Columbus Crew', 'city': 'Columbus', 'stadium': 'Lower.com Field'},
    'CLT': {'name': 'Charlotte FC', 'city': 'Charlotte', 'stadium': 'Bank of America Stadium'},
    'COL': {'name': 'Colorado Rapids', 'city': 'Commerce City', 'stadium': "Dick's Sporting Goods Park"},
    'DAL': {'name': 'FC Dallas', 'city': 'Frisco', 'stadium': 'Toyota Stadium'},
    'DCU': {'name': 'D.C. United', 'city': 'Washington', 'stadium': 'Audi Field'},
    'HOU': {'name': 'Houston Dynamo FC', 'city': 'Houston', 'stadium': 'Shell Energy Stadium'},
    'LAG': {'name': 'LA Galaxy', 'city': 'Carson', 'stadium': 'Dignity Health Sports Park'},
    'LAF': {'name': 'Los Angeles FC', 'city': 'Los Angeles', 'stadium': 'BMO Stadium'},
    'MIA': {'name': 'Inter Miami CF', 'city': 'Fort Lauderdale', 'stadium': 'Chase Stadium'},
    'MIN': {'name': 'Minnesota United FC', 'city': 'Saint Paul', 'stadium': 'Allianz Field'},
    'MTL': {'name': 'CF Montréal', 'city': 'Montreal', 'stadium': 'Stade Saputo'},
    'NSH': {'name': 'Nashville SC', 'city': 'Nashville', 'stadium': 'Geodis Park'},
    'NER': {'name': 'New England Revolution', 'city': 'Foxborough', 'stadium': 'Gillette Stadium'},
    'NYC': {'name': 'New York City FC', 'city': 'New York', 'stadium': 'Yankee Stadium'},
    'NYR': {'name': 'New York Red Bulls', 'city': 'Harrison', 'stadium': 'Red Bull Arena'},
    'ORL': {'name': 'Orlando City SC', 'city': 'Orlando', 'stadium': 'Exploria Stadium'},
    'PHI': {'name': 'Philadelphia Union', 'city': 'Chester', 'stadium': 'Subaru Park'},
    'POR': {'name': 'Portland Timbers', 'city': 'Portland', 'stadium': 'Providence Park'},
    'RSL': {'name': 'Real Salt Lake', 'city': 'Sandy', 'stadium': 'America First Field'},
    'SJE': {'name': 'San Jose Earthquakes', 'city': 'San Jose', 'stadium': 'PayPal Park'},
    'SEA': {'name': 'Seattle Sounders FC', 'city': 'Seattle', 'stadium': 'Lumen Field'},
    'SKC': {'name': 'Sporting Kansas City', 'city': 'Kansas City', 'stadium': "Children's Mercy Park"},
    'STL': {'name': 'St. Louis City SC', 'city': 'St. Louis', 'stadium': 'CityPark'},
    'TOR': {'name': 'Toronto FC', 'city': 'Toronto', 'stadium': 'BMO Field'},
    'VAN': {'name': 'Vancouver Whitecaps FC', 'city': 'Vancouver', 'stadium': 'BC Place'},
    'SDG': {'name': 'San Diego FC', 'city': 'San Diego', 'stadium': 'Snapdragon Stadium'},  # 2025 expansion
}

# =============================================================================
# NWSL TEAMS
# =============================================================================

NWSL_TEAMS = {
    'ANG': {'name': 'Angel City FC', 'city': 'Los Angeles', 'stadium': 'BMO Stadium'},
    'CHI': {'name': 'Chicago Red Stars', 'city': 'Chicago', 'stadium': 'SeatGeek Stadium'},
    'HOU': {'name': 'Houston Dash', 'city': 'Houston', 'stadium': 'Shell Energy Stadium'},
    'KCC': {'name': 'Kansas City Current', 'city': 'Kansas City', 'stadium': 'CPKC Stadium'},
    'LOU': {'name': 'Racing Louisville FC', 'city': 'Louisville', 'stadium': 'Lynn Family Stadium'},
    'NCC': {'name': 'North Carolina Courage', 'city': 'Cary', 'stadium': 'WakeMed Soccer Park'},
    'NJG': {'name': 'NJ/NY Gotham FC', 'city': 'Harrison', 'stadium': 'Red Bull Arena'},
    'ORL': {'name': 'Orlando Pride', 'city': 'Orlando', 'stadium': 'Exploria Stadium'},
    'POR': {'name': 'Portland Thorns FC', 'city': 'Portland', 'stadium': 'Providence Park'},
    'SDW': {'name': 'San Diego Wave FC', 'city': 'San Diego', 'stadium': 'Snapdragon Stadium'},
    'SEA': {'name': 'Seattle Reign FC', 'city': 'Seattle', 'stadium': 'Lumen Field'},
    'UTA': {'name': 'Utah Royals FC', 'city': 'Sandy', 'stadium': 'America First Field'},
    'WAS': {'name': 'Washington Spirit', 'city': 'Washington', 'stadium': 'Audi Field'},
    'BAY': {'name': 'Bay FC', 'city': 'San Jose', 'stadium': 'PayPal Park'},  # 2024 expansion
}
```

#### 3.2 Update `get_team_abbrev()` Function

```python
def get_team_abbrev(team_name: str, sport: str) -> str:
    """Get team abbreviation from full name."""
    team_maps = {
        'NBA': NBA_TEAMS,
        'MLB': MLB_TEAMS,
        'NHL': NHL_TEAMS,
        'WNBA': WNBA_TEAMS,
        'MLS': MLS_TEAMS,
        'NWSL': NWSL_TEAMS,
    }

    teams = team_maps.get(sport, {})
    needle = team_name.strip().lower()

    # Exact match on the full team name takes priority over any partial match
    for abbrev, data in teams.items():
        if needle == data['name'].lower():
            return abbrev

    # Partial match (e.g., "Hawks" matches "Atlanta Hawks");
    # skipped for empty names, which would otherwise match every team
    if needle:
        for abbrev, data in teams.items():
            if needle in data['name'].lower():
                return abbrev

    # Fallback: first 3 characters
    return team_name.strip()[:3].upper()
```
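
Substring matching only helps when the scraped name is contained in the mapped name. A source that shortens names differently (for illustration, a hypothetical "NY Red Bulls") would fall through to the three-character fallback and yield an abbreviation that matches no team. One possible guard is an alias table consulted before the name match. The sketch below assumes a `TEAM_ALIASES` dict and a `resolve_abbrev()` helper, neither of which exists in `scrape_schedules.py`, and uses a trimmed stand-in for `MLS_TEAMS`.

```python
from typing import Optional

# Hypothetical alias table: (sport, lower-cased scraped name) -> abbreviation.
TEAM_ALIASES = {
    ('MLS', 'ny red bulls'): 'NYR',
}

# Trimmed stand-in for the MLS_TEAMS mapping shown in 3.1.
SAMPLE_TEAMS = {
    'NYR': {'name': 'New York Red Bulls'},
    'NYC': {'name': 'New York City FC'},
}


def resolve_abbrev(team_name: str, sport: str, teams: dict) -> Optional[str]:
    """Resolve a scraped name via aliases first, then an exact name match."""
    key = team_name.strip().lower()

    # 1. Known alternate spellings win outright.
    alias = TEAM_ALIASES.get((sport, key))
    if alias:
        return alias

    # 2. Exact match on the mapped full name.
    for abbrev, data in teams.items():
        if key == data['name'].lower():
            return abbrev

    # 3. No guess: the caller can log the unmatched name.
    return None


print(resolve_abbrev('NY Red Bulls', 'MLS', SAMPLE_TEAMS))
print(resolve_abbrev('Somewhere United', 'MLS', SAMPLE_TEAMS))
```

Returning `None` instead of a guessed prefix is a design choice in this sketch: it lets the pipeline report unmatched names rather than emit game IDs that reference teams that do not exist.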

#### 3.3 Add WNBA Scraper

```python
def scrape_wnba_basketball_reference(season: int) -> list[Game]:
    """
    Scrape WNBA schedule from Basketball-Reference.
    URL: https://www.basketball-reference.com/wnba/years/{YEAR}_games.html
    Season year is the calendar year (e.g., 2025 for 2025 season)
    """
    games = []
    url = f"https://www.basketball-reference.com/wnba/years/{season}_games.html"

    print(f"Scraping WNBA {season} from Basketball-Reference...")
    soup = fetch_page(url, 'basketball-reference.com')

    if not soup:
        return games

    table = soup.find('table', {'id': 'schedule'})
    if not table:
        print(" No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        # Skip repeated header rows inside the table body
        if row.get('class') and 'thead' in row.get('class'):
            continue

        try:
            # Parse date
            date_cell = row.find('th', {'data-stat': 'date_game'})
            if not date_cell:
                continue
            date_link = date_cell.find('a')
            date_str = date_link.text if date_link else date_cell.text

            # Parse time
            time_cell = row.find('td', {'data-stat': 'game_start_time'})
            time_str = time_cell.text.strip() if time_cell else None

            # Parse teams
            visitor_cell = row.find('td', {'data-stat': 'visitor_team_name'})
            home_cell = row.find('td', {'data-stat': 'home_team_name'})

            if not visitor_cell or not home_cell:
                continue

            away_team = visitor_cell.find('a').text if visitor_cell.find('a') else visitor_cell.text
            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text

            # Parse arena
            arena_cell = row.find('td', {'data-stat': 'arena_name'})
            arena = arena_cell.text.strip() if arena_cell else ''

            # Convert date (format: "Sat, May 18, 2024")
            try:
                parsed_date = datetime.strptime(date_str.strip(), '%a, %b %d, %Y')
                date_formatted = parsed_date.strftime('%Y-%m-%d')
            except ValueError:
                # Skip rows whose date does not match the expected format
                continue

            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'WNBA')
            away_abbrev = get_team_abbrev(away_team, 'WNBA')
            game_id = f"wnba_{date_formatted}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='WNBA',
                season=str(season),
                date=date_formatted,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=arena,
                source='basketball-reference.com'
            )
            games.append(game)

        except Exception as e:
            print(f" Error parsing row: {e}")
            continue

    print(f" Found {len(games)} games from Basketball-Reference")
    return games
```

#### 3.4 Add MLS Scraper

```python
def scrape_mls_fbref(season: int) -> list[Game]:
    """
    Scrape MLS schedule from FBref.
    URL: https://fbref.com/en/comps/22/{YEAR}/schedule/{YEAR}-Major-League-Soccer-Scores-and-Fixtures
    """
    games = []
    url = f"https://fbref.com/en/comps/22/{season}/schedule/{season}-Major-League-Soccer-Scores-and-Fixtures"

    print(f"Scraping MLS {season} from FBref...")
    soup = fetch_page(url, 'fbref.com')

    if not soup:
        return games

    # FBref uses table with id like sched_{year}_22_1
    table = soup.find('table', {'id': lambda x: x and 'sched_' in x})
    if not table:
        print(" No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        try:
            # Parse date (format: 2024-02-24)
            date_cell = row.find('td', {'data-stat': 'date'})
            if not date_cell:
                continue
            date_str = date_cell.text.strip()

            # Parse time (24hr format: 19:30)
            time_cell = row.find('td', {'data-stat': 'time'})
            time_str = time_cell.text.strip() if time_cell else None

            # Convert 24hr to the short 12hr style ("7:30p") for consistency
            # with the Basketball-Reference times
            if time_str:
                try:
                    t = datetime.strptime(time_str, '%H:%M')
                    time_str = t.strftime('%I:%M%p').lstrip('0').lower().rstrip('m')
                except ValueError:
                    # Leave unrecognised time strings as scraped
                    pass

            # Parse teams
            home_cell = row.find('td', {'data-stat': 'home_team'})
            away_cell = row.find('td', {'data-stat': 'away_team'})

            if not home_cell or not away_cell:
                continue

            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            away_team = away_cell.find('a').text if away_cell.find('a') else away_cell.text

            home_team = home_team.strip()
            away_team = away_team.strip()

            # Spacer rows have empty team cells
            if not home_team or not away_team:
                continue

            # Parse venue
            venue_cell = row.find('td', {'data-stat': 'venue'})
            venue = venue_cell.text.strip() if venue_cell else ''

            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'MLS')
            away_abbrev = get_team_abbrev(away_team, 'MLS')
            game_id = f"mls_{date_str}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='MLS',
                season=str(season),
                date=date_str,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=venue,
                source='fbref.com'
            )
            games.append(game)

        except Exception as e:
            print(f" Error parsing row: {e}")
            continue

    print(f" Found {len(games)} games from FBref")
    return games
```

#### 3.5 Add NWSL Scraper

```python
def scrape_nwsl_fbref(season: int) -> list[Game]:
    """
    Scrape NWSL schedule from FBref.
    URL: https://fbref.com/en/comps/182/{YEAR}/schedule/{YEAR}-NWSL-Scores-and-Fixtures
    """
    games = []
    url = f"https://fbref.com/en/comps/182/{season}/schedule/{season}-NWSL-Scores-and-Fixtures"

    print(f"Scraping NWSL {season} from FBref...")
    soup = fetch_page(url, 'fbref.com')

    if not soup:
        return games

    # Same table id pattern as MLS (sched_{year}_{comp}_1)
    table = soup.find('table', {'id': lambda x: x and 'sched_' in x})
    if not table:
        print(" No schedule table found")
        return games

    tbody = table.find('tbody')
    if not tbody:
        return games

    for row in tbody.find_all('tr'):
        try:
            # Parse date (format: 2024-03-16)
            date_cell = row.find('td', {'data-stat': 'date'})
            if not date_cell:
                continue
            date_str = date_cell.text.strip()

            # Parse time (24hr) and convert to the short 12hr style ("7:30p")
            time_cell = row.find('td', {'data-stat': 'time'})
            time_str = time_cell.text.strip() if time_cell else None

            if time_str:
                try:
                    t = datetime.strptime(time_str, '%H:%M')
                    time_str = t.strftime('%I:%M%p').lstrip('0').lower().rstrip('m')
                except ValueError:
                    # Leave unrecognised time strings as scraped
                    pass

            # Parse teams
            home_cell = row.find('td', {'data-stat': 'home_team'})
            away_cell = row.find('td', {'data-stat': 'away_team'})

            if not home_cell or not away_cell:
                continue

            home_team = home_cell.find('a').text if home_cell.find('a') else home_cell.text
            away_team = away_cell.find('a').text if away_cell.find('a') else away_cell.text

            home_team = home_team.strip()
            away_team = away_team.strip()

            # Spacer rows have empty team cells
            if not home_team or not away_team:
                continue

            # Parse venue
            venue_cell = row.find('td', {'data-stat': 'venue'})
            venue = venue_cell.text.strip() if venue_cell else ''

            # Generate game ID
            home_abbrev = get_team_abbrev(home_team, 'NWSL')
            away_abbrev = get_team_abbrev(away_team, 'NWSL')
            game_id = f"nwsl_{date_str}_{away_abbrev}_{home_abbrev}".lower()

            game = Game(
                id=game_id,
                sport='NWSL',
                season=str(season),
                date=date_str,
                time=time_str,
                home_team=home_team,
                away_team=away_team,
                home_team_abbrev=home_abbrev,
                away_team_abbrev=away_abbrev,
                venue=venue,
                source='fbref.com'
            )
            games.append(game)

        except Exception as e:
            print(f" Error parsing row: {e}")
            continue

    print(f" Found {len(games)} games from FBref")
    return games
```
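
Both FBref scrapers repeat the same 24hr-to-12hr conversion inline. A small shared helper would keep the two in step and make the target format explicit. The sketch below is one possible shape, assuming the short style shown in the Basketball-Reference example (`7:30p`) is the one the pipeline wants; `to_short_12h` is a hypothetical name, not an existing function in `scrape_schedules.py`.

```python
from datetime import datetime
from typing import Optional


def to_short_12h(time_str: Optional[str]) -> Optional[str]:
    """Convert 'HH:MM' (24hr) to the short 12hr style, e.g. '19:30' -> '7:30p'."""
    if not time_str:
        return None
    try:
        t = datetime.strptime(time_str.strip(), '%H:%M')
    except ValueError:
        # Leave values such as 'TBD' exactly as scraped.
        return time_str
    hour12 = t.hour % 12 or 12          # 0 -> 12, 13 -> 1, 12 stays 12
    suffix = 'a' if t.hour < 12 else 'p'
    return f"{hour12}:{t.minute:02d}{suffix}"


for raw in ('19:30', '00:05', '12:00', 'TBD'):
    print(raw, '->', to_short_12h(raw))
```

Building the suffix by hand avoids `%p`, whose output depends on the process locale.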

---

## 4. Stadium & Team Canonicalization

### 4.1 Canonical ID Patterns

**Stadiums** (per-sport, even for shared venues):
```
stadium_{sport}_{normalized_name}
```

Examples:
- `stadium_wnba_barclays_center` (WNBA Liberty)
- `stadium_nba_barclays_center` (NBA Nets)
- `stadium_mls_mercedes_benz_stadium`
- `stadium_nwsl_providence_park`

**Teams**:
```
team_{sport}_{abbrev}
```

Examples:
- `team_wnba_nyl` (New York Liberty)
- `team_mls_atl` (Atlanta United)
- `team_nwsl_por` (Portland Thorns)

**Games**:
```
game_{sport}_{season}_{date}_{away}_{home}
```

Examples:
- `game_wnba_2025_20250518_dal_atl`
- `game_mls_2025_20250301_mia_lag`
- `game_nwsl_2025_20250315_por_ang`
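
The three patterns above can be sketched as small builder functions. This is an illustration only: the `normalize_name()` here is an assumed stand-in (lower-case, non-alphanumeric runs collapsed to `_`) chosen because it reproduces the examples above, and may differ from the pipeline's own implementation; `stadium_id`, `team_id`, and `game_id` are hypothetical names.

```python
import re


def normalize_name(name: str) -> str:
    """Assumed normalization: lower-case, runs of non-alphanumerics -> '_'."""
    return re.sub(r'[^a-z0-9]+', '_', name.lower()).strip('_')


def stadium_id(sport: str, stadium_name: str) -> str:
    return f"stadium_{sport.lower()}_{normalize_name(stadium_name)}"


def team_id(sport: str, abbrev: str) -> str:
    return f"team_{sport.lower()}_{abbrev.lower()}"


def game_id(sport: str, season: int, date: str, away: str, home: str) -> str:
    """`date` is ISO 'YYYY-MM-DD'; the ID uses the compact 'YYYYMMDD' form."""
    compact_date = date.replace('-', '')
    return f"game_{sport.lower()}_{season}_{compact_date}_{away.lower()}_{home.lower()}"


print(stadium_id('MLS', 'Mercedes-Benz Stadium'))
print(team_id('WNBA', 'NYL'))
print(game_id('WNBA', 2025, '2025-05-18', 'DAL', 'ATL'))
```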

### 4.2 Shared Venue Handling

**Critical Rule**: Stadiums are per-sport entities. A physical venue shared between sports creates MULTIPLE canonical stadium records.

**Example: Barclays Center**
```json
// Stadium for NBA Nets
{
  "canonical_id": "stadium_nba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "sport": "NBA",
  "primary_team_abbrevs": ["BRK"]
}

// Stadium for WNBA Liberty
{
  "canonical_id": "stadium_wnba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "sport": "WNBA",
  "primary_team_abbrevs": ["NYL"]
}
```

**Rationale**: Trip planning needs sport-specific filtering. A user planning an NBA trip shouldn't see WNBA games unless explicitly requested.
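
Because a shared building appears once per sport, any view that combines sports (for example a mixed NBA + WNBA trip) needs some way to recognise that two records describe the same physical place. A minimal sketch follows, assuming that matching on name plus city is good enough; `physical_venue_key` and `group_by_venue` are hypothetical helpers, and a production version might compare coordinates instead.

```python
from collections import defaultdict


def physical_venue_key(stadium: dict) -> tuple:
    """Assumed identity for a building: case-insensitive (name, city)."""
    return (stadium['name'].strip().lower(), stadium['city'].strip().lower())


def group_by_venue(stadiums: list) -> dict:
    """Map each physical venue to the canonical IDs that refer to it."""
    groups = defaultdict(list)
    for stadium in stadiums:
        groups[physical_venue_key(stadium)].append(stadium['canonical_id'])
    return dict(groups)


records = [
    {'canonical_id': 'stadium_nba_barclays_center', 'name': 'Barclays Center',
     'city': 'Brooklyn', 'sport': 'NBA'},
    {'canonical_id': 'stadium_wnba_barclays_center', 'name': 'Barclays Center',
     'city': 'Brooklyn', 'sport': 'WNBA'},
    {'canonical_id': 'stadium_mls_mercedes_benz_stadium', 'name': 'Mercedes-Benz Stadium',
     'city': 'Atlanta', 'sport': 'MLS'},
]

# Venues backed by more than one per-sport record.
shared = {key: ids for key, ids in group_by_venue(records).items() if len(ids) > 1}
print(shared)
```

The per-sport records stay untouched; the grouping is derived on demand, so sport-specific filtering keeps working as described above.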

### 4.3 Update `canonicalize_stadiums.py`

Add to `generate_stadiums_from_teams()`:

```python
def generate_stadiums_from_teams() -> list[Stadium]:
    """Generate stadium entries from team mappings."""
    stadiums = []

    # Existing: NBA, MLB, NHL
    for abbrev, data in NBA_TEAMS.items():
        stadiums.append(create_stadium(data, 'NBA', [abbrev]))
    # ... existing MLB, NHL

    # NEW: WNBA
    for abbrev, data in WNBA_TEAMS.items():
        stadiums.append(Stadium(
            id=f"wnba_{normalize_name(data['arena'])}",
            name=data['arena'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,  # Geocoded later
            longitude=0.0,
            capacity=0,
            sport='WNBA',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    # NEW: MLS
    for abbrev, data in MLS_TEAMS.items():
        stadiums.append(Stadium(
            id=f"mls_{normalize_name(data['stadium'])}",
            name=data['stadium'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,
            longitude=0.0,
            capacity=0,
            sport='MLS',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    # NEW: NWSL
    for abbrev, data in NWSL_TEAMS.items():
        stadiums.append(Stadium(
            id=f"nwsl_{normalize_name(data['stadium'])}",
            name=data['stadium'],
            city=data['city'],
            state=get_state_for_city(data['city']),
            latitude=0.0,
            longitude=0.0,
            capacity=0,
            sport='NWSL',
            team_abbrevs=[abbrev],
            source='team_mapping'
        ))

    return stadiums
```

### 4.4 Update `canonicalize_teams.py`

Add league structure mappings:

```python
# WNBA has no conferences/divisions in the traditional sense
WNBA_DIVISIONS = {abbrev: (None, None) for abbrev in WNBA_TEAMS}

# MLS Conferences
MLS_DIVISIONS = {
    # Eastern Conference
    'ATL': ('mls_eastern', None),
    'CHI': ('mls_eastern', None),
    'CIN': ('mls_eastern', None),
    'CLB': ('mls_eastern', None),
    'CLT': ('mls_eastern', None),
    'DCU': ('mls_eastern', None),
    'MIA': ('mls_eastern', None),
    'MTL': ('mls_eastern', None),
    'NSH': ('mls_eastern', None),
    'NER': ('mls_eastern', None),
    'NYC': ('mls_eastern', None),
    'NYR': ('mls_eastern', None),
    'ORL': ('mls_eastern', None),
    'PHI': ('mls_eastern', None),
    'TOR': ('mls_eastern', None),
    # Western Conference
    'AUS': ('mls_western', None),
    'COL': ('mls_western', None),
    'DAL': ('mls_western', None),
    'HOU': ('mls_western', None),
    'LAG': ('mls_western', None),
    'LAF': ('mls_western', None),
    'MIN': ('mls_western', None),
    'POR': ('mls_western', None),
    'RSL': ('mls_western', None),
    'SJE': ('mls_western', None),
    'SEA': ('mls_western', None),
    'SKC': ('mls_western', None),
    'STL': ('mls_western', None),
    'VAN': ('mls_western', None),
    'SDG': ('mls_western', None),
}

# NWSL has no conferences
NWSL_DIVISIONS = {abbrev: (None, None) for abbrev in NWSL_TEAMS}
```

---

## 5. Local Canonical JSON Updates

### 5.1 stadiums_canonical.json

New entries follow existing format:

```json
{
  "canonical_id": "stadium_wnba_barclays_center",
  "name": "Barclays Center",
  "city": "Brooklyn",
  "state": "NY",
  "latitude": 40.6826,
  "longitude": -73.9754,
  "capacity": 17732,
  "sport": "WNBA",
  "primary_team_abbrevs": ["NYL"],
  "year_opened": 2012
}
```

### 5.2 teams_canonical.json

```json
{
  "canonical_id": "team_wnba_nyl",
  "name": "New York Liberty",
  "abbreviation": "NYL",
  "sport": "WNBA",
  "city": "Brooklyn",
  "stadium_canonical_id": "stadium_wnba_barclays_center",
  "conference_id": null,
  "division_id": null,
  "primary_color": "#6ECEB2",
  "secondary_color": "#000000"
}
```

### 5.3 games_canonical.json

```json
{
  "canonical_id": "game_wnba_2025_20250518_dal_atl",
  "sport": "WNBA",
  "season": "2025",
  "date": "2025-05-18",
  "time": "7:30p",
  "home_team_canonical_id": "team_wnba_atl",
  "away_team_canonical_id": "team_wnba_dal",
  "stadium_canonical_id": "stadium_wnba_gateway_center_arena",
  "is_playoff": false,
  "broadcast": null
}
```

### 5.4 Validation Rules

Update `validate_canonical.py`:

```python
VALID_SPORTS = {'NBA', 'MLB', 'NHL', 'WNBA', 'MLS', 'NWSL'}

def validate_sport_field(sport: str) -> list[str]:
    """Validate sport is one of the supported values."""
    errors = []
    if sport not in VALID_SPORTS:
        errors.append(f"Invalid sport: {sport}. Must be one of {VALID_SPORTS}")
    return errors
```
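
Since every canonical ID embeds the sport (section 4.1), a further check can confirm that the ID prefix and the record's `sport` field agree. The following is a sketch under that assumption; `CANONICAL_ID_RE` and `validate_canonical_id()` are hypothetical additions, not existing parts of `validate_canonical.py`.

```python
import re

# kind_sport_rest, e.g. "stadium_wnba_barclays_center"
CANONICAL_ID_RE = re.compile(r'^(stadium|team|game)_([a-z]+)_[a-z0-9_]+$')


def validate_canonical_id(record: dict) -> list[str]:
    """Check the ID shape and that its sport segment matches record['sport']."""
    errors = []
    canonical_id = record.get('canonical_id', '')
    match = CANONICAL_ID_RE.match(canonical_id)
    if not match:
        errors.append(f"Malformed canonical_id: {canonical_id!r}")
        return errors

    id_sport = match.group(2)
    field_sport = str(record.get('sport', '')).lower()
    if id_sport != field_sport:
        errors.append(
            f"{canonical_id}: ID says '{id_sport}' but sport field is '{field_sport}'"
        )
    return errors


print(validate_canonical_id({'canonical_id': 'team_mls_atl', 'sport': 'NWSL'}))
```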

---

## 6. CloudKit Integration

### 6.1 Record Types (Already Exist)

No new record types needed. Existing types support new sports:

- `Stadium` - add records with sport="WNBA"/"MLS"/"NWSL"
- `Team` - add records with sport="WNBA"/"MLS"/"NWSL"
- `Game` - add records with sport="WNBA"/"MLS"/"NWSL"
- `StadiumAlias` - unchanged
- `TeamAlias` - unchanged
- `LeagueStructure` - add new entries for MLS conferences

### 6.2 Field Mapping (Unchanged)

**Stadium Record**:
```
recordName: canonical_id (e.g., "stadium_wnba_barclays_center")
fields:
  - uuid: STRING (deterministic from canonical_id)
  - name: STRING
  - city: STRING
  - state: STRING
  - latitude: DOUBLE
  - longitude: DOUBLE
  - capacity: INT64
  - sport: STRING ("WNBA", "MLS", "NWSL")
  - yearOpened: INT64
  - imageURL: STRING (optional)
  - lastModified: TIMESTAMP
  - schemaVersion: INT64
```

**Team Record**:
```
recordName: canonical_id (e.g., "team_wnba_nyl")
fields:
  - uuid: STRING
  - name: STRING
  - abbreviation: STRING
  - sport: STRING
  - city: STRING
  - stadiumCanonicalId: STRING (reference by canonical_id)
  - conferenceId: STRING (optional)
  - divisionId: STRING (optional)
  - primaryColor: STRING
  - secondaryColor: STRING
  - lastModified: TIMESTAMP
  - schemaVersion: INT64
```

**Game Record**:
```
recordName: canonical_id (e.g., "game_wnba_2025_20250518_dal_atl")
fields:
  - uuid: STRING
  - sport: STRING
  - season: STRING
  - dateTime: TIMESTAMP
  - homeTeamCanonicalId: STRING
  - awayTeamCanonicalId: STRING
  - stadiumCanonicalId: STRING
  - isPlayoff: INT64 (0 or 1)
  - broadcastInfo: STRING (optional)
  - lastModified: TIMESTAMP
  - schemaVersion: INT64
```
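
The mapping states only that `uuid` is deterministic from `canonical_id`, that `dateTime` is a single timestamp, and that `isPlayoff` is stored as 0 or 1. One way these three conversions could look is sketched below. It is an illustration under assumptions: name-based UUIDs (`uuid.uuid5`) with a made-up namespace, the short `7:30p` time format from `games_canonical.json`, and a caller-supplied timezone for the venue. `SPORTSTIME_NAMESPACE`, `deterministic_uuid`, and `game_record_fields` are hypothetical names, and the actual import script may derive these values differently.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical namespace: any fixed UUID works, provided it never changes.
SPORTSTIME_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, 'sportstime.example')


def deterministic_uuid(canonical_id: str) -> str:
    """Same canonical_id in, same UUID string out, on every run."""
    return str(uuid.uuid5(SPORTSTIME_NAMESPACE, canonical_id))


def game_record_fields(game: dict, venue_tz) -> dict:
    """Map one games_canonical.json entry to a subset of Game record fields."""
    # Split a short 12hr time such as "7:30p" into clock part and a/p suffix.
    clock, suffix = game['time'][:-1], game['time'][-1]
    hour, minute = (int(part) for part in clock.split(':'))
    hour = hour % 12 + (12 if suffix == 'p' else 0)

    local = datetime.strptime(game['date'], '%Y-%m-%d').replace(
        hour=hour, minute=minute, tzinfo=venue_tz)
    return {
        'uuid': deterministic_uuid(game['canonical_id']),
        'dateTime': local.astimezone(timezone.utc),   # store one UTC instant
        'isPlayoff': 1 if game['is_playoff'] else 0,  # INT64 flag
    }


# A fixed UTC-4 offset stands in for the venue's real timezone in this example.
eastern_daylight = timezone(timedelta(hours=-4))
fields = game_record_fields(
    {'canonical_id': 'game_wnba_2025_20250518_dal_atl', 'date': '2025-05-18',
     'time': '7:30p', 'is_playoff': False},
    eastern_daylight,
)
print(fields)
```

Passing the timezone in keeps the conversion explicit, which matters for the Canadian MLS venues noted in section 1; a `zoneinfo.ZoneInfo` instance can be supplied in place of the fixed offset.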

### 6.3 Index Requirements

Ensure CloudKit has indexes for:
- `Game`: `sport` (queryable, sortable), `dateTime` (sortable, queryable)
- `Team`: `sport` (queryable)
- `Stadium`: `sport` (queryable)

### 6.4 Import Script Updates

Update `cloudkit_import.py` to handle new sports in validation:

```python
VALID_SPORTS = {'NBA', 'MLB', 'NHL', 'WNBA', 'MLS', 'NWSL'}

def validate_game_record(game: dict) -> list[str]:
    errors = []
    if game.get('sport') not in VALID_SPORTS:
        errors.append(f"Invalid sport: {game.get('sport')}")
    return errors
```

---

## 7. App-Side Integration (SwiftUI)

### 7.1 Update Sport Enum

**File**: `SportsTime/Core/Models/Domain/Sport.swift`

```swift
import SwiftUI  // needed for Color

enum Sport: String, Codable, CaseIterable, Identifiable {
    case mlb = "MLB"
    case nba = "NBA"
    case nhl = "NHL"
    case nfl = "NFL"
    case mls = "MLS"
    case wnba = "WNBA"
    case nwsl = "NWSL"

    var id: String { rawValue }

    var displayName: String {
        switch self {
        case .mlb: return "Major League Baseball"
        case .nba: return "National Basketball Association"
        case .nhl: return "National Hockey League"
        case .nfl: return "National Football League"
        case .mls: return "Major League Soccer"
        case .wnba: return "Women's National Basketball Association"
        case .nwsl: return "National Women's Soccer League"
        }
    }

    var iconName: String {
        switch self {
        case .mlb: return "baseball.fill"
        case .nba: return "basketball.fill"
        case .nhl: return "hockey.puck.fill"
        case .nfl: return "football.fill"
        case .mls: return "soccerball"
        case .wnba: return "basketball.fill"
        case .nwsl: return "soccerball"
        }
    }

    var color: Color {
        switch self {
        case .mlb: return .red
        case .nba: return .orange
        case .nhl: return .blue
        case .nfl: return .brown
        case .mls: return .green
        case .wnba: return .purple
        case .nwsl: return .pink
        }
    }

    var seasonMonths: (start: Int, end: Int) {
        switch self {
        case .mlb: return (3, 10)   // March - October
        case .nba: return (10, 6)   // October - June (wraps)
        case .nhl: return (10, 6)   // October - June (wraps)
        case .nfl: return (9, 2)    // September - February (wraps)
        case .mls: return (2, 12)   // February - December
        case .wnba: return (5, 10)  // May - October
        case .nwsl: return (3, 11)  // March - November
        }
    }

    /// Currently supported sports
    static var supported: [Sport] {
        [.mlb, .nba, .nhl, .wnba, .mls, .nwsl]
    }
}
```
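
The `seasonMonths` ranges marked "(wraps)" cross the year boundary, so a plain `start <= month <= end` test fails for them. The logic is sketched here in Python, to match the pipeline scripts rather than the app code; `in_season` is a hypothetical helper and the same branch structure would apply in Swift.

```python
def in_season(month: int, start: int, end: int) -> bool:
    """True if `month` (1-12) falls inside an inclusive, possibly wrapping range."""
    if start <= end:
        # Ordinary range within one calendar year, e.g. WNBA (5, 10).
        return start <= month <= end
    # Wrapping range, e.g. NBA (10, 6): October..December or January..June.
    return month >= start or month <= end


print(in_season(12, 10, 6))  # December, NBA
print(in_season(7, 10, 6))   # July, NBA
print(in_season(7, 5, 10))   # July, WNBA
```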

### 7.2 Trip Planner - No Changes Required

The trip planner uses `Sport` enum and fetches games by sport. New sports automatically work because:

1. `DataProvider.fetchGames(sports:startDate:endDate:)` queries by sport string
2. Games are filtered by `sportStrings.contains(canonical.sport)`
3. Route planning is sport-agnostic (uses stadium coordinates)

### 7.3 Stadium Tracker - No Changes Required

Stadium progress uses `Stadium.sport` field. New sports automatically appear in:
- Stadium list filtering by sport
- Progress tracking per sport

### 7.4 UI Considerations

**Sport Selection Chips**: The `SportSelectionChip` already uses `Sport.allCases`. Adding new cases automatically adds them to the UI.

**Filter Sections**: Update default selections if desired:
```swift
// In TripCreationViewModel
var selectedSports: Set<Sport> = [.mlb, .nba, .nhl]  // Consider adding new sports
```
|
|
|
|
---
|
|
|
|
## 8. Testing & Validation
|
|
|
|
### 8.1 Data Integrity Checks
|
|
|
|
**Python validation queries** (add to `validate_canonical.py`):
|
|
|
|
```python
def validate_new_sports(stadiums, teams, games):
    """Validate WNBA, MLS, NWSL data integrity."""
    errors = []

    # Check all sports have stadiums, teams, and games
    for sport in ['WNBA', 'MLS', 'NWSL']:
        sport_stadiums = [s for s in stadiums if s['sport'] == sport]
        if not sport_stadiums:
            errors.append(f"No stadiums for {sport}")

        sport_teams = [t for t in teams if t['sport'] == sport]
        if not sport_teams:
            errors.append(f"No teams for {sport}")

        sport_games = [g for g in games if g['sport'] == sport]
        if not sport_games:
            errors.append(f"No games for {sport}")

    # Check team->stadium references
    stadium_ids = {s['canonical_id'] for s in stadiums}
    for team in teams:
        if team['stadium_canonical_id'] not in stadium_ids:
            errors.append(f"Team {team['canonical_id']} references unknown stadium {team['stadium_canonical_id']}")

    # Check game->team and game->stadium references
    team_ids = {t['canonical_id'] for t in teams}
    for game in games:
        if game['home_team_canonical_id'] not in team_ids:
            errors.append(f"Game {game['canonical_id']} references unknown home team")
        if game['away_team_canonical_id'] not in team_ids:
            errors.append(f"Game {game['canonical_id']} references unknown away team")
        if game['stadium_canonical_id'] not in stadium_ids:
            errors.append(f"Game {game['canonical_id']} references unknown stadium")

    return errors
```
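
The reference checks build sets of `canonical_id` values, and a set silently collapses duplicates. A complementary duplicate-ID check could look like the sketch below. This is an assumption about what would be useful, not something the pipeline is known to do; `find_duplicate_ids` is a hypothetical helper, and the only field it relies on is `canonical_id`.

```python
from collections import Counter


def find_duplicate_ids(records, kind):
    """Return one error string per canonical_id that appears more than once.

    `records` is a list of dicts with a 'canonical_id' key; `kind` is a
    label ("stadium", "team", "game") used only in the message text.
    """
    counts = Counter(record["canonical_id"] for record in records)
    return [
        f"Duplicate {kind} canonical_id: {cid} ({n} records)"
        for cid, n in sorted(counts.items())
        if n > 1
    ]
```

The returned strings can be appended to the same `errors` list, for example `errors.extend(find_duplicate_ids(stadiums, "stadium"))`.
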

### 8.2 App Smoke Tests

1. **Sport Selection**:
   - Open Trip Creation
   - Verify WNBA, MLS, NWSL chips appear
   - Select each new sport
   - Verify games load for date range

2. **Trip Planning**:
   - Select WNBA + dates during WNBA season
   - Verify trip results show WNBA games
   - Verify stadium locations are correct

3. **Stadium Progress**:
   - Navigate to Progress tab
   - Filter by WNBA/MLS/NWSL
   - Verify stadium list shows correct venues

4. **Mixed Sport Trips**:
   - Select NBA + WNBA (they share arenas)
   - Verify trips correctly handle both sports
   - Verify no duplicate stadiums in single stop

### 8.3 Edge Case Tests

1. **Shared Venues**:
   - Create trip with MLS Atlanta United + NFL Falcons (same venue)
   - Verify games at Mercedes-Benz Stadium appear for both sports

2. **Canadian Teams** (MLS/NWSL):
   - Create trip including Toronto FC
   - Verify timezone handling is correct

3. **Midweek Matches** (MLS):
   - Verify Wednesday/Thursday games don't break route planning

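
The shared-venue case can also be checked on the data side before testing in the app. The sketch below groups games by venue and reports venues that host more than one sport. It assumes games carry the `stadium_canonical_id` and `sport` fields used by the validation code in 8.1, and that a shared venue is stored as a single stadium record; `sports_by_stadium` and `shared_venues` are hypothetical helper names.

```python
from collections import defaultdict


def sports_by_stadium(games):
    """Map each stadium_canonical_id to the set of sports played there."""
    venues = defaultdict(set)
    for game in games:
        venues[game["stadium_canonical_id"]].add(game["sport"])
    return dict(venues)


def shared_venues(games):
    """Return only the venues that host games for more than one sport."""
    return {
        stadium_id: sports
        for stadium_id, sports in sports_by_stadium(games).items()
        if len(sports) > 1
    }
```
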
---

## 9. Pipeline Update Summary

### run_canonicalization_pipeline.py Changes

```python
# In run_pipeline():

# STAGE 1: SCRAPING
# ... existing NBA, MLB, NHL ...

# NEW: WNBA
print_section(f"WNBA {season}")
wnba_games = scrape_wnba_basketball_reference(season)
wnba_games = assign_stable_ids(wnba_games, 'WNBA', str(season))
all_games.extend(wnba_games)
print(f" Scraped {len(wnba_games)} WNBA games")

# NEW: MLS
print_section(f"MLS {season}")
mls_games = scrape_mls_fbref(season)
mls_games = assign_stable_ids(mls_games, 'MLS', str(season))
all_games.extend(mls_games)
print(f" Scraped {len(mls_games)} MLS games")

# NEW: NWSL
print_section(f"NWSL {season}")
nwsl_games = scrape_nwsl_fbref(season)
nwsl_games = assign_stable_ids(nwsl_games, 'NWSL', str(season))
all_games.extend(nwsl_games)
print(f" Scraped {len(nwsl_games)} NWSL games")
```
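
The three new stanzas repeat the same scrape, assign IDs, extend, and report steps. One possible way to reduce the repetition is a table-driven loop, sketched below. This is an alternative shape and not how the pipeline is known to be written: `scrape_all` is a hypothetical helper, and the scraper functions and `assign_stable_ids` are passed in as parameters so the sketch does not depend on their real implementations. In the pipeline, `scrapers` would presumably be `{"WNBA": scrape_wnba_basketball_reference, "MLS": scrape_mls_fbref, "NWSL": scrape_nwsl_fbref}`.

```python
def scrape_all(season, scrapers, assign_ids):
    """Run each scraper for `season` and collect the ID-stamped games.

    scrapers:   dict mapping sport code -> callable(season) -> list of games
    assign_ids: callable(games, sport, season_str) -> list of games,
                with the same argument order as assign_stable_ids above
    """
    all_games = []
    for sport, scrape in scrapers.items():
        games = assign_ids(scrape(season), sport, str(season))
        all_games.extend(games)
        print(f"  Scraped {len(games)} {sport} games")
    return all_games
```
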

---

## 10. Checklist

### Definition of Done

- [ ] **Scraping**: WNBA, MLS, NWSL scrapers added and tested
- [ ] **Team Mappings**: All current teams with correct abbreviations
- [ ] **Stadiums**: All venues canonicalized with coordinates
- [ ] **Canonicalization**: Pipeline runs without errors for new sports
- [ ] **Validation**: All integrity checks pass
- [ ] **CloudKit**: Records uploaded successfully
- [ ] **Swift Enum**: Sport cases added with correct metadata
- [ ] **Trip Planning**: New sports can be planned into trips
- [ ] **Stadium Tracking**: New stadiums appear in progress
- [ ] **No Regressions**: Existing MLB/NBA/NHL functionality unchanged

### Files Modified

| File | Changes |
|------|---------|
| `Scripts/scrape_schedules.py` | Add team mappings, scrapers |
| `Scripts/canonicalize_stadiums.py` | Generate new sport stadiums |
| `Scripts/canonicalize_teams.py` | Add league structure mappings |
| `Scripts/run_canonicalization_pipeline.py` | Add scraping calls |
| `Scripts/validate_canonical.py` | Add new sport validation |
| `Scripts/cloudkit_import.py` | Add sport validation |
| `SportsTime/Core/Models/Domain/Sport.swift` | Add enum cases |
| `SportsTime/Resources/stadiums_canonical.json` | New venue records |
| `SportsTime/Resources/teams_canonical.json` | New team records |
| `SportsTime/Resources/games_canonical.json` | New game records |