chore: remove scraper, add docs, add marketing-videos gitignore

- Remove Scripts/ directory (scraper no longer needed)
- Add themed background documentation to CLAUDE.md
- Add .gitignore for marketing-videos to prevent node_modules tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Trey t
2026-01-26 18:13:12 -06:00
parent bfa172de38
commit dbb0099776
129 changed files with 14805 additions and 25325 deletions

View File

@@ -20,7 +20,14 @@
"Bash(git add:*)",
"Bash(git commit:*)",
"Skill(superpowers:writing-plans)",
"Bash(find:*)"
"Bash(find:*)",
"Bash(python:*)",
"Skill(remotion)",
"Bash(npm install:*)",
"Bash(npx tsc:*)",
"Bash(timeout 10 npx remotion:*)",
"Bash(npx remotion:*)",
"Bash(xcrun xcresulttool:*)"
]
}
}

View File

@@ -137,6 +137,30 @@ TripCreationView → TripCreationViewModel → PlanningRequest
- CloudKit container ID: `iCloud.com.sportstime.app`
- `PDFGenerator` and `ExportService` are `@MainActor final class` (not actors) because they access MainActor-isolated UI properties and use UIKit drawing
### Themed Background System
All views use `.themedBackground()` modifier for consistent backgrounds app-wide.
**Components** (`Core/Theme/`):
- `ThemedBackground` (ViewModifiers.swift) - Conditionally shows static gradient or animated background
- `AnimatedSportsBackground` (AnimatedBackground.swift) - Floating sports icons with route lines
- `DesignStyleManager.shared.animationsEnabled` - Toggle controlled in Settings
**How it works:**
```swift
// All views apply this modifier - animation state is automatic
.themedBackground()
// The modifier checks DesignStyleManager internally:
if DesignStyleManager.shared.animationsEnabled {
AnimatedSportsBackground() // Floating icons + route lines
} else {
Theme.backgroundGradient(colorScheme) // Static gradient
}
```
**Adding new screens:** Just apply `.themedBackground()` - no need to handle animation logic.
### iOS 26 API Notes
**Deprecated APIs** (use with `@available(iOS, deprecated: 26.0)` annotation):

View File

@@ -1,27 +0,0 @@
{
"permissions": {
"allow": [
"Skill(superpowers:brainstorming)",
"Skill(superpowers:writing-plans)",
"Skill(superpowers:using-git-worktrees)",
"Skill(superpowers:subagent-driven-development)",
"Bash(git add:*)",
"Bash(git commit:*)",
"WebSearch",
"Bash(wc:*)",
"Bash(find:*)",
"Skill(lead-designer)",
"Skill(frontend-design:frontend-design)",
"Bash(python3:*)",
"Bash(ls:*)",
"Bash(xargs basename:*)",
"Bash(python -m sportstime_parser:*)",
"Bash(python -m py_compile:*)",
"WebFetch(domain:en.wikipedia.org)",
"Bash(tree:*)",
"Bash(python:*)",
"Bash(grep:*)",
"Bash(xargs cat:*)"
]
}
}

49
Scripts/.gitignore vendored
View File

@@ -1,49 +0,0 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Virtual environments
venv/
ENV/
env/
.venv/
# IDE
.idea/
.vscode/
*.swp
*.swo
# Output and logs
output/
logs/
*.log
# Secrets
*.pem
.env
.env.*
# Parser state
.parser_state/
# Claude Code
.claude/

View File

@@ -1,833 +0,0 @@
# SportsTime Parser
A Python package for scraping, normalizing, and uploading sports schedule data to CloudKit for the SportsTime iOS app.
## Table of Contents
- [Overview](#overview)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Architecture](#architecture)
- [Directory Structure](#directory-structure)
- [Configuration](#configuration)
- [Data Models](#data-models)
- [Normalizers](#normalizers)
- [Scrapers](#scrapers)
- [Uploaders](#uploaders)
- [Utilities](#utilities)
- [Manual Review Workflow](#manual-review-workflow)
- [Adding a New Sport](#adding-a-new-sport)
- [Troubleshooting](#troubleshooting)
## Overview
The `sportstime_parser` package provides a complete pipeline for:
1. **Scraping** game schedules from multiple sources (Basketball-Reference, ESPN, MLB API, etc.)
2. **Normalizing** raw data to canonical identifiers (teams, stadiums, games)
3. **Resolving** team/stadium names using exact matching, historical aliases, and fuzzy matching
4. **Uploading** data to CloudKit with diff-based sync and resumable uploads
### Supported Sports
| Sport | Code | Sources | Season Format |
|-------|------|---------|---------------|
| NBA | `nba` | Basketball-Reference, ESPN, CBS | Oct-Jun (split year) |
| MLB | `mlb` | Baseball-Reference, MLB API, ESPN | Mar-Nov (single year) |
| NFL | `nfl` | ESPN, Pro-Football-Reference, CBS | Sep-Feb (split year) |
| NHL | `nhl` | Hockey-Reference, NHL API, ESPN | Oct-Jun (split year) |
| MLS | `mls` | ESPN, FBref | Feb-Nov (single year) |
| WNBA | `wnba` | ESPN | May-Oct (single year) |
| NWSL | `nwsl` | ESPN | Mar-Nov (single year) |
## Installation
```bash
cd Scripts
pip install -r requirements.txt
```
### Dependencies
- `requests` - HTTP requests with session management
- `beautifulsoup4` + `lxml` - HTML parsing
- `rapidfuzz` - Fuzzy string matching
- `pyjwt` + `cryptography` - CloudKit JWT authentication
- `rich` - Terminal UI (progress bars, logging)
- `pytz` / `timezonefinder` - Timezone detection
## Quick Start
### Scrape a Single Sport
```python
from sportstime_parser.scrapers import create_nba_scraper
scraper = create_nba_scraper(season=2025)
result = scraper.scrape_all()
print(f"Games: {result.game_count}")
print(f"Teams: {result.team_count}")
print(f"Stadiums: {result.stadium_count}")
print(f"Needs review: {result.review_count}")
```
### Upload to CloudKit
```python
from sportstime_parser.uploaders import CloudKitClient, RecordDiffer
client = CloudKitClient(environment="development")
differ = RecordDiffer()
# Compare local vs remote
diff = differ.diff_games(local_games, remote_records)
# Upload changes
records = diff.get_records_to_upload()
result = await client.save_records(records)
```
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ DATA SOURCES │
│ Basketball-Reference │ ESPN API │ MLB API │ Hockey-Reference │ etc. │
└────────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ SCRAPERS │
│ NBAScraper │ MLBScraper │ NFLScraper │ NHLScraper │ MLSScraper │ etc. │
│ │
│ Features: │
│ • Multi-source fallback (try sources in priority order) │
│ • Automatic rate limiting with exponential backoff │
│ • Doubleheader detection │
│ • International game filtering (NFL London, NHL Global Series) │
└────────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ NORMALIZERS │
│ TeamResolver │ StadiumResolver │ CanonicalIdGenerator │ AliasLoader │
│ │
│ Resolution Strategy (in order): │
│ 1. Exact match against canonical mappings │
│ 2. Date-aware alias lookup (handles renames/relocations) │
│ 3. Fuzzy matching with confidence threshold (85%) │
│ 4. Flag for manual review if unresolved or low confidence │
└────────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ DATA MODELS │
│ Game │ Team │ Stadium │ ManualReviewItem │
│ │
│ All models use canonical IDs: │
│ • team_nba_lal (Los Angeles Lakers) │
│ • stadium_nba_los_angeles_lakers (Crypto.com Arena) │
│ • game_nba_2025_20251022_bos_lal (specific game) │
└────────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ UPLOADERS │
│ CloudKitClient │ RecordDiffer │ StateManager │
│ │
│ Features: │
│ • JWT authentication with Apple's CloudKit Web Services │
│ • Batch operations (up to 200 records per request) │
│ • Diff-based sync (only upload changes) │
│ • Resumable uploads with persistent state │
└────────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ CLOUDKIT │
│ Public Database: Games, Teams, Stadiums, Aliases │
└─────────────────────────────────────────────────────────────────────────┘
```
## Directory Structure
```
Scripts/
├── README.md # This file
├── requirements.txt # Python dependencies
├── pyproject.toml # Package configuration
├── league_structure.json # League hierarchy (conferences, divisions)
├── team_aliases.json # Historical team name mappings
├── stadium_aliases.json # Historical stadium name mappings
├── logs/ # Runtime logs (auto-created)
├── output/ # Scrape output files (auto-created)
└── sportstime_parser/ # Main package
├── __init__.py
├── config.py # Configuration constants
├── SOURCES.md # Data source documentation
├── models/ # Data classes
│ ├── game.py # Game model
│ ├── team.py # Team model
│ ├── stadium.py # Stadium model
│ └── aliases.py # Alias and ManualReviewItem models
├── normalizers/ # Name resolution
│ ├── canonical_id.py # ID generation
│ ├── alias_loader.py # Alias loading and resolution
│ ├── fuzzy.py # Fuzzy string matching
│ ├── timezone.py # Timezone detection
│ ├── team_resolver.py # Team name resolution
│ └── stadium_resolver.py # Stadium name resolution
├── scrapers/ # Sport-specific scrapers
│ ├── base.py # Abstract base scraper
│ ├── nba.py # NBA scraper
│ ├── mlb.py # MLB scraper
│ ├── nfl.py # NFL scraper
│ ├── nhl.py # NHL scraper
│ ├── mls.py # MLS scraper
│ ├── wnba.py # WNBA scraper
│ └── nwsl.py # NWSL scraper
├── uploaders/ # CloudKit integration
│ ├── cloudkit.py # CloudKit Web Services client
│ ├── diff.py # Record diffing
│ └── state.py # Resumable upload state
└── utils/ # Shared utilities
├── logging.py # Rich-based logging
├── http.py # Rate-limited HTTP client
└── progress.py # Progress tracking
```
## Configuration
### config.py
Key configuration constants:
```python
# Directories
SCRIPTS_DIR = Path(__file__).parent.parent # Scripts/
OUTPUT_DIR = SCRIPTS_DIR / "output" # JSON output
STATE_DIR = SCRIPTS_DIR / ".parser_state" # Upload state
# CloudKit
CLOUDKIT_CONTAINER = "iCloud.com.sportstime.app"
CLOUDKIT_ENVIRONMENT = "development" # or "production"
# Rate Limiting
DEFAULT_REQUEST_DELAY = 3.0 # seconds between requests
MAX_RETRIES = 3 # retry attempts
BACKOFF_FACTOR = 2.0 # exponential backoff multiplier
INITIAL_BACKOFF = 5.0 # initial backoff duration
# Fuzzy Matching
FUZZY_THRESHOLD = 85 # minimum match confidence (0-100)
# Expected game counts (for validation)
EXPECTED_GAME_COUNTS = {
"nba": 1230, # 30 teams × 82 games ÷ 2
"mlb": 2430, # 30 teams × 162 games ÷ 2
"nfl": 272, # Regular season only
"nhl": 1312, # 32 teams × 82 games ÷ 2
"mls": 544, # 29 teams × ~34 games ÷ 2
"wnba": 228, # 12 teams × 40 games ÷ 2
"nwsl": 182, # 14 teams × 26 games ÷ 2
}
# Geography (for filtering international games)
ALLOWED_COUNTRIES = {"USA", "Canada"}
```
### league_structure.json
Defines the hierarchical structure of each league:
```json
{
"nba": {
"name": "National Basketball Association",
"conferences": {
"Eastern": {
"divisions": {
"Atlantic": ["BOS", "BKN", "NYK", "PHI", "TOR"],
"Central": ["CHI", "CLE", "DET", "IND", "MIL"],
"Southeast": ["ATL", "CHA", "MIA", "ORL", "WAS"]
}
},
"Western": { ... }
}
},
"mlb": { ... },
...
}
```
### team_aliases.json / stadium_aliases.json
Historical name mappings with validity dates:
```json
{
"team_mlb_athletics": [
{
"alias": "Oakland Athletics",
"alias_type": "full_name",
"valid_from": "1968-01-01",
"valid_until": "2024-12-31"
},
{
"alias": "Las Vegas Athletics",
"alias_type": "full_name",
"valid_from": "2028-01-01",
"valid_until": null
}
]
}
```
## Data Models
### Game
```python
@dataclass
class Game:
id: str # Canonical ID: game_{sport}_{season}_{date}_{away}_{home}
sport: str # Sport code (nba, mlb, etc.)
season: int # Season start year
home_team_id: str # Canonical team ID
away_team_id: str # Canonical team ID
stadium_id: str # Canonical stadium ID
game_date: datetime # UTC datetime
game_number: Optional[int] # 1 or 2 for doubleheaders
home_score: Optional[int] # None if not played
away_score: Optional[int]
status: str # scheduled, final, postponed, cancelled
source_url: Optional[str] # For manual review
raw_home_team: Optional[str] # Original scraped value
raw_away_team: Optional[str]
raw_stadium: Optional[str]
```
### Team
```python
@dataclass
class Team:
id: str # Canonical ID: team_{sport}_{abbrev}
sport: str
city: str # e.g., "Los Angeles"
name: str # e.g., "Lakers"
full_name: str # e.g., "Los Angeles Lakers"
abbreviation: str # e.g., "LAL"
conference: Optional[str] # e.g., "Western"
division: Optional[str] # e.g., "Pacific"
stadium_id: Optional[str] # Home stadium
primary_color: Optional[str]
secondary_color: Optional[str]
logo_url: Optional[str]
```
### Stadium
```python
@dataclass
class Stadium:
id: str # Canonical ID: stadium_{sport}_{city_team}
sport: str
name: str # Current name (e.g., "Crypto.com Arena")
city: str
state: Optional[str]
country: str
latitude: Optional[float]
longitude: Optional[float]
capacity: Optional[int]
surface: Optional[str] # grass, turf, ice, hardwood
roof_type: Optional[str] # dome, retractable, open
opened_year: Optional[int]
image_url: Optional[str]
timezone: Optional[str]
```
### ManualReviewItem
```python
@dataclass
class ManualReviewItem:
item_type: str # "team" or "stadium"
raw_value: str # Original scraped value
suggested_id: Optional[str] # Best fuzzy match (if any)
confidence: float # 0.0 - 1.0
reason: str # Why review is needed
source_url: Optional[str] # Where it came from
sport: str
check_date: Optional[date] # For date-aware alias lookup
```
## Normalizers
### Canonical ID Generation
IDs are deterministic and immutable:
```python
# Team ID
generate_team_id("nba", "LAL")
# → "team_nba_lal"
# Stadium ID
generate_stadium_id("nba", "Los Angeles", "Lakers")
# → "stadium_nba_los_angeles_lakers"
# Game ID
generate_game_id(
sport="nba",
season=2025,
away_abbrev="BOS",
home_abbrev="LAL",
game_date=datetime(2025, 10, 22),
game_number=None
)
# → "game_nba_2025_20251022_bos_lal"
# Doubleheader Game ID
generate_game_id(..., game_number=2)
# → "game_nba_2025_20251022_bos_lal_2"
```
### Team Resolution
The `TeamResolver` uses a three-stage strategy:
```python
resolver = get_team_resolver("nba")
result = resolver.resolve(
"Los Angeles Lakers",
check_date=date(2025, 10, 22),
source_url="https://..."
)
# Result:
# - canonical_id: "team_nba_lal"
# - confidence: 1.0 (exact match)
# - review_item: None
```
**Resolution stages:**
1. **Exact Match**: Check against canonical team mappings
- Full name: "Los Angeles Lakers"
- City + Name: "Los Angeles" + "Lakers"
- Abbreviation: "LAL"
2. **Alias Lookup**: Check historical aliases with date awareness
- "Oakland Athletics" → "team_mlb_athletics" (valid until 2024-12-31)
- Handles relocations: "Oakland" → "Las Vegas" transition
3. **Fuzzy Match**: Use rapidfuzz with 85% threshold
- "LA Lakers" → "Los Angeles Lakers" (92% match)
- Low-confidence matches flagged for review
### Stadium Resolution
Similar three-stage strategy with additional location awareness:
```python
resolver = get_stadium_resolver("nba")
result = resolver.resolve(
"Crypto.com Arena",
check_date=date(2025, 10, 22)
)
```
**Key features:**
- Handles naming rights changes (Staples Center → Crypto.com Arena)
- Date-aware: "Staples Center" resolves correctly for historical games
- Location-based fallback using latitude/longitude
## Scrapers
### Base Scraper
All scrapers extend `BaseScraper` with these features:
```python
class BaseScraper(ABC):
def __init__(self, sport: str, season: int): ...
# Required implementations
def _get_sources(self) -> list[str]: ...
def _scrape_games_from_source(self, source: str) -> list[RawGameData]: ...
def _normalize_games(self, raw_games) -> tuple[list[Game], list[ManualReviewItem]]: ...
def scrape_teams(self) -> list[Team]: ...
def scrape_stadiums(self) -> list[Stadium]: ...
# Built-in features
def scrape_games(self) -> ScrapeResult:
"""Multi-source fallback - tries each source in order."""
...
def scrape_all(self) -> ScrapeResult:
"""Scrapes games, teams, and stadiums with progress tracking."""
...
```
### NBA Scraper
```python
class NBAScraper(BaseScraper):
"""
Sources (in priority order):
1. Basketball-Reference - HTML tables, monthly pages
2. ESPN API - JSON, per-date queries
3. CBS Sports - Backup (not implemented)
Season: October to June (split year, e.g., 2025-26)
"""
```
**Basketball-Reference parsing:**
- URL: `https://www.basketball-reference.com/leagues/NBA_{year}_games-{month}.html`
- Table columns: date_game, visitor_team_name, home_team_name, visitor_pts, home_pts, arena_name
### MLB Scraper
```python
class MLBScraper(BaseScraper):
"""
Sources:
1. Baseball-Reference - Single page per season
2. MLB Stats API - Official API with date range queries
3. ESPN API - Backup
Season: March to November (single year)
Handles: Doubleheaders with game_number
"""
```
### NFL Scraper
```python
class NFLScraper(BaseScraper):
"""
Sources:
1. ESPN API - Week-based queries
2. Pro-Football-Reference - Single page per season
Season: September to February (split year)
Filters: International games (London, Mexico City, Frankfurt)
Scrapes: Preseason (4 weeks), Regular (18 weeks), Postseason (4 rounds)
"""
```
### NHL Scraper
```python
class NHLScraper(BaseScraper):
"""
Sources:
1. Hockey-Reference - Single page per season
2. NHL API - New API (api-web.nhle.com)
3. ESPN API - Backup
Season: October to June (split year)
Filters: International games (Prague, Stockholm, Helsinki)
"""
```
### MLS / WNBA / NWSL Scrapers
All use ESPN API as primary source with similar structure:
- Single calendar year seasons
- Conference-based organization (MLS) or single table (WNBA, NWSL)
## Uploaders
### CloudKit Client
```python
class CloudKitClient:
"""CloudKit Web Services API client with JWT authentication."""
def __init__(
self,
container_id: str = CLOUDKIT_CONTAINER,
environment: str = "development", # or "production"
key_id: str = None, # From CloudKit Dashboard
private_key: str = None, # EC P-256 private key
): ...
async def fetch_records(
self,
record_type: RecordType,
filter_by: Optional[dict] = None,
sort_by: Optional[str] = None,
) -> list[dict]: ...
async def save_records(
self,
records: list[CloudKitRecord],
batch_size: int = 200,
) -> BatchResult: ...
async def delete_records(
self,
record_names: list[str],
record_type: RecordType,
) -> BatchResult: ...
```
**Authentication:**
- Uses EC P-256 key pair
- JWT tokens signed with private key
- Tokens valid for 30 minutes
### Record Differ
```python
class RecordDiffer:
"""Compares local records with CloudKit records."""
def diff_games(self, local: list[Game], remote: list[dict]) -> DiffResult: ...
def diff_teams(self, local: list[Team], remote: list[dict]) -> DiffResult: ...
def diff_stadiums(self, local: list[Stadium], remote: list[dict]) -> DiffResult: ...
```
**DiffResult:**
```python
@dataclass
class DiffResult:
creates: list[RecordDiff] # New records to create
updates: list[RecordDiff] # Changed records to update
deletes: list[RecordDiff] # Remote records to delete
unchanged: list[RecordDiff] # Records with no changes
def get_records_to_upload(self) -> list[CloudKitRecord]:
"""Returns creates + updates ready for upload."""
```
### State Manager
```python
class StateManager:
"""Manages resumable upload state."""
def load_session(self, sport, season, environment) -> Optional[UploadSession]: ...
def save_session(self, session: UploadSession) -> None: ...
def get_session_or_create(
self,
sport, season, environment,
record_names: list[tuple[str, str]],
resume: bool = False,
) -> UploadSession: ...
```
**State persistence:**
- Stored in `.parser_state/upload_state_{sport}_{season}_{env}.json`
- Tracks: pending, uploaded, failed records
- Supports retry with backoff
## Utilities
### HTTP Client
```python
class RateLimitedSession:
"""HTTP session with rate limiting and exponential backoff."""
def __init__(
self,
delay: float = 3.0, # Seconds between requests
max_retries: int = 3,
backoff_factor: float = 2.0,
): ...
def get(self, url, **kwargs) -> Response: ...
def get_json(self, url, **kwargs) -> dict: ...
def get_html(self, url, **kwargs) -> str: ...
```
**Features:**
- User-agent rotation (5 different Chrome/Firefox/Safari agents)
- Per-domain rate limiting
- Automatic 429 handling with exponential backoff + jitter
- Connection pooling
### Logging
```python
from sportstime_parser.utils import get_logger, log_success, log_error
logger = get_logger() # Rich-formatted logger
logger.info("Starting scrape")
log_success("Scraped 1230 games") # Green checkmark
log_error("Failed to parse") # Red X
```
**Log output:**
- Console: Rich-formatted with colors
- File: `logs/parser_{timestamp}.log`
### Progress Tracking
```python
from sportstime_parser.utils import ScrapeProgress, track_progress
# Specialized scrape tracking
progress = ScrapeProgress("nba", 2025)
progress.start()
with progress.scraping_schedule(total_months=9) as advance:
for month in months:
fetch(month)
advance()
progress.finish() # Prints summary
# Generic progress bar
for game in track_progress(games, "Processing games"):
process(game)
```
## Manual Review Workflow
When the system can't confidently resolve a team or stadium:
1. **Low confidence fuzzy match** (< 85%):
```
ManualReviewItem(
item_type="team",
raw_value="LA Lakers",
suggested_id="team_nba_lal",
confidence=0.82,
reason="Fuzzy match below threshold"
)
```
2. **No match found**:
```
ManualReviewItem(
raw_value="Unknown Team FC",
suggested_id=None,
confidence=0.0,
reason="No match found in canonical mappings"
)
```
3. **Ambiguous match** (multiple candidates):
```
ManualReviewItem(
raw_value="LA",
suggested_id="team_nba_lac",
confidence=0.5,
reason="Ambiguous: could be Lakers or Clippers"
)
```
**Resolution:**
- Review items are exported to JSON
- Manually verify and add to `team_aliases.json` or `stadium_aliases.json`
- Re-run scrape - aliases will be used for resolution
## Adding a New Sport
1. **Create scraper** in `scrapers/{sport}.py`:
```python
class NewSportScraper(BaseScraper):
def __init__(self, season: int, **kwargs):
super().__init__("newsport", season, **kwargs)
self._team_resolver = get_team_resolver("newsport")
self._stadium_resolver = get_stadium_resolver("newsport")
def _get_sources(self) -> list[str]:
return ["primary_source", "backup_source"]
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
# Implement source-specific scraping
...
def _normalize_games(self, raw_games) -> tuple[list[Game], list[ManualReviewItem]]:
# Use resolvers to normalize
...
def scrape_teams(self) -> list[Team]:
# Return canonical team list
...
def scrape_stadiums(self) -> list[Stadium]:
# Return canonical stadium list
...
```
2. **Add team mappings** in `normalizers/team_resolver.py`:
```python
TEAM_MAPPINGS["newsport"] = {
"ABC": ("team_newsport_abc", "Full Team Name", "City"),
...
}
```
3. **Add stadium mappings** in `normalizers/stadium_resolver.py`:
```python
STADIUM_MAPPINGS["newsport"] = {
"stadium_newsport_venue": StadiumInfo(
name="Venue Name",
city="City",
state="State",
country="USA",
latitude=40.0,
longitude=-74.0,
),
...
}
```
4. **Add to league_structure.json** (if hierarchical)
5. **Update config.py**:
```python
EXPECTED_GAME_COUNTS["newsport"] = 500
```
6. **Export from `__init__.py`**
## Troubleshooting
### Rate Limiting (429 errors)
The system handles these automatically with exponential backoff. If persistent:
- Increase `DEFAULT_REQUEST_DELAY` in config.py
- Check if source has changed their rate limits
### Missing Teams/Stadiums
1. Check scraper logs for raw values
2. Add to `team_aliases.json` or `stadium_aliases.json`
3. Or add to canonical mappings if it's a new team/stadium
### CloudKit Authentication Errors
1. Verify key_id matches CloudKit Dashboard
2. Check private key format (EC P-256, PEM)
3. Ensure container identifier is correct
### Incomplete Scrapes
The system discards partial data on errors. Check:
- `logs/` for error details
- Network connectivity
- Source website availability
### International Games Appearing
NFL and NHL scrapers filter these automatically. If new locations emerge:
- Add to `INTERNATIONAL_LOCATIONS` in the scraper
- Or add filtering logic for neutral site games
## Contributing
1. Follow existing patterns for new scrapers
2. Always use canonical IDs
3. Add aliases for historical names
4. Include source URLs for traceability
5. Test with multiple seasons

View File

@@ -1,805 +0,0 @@
# SportsTime Data Audit Report
**Generated:** 2026-01-20
**Scope:** NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
**Data Pipeline:** Scripts → CloudKit → iOS App
---
## Executive Summary
The data audit identified **15 issues** across the SportsTime data pipeline, with significant gaps in source reliability, stadium resolution, and iOS data freshness.
| Severity | Count | Description |
|----------|-------|-------------|
| **Critical** | 1 | iOS bundled data severely outdated |
| **High** | 4 | Single-source sports, NHL stadium data, NBA naming rights |
| **Medium** | 6 | Alias gaps, outdated config, silent game exclusion |
| **Low** | 4 | Minor configuration and coverage issues |
### Key Findings
**Data Pipeline Health:**
-**Canonical ID system**: 100% format compliance across 7,186 IDs
-**Team mappings**: All 183 teams correctly mapped with current abbreviations
-**Referential integrity**: Zero orphan references (0 games pointing to non-existent teams/stadiums)
- ⚠️ **Stadium resolution**: 1,466 games (21.6%) have unresolved stadiums
**Critical Risks:**
1. **ESPN single-point-of-failure** for WNBA, NWSL, MLS - if ESPN changes, 3 sports lose all data
2. **NHL has 100% missing stadiums** - Hockey Reference provides no venue data
3. **iOS bundled data 27% behind** - 1,820 games missing from first-launch experience
**Root Causes:**
- Stadium naming rights changed faster than alias updates (2024-2025)
- Fallback source limit (`max_sources_to_try = 2`) prevents third source from being tried
- Hockey Reference source limitation (no venue info) combined with fallback limit
- iOS bundled JSON not updated with latest pipeline output
---
## Phase Status Tracking
| Phase | Status | Issues Found |
|-------|--------|--------------|
| 1. Hardcoded Mapping Audit | ✅ COMPLETE | 1 Low |
| 2. Alias File Completeness | ✅ COMPLETE | 1 Medium, 1 Low |
| 3. Scraper Source Reliability | ✅ COMPLETE | 2 High, 1 Medium |
| 4. Game Count & Coverage | ✅ COMPLETE | 2 High, 2 Medium, 1 Low |
| 5. Canonical ID Consistency | ✅ COMPLETE | 0 issues |
| 6. Referential Integrity | ✅ COMPLETE | 1 Medium (NHL source) |
| 7. iOS Data Reception | ✅ COMPLETE | 1 Critical, 1 Medium, 1 Low |
---
## Phase 1 Results: Hardcoded Mapping Audit
**Files Audited:**
- `sportstime_parser/normalizers/team_resolver.py` (TEAM_MAPPINGS)
- `sportstime_parser/normalizers/stadium_resolver.py` (STADIUM_MAPPINGS)
### Team Counts
| Sport | Hardcoded | Expected | Abbreviations | Status |
|-------|-----------|----------|---------------|--------|
| NBA | 30 | 30 | 38 | ✅ |
| MLB | 30 | 30 | 38 | ✅ |
| NFL | 32 | 32 | 40 | ✅ |
| NHL | 32 | 32 | 41 | ✅ |
| MLS | 30 | 30* | 32 | ✅ |
| WNBA | 13 | 13 | 13 | ✅ |
| NWSL | 16 | 16 | 24 | ✅ |
*MLS: 29 original teams + San Diego FC (2025 expansion) = 30
### Stadium Counts
| Sport | Hardcoded | Notes | Status |
|-------|-----------|-------|--------|
| NBA | 30 | 1 per team | ✅ |
| MLB | 57 | 30 regular + 18 spring training + 9 special venues | ✅ |
| NFL | 30 | Includes shared venues (SoFi Stadium: LAR+LAC, MetLife: NYG+NYJ) | ✅ |
| NHL | 32 | 1 per team | ✅ |
| MLS | 30 | 1 per team | ✅ |
| WNBA | 13 | 1 per team | ✅ |
| NWSL | 19 | 14 current + 5 expansion team venues (Boston/Denver) | ✅ |
### Recent Updates Verification
| Update | Type | Status | Notes |
|--------|------|--------|-------|
| Utah Hockey Club (NHL) | Relocation | ✅ Present | ARI + UTA abbreviations both map to `team_nhl_ari` |
| Golden State Valkyries (WNBA) | Expansion 2025 | ✅ Present | `team_wnba_gsv` with Chase Center venue |
| Boston Legacy FC (NWSL) | Expansion 2026 | ✅ Present | `team_nwsl_bos` with Gillette Stadium |
| Denver Summit FC (NWSL) | Expansion 2026 | ✅ Present | `team_nwsl_den` with Dick's Sporting Goods Park |
| Oakland A's → Sacramento | Temporary relocation | ✅ Present | `stadium_mlb_sutter_health_park` |
| San Diego FC (MLS) | Expansion 2025 | ✅ Present | `team_mls_sd` with Snapdragon Stadium |
| FedExField → Northwest Stadium | Naming rights | ✅ Present | `stadium_nfl_northwest_stadium` |
### NFL Stadium Sharing
| Stadium | Teams | Status |
|---------|-------|--------|
| SoFi Stadium | LAR, LAC | ✅ Correct |
| MetLife Stadium | NYG, NYJ | ✅ Correct |
### Issues Found
| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 1 | WNBA single abbreviations | Low | All 13 WNBA teams have only 1 abbreviation each. May need additional abbreviations for source compatibility. |
### Phase 1 Summary
**Result: PASS** - All team and stadium mappings are complete and up-to-date with 2025-2026 changes.
- ✅ All 7 sports have correct team counts
- ✅ All stadium counts are appropriate (including spring training, special venues)
- ✅ Recent franchise moves/expansions are reflected
- ✅ Stadium sharing is correctly handled
- ✅ Naming rights updates are current
---
## Phase 2 Results: Alias File Completeness
**Files Audited:**
- `Scripts/team_aliases.json`
- `Scripts/stadium_aliases.json`
### Team Aliases Summary
| Sport | Entries | Coverage | Status |
|-------|---------|----------|--------|
| MLB | 23 | Historical relocations/renames | ✅ |
| NBA | 29 | Historical relocations/renames | ✅ |
| NHL | 24 | Historical relocations/renames | ✅ |
| NFL | 0 | **No aliases** | ⚠️ |
| MLS | 0 | No aliases (newer league) | ✅ |
| WNBA | 0 | No aliases (newer league) | ✅ |
| NWSL | 0 | No aliases (newer league) | ✅ |
| **Total** | **76** | | |
- All 76 entries have valid date ranges
- No orphan references (all canonical IDs exist in mappings)
### Stadium Aliases Summary
| Sport | Entries | Coverage | Status |
|-------|---------|----------|--------|
| MLB | 109 | Regular + spring training + special venues | ✅ |
| NFL | 65 | Naming rights history | ✅ |
| NBA | 44 | Naming rights history | ✅ |
| NHL | 39 | Naming rights history | ✅ |
| MLS | 35 | Current + naming variants | ✅ |
| WNBA | 15 | Current + naming variants | ✅ |
| NWSL | 14 | Current + naming variants | ✅ |
| **Total** | **321** | | |
- 65 entries have date ranges (historical naming rights)
- 256 entries are permanent aliases (no date restrictions)
### Orphan Reference Check
| Type | Count | Status |
|------|-------|--------|
| Team aliases with invalid references | 0 | ✅ |
| Stadium aliases with invalid references | **5** | ❌ |
**Orphan Stadium References Found:**
| Alias Name | References (Invalid) | Correct ID |
|------------|---------------------|------------|
| Broncos Stadium at Mile High | `stadium_nfl_empower_field_at_mile_high` | `stadium_nfl_empower_field` |
| Sports Authority Field at Mile High | `stadium_nfl_empower_field_at_mile_high` | `stadium_nfl_empower_field` |
| Invesco Field at Mile High | `stadium_nfl_empower_field_at_mile_high` | `stadium_nfl_empower_field` |
| Mile High Stadium | `stadium_nfl_empower_field_at_mile_high` | `stadium_nfl_empower_field` |
| Arrowhead Stadium | `stadium_nfl_geha_field_at_arrowhead_stadium` | `stadium_nfl_arrowhead_stadium` |
### Historical Changes Coverage
| Historical Name | Current Team | In Aliases? |
|-----------------|--------------|-------------|
| Montreal Expos | Washington Nationals | ✅ |
| Seattle SuperSonics | Oklahoma City Thunder | ✅ |
| Arizona Coyotes | Utah Hockey Club | ✅ |
| Cleveland Indians | Cleveland Guardians | ✅ |
| Hartford Whalers | Carolina Hurricanes | ✅ |
| Quebec Nordiques | Colorado Avalanche | ✅ |
| Vancouver Grizzlies | Memphis Grizzlies | ✅ |
| Washington Redskins | Washington Commanders | ❌ Missing |
| Washington Football Team | Washington Commanders | ❌ Missing |
| Brooklyn Dodgers | Los Angeles Dodgers | ❌ Missing |
### Issues Found
| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 2 | Orphan stadium alias references | Medium | 5 stadium aliases point to non-existent canonical IDs (`stadium_nfl_empower_field_at_mile_high`, `stadium_nfl_geha_field_at_arrowhead_stadium`). Causes resolution failures for historical Denver/KC stadium names. |
| 3 | No NFL team aliases | Low | Missing Washington Redskins/Football Team historical names. Limits historical game matching for NFL. |
### Phase 2 Summary
**Result: PASS with issues** - Alias files cover most historical changes but have referential integrity bugs.
- ✅ Team aliases cover MLB/NBA/NHL historical changes
- ✅ Stadium aliases cover naming rights changes across all sports
- ✅ No date range validation errors
- ❌ 5 orphan stadium references need fixing
- ⚠️ No NFL team aliases (Washington Redskins/Football Team missing)
---
## Phase 3 Results: Scraper Source Reliability
**Files Audited:**
- `sportstime_parser/scrapers/base.py` (fallback logic)
- `sportstime_parser/scrapers/nba.py`, `mlb.py`, `nfl.py`, `nhl.py`, `mls.py`, `wnba.py`, `nwsl.py`
### Source Dependency Matrix
| Sport | Primary | Status | Fallback 1 | Status | Fallback 2 | Status | Risk |
|-------|---------|--------|------------|--------|------------|--------|------|
| NBA | basketball_reference | ✅ | espn | ✅ | cbs | ❌ NOT IMPL | Medium |
| MLB | mlb_api | ✅ | espn | ✅ | baseball_reference | ✅ | Low |
| NFL | espn | ✅ | pro_football_reference | ✅ | cbs | ❌ NOT IMPL | Medium |
| NHL | hockey_reference | ✅ | nhl_api | ✅ | espn | ✅ | Low |
| MLS | espn | ✅ | fbref | ❌ NOT IMPL | - | - | **HIGH** |
| WNBA | espn | ✅ | - | - | - | - | **HIGH** |
| NWSL | espn | ✅ | - | - | - | - | **HIGH** |
### Unimplemented Sources
| Sport | Source | Line | Status |
|-------|--------|------|--------|
| NBA | cbs | `nba.py:421` | `raise NotImplementedError("CBS scraper not implemented")` |
| NFL | cbs | `nfl.py:386` | `raise NotImplementedError("CBS scraper not implemented")` |
| MLS | fbref | `mls.py:214` | `raise NotImplementedError("FBref scraper not implemented")` |
### Fallback Logic Analysis
**File:** `base.py:189`
```python
max_sources_to_try = 2 # Don't try all sources if first few return nothing
```
**Impact:**
- Even if 3 sources are declared, only 2 are tried
- If sources 1 and 2 fail, source 3 is never attempted
- This limits resilience for NBA, MLB, NFL, NHL which have 3 sources
### International Game Filtering
| Sport | Hardcoded Locations | Notes |
|-------|---------------------|-------|
| NFL | London, Mexico City, Frankfurt, Munich, São Paulo | ✅ Complete for 2025 |
| NHL | Prague, Stockholm, Helsinki, Tampere, Gothenburg | ✅ Complete for 2025 |
| NBA | None | ⚠️ No international filtering (Abu Dhabi games?) |
| MLB | None | ⚠️ No international filtering (Mexico City games?) |
| MLS | None | N/A (domestic only) |
| WNBA | None | N/A (domestic only) |
| NWSL | None | N/A (domestic only) |
### Single Point of Failure Risk
| Sport | Primary Source | If ESPN Fails... | Risk Level |
|-------|----------------|------------------|------------|
| WNBA | ESPN only | **Complete data loss** | Critical |
| NWSL | ESPN only | **Complete data loss** | Critical |
| MLS | ESPN only (fbref not impl) | **Complete data loss** | Critical |
| NBA | Basketball-Ref → ESPN | ESPN fallback available | Low |
| NFL | ESPN → Pro-Football-Ref | Fallback available | Low |
| NHL | Hockey-Ref → NHL API → ESPN | Two fallbacks | Very Low |
| MLB | MLB API → ESPN → B-Ref | Two fallbacks | Very Low |
### Issues Found
| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 4 | WNBA/NWSL/MLS single source | High | ESPN is the only working source for 3 sports. If ESPN changes or fails, data collection completely stops. |
| 5 | max_sources_to_try = 2 | High | Third fallback source never tried even if available. Reduces resilience for NBA/MLB/NFL/NHL. |
| 6 | CBS/FBref not implemented | Medium | Declared fallback sources raise NotImplementedError. Appears functional in config but fails at runtime. |
### Phase 3 Summary
**Result: FAIL** - Critical single-point-of-failure for 3 sports.
- ❌ WNBA, NWSL, MLS have only ESPN (no resilience)
- ❌ Fallback limit of 2 prevents third source from being tried
- ⚠️ CBS and FBref declared but not implemented
- ✅ MLB and NHL have full fallback chains
- ✅ International game filtering present for NFL/NHL
---
## Phase 4 Results: Game Count & Coverage
**Files Audited:**
- `Scripts/output/games_*.json` (all 2025 season files)
- `Scripts/output/validation_*.md` (all validation reports)
- `sportstime_parser/config.py` (EXPECTED_GAME_COUNTS)
### Coverage Summary
| Sport | Scraped | Expected | Coverage | Status |
|-------|---------|----------|----------|--------|
| NBA | 1,231 | 1,230 | 100.1% | ✅ |
| MLB | 2,866 | 2,430 | 117.9% | ⚠️ Includes spring training |
| NFL | 330 | 272 | 121.3% | ⚠️ Includes preseason/playoffs |
| NHL | 1,312 | 1,312 | 100.0% | ✅ |
| MLS | 542 | 493 | 109.9% | ✅ Includes playoffs |
| WNBA | 322 | 220 | **146.4%** | ⚠️ Expected count outdated |
| NWSL | 189 | 182 | 103.8% | ✅ |
### Date Range Analysis
| Sport | Start Date | End Date | Notes |
|-------|------------|----------|-------|
| NBA | 2025-10-21 | 2026-04-12 | Regular season only |
| MLB | 2025-03-01 | 2025-11-02 | Includes spring training (417 games in March) |
| NFL | 2025-08-01 | 2026-01-25 | Includes preseason (49 in Aug) + playoffs (28 in Jan) |
| NHL | 2025-10-07 | 2026-04-16 | Regular season only |
| MLS | 2025-02-22 | 2025-11-30 | Regular season + playoffs |
| WNBA | 2025-05-02 | 2025-10-11 | Regular season + playoffs |
| NWSL | 2025-03-15 | 2025-11-23 | Regular season + playoffs |
### Game Status Distribution
All games across all sports have status `unknown` - game status is not being properly parsed from sources.
### Duplicate Game Detection
| Sport | Duplicates Found | Details |
|-------|-----------------|---------|
| NBA | 0 | ✅ |
| MLB | 1 | `game_mlb_2025_20250508_det_col_1` appears twice (doubleheader handling issue) |
| NFL | 0 | ✅ |
| NHL | 0 | ✅ |
| MLS | 0 | ✅ |
| WNBA | 0 | ✅ |
| NWSL | 0 | ✅ |
### Validation Report Analysis
| Sport | Total Games | Unresolved Teams | Unresolved Stadiums | Manual Review Items |
|-------|-------------|------------------|---------------------|---------------------|
| NBA | 1,231 | 0 | **131** | 131 |
| MLB | 2,866 | 12 | 4 | 20 |
| NFL | 330 | 1 | 5 | 11 |
| NHL | 1,312 | 0 | 0 | **1,312** (all missing stadiums) |
| MLS | 542 | 1 | **64** | 129 |
| WNBA | 322 | 5 | **65** | 135 |
| NWSL | 189 | 0 | **16** | 32 |
### Top Unresolved Stadium Names (Recent Naming Rights)
| Stadium Name | Occurrences | Actual Venue | Issue |
|--------------|-------------|--------------|-------|
| Sports Illustrated Stadium | 11 | MLS expansion venue | New venue, missing alias |
| Mortgage Matchup Center | 8 | Rocket Mortgage FieldHouse (CLE) | 2025 naming rights change |
| ScottsMiracle-Gro Field | 4 | MLS Columbus Crew | Missing alias |
| Energizer Park | 3 | MLS CITY SC (STL?) | Missing alias |
| Xfinity Mobile Arena | 3 | Intuit Dome (LAC) | 2025 naming rights change |
| Rocket Arena | 3 | Toyota Center (HOU) | Potential name change |
| CareFirst Arena | 2 | Washington Mystics venue | New WNBA venue name |
### Unresolved Teams (Exhibition/International)
| Team Name | Sport | Type | Games |
|-----------|-------|------|-------|
| BRAZIL | WNBA | International exhibition | 2 |
| Toyota Antelopes | WNBA | Japanese team | 2 |
| TEAM CLARK | WNBA | All-Star Game | 1 |
| (Various MLB) | MLB | International teams | 12 |
| (MLS international) | MLS | CCL/exhibition | 1 |
| (NFL preseason) | NFL | Pre-season exhibition | 1 |
### NHL Stadium Data Issue
**Critical:** Hockey Reference does not provide stadium data. All 1,312 NHL games have `raw_stadium: None`, causing 100% of games to have missing stadium IDs. The NHL fallback sources (NHL API, ESPN) should provide this data, but the `max_sources_to_try = 2` limit combined with Hockey Reference success means fallbacks are never attempted.
### Expected Count Updates Needed
| Sport | Current Expected | Recommended | Reason |
|-------|------------------|-------------|--------|
| WNBA | 220 | **286** | 13 teams × 44 games / 2 (expanded with Golden State Valkyries) |
| NFL | 272 | 272 (filter preseason) | Or document that 330 includes preseason |
| MLB | 2,430 | 2,430 (filter spring training) | Or document that 2,866 includes spring training |
### Issues Found
| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 7 | NHL has no stadium data | High | Hockey Reference provides no venue info. All 1,312 games missing stadium_id. Fallback sources not tried. |
| 8 | 131 NBA stadium resolution failures | High | Recent naming rights changes ("Mortgage Matchup Center", "Xfinity Mobile Arena") not in aliases. |
| 9 | Outdated WNBA expected count | Medium | Config says 220 but WNBA expanded to 13 teams in 2025; actual is 322 (286 regular + playoffs). |
| 10 | MLS/WNBA stadium alias gaps | Medium | 64 MLS + 65 WNBA unresolved stadiums from new/renamed venues. |
| 11 | Game status not parsed | Low | All games have status `unknown` instead of final/scheduled/postponed. |
### Phase 4 Summary
**Result: FAIL** - Significant stadium resolution failures across multiple sports.
- ❌ 131 NBA games missing stadium (naming rights changes)
- ❌ 1,312 NHL games missing stadium (source doesn't provide data)
- ❌ 64 MLS + 65 WNBA stadiums unresolved (new/renamed venues)
- ⚠️ WNBA expected count severely outdated (220 vs 322 actual)
- ⚠️ MLB/NFL include preseason/spring training games
- ✅ No significant duplicate games (1 MLB doubleheader edge case)
- ✅ All teams resolved except exhibition/international games
---
## Phase 5 Results: Canonical ID Consistency
**Files Audited:**
- `sportstime_parser/normalizers/canonical_id.py` (Python ID generation)
- `SportsTime/Core/Models/Local/CanonicalModels.swift` (iOS models)
- `SportsTime/Core/Services/BootstrapService.swift` (iOS JSON parsing)
- All `Scripts/output/*.json` files (generated IDs)
### Format Validation
| Type | Total IDs | Valid | Invalid | Pass Rate |
|------|-----------|-------|---------|-----------|
| Team | 183 | 183 | 0 | 100.0% ✅ |
| Stadium | 211 | 211 | 0 | 100.0% ✅ |
| Game | 6,792 | 6,792 | 0 | 100.0% ✅ |
### ID Format Patterns (all validated)
```
Teams: team_{sport}_{abbrev} → team_nba_lal
Stadiums: stadium_{sport}_{normalized_name} → stadium_nba_cryptocom_arena
Games: game_{sport}_{season}_{YYYYMMDD}_{away}_{home}[_{#}]
→ game_nba_2025_20251021_hou_okc
```
### Normalization Quality
| Check | Result |
|-------|--------|
| Double underscores (`__`) | 0 found ✅ |
| Leading/trailing underscores | 0 found ✅ |
| Uppercase letters | 0 found ✅ |
| Special characters | 0 found ✅ |
### Abbreviation Lengths (Teams)
| Length | Count |
|--------|-------|
| 2 chars | 21 |
| 3 chars | 161 |
| 4 chars | 1 |
### Stadium ID Lengths
- Minimum: 8 characters
- Maximum: 29 characters
- Average: 16.2 characters
### iOS Cross-Compatibility
| Aspect | Status | Notes |
|--------|--------|-------|
| Field naming convention | ✅ Compatible | Python uses snake_case; iOS `BootstrapService` uses matching Codable structs |
| Deterministic UUID generation | ✅ Compatible | iOS uses SHA256 hash of canonical_id - matches any valid string |
| Schema version | ✅ Compatible | Both use version 1 |
| Required fields | ✅ Present | All iOS-required fields present in JSON output |
### Field Mapping (Python → iOS)
| Python Field | iOS Field | Notes |
|--------------|-----------|-------|
| `canonical_id` | `canonicalId` | Mapped via `JSONCanonicalStadium.canonical_id``CanonicalStadium.canonicalId` |
| `home_team_canonical_id` | `homeTeamCanonicalId` | Explicit mapping in BootstrapService |
| `away_team_canonical_id` | `awayTeamCanonicalId` | Explicit mapping in BootstrapService |
| `stadium_canonical_id` | `stadiumCanonicalId` | Explicit mapping in BootstrapService |
| `game_datetime_utc` | `dateTime` | ISO 8601 parsing with fallback to legacy format |
### Issues Found
**No issues found.** All canonical IDs are:
- Correctly formatted according to defined patterns
- Properly normalized (lowercase, no special characters)
- Deterministic (same input produces same output)
- Compatible with iOS parsing
### Phase 5 Summary
**Result: PASS** - All canonical IDs are consistent and iOS-compatible.
- ✅ 100% format validation pass rate across 7,186 IDs
- ✅ No normalization issues found
- ✅ iOS BootstrapService explicitly handles snake_case → camelCase mapping
- ✅ Deterministic UUID generation using SHA256 hash
---
## Phase 6 Results: Referential Integrity
**Files Audited:**
- `Scripts/output/games_*_2025.json`
- `Scripts/output/teams_*.json`
- `Scripts/output/stadiums_*.json`
### Game → Team References
| Sport | Total Games | Valid Home | Valid Away | Orphan Home | Orphan Away | Status |
|-------|-------------|------------|------------|-------------|-------------|--------|
| NBA | 1,231 | 1,231 | 1,231 | 0 | 0 | ✅ |
| MLB | 2,866 | 2,866 | 2,866 | 0 | 0 | ✅ |
| NFL | 330 | 330 | 330 | 0 | 0 | ✅ |
| NHL | 1,312 | 1,312 | 1,312 | 0 | 0 | ✅ |
| MLS | 542 | 542 | 542 | 0 | 0 | ✅ |
| WNBA | 322 | 322 | 322 | 0 | 0 | ✅ |
| NWSL | 189 | 189 | 189 | 0 | 0 | ✅ |
**Result:** 100% valid team references across all 6,792 games.
### Game → Stadium References
| Sport | Total Games | Valid | Missing | Percentage Missing |
|-------|-------------|-------|---------|-------------------|
| NBA | 1,231 | 1,231 | 0 | 0.0% ✅ |
| MLB | 2,866 | 2,862 | 4 | 0.1% ✅ |
| NFL | 330 | 325 | 5 | 1.5% ✅ |
| NHL | 1,312 | 0 | **1,312** | **100%** ❌ |
| MLS | 542 | 478 | 64 | 11.8% ⚠️ |
| WNBA | 322 | 257 | 65 | 20.2% ⚠️ |
| NWSL | 189 | 173 | 16 | 8.5% ⚠️ |
**Note:** "Missing" means `stadium_canonical_id` is empty (resolution failed at scrape time). This is NOT orphan references to non-existent stadiums.
### Team → Stadium References
| Sport | Teams | Valid Stadium | Invalid | Status |
|-------|-------|---------------|---------|--------|
| NBA | 30 | 30 | 0 | ✅ |
| MLB | 30 | 30 | 0 | ✅ |
| NFL | 32 | 32 | 0 | ✅ |
| NHL | 32 | 32 | 0 | ✅ |
| MLS | 30 | 30 | 0 | ✅ |
| WNBA | 13 | 13 | 0 | ✅ |
| NWSL | 16 | 16 | 0 | ✅ |
**Result:** 100% valid team → stadium references.
### Cross-Sport Stadium Check
✅ No stadiums are duplicated across sports. Each `stadium_{sport}_*` ID is unique to its sport.
### Missing Stadium Root Causes
| Sport | Missing | Root Cause |
|-------|---------|------------|
| NHL | 1,312 | **Hockey Reference provides no venue data** - source limitation |
| MLS | 64 | New/renamed stadiums not in aliases (see Phase 4) |
| WNBA | 65 | New venue names not in aliases (see Phase 4) |
| NWSL | 16 | Expansion team venues + alternate venues |
| NFL | 5 | International games not in stadium mappings |
| MLB | 4 | Exhibition/international games |
### Orphan Reference Summary
| Reference Type | Total Checked | Orphans Found |
|----------------|---------------|---------------|
| Game → Home Team | 6,792 | 0 ✅ |
| Game → Away Team | 6,792 | 0 ✅ |
| Game → Stadium | 6,792 | 0 ✅ |
| Team → Stadium | 183 | 0 ✅ |
**Note:** Zero orphan references. All "missing" stadiums are resolution failures (empty string), not references to non-existent canonical IDs.
### Issues Found
| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 12 | NHL games have no stadium data | Medium | Hockey Reference source doesn't provide venue information. All 1,312 NHL games have empty stadium_canonical_id. Fallback sources could provide this data but are limited by `max_sources_to_try = 2`. |
### Phase 6 Summary
**Result: PASS with known limitations** - No orphan references exist; missing stadiums are resolution failures.
- ✅ 100% valid team references (home and away)
- ✅ 100% valid team → stadium references
- ✅ No orphan references to non-existent canonical IDs
- ⚠️ 1,466 games (21.6%) have empty stadium_canonical_id (resolution failures, not orphans)
- ⚠️ NHL accounts for 90% of missing stadium data (source limitation)
---
## Phase 7 Results: iOS Data Reception
**Files Audited:**
- `SportsTime/Core/Services/BootstrapService.swift` (JSON parsing)
- `SportsTime/Core/Services/CanonicalSyncService.swift` (CloudKit sync)
- `SportsTime/Core/Services/DataProvider.swift` (data access)
- `SportsTime/Core/Models/Local/CanonicalModels.swift` (SwiftData models)
- `SportsTime/Resources/*_canonical.json` (bundled data files)
### Bundled Data Comparison
| Data Type | iOS Bundled | Scripts Output | Difference | Status |
|-----------|-------------|----------------|------------|--------|
| Teams | 148 | 183 | **-35** (19%) | ❌ STALE |
| Stadiums | 122 | 211 | **-89** (42%) | ❌ STALE |
| Games | 4,972 | 6,792 | **-1,820** (27%) | ❌ STALE |
**iOS bundled data is significantly outdated compared to Scripts output.**
### Field Mapping Verification
| Python Field | iOS JSON Struct | iOS Model | Type Match | Status |
|--------------|-----------------|-----------|------------|--------|
| `canonical_id` | `canonical_id` | `canonicalId` | String ✅ | ✅ |
| `name` | `name` | `name` | String ✅ | ✅ |
| `game_datetime_utc` | `game_datetime_utc` | `dateTime` | ISO 8601 → Date ✅ | ✅ |
| `date` + `time` (legacy) | `date`, `time` | `dateTime` | Fallback parsing ✅ | ✅ |
| `home_team_canonical_id` | `home_team_canonical_id` | `homeTeamCanonicalId` | String ✅ | ✅ |
| `away_team_canonical_id` | `away_team_canonical_id` | `awayTeamCanonicalId` | String ✅ | ✅ |
| `stadium_canonical_id` | `stadium_canonical_id` | `stadiumCanonicalId` | String ✅ | ✅ |
| `sport` | `sport` | `sport` | String ✅ | ✅ |
| `season` | `season` | `season` | String ✅ | ✅ |
| `is_playoff` | `is_playoff` | `isPlayoff` | Bool ✅ | ✅ |
| `broadcast_info` | `broadcast_info` | `broadcastInfo` | String? ✅ | ✅ |
**Result:** All field mappings are correct and compatible.
### Date Parsing Compatibility
iOS `BootstrapService` supports both formats:
```swift
// New canonical format (preferred)
let game_datetime_utc: String? // ISO 8601
// Legacy format (fallback)
let date: String? // "YYYY-MM-DD"
let time: String? // "HH:mm" or "TBD"
```
**Current iOS bundled games use legacy format.** After updating bundled data, new `game_datetime_utc` format will be used.
### Missing Reference Handling
**`DataProvider.filterRichGames()` behavior:**
```swift
return games.compactMap { game in
guard let homeTeam = teamsById[game.homeTeamId],
let awayTeam = teamsById[game.awayTeamId],
let stadium = stadiumsById[game.stadiumId] else {
return nil // Silently drops game
}
return RichGame(...)
}
```
**Impact:**
- Games with missing stadium IDs are **silently excluded** from RichGame queries
- No error logging or fallback behavior
- User sees fewer games than expected without explanation
### Deduplication Logic
**Bootstrap:** No explicit deduplication. If bundled JSON contains duplicate canonical IDs, both would be inserted into SwiftData (leading to potential query issues).
**CloudKit Sync:** Uses upsert pattern with canonical ID as unique key - duplicates would overwrite.
### Schema Version Compatibility
| Component | Schema Version | Status |
|-----------|----------------|--------|
| Scripts output | 1 | ✅ |
| iOS CanonicalModels | 1 | ✅ |
| iOS BootstrapService | Expects 1 | ✅ |
**Compatible.** Schema version mismatch protection exists in `CanonicalSyncService`:
```swift
case .schemaVersionTooNew(let version):
return "Data requires app version supporting schema \(version). Please update the app."
```
### Bootstrap Order Validation
iOS bootstraps in correct dependency order:
1. Stadiums (no dependencies)
2. Stadium aliases (depends on stadiums)
3. League structure (no dependencies)
4. Teams (depends on stadiums)
5. Team aliases (depends on teams)
6. Games (depends on teams + stadiums)
**Correct - prevents orphan references during bootstrap.**
### CloudKit Sync Validation
`CanonicalSyncService` syncs in same dependency order and tracks:
- Per-entity sync timestamps
- Skipped records (incompatible schema version)
- Skipped records (older than local)
- Sync duration and cancellation
**Well-designed sync infrastructure.**
### Issues Found
| # | Issue | Severity | Description |
|---|-------|----------|-------------|
| 13 | iOS bundled data severely outdated | **Critical** | Missing 35 teams (19%), 89 stadiums (42%), 1,820 games (27%). First-launch experience shows incomplete data until CloudKit sync completes. |
| 14 | Silent game exclusion in RichGame queries | Medium | `filterRichGames()` silently drops games with missing team/stadium references. Users see fewer games without explanation. |
| 15 | No bootstrap deduplication | Low | Duplicate game IDs in bundled JSON would create duplicate SwiftData records. Low risk since JSON is generated correctly. |
### Phase 7 Summary
**Result: FAIL** - iOS bundled data is critically outdated.
- ❌ iOS bundled data missing 35 teams, 89 stadiums, 1,820 games
- ⚠️ Games with unresolved references silently dropped from RichGame queries
- ✅ Field mapping between Python and iOS is correct
- ✅ Date parsing supports both legacy and new formats
- ✅ Schema versions are compatible
- ✅ Bootstrap/sync order handles dependencies correctly
---
## Prioritized Issue List
| # | Issue | Severity | Phase | Root Cause | Remediation |
|---|-------|----------|-------|------------|-------------|
| 13 | iOS bundled data severely outdated | **Critical** | 7 | Bundled JSON not updated after pipeline runs | Copy Scripts/output/*_canonical.json to iOS Resources/ and rebuild |
| 4 | WNBA/NWSL/MLS ESPN-only source | **High** | 3 | No implemented fallback sources | Implement alternative scrapers (FBref for MLS, WNBA League Pass) |
| 5 | max_sources_to_try = 2 limits fallback | **High** | 3 | Hardcoded limit in base.py:189 | Increase to 3 or remove limit for sports with 3+ sources |
| 7 | NHL has no stadium data from primary source | **High** | 4 | Hockey Reference doesn't provide venue info | Force NHL to use NHL API or ESPN as primary (they provide venues) |
| 8 | 131 NBA stadium resolution failures | **High** | 4 | 2024-2025 naming rights not in aliases | Add aliases: "Mortgage Matchup Center" → Rocket Mortgage FieldHouse, "Xfinity Mobile Arena" → Intuit Dome |
| 2 | Orphan stadium alias references | **Medium** | 2 | Wrong canonical IDs in stadium_aliases.json | Fix 5 Denver/KC stadium aliases pointing to non-existent IDs |
| 6 | CBS/FBref scrapers declared but not implemented | **Medium** | 3 | NotImplementedError at runtime | Either implement or remove from source lists to avoid confusion |
| 9 | Outdated WNBA expected count | **Medium** | 4 | WNBA expanded to 13 teams in 2025 | Update config.py EXPECTED_GAME_COUNTS["wnba"] from 220 to 286 |
| 10 | MLS/WNBA stadium alias gaps | **Medium** | 4 | New/renamed venues missing from aliases | Add 129 missing stadium aliases (64 MLS + 65 WNBA) |
| 12 | NHL games have no stadium data | **Medium** | 6 | Same as Issue #7 | See Issue #7 remediation |
| 14 | Silent game exclusion in RichGame queries | **Medium** | 7 | compactMap silently drops games | Log dropped games or return partial RichGame with placeholder stadium |
| 1 | WNBA single abbreviations | **Low** | 1 | Only 1 abbreviation per team | Add alternative abbreviations for source compatibility |
| 3 | No NFL team aliases | **Low** | 2 | Missing Washington Redskins/Football Team | Add historical Washington team name aliases |
| 11 | Game status not parsed | **Low** | 4 | Status field always "unknown" | Parse game status from source data (final, scheduled, postponed) |
| 15 | No bootstrap deduplication | **Low** | 7 | No explicit duplicate check during bootstrap | Add deduplication check in bootstrapGames() |
---
## Recommended Next Steps
### Immediate (Before Next Release)
1. **Update iOS bundled data** (Issue #13)
```bash
cp Scripts/output/stadiums_*.json SportsTime/Resources/stadiums_canonical.json
cp Scripts/output/teams_*.json SportsTime/Resources/teams_canonical.json
cp Scripts/output/games_*.json SportsTime/Resources/games_canonical.json
```
2. **Fix NHL stadium data** (Issues #7, #12)
- Change NHL primary source from Hockey Reference to NHL API
- Or: Increase `max_sources_to_try` to 3 so fallbacks are attempted
3. **Add critical stadium aliases** (Issues #8, #10)
- "Mortgage Matchup Center" → `stadium_nba_rocket_mortgage_fieldhouse`
- "Xfinity Mobile Arena" → `stadium_nba_intuit_dome`
- Run validation report to identify all unresolved venue names
### Short-term (This Quarter)
4. **Implement MLS fallback source** (Issue #4)
- FBref has MLS data with venue information
- Reduces ESPN single-point-of-failure risk
5. **Fix orphan alias references** (Issue #2)
- Correct 5 NFL stadium aliases pointing to wrong canonical IDs
- Add validation check to prevent future orphan references
6. **Update expected game counts** (Issue #9)
- WNBA: 220 → 286 (13 teams × 44 games / 2)
### Long-term (Next Quarter)
7. **Implement WNBA/NWSL fallback sources** (Issue #4)
- Consider WNBA League Pass API or other sources
- NWSL has limited data availability - may need to accept ESPN-only
8. **Add RichGame partial loading** (Issue #14)
- Log games dropped due to missing references
- Consider returning games with placeholder stadiums for NHL
9. **Parse game status** (Issue #11)
- Extract final/scheduled/postponed from source data
- Enables filtering by game state
---
## Verification Checklist
After implementing fixes, verify:
- [ ] Run `python -m sportstime_parser scrape --sport all --season 2025`
- [ ] Check validation reports show <5% unresolved stadiums per sport
- [ ] Copy output JSON to iOS Resources/
- [ ] Build iOS app and verify data loads at startup
- [ ] Query RichGames and verify game count matches expectations
- [ ] Run CloudKit sync and verify no errors

File diff suppressed because it is too large Load Diff

View File

@@ -1,371 +0,0 @@
[
{
"id": "mlb_league",
"sport": "MLB",
"type": "league",
"name": "Major League Baseball",
"abbreviation": "MLB",
"parent_id": null,
"display_order": 0
},
{
"id": "mlb_al",
"sport": "MLB",
"type": "conference",
"name": "American League",
"abbreviation": "AL",
"parent_id": "mlb_league",
"display_order": 1
},
{
"id": "mlb_nl",
"sport": "MLB",
"type": "conference",
"name": "National League",
"abbreviation": "NL",
"parent_id": "mlb_league",
"display_order": 2
},
{
"id": "mlb_al_east",
"sport": "MLB",
"type": "division",
"name": "AL East",
"abbreviation": null,
"parent_id": "mlb_al",
"display_order": 3
},
{
"id": "mlb_al_central",
"sport": "MLB",
"type": "division",
"name": "AL Central",
"abbreviation": null,
"parent_id": "mlb_al",
"display_order": 4
},
{
"id": "mlb_al_west",
"sport": "MLB",
"type": "division",
"name": "AL West",
"abbreviation": null,
"parent_id": "mlb_al",
"display_order": 5
},
{
"id": "mlb_nl_east",
"sport": "MLB",
"type": "division",
"name": "NL East",
"abbreviation": null,
"parent_id": "mlb_nl",
"display_order": 6
},
{
"id": "mlb_nl_central",
"sport": "MLB",
"type": "division",
"name": "NL Central",
"abbreviation": null,
"parent_id": "mlb_nl",
"display_order": 7
},
{
"id": "mlb_nl_west",
"sport": "MLB",
"type": "division",
"name": "NL West",
"abbreviation": null,
"parent_id": "mlb_nl",
"display_order": 8
},
{
"id": "nba_league",
"sport": "NBA",
"type": "league",
"name": "National Basketball Association",
"abbreviation": "NBA",
"parent_id": null,
"display_order": 9
},
{
"id": "nba_eastern",
"sport": "NBA",
"type": "conference",
"name": "Eastern Conference",
"abbreviation": "East",
"parent_id": "nba_league",
"display_order": 10
},
{
"id": "nba_western",
"sport": "NBA",
"type": "conference",
"name": "Western Conference",
"abbreviation": "West",
"parent_id": "nba_league",
"display_order": 11
},
{
"id": "nba_atlantic",
"sport": "NBA",
"type": "division",
"name": "Atlantic",
"abbreviation": null,
"parent_id": "nba_eastern",
"display_order": 12
},
{
"id": "nba_central",
"sport": "NBA",
"type": "division",
"name": "Central",
"abbreviation": null,
"parent_id": "nba_eastern",
"display_order": 13
},
{
"id": "nba_southeast",
"sport": "NBA",
"type": "division",
"name": "Southeast",
"abbreviation": null,
"parent_id": "nba_eastern",
"display_order": 14
},
{
"id": "nba_northwest",
"sport": "NBA",
"type": "division",
"name": "Northwest",
"abbreviation": null,
"parent_id": "nba_western",
"display_order": 15
},
{
"id": "nba_pacific",
"sport": "NBA",
"type": "division",
"name": "Pacific",
"abbreviation": null,
"parent_id": "nba_western",
"display_order": 16
},
{
"id": "nba_southwest",
"sport": "NBA",
"type": "division",
"name": "Southwest",
"abbreviation": null,
"parent_id": "nba_western",
"display_order": 17
},
{
"id": "nfl_league",
"sport": "NFL",
"type": "league",
"name": "National Football League",
"abbreviation": "NFL",
"parent_id": null,
"display_order": 18
},
{
"id": "nfl_afc",
"sport": "NFL",
"type": "conference",
"name": "American Football Conference",
"abbreviation": "AFC",
"parent_id": "nfl_league",
"display_order": 19
},
{
"id": "nfl_nfc",
"sport": "NFL",
"type": "conference",
"name": "National Football Conference",
"abbreviation": "NFC",
"parent_id": "nfl_league",
"display_order": 20
},
{
"id": "nfl_afc_east",
"sport": "NFL",
"type": "division",
"name": "AFC East",
"abbreviation": null,
"parent_id": "nfl_afc",
"display_order": 21
},
{
"id": "nfl_afc_north",
"sport": "NFL",
"type": "division",
"name": "AFC North",
"abbreviation": null,
"parent_id": "nfl_afc",
"display_order": 22
},
{
"id": "nfl_afc_south",
"sport": "NFL",
"type": "division",
"name": "AFC South",
"abbreviation": null,
"parent_id": "nfl_afc",
"display_order": 23
},
{
"id": "nfl_afc_west",
"sport": "NFL",
"type": "division",
"name": "AFC West",
"abbreviation": null,
"parent_id": "nfl_afc",
"display_order": 24
},
{
"id": "nfl_nfc_east",
"sport": "NFL",
"type": "division",
"name": "NFC East",
"abbreviation": null,
"parent_id": "nfl_nfc",
"display_order": 25
},
{
"id": "nfl_nfc_north",
"sport": "NFL",
"type": "division",
"name": "NFC North",
"abbreviation": null,
"parent_id": "nfl_nfc",
"display_order": 26
},
{
"id": "nfl_nfc_south",
"sport": "NFL",
"type": "division",
"name": "NFC South",
"abbreviation": null,
"parent_id": "nfl_nfc",
"display_order": 27
},
{
"id": "nfl_nfc_west",
"sport": "NFL",
"type": "division",
"name": "NFC West",
"abbreviation": null,
"parent_id": "nfl_nfc",
"display_order": 28
},
{
"id": "nhl_league",
"sport": "NHL",
"type": "league",
"name": "National Hockey League",
"abbreviation": "NHL",
"parent_id": null,
"display_order": 29
},
{
"id": "nhl_eastern",
"sport": "NHL",
"type": "conference",
"name": "Eastern Conference",
"abbreviation": "East",
"parent_id": "nhl_league",
"display_order": 30
},
{
"id": "nhl_western",
"sport": "NHL",
"type": "conference",
"name": "Western Conference",
"abbreviation": "West",
"parent_id": "nhl_league",
"display_order": 31
},
{
"id": "nhl_atlantic",
"sport": "NHL",
"type": "division",
"name": "Atlantic",
"abbreviation": null,
"parent_id": "nhl_eastern",
"display_order": 32
},
{
"id": "nhl_metropolitan",
"sport": "NHL",
"type": "division",
"name": "Metropolitan",
"abbreviation": null,
"parent_id": "nhl_eastern",
"display_order": 33
},
{
"id": "nhl_central",
"sport": "NHL",
"type": "division",
"name": "Central",
"abbreviation": null,
"parent_id": "nhl_western",
"display_order": 34
},
{
"id": "nhl_pacific",
"sport": "NHL",
"type": "division",
"name": "Pacific",
"abbreviation": null,
"parent_id": "nhl_western",
"display_order": 35
},
{
"id": "wnba_league",
"sport": "WNBA",
"type": "league",
"name": "Women's National Basketball Association",
"abbreviation": "WNBA",
"parent_id": null,
"display_order": 36
},
{
"id": "mls_league",
"sport": "MLS",
"type": "league",
"name": "Major League Soccer",
"abbreviation": "MLS",
"parent_id": null,
"display_order": 37
},
{
"id": "mls_eastern",
"sport": "MLS",
"type": "conference",
"name": "Eastern Conference",
"abbreviation": "East",
"parent_id": "mls_league",
"display_order": 38
},
{
"id": "mls_western",
"sport": "MLS",
"type": "conference",
"name": "Western Conference",
"abbreviation": "West",
"parent_id": "mls_league",
"display_order": 39
},
{
"id": "nwsl_league",
"sport": "NWSL",
"type": "league",
"name": "National Women's Soccer League",
"abbreviation": "NWSL",
"parent_id": null,
"display_order": 40
}
]

View File

@@ -1,66 +0,0 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "sportstime-parser"
version = "0.1.0"
description = "Sports data scraper and CloudKit uploader for SportsTime app"
readme = "README.md"
requires-python = ">=3.11"
license = {text = "MIT"}
authors = [
{name = "SportsTime Team"}
]
keywords = ["sports", "scraper", "cloudkit", "nba", "mlb", "nfl", "nhl", "mls"]
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
]
dependencies = [
"requests>=2.31.0",
"beautifulsoup4>=4.12.0",
"lxml>=5.0.0",
"rapidfuzz>=3.5.0",
"python-dateutil>=2.8.0",
"pytz>=2024.1",
"rich>=13.7.0",
"pyjwt>=2.8.0",
"cryptography>=42.0.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0.0",
"pytest-cov>=4.1.0",
"responses>=0.25.0",
]
[project.scripts]
sportstime-parser = "sportstime_parser.__main__:main"
[tool.setuptools.packages.find]
where = ["."]
include = ["sportstime_parser*"]
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
addopts = "-v --tb=short"
[tool.coverage.run]
source = ["sportstime_parser"]
omit = ["tests/*"]
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"if __name__ == .__main__.:",
"raise NotImplementedError",
]

View File

@@ -1,15 +0,0 @@
# Core dependencies
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=5.0.0
rapidfuzz>=3.5.0
python-dateutil>=2.8.0
pytz>=2024.1
rich>=13.7.0
pyjwt>=2.8.0
cryptography>=42.0.0
# Development dependencies
pytest>=8.0.0
pytest-cov>=4.1.0
responses>=0.25.0

View File

@@ -1,688 +0,0 @@
# SportsTime Parser
A Python CLI tool for scraping sports schedules, normalizing data with canonical IDs, and uploading to CloudKit.
## Features
- Scrapes game schedules from multiple sources with automatic fallback
- Supports 7 major sports leagues: NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
- Generates deterministic canonical IDs for games, teams, and stadiums
- Produces validation reports with manual review lists
- Uploads to CloudKit with resumable, diff-based updates
## Requirements
- Python 3.11+
- CloudKit credentials (for upload functionality)
## Installation
```bash
# From the Scripts directory
cd Scripts
# Install in development mode
pip install -e ".[dev]"
# Or install dependencies only
pip install -r requirements.txt
```
## Quick Start
```bash
# Scrape NBA 2025-26 season
sportstime-parser scrape nba --season 2025
# Scrape all sports
sportstime-parser scrape all --season 2025
# Validate existing scraped data
sportstime-parser validate nba --season 2025
# Check status
sportstime-parser status
# Upload to CloudKit (development)
sportstime-parser upload nba --season 2025
# Upload to CloudKit (production)
sportstime-parser upload nba --season 2025 --environment production
```
## CLI Reference
### scrape
Scrape game schedules, teams, and stadiums from web sources.
```bash
sportstime-parser scrape <sport> [options]
Arguments:
sport Sport to scrape: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"
Options:
--season, -s INT Season start year (default: 2025)
--dry-run Parse and validate only, don't write output files
--verbose, -v Enable verbose output
```
**Examples:**
```bash
# Scrape NBA 2025-26 season
sportstime-parser scrape nba --season 2025
# Scrape all sports with verbose output
sportstime-parser scrape all --season 2025 --verbose
# Dry run to test without writing files
sportstime-parser scrape mlb --season 2026 --dry-run
```
### validate
Run validation on existing scraped data and regenerate reports. Validation performs these checks:
1. **Game Coverage**: Compares scraped game count against expected totals per league (e.g., ~1,230 for NBA, ~2,430 for MLB)
2. **Team Resolution**: Identifies team names that couldn't be matched to canonical IDs using fuzzy matching
3. **Stadium Resolution**: Identifies venue names that couldn't be matched to canonical stadium IDs
4. **Duplicate Detection**: Finds games with the same home/away teams on the same date (potential doubleheader issues or data errors)
5. **Missing Data**: Flags games missing required fields (stadium_id, team IDs, valid dates)
The output is a Markdown report with:
- Summary statistics (total games, valid games, coverage percentage)
- Manual review items grouped by type (unresolved teams, unresolved stadiums, duplicates)
- Fuzzy match suggestions with confidence scores to help resolve unmatched names
```bash
sportstime-parser validate <sport> [options]
Arguments:
sport Sport to validate: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"
Options:
--season, -s INT Season start year (default: 2025)
```
**Examples:**
```bash
# Validate NBA data
sportstime-parser validate nba --season 2025
# Validate all sports
sportstime-parser validate all
```
### upload
Upload scraped data to CloudKit with diff-based updates.
```bash
sportstime-parser upload <sport> [options]
Arguments:
sport Sport to upload: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"
Options:
--season, -s INT Season start year (default: 2025)
--environment, -e CloudKit environment: development or production (default: development)
--resume Resume interrupted upload from last checkpoint
```
**Examples:**
```bash
# Upload NBA to development
sportstime-parser upload nba --season 2025
# Upload to production
sportstime-parser upload nba --season 2025 --environment production
# Resume interrupted upload
sportstime-parser upload mlb --season 2026 --resume
```
### status
Show current scrape and upload status.
```bash
sportstime-parser status
```
### retry
Retry failed uploads from previous attempts.
```bash
sportstime-parser retry <sport> [options]
Arguments:
sport Sport to retry: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"
Options:
--season, -s INT Season start year (default: 2025)
--environment, -e CloudKit environment (default: development)
--max-retries INT Maximum retry attempts per record (default: 3)
```
### clear
Clear upload session state to start fresh.
```bash
sportstime-parser clear <sport> [options]
Arguments:
sport Sport to clear: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"
Options:
--season, -s INT Season start year (default: 2025)
--environment, -e CloudKit environment (default: development)
```
## CloudKit Configuration
To upload data to CloudKit, you need to configure authentication credentials.
### 1. Get Credentials from Apple Developer Portal
1. Go to [Apple Developer Portal](https://developer.apple.com)
2. Navigate to **Certificates, Identifiers & Profiles** > **Keys**
3. Create a new key with **CloudKit** capability
4. Download the private key file (.p8)
5. Note the Key ID
### 2. Set Environment Variables
```bash
# Key ID from Apple Developer Portal
export CLOUDKIT_KEY_ID="your_key_id_here"
# Path to private key file
export CLOUDKIT_PRIVATE_KEY_PATH="/path/to/AuthKey_XXXXXX.p8"
# Or provide key content directly (useful for CI/CD)
export CLOUDKIT_PRIVATE_KEY="-----BEGIN EC PRIVATE KEY-----
...key content...
-----END EC PRIVATE KEY-----"
```
### 3. Verify Configuration
```bash
sportstime-parser status
```
The status output will show whether CloudKit is configured correctly.
## Output Files
Scraped data is saved to the `output/` directory:
```
output/
games_nba_2025.json # Game schedules
teams_nba.json # Team data
stadiums_nba.json # Stadium data
validation_nba_2025.md # Validation report
```
## Validation Reports
Validation reports are generated in Markdown format at `output/validation_{sport}_{season}.md`.
### Report Sections
**Summary Table**
| Metric | Description |
|--------|-------------|
| Total Games | Number of games scraped |
| Valid Games | Games with all required fields resolved |
| Coverage | Percentage of expected games found (based on league schedule) |
| Unresolved Teams | Team names that couldn't be matched |
| Unresolved Stadiums | Venue names that couldn't be matched |
| Duplicates | Potential duplicate game entries |
**Manual Review Items**
Items are grouped by type and include the raw value, source URL, and suggested fixes:
- **Unresolved Teams**: Team names not in the alias mapping. Add to `team_aliases.json` to resolve.
- **Unresolved Stadiums**: Venue names not recognized. Common for renamed arenas (naming rights changes). Add to `stadium_aliases.json`.
- **Duplicate Games**: Same matchup on same date. May indicate doubleheader parsing issues or duplicate entries from different sources.
- **Missing Data**: Games missing stadium coordinates or other required fields.
**Fuzzy Match Suggestions**
For each unresolved name, the validator provides the top fuzzy matches with confidence scores (0-100). High-confidence matches (>80) are likely correct; lower scores need manual verification.
## Canonical IDs
Canonical IDs are stable, deterministic identifiers that enable cross-referencing between games, teams, and stadiums across different data sources.
### ID Formats
**Games**
```
{sport}_{season}_{away}_{home}_{MMDD}[_{game_number}]
```
Examples:
- `nba_2025_hou_okc_1021` - NBA 2025-26, Houston @ OKC, Oct 21
- `mlb_2026_nyy_bos_0401_1` - MLB 2026, Yankees @ Red Sox, Apr 1, Game 1 (doubleheader)
**Teams**
```
{sport}_{city}_{name}
```
Examples:
- `nba_la_lakers`
- `mlb_new_york_yankees`
- `nfl_new_york_giants`
**Stadiums**
```
{sport}_{normalized_name}
```
Examples:
- `mlb_yankee_stadium`
- `nba_crypto_com_arena`
- `nfl_sofi_stadium`
### Generated vs Matched IDs
| Entity | Generated | Matched |
|--------|-----------|---------|
| **Teams** | Pre-defined in `team_resolver.py` mappings | Resolved from raw scraped names via aliases + fuzzy matching |
| **Stadiums** | Pre-defined in `stadium_resolver.py` mappings | Resolved from raw venue names via aliases + fuzzy matching |
| **Games** | Generated at scrape time from resolved team IDs + date | N/A (always generated, never matched) |
**Resolution Flow:**
```
Raw Name (from scraper)
Exact Match (alias lookup in team_aliases.json / stadium_aliases.json)
↓ (if no match)
Fuzzy Match (Levenshtein distance against known names)
↓ (if confidence > threshold)
Canonical ID assigned
↓ (if no match)
Manual Review Item created
```
### Cross-References
Entities reference each other via canonical IDs:
```
┌─────────────────────────────────────────────────────────────┐
│ Game │
│ id: nba_2025_hou_okc_1021 │
│ home_team_id: nba_oklahoma_city_thunder ──────────────┐ │
│ away_team_id: nba_houston_rockets ────────────────┐ │ │
│ stadium_id: nba_paycom_center ────────────────┐ │ │ │
└─────────────────────────────────────────────────│───│───│───┘
│ │ │
┌─────────────────────────────────────────────────│───│───│───┐
│ Stadium │ │ │ │
│ id: nba_paycom_center ◄───────────────────────┘ │ │ │
│ name: "Paycom Center" │ │ │
│ city: "Oklahoma City" │ │ │
│ latitude: 35.4634 │ │ │
│ longitude: -97.5151 │ │ │
└─────────────────────────────────────────────────────│───│───┘
│ │
┌─────────────────────────────────────────────────────│───│───┐
│ Team │ │ │
│ id: nba_houston_rockets ◄─────────────────────────┘ │ │
│ name: "Rockets" │ │
│ city: "Houston" │ │
│ stadium_id: nba_toyota_center │ │
└─────────────────────────────────────────────────────────│───┘
┌─────────────────────────────────────────────────────────│───┐
│ Team │ │
│ id: nba_oklahoma_city_thunder ◄───────────────────────┘ │
│ name: "Thunder" │
│ city: "Oklahoma City" │
│ stadium_id: nba_paycom_center │
└─────────────────────────────────────────────────────────────┘
```
### Alias Files
Aliases map variant names to canonical IDs:
**`team_aliases.json`**
```json
{
"nba": {
"LA Lakers": "nba_la_lakers",
"Los Angeles Lakers": "nba_la_lakers",
"LAL": "nba_la_lakers"
}
}
```
**`stadium_aliases.json`**
```json
{
"nba": {
"Crypto.com Arena": "nba_crypto_com_arena",
"Staples Center": "nba_crypto_com_arena",
"STAPLES Center": "nba_crypto_com_arena"
}
}
```
When a scraper returns a raw name like "LA Lakers", the resolver:
1. Checks `team_aliases.json` for an exact match → finds `nba_la_lakers`
2. If no exact match, runs fuzzy matching against all known team names
3. If fuzzy match confidence > 80%, uses that canonical ID
4. Otherwise, creates a manual review item for human resolution
## Adding a New Sport
To add support for a new sport (e.g., `cfb` for college football), update these files:
### 1. Configuration (`config.py`)
Add the sport to `SUPPORTED_SPORTS` and `EXPECTED_GAME_COUNTS`:
```python
SUPPORTED_SPORTS: list[str] = [
"nba", "mlb", "nfl", "nhl", "mls", "wnba", "nwsl",
"cfb", # ← Add new sport
]
EXPECTED_GAME_COUNTS: dict[str, int] = {
# ... existing sports ...
"cfb": 900, # ← Add expected game count for validation
}
```
### 2. Team Mappings (`normalizers/team_resolver.py`)
Add team definitions to `TEAM_MAPPINGS`. Each entry maps an abbreviation to `(canonical_id, full_name, city)`:
```python
TEAM_MAPPINGS: dict[str, dict[str, tuple[str, str, str]]] = {
# ... existing sports ...
"cfb": {
"ALA": ("team_cfb_ala", "Alabama Crimson Tide", "Tuscaloosa"),
"OSU": ("team_cfb_osu", "Ohio State Buckeyes", "Columbus"),
# ... all teams ...
},
}
```
### 3. Stadium Mappings (`normalizers/stadium_resolver.py`)
Add stadium definitions to `STADIUM_MAPPINGS`. Each entry is a `StadiumInfo` with coordinates:
```python
STADIUM_MAPPINGS: dict[str, dict[str, StadiumInfo]] = {
# ... existing sports ...
"cfb": {
"stadium_cfb_bryant_denny": StadiumInfo(
id="stadium_cfb_bryant_denny",
name="Bryant-Denny Stadium",
city="Tuscaloosa",
state="AL",
country="USA",
sport="cfb",
latitude=33.2083,
longitude=-87.5503,
),
# ... all stadiums ...
},
}
```
### 4. Scraper Implementation (`scrapers/cfb.py`)
Create a new scraper class extending `BaseScraper`:
```python
from .base import BaseScraper, RawGameData, ScrapeResult
class CFBScraper(BaseScraper):
def __init__(self, season: int, **kwargs):
super().__init__("cfb", season, **kwargs)
self._team_resolver = get_team_resolver("cfb")
self._stadium_resolver = get_stadium_resolver("cfb")
def _get_sources(self) -> list[str]:
return ["espn", "sports_reference"] # Priority order
def _get_source_url(self, source: str, **kwargs) -> str:
# Return URL for each source
...
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
# Implement scraping logic
...
def _normalize_games(self, raw_games: list[RawGameData]) -> tuple[list[Game], list[ManualReviewItem]]:
# Convert raw data to Game objects using resolvers
...
def scrape_teams(self) -> list[Team]:
# Return Team objects from TEAM_MAPPINGS
...
def scrape_stadiums(self) -> list[Stadium]:
# Return Stadium objects from STADIUM_MAPPINGS
...
def create_cfb_scraper(season: int) -> CFBScraper:
return CFBScraper(season=season)
```
### 5. Register Scraper (`scrapers/__init__.py`)
Export the new scraper:
```python
from .cfb import CFBScraper, create_cfb_scraper
__all__ = [
# ... existing exports ...
"CFBScraper",
"create_cfb_scraper",
]
```
### 6. CLI Registration (`cli.py`)
Add the sport to `get_scraper()`:
```python
def get_scraper(sport: str, season: int):
# ... existing sports ...
elif sport == "cfb":
from .scrapers.cfb import create_cfb_scraper
return create_cfb_scraper(season)
```
### 7. Alias Files (`team_aliases.json`, `stadium_aliases.json`)
Add initial aliases for common name variants:
```json
// team_aliases.json
{
"cfb": {
"Alabama": "team_cfb_ala",
"Bama": "team_cfb_ala",
"Roll Tide": "team_cfb_ala"
}
}
// stadium_aliases.json
{
"cfb": {
"Bryant Denny Stadium": "stadium_cfb_bryant_denny",
"Bryant-Denny": "stadium_cfb_bryant_denny"
}
}
```
### 8. Documentation (`SOURCES.md`)
Document data sources with URLs, rate limits, and notes:
```markdown
## CFB (College Football)
**Teams**: 134 (FBS)
**Expected Games**: ~900 per season
**Season**: August - January
### Sources
| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | ESPN API | `site.api.espn.com/apis/site/v2/sports/football/college-football/scoreboard` | JSON |
| 2 | Sports-Reference | `sports-reference.com/cfb/years/{YEAR}-schedule.html` | HTML |
```
### 9. Tests (`tests/test_scrapers/test_cfb.py`)
Create tests for the new scraper:
```python
import pytest
from sportstime_parser.scrapers.cfb import CFBScraper, create_cfb_scraper
class TestCFBScraper:
def test_factory_creates_scraper(self):
scraper = create_cfb_scraper(season=2025)
assert scraper.sport == "cfb"
assert scraper.season == 2025
def test_get_sources_returns_priority_list(self):
scraper = CFBScraper(season=2025)
sources = scraper._get_sources()
assert "espn" in sources
# ... more tests ...
```
### Checklist
- [ ] Add to `SUPPORTED_SPORTS` in `config.py`
- [ ] Add to `EXPECTED_GAME_COUNTS` in `config.py`
- [ ] Add team mappings to `team_resolver.py`
- [ ] Add stadium mappings to `stadium_resolver.py`
- [ ] Create `scrapers/{sport}.py` with scraper class
- [ ] Export in `scrapers/__init__.py`
- [ ] Register in `cli.py` `get_scraper()`
- [ ] Add aliases to `team_aliases.json`
- [ ] Add aliases to `stadium_aliases.json`
- [ ] Document sources in `SOURCES.md`
- [ ] Create tests in `tests/test_scrapers/`
- [ ] Run `pytest` to verify all tests pass
- [ ] Run dry-run scrape: `sportstime-parser scrape {sport} --season 2025 --dry-run`
## Development
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=sportstime_parser --cov-report=html
# Run specific test file
pytest tests/test_scrapers/test_nba.py
# Run with verbose output
pytest -v
```
### Project Structure
```
sportstime_parser/
__init__.py
__main__.py # CLI entry point
cli.py # Subcommand definitions
config.py # Constants, defaults
models/
game.py # Game dataclass
team.py # Team dataclass
stadium.py # Stadium dataclass
aliases.py # Alias dataclasses
scrapers/
base.py # BaseScraper abstract class
nba.py # NBA scrapers
mlb.py # MLB scrapers
nfl.py # NFL scrapers
nhl.py # NHL scrapers
mls.py # MLS scrapers
wnba.py # WNBA scrapers
nwsl.py # NWSL scrapers
normalizers/
canonical_id.py # ID generation
team_resolver.py # Team name resolution
stadium_resolver.py # Stadium name resolution
timezone.py # Timezone conversion
fuzzy.py # Fuzzy matching
validators/
report.py # Validation report generator
uploaders/
cloudkit.py # CloudKit Web Services client
state.py # Resumable upload state
diff.py # Record comparison
utils/
http.py # Rate-limited HTTP client
logging.py # Verbose logger
progress.py # Progress bars
```
## Troubleshooting
### "No games file found"
Run the scrape command first:
```bash
sportstime-parser scrape nba --season 2025
```
### "CloudKit not configured"
Set the required environment variables:
```bash
export CLOUDKIT_KEY_ID="your_key_id"
export CLOUDKIT_PRIVATE_KEY_PATH="/path/to/key.p8"
```
### Rate limit errors
The scraper includes automatic rate limiting and exponential backoff. If you encounter persistent rate limit errors:
1. Wait a few minutes before retrying
2. Try scraping one sport at a time instead of "all"
3. Check that you're not running multiple instances
### Scrape fails with no data
1. Check your internet connection
2. Run with `--verbose` to see detailed error messages
3. The scraper will try multiple sources - if all fail, the source websites may be temporarily unavailable
## License
MIT

View File

@@ -1,254 +0,0 @@
# Data Sources
This document lists all data sources used by the SportsTime parser, including URLs, rate limits, and data freshness expectations.
## Source Priority
Each sport has multiple sources configured in priority order. The scraper tries each source in order and uses the first one that succeeds. If a source fails (network error, parsing error, etc.), it falls back to the next source.
---
## NBA (National Basketball Association)
**Teams**: 30
**Expected Games**: ~1,230 per season
**Season**: October - June (spans two calendar years)
### Sources
| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | Basketball-Reference | `basketball-reference.com/leagues/NBA_{YEAR}_games-{month}.html` | HTML |
| 2 | ESPN API | `site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard` | JSON |
| 3 | CBS Sports | `cbssports.com/nba/schedule/` | HTML |
### Rate Limits
- **Basketball-Reference**: ~1 request/second recommended
- **ESPN API**: No published limit, use 1 request/second to be safe
- **CBS Sports**: ~1 request/second recommended
### Notes
- Basketball-Reference is the most reliable source with complete historical data
- ESPN API is good for current/future seasons
- Games organized by month on Basketball-Reference
---
## MLB (Major League Baseball)
**Teams**: 30
**Expected Games**: ~2,430 per season
**Season**: March/April - October/November (single calendar year)
### Sources
| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | Baseball-Reference | `baseball-reference.com/leagues/majors/{YEAR}-schedule.shtml` | HTML |
| 2 | MLB Stats API | `statsapi.mlb.com/api/v1/schedule` | JSON |
| 3 | ESPN API | `site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard` | JSON |
### Rate Limits
- **Baseball-Reference**: ~1 request/second recommended
- **MLB Stats API**: No published limit, use 0.5 request/second
- **ESPN API**: ~1 request/second
### Notes
- MLB has doubleheaders; games are suffixed with `_1`, `_2`
- Single schedule page per season on Baseball-Reference
- MLB Stats API allows date range queries for efficiency
---
## NFL (National Football League)
**Teams**: 32
**Expected Games**: ~272 per season (regular season only)
**Season**: September - February (spans two calendar years)
### Sources
| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | ESPN API | `site.api.espn.com/apis/site/v2/sports/football/nfl/scoreboard` | JSON |
| 2 | Pro-Football-Reference | `pro-football-reference.com/years/{YEAR}/games.htm` | HTML |
| 3 | CBS Sports | `cbssports.com/nfl/schedule/` | HTML |
### Rate Limits
- **ESPN API**: ~1 request/second
- **Pro-Football-Reference**: ~1 request/second
- **CBS Sports**: ~1 request/second
### Notes
- ESPN API uses week numbers instead of dates
- International games (London, Mexico City, Frankfurt, etc.) are filtered out
- Includes preseason, regular season, and playoffs
---
## NHL (National Hockey League)
**Teams**: 32 (including Utah Hockey Club)
**Expected Games**: ~1,312 per season
**Season**: October - June (spans two calendar years)
### Sources
| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | Hockey-Reference | `hockey-reference.com/leagues/NHL_{YEAR}_games.html` | HTML |
| 2 | NHL API | `api-web.nhle.com/v1/schedule/{date}` | JSON |
| 3 | ESPN API | `site.api.espn.com/apis/site/v2/sports/hockey/nhl/scoreboard` | JSON |
### Rate Limits
- **Hockey-Reference**: ~1 request/second
- **NHL API**: No published limit, use 0.5 request/second
- **ESPN API**: ~1 request/second
### Notes
- International games (Prague, Stockholm, Helsinki, etc.) are filtered out
- Single schedule page per season on Hockey-Reference
---
## MLS (Major League Soccer)
**Teams**: 30 (including San Diego FC)
**Expected Games**: ~493 per season
**Season**: February/March - October/November (single calendar year)
### Sources
| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | ESPN API | `site.api.espn.com/apis/site/v2/sports/soccer/usa.1/scoreboard` | JSON |
| 2 | FBref | `fbref.com/en/comps/22/{YEAR}/schedule/` | HTML |
### Rate Limits
- **ESPN API**: ~1 request/second
- **FBref**: ~1 request/second
### Notes
- MLS runs within a single calendar year
- Some teams share stadiums with NFL teams
---
## WNBA (Women's National Basketball Association)
**Teams**: 13 (including Golden State Valkyries)
**Expected Games**: ~220 per season
**Season**: May - October (single calendar year)
### Sources
| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | ESPN API | `site.api.espn.com/apis/site/v2/sports/basketball/wnba/scoreboard` | JSON |
### Rate Limits
- **ESPN API**: ~1 request/second
### Notes
- Many WNBA teams share arenas with NBA teams
- Teams and stadiums are hardcoded (smaller league)
---
## NWSL (National Women's Soccer League)
**Teams**: 14
**Expected Games**: ~182 per season
**Season**: March - November (single calendar year)
### Sources
| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | ESPN API | `site.api.espn.com/apis/site/v2/sports/soccer/usa.nwsl/scoreboard` | JSON |
### Rate Limits
- **ESPN API**: ~1 request/second
### Notes
- Many NWSL teams share stadiums with MLS teams
- Teams and stadiums are hardcoded (smaller league)
---
## Stadium Data Sources
Stadium coordinates and metadata come from multiple sources:
| Sport | Sources |
|-------|---------|
| MLB | MLBScoreBot GitHub, cageyjames GeoJSON, hardcoded |
| NFL | NFLScoreBot GitHub, brianhatchl GeoJSON, hardcoded |
| NBA | Hardcoded |
| NHL | Hardcoded |
| MLS | gavinr GeoJSON, hardcoded |
| WNBA | Hardcoded (shared with NBA) |
| NWSL | Hardcoded (shared with MLS) |
---
## General Guidelines
### Rate Limiting
All scrapers implement:
1. **Default delay**: 1 second between requests
2. **Auto-detection**: Detects HTTP 429 (Too Many Requests) responses
3. **Exponential backoff**: Starts at 1 second, doubles up to 3 retries
4. **Connection pooling**: Reuses HTTP connections for efficiency
### Error Handling
- **Partial data**: If a source fails mid-scrape, partial data is discarded
- **Source fallback**: Automatically tries the next source on failure
- **Logging**: All errors are logged for debugging
### Data Freshness
| Data Type | Freshness |
|-----------|-----------|
| Games (future) | Check weekly during season |
| Games (past) | Final scores available within hours |
| Teams | Update at start of each season |
| Stadiums | Update when venues change |
### Geographic Filter
Games at venues outside USA, Canada, and Mexico are automatically filtered out:
- **NFL**: London, Frankfurt, Munich, Mexico City, São Paulo
- **NHL**: Prague, Stockholm, Helsinki, Tampere, Gothenburg
---
## Legal Considerations
This tool is designed for personal/educational use. When using these sources:
1. Respect robots.txt files
2. Don't make excessive requests
3. Cache responses when possible
4. Check each source's Terms of Service
5. Consider that schedule data may be copyrighted
The ESPN API is undocumented but publicly accessible. Sports-Reference sites allow scraping but request reasonable rate limiting.

View File

@@ -1,8 +0,0 @@
"""SportsTime Parser - Sports data scraper and CloudKit uploader."""
__version__ = "0.1.0"
__author__ = "SportsTime Team"
from .cli import run_cli
__all__ = ["run_cli", "__version__"]

View File

@@ -1,14 +0,0 @@
"""Entry point for sportstime-parser CLI."""
import sys
from .cli import run_cli
def main() -> int:
"""Main entry point."""
return run_cli()
if __name__ == "__main__":
sys.exit(main())

File diff suppressed because it is too large Load Diff

View File

@@ -1,59 +0,0 @@
"""Configuration constants for sportstime-parser."""
from pathlib import Path
# Package paths
PACKAGE_DIR = Path(__file__).parent
SCRIPTS_DIR = PACKAGE_DIR.parent
OUTPUT_DIR = SCRIPTS_DIR / "output"
STATE_DIR = SCRIPTS_DIR / ".parser_state"
# Alias files (existing in Scripts/)
TEAM_ALIASES_FILE = SCRIPTS_DIR / "team_aliases.json"
STADIUM_ALIASES_FILE = SCRIPTS_DIR / "stadium_aliases.json"
LEAGUE_STRUCTURE_FILE = SCRIPTS_DIR / "league_structure.json"
# Supported sports
SUPPORTED_SPORTS: list[str] = [
"nba",
"mlb",
"nfl",
"nhl",
"mls",
"wnba",
"nwsl",
]
# Default season (start year of the season, e.g., 2025 for 2025-26)
DEFAULT_SEASON: int = 2025
# CloudKit configuration
CLOUDKIT_CONTAINER_ID: str = "iCloud.com.sportstime.app"
CLOUDKIT_ENVIRONMENT: str = "development"
CLOUDKIT_BATCH_SIZE: int = 200
CLOUDKIT_KEY_ID: str = "152be0715e0276e31aaea5cbfe79dc872f298861a55c70fae14e5fe3e026cff9"
CLOUDKIT_PRIVATE_KEY_PATH: Path = SCRIPTS_DIR / "eckey.pem"
# Rate limiting
DEFAULT_REQUEST_DELAY: float = 1.0 # seconds between requests
MAX_RETRIES: int = 3
BACKOFF_FACTOR: float = 2.0 # exponential backoff multiplier
INITIAL_BACKOFF: float = 1.0 # initial backoff in seconds
# Expected game counts per sport (approximate, for validation)
# Updated 2026-01-20 based on 2025-26 season data
EXPECTED_GAME_COUNTS: dict[str, int] = {
"nba": 1230, # 30 teams × 82 games / 2
"mlb": 2430, # 30 teams × 162 games / 2 (regular season only)
"nfl": 272, # 32 teams × 17 games / 2 (regular season only)
"nhl": 1312, # 32 teams × 82 games / 2
"mls": 540, # 30 teams × varies (updated for 2025 expansion)
"wnba": 286, # 13 teams × 44 games / 2 (updated for 2025 expansion)
"nwsl": 188, # 14→16 teams × varies (updated for 2025 expansion)
}
# Minimum match score for fuzzy matching (0-100)
FUZZY_MATCH_THRESHOLD: int = 80
# Geographic filter (only include games in these countries)
ALLOWED_COUNTRIES: set[str] = {"USA", "US", "United States", "Canada", "Mexico"}

View File

@@ -1,52 +0,0 @@
"""Data models for sportstime-parser."""
from .game import Game, save_games, load_games
from .team import Team, save_teams, load_teams
from .stadium import Stadium, save_stadiums, load_stadiums
from .aliases import (
AliasType,
ReviewReason,
TeamAlias,
StadiumAlias,
FuzzyMatch,
ManualReviewItem,
)
from .sport import (
Sport,
LeagueStructure,
LeagueStructureType,
save_sports,
load_sports,
save_league_structures,
load_league_structures,
)
__all__ = [
# Game
"Game",
"save_games",
"load_games",
# Team
"Team",
"save_teams",
"load_teams",
# Stadium
"Stadium",
"save_stadiums",
"load_stadiums",
# Aliases
"AliasType",
"ReviewReason",
"TeamAlias",
"StadiumAlias",
"FuzzyMatch",
"ManualReviewItem",
# Sport and League Structure
"Sport",
"LeagueStructure",
"LeagueStructureType",
"save_sports",
"load_sports",
"save_league_structures",
"load_league_structures",
]

View File

@@ -1,262 +0,0 @@
"""Alias and manual review data models for sportstime-parser."""
from dataclasses import dataclass, field
from datetime import date, datetime
from enum import Enum
from typing import Optional
import json
class AliasType(Enum):
"""Type of team alias."""
NAME = "name"
ABBREVIATION = "abbreviation"
CITY = "city"
class ReviewReason(Enum):
"""Reason an item requires manual review."""
UNRESOLVED_TEAM = "unresolved_team"
UNRESOLVED_STADIUM = "unresolved_stadium"
LOW_CONFIDENCE_MATCH = "low_confidence_match"
MISSING_DATA = "missing_data"
DUPLICATE_GAME = "duplicate_game"
TIMEZONE_UNKNOWN = "timezone_unknown"
GEOGRAPHIC_FILTER = "geographic_filter"
@dataclass
class TeamAlias:
"""Represents a team alias with optional date validity.
Attributes:
id: Unique alias ID
team_canonical_id: The canonical team ID this alias resolves to
alias_type: Type of alias (name, abbreviation, city)
alias_value: The alias value to match against
valid_from: Start date of alias validity (None = always valid)
valid_until: End date of alias validity (None = still valid)
"""
id: str
team_canonical_id: str
alias_type: AliasType
alias_value: str
valid_from: Optional[date] = None
valid_until: Optional[date] = None
def is_valid_on(self, check_date: date) -> bool:
"""Check if this alias is valid on the given date."""
if self.valid_from and check_date < self.valid_from:
return False
if self.valid_until and check_date > self.valid_until:
return False
return True
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"id": self.id,
"team_canonical_id": self.team_canonical_id,
"alias_type": self.alias_type.value,
"alias_value": self.alias_value,
"valid_from": self.valid_from.isoformat() if self.valid_from else None,
"valid_until": self.valid_until.isoformat() if self.valid_until else None,
}
@classmethod
def from_dict(cls, data: dict) -> "TeamAlias":
"""Create a TeamAlias from a dictionary."""
valid_from = None
if data.get("valid_from"):
valid_from = date.fromisoformat(data["valid_from"])
valid_until = None
if data.get("valid_until"):
valid_until = date.fromisoformat(data["valid_until"])
return cls(
id=data["id"],
team_canonical_id=data["team_canonical_id"],
alias_type=AliasType(data["alias_type"]),
alias_value=data["alias_value"],
valid_from=valid_from,
valid_until=valid_until,
)
@dataclass
class StadiumAlias:
"""Represents a stadium alias with optional date validity.
Attributes:
alias_name: The alias name to match against (lowercase)
stadium_canonical_id: The canonical stadium ID this alias resolves to
valid_from: Start date of alias validity (None = always valid)
valid_until: End date of alias validity (None = still valid)
"""
alias_name: str
stadium_canonical_id: str
valid_from: Optional[date] = None
valid_until: Optional[date] = None
def is_valid_on(self, check_date: date) -> bool:
"""Check if this alias is valid on the given date."""
if self.valid_from and check_date < self.valid_from:
return False
if self.valid_until and check_date > self.valid_until:
return False
return True
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"alias_name": self.alias_name,
"stadium_canonical_id": self.stadium_canonical_id,
"valid_from": self.valid_from.isoformat() if self.valid_from else None,
"valid_until": self.valid_until.isoformat() if self.valid_until else None,
}
@classmethod
def from_dict(cls, data: dict) -> "StadiumAlias":
"""Create a StadiumAlias from a dictionary."""
valid_from = None
if data.get("valid_from"):
valid_from = date.fromisoformat(data["valid_from"])
valid_until = None
if data.get("valid_until"):
valid_until = date.fromisoformat(data["valid_until"])
return cls(
alias_name=data["alias_name"],
stadium_canonical_id=data["stadium_canonical_id"],
valid_from=valid_from,
valid_until=valid_until,
)
@dataclass
class FuzzyMatch:
"""Represents a fuzzy match suggestion with confidence score."""
canonical_id: str
canonical_name: str
confidence: int # 0-100
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"canonical_id": self.canonical_id,
"canonical_name": self.canonical_name,
"confidence": self.confidence,
}
@dataclass
class ManualReviewItem:
"""Represents an item requiring manual review.
Attributes:
id: Unique review item ID
reason: Why this item needs review
sport: Sport code
raw_value: The original unresolved value
context: Additional context about the issue
source_url: URL of the source page
suggested_matches: List of potential matches with confidence scores
game_date: Date of the game (if applicable)
created_at: When this review item was created
"""
id: str
reason: ReviewReason
sport: str
raw_value: str
context: dict = field(default_factory=dict)
source_url: Optional[str] = None
suggested_matches: list[FuzzyMatch] = field(default_factory=list)
game_date: Optional[date] = None
created_at: datetime = field(default_factory=datetime.now)
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"id": self.id,
"reason": self.reason.value,
"sport": self.sport,
"raw_value": self.raw_value,
"context": self.context,
"source_url": self.source_url,
"suggested_matches": [m.to_dict() for m in self.suggested_matches],
"game_date": self.game_date.isoformat() if self.game_date else None,
"created_at": self.created_at.isoformat(),
}
@classmethod
def from_dict(cls, data: dict) -> "ManualReviewItem":
"""Create a ManualReviewItem from a dictionary."""
game_date = None
if data.get("game_date"):
game_date = date.fromisoformat(data["game_date"])
created_at = datetime.now()
if data.get("created_at"):
created_at = datetime.fromisoformat(data["created_at"])
suggested_matches = []
for match_data in data.get("suggested_matches", []):
suggested_matches.append(FuzzyMatch(
canonical_id=match_data["canonical_id"],
canonical_name=match_data["canonical_name"],
confidence=match_data["confidence"],
))
return cls(
id=data["id"],
reason=ReviewReason(data["reason"]),
sport=data["sport"],
raw_value=data["raw_value"],
context=data.get("context", {}),
source_url=data.get("source_url"),
suggested_matches=suggested_matches,
game_date=game_date,
created_at=created_at,
)
def to_markdown(self) -> str:
"""Generate markdown representation for validation report."""
lines = [
f"### {self.reason.value.replace('_', ' ').title()}: {self.raw_value}",
"",
f"**Sport**: {self.sport.upper()}",
]
if self.game_date:
lines.append(f"**Game Date**: {self.game_date.isoformat()}")
if self.context:
lines.append("")
lines.append("**Context**:")
for key, value in self.context.items():
lines.append(f"- {key}: {value}")
if self.suggested_matches:
lines.append("")
lines.append("**Suggested Matches**:")
for i, match in enumerate(self.suggested_matches, 1):
marker = " <- likely correct" if match.confidence >= 90 else ""
lines.append(
f"{i}. `{match.canonical_id}` ({match.confidence}%){marker}"
)
if self.source_url:
lines.append("")
lines.append(f"**Source**: [{self.source_url}]({self.source_url})")
lines.append("")
lines.append("---")
lines.append("")
return "\n".join(lines)

View File

@@ -1,183 +0,0 @@
"""Game data model for sportstime-parser."""
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo
import json
@dataclass
class Game:
"""Represents a game with all CloudKit fields.
Attributes:
id: Canonical game ID (e.g., 'nba_2025_hou_okc_1021')
sport: Sport code (e.g., 'nba', 'mlb')
season: Season start year (e.g., 2025 for 2025-26)
home_team_id: Canonical home team ID
away_team_id: Canonical away team ID
stadium_id: Canonical stadium ID
game_date: Game date/time in UTC
game_number: Game number for doubleheaders (1 or 2), None for single games
home_score: Final home team score (None if not played)
away_score: Final away team score (None if not played)
status: Game status ('scheduled', 'final', 'postponed', 'cancelled')
source_url: URL of the source page for manual review
raw_home_team: Original home team name from source (for debugging)
raw_away_team: Original away team name from source (for debugging)
raw_stadium: Original stadium name from source (for debugging)
"""
id: str
sport: str
season: int
home_team_id: str
away_team_id: str
stadium_id: str
game_date: datetime
game_number: Optional[int] = None
home_score: Optional[int] = None
away_score: Optional[int] = None
status: str = "scheduled"
source_url: Optional[str] = None
raw_home_team: Optional[str] = None
raw_away_team: Optional[str] = None
raw_stadium: Optional[str] = None
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"id": self.id,
"sport": self.sport,
"season": self.season,
"home_team_id": self.home_team_id,
"away_team_id": self.away_team_id,
"stadium_id": self.stadium_id,
"game_date": self.game_date.isoformat(),
"game_number": self.game_number,
"home_score": self.home_score,
"away_score": self.away_score,
"status": self.status,
"source_url": self.source_url,
"raw_home_team": self.raw_home_team,
"raw_away_team": self.raw_away_team,
"raw_stadium": self.raw_stadium,
}
def to_canonical_dict(
self,
stadium_timezone: str,
is_playoff: bool = False,
broadcast: Optional[str] = None,
) -> dict:
"""Convert to canonical dictionary format matching iOS app schema.
Args:
stadium_timezone: IANA timezone of the stadium (e.g., 'America/Chicago')
is_playoff: Whether this is a playoff game
broadcast: Broadcast network info (e.g., 'ESPN')
Returns:
Dictionary with field names matching JSONCanonicalGame in BootstrapService.swift
"""
# Convert game_date to UTC
if self.game_date.tzinfo is None:
# Localize naive datetime to stadium timezone first
local_tz = ZoneInfo(stadium_timezone)
local_dt = self.game_date.replace(tzinfo=local_tz)
else:
local_dt = self.game_date
utc_dt = local_dt.astimezone(ZoneInfo("UTC"))
# Format season as string (e.g., 2025 -> "2025-26" for NBA/NHL, "2025" for MLB)
sport_lower = self.sport.lower()
if sport_lower in ("nba", "nhl"):
season_str = f"{self.season}-{str(self.season + 1)[-2:]}"
else:
season_str = str(self.season)
return {
"canonical_id": self.id,
"sport": self.sport.upper(), # iOS Sport enum expects uppercase (e.g., "NFL")
"season": season_str,
"game_datetime_utc": utc_dt.strftime("%Y-%m-%dT%H:%M:%SZ"),
"home_team_canonical_id": self.home_team_id,
"away_team_canonical_id": self.away_team_id,
"stadium_canonical_id": self.stadium_id,
"is_playoff": is_playoff,
"broadcast_info": broadcast,
}
@classmethod
def from_dict(cls, data: dict) -> "Game":
"""Create a Game from a dictionary (internal format)."""
game_date = data["game_date"]
if isinstance(game_date, str):
game_date = datetime.fromisoformat(game_date)
return cls(
id=data["id"],
sport=data["sport"],
season=data["season"],
home_team_id=data["home_team_id"],
away_team_id=data["away_team_id"],
stadium_id=data["stadium_id"],
game_date=game_date,
game_number=data.get("game_number"),
home_score=data.get("home_score"),
away_score=data.get("away_score"),
status=data.get("status", "scheduled"),
source_url=data.get("source_url"),
raw_home_team=data.get("raw_home_team"),
raw_away_team=data.get("raw_away_team"),
raw_stadium=data.get("raw_stadium"),
)
@classmethod
def from_canonical_dict(cls, data: dict) -> "Game":
"""Create a Game from a canonical dictionary (iOS app format)."""
# Handle 'Z' suffix (fromisoformat doesn't support it before Python 3.11)
date_str = data["game_datetime_utc"].replace("Z", "+00:00")
game_date = datetime.fromisoformat(date_str)
# Parse season string (e.g., "2025-26" -> 2025, or "2025" -> 2025)
season_str = data["season"]
season = int(season_str.split("-")[0])
return cls(
id=data["canonical_id"],
sport=data["sport"],
season=season,
home_team_id=data["home_team_canonical_id"],
away_team_id=data["away_team_canonical_id"],
stadium_id=data["stadium_canonical_id"],
game_date=game_date,
status="scheduled",
)
def to_json(self) -> str:
"""Serialize to JSON string."""
return json.dumps(self.to_dict(), indent=2)
@classmethod
def from_json(cls, json_str: str) -> "Game":
"""Deserialize from JSON string."""
return cls.from_dict(json.loads(json_str))
def save_games(games: list[Game], filepath: str) -> None:
"""Save a list of games to a JSON file."""
with open(filepath, "w", encoding="utf-8") as f:
json.dump([g.to_dict() for g in games], f, indent=2)
def load_games(filepath: str) -> list[Game]:
"""Load a list of games from a JSON file (auto-detects format)."""
with open(filepath, "r", encoding="utf-8") as f:
data = json.load(f)
# Detect format: canonical has "canonical_id" and "game_datetime_utc", internal has "id"
if data and "canonical_id" in data[0] and "game_datetime_utc" in data[0]:
return [Game.from_canonical_dict(d) for d in data]
return [Game.from_dict(d) for d in data]

View File

@@ -1,157 +0,0 @@
"""Sport and LeagueStructure data models for sportstime-parser."""
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import json
class LeagueStructureType(str, Enum):
"""Type of league structure element."""
CONFERENCE = "conference"
DIVISION = "division"
LEAGUE = "league"
@dataclass
class Sport:
"""Represents a sport with all CloudKit fields.
Attributes:
id: Canonical sport ID (e.g., 'MLB', 'NBA')
abbreviation: Sport abbreviation (e.g., 'MLB', 'NBA')
display_name: Full display name (e.g., 'Major League Baseball')
icon_name: SF Symbol name for the sport icon
color_hex: Primary color as hex string (e.g., '#FF0000')
season_start_month: Month number when season typically starts (1-12)
season_end_month: Month number when season typically ends (1-12)
is_active: Whether the sport is currently active/supported
"""
id: str
abbreviation: str
display_name: str
icon_name: str
color_hex: str
season_start_month: int
season_end_month: int
is_active: bool = True
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"id": self.id,
"abbreviation": self.abbreviation,
"display_name": self.display_name,
"icon_name": self.icon_name,
"color_hex": self.color_hex,
"season_start_month": self.season_start_month,
"season_end_month": self.season_end_month,
"is_active": self.is_active,
}
@classmethod
def from_dict(cls, data: dict) -> "Sport":
"""Create a Sport from a dictionary."""
return cls(
id=data["id"],
abbreviation=data["abbreviation"],
display_name=data["display_name"],
icon_name=data["icon_name"],
color_hex=data["color_hex"],
season_start_month=data["season_start_month"],
season_end_month=data["season_end_month"],
is_active=data.get("is_active", True),
)
def to_json(self) -> str:
"""Serialize to JSON string."""
return json.dumps(self.to_dict(), indent=2)
@classmethod
def from_json(cls, json_str: str) -> "Sport":
"""Deserialize from JSON string."""
return cls.from_dict(json.loads(json_str))
@dataclass
class LeagueStructure:
"""Represents a league structure element (conference, division, etc.).
Attributes:
id: Unique ID (e.g., 'nba_eastern', 'mlb_al_east')
sport: Sport code (e.g., 'NBA', 'MLB')
structure_type: Type of structure (conference, division, league)
name: Full name (e.g., 'Eastern Conference', 'AL East')
abbreviation: Optional abbreviation (e.g., 'East', 'ALE')
parent_id: Parent structure ID (e.g., division's parent is conference)
display_order: Order for display (0-indexed)
"""
id: str
sport: str
structure_type: LeagueStructureType
name: str
abbreviation: Optional[str] = None
parent_id: Optional[str] = None
display_order: int = 0
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"id": self.id,
"sport": self.sport,
"structure_type": self.structure_type.value,
"name": self.name,
"abbreviation": self.abbreviation,
"parent_id": self.parent_id,
"display_order": self.display_order,
}
@classmethod
def from_dict(cls, data: dict) -> "LeagueStructure":
"""Create a LeagueStructure from a dictionary."""
return cls(
id=data["id"],
sport=data["sport"],
structure_type=LeagueStructureType(data["structure_type"]),
name=data["name"],
abbreviation=data.get("abbreviation"),
parent_id=data.get("parent_id"),
display_order=data.get("display_order", 0),
)
def to_json(self) -> str:
"""Serialize to JSON string."""
return json.dumps(self.to_dict(), indent=2)
@classmethod
def from_json(cls, json_str: str) -> "LeagueStructure":
"""Deserialize from JSON string."""
return cls.from_dict(json.loads(json_str))
def save_sports(sports: list[Sport], filepath: str) -> None:
"""Save a list of sports to a JSON file."""
with open(filepath, "w", encoding="utf-8") as f:
json.dump([s.to_dict() for s in sports], f, indent=2)
def load_sports(filepath: str) -> list[Sport]:
"""Load a list of sports from a JSON file."""
with open(filepath, "r", encoding="utf-8") as f:
data = json.load(f)
return [Sport.from_dict(d) for d in data]
def save_league_structures(structures: list[LeagueStructure], filepath: str) -> None:
"""Save a list of league structures to a JSON file."""
with open(filepath, "w", encoding="utf-8") as f:
json.dump([s.to_dict() for s in structures], f, indent=2)
def load_league_structures(filepath: str) -> list[LeagueStructure]:
"""Load a list of league structures from a JSON file."""
with open(filepath, "r", encoding="utf-8") as f:
data = json.load(f)
return [LeagueStructure.from_dict(d) for d in data]

View File

@@ -1,154 +0,0 @@
"""Stadium data model for sportstime-parser."""
from dataclasses import dataclass
from typing import Optional
import json
@dataclass
class Stadium:
"""Represents a stadium with all CloudKit fields.
Attributes:
id: Canonical stadium ID (e.g., 'stadium_nba_paycom_center')
sport: Primary sport code (e.g., 'nba', 'mlb')
name: Current stadium name (e.g., 'Paycom Center')
city: City name (e.g., 'Oklahoma City')
state: State/province code (e.g., 'OK', 'ON')
country: Country code (e.g., 'USA', 'Canada')
latitude: Latitude coordinate
longitude: Longitude coordinate
capacity: Seating capacity
surface: Playing surface (e.g., 'grass', 'turf', 'hardwood')
roof_type: Roof type (e.g., 'dome', 'retractable', 'open')
opened_year: Year stadium opened
image_url: URL to stadium image
timezone: IANA timezone (e.g., 'America/Chicago')
"""
id: str
sport: str
name: str
city: str
state: str
country: str
latitude: float
longitude: float
capacity: Optional[int] = None
surface: Optional[str] = None
roof_type: Optional[str] = None
opened_year: Optional[int] = None
image_url: Optional[str] = None
timezone: Optional[str] = None
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"id": self.id,
"sport": self.sport,
"name": self.name,
"city": self.city,
"state": self.state,
"country": self.country,
"latitude": self.latitude,
"longitude": self.longitude,
"capacity": self.capacity,
"surface": self.surface,
"roof_type": self.roof_type,
"opened_year": self.opened_year,
"image_url": self.image_url,
"timezone": self.timezone,
}
def to_canonical_dict(self, primary_team_abbrevs: list[str] | None = None) -> dict:
"""Convert to canonical dictionary format matching iOS app schema.
Args:
primary_team_abbrevs: List of team abbreviations that play at this stadium.
If None, defaults to empty list.
Returns:
Dictionary with field names matching JSONCanonicalStadium in BootstrapService.swift
"""
return {
"canonical_id": self.id,
"name": self.name,
"city": self.city,
"state": self.state,
"latitude": self.latitude,
"longitude": self.longitude,
"capacity": self.capacity if self.capacity is not None else 0,
"sport": self.sport.upper(), # iOS Sport enum expects uppercase (e.g., "NFL")
"primary_team_abbrevs": primary_team_abbrevs or [],
"year_opened": self.opened_year,
"timezone_identifier": self.timezone,
"image_url": self.image_url,
}
@classmethod
def from_dict(cls, data: dict) -> "Stadium":
"""Create a Stadium from a dictionary (internal format)."""
return cls(
id=data["id"],
sport=data["sport"],
name=data["name"],
city=data["city"],
state=data["state"],
country=data["country"],
latitude=data["latitude"],
longitude=data["longitude"],
capacity=data.get("capacity"),
surface=data.get("surface"),
roof_type=data.get("roof_type"),
opened_year=data.get("opened_year"),
image_url=data.get("image_url"),
timezone=data.get("timezone"),
)
@classmethod
def from_canonical_dict(cls, data: dict) -> "Stadium":
"""Create a Stadium from a canonical dictionary (iOS app format)."""
return cls(
id=data["canonical_id"],
sport=data["sport"],
name=data["name"],
city=data["city"],
state=data["state"],
country="USA", # Canonical format doesn't include country
latitude=data["latitude"],
longitude=data["longitude"],
capacity=data.get("capacity"),
opened_year=data.get("year_opened"),
image_url=data.get("image_url"),
timezone=data.get("timezone_identifier"),
)
def to_json(self) -> str:
"""Serialize to JSON string."""
return json.dumps(self.to_dict(), indent=2)
@classmethod
def from_json(cls, json_str: str) -> "Stadium":
"""Deserialize from JSON string."""
return cls.from_dict(json.loads(json_str))
def is_in_allowed_region(self) -> bool:
"""Check if stadium is in USA, Canada, or Mexico."""
allowed = {"USA", "US", "United States", "Canada", "CA", "Mexico", "MX"}
return self.country in allowed
def save_stadiums(stadiums: list[Stadium], filepath: str) -> None:
"""Save a list of stadiums to a JSON file."""
with open(filepath, "w", encoding="utf-8") as f:
json.dump([s.to_dict() for s in stadiums], f, indent=2)
def load_stadiums(filepath: str) -> list[Stadium]:
"""Load a list of stadiums from a JSON file (auto-detects format)."""
with open(filepath, "r", encoding="utf-8") as f:
data = json.load(f)
# Detect format: canonical has "canonical_id", internal has "id"
if data and "canonical_id" in data[0]:
return [Stadium.from_canonical_dict(d) for d in data]
return [Stadium.from_dict(d) for d in data]

View File

@@ -1,177 +0,0 @@
"""Team data model for sportstime-parser."""
from dataclasses import dataclass
from typing import Optional
import json
@dataclass
class Team:
"""Represents a team with all CloudKit fields.
Attributes:
id: Canonical team ID (e.g., 'team_nba_okc')
sport: Sport code (e.g., 'nba', 'mlb')
city: Team city (e.g., 'Oklahoma City')
name: Team name (e.g., 'Thunder')
full_name: Full team name (e.g., 'Oklahoma City Thunder')
abbreviation: Official abbreviation (e.g., 'OKC')
conference: Conference name (e.g., 'Western', 'American')
division: Division name (e.g., 'Northwest', 'AL West')
primary_color: Primary team color as hex (e.g., '#007AC1')
secondary_color: Secondary team color as hex (e.g., '#EF3B24')
logo_url: URL to team logo image
stadium_id: Canonical ID of home stadium
"""
id: str
sport: str
city: str
name: str
full_name: str
abbreviation: str
conference: Optional[str] = None
division: Optional[str] = None
primary_color: Optional[str] = None
secondary_color: Optional[str] = None
logo_url: Optional[str] = None
stadium_id: Optional[str] = None
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"id": self.id,
"sport": self.sport,
"city": self.city,
"name": self.name,
"full_name": self.full_name,
"abbreviation": self.abbreviation,
"conference": self.conference,
"division": self.division,
"primary_color": self.primary_color,
"secondary_color": self.secondary_color,
"logo_url": self.logo_url,
"stadium_id": self.stadium_id,
}
def _make_qualified_id(self, name: Optional[str]) -> Optional[str]:
"""Convert a conference/division name to a qualified ID.
Examples:
"Eastern""nba_eastern"
"AL West""mlb_al_west"
"Southeast""nba_southeast"
"""
if not name:
return None
# Lowercase, replace spaces with underscores
normalized = name.lower().replace(" ", "_")
return f"{self.sport.lower()}_{normalized}"
def to_canonical_dict(self) -> dict:
"""Convert to canonical dictionary format matching iOS app schema.
Returns:
Dictionary with field names matching JSONCanonicalTeam in BootstrapService.swift
"""
return {
"canonical_id": self.id,
"name": self.name,
"abbreviation": self.abbreviation,
"sport": self.sport.upper(), # iOS Sport enum expects uppercase (e.g., "NFL")
"city": self.city,
"stadium_canonical_id": self.stadium_id or "",
"conference_id": self._make_qualified_id(self.conference),
"division_id": self._make_qualified_id(self.division),
"primary_color": self.primary_color,
"secondary_color": self.secondary_color,
}
@classmethod
def from_dict(cls, data: dict) -> "Team":
"""Create a Team from a dictionary (internal format)."""
return cls(
id=data["id"],
sport=data["sport"],
city=data["city"],
name=data["name"],
full_name=data["full_name"],
abbreviation=data["abbreviation"],
conference=data.get("conference"),
division=data.get("division"),
primary_color=data.get("primary_color"),
secondary_color=data.get("secondary_color"),
logo_url=data.get("logo_url"),
stadium_id=data.get("stadium_id"),
)
@staticmethod
def _extract_name_from_qualified_id(qualified_id: Optional[str], sport: str) -> Optional[str]:
"""Extract the name portion from a qualified ID.
Examples:
"nba_eastern""Eastern"
"mlb_al_west""AL West"
"nba_southeast""Southeast"
"""
if not qualified_id:
return None
# Remove sport prefix (e.g., "nba_" or "mlb_")
prefix = f"{sport.lower()}_"
if qualified_id.startswith(prefix):
name = qualified_id[len(prefix):]
else:
name = qualified_id
# Convert underscores to spaces and title case
# Special handling for league abbreviations (AL, NL, etc.)
parts = name.split("_")
result_parts = []
for part in parts:
if part.upper() in ("AL", "NL", "AFC", "NFC"):
result_parts.append(part.upper())
else:
result_parts.append(part.capitalize())
return " ".join(result_parts)
@classmethod
def from_canonical_dict(cls, data: dict) -> "Team":
"""Create a Team from a canonical dictionary (iOS app format)."""
sport = data["sport"].lower()
return cls(
id=data["canonical_id"],
sport=data["sport"],
city=data["city"],
name=data["name"],
full_name=f"{data['city']} {data['name']}", # Reconstruct full_name
abbreviation=data["abbreviation"],
conference=cls._extract_name_from_qualified_id(data.get("conference_id"), sport),
division=cls._extract_name_from_qualified_id(data.get("division_id"), sport),
primary_color=data.get("primary_color"),
secondary_color=data.get("secondary_color"),
stadium_id=data.get("stadium_canonical_id"),
)
def to_json(self) -> str:
"""Serialize to JSON string."""
return json.dumps(self.to_dict(), indent=2)
@classmethod
def from_json(cls, json_str: str) -> "Team":
"""Deserialize from JSON string."""
return cls.from_dict(json.loads(json_str))
def save_teams(teams: list[Team], filepath: str) -> None:
"""Save a list of teams to a JSON file."""
with open(filepath, "w", encoding="utf-8") as f:
json.dump([t.to_dict() for t in teams], f, indent=2)
def load_teams(filepath: str) -> list[Team]:
"""Load a list of teams from a JSON file (auto-detects format)."""
with open(filepath, "r", encoding="utf-8") as f:
data = json.load(f)
# Detect format: canonical has "canonical_id", internal has "id"
if data and "canonical_id" in data[0]:
return [Team.from_canonical_dict(d) for d in data]
return [Team.from_dict(d) for d in data]

View File

@@ -1,91 +0,0 @@
"""Normalizers for team, stadium, and game data."""
from .canonical_id import (
generate_game_id,
generate_team_id,
generate_team_id_from_abbrev,
generate_stadium_id,
parse_game_id,
normalize_string,
)
from .timezone import (
TimezoneResult,
parse_datetime,
convert_to_utc,
detect_timezone_from_string,
detect_timezone_from_location,
get_stadium_timezone,
create_timezone_warning,
)
from .fuzzy import (
MatchCandidate,
fuzzy_match_team,
fuzzy_match_stadium,
exact_match,
best_match,
calculate_similarity,
normalize_for_matching,
)
from .alias_loader import (
TeamAliasLoader,
StadiumAliasLoader,
get_team_alias_loader,
get_stadium_alias_loader,
resolve_team_alias,
resolve_stadium_alias,
)
from .team_resolver import (
TeamResolver,
TeamResolveResult,
get_team_resolver,
resolve_team,
)
from .stadium_resolver import (
StadiumResolver,
StadiumResolveResult,
get_stadium_resolver,
resolve_stadium,
)
__all__ = [
# Canonical ID
"generate_game_id",
"generate_team_id",
"generate_team_id_from_abbrev",
"generate_stadium_id",
"parse_game_id",
"normalize_string",
# Timezone
"TimezoneResult",
"parse_datetime",
"convert_to_utc",
"detect_timezone_from_string",
"detect_timezone_from_location",
"get_stadium_timezone",
"create_timezone_warning",
# Fuzzy matching
"MatchCandidate",
"fuzzy_match_team",
"fuzzy_match_stadium",
"exact_match",
"best_match",
"calculate_similarity",
"normalize_for_matching",
# Alias loaders
"TeamAliasLoader",
"StadiumAliasLoader",
"get_team_alias_loader",
"get_stadium_alias_loader",
"resolve_team_alias",
"resolve_stadium_alias",
# Team resolver
"TeamResolver",
"TeamResolveResult",
"get_team_resolver",
"resolve_team",
# Stadium resolver
"StadiumResolver",
"StadiumResolveResult",
"get_stadium_resolver",
"resolve_stadium",
]

View File

@@ -1,312 +0,0 @@
"""Alias file loaders for team and stadium name resolution."""
import json
from datetime import date
from pathlib import Path
from typing import Optional
from ..config import TEAM_ALIASES_FILE, STADIUM_ALIASES_FILE
from ..models.aliases import TeamAlias, StadiumAlias, AliasType
class TeamAliasLoader:
"""Loader for team aliases with date-aware resolution.
Loads team aliases from JSON and provides lookup methods
with support for historical name changes.
"""
def __init__(self, filepath: Optional[Path] = None):
"""Initialize the loader.
Args:
filepath: Path to team_aliases.json, defaults to config value
"""
self.filepath = filepath or TEAM_ALIASES_FILE
self._aliases: list[TeamAlias] = []
self._by_value: dict[str, list[TeamAlias]] = {}
self._by_team: dict[str, list[TeamAlias]] = {}
self._loaded = False
def load(self) -> None:
"""Load aliases from the JSON file."""
if not self.filepath.exists():
self._loaded = True
return
with open(self.filepath, "r", encoding="utf-8") as f:
data = json.load(f)
self._aliases = []
self._by_value = {}
self._by_team = {}
for item in data:
alias = TeamAlias.from_dict(item)
self._aliases.append(alias)
# Index by lowercase value
value_key = alias.alias_value.lower()
if value_key not in self._by_value:
self._by_value[value_key] = []
self._by_value[value_key].append(alias)
# Index by team ID
if alias.team_canonical_id not in self._by_team:
self._by_team[alias.team_canonical_id] = []
self._by_team[alias.team_canonical_id].append(alias)
self._loaded = True
def _ensure_loaded(self) -> None:
"""Ensure aliases are loaded."""
if not self._loaded:
self.load()
def resolve(
self,
value: str,
check_date: Optional[date] = None,
alias_types: Optional[list[AliasType]] = None,
) -> Optional[str]:
"""Resolve an alias value to a canonical team ID.
Args:
value: Alias value to look up (case-insensitive)
check_date: Date to check validity (None = current date)
alias_types: Types of aliases to check (None = all types)
Returns:
Canonical team ID if found, None otherwise
"""
self._ensure_loaded()
if check_date is None:
check_date = date.today()
value_key = value.lower().strip()
aliases = self._by_value.get(value_key, [])
for alias in aliases:
# Check type filter
if alias_types and alias.alias_type not in alias_types:
continue
# Check date validity
if alias.is_valid_on(check_date):
return alias.team_canonical_id
return None
def get_aliases_for_team(
self,
team_id: str,
check_date: Optional[date] = None,
) -> list[TeamAlias]:
"""Get all aliases for a team.
Args:
team_id: Canonical team ID
check_date: Date to filter by (None = all aliases)
Returns:
List of TeamAlias objects
"""
self._ensure_loaded()
aliases = self._by_team.get(team_id, [])
if check_date:
aliases = [a for a in aliases if a.is_valid_on(check_date)]
return aliases
def get_all_values(
self,
alias_type: Optional[AliasType] = None,
) -> list[str]:
"""Get all alias values.
Args:
alias_type: Filter by alias type (None = all types)
Returns:
List of alias values
"""
self._ensure_loaded()
values = []
for alias in self._aliases:
if alias_type is None or alias.alias_type == alias_type:
values.append(alias.alias_value)
return values
class StadiumAliasLoader:
"""Loader for stadium aliases with date-aware resolution.
Loads stadium aliases from JSON and provides lookup methods
with support for historical name changes (e.g., naming rights).
"""
def __init__(self, filepath: Optional[Path] = None):
"""Initialize the loader.
Args:
filepath: Path to stadium_aliases.json, defaults to config value
"""
self.filepath = filepath or STADIUM_ALIASES_FILE
self._aliases: list[StadiumAlias] = []
self._by_name: dict[str, list[StadiumAlias]] = {}
self._by_stadium: dict[str, list[StadiumAlias]] = {}
self._loaded = False
def load(self) -> None:
"""Load aliases from the JSON file."""
if not self.filepath.exists():
self._loaded = True
return
with open(self.filepath, "r", encoding="utf-8") as f:
data = json.load(f)
self._aliases = []
self._by_name = {}
self._by_stadium = {}
for item in data:
alias = StadiumAlias.from_dict(item)
self._aliases.append(alias)
# Index by lowercase name
name_key = alias.alias_name.lower()
if name_key not in self._by_name:
self._by_name[name_key] = []
self._by_name[name_key].append(alias)
# Index by stadium ID
if alias.stadium_canonical_id not in self._by_stadium:
self._by_stadium[alias.stadium_canonical_id] = []
self._by_stadium[alias.stadium_canonical_id].append(alias)
self._loaded = True
def _ensure_loaded(self) -> None:
"""Ensure aliases are loaded."""
if not self._loaded:
self.load()
def resolve(
self,
name: str,
check_date: Optional[date] = None,
) -> Optional[str]:
"""Resolve a stadium name to a canonical stadium ID.
Args:
name: Stadium name to look up (case-insensitive)
check_date: Date to check validity (None = current date)
Returns:
Canonical stadium ID if found, None otherwise
"""
self._ensure_loaded()
if check_date is None:
check_date = date.today()
name_key = name.lower().strip()
aliases = self._by_name.get(name_key, [])
for alias in aliases:
if alias.is_valid_on(check_date):
return alias.stadium_canonical_id
return None
def get_aliases_for_stadium(
self,
stadium_id: str,
check_date: Optional[date] = None,
) -> list[StadiumAlias]:
"""Get all aliases for a stadium.
Args:
stadium_id: Canonical stadium ID
check_date: Date to filter by (None = all aliases)
Returns:
List of StadiumAlias objects
"""
self._ensure_loaded()
aliases = self._by_stadium.get(stadium_id, [])
if check_date:
aliases = [a for a in aliases if a.is_valid_on(check_date)]
return aliases
def get_all_names(self) -> list[str]:
"""Get all stadium alias names.
Returns:
List of stadium names
"""
self._ensure_loaded()
return [alias.alias_name for alias in self._aliases]
# Global loader instances (lazy initialized)
_team_alias_loader: Optional[TeamAliasLoader] = None
_stadium_alias_loader: Optional[StadiumAliasLoader] = None
def get_team_alias_loader() -> TeamAliasLoader:
"""Get the global team alias loader instance."""
global _team_alias_loader
if _team_alias_loader is None:
_team_alias_loader = TeamAliasLoader()
return _team_alias_loader
def get_stadium_alias_loader() -> StadiumAliasLoader:
"""Get the global stadium alias loader instance."""
global _stadium_alias_loader
if _stadium_alias_loader is None:
_stadium_alias_loader = StadiumAliasLoader()
return _stadium_alias_loader
def resolve_team_alias(
value: str,
check_date: Optional[date] = None,
) -> Optional[str]:
"""Convenience function to resolve a team alias.
Args:
value: Alias value (name, abbreviation, or city)
check_date: Date to check validity
Returns:
Canonical team ID if found
"""
return get_team_alias_loader().resolve(value, check_date)
def resolve_stadium_alias(
name: str,
check_date: Optional[date] = None,
) -> Optional[str]:
"""Convenience function to resolve a stadium alias.
Args:
name: Stadium name
check_date: Date to check validity
Returns:
Canonical stadium ID if found
"""
return get_stadium_alias_loader().resolve(name, check_date)

View File

@@ -1,284 +0,0 @@
"""Canonical ID generation for games, teams, and stadiums."""
import re
import unicodedata
from datetime import date, datetime
from typing import Optional
def normalize_string(s: str) -> str:
"""Normalize a string for use in canonical IDs.
- Convert to lowercase
- Replace spaces and hyphens with underscores
- Remove special characters (except underscores)
- Collapse multiple underscores
- Strip leading/trailing underscores
Args:
s: String to normalize
Returns:
Normalized string suitable for IDs
"""
# Convert to lowercase
result = s.lower()
# Normalize unicode (e.g., é -> e)
result = unicodedata.normalize("NFKD", result)
result = result.encode("ascii", "ignore").decode("ascii")
# Replace spaces and hyphens with underscores
result = re.sub(r"[\s\-]+", "_", result)
# Remove special characters except underscores
result = re.sub(r"[^a-z0-9_]", "", result)
# Collapse multiple underscores
result = re.sub(r"_+", "_", result)
# Strip leading/trailing underscores
result = result.strip("_")
return result
def generate_game_id(
sport: str,
season: int,
away_abbrev: str,
home_abbrev: str,
game_date: date | datetime,
game_number: Optional[int] = None,
) -> str:
"""Generate a canonical game ID.
Format: game_{sport}_{season}_{YYYYMMDD}_{away}_{home}[_{game_number}]
Args:
sport: Sport code (e.g., 'nba', 'mlb')
season: Season start year (e.g., 2025 for 2025-26)
away_abbrev: Away team abbreviation (e.g., 'HOU')
home_abbrev: Home team abbreviation (e.g., 'OKC')
game_date: Date of the game
game_number: Game number for doubleheaders (1 or 2), None for single games
Returns:
Canonical game ID (e.g., 'game_nba_2025_20251021_hou_okc')
Examples:
>>> generate_game_id('nba', 2025, 'HOU', 'OKC', date(2025, 10, 21))
'game_nba_2025_20251021_hou_okc'
>>> generate_game_id('mlb', 2026, 'NYY', 'BOS', date(2026, 4, 1), game_number=1)
'game_mlb_2026_20260401_nyy_bos_1'
"""
# Normalize sport and abbreviations
sport_norm = sport.lower()
away_norm = away_abbrev.lower()
home_norm = home_abbrev.lower()
# Format date as YYYYMMDD
if isinstance(game_date, datetime):
game_date = game_date.date()
date_str = game_date.strftime("%Y%m%d")
# Build ID with game_ prefix
parts = ["game", sport_norm, str(season), date_str, away_norm, home_norm]
# Add game number for doubleheaders
if game_number is not None:
parts.append(str(game_number))
return "_".join(parts)
def generate_team_id(sport: str, city: str, name: str) -> str:
"""Generate a canonical team ID.
Format: team_{sport}_{abbreviation}
For most teams, we use the standard abbreviation. This function generates
a fallback ID based on city and name for teams without a known abbreviation.
Args:
sport: Sport code (e.g., 'nba', 'mlb')
city: Team city (e.g., 'Los Angeles')
name: Team name (e.g., 'Lakers')
Returns:
Canonical team ID (e.g., 'team_nba_la_lakers')
Examples:
>>> generate_team_id('nba', 'Los Angeles', 'Lakers')
'team_nba_la_lakers'
>>> generate_team_id('mlb', 'New York', 'Yankees')
'team_mlb_new_york_yankees'
"""
sport_norm = sport.lower()
city_norm = normalize_string(city)
name_norm = normalize_string(name)
return f"team_{sport_norm}_{city_norm}_{name_norm}"
def generate_team_id_from_abbrev(sport: str, abbreviation: str) -> str:
"""Generate a canonical team ID from abbreviation.
Format: team_{sport}_{abbreviation}
Args:
sport: Sport code (e.g., 'nba', 'mlb')
abbreviation: Team abbreviation (e.g., 'LAL', 'NYY')
Returns:
Canonical team ID (e.g., 'team_nba_lal')
Examples:
>>> generate_team_id_from_abbrev('nba', 'LAL')
'team_nba_lal'
>>> generate_team_id_from_abbrev('mlb', 'NYY')
'team_mlb_nyy'
"""
sport_norm = sport.lower()
abbrev_norm = abbreviation.lower()
return f"team_{sport_norm}_{abbrev_norm}"
def generate_stadium_id(sport: str, name: str) -> str:
"""Generate a canonical stadium ID.
Format: stadium_{sport}_{normalized_name}
Args:
sport: Sport code (e.g., 'nba', 'mlb')
name: Stadium name (e.g., 'Yankee Stadium')
Returns:
Canonical stadium ID (e.g., 'stadium_mlb_yankee_stadium')
Examples:
>>> generate_stadium_id('nba', 'Crypto.com Arena')
'stadium_nba_cryptocom_arena'
>>> generate_stadium_id('mlb', 'Yankee Stadium')
'stadium_mlb_yankee_stadium'
"""
sport_norm = sport.lower()
name_norm = normalize_string(name)
return f"stadium_{sport_norm}_{name_norm}"
def parse_game_id(game_id: str) -> dict:
"""Parse a canonical game ID into its components.
Args:
game_id: Canonical game ID (e.g., 'game_nba_2025_20251021_hou_okc')
Returns:
Dictionary with keys: sport, season, away_abbrev, home_abbrev,
year, month, day, game_number (optional)
Raises:
ValueError: If game_id format is invalid
Examples:
>>> parse_game_id('game_nba_2025_20251021_hou_okc')
{'sport': 'nba', 'season': 2025, 'away_abbrev': 'hou',
'home_abbrev': 'okc', 'year': 2025, 'month': 10, 'day': 21, 'game_number': None}
>>> parse_game_id('game_mlb_2026_20260401_nyy_bos_1')
{'sport': 'mlb', 'season': 2026, 'away_abbrev': 'nyy',
'home_abbrev': 'bos', 'year': 2026, 'month': 4, 'day': 1, 'game_number': 1}
"""
parts = game_id.split("_")
if len(parts) < 6 or len(parts) > 7:
raise ValueError(f"Invalid game ID format: {game_id}")
if parts[0] != "game":
raise ValueError(f"Game ID must start with 'game_': {game_id}")
sport = parts[1]
season = int(parts[2])
date_str = parts[3]
away_abbrev = parts[4]
home_abbrev = parts[5]
if len(date_str) != 8:
raise ValueError(f"Invalid date format in game ID: {game_id}")
year = int(date_str[:4])
month = int(date_str[4:6])
day = int(date_str[6:])
game_number = None
if len(parts) == 7:
game_number = int(parts[6])
return {
"sport": sport,
"season": season,
"away_abbrev": away_abbrev,
"home_abbrev": home_abbrev,
"year": year,
"month": month,
"day": day,
"game_number": game_number,
}
def parse_team_id(team_id: str) -> dict:
"""Parse a canonical team ID into its components.
Args:
team_id: Canonical team ID (e.g., 'team_nba_lal')
Returns:
Dictionary with keys: sport, identifier (abbreviation or city_name)
Raises:
ValueError: If team_id format is invalid
"""
if not team_id.startswith("team_"):
raise ValueError(f"Invalid team ID format: {team_id}")
parts = team_id.split("_", 2)
if len(parts) < 3:
raise ValueError(f"Invalid team ID format: {team_id}")
return {
"sport": parts[1],
"identifier": parts[2],
}
def parse_stadium_id(stadium_id: str) -> dict:
"""Parse a canonical stadium ID into its components.
Args:
stadium_id: Canonical stadium ID (e.g., 'stadium_nba_paycom_center')
Returns:
Dictionary with keys: sport, name
Raises:
ValueError: If stadium_id format is invalid
"""
if not stadium_id.startswith("stadium_"):
raise ValueError(f"Invalid stadium ID format: {stadium_id}")
parts = stadium_id.split("_", 2)
if len(parts) < 3:
raise ValueError(f"Invalid stadium ID format: {stadium_id}")
return {
"sport": parts[1],
"name": parts[2],
}

View File

@@ -1,272 +0,0 @@
"""Fuzzy string matching utilities for team and stadium name resolution."""
from dataclasses import dataclass
from typing import Optional
from rapidfuzz import fuzz, process
from rapidfuzz.utils import default_process
from ..config import FUZZY_MATCH_THRESHOLD
from ..models.aliases import FuzzyMatch
@dataclass
class MatchCandidate:
"""A candidate for fuzzy matching.
Attributes:
canonical_id: The canonical ID of this candidate
name: The display name for this candidate
aliases: List of alternative names to match against
"""
canonical_id: str
name: str
aliases: list[str]
def normalize_for_matching(s: str) -> str:
"""Normalize a string for fuzzy matching.
- Convert to lowercase
- Remove common prefixes/suffixes
- Collapse whitespace
Args:
s: String to normalize
Returns:
Normalized string
"""
result = s.lower().strip()
# Remove common prefixes
prefixes = ["the ", "team ", "stadium "]
for prefix in prefixes:
if result.startswith(prefix):
result = result[len(prefix) :]
# Remove common suffixes
suffixes = [" stadium", " arena", " center", " field", " park"]
for suffix in suffixes:
if result.endswith(suffix):
result = result[: -len(suffix)]
return result.strip()
def fuzzy_match_team(
query: str,
candidates: list[MatchCandidate],
threshold: int = FUZZY_MATCH_THRESHOLD,
top_n: int = 3,
) -> list[FuzzyMatch]:
"""Find fuzzy matches for a team name.
Uses multiple matching strategies:
1. Token set ratio (handles word order differences)
2. Partial ratio (handles substring matches)
3. Standard ratio (overall similarity)
Args:
query: Team name to match
candidates: List of candidate teams to match against
threshold: Minimum score to consider a match (0-100)
top_n: Maximum number of matches to return
Returns:
List of FuzzyMatch objects sorted by confidence (descending)
"""
query_norm = normalize_for_matching(query)
# Build list of all matchable strings with their canonical IDs
match_strings: list[tuple[str, str, str]] = [] # (string, canonical_id, name)
for candidate in candidates:
# Add primary name
match_strings.append(
(normalize_for_matching(candidate.name), candidate.canonical_id, candidate.name)
)
# Add aliases
for alias in candidate.aliases:
match_strings.append(
(normalize_for_matching(alias), candidate.canonical_id, candidate.name)
)
# Score all candidates
scored: dict[str, tuple[int, str]] = {} # canonical_id -> (best_score, name)
for match_str, canonical_id, name in match_strings:
# Use multiple scoring methods
token_score = fuzz.token_set_ratio(query_norm, match_str)
partial_score = fuzz.partial_ratio(query_norm, match_str)
ratio_score = fuzz.ratio(query_norm, match_str)
# Weighted average favoring token_set_ratio for team names
score = int(0.5 * token_score + 0.3 * partial_score + 0.2 * ratio_score)
# Keep best score for each canonical ID
if canonical_id not in scored or score > scored[canonical_id][0]:
scored[canonical_id] = (score, name)
# Filter by threshold and sort
matches = [
FuzzyMatch(canonical_id=cid, canonical_name=name, confidence=score)
for cid, (score, name) in scored.items()
if score >= threshold
]
# Sort by confidence descending
matches.sort(key=lambda m: m.confidence, reverse=True)
return matches[:top_n]
def fuzzy_match_stadium(
query: str,
candidates: list[MatchCandidate],
threshold: int = FUZZY_MATCH_THRESHOLD,
top_n: int = 3,
) -> list[FuzzyMatch]:
"""Find fuzzy matches for a stadium name.
Uses matching strategies optimized for stadium names:
1. Token sort ratio (handles "X Stadium" vs "Stadium X")
2. Partial ratio (handles naming rights changes)
3. Standard ratio
Args:
query: Stadium name to match
candidates: List of candidate stadiums to match against
threshold: Minimum score to consider a match (0-100)
top_n: Maximum number of matches to return
Returns:
List of FuzzyMatch objects sorted by confidence (descending)
"""
query_norm = normalize_for_matching(query)
# Build list of all matchable strings
match_strings: list[tuple[str, str, str]] = []
for candidate in candidates:
match_strings.append(
(normalize_for_matching(candidate.name), candidate.canonical_id, candidate.name)
)
for alias in candidate.aliases:
match_strings.append(
(normalize_for_matching(alias), candidate.canonical_id, candidate.name)
)
# Score all candidates
scored: dict[str, tuple[int, str]] = {}
for match_str, canonical_id, name in match_strings:
# Use scoring methods suited for stadium names
token_sort_score = fuzz.token_sort_ratio(query_norm, match_str)
partial_score = fuzz.partial_ratio(query_norm, match_str)
ratio_score = fuzz.ratio(query_norm, match_str)
# Weighted average
score = int(0.4 * token_sort_score + 0.4 * partial_score + 0.2 * ratio_score)
if canonical_id not in scored or score > scored[canonical_id][0]:
scored[canonical_id] = (score, name)
# Filter and sort
matches = [
FuzzyMatch(canonical_id=cid, canonical_name=name, confidence=score)
for cid, (score, name) in scored.items()
if score >= threshold
]
matches.sort(key=lambda m: m.confidence, reverse=True)
return matches[:top_n]
def exact_match(
query: str,
candidates: list[MatchCandidate],
case_sensitive: bool = False,
) -> Optional[str]:
"""Find an exact match for a string.
Args:
query: String to match
candidates: List of candidates to match against
case_sensitive: Whether to use case-sensitive matching
Returns:
Canonical ID if exact match found, None otherwise
"""
if case_sensitive:
query_norm = query.strip()
else:
query_norm = query.lower().strip()
for candidate in candidates:
# Check primary name
name = candidate.name if case_sensitive else candidate.name.lower()
if query_norm == name.strip():
return candidate.canonical_id
# Check aliases
for alias in candidate.aliases:
alias_norm = alias if case_sensitive else alias.lower()
if query_norm == alias_norm.strip():
return candidate.canonical_id
return None
def best_match(
query: str,
candidates: list[MatchCandidate],
threshold: int = FUZZY_MATCH_THRESHOLD,
) -> Optional[FuzzyMatch]:
"""Find the best match for a query string.
First tries exact match, then falls back to fuzzy matching.
Args:
query: String to match
candidates: List of candidates
threshold: Minimum fuzzy match score
Returns:
Best FuzzyMatch or None if no match above threshold
"""
# Try exact match first
exact = exact_match(query, candidates)
if exact:
# Find the name for this ID
for c in candidates:
if c.canonical_id == exact:
return FuzzyMatch(
canonical_id=exact,
canonical_name=c.name,
confidence=100,
)
# Fall back to fuzzy matching
# Use team matching by default (works for both)
matches = fuzzy_match_team(query, candidates, threshold=threshold, top_n=1)
return matches[0] if matches else None
def calculate_similarity(s1: str, s2: str) -> int:
"""Calculate similarity between two strings.
Args:
s1: First string
s2: Second string
Returns:
Similarity score 0-100
"""
s1_norm = normalize_for_matching(s1)
s2_norm = normalize_for_matching(s2)
return fuzz.token_set_ratio(s1_norm, s2_norm)

View File

@@ -1,535 +0,0 @@
"""Stadium name resolver with exact, alias, and fuzzy matching."""
from dataclasses import dataclass
from datetime import date
from typing import Optional
from uuid import uuid4
from ..config import FUZZY_MATCH_THRESHOLD, ALLOWED_COUNTRIES
from ..models.aliases import FuzzyMatch, ManualReviewItem, ReviewReason
from .alias_loader import get_stadium_alias_loader, StadiumAliasLoader
from .fuzzy import MatchCandidate, fuzzy_match_stadium
@dataclass
class StadiumResolveResult:
"""Result of stadium resolution.
Attributes:
canonical_id: Resolved canonical stadium ID (None if unresolved)
confidence: Confidence in the match (100 for exact, lower for fuzzy)
match_type: How the match was made ('exact', 'alias', 'fuzzy', 'unresolved')
filtered_reason: Reason if stadium was filtered out (e.g., 'geographic')
review_item: ManualReviewItem if resolution failed or low confidence
"""
canonical_id: Optional[str]
confidence: int
match_type: str
filtered_reason: Optional[str] = None
review_item: Optional[ManualReviewItem] = None
@dataclass
class StadiumInfo:
"""Stadium information for matching."""
canonical_id: str
name: str
city: str
state: str
country: str
sport: str
latitude: float
longitude: float
timezone: str = "America/New_York" # IANA timezone identifier
# Hardcoded stadium mappings
# Format: {sport: {canonical_id: StadiumInfo}}
STADIUM_MAPPINGS: dict[str, dict[str, StadiumInfo]] = {
"nba": {
"stadium_nba_state_farm_arena": StadiumInfo("stadium_nba_state_farm_arena", "State Farm Arena", "Atlanta", "GA", "USA", "nba", 33.7573, -84.3963),
"stadium_nba_td_garden": StadiumInfo("stadium_nba_td_garden", "TD Garden", "Boston", "MA", "USA", "nba", 42.3662, -71.0621),
"stadium_nba_barclays_center": StadiumInfo("stadium_nba_barclays_center", "Barclays Center", "Brooklyn", "NY", "USA", "nba", 40.6826, -73.9754),
"stadium_nba_spectrum_center": StadiumInfo("stadium_nba_spectrum_center", "Spectrum Center", "Charlotte", "NC", "USA", "nba", 35.2251, -80.8392),
"stadium_nba_united_center": StadiumInfo("stadium_nba_united_center", "United Center", "Chicago", "IL", "USA", "nba", 41.8807, -87.6742, "America/Chicago"),
"stadium_nba_rocket_mortgage_fieldhouse": StadiumInfo("stadium_nba_rocket_mortgage_fieldhouse", "Rocket Mortgage FieldHouse", "Cleveland", "OH", "USA", "nba", 41.4965, -81.6882),
"stadium_nba_american_airlines_center": StadiumInfo("stadium_nba_american_airlines_center", "American Airlines Center", "Dallas", "TX", "USA", "nba", 32.7905, -96.8103, "America/Chicago"),
"stadium_nba_ball_arena": StadiumInfo("stadium_nba_ball_arena", "Ball Arena", "Denver", "CO", "USA", "nba", 39.7487, -105.0077, "America/Denver"),
"stadium_nba_little_caesars_arena": StadiumInfo("stadium_nba_little_caesars_arena", "Little Caesars Arena", "Detroit", "MI", "USA", "nba", 42.3411, -83.0553),
"stadium_nba_chase_center": StadiumInfo("stadium_nba_chase_center", "Chase Center", "San Francisco", "CA", "USA", "nba", 37.7680, -122.3877, "America/Los_Angeles"),
"stadium_nba_toyota_center": StadiumInfo("stadium_nba_toyota_center", "Toyota Center", "Houston", "TX", "USA", "nba", 29.7508, -95.3621, "America/Chicago"),
"stadium_nba_gainbridge_fieldhouse": StadiumInfo("stadium_nba_gainbridge_fieldhouse", "Gainbridge Fieldhouse", "Indianapolis", "IN", "USA", "nba", 39.7640, -86.1555),
"stadium_nba_intuit_dome": StadiumInfo("stadium_nba_intuit_dome", "Intuit Dome", "Inglewood", "CA", "USA", "nba", 33.9425, -118.3417, "America/Los_Angeles"),
"stadium_nba_cryptocom_arena": StadiumInfo("stadium_nba_cryptocom_arena", "Crypto.com Arena", "Los Angeles", "CA", "USA", "nba", 34.0430, -118.2673, "America/Los_Angeles"),
"stadium_nba_fedexforum": StadiumInfo("stadium_nba_fedexforum", "FedExForum", "Memphis", "TN", "USA", "nba", 35.1383, -90.0505, "America/Chicago"),
"stadium_nba_kaseya_center": StadiumInfo("stadium_nba_kaseya_center", "Kaseya Center", "Miami", "FL", "USA", "nba", 25.7814, -80.1870),
"stadium_nba_fiserv_forum": StadiumInfo("stadium_nba_fiserv_forum", "Fiserv Forum", "Milwaukee", "WI", "USA", "nba", 43.0451, -87.9172, "America/Chicago"),
"stadium_nba_target_center": StadiumInfo("stadium_nba_target_center", "Target Center", "Minneapolis", "MN", "USA", "nba", 44.9795, -93.2761, "America/Chicago"),
"stadium_nba_smoothie_king_center": StadiumInfo("stadium_nba_smoothie_king_center", "Smoothie King Center", "New Orleans", "LA", "USA", "nba", 29.9490, -90.0821, "America/Chicago"),
"stadium_nba_madison_square_garden": StadiumInfo("stadium_nba_madison_square_garden", "Madison Square Garden", "New York", "NY", "USA", "nba", 40.7505, -73.9934),
"stadium_nba_paycom_center": StadiumInfo("stadium_nba_paycom_center", "Paycom Center", "Oklahoma City", "OK", "USA", "nba", 35.4634, -97.5151, "America/Chicago"),
"stadium_nba_kia_center": StadiumInfo("stadium_nba_kia_center", "Kia Center", "Orlando", "FL", "USA", "nba", 28.5392, -81.3839),
"stadium_nba_wells_fargo_center": StadiumInfo("stadium_nba_wells_fargo_center", "Wells Fargo Center", "Philadelphia", "PA", "USA", "nba", 39.9012, -75.1720),
"stadium_nba_footprint_center": StadiumInfo("stadium_nba_footprint_center", "Footprint Center", "Phoenix", "AZ", "USA", "nba", 33.4457, -112.0712, "America/Phoenix"),
"stadium_nba_moda_center": StadiumInfo("stadium_nba_moda_center", "Moda Center", "Portland", "OR", "USA", "nba", 45.5316, -122.6668, "America/Los_Angeles"),
"stadium_nba_golden_1_center": StadiumInfo("stadium_nba_golden_1_center", "Golden 1 Center", "Sacramento", "CA", "USA", "nba", 38.5802, -121.4997, "America/Los_Angeles"),
"stadium_nba_frost_bank_center": StadiumInfo("stadium_nba_frost_bank_center", "Frost Bank Center", "San Antonio", "TX", "USA", "nba", 29.4270, -98.4375, "America/Chicago"),
"stadium_nba_scotiabank_arena": StadiumInfo("stadium_nba_scotiabank_arena", "Scotiabank Arena", "Toronto", "ON", "Canada", "nba", 43.6435, -79.3791, "America/Toronto"),
"stadium_nba_delta_center": StadiumInfo("stadium_nba_delta_center", "Delta Center", "Salt Lake City", "UT", "USA", "nba", 40.7683, -111.9011, "America/Denver"),
"stadium_nba_capital_one_arena": StadiumInfo("stadium_nba_capital_one_arena", "Capital One Arena", "Washington", "DC", "USA", "nba", 38.8981, -77.0209),
# International venues
"stadium_nba_mexico_city_arena": StadiumInfo("stadium_nba_mexico_city_arena", "Mexico City Arena", "Mexico City", "CDMX", "Mexico", "nba", 19.4042, -99.0970, "America/Mexico_City"),
},
"mlb": {
"stadium_mlb_chase_field": StadiumInfo("stadium_mlb_chase_field", "Chase Field", "Phoenix", "AZ", "USA", "mlb", 33.4455, -112.0667, "America/Phoenix"),
"stadium_mlb_truist_park": StadiumInfo("stadium_mlb_truist_park", "Truist Park", "Atlanta", "GA", "USA", "mlb", 33.8908, -84.4678),
"stadium_mlb_oriole_park_at_camden_yards": StadiumInfo("stadium_mlb_oriole_park_at_camden_yards", "Oriole Park at Camden Yards", "Baltimore", "MD", "USA", "mlb", 39.2839, -76.6217),
"stadium_mlb_fenway_park": StadiumInfo("stadium_mlb_fenway_park", "Fenway Park", "Boston", "MA", "USA", "mlb", 42.3467, -71.0972),
"stadium_mlb_wrigley_field": StadiumInfo("stadium_mlb_wrigley_field", "Wrigley Field", "Chicago", "IL", "USA", "mlb", 41.9484, -87.6553, "America/Chicago"),
"stadium_mlb_guaranteed_rate_field": StadiumInfo("stadium_mlb_guaranteed_rate_field", "Guaranteed Rate Field", "Chicago", "IL", "USA", "mlb", 41.8299, -87.6338, "America/Chicago"),
"stadium_mlb_great_american_ball_park": StadiumInfo("stadium_mlb_great_american_ball_park", "Great American Ball Park", "Cincinnati", "OH", "USA", "mlb", 39.0974, -84.5082),
"stadium_mlb_progressive_field": StadiumInfo("stadium_mlb_progressive_field", "Progressive Field", "Cleveland", "OH", "USA", "mlb", 41.4962, -81.6852),
"stadium_mlb_coors_field": StadiumInfo("stadium_mlb_coors_field", "Coors Field", "Denver", "CO", "USA", "mlb", 39.7559, -104.9942, "America/Denver"),
"stadium_mlb_comerica_park": StadiumInfo("stadium_mlb_comerica_park", "Comerica Park", "Detroit", "MI", "USA", "mlb", 42.3390, -83.0485),
"stadium_mlb_minute_maid_park": StadiumInfo("stadium_mlb_minute_maid_park", "Minute Maid Park", "Houston", "TX", "USA", "mlb", 29.7573, -95.3555, "America/Chicago"),
"stadium_mlb_kauffman_stadium": StadiumInfo("stadium_mlb_kauffman_stadium", "Kauffman Stadium", "Kansas City", "MO", "USA", "mlb", 39.0517, -94.4803, "America/Chicago"),
"stadium_mlb_angel_stadium": StadiumInfo("stadium_mlb_angel_stadium", "Angel Stadium", "Anaheim", "CA", "USA", "mlb", 33.8003, -117.8827, "America/Los_Angeles"),
"stadium_mlb_dodger_stadium": StadiumInfo("stadium_mlb_dodger_stadium", "Dodger Stadium", "Los Angeles", "CA", "USA", "mlb", 34.0739, -118.2400, "America/Los_Angeles"),
"stadium_mlb_loandepot_park": StadiumInfo("stadium_mlb_loandepot_park", "loanDepot park", "Miami", "FL", "USA", "mlb", 25.7781, -80.2195),
"stadium_mlb_american_family_field": StadiumInfo("stadium_mlb_american_family_field", "American Family Field", "Milwaukee", "WI", "USA", "mlb", 43.0280, -87.9712, "America/Chicago"),
"stadium_mlb_target_field": StadiumInfo("stadium_mlb_target_field", "Target Field", "Minneapolis", "MN", "USA", "mlb", 44.9818, -93.2775, "America/Chicago"),
"stadium_mlb_citi_field": StadiumInfo("stadium_mlb_citi_field", "Citi Field", "New York", "NY", "USA", "mlb", 40.7571, -73.8458),
"stadium_mlb_yankee_stadium": StadiumInfo("stadium_mlb_yankee_stadium", "Yankee Stadium", "Bronx", "NY", "USA", "mlb", 40.8296, -73.9262),
"stadium_mlb_sutter_health_park": StadiumInfo("stadium_mlb_sutter_health_park", "Sutter Health Park", "Sacramento", "CA", "USA", "mlb", 38.5803, -121.5005, "America/Los_Angeles"),
"stadium_mlb_citizens_bank_park": StadiumInfo("stadium_mlb_citizens_bank_park", "Citizens Bank Park", "Philadelphia", "PA", "USA", "mlb", 39.9061, -75.1665),
"stadium_mlb_pnc_park": StadiumInfo("stadium_mlb_pnc_park", "PNC Park", "Pittsburgh", "PA", "USA", "mlb", 40.4469, -80.0057),
"stadium_mlb_petco_park": StadiumInfo("stadium_mlb_petco_park", "Petco Park", "San Diego", "CA", "USA", "mlb", 32.7076, -117.1570, "America/Los_Angeles"),
"stadium_mlb_oracle_park": StadiumInfo("stadium_mlb_oracle_park", "Oracle Park", "San Francisco", "CA", "USA", "mlb", 37.7786, -122.3893, "America/Los_Angeles"),
"stadium_mlb_tmobile_park": StadiumInfo("stadium_mlb_tmobile_park", "T-Mobile Park", "Seattle", "WA", "USA", "mlb", 47.5914, -122.3325, "America/Los_Angeles"),
"stadium_mlb_busch_stadium": StadiumInfo("stadium_mlb_busch_stadium", "Busch Stadium", "St. Louis", "MO", "USA", "mlb", 38.6226, -90.1928, "America/Chicago"),
"stadium_mlb_tropicana_field": StadiumInfo("stadium_mlb_tropicana_field", "Tropicana Field", "St. Petersburg", "FL", "USA", "mlb", 27.7682, -82.6534),
"stadium_mlb_globe_life_field": StadiumInfo("stadium_mlb_globe_life_field", "Globe Life Field", "Arlington", "TX", "USA", "mlb", 32.7473, -97.0845, "America/Chicago"),
"stadium_mlb_rogers_centre": StadiumInfo("stadium_mlb_rogers_centre", "Rogers Centre", "Toronto", "ON", "Canada", "mlb", 43.6414, -79.3894, "America/Toronto"),
"stadium_mlb_nationals_park": StadiumInfo("stadium_mlb_nationals_park", "Nationals Park", "Washington", "DC", "USA", "mlb", 38.8730, -77.0074),
# Spring Training - Cactus League (Arizona)
"stadium_mlb_spring_salt_river_fields": StadiumInfo("stadium_mlb_spring_salt_river_fields", "Salt River Fields at Talking Stick", "Scottsdale", "AZ", "USA", "mlb", 33.5412, -111.8847, "America/Phoenix"),
"stadium_mlb_spring_sloan_park": StadiumInfo("stadium_mlb_spring_sloan_park", "Sloan Park", "Mesa", "AZ", "USA", "mlb", 33.4312, -111.8821, "America/Phoenix"),
"stadium_mlb_spring_hohokam_stadium": StadiumInfo("stadium_mlb_spring_hohokam_stadium", "Hohokam Stadium", "Mesa", "AZ", "USA", "mlb", 33.4385, -111.8295, "America/Phoenix"),
"stadium_mlb_spring_camelback_ranch": StadiumInfo("stadium_mlb_spring_camelback_ranch", "Camelback Ranch", "Glendale", "AZ", "USA", "mlb", 33.509, -112.272, "America/Phoenix"),
"stadium_mlb_spring_goodyear_ballpark": StadiumInfo("stadium_mlb_spring_goodyear_ballpark", "Goodyear Ballpark", "Goodyear", "AZ", "USA", "mlb", 33.4286, -112.3908, "America/Phoenix"),
"stadium_mlb_spring_tempe_diablo_stadium": StadiumInfo("stadium_mlb_spring_tempe_diablo_stadium", "Tempe Diablo Stadium", "Tempe", "AZ", "USA", "mlb", 33.4003, -111.9685, "America/Phoenix"),
"stadium_mlb_spring_scottsdale_stadium": StadiumInfo("stadium_mlb_spring_scottsdale_stadium", "Scottsdale Stadium", "Scottsdale", "AZ", "USA", "mlb", 33.4881, -111.9210, "America/Phoenix"),
"stadium_mlb_spring_american_family_fields": StadiumInfo("stadium_mlb_spring_american_family_fields", "American Family Fields of Phoenix", "Phoenix", "AZ", "USA", "mlb", 33.4916, -112.1733, "America/Phoenix"),
"stadium_mlb_spring_peoria_sports_complex": StadiumInfo("stadium_mlb_spring_peoria_sports_complex", "Peoria Sports Complex", "Peoria", "AZ", "USA", "mlb", 33.6224, -112.2274, "America/Phoenix"),
"stadium_mlb_spring_surprise_stadium": StadiumInfo("stadium_mlb_spring_surprise_stadium", "Surprise Stadium", "Surprise", "AZ", "USA", "mlb", 33.6306, -112.3332, "America/Phoenix"),
# Spring Training - Grapefruit League (Florida)
"stadium_mlb_spring_jetblue_park": StadiumInfo("stadium_mlb_spring_jetblue_park", "JetBlue Park", "Fort Myers", "FL", "USA", "mlb", 26.5511, -81.7620),
"stadium_mlb_spring_roger_dean_stadium": StadiumInfo("stadium_mlb_spring_roger_dean_stadium", "Roger Dean Chevrolet Stadium", "Jupiter", "FL", "USA", "mlb", 26.8910, -80.1166),
"stadium_mlb_spring_ed_smith_stadium": StadiumInfo("stadium_mlb_spring_ed_smith_stadium", "Ed Smith Stadium", "Sarasota", "FL", "USA", "mlb", 27.3482, -82.5176),
"stadium_mlb_spring_steinbrenner_field": StadiumInfo("stadium_mlb_spring_steinbrenner_field", "George M. Steinbrenner Field", "Tampa", "FL", "USA", "mlb", 27.9748, -82.5040),
"stadium_mlb_spring_td_ballpark": StadiumInfo("stadium_mlb_spring_td_ballpark", "TD Ballpark", "Dunedin", "FL", "USA", "mlb", 28.0039, -82.7867),
"stadium_mlb_spring_cooltoday_park": StadiumInfo("stadium_mlb_spring_cooltoday_park", "CoolToday Park", "North Port", "FL", "USA", "mlb", 27.0219, -82.2358),
"stadium_mlb_spring_hammond_stadium": StadiumInfo("stadium_mlb_spring_hammond_stadium", "Hammond Stadium", "Fort Myers", "FL", "USA", "mlb", 26.5363, -81.8385),
"stadium_mlb_spring_clover_park": StadiumInfo("stadium_mlb_spring_clover_park", "Clover Park", "Port St. Lucie", "FL", "USA", "mlb", 27.2900, -80.4100),
"stadium_mlb_spring_baycare_ballpark": StadiumInfo("stadium_mlb_spring_baycare_ballpark", "BayCare Ballpark", "Clearwater", "FL", "USA", "mlb", 27.9697, -82.7257),
"stadium_mlb_spring_lecom_park": StadiumInfo("stadium_mlb_spring_lecom_park", "LECOM Park", "Bradenton", "FL", "USA", "mlb", 27.4939, -82.5753),
"stadium_mlb_spring_charlotte_sports_park": StadiumInfo("stadium_mlb_spring_charlotte_sports_park", "Charlotte Sports Park", "Port Charlotte", "FL", "USA", "mlb", 26.9992, -82.1817),
"stadium_mlb_spring_cacti_park": StadiumInfo("stadium_mlb_spring_cacti_park", "CACTI Park of the Palm Beaches", "West Palm Beach", "FL", "USA", "mlb", 26.7697, -80.1014),
"stadium_mlb_spring_joker_marchant": StadiumInfo("stadium_mlb_spring_joker_marchant", "Publix Field at Joker Marchant Stadium", "Lakeland", "FL", "USA", "mlb", 28.0655, -81.9545),
# Special venues
"stadium_mlb_las_vegas_ballpark": StadiumInfo("stadium_mlb_las_vegas_ballpark", "Las Vegas Ballpark", "Las Vegas", "NV", "USA", "mlb", 36.0925, -115.1775, "America/Los_Angeles"),
"stadium_mlb_mexico_alfredo_harp_helu": StadiumInfo("stadium_mlb_mexico_alfredo_harp_helu", "Estadio Alfredo Harp Helu", "Mexico City", "CDMX", "Mexico", "mlb", 19.3825, -99.0928, "America/Mexico_City"),
"stadium_mlb_field_of_dreams": StadiumInfo("stadium_mlb_field_of_dreams", "Field of Dreams", "Dyersville", "IA", "USA", "mlb", 42.4671, -91.1095, "America/Chicago"),
"stadium_mlb_journey_bank_ballpark": StadiumInfo("stadium_mlb_journey_bank_ballpark", "Journey Bank Ballpark", "Williamsport", "PA", "USA", "mlb", 41.2415, -77.0011),
},
"nfl": {
"stadium_nfl_state_farm_stadium": StadiumInfo("stadium_nfl_state_farm_stadium", "State Farm Stadium", "Glendale", "AZ", "USA", "nfl", 33.5276, -112.2626, "America/Phoenix"),
"stadium_nfl_mercedes_benz_stadium": StadiumInfo("stadium_nfl_mercedes_benz_stadium", "Mercedes-Benz Stadium", "Atlanta", "GA", "USA", "nfl", 33.7553, -84.4006),
"stadium_nfl_mandt_bank_stadium": StadiumInfo("stadium_nfl_mandt_bank_stadium", "M&T Bank Stadium", "Baltimore", "MD", "USA", "nfl", 39.2780, -76.6227),
"stadium_nfl_highmark_stadium": StadiumInfo("stadium_nfl_highmark_stadium", "Highmark Stadium", "Orchard Park", "NY", "USA", "nfl", 42.7738, -78.7870),
"stadium_nfl_bank_of_america_stadium": StadiumInfo("stadium_nfl_bank_of_america_stadium", "Bank of America Stadium", "Charlotte", "NC", "USA", "nfl", 35.2258, -80.8528),
"stadium_nfl_soldier_field": StadiumInfo("stadium_nfl_soldier_field", "Soldier Field", "Chicago", "IL", "USA", "nfl", 41.8623, -87.6167, "America/Chicago"),
"stadium_nfl_paycor_stadium": StadiumInfo("stadium_nfl_paycor_stadium", "Paycor Stadium", "Cincinnati", "OH", "USA", "nfl", 39.0955, -84.5161),
"stadium_nfl_huntington_bank_field": StadiumInfo("stadium_nfl_huntington_bank_field", "Huntington Bank Field", "Cleveland", "OH", "USA", "nfl", 41.5061, -81.6995),
"stadium_nfl_att_stadium": StadiumInfo("stadium_nfl_att_stadium", "AT&T Stadium", "Arlington", "TX", "USA", "nfl", 32.7473, -97.0945, "America/Chicago"),
"stadium_nfl_empower_field": StadiumInfo("stadium_nfl_empower_field", "Empower Field at Mile High", "Denver", "CO", "USA", "nfl", 39.7439, -105.0201, "America/Denver"),
"stadium_nfl_ford_field": StadiumInfo("stadium_nfl_ford_field", "Ford Field", "Detroit", "MI", "USA", "nfl", 42.3400, -83.0456),
"stadium_nfl_lambeau_field": StadiumInfo("stadium_nfl_lambeau_field", "Lambeau Field", "Green Bay", "WI", "USA", "nfl", 44.5013, -88.0622, "America/Chicago"),
"stadium_nfl_nrg_stadium": StadiumInfo("stadium_nfl_nrg_stadium", "NRG Stadium", "Houston", "TX", "USA", "nfl", 29.6847, -95.4107, "America/Chicago"),
"stadium_nfl_lucas_oil_stadium": StadiumInfo("stadium_nfl_lucas_oil_stadium", "Lucas Oil Stadium", "Indianapolis", "IN", "USA", "nfl", 39.7601, -86.1639),
"stadium_nfl_everbank_stadium": StadiumInfo("stadium_nfl_everbank_stadium", "EverBank Stadium", "Jacksonville", "FL", "USA", "nfl", 30.3239, -81.6373),
"stadium_nfl_arrowhead_stadium": StadiumInfo("stadium_nfl_arrowhead_stadium", "Arrowhead Stadium", "Kansas City", "MO", "USA", "nfl", 39.0489, -94.4839, "America/Chicago"),
"stadium_nfl_allegiant_stadium": StadiumInfo("stadium_nfl_allegiant_stadium", "Allegiant Stadium", "Las Vegas", "NV", "USA", "nfl", 36.0909, -115.1833, "America/Los_Angeles"),
"stadium_nfl_sofi_stadium": StadiumInfo("stadium_nfl_sofi_stadium", "SoFi Stadium", "Inglewood", "CA", "USA", "nfl", 33.9534, -118.3386, "America/Los_Angeles"),
"stadium_nfl_hard_rock_stadium": StadiumInfo("stadium_nfl_hard_rock_stadium", "Hard Rock Stadium", "Miami Gardens", "FL", "USA", "nfl", 25.9580, -80.2389),
"stadium_nfl_us_bank_stadium": StadiumInfo("stadium_nfl_us_bank_stadium", "U.S. Bank Stadium", "Minneapolis", "MN", "USA", "nfl", 44.9737, -93.2575, "America/Chicago"),
"stadium_nfl_gillette_stadium": StadiumInfo("stadium_nfl_gillette_stadium", "Gillette Stadium", "Foxborough", "MA", "USA", "nfl", 42.0909, -71.2643),
"stadium_nfl_caesars_superdome": StadiumInfo("stadium_nfl_caesars_superdome", "Caesars Superdome", "New Orleans", "LA", "USA", "nfl", 29.9511, -90.0812, "America/Chicago"),
"stadium_nfl_metlife_stadium": StadiumInfo("stadium_nfl_metlife_stadium", "MetLife Stadium", "East Rutherford", "NJ", "USA", "nfl", 40.8128, -74.0742),
"stadium_nfl_lincoln_financial_field": StadiumInfo("stadium_nfl_lincoln_financial_field", "Lincoln Financial Field", "Philadelphia", "PA", "USA", "nfl", 39.9008, -75.1675),
"stadium_nfl_acrisure_stadium": StadiumInfo("stadium_nfl_acrisure_stadium", "Acrisure Stadium", "Pittsburgh", "PA", "USA", "nfl", 40.4468, -80.0158),
"stadium_nfl_levis_stadium": StadiumInfo("stadium_nfl_levis_stadium", "Levi's Stadium", "Santa Clara", "CA", "USA", "nfl", 37.4033, -121.9695, "America/Los_Angeles"),
"stadium_nfl_lumen_field": StadiumInfo("stadium_nfl_lumen_field", "Lumen Field", "Seattle", "WA", "USA", "nfl", 47.5952, -122.3316, "America/Los_Angeles"),
"stadium_nfl_raymond_james_stadium": StadiumInfo("stadium_nfl_raymond_james_stadium", "Raymond James Stadium", "Tampa", "FL", "USA", "nfl", 27.9759, -82.5033),
"stadium_nfl_nissan_stadium": StadiumInfo("stadium_nfl_nissan_stadium", "Nissan Stadium", "Nashville", "TN", "USA", "nfl", 36.1665, -86.7713, "America/Chicago"),
"stadium_nfl_northwest_stadium": StadiumInfo("stadium_nfl_northwest_stadium", "Northwest Stadium", "Landover", "MD", "USA", "nfl", 38.9076, -76.8645),
# Special and international venues
"stadium_nfl_tom_benson_hall_of_fame_stadium": StadiumInfo("stadium_nfl_tom_benson_hall_of_fame_stadium", "Tom Benson Hall of Fame Stadium", "Canton", "OH", "USA", "nfl", 40.8209, -81.3985),
"stadium_nfl_corinthians_arena": StadiumInfo("stadium_nfl_corinthians_arena", "Corinthians Arena", "São Paulo", "SP", "Brazil", "nfl", -23.5453, -46.4742, "America/Sao_Paulo"),
"stadium_nfl_croke_park": StadiumInfo("stadium_nfl_croke_park", "Croke Park", "Dublin", "", "Ireland", "nfl", 53.3609, -6.2514, "Europe/Dublin"),
"stadium_nfl_olympic_stadium_berlin": StadiumInfo("stadium_nfl_olympic_stadium_berlin", "Olympic Stadium Berlin", "Berlin", "", "Germany", "nfl", 52.5147, 13.2395, "Europe/Berlin"),
"stadium_nfl_santiago_bernabeu": StadiumInfo("stadium_nfl_santiago_bernabeu", "Santiago Bernabéu", "Madrid", "", "Spain", "nfl", 40.4531, -3.6883, "Europe/Madrid"),
},
"nhl": {
"stadium_nhl_honda_center": StadiumInfo("stadium_nhl_honda_center", "Honda Center", "Anaheim", "CA", "USA", "nhl", 33.8078, -117.8765, "America/Los_Angeles"),
"stadium_nhl_delta_center": StadiumInfo("stadium_nhl_delta_center", "Delta Center", "Salt Lake City", "UT", "USA", "nhl", 40.7683, -111.9011, "America/Denver"),
"stadium_nhl_td_garden": StadiumInfo("stadium_nhl_td_garden", "TD Garden", "Boston", "MA", "USA", "nhl", 42.3662, -71.0621),
"stadium_nhl_keybank_center": StadiumInfo("stadium_nhl_keybank_center", "KeyBank Center", "Buffalo", "NY", "USA", "nhl", 42.8750, -78.8764),
"stadium_nhl_scotiabank_saddledome": StadiumInfo("stadium_nhl_scotiabank_saddledome", "Scotiabank Saddledome", "Calgary", "AB", "Canada", "nhl", 51.0374, -114.0519, "America/Edmonton"),
"stadium_nhl_pnc_arena": StadiumInfo("stadium_nhl_pnc_arena", "PNC Arena", "Raleigh", "NC", "USA", "nhl", 35.8033, -78.7220),
"stadium_nhl_united_center": StadiumInfo("stadium_nhl_united_center", "United Center", "Chicago", "IL", "USA", "nhl", 41.8807, -87.6742, "America/Chicago"),
"stadium_nhl_ball_arena": StadiumInfo("stadium_nhl_ball_arena", "Ball Arena", "Denver", "CO", "USA", "nhl", 39.7487, -105.0077, "America/Denver"),
"stadium_nhl_nationwide_arena": StadiumInfo("stadium_nhl_nationwide_arena", "Nationwide Arena", "Columbus", "OH", "USA", "nhl", 39.9692, -83.0061),
"stadium_nhl_american_airlines_center": StadiumInfo("stadium_nhl_american_airlines_center", "American Airlines Center", "Dallas", "TX", "USA", "nhl", 32.7905, -96.8103, "America/Chicago"),
"stadium_nhl_little_caesars_arena": StadiumInfo("stadium_nhl_little_caesars_arena", "Little Caesars Arena", "Detroit", "MI", "USA", "nhl", 42.3411, -83.0553),
"stadium_nhl_rogers_place": StadiumInfo("stadium_nhl_rogers_place", "Rogers Place", "Edmonton", "AB", "Canada", "nhl", 53.5469, -113.4979, "America/Edmonton"),
"stadium_nhl_amerant_bank_arena": StadiumInfo("stadium_nhl_amerant_bank_arena", "Amerant Bank Arena", "Sunrise", "FL", "USA", "nhl", 26.1584, -80.3256),
"stadium_nhl_cryptocom_arena": StadiumInfo("stadium_nhl_cryptocom_arena", "Crypto.com Arena", "Los Angeles", "CA", "USA", "nhl", 34.0430, -118.2673, "America/Los_Angeles"),
"stadium_nhl_xcel_energy_center": StadiumInfo("stadium_nhl_xcel_energy_center", "Xcel Energy Center", "St. Paul", "MN", "USA", "nhl", 44.9448, -93.1010, "America/Chicago"),
"stadium_nhl_bell_centre": StadiumInfo("stadium_nhl_bell_centre", "Bell Centre", "Montreal", "QC", "Canada", "nhl", 45.4961, -73.5693, "America/Toronto"),
"stadium_nhl_bridgestone_arena": StadiumInfo("stadium_nhl_bridgestone_arena", "Bridgestone Arena", "Nashville", "TN", "USA", "nhl", 36.1592, -86.7785, "America/Chicago"),
"stadium_nhl_prudential_center": StadiumInfo("stadium_nhl_prudential_center", "Prudential Center", "Newark", "NJ", "USA", "nhl", 40.7334, -74.1712),
"stadium_nhl_ubs_arena": StadiumInfo("stadium_nhl_ubs_arena", "UBS Arena", "Elmont", "NY", "USA", "nhl", 40.7170, -73.7255),
"stadium_nhl_madison_square_garden": StadiumInfo("stadium_nhl_madison_square_garden", "Madison Square Garden", "New York", "NY", "USA", "nhl", 40.7505, -73.9934),
"stadium_nhl_canadian_tire_centre": StadiumInfo("stadium_nhl_canadian_tire_centre", "Canadian Tire Centre", "Ottawa", "ON", "Canada", "nhl", 45.2969, -75.9272, "America/Toronto"),
"stadium_nhl_wells_fargo_center": StadiumInfo("stadium_nhl_wells_fargo_center", "Wells Fargo Center", "Philadelphia", "PA", "USA", "nhl", 39.9012, -75.1720),
"stadium_nhl_ppg_paints_arena": StadiumInfo("stadium_nhl_ppg_paints_arena", "PPG Paints Arena", "Pittsburgh", "PA", "USA", "nhl", 40.4395, -79.9890),
"stadium_nhl_sap_center": StadiumInfo("stadium_nhl_sap_center", "SAP Center", "San Jose", "CA", "USA", "nhl", 37.3327, -121.9011, "America/Los_Angeles"),
"stadium_nhl_climate_pledge_arena": StadiumInfo("stadium_nhl_climate_pledge_arena", "Climate Pledge Arena", "Seattle", "WA", "USA", "nhl", 47.6221, -122.3540, "America/Los_Angeles"),
"stadium_nhl_enterprise_center": StadiumInfo("stadium_nhl_enterprise_center", "Enterprise Center", "St. Louis", "MO", "USA", "nhl", 38.6268, -90.2025, "America/Chicago"),
"stadium_nhl_amalie_arena": StadiumInfo("stadium_nhl_amalie_arena", "Amalie Arena", "Tampa", "FL", "USA", "nhl", 27.9428, -82.4519),
"stadium_nhl_scotiabank_arena": StadiumInfo("stadium_nhl_scotiabank_arena", "Scotiabank Arena", "Toronto", "ON", "Canada", "nhl", 43.6435, -79.3791, "America/Toronto"),
"stadium_nhl_rogers_arena": StadiumInfo("stadium_nhl_rogers_arena", "Rogers Arena", "Vancouver", "BC", "Canada", "nhl", 49.2778, -123.1088, "America/Vancouver"),
"stadium_nhl_tmobile_arena": StadiumInfo("stadium_nhl_tmobile_arena", "T-Mobile Arena", "Las Vegas", "NV", "USA", "nhl", 36.1028, -115.1783, "America/Los_Angeles"),
"stadium_nhl_capital_one_arena": StadiumInfo("stadium_nhl_capital_one_arena", "Capital One Arena", "Washington", "DC", "USA", "nhl", 38.8981, -77.0209),
"stadium_nhl_canada_life_centre": StadiumInfo("stadium_nhl_canada_life_centre", "Canada Life Centre", "Winnipeg", "MB", "Canada", "nhl", 49.8928, -97.1433, "America/Winnipeg"),
},
"mls": {
"stadium_mls_mercedes_benz_stadium": StadiumInfo("stadium_mls_mercedes_benz_stadium", "Mercedes-Benz Stadium", "Atlanta", "GA", "USA", "mls", 33.7553, -84.4006),
"stadium_mls_q2_stadium": StadiumInfo("stadium_mls_q2_stadium", "Q2 Stadium", "Austin", "TX", "USA", "mls", 30.3875, -97.7186, "America/Chicago"),
"stadium_mls_bank_of_america_stadium": StadiumInfo("stadium_mls_bank_of_america_stadium", "Bank of America Stadium", "Charlotte", "NC", "USA", "mls", 35.2258, -80.8528),
"stadium_mls_soldier_field": StadiumInfo("stadium_mls_soldier_field", "Soldier Field", "Chicago", "IL", "USA", "mls", 41.8623, -87.6167, "America/Chicago"),
"stadium_mls_tql_stadium": StadiumInfo("stadium_mls_tql_stadium", "TQL Stadium", "Cincinnati", "OH", "USA", "mls", 39.1112, -84.5225),
"stadium_mls_dicks_sporting_goods_park": StadiumInfo("stadium_mls_dicks_sporting_goods_park", "Dick's Sporting Goods Park", "Commerce City", "CO", "USA", "mls", 39.8056, -104.8922, "America/Denver"),
"stadium_mls_lowercom_field": StadiumInfo("stadium_mls_lowercom_field", "Lower.com Field", "Columbus", "OH", "USA", "mls", 39.9689, -83.0173),
"stadium_mls_toyota_stadium": StadiumInfo("stadium_mls_toyota_stadium", "Toyota Stadium", "Frisco", "TX", "USA", "mls", 33.1545, -96.8353, "America/Chicago"),
"stadium_mls_audi_field": StadiumInfo("stadium_mls_audi_field", "Audi Field", "Washington", "DC", "USA", "mls", 38.8687, -77.0128),
"stadium_mls_shell_energy_stadium": StadiumInfo("stadium_mls_shell_energy_stadium", "Shell Energy Stadium", "Houston", "TX", "USA", "mls", 29.7522, -95.3527, "America/Chicago"),
"stadium_mls_dignity_health_sports_park": StadiumInfo("stadium_mls_dignity_health_sports_park", "Dignity Health Sports Park", "Carson", "CA", "USA", "mls", 33.8644, -118.2611, "America/Los_Angeles"),
"stadium_mls_bmo_stadium": StadiumInfo("stadium_mls_bmo_stadium", "BMO Stadium", "Los Angeles", "CA", "USA", "mls", 34.0128, -118.2841, "America/Los_Angeles"),
"stadium_mls_chase_stadium": StadiumInfo("stadium_mls_chase_stadium", "Chase Stadium", "Fort Lauderdale", "FL", "USA", "mls", 26.1930, -80.1611),
"stadium_mls_allianz_field": StadiumInfo("stadium_mls_allianz_field", "Allianz Field", "St. Paul", "MN", "USA", "mls", 44.9528, -93.1650, "America/Chicago"),
"stadium_mls_stade_saputo": StadiumInfo("stadium_mls_stade_saputo", "Stade Saputo", "Montreal", "QC", "Canada", "mls", 45.5622, -73.5528, "America/Toronto"),
"stadium_mls_geodis_park": StadiumInfo("stadium_mls_geodis_park", "GEODIS Park", "Nashville", "TN", "USA", "mls", 36.1304, -86.7651, "America/Chicago"),
"stadium_mls_gillette_stadium": StadiumInfo("stadium_mls_gillette_stadium", "Gillette Stadium", "Foxborough", "MA", "USA", "mls", 42.0909, -71.2643),
"stadium_mls_yankee_stadium": StadiumInfo("stadium_mls_yankee_stadium", "Yankee Stadium", "Bronx", "NY", "USA", "mls", 40.8296, -73.9262),
"stadium_mls_red_bull_arena": StadiumInfo("stadium_mls_red_bull_arena", "Red Bull Arena", "Harrison", "NJ", "USA", "mls", 40.7369, -74.1503),
"stadium_mls_interco_stadium": StadiumInfo("stadium_mls_interco_stadium", "Inter&Co Stadium", "Orlando", "FL", "USA", "mls", 28.5411, -81.3895),
"stadium_mls_subaru_park": StadiumInfo("stadium_mls_subaru_park", "Subaru Park", "Chester", "PA", "USA", "mls", 39.8328, -75.3789),
"stadium_mls_providence_park": StadiumInfo("stadium_mls_providence_park", "Providence Park", "Portland", "OR", "USA", "mls", 45.5216, -122.6917, "America/Los_Angeles"),
"stadium_mls_america_first_field": StadiumInfo("stadium_mls_america_first_field", "America First Field", "Sandy", "UT", "USA", "mls", 40.5830, -111.8933, "America/Denver"),
"stadium_mls_paypal_park": StadiumInfo("stadium_mls_paypal_park", "PayPal Park", "San Jose", "CA", "USA", "mls", 37.3511, -121.9250, "America/Los_Angeles"),
"stadium_mls_snapdragon_stadium": StadiumInfo("stadium_mls_snapdragon_stadium", "Snapdragon Stadium", "San Diego", "CA", "USA", "mls", 32.7837, -117.1225, "America/Los_Angeles"),
"stadium_mls_lumen_field": StadiumInfo("stadium_mls_lumen_field", "Lumen Field", "Seattle", "WA", "USA", "mls", 47.5952, -122.3316, "America/Los_Angeles"),
"stadium_mls_childrens_mercy_park": StadiumInfo("stadium_mls_childrens_mercy_park", "Children's Mercy Park", "Kansas City", "KS", "USA", "mls", 39.1217, -94.8231, "America/Chicago"),
"stadium_mls_citypark": StadiumInfo("stadium_mls_citypark", "CITYPARK", "St. Louis", "MO", "USA", "mls", 38.6316, -90.2106, "America/Chicago"),
"stadium_mls_bmo_field": StadiumInfo("stadium_mls_bmo_field", "BMO Field", "Toronto", "ON", "Canada", "mls", 43.6332, -79.4186, "America/Toronto"),
"stadium_mls_bc_place": StadiumInfo("stadium_mls_bc_place", "BC Place", "Vancouver", "BC", "Canada", "mls", 49.2768, -123.1118, "America/Vancouver"),
# Alternate and special venues
"stadium_mls_miami_freedom_park": StadiumInfo("stadium_mls_miami_freedom_park", "Miami Freedom Park", "Miami", "FL", "USA", "mls", 25.789, -80.237),
"stadium_mls_citi_field": StadiumInfo("stadium_mls_citi_field", "Citi Field", "New York", "NY", "USA", "mls", 40.7571, -73.8458),
"stadium_mls_los_angeles_memorial_coliseum": StadiumInfo("stadium_mls_los_angeles_memorial_coliseum", "Los Angeles Memorial Coliseum", "Los Angeles", "CA", "USA", "mls", 34.0141, -118.2879, "America/Los_Angeles"),
"stadium_mls_mandt_bank_stadium": StadiumInfo("stadium_mls_mandt_bank_stadium", "M&T Bank Stadium", "Baltimore", "MD", "USA", "mls", 39.2780, -76.6227),
},
"wnba": {
"stadium_wnba_gateway_center_arena": StadiumInfo("stadium_wnba_gateway_center_arena", "Gateway Center Arena", "College Park", "GA", "USA", "wnba", 33.6510, -84.4474),
"stadium_wnba_wintrust_arena": StadiumInfo("stadium_wnba_wintrust_arena", "Wintrust Arena", "Chicago", "IL", "USA", "wnba", 41.8658, -87.6169, "America/Chicago"),
"stadium_wnba_mohegan_sun_arena": StadiumInfo("stadium_wnba_mohegan_sun_arena", "Mohegan Sun Arena", "Uncasville", "CT", "USA", "wnba", 41.4931, -72.0912),
"stadium_wnba_college_park_center": StadiumInfo("stadium_wnba_college_park_center", "College Park Center", "Arlington", "TX", "USA", "wnba", 32.7304, -97.1077, "America/Chicago"),
"stadium_wnba_chase_center": StadiumInfo("stadium_wnba_chase_center", "Chase Center", "San Francisco", "CA", "USA", "wnba", 37.7680, -122.3877, "America/Los_Angeles"),
"stadium_wnba_gainbridge_fieldhouse": StadiumInfo("stadium_wnba_gainbridge_fieldhouse", "Gainbridge Fieldhouse", "Indianapolis", "IN", "USA", "wnba", 39.7640, -86.1555),
"stadium_wnba_michelob_ultra_arena": StadiumInfo("stadium_wnba_michelob_ultra_arena", "Michelob Ultra Arena", "Las Vegas", "NV", "USA", "wnba", 36.0902, -115.1756, "America/Los_Angeles"),
"stadium_wnba_cryptocom_arena": StadiumInfo("stadium_wnba_cryptocom_arena", "Crypto.com Arena", "Los Angeles", "CA", "USA", "wnba", 34.0430, -118.2673, "America/Los_Angeles"),
"stadium_wnba_target_center": StadiumInfo("stadium_wnba_target_center", "Target Center", "Minneapolis", "MN", "USA", "wnba", 44.9795, -93.2761, "America/Chicago"),
"stadium_wnba_barclays_center": StadiumInfo("stadium_wnba_barclays_center", "Barclays Center", "Brooklyn", "NY", "USA", "wnba", 40.6826, -73.9754),
"stadium_wnba_footprint_center": StadiumInfo("stadium_wnba_footprint_center", "Footprint Center", "Phoenix", "AZ", "USA", "wnba", 33.4457, -112.0712, "America/Phoenix"),
"stadium_wnba_climate_pledge_arena": StadiumInfo("stadium_wnba_climate_pledge_arena", "Climate Pledge Arena", "Seattle", "WA", "USA", "wnba", 47.6221, -122.3540, "America/Los_Angeles"),
"stadium_wnba_entertainment_sports_arena": StadiumInfo("stadium_wnba_entertainment_sports_arena", "Entertainment & Sports Arena", "Washington", "DC", "USA", "wnba", 38.8690, -76.9745),
"stadium_wnba_state_farm_arena": StadiumInfo("stadium_wnba_state_farm_arena", "State Farm Arena", "Atlanta", "GA", "USA", "wnba", 33.7573, -84.3963),
"stadium_wnba_rocket_mortgage_fieldhouse": StadiumInfo("stadium_wnba_rocket_mortgage_fieldhouse", "Rocket Mortgage FieldHouse", "Cleveland", "OH", "USA", "wnba", 41.4965, -81.6882),
"stadium_wnba_cfg_bank_arena": StadiumInfo("stadium_wnba_cfg_bank_arena", "CFG Bank Arena", "Baltimore", "MD", "USA", "wnba", 39.2825, -76.6220),
"stadium_wnba_purcell_pavilion": StadiumInfo("stadium_wnba_purcell_pavilion", "Purcell Pavilion", "Notre Dame", "IN", "USA", "wnba", 41.6987, -86.2340),
},
"nwsl": {
"stadium_nwsl_bmo_stadium": StadiumInfo("stadium_nwsl_bmo_stadium", "BMO Stadium", "Los Angeles", "CA", "USA", "nwsl", 34.0128, -118.2841, "America/Los_Angeles"),
"stadium_nwsl_seatgeek_stadium": StadiumInfo("stadium_nwsl_seatgeek_stadium", "SeatGeek Stadium", "Bridgeview", "IL", "USA", "nwsl", 41.7500, -87.8028, "America/Chicago"),
"stadium_nwsl_shell_energy_stadium": StadiumInfo("stadium_nwsl_shell_energy_stadium", "Shell Energy Stadium", "Houston", "TX", "USA", "nwsl", 29.7522, -95.3527, "America/Chicago"),
"stadium_nwsl_cpkc_stadium": StadiumInfo("stadium_nwsl_cpkc_stadium", "CPKC Stadium", "Kansas City", "MO", "USA", "nwsl", 39.1050, -94.5580, "America/Chicago"),
"stadium_nwsl_red_bull_arena": StadiumInfo("stadium_nwsl_red_bull_arena", "Red Bull Arena", "Harrison", "NJ", "USA", "nwsl", 40.7369, -74.1503),
"stadium_nwsl_wakemed_soccer_park": StadiumInfo("stadium_nwsl_wakemed_soccer_park", "WakeMed Soccer Park", "Cary", "NC", "USA", "nwsl", 35.7879, -78.7806),
"stadium_nwsl_interco_stadium": StadiumInfo("stadium_nwsl_interco_stadium", "Inter&Co Stadium", "Orlando", "FL", "USA", "nwsl", 28.5411, -81.3895),
"stadium_nwsl_providence_park": StadiumInfo("stadium_nwsl_providence_park", "Providence Park", "Portland", "OR", "USA", "nwsl", 45.5216, -122.6917, "America/Los_Angeles"),
"stadium_nwsl_lynn_family_stadium": StadiumInfo("stadium_nwsl_lynn_family_stadium", "Lynn Family Stadium", "Louisville", "KY", "USA", "nwsl", 38.2219, -85.7381),
"stadium_nwsl_snapdragon_stadium": StadiumInfo("stadium_nwsl_snapdragon_stadium", "Snapdragon Stadium", "San Diego", "CA", "USA", "nwsl", 32.7837, -117.1225, "America/Los_Angeles"),
"stadium_nwsl_lumen_field": StadiumInfo("stadium_nwsl_lumen_field", "Lumen Field", "Seattle", "WA", "USA", "nwsl", 47.5952, -122.3316, "America/Los_Angeles"),
"stadium_nwsl_america_first_field": StadiumInfo("stadium_nwsl_america_first_field", "America First Field", "Sandy", "UT", "USA", "nwsl", 40.5830, -111.8933, "America/Denver"),
"stadium_nwsl_audi_field": StadiumInfo("stadium_nwsl_audi_field", "Audi Field", "Washington", "DC", "USA", "nwsl", 38.8687, -77.0128),
"stadium_nwsl_paypal_park": StadiumInfo("stadium_nwsl_paypal_park", "PayPal Park", "San Jose", "CA", "USA", "nwsl", 37.3511, -121.9250, "America/Los_Angeles"),
# Boston Legacy FC venues
"stadium_nwsl_gillette_stadium": StadiumInfo("stadium_nwsl_gillette_stadium", "Gillette Stadium", "Foxborough", "MA", "USA", "nwsl", 42.0909, -71.2643),
"stadium_nwsl_centreville_bank_stadium": StadiumInfo("stadium_nwsl_centreville_bank_stadium", "Centreville Bank Stadium", "Pawtucket", "RI", "USA", "nwsl", 41.8770, -71.3910),
# Denver Summit FC venues
"stadium_nwsl_empower_field": StadiumInfo("stadium_nwsl_empower_field", "Empower Field at Mile High", "Denver", "CO", "USA", "nwsl", 39.7439, -105.0201, "America/Denver"),
"stadium_nwsl_dicks_sporting_goods_park": StadiumInfo("stadium_nwsl_dicks_sporting_goods_park", "Dick's Sporting Goods Park", "Commerce City", "CO", "USA", "nwsl", 39.8056, -104.8922, "America/Denver"),
"stadium_nwsl_centennial_stadium": StadiumInfo("stadium_nwsl_centennial_stadium", "Centennial Stadium", "Centennial", "CO", "USA", "nwsl", 39.6000, -104.8800, "America/Denver"),
# Shared NFL/MLB venues
"stadium_nwsl_soldier_field": StadiumInfo("stadium_nwsl_soldier_field", "Soldier Field", "Chicago", "IL", "USA", "nwsl", 41.8623, -87.6167, "America/Chicago"),
"stadium_nwsl_oracle_park": StadiumInfo("stadium_nwsl_oracle_park", "Oracle Park", "San Francisco", "CA", "USA", "nwsl", 37.7786, -122.3893, "America/Los_Angeles"),
# Additional alternate venues
"stadium_nwsl_northwestern_medicine_field": StadiumInfo("stadium_nwsl_northwestern_medicine_field", "Northwestern Medicine Field at Martin Stadium", "Evanston", "IL", "USA", "nwsl", 42.0598, -87.6743, "America/Chicago"),
"stadium_nwsl_one_spokane_stadium": StadiumInfo("stadium_nwsl_one_spokane_stadium", "ONE Spokane Stadium", "Spokane", "WA", "USA", "nwsl", 47.6588, -117.4101, "America/Los_Angeles"),
},
}
class StadiumResolver:
"""Resolves stadium names to canonical IDs.
Resolution order:
1. Exact match against stadium names
2. Alias lookup (with date awareness)
3. Fuzzy match against all known names
4. Geographic filter check
5. Unresolved (returns ManualReviewItem)
"""
def __init__(
self,
sport: str,
alias_loader: Optional[StadiumAliasLoader] = None,
fuzzy_threshold: int = FUZZY_MATCH_THRESHOLD,
):
"""Initialize the resolver.
Args:
sport: Sport code (e.g., 'nba', 'mlb')
alias_loader: Stadium alias loader (default: global loader)
fuzzy_threshold: Minimum fuzzy match score
"""
self.sport = sport.lower()
self.alias_loader = alias_loader or get_stadium_alias_loader()
self.fuzzy_threshold = fuzzy_threshold
self._stadiums = STADIUM_MAPPINGS.get(self.sport, {})
# Build match candidates
self._candidates = self._build_candidates()
def _build_candidates(self) -> list[MatchCandidate]:
"""Build match candidates from stadium mappings."""
candidates = []
for stadium_id, info in self._stadiums.items():
# Get aliases for this stadium
aliases = [a.alias_name for a in self.alias_loader.get_aliases_for_stadium(stadium_id)]
# Add city as alias
aliases.append(info.city)
candidates.append(MatchCandidate(
canonical_id=stadium_id,
name=info.name,
aliases=aliases,
))
return candidates
def resolve(
self,
name: str,
check_date: Optional[date] = None,
country: Optional[str] = None,
source_url: Optional[str] = None,
) -> StadiumResolveResult:
"""Resolve a stadium name to a canonical ID.
Args:
name: Stadium name to resolve
check_date: Date for alias validity (None = today)
country: Country for geographic filtering (None = no filter)
source_url: Source URL for manual review items
Returns:
StadiumResolveResult with resolution details
"""
name_lower = name.lower().strip()
# 1. Exact match against stadium names
for stadium_id, info in self._stadiums.items():
if name_lower == info.name.lower():
return StadiumResolveResult(
canonical_id=stadium_id,
confidence=100,
match_type="exact",
)
# 2. Alias lookup
alias_result = self.alias_loader.resolve(name, check_date)
if alias_result:
# Verify it's for the right sport (alias file has all sports)
if alias_result.startswith(f"stadium_{self.sport}_"):
return StadiumResolveResult(
canonical_id=alias_result,
confidence=95,
match_type="alias",
)
# 3. Fuzzy match
matches = fuzzy_match_stadium(
name,
self._candidates,
threshold=self.fuzzy_threshold,
)
if matches:
best = matches[0]
review_item = None
# Create review item for low confidence matches
if best.confidence < 90:
review_item = ManualReviewItem(
id=f"stadium_{uuid4().hex[:8]}",
reason=ReviewReason.LOW_CONFIDENCE_MATCH,
sport=self.sport,
raw_value=name,
context={"match_type": "fuzzy"},
source_url=source_url,
suggested_matches=matches,
game_date=check_date,
)
return StadiumResolveResult(
canonical_id=best.canonical_id,
confidence=best.confidence,
match_type="fuzzy",
review_item=review_item,
)
# 4. Geographic filter check
if country and country not in ALLOWED_COUNTRIES:
review_item = ManualReviewItem(
id=f"stadium_{uuid4().hex[:8]}",
reason=ReviewReason.GEOGRAPHIC_FILTER,
sport=self.sport,
raw_value=name,
context={"country": country, "reason": "Stadium outside USA/Canada/Mexico"},
source_url=source_url,
game_date=check_date,
)
return StadiumResolveResult(
canonical_id=None,
confidence=0,
match_type="filtered",
filtered_reason="geographic",
review_item=review_item,
)
# 5. Unresolved
review_item = ManualReviewItem(
id=f"stadium_{uuid4().hex[:8]}",
reason=ReviewReason.UNRESOLVED_STADIUM,
sport=self.sport,
raw_value=name,
context={},
source_url=source_url,
suggested_matches=fuzzy_match_stadium(
name,
self._candidates,
threshold=50, # Lower threshold for suggestions
top_n=5,
),
game_date=check_date,
)
return StadiumResolveResult(
canonical_id=None,
confidence=0,
match_type="unresolved",
review_item=review_item,
)
def get_stadium_info(self, stadium_id: str) -> Optional[StadiumInfo]:
"""Get stadium info by ID.
Args:
stadium_id: Canonical stadium ID
Returns:
StadiumInfo or None
"""
return self._stadiums.get(stadium_id)
def get_all_stadiums(self) -> list[StadiumInfo]:
"""Get all stadiums for this sport.
Returns:
List of StadiumInfo objects
"""
return list(self._stadiums.values())
def is_in_allowed_region(self, stadium_id: str) -> bool:
"""Check if a stadium is in an allowed region.
Args:
stadium_id: Canonical stadium ID
Returns:
True if stadium is in USA, Canada, or Mexico
"""
info = self._stadiums.get(stadium_id)
if not info:
return False
return info.country in ALLOWED_COUNTRIES
# Cached resolvers
_resolvers: dict[str, StadiumResolver] = {}
def get_stadium_resolver(sport: str) -> StadiumResolver:
"""Get or create a stadium resolver for a sport."""
sport_lower = sport.lower()
if sport_lower not in _resolvers:
_resolvers[sport_lower] = StadiumResolver(sport_lower)
return _resolvers[sport_lower]
def resolve_stadium(
sport: str,
name: str,
check_date: Optional[date] = None,
) -> StadiumResolveResult:
"""Convenience function to resolve a stadium name.
Args:
sport: Sport code
name: Stadium name to resolve
check_date: Date for alias validity
Returns:
StadiumResolveResult
"""
return get_stadium_resolver(sport).resolve(name, check_date)

View File

@@ -1,514 +0,0 @@
"""Team name resolver with exact, alias, and fuzzy matching."""
from dataclasses import dataclass
from datetime import date
from typing import Optional
from uuid import uuid4
from ..config import FUZZY_MATCH_THRESHOLD
from ..models.aliases import (
AliasType,
FuzzyMatch,
ManualReviewItem,
ReviewReason,
)
from .alias_loader import get_team_alias_loader, TeamAliasLoader
from .fuzzy import MatchCandidate, fuzzy_match_team, exact_match
@dataclass
class TeamResolveResult:
"""Result of team resolution.
Attributes:
canonical_id: Resolved canonical team ID (None if unresolved)
confidence: Confidence in the match (100 for exact, lower for fuzzy)
match_type: How the match was made ('exact', 'alias', 'fuzzy', 'unresolved')
review_item: ManualReviewItem if resolution failed or low confidence
"""
canonical_id: Optional[str]
confidence: int
match_type: str
review_item: Optional[ManualReviewItem] = None
# Hardcoded team mappings for each sport
# Format: {sport: {abbreviation: (canonical_id, full_name, city, stadium_id)}}
TEAM_MAPPINGS: dict[str, dict[str, tuple[str, str, str, str]]] = {
"nba": {
"ATL": ("team_nba_atl", "Atlanta Hawks", "Atlanta", "stadium_nba_state_farm_arena"),
"BOS": ("team_nba_bos", "Boston Celtics", "Boston", "stadium_nba_td_garden"),
"BKN": ("team_nba_brk", "Brooklyn Nets", "Brooklyn", "stadium_nba_barclays_center"),
"BRK": ("team_nba_brk", "Brooklyn Nets", "Brooklyn", "stadium_nba_barclays_center"),
"CHA": ("team_nba_cho", "Charlotte Hornets", "Charlotte", "stadium_nba_spectrum_center"),
"CHO": ("team_nba_cho", "Charlotte Hornets", "Charlotte", "stadium_nba_spectrum_center"),
"CHI": ("team_nba_chi", "Chicago Bulls", "Chicago", "stadium_nba_united_center"),
"CLE": ("team_nba_cle", "Cleveland Cavaliers", "Cleveland", "stadium_nba_rocket_mortgage_fieldhouse"),
"DAL": ("team_nba_dal", "Dallas Mavericks", "Dallas", "stadium_nba_american_airlines_center"),
"DEN": ("team_nba_den", "Denver Nuggets", "Denver", "stadium_nba_ball_arena"),
"DET": ("team_nba_det", "Detroit Pistons", "Detroit", "stadium_nba_little_caesars_arena"),
"GSW": ("team_nba_gsw", "Golden State Warriors", "Golden State", "stadium_nba_chase_center"),
"GS": ("team_nba_gsw", "Golden State Warriors", "Golden State", "stadium_nba_chase_center"),
"HOU": ("team_nba_hou", "Houston Rockets", "Houston", "stadium_nba_toyota_center"),
"IND": ("team_nba_ind", "Indiana Pacers", "Indiana", "stadium_nba_gainbridge_fieldhouse"),
"LAC": ("team_nba_lac", "Los Angeles Clippers", "Los Angeles", "stadium_nba_intuit_dome"),
"LAL": ("team_nba_lal", "Los Angeles Lakers", "Los Angeles", "stadium_nba_cryptocom_arena"),
"MEM": ("team_nba_mem", "Memphis Grizzlies", "Memphis", "stadium_nba_fedexforum"),
"MIA": ("team_nba_mia", "Miami Heat", "Miami", "stadium_nba_kaseya_center"),
"MIL": ("team_nba_mil", "Milwaukee Bucks", "Milwaukee", "stadium_nba_fiserv_forum"),
"MIN": ("team_nba_min", "Minnesota Timberwolves", "Minnesota", "stadium_nba_target_center"),
"NOP": ("team_nba_nop", "New Orleans Pelicans", "New Orleans", "stadium_nba_smoothie_king_center"),
"NO": ("team_nba_nop", "New Orleans Pelicans", "New Orleans", "stadium_nba_smoothie_king_center"),
"NYK": ("team_nba_nyk", "New York Knicks", "New York", "stadium_nba_madison_square_garden"),
"NY": ("team_nba_nyk", "New York Knicks", "New York", "stadium_nba_madison_square_garden"),
"OKC": ("team_nba_okc", "Oklahoma City Thunder", "Oklahoma City", "stadium_nba_paycom_center"),
"ORL": ("team_nba_orl", "Orlando Magic", "Orlando", "stadium_nba_kia_center"),
"PHI": ("team_nba_phi", "Philadelphia 76ers", "Philadelphia", "stadium_nba_wells_fargo_center"),
"PHX": ("team_nba_phx", "Phoenix Suns", "Phoenix", "stadium_nba_footprint_center"),
"PHO": ("team_nba_phx", "Phoenix Suns", "Phoenix", "stadium_nba_footprint_center"),
"POR": ("team_nba_por", "Portland Trail Blazers", "Portland", "stadium_nba_moda_center"),
"SAC": ("team_nba_sac", "Sacramento Kings", "Sacramento", "stadium_nba_golden_1_center"),
"SAS": ("team_nba_sas", "San Antonio Spurs", "San Antonio", "stadium_nba_frost_bank_center"),
"SA": ("team_nba_sas", "San Antonio Spurs", "San Antonio", "stadium_nba_frost_bank_center"),
"TOR": ("team_nba_tor", "Toronto Raptors", "Toronto", "stadium_nba_scotiabank_arena"),
"UTA": ("team_nba_uta", "Utah Jazz", "Utah", "stadium_nba_delta_center"),
"WAS": ("team_nba_was", "Washington Wizards", "Washington", "stadium_nba_capital_one_arena"),
"WSH": ("team_nba_was", "Washington Wizards", "Washington", "stadium_nba_capital_one_arena"),
},
"mlb": {
"ARI": ("team_mlb_ari", "Arizona Diamondbacks", "Arizona", "stadium_mlb_chase_field"),
"ATL": ("team_mlb_atl", "Atlanta Braves", "Atlanta", "stadium_mlb_truist_park"),
"BAL": ("team_mlb_bal", "Baltimore Orioles", "Baltimore", "stadium_mlb_oriole_park_at_camden_yards"),
"BOS": ("team_mlb_bos", "Boston Red Sox", "Boston", "stadium_mlb_fenway_park"),
"CHC": ("team_mlb_chc", "Chicago Cubs", "Chicago", "stadium_mlb_wrigley_field"),
"CHW": ("team_mlb_chw", "Chicago White Sox", "Chicago", "stadium_mlb_guaranteed_rate_field"),
"CWS": ("team_mlb_chw", "Chicago White Sox", "Chicago", "stadium_mlb_guaranteed_rate_field"),
"CIN": ("team_mlb_cin", "Cincinnati Reds", "Cincinnati", "stadium_mlb_great_american_ball_park"),
"CLE": ("team_mlb_cle", "Cleveland Guardians", "Cleveland", "stadium_mlb_progressive_field"),
"COL": ("team_mlb_col", "Colorado Rockies", "Colorado", "stadium_mlb_coors_field"),
"DET": ("team_mlb_det", "Detroit Tigers", "Detroit", "stadium_mlb_comerica_park"),
"HOU": ("team_mlb_hou", "Houston Astros", "Houston", "stadium_mlb_minute_maid_park"),
"KC": ("team_mlb_kc", "Kansas City Royals", "Kansas City", "stadium_mlb_kauffman_stadium"),
"KCR": ("team_mlb_kc", "Kansas City Royals", "Kansas City", "stadium_mlb_kauffman_stadium"),
"LAA": ("team_mlb_laa", "Los Angeles Angels", "Los Angeles", "stadium_mlb_angel_stadium"),
"ANA": ("team_mlb_laa", "Los Angeles Angels", "Anaheim", "stadium_mlb_angel_stadium"),
"LAD": ("team_mlb_lad", "Los Angeles Dodgers", "Los Angeles", "stadium_mlb_dodger_stadium"),
"MIA": ("team_mlb_mia", "Miami Marlins", "Miami", "stadium_mlb_loandepot_park"),
"FLA": ("team_mlb_mia", "Miami Marlins", "Florida", "stadium_mlb_loandepot_park"),
"MIL": ("team_mlb_mil", "Milwaukee Brewers", "Milwaukee", "stadium_mlb_american_family_field"),
"MIN": ("team_mlb_min", "Minnesota Twins", "Minnesota", "stadium_mlb_target_field"),
"NYM": ("team_mlb_nym", "New York Mets", "New York", "stadium_mlb_citi_field"),
"NYY": ("team_mlb_nyy", "New York Yankees", "New York", "stadium_mlb_yankee_stadium"),
"OAK": ("team_mlb_oak", "Oakland Athletics", "Oakland", "stadium_mlb_sutter_health_park"),
"PHI": ("team_mlb_phi", "Philadelphia Phillies", "Philadelphia", "stadium_mlb_citizens_bank_park"),
"PIT": ("team_mlb_pit", "Pittsburgh Pirates", "Pittsburgh", "stadium_mlb_pnc_park"),
"SD": ("team_mlb_sd", "San Diego Padres", "San Diego", "stadium_mlb_petco_park"),
"SDP": ("team_mlb_sd", "San Diego Padres", "San Diego", "stadium_mlb_petco_park"),
"SF": ("team_mlb_sf", "San Francisco Giants", "San Francisco", "stadium_mlb_oracle_park"),
"SFG": ("team_mlb_sf", "San Francisco Giants", "San Francisco", "stadium_mlb_oracle_park"),
"SEA": ("team_mlb_sea", "Seattle Mariners", "Seattle", "stadium_mlb_tmobile_park"),
"STL": ("team_mlb_stl", "St. Louis Cardinals", "St. Louis", "stadium_mlb_busch_stadium"),
"TB": ("team_mlb_tbr", "Tampa Bay Rays", "Tampa Bay", "stadium_mlb_tropicana_field"),
"TBR": ("team_mlb_tbr", "Tampa Bay Rays", "Tampa Bay", "stadium_mlb_tropicana_field"),
"TEX": ("team_mlb_tex", "Texas Rangers", "Texas", "stadium_mlb_globe_life_field"),
"TOR": ("team_mlb_tor", "Toronto Blue Jays", "Toronto", "stadium_mlb_rogers_centre"),
"WSN": ("team_mlb_wsn", "Washington Nationals", "Washington", "stadium_mlb_nationals_park"),
"WAS": ("team_mlb_wsn", "Washington Nationals", "Washington", "stadium_mlb_nationals_park"),
},
"nfl": {
"ARI": ("team_nfl_ari", "Arizona Cardinals", "Arizona", "stadium_nfl_state_farm_stadium"),
"ATL": ("team_nfl_atl", "Atlanta Falcons", "Atlanta", "stadium_nfl_mercedes_benz_stadium"),
"BAL": ("team_nfl_bal", "Baltimore Ravens", "Baltimore", "stadium_nfl_mandt_bank_stadium"),
"BUF": ("team_nfl_buf", "Buffalo Bills", "Buffalo", "stadium_nfl_highmark_stadium"),
"CAR": ("team_nfl_car", "Carolina Panthers", "Carolina", "stadium_nfl_bank_of_america_stadium"),
"CHI": ("team_nfl_chi", "Chicago Bears", "Chicago", "stadium_nfl_soldier_field"),
"CIN": ("team_nfl_cin", "Cincinnati Bengals", "Cincinnati", "stadium_nfl_paycor_stadium"),
"CLE": ("team_nfl_cle", "Cleveland Browns", "Cleveland", "stadium_nfl_huntington_bank_field"),
"DAL": ("team_nfl_dal", "Dallas Cowboys", "Dallas", "stadium_nfl_att_stadium"),
"DEN": ("team_nfl_den", "Denver Broncos", "Denver", "stadium_nfl_empower_field"),
"DET": ("team_nfl_det", "Detroit Lions", "Detroit", "stadium_nfl_ford_field"),
"GB": ("team_nfl_gb", "Green Bay Packers", "Green Bay", "stadium_nfl_lambeau_field"),
"GNB": ("team_nfl_gb", "Green Bay Packers", "Green Bay", "stadium_nfl_lambeau_field"),
"HOU": ("team_nfl_hou", "Houston Texans", "Houston", "stadium_nfl_nrg_stadium"),
"IND": ("team_nfl_ind", "Indianapolis Colts", "Indianapolis", "stadium_nfl_lucas_oil_stadium"),
"JAX": ("team_nfl_jax", "Jacksonville Jaguars", "Jacksonville", "stadium_nfl_everbank_stadium"),
"JAC": ("team_nfl_jax", "Jacksonville Jaguars", "Jacksonville", "stadium_nfl_everbank_stadium"),
"KC": ("team_nfl_kc", "Kansas City Chiefs", "Kansas City", "stadium_nfl_arrowhead_stadium"),
"KAN": ("team_nfl_kc", "Kansas City Chiefs", "Kansas City", "stadium_nfl_arrowhead_stadium"),
"LV": ("team_nfl_lv", "Las Vegas Raiders", "Las Vegas", "stadium_nfl_allegiant_stadium"),
"LAC": ("team_nfl_lac", "Los Angeles Chargers", "Los Angeles", "stadium_nfl_sofi_stadium"),
"LAR": ("team_nfl_lar", "Los Angeles Rams", "Los Angeles", "stadium_nfl_sofi_stadium"),
"MIA": ("team_nfl_mia", "Miami Dolphins", "Miami", "stadium_nfl_hard_rock_stadium"),
"MIN": ("team_nfl_min", "Minnesota Vikings", "Minnesota", "stadium_nfl_us_bank_stadium"),
"NE": ("team_nfl_ne", "New England Patriots", "New England", "stadium_nfl_gillette_stadium"),
"NWE": ("team_nfl_ne", "New England Patriots", "New England", "stadium_nfl_gillette_stadium"),
"NO": ("team_nfl_no", "New Orleans Saints", "New Orleans", "stadium_nfl_caesars_superdome"),
"NOR": ("team_nfl_no", "New Orleans Saints", "New Orleans", "stadium_nfl_caesars_superdome"),
"NYG": ("team_nfl_nyg", "New York Giants", "New York", "stadium_nfl_metlife_stadium"),
"NYJ": ("team_nfl_nyj", "New York Jets", "New York", "stadium_nfl_metlife_stadium"),
"PHI": ("team_nfl_phi", "Philadelphia Eagles", "Philadelphia", "stadium_nfl_lincoln_financial_field"),
"PIT": ("team_nfl_pit", "Pittsburgh Steelers", "Pittsburgh", "stadium_nfl_acrisure_stadium"),
"SF": ("team_nfl_sf", "San Francisco 49ers", "San Francisco", "stadium_nfl_levis_stadium"),
"SFO": ("team_nfl_sf", "San Francisco 49ers", "San Francisco", "stadium_nfl_levis_stadium"),
"SEA": ("team_nfl_sea", "Seattle Seahawks", "Seattle", "stadium_nfl_lumen_field"),
"TB": ("team_nfl_tb", "Tampa Bay Buccaneers", "Tampa Bay", "stadium_nfl_raymond_james_stadium"),
"TAM": ("team_nfl_tb", "Tampa Bay Buccaneers", "Tampa Bay", "stadium_nfl_raymond_james_stadium"),
"TEN": ("team_nfl_ten", "Tennessee Titans", "Tennessee", "stadium_nfl_nissan_stadium"),
"WAS": ("team_nfl_was", "Washington Commanders", "Washington", "stadium_nfl_northwest_stadium"),
"WSH": ("team_nfl_was", "Washington Commanders", "Washington", "stadium_nfl_northwest_stadium"),
},
"nhl": {
"ANA": ("team_nhl_ana", "Anaheim Ducks", "Anaheim", "stadium_nhl_honda_center"),
"ARI": ("team_nhl_ari", "Utah Hockey Club", "Utah", "stadium_nhl_delta_center"), # Moved 2024
"UTA": ("team_nhl_ari", "Utah Hockey Club", "Utah", "stadium_nhl_delta_center"),
"BOS": ("team_nhl_bos", "Boston Bruins", "Boston", "stadium_nhl_td_garden"),
"BUF": ("team_nhl_buf", "Buffalo Sabres", "Buffalo", "stadium_nhl_keybank_center"),
"CGY": ("team_nhl_cgy", "Calgary Flames", "Calgary", "stadium_nhl_scotiabank_saddledome"),
"CAR": ("team_nhl_car", "Carolina Hurricanes", "Carolina", "stadium_nhl_pnc_arena"),
"CHI": ("team_nhl_chi", "Chicago Blackhawks", "Chicago", "stadium_nhl_united_center"),
"COL": ("team_nhl_col", "Colorado Avalanche", "Colorado", "stadium_nhl_ball_arena"),
"CBJ": ("team_nhl_cbj", "Columbus Blue Jackets", "Columbus", "stadium_nhl_nationwide_arena"),
"DAL": ("team_nhl_dal", "Dallas Stars", "Dallas", "stadium_nhl_american_airlines_center"),
"DET": ("team_nhl_det", "Detroit Red Wings", "Detroit", "stadium_nhl_little_caesars_arena"),
"EDM": ("team_nhl_edm", "Edmonton Oilers", "Edmonton", "stadium_nhl_rogers_place"),
"FLA": ("team_nhl_fla", "Florida Panthers", "Florida", "stadium_nhl_amerant_bank_arena"),
"LA": ("team_nhl_la", "Los Angeles Kings", "Los Angeles", "stadium_nhl_cryptocom_arena"),
"LAK": ("team_nhl_la", "Los Angeles Kings", "Los Angeles", "stadium_nhl_cryptocom_arena"),
"MIN": ("team_nhl_min", "Minnesota Wild", "Minnesota", "stadium_nhl_xcel_energy_center"),
"MTL": ("team_nhl_mtl", "Montreal Canadiens", "Montreal", "stadium_nhl_bell_centre"),
"MON": ("team_nhl_mtl", "Montreal Canadiens", "Montreal", "stadium_nhl_bell_centre"),
"NSH": ("team_nhl_nsh", "Nashville Predators", "Nashville", "stadium_nhl_bridgestone_arena"),
"NAS": ("team_nhl_nsh", "Nashville Predators", "Nashville", "stadium_nhl_bridgestone_arena"),
"NJ": ("team_nhl_njd", "New Jersey Devils", "New Jersey", "stadium_nhl_prudential_center"),
"NJD": ("team_nhl_njd", "New Jersey Devils", "New Jersey", "stadium_nhl_prudential_center"),
"NYI": ("team_nhl_nyi", "New York Islanders", "New York", "stadium_nhl_ubs_arena"),
"NYR": ("team_nhl_nyr", "New York Rangers", "New York", "stadium_nhl_madison_square_garden"),
"OTT": ("team_nhl_ott", "Ottawa Senators", "Ottawa", "stadium_nhl_canadian_tire_centre"),
"PHI": ("team_nhl_phi", "Philadelphia Flyers", "Philadelphia", "stadium_nhl_wells_fargo_center"),
"PIT": ("team_nhl_pit", "Pittsburgh Penguins", "Pittsburgh", "stadium_nhl_ppg_paints_arena"),
"SJ": ("team_nhl_sj", "San Jose Sharks", "San Jose", "stadium_nhl_sap_center"),
"SJS": ("team_nhl_sj", "San Jose Sharks", "San Jose", "stadium_nhl_sap_center"),
"SEA": ("team_nhl_sea", "Seattle Kraken", "Seattle", "stadium_nhl_climate_pledge_arena"),
"STL": ("team_nhl_stl", "St. Louis Blues", "St. Louis", "stadium_nhl_enterprise_center"),
"TB": ("team_nhl_tb", "Tampa Bay Lightning", "Tampa Bay", "stadium_nhl_amalie_arena"),
"TBL": ("team_nhl_tb", "Tampa Bay Lightning", "Tampa Bay", "stadium_nhl_amalie_arena"),
"TOR": ("team_nhl_tor", "Toronto Maple Leafs", "Toronto", "stadium_nhl_scotiabank_arena"),
"VAN": ("team_nhl_van", "Vancouver Canucks", "Vancouver", "stadium_nhl_rogers_arena"),
"VGK": ("team_nhl_vgk", "Vegas Golden Knights", "Vegas", "stadium_nhl_tmobile_arena"),
"VEG": ("team_nhl_vgk", "Vegas Golden Knights", "Vegas", "stadium_nhl_tmobile_arena"),
"WAS": ("team_nhl_was", "Washington Capitals", "Washington", "stadium_nhl_capital_one_arena"),
"WSH": ("team_nhl_was", "Washington Capitals", "Washington", "stadium_nhl_capital_one_arena"),
"WPG": ("team_nhl_wpg", "Winnipeg Jets", "Winnipeg", "stadium_nhl_canada_life_centre"),
},
"mls": {
"ATL": ("team_mls_atl", "Atlanta United", "Atlanta", "stadium_mls_mercedes_benz_stadium"),
"AUS": ("team_mls_aus", "Austin FC", "Austin", "stadium_mls_q2_stadium"),
"CLT": ("team_mls_clt", "Charlotte FC", "Charlotte", "stadium_mls_bank_of_america_stadium"),
"CHI": ("team_mls_chi", "Chicago Fire", "Chicago", "stadium_mls_soldier_field"),
"CIN": ("team_mls_cin", "FC Cincinnati", "Cincinnati", "stadium_mls_tql_stadium"),
"COL": ("team_mls_col", "Colorado Rapids", "Colorado", "stadium_mls_dicks_sporting_goods_park"),
"CLB": ("team_mls_clb", "Columbus Crew", "Columbus", "stadium_mls_lowercom_field"),
"DAL": ("team_mls_dal", "FC Dallas", "Dallas", "stadium_mls_toyota_stadium"),
"DC": ("team_mls_dc", "D.C. United", "Washington", "stadium_mls_audi_field"),
"HOU": ("team_mls_hou", "Houston Dynamo", "Houston", "stadium_mls_shell_energy_stadium"),
"LAG": ("team_mls_lag", "LA Galaxy", "Los Angeles", "stadium_mls_dignity_health_sports_park"),
"LAFC": ("team_mls_lafc", "Los Angeles FC", "Los Angeles", "stadium_mls_bmo_stadium"),
"MIA": ("team_mls_mia", "Inter Miami", "Miami", "stadium_mls_chase_stadium"),
"MIN": ("team_mls_min", "Minnesota United", "Minnesota", "stadium_mls_allianz_field"),
"MTL": ("team_mls_mtl", "CF Montreal", "Montreal", "stadium_mls_stade_saputo"),
"NSH": ("team_mls_nsh", "Nashville SC", "Nashville", "stadium_mls_geodis_park"),
"NE": ("team_mls_ne", "New England Revolution", "New England", "stadium_mls_gillette_stadium"),
"NYC": ("team_mls_nyc", "New York City FC", "New York", "stadium_mls_yankee_stadium"),
"RB": ("team_mls_ny", "New York Red Bulls", "New York", "stadium_mls_red_bull_arena"),
"RBNY": ("team_mls_ny", "New York Red Bulls", "New York", "stadium_mls_red_bull_arena"),
"ORL": ("team_mls_orl", "Orlando City", "Orlando", "stadium_mls_interco_stadium"),
"PHI": ("team_mls_phi", "Philadelphia Union", "Philadelphia", "stadium_mls_subaru_park"),
"POR": ("team_mls_por", "Portland Timbers", "Portland", "stadium_mls_providence_park"),
"SLC": ("team_mls_slc", "Real Salt Lake", "Salt Lake", "stadium_mls_america_first_field"),
"RSL": ("team_mls_slc", "Real Salt Lake", "Salt Lake", "stadium_mls_america_first_field"),
"SJ": ("team_mls_sj", "San Jose Earthquakes", "San Jose", "stadium_mls_paypal_park"),
"SD": ("team_mls_sd", "San Diego FC", "San Diego", "stadium_mls_snapdragon_stadium"),
"SEA": ("team_mls_sea", "Seattle Sounders", "Seattle", "stadium_mls_lumen_field"),
"SKC": ("team_mls_skc", "Sporting Kansas City", "Kansas City", "stadium_mls_childrens_mercy_park"),
"STL": ("team_mls_stl", "St. Louis City SC", "St. Louis", "stadium_mls_citypark"),
"TOR": ("team_mls_tor", "Toronto FC", "Toronto", "stadium_mls_bmo_field"),
"VAN": ("team_mls_van", "Vancouver Whitecaps", "Vancouver", "stadium_mls_bc_place"),
},
"wnba": {
"ATL": ("team_wnba_atl", "Atlanta Dream", "Atlanta", "stadium_wnba_gateway_center_arena"),
"DREAM": ("team_wnba_atl", "Atlanta Dream", "Atlanta", "stadium_wnba_gateway_center_arena"), # alias
"CHI": ("team_wnba_chi", "Chicago Sky", "Chicago", "stadium_wnba_wintrust_arena"),
"SKY": ("team_wnba_chi", "Chicago Sky", "Chicago", "stadium_wnba_wintrust_arena"), # alias
"CON": ("team_wnba_con", "Connecticut Sun", "Connecticut", "stadium_wnba_mohegan_sun_arena"),
"CONN": ("team_wnba_con", "Connecticut Sun", "Connecticut", "stadium_wnba_mohegan_sun_arena"), # alias
"SUN": ("team_wnba_con", "Connecticut Sun", "Connecticut", "stadium_wnba_mohegan_sun_arena"), # alias
"DAL": ("team_wnba_dal", "Dallas Wings", "Dallas", "stadium_wnba_college_park_center"),
"WINGS": ("team_wnba_dal", "Dallas Wings", "Dallas", "stadium_wnba_college_park_center"), # alias
"GSV": ("team_wnba_gsv", "Golden State Valkyries", "Golden State", "stadium_wnba_chase_center"),
"GS": ("team_wnba_gsv", "Golden State Valkyries", "Golden State", "stadium_wnba_chase_center"), # alias
"VAL": ("team_wnba_gsv", "Golden State Valkyries", "Golden State", "stadium_wnba_chase_center"), # alias
"IND": ("team_wnba_ind", "Indiana Fever", "Indiana", "stadium_wnba_gainbridge_fieldhouse"),
"FEVER": ("team_wnba_ind", "Indiana Fever", "Indiana", "stadium_wnba_gainbridge_fieldhouse"), # alias
"LV": ("team_wnba_lv", "Las Vegas Aces", "Las Vegas", "stadium_wnba_michelob_ultra_arena"),
"LVA": ("team_wnba_lv", "Las Vegas Aces", "Las Vegas", "stadium_wnba_michelob_ultra_arena"), # alias
"ACES": ("team_wnba_lv", "Las Vegas Aces", "Las Vegas", "stadium_wnba_michelob_ultra_arena"), # alias
"LA": ("team_wnba_la", "Los Angeles Sparks", "Los Angeles", "stadium_wnba_cryptocom_arena"),
"LAS": ("team_wnba_la", "Los Angeles Sparks", "Los Angeles", "stadium_wnba_cryptocom_arena"), # alias
"SPARKS": ("team_wnba_la", "Los Angeles Sparks", "Los Angeles", "stadium_wnba_cryptocom_arena"), # alias
"MIN": ("team_wnba_min", "Minnesota Lynx", "Minnesota", "stadium_wnba_target_center"),
"LYNX": ("team_wnba_min", "Minnesota Lynx", "Minnesota", "stadium_wnba_target_center"), # alias
"NY": ("team_wnba_ny", "New York Liberty", "New York", "stadium_wnba_barclays_center"),
"NYL": ("team_wnba_ny", "New York Liberty", "New York", "stadium_wnba_barclays_center"), # alias
"LIB": ("team_wnba_ny", "New York Liberty", "New York", "stadium_wnba_barclays_center"), # alias
"PHX": ("team_wnba_phx", "Phoenix Mercury", "Phoenix", "stadium_wnba_footprint_center"),
"PHO": ("team_wnba_phx", "Phoenix Mercury", "Phoenix", "stadium_wnba_footprint_center"), # alias
"MERCURY": ("team_wnba_phx", "Phoenix Mercury", "Phoenix", "stadium_wnba_footprint_center"), # alias
"SEA": ("team_wnba_sea", "Seattle Storm", "Seattle", "stadium_wnba_climate_pledge_arena"),
"STORM": ("team_wnba_sea", "Seattle Storm", "Seattle", "stadium_wnba_climate_pledge_arena"), # alias
"WAS": ("team_wnba_was", "Washington Mystics", "Washington", "stadium_wnba_entertainment_sports_arena"),
"WSH": ("team_wnba_was", "Washington Mystics", "Washington", "stadium_wnba_entertainment_sports_arena"), # alias
"MYSTICS": ("team_wnba_was", "Washington Mystics", "Washington", "stadium_wnba_entertainment_sports_arena"), # alias
},
"nwsl": {
# Canonical IDs aligned with teams_canonical.json
"ANG": ("team_nwsl_ang", "Angel City FC", "Los Angeles", "stadium_nwsl_bmo_stadium"),
"ANF": ("team_nwsl_ang", "Angel City FC", "Los Angeles", "stadium_nwsl_bmo_stadium"), # alias
"CHI": ("team_nwsl_chi", "Chicago Red Stars", "Chicago", "stadium_nwsl_seatgeek_stadium"),
"HOU": ("team_nwsl_hou", "Houston Dash", "Houston", "stadium_nwsl_shell_energy_stadium"),
"KCC": ("team_nwsl_kcc", "Kansas City Current", "Kansas City", "stadium_nwsl_cpkc_stadium"),
"KC": ("team_nwsl_kcc", "Kansas City Current", "Kansas City", "stadium_nwsl_cpkc_stadium"), # alias
"NJY": ("team_nwsl_njy", "NJ/NY Gotham FC", "New Jersey", "stadium_nwsl_red_bull_arena"),
"NJ": ("team_nwsl_njy", "NJ/NY Gotham FC", "New Jersey", "stadium_nwsl_red_bull_arena"), # alias
"NCC": ("team_nwsl_ncc", "North Carolina Courage", "North Carolina", "stadium_nwsl_wakemed_soccer_park"),
"NC": ("team_nwsl_ncc", "North Carolina Courage", "North Carolina", "stadium_nwsl_wakemed_soccer_park"), # alias
"ORL": ("team_nwsl_orl", "Orlando Pride", "Orlando", "stadium_nwsl_interco_stadium"),
"POR": ("team_nwsl_por", "Portland Thorns", "Portland", "stadium_nwsl_providence_park"),
"RGN": ("team_nwsl_rgn", "Racing Louisville", "Louisville", "stadium_nwsl_lynn_family_stadium"),
"SDW": ("team_nwsl_sdw", "San Diego Wave", "San Diego", "stadium_nwsl_snapdragon_stadium"),
"SD": ("team_nwsl_sdw", "San Diego Wave", "San Diego", "stadium_nwsl_snapdragon_stadium"), # alias
"SEA": ("team_nwsl_sea", "Seattle Reign", "Seattle", "stadium_nwsl_lumen_field"),
"UTA": ("team_nwsl_uta", "Utah Royals", "Utah", "stadium_nwsl_america_first_field"),
"SLC": ("team_nwsl_uta", "Utah Royals", "Utah", "stadium_nwsl_america_first_field"), # alias
"WSH": ("team_nwsl_wsh", "Washington Spirit", "Washington", "stadium_nwsl_audi_field"),
"WAS": ("team_nwsl_wsh", "Washington Spirit", "Washington", "stadium_nwsl_audi_field"), # alias
"BAY": ("team_nwsl_bay", "Bay FC", "San Francisco", "stadium_nwsl_paypal_park"),
"BFC": ("team_nwsl_bay", "Bay FC", "San Francisco", "stadium_nwsl_paypal_park"), # alias
# Expansion teams (2026) - need to be added to teams_canonical.json
"BOS": ("team_nwsl_bos", "Boston Legacy FC", "Boston", "stadium_nwsl_gillette_stadium"),
"DEN": ("team_nwsl_den", "Denver Summit FC", "Denver", "stadium_nwsl_dicks_sporting_goods_park"),
},
}
class TeamResolver:
"""Resolves team names to canonical IDs.
Resolution order:
1. Exact match against abbreviation mappings
2. Exact match against full team names
3. Alias lookup (with date awareness)
4. Fuzzy match against all known names
5. Unresolved (returns ManualReviewItem)
"""
def __init__(
self,
sport: str,
alias_loader: Optional[TeamAliasLoader] = None,
fuzzy_threshold: int = FUZZY_MATCH_THRESHOLD,
):
"""Initialize the resolver.
Args:
sport: Sport code (e.g., 'nba', 'mlb')
alias_loader: Team alias loader (default: global loader)
fuzzy_threshold: Minimum fuzzy match score
"""
self.sport = sport.lower()
self.alias_loader = alias_loader or get_team_alias_loader()
self.fuzzy_threshold = fuzzy_threshold
self._mappings = TEAM_MAPPINGS.get(self.sport, {})
# Build match candidates for fuzzy matching
self._candidates = self._build_candidates()
def _build_candidates(self) -> list[MatchCandidate]:
"""Build match candidates from team mappings."""
# Group by canonical ID to avoid duplicates
by_id: dict[str, tuple[str, list[str]]] = {}
for abbrev, (canonical_id, full_name, city, stadium_id) in self._mappings.items():
if canonical_id not in by_id:
by_id[canonical_id] = (full_name, [])
# Add abbreviation as alias
by_id[canonical_id][1].append(abbrev)
by_id[canonical_id][1].append(city)
return [
MatchCandidate(
canonical_id=cid,
name=name,
aliases=list(set(aliases)), # Dedupe
)
for cid, (name, aliases) in by_id.items()
]
def resolve(
self,
value: str,
check_date: Optional[date] = None,
source_url: Optional[str] = None,
) -> TeamResolveResult:
"""Resolve a team name to a canonical ID.
Args:
value: Team name, abbreviation, or city to resolve
check_date: Date for alias validity (None = today)
source_url: Source URL for manual review items
Returns:
TeamResolveResult with resolution details
"""
value_upper = value.upper().strip()
value_lower = value.lower().strip()
# 1. Exact match against abbreviation
if value_upper in self._mappings:
canonical_id, full_name, _, _ = self._mappings[value_upper]
return TeamResolveResult(
canonical_id=canonical_id,
confidence=100,
match_type="exact",
)
# 2. Exact match against full names
for abbrev, (canonical_id, full_name, city, stadium_id) in self._mappings.items():
if value_lower == full_name.lower() or value_lower == city.lower():
return TeamResolveResult(
canonical_id=canonical_id,
confidence=100,
match_type="exact",
)
# 3. Alias lookup
alias_result = self.alias_loader.resolve(value, check_date)
if alias_result:
return TeamResolveResult(
canonical_id=alias_result,
confidence=95,
match_type="alias",
)
# 4. Fuzzy match
matches = fuzzy_match_team(
value,
self._candidates,
threshold=self.fuzzy_threshold,
)
if matches:
best = matches[0]
review_item = None
# Create review item for low confidence matches
if best.confidence < 90:
review_item = ManualReviewItem(
id=f"team_{uuid4().hex[:8]}",
reason=ReviewReason.LOW_CONFIDENCE_MATCH,
sport=self.sport,
raw_value=value,
context={"match_type": "fuzzy"},
source_url=source_url,
suggested_matches=matches,
game_date=check_date,
)
return TeamResolveResult(
canonical_id=best.canonical_id,
confidence=best.confidence,
match_type="fuzzy",
review_item=review_item,
)
# 5. Unresolved
review_item = ManualReviewItem(
id=f"team_{uuid4().hex[:8]}",
reason=ReviewReason.UNRESOLVED_TEAM,
sport=self.sport,
raw_value=value,
context={},
source_url=source_url,
suggested_matches=fuzzy_match_team(
value,
self._candidates,
threshold=50, # Lower threshold for suggestions
top_n=5,
),
game_date=check_date,
)
return TeamResolveResult(
canonical_id=None,
confidence=0,
match_type="unresolved",
review_item=review_item,
)
def get_team_info(self, abbreviation: str) -> Optional[tuple[str, str, str, str]]:
"""Get team info by abbreviation.
Args:
abbreviation: Team abbreviation
Returns:
Tuple of (canonical_id, full_name, city, stadium_id) or None
"""
return self._mappings.get(abbreviation.upper())
def get_all_teams(self) -> list[tuple[str, str, str, str]]:
"""Get all teams for this sport.
Returns:
List of (canonical_id, full_name, city, stadium_id) tuples
"""
seen = set()
result = []
for abbrev, (canonical_id, full_name, city, stadium_id) in self._mappings.items():
if canonical_id not in seen:
seen.add(canonical_id)
result.append((canonical_id, full_name, city, stadium_id))
return result
# Cached resolvers
_resolvers: dict[str, TeamResolver] = {}
def get_team_resolver(sport: str) -> TeamResolver:
"""Get or create a team resolver for a sport."""
sport_lower = sport.lower()
if sport_lower not in _resolvers:
_resolvers[sport_lower] = TeamResolver(sport_lower)
return _resolvers[sport_lower]
def resolve_team(
sport: str,
value: str,
check_date: Optional[date] = None,
) -> TeamResolveResult:
"""Convenience function to resolve a team name.
Args:
sport: Sport code
value: Team name to resolve
check_date: Date for alias validity
Returns:
TeamResolveResult
"""
return get_team_resolver(sport).resolve(value, check_date)

View File

@@ -1,344 +0,0 @@
"""Timezone conversion utilities for normalizing game times to UTC."""
import re
from dataclasses import dataclass
from datetime import datetime, date, time
from typing import Optional
from zoneinfo import ZoneInfo
from dateutil import parser as dateutil_parser
from dateutil.tz import gettz, tzutc
from ..models.aliases import ReviewReason, ManualReviewItem
# Common timezone abbreviations to IANA timezones
TIMEZONE_ABBREV_MAP: dict[str, str] = {
# US timezones
"ET": "America/New_York",
"EST": "America/New_York",
"EDT": "America/New_York",
"CT": "America/Chicago",
"CST": "America/Chicago",
"CDT": "America/Chicago",
"MT": "America/Denver",
"MST": "America/Denver",
"MDT": "America/Denver",
"PT": "America/Los_Angeles",
"PST": "America/Los_Angeles",
"PDT": "America/Los_Angeles",
"AT": "America/Anchorage",
"AKST": "America/Anchorage",
"AKDT": "America/Anchorage",
"HT": "Pacific/Honolulu",
"HST": "Pacific/Honolulu",
# Canada
"AST": "America/Halifax",
"ADT": "America/Halifax",
"NST": "America/St_Johns",
"NDT": "America/St_Johns",
# Mexico
"CDST": "America/Mexico_City",
# UTC
"UTC": "UTC",
"GMT": "UTC",
"Z": "UTC",
}
# State/region to timezone mapping for inferring timezone from location
STATE_TIMEZONE_MAP: dict[str, str] = {
# Eastern
"CT": "America/New_York",
"DE": "America/New_York",
"FL": "America/New_York", # Most of Florida
"GA": "America/New_York",
"MA": "America/New_York",
"MD": "America/New_York",
"ME": "America/New_York",
"MI": "America/Detroit",
"NC": "America/New_York",
"NH": "America/New_York",
"NJ": "America/New_York",
"NY": "America/New_York",
"OH": "America/New_York",
"PA": "America/New_York",
"RI": "America/New_York",
"SC": "America/New_York",
"VA": "America/New_York",
"VT": "America/New_York",
"WV": "America/New_York",
"DC": "America/New_York",
# Central
"AL": "America/Chicago",
"AR": "America/Chicago",
"IA": "America/Chicago",
"IL": "America/Chicago",
"IN": "America/Indiana/Indianapolis",
"KS": "America/Chicago",
"KY": "America/Kentucky/Louisville",
"LA": "America/Chicago",
"MN": "America/Chicago",
"MO": "America/Chicago",
"MS": "America/Chicago",
"ND": "America/Chicago",
"NE": "America/Chicago",
"OK": "America/Chicago",
"SD": "America/Chicago",
"TN": "America/Chicago",
"TX": "America/Chicago",
"WI": "America/Chicago",
# Mountain
"AZ": "America/Phoenix", # No DST
"CO": "America/Denver",
"ID": "America/Boise",
"MT": "America/Denver",
"NM": "America/Denver",
"UT": "America/Denver",
"WY": "America/Denver",
# Pacific
"CA": "America/Los_Angeles",
"NV": "America/Los_Angeles",
"OR": "America/Los_Angeles",
"WA": "America/Los_Angeles",
# Alaska/Hawaii
"AK": "America/Anchorage",
"HI": "Pacific/Honolulu",
# Canada provinces
"ON": "America/Toronto",
"QC": "America/Montreal",
"BC": "America/Vancouver",
"AB": "America/Edmonton",
"MB": "America/Winnipeg",
"SK": "America/Regina",
"NS": "America/Halifax",
"NB": "America/Moncton",
"NL": "America/St_Johns",
"PE": "America/Halifax",
}
@dataclass
class TimezoneResult:
"""Result of timezone conversion.
Attributes:
datetime_utc: The datetime converted to UTC
source_timezone: The timezone that was detected/used
confidence: Confidence in the timezone detection ('high', 'medium', 'low')
warning: Warning message if timezone was uncertain
"""
datetime_utc: datetime
source_timezone: str
confidence: str
warning: Optional[str] = None
def detect_timezone_from_string(time_str: str) -> Optional[str]:
"""Detect timezone from a time string containing a timezone abbreviation.
Args:
time_str: Time string that may contain timezone info (e.g., '7:00 PM ET')
Returns:
IANA timezone string if detected, None otherwise
"""
# Look for timezone abbreviation at end of string
for abbrev, tz in TIMEZONE_ABBREV_MAP.items():
pattern = rf"\b{abbrev}\b"
if re.search(pattern, time_str, re.IGNORECASE):
return tz
return None
def detect_timezone_from_location(
state: Optional[str] = None,
city: Optional[str] = None,
) -> Optional[str]:
"""Detect timezone from location information.
Args:
state: State/province code (e.g., 'NY', 'ON')
city: City name (optional, for special cases)
Returns:
IANA timezone string if detected, None otherwise
"""
if state and state.upper() in STATE_TIMEZONE_MAP:
return STATE_TIMEZONE_MAP[state.upper()]
return None
def parse_datetime(
date_str: str,
time_str: Optional[str] = None,
timezone_hint: Optional[str] = None,
location_state: Optional[str] = None,
) -> TimezoneResult:
"""Parse a date/time string and convert to UTC.
Attempts to detect timezone from:
1. Explicit timezone in the string
2. Provided timezone hint
3. Location-based inference
4. Default to Eastern Time with warning
Args:
date_str: Date string (e.g., '2025-10-21', 'October 21, 2025')
time_str: Optional time string (e.g., '7:00 PM ET', '19:00')
timezone_hint: Optional IANA timezone to use if not detected
location_state: Optional state code for timezone inference
Returns:
TimezoneResult with UTC datetime and metadata
"""
# Parse the date
try:
if time_str:
# Combine date and time
full_str = f"{date_str} {time_str}"
else:
full_str = date_str
parsed = dateutil_parser.parse(full_str, fuzzy=True)
except (ValueError, OverflowError) as e:
# If parsing fails, return a placeholder with low confidence
return TimezoneResult(
datetime_utc=datetime.now(tz=ZoneInfo("UTC")),
source_timezone="unknown",
confidence="low",
warning=f"Failed to parse datetime: {e}",
)
# Determine timezone
detected_tz = None
confidence = "high"
warning = None
# Check if datetime already has timezone
if parsed.tzinfo is not None:
detected_tz = str(parsed.tzinfo)
else:
# Try to detect from time string
if time_str:
detected_tz = detect_timezone_from_string(time_str)
# Try timezone hint
if not detected_tz and timezone_hint:
detected_tz = timezone_hint
confidence = "medium"
# Try location inference
if not detected_tz and location_state:
detected_tz = detect_timezone_from_location(state=location_state)
confidence = "medium"
# Default to Eastern Time
if not detected_tz:
detected_tz = "America/New_York"
confidence = "low"
warning = "Timezone not detected, defaulting to Eastern Time"
# Apply timezone and convert to UTC
try:
tz = ZoneInfo(detected_tz)
except KeyError:
# Invalid timezone, try to resolve abbreviation
if detected_tz in TIMEZONE_ABBREV_MAP:
tz = ZoneInfo(TIMEZONE_ABBREV_MAP[detected_tz])
detected_tz = TIMEZONE_ABBREV_MAP[detected_tz]
else:
tz = ZoneInfo("America/New_York")
confidence = "low"
warning = f"Unknown timezone '{detected_tz}', defaulting to Eastern Time"
detected_tz = "America/New_York"
# Apply timezone if not already set
if parsed.tzinfo is None:
parsed = parsed.replace(tzinfo=tz)
# Convert to UTC
utc_dt = parsed.astimezone(ZoneInfo("UTC"))
return TimezoneResult(
datetime_utc=utc_dt,
source_timezone=detected_tz,
confidence=confidence,
warning=warning,
)
def convert_to_utc(
dt: datetime,
source_timezone: str,
) -> datetime:
"""Convert a datetime from a known timezone to UTC.
Args:
dt: Datetime to convert (timezone-naive or timezone-aware)
source_timezone: IANA timezone of the datetime
Returns:
Datetime in UTC
"""
tz = ZoneInfo(source_timezone)
if dt.tzinfo is None:
# Localize naive datetime
dt = dt.replace(tzinfo=tz)
return dt.astimezone(ZoneInfo("UTC"))
def create_timezone_warning(
raw_value: str,
sport: str,
game_date: Optional[date] = None,
source_url: Optional[str] = None,
) -> ManualReviewItem:
"""Create a manual review item for an undetermined timezone.
Args:
raw_value: The original time string that couldn't be resolved
sport: Sport code
game_date: Date of the game
source_url: URL of the source page
Returns:
ManualReviewItem for timezone review
"""
return ManualReviewItem(
id=f"tz_{sport}_{raw_value[:20].replace(' ', '_')}",
reason=ReviewReason.TIMEZONE_UNKNOWN,
sport=sport,
raw_value=raw_value,
context={"issue": "Could not determine timezone for game time"},
source_url=source_url,
game_date=game_date,
)
def get_stadium_timezone(
stadium_state: str,
stadium_timezone: Optional[str] = None,
) -> str:
"""Get the timezone for a stadium based on its location.
Args:
stadium_state: State/province code
stadium_timezone: Explicit timezone override from stadium data
Returns:
IANA timezone string
"""
if stadium_timezone:
return stadium_timezone
tz = detect_timezone_from_location(state=stadium_state)
if tz:
return tz
# Default to Eastern
return "America/New_York"

View File

@@ -1,46 +0,0 @@
"""Scrapers for fetching sports data from various sources."""
from .base import (
BaseScraper,
RawGameData,
ScrapeResult,
ScraperError,
PartialDataError,
)
from .nba import NBAScraper, create_nba_scraper
from .mlb import MLBScraper, create_mlb_scraper
from .nfl import NFLScraper, create_nfl_scraper
from .nhl import NHLScraper, create_nhl_scraper
from .mls import MLSScraper, create_mls_scraper
from .wnba import WNBAScraper, create_wnba_scraper
from .nwsl import NWSLScraper, create_nwsl_scraper
__all__ = [
# Base
"BaseScraper",
"RawGameData",
"ScrapeResult",
"ScraperError",
"PartialDataError",
# NBA
"NBAScraper",
"create_nba_scraper",
# MLB
"MLBScraper",
"create_mlb_scraper",
# NFL
"NFLScraper",
"create_nfl_scraper",
# NHL
"NHLScraper",
"create_nhl_scraper",
# MLS
"MLSScraper",
"create_mls_scraper",
# WNBA
"WNBAScraper",
"create_wnba_scraper",
# NWSL
"NWSLScraper",
"create_nwsl_scraper",
]

View File

@@ -1,335 +0,0 @@
"""Base scraper class for all sport scrapers."""
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional
from ..config import EXPECTED_GAME_COUNTS
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem
from ..utils.http import RateLimitedSession, get_session
from ..utils.logging import get_logger, log_error, log_warning
from ..utils.progress import ScrapeProgress
@dataclass
class RawGameData:
"""Raw game data before normalization.
This intermediate format holds data as scraped from sources,
before team/stadium resolution and canonical ID generation.
"""
game_date: datetime
home_team_raw: str
away_team_raw: str
stadium_raw: Optional[str] = None
home_score: Optional[int] = None
away_score: Optional[int] = None
status: str = "scheduled"
source_url: Optional[str] = None
game_number: Optional[int] = None # For doubleheaders
@dataclass
class ScrapeResult:
"""Result of a scraping operation.
Attributes:
games: List of normalized Game objects
teams: List of Team objects
stadiums: List of Stadium objects
review_items: Items requiring manual review
source: Name of the source used
success: Whether scraping succeeded
error_message: Error message if failed
"""
games: list[Game] = field(default_factory=list)
teams: list[Team] = field(default_factory=list)
stadiums: list[Stadium] = field(default_factory=list)
review_items: list[ManualReviewItem] = field(default_factory=list)
source: str = ""
success: bool = True
error_message: Optional[str] = None
@property
def game_count(self) -> int:
return len(self.games)
@property
def team_count(self) -> int:
return len(self.teams)
@property
def stadium_count(self) -> int:
return len(self.stadiums)
@property
def review_count(self) -> int:
return len(self.review_items)
class BaseScraper(ABC):
"""Abstract base class for sport scrapers.
Subclasses must implement:
- scrape_games(): Fetch and normalize game schedule
- scrape_teams(): Fetch team information
- scrape_stadiums(): Fetch stadium information
- _get_sources(): Return list of source names in priority order
Features:
- Multi-source fallback (try sources in order)
- Built-in rate limiting
- Error handling with partial data discard
- Progress tracking
- Source URL tracking for manual review
"""
def __init__(
self,
sport: str,
season: int,
session: Optional[RateLimitedSession] = None,
):
"""Initialize the scraper.
Args:
sport: Sport code (e.g., 'nba', 'mlb')
season: Season start year (e.g., 2025 for 2025-26)
session: Optional HTTP session (default: global session)
"""
self.sport = sport.lower()
self.season = season
self.session = session or get_session()
self._logger = get_logger()
self._progress: Optional[ScrapeProgress] = None
@property
def expected_game_count(self) -> int:
"""Get expected number of games for this sport."""
return EXPECTED_GAME_COUNTS.get(self.sport, 0)
@abstractmethod
def _get_sources(self) -> list[str]:
"""Return list of source names in priority order.
Returns:
List of source identifiers (e.g., ['basketball_reference', 'espn', 'cbs'])
"""
pass
@abstractmethod
def _scrape_games_from_source(
self,
source: str,
) -> list[RawGameData]:
"""Scrape games from a specific source.
Args:
source: Source identifier
Returns:
List of raw game data
Raises:
Exception: If scraping fails
"""
pass
@abstractmethod
def _normalize_games(
self,
raw_games: list[RawGameData],
) -> tuple[list[Game], list[ManualReviewItem]]:
"""Normalize raw game data to Game objects.
Args:
raw_games: Raw scraped data
Returns:
Tuple of (normalized games, review items)
"""
pass
@abstractmethod
def scrape_teams(self) -> list[Team]:
"""Fetch team information.
Returns:
List of Team objects
"""
pass
@abstractmethod
def scrape_stadiums(self) -> list[Stadium]:
"""Fetch stadium information.
Returns:
List of Stadium objects
"""
pass
def scrape_games(self) -> ScrapeResult:
"""Scrape games with multi-source fallback.
Tries each source in priority order. On failure, discards
partial data and tries the next source.
Returns:
ScrapeResult with games, review items, and status
"""
sources = self._get_sources()
last_error: Optional[str] = None
sources_tried = 0
# Allow 3 sources to be tried. This enables NHL to fall back to NHL API
# for venue data since Hockey Reference doesn't provide it.
max_sources_to_try = 3
for source in sources:
self._logger.info(f"Trying source: {source}")
sources_tried += 1
try:
# Scrape raw data
raw_games = self._scrape_games_from_source(source)
if not raw_games:
log_warning(f"No games found from {source}")
# If multiple sources return nothing, the schedule likely doesn't exist
if sources_tried >= max_sources_to_try:
return ScrapeResult(
success=False,
error_message=f"No schedule data available (tried {sources_tried} sources)",
)
continue
self._logger.info(f"Found {len(raw_games)} raw games from {source}")
# Normalize data
games, review_items = self._normalize_games(raw_games)
self._logger.info(
f"Normalized {len(games)} games, {len(review_items)} need review"
)
return ScrapeResult(
games=games,
review_items=review_items,
source=source,
success=True,
)
except Exception as e:
last_error = str(e)
log_error(f"Failed to scrape from {source}: {e}", exc_info=True)
# If we've tried enough sources, bail out
if sources_tried >= max_sources_to_try:
break
continue
# All sources failed
return ScrapeResult(
success=False,
error_message=f"All sources failed. Last error: {last_error}",
)
def scrape_all(self) -> ScrapeResult:
"""Scrape games, teams, and stadiums.
Returns:
Complete ScrapeResult with all data
"""
self._progress = ScrapeProgress(self.sport, self.season)
self._progress.start()
try:
# Scrape games
result = self.scrape_games()
if not result.success:
self._progress.log_error(result.error_message or "Unknown error")
self._progress.finish()
return result
# Scrape teams
teams = self.scrape_teams()
result.teams = teams
# Scrape stadiums
stadiums = self.scrape_stadiums()
result.stadiums = stadiums
# Update progress
self._progress.games_count = result.game_count
self._progress.teams_count = result.team_count
self._progress.stadiums_count = result.stadium_count
self._progress.errors_count = result.review_count
self._progress.finish()
return result
except Exception as e:
log_error(f"Scraping failed: {e}", exc_info=True)
self._progress.finish()
return ScrapeResult(
success=False,
error_message=str(e),
)
def _get_season_months(self) -> list[tuple[int, int]]:
"""Get the months to scrape for this sport's season.
Returns:
List of (year, month) tuples
"""
# Default implementation for sports with fall-spring seasons
# (NBA, NHL, etc.)
months = []
# Fall months of season start year
for month in range(10, 13): # Oct-Dec
months.append((self.season, month))
# Winter-spring months of following year
for month in range(1, 7): # Jan-Jun
months.append((self.season + 1, month))
return months
def _get_source_url(self, source: str, **kwargs) -> str:
"""Build a source URL with parameters.
Subclasses should override this to build URLs for their sources.
Args:
source: Source identifier
**kwargs: URL parameters
Returns:
Complete URL string
"""
raise NotImplementedError(f"URL builder not implemented for {source}")
class ScraperError(Exception):
"""Exception raised when scraping fails."""
def __init__(self, source: str, message: str):
self.source = source
self.message = message
super().__init__(f"[{source}] {message}")
class PartialDataError(ScraperError):
"""Exception raised when only partial data was retrieved."""
def __init__(self, source: str, message: str, partial_count: int):
self.partial_count = partial_count
super().__init__(source, f"{message} (got {partial_count} items)")

View File

@@ -1,717 +0,0 @@
"""MLB scraper implementation with multi-source fallback."""
from datetime import datetime, date, timedelta
from typing import Optional
from zoneinfo import ZoneInfo
from bs4 import BeautifulSoup
from .base import BaseScraper, RawGameData, ScrapeResult
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem
from ..normalizers.canonical_id import generate_game_id
from ..normalizers.team_resolver import (
TeamResolver,
TEAM_MAPPINGS,
get_team_resolver,
)
from ..normalizers.stadium_resolver import (
StadiumResolver,
STADIUM_MAPPINGS,
get_stadium_resolver,
)
from ..normalizers.timezone import parse_datetime
from ..utils.logging import get_logger, log_game, log_warning
class MLBScraper(BaseScraper):
"""MLB schedule scraper with multi-source fallback.
Sources (in priority order):
1. MLB Stats API - Official MLB data (primary)
2. ESPN API - Backup option
3. Baseball-Reference - Complete historical data
Source Timezones:
- mlb_api: UTC - ISO 8601 format with "Z" suffix (gameDate field)
- espn: UTC - ISO 8601 format with "Z" suffix
- baseball_reference: Eastern Time (ET) - times displayed as "7:05 PM ET"
"""
def __init__(self, season: int, **kwargs):
"""Initialize MLB scraper.
Args:
season: Season year (e.g., 2026 for 2026 season)
"""
super().__init__("mlb", season, **kwargs)
self._team_resolver = get_team_resolver("mlb")
self._stadium_resolver = get_stadium_resolver("mlb")
def _get_sources(self) -> list[str]:
"""Return source list in priority order."""
# MLB API is best - returns full schedule in one request
# ESPN caps at ~25 results for baseball
# Baseball-Reference requires HTML parsing
return ["mlb_api", "espn", "baseball_reference"]
def _get_source_url(self, source: str, **kwargs) -> str:
"""Build URL for a source."""
if source == "baseball_reference":
month = kwargs.get("month", "april")
# Baseball-Reference uses season year in URL
return f"https://www.baseball-reference.com/leagues/majors/{self.season}-schedule.shtml"
elif source == "mlb_api":
start_date = kwargs.get("start_date", "")
end_date = kwargs.get("end_date", "")
return f"https://statsapi.mlb.com/api/v1/schedule?sportId=1&startDate={start_date}&endDate={end_date}"
elif source == "espn":
date_str = kwargs.get("date", "")
return f"https://site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard?dates={date_str}"
raise ValueError(f"Unknown source: {source}")
def _get_season_months(self) -> list[tuple[int, int]]:
"""Get the months to scrape for MLB season.
MLB season runs March/April through October/November.
"""
months = []
# Spring training / early season
for month in range(3, 12): # March-November
months.append((self.season, month))
return months
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
"""Scrape games from a specific source."""
if source == "baseball_reference":
return self._scrape_baseball_reference()
elif source == "mlb_api":
return self._scrape_mlb_api()
elif source == "espn":
return self._scrape_espn()
else:
raise ValueError(f"Unknown source: {source}")
def _scrape_baseball_reference(self) -> list[RawGameData]:
"""Scrape games from Baseball-Reference.
BR has a single schedule page per season.
Format: https://www.baseball-reference.com/leagues/majors/YYYY-schedule.shtml
"""
url = self._get_source_url("baseball_reference")
try:
html = self.session.get_html(url)
games = self._parse_baseball_reference(html, url)
return games
except Exception as e:
self._logger.error(f"Failed to scrape Baseball-Reference: {e}")
raise
def _parse_baseball_reference(
self,
html: str,
source_url: str,
) -> list[RawGameData]:
"""Parse Baseball-Reference schedule HTML.
Structure: Games are organized by date in div elements.
Each game row has: date, away team, away score, home team, home score, venue.
"""
soup = BeautifulSoup(html, "lxml")
games: list[RawGameData] = []
# Find all game divs - they use class "game" or similar
# Baseball-Reference uses <p class="game"> for each game
game_paragraphs = soup.find_all("p", class_="game")
current_date = None
for elem in soup.find_all(["h3", "p"]):
# H3 contains date headers
if elem.name == "h3":
date_text = elem.get_text(strip=True)
try:
# Format: "Thursday, April 1, 2026"
current_date = datetime.strptime(date_text, "%A, %B %d, %Y")
except ValueError:
continue
elif elem.name == "p" and "game" in elem.get("class", []):
if current_date is None:
continue
try:
# Extract game time from the element if present
# Baseball-Reference may have time in a span or in the text
game_time_for_row = None
time_elem = elem.find("span", class_="game_time")
if time_elem:
time_text = time_elem.get_text(strip=True)
if time_text:
try:
# Parse time like "7:05 PM ET" or "1:10 PM"
# Remove timezone suffix if present
time_clean = time_text.replace(" ET", "").replace(" PT", "").replace(" CT", "").replace(" MT", "").strip()
game_time_for_row = datetime.strptime(time_clean, "%I:%M %p")
except ValueError:
pass
game = self._parse_br_game(elem, current_date, source_url, game_time_for_row)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse game: {e}")
continue
return games
def _parse_br_game(
self,
elem,
game_date: datetime,
source_url: str,
game_time: Optional[datetime] = None,
) -> Optional[RawGameData]:
"""Parse a single Baseball-Reference game element."""
text = elem.get_text(" ", strip=True)
# Parse game text - formats vary:
# "Team A (5) @ Team B (3)" or "Team A @ Team B"
# Also handles doubleheader notation
# Find all links - usually team names
links = elem.find_all("a")
if len(links) < 2:
return None
# First link is away team, second is home team
away_team = links[0].get_text(strip=True)
home_team = links[1].get_text(strip=True)
# Try to extract scores from text
away_score = None
home_score = None
# Look for score pattern "(N)"
import re
score_pattern = r"\((\d+)\)"
scores = re.findall(score_pattern, text)
if len(scores) >= 2:
try:
away_score = int(scores[0])
home_score = int(scores[1])
except (ValueError, IndexError):
pass
# Determine status
status = "final" if home_score is not None else "scheduled"
# Check for postponed/cancelled
text_lower = text.lower()
if "postponed" in text_lower:
status = "postponed"
elif "cancelled" in text_lower or "canceled" in text_lower:
status = "cancelled"
# Extract venue if present (usually after @ symbol)
stadium = None
if len(links) > 2:
# Third link might be stadium
stadium = links[2].get_text(strip=True)
# Combine date and time if time was provided, with ET timezone (Baseball-Reference uses ET)
final_game_date = game_date
if game_time:
final_game_date = game_date.replace(
hour=game_time.hour,
minute=game_time.minute,
tzinfo=ZoneInfo("America/New_York"),
)
return RawGameData(
game_date=final_game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _scrape_mlb_api(self) -> list[RawGameData]:
"""Scrape games from MLB Stats API using full season query."""
# Build date range for entire season (March-November)
season_months = self._get_season_months()
start_year, start_month = season_months[0]
end_year, end_month = season_months[-1]
# Get last day of end month
if end_month == 12:
end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
else:
end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
start_date = date(start_year, start_month, 1)
url = f"https://statsapi.mlb.com/api/v1/schedule?sportId=1&startDate={start_date.strftime('%Y-%m-%d')}&endDate={end_date.strftime('%Y-%m-%d')}"
self._logger.info(f"Fetching MLB schedule: {start_date} to {end_date}")
try:
data = self.session.get_json(url)
return self._parse_mlb_api_response(data, url)
except Exception as e:
self._logger.error(f"MLB API error: {e}")
return []
def _parse_mlb_api_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse MLB Stats API response."""
games: list[RawGameData] = []
dates = data.get("dates", [])
for date_entry in dates:
for game in date_entry.get("games", []):
try:
raw_game = self._parse_mlb_api_game(game, source_url)
if raw_game:
games.append(raw_game)
except Exception as e:
self._logger.debug(f"Failed to parse MLB API game: {e}")
continue
return games
def _parse_mlb_api_game(
self,
game: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single MLB API game."""
# Get game date/time
game_date_str = game.get("gameDate", "")
if not game_date_str:
return None
try:
game_date = datetime.fromisoformat(game_date_str.replace("Z", "+00:00"))
except ValueError:
return None
# Get teams
teams = game.get("teams", {})
away_data = teams.get("away", {})
home_data = teams.get("home", {})
away_team_info = away_data.get("team", {})
home_team_info = home_data.get("team", {})
away_team = away_team_info.get("name", "")
home_team = home_team_info.get("name", "")
if not away_team or not home_team:
return None
# Get scores
away_score = away_data.get("score")
home_score = home_data.get("score")
# Get venue
venue = game.get("venue", {})
stadium = venue.get("name")
# Get status
status_data = game.get("status", {})
abstract_game_state = status_data.get("abstractGameState", "").lower()
detailed_state = status_data.get("detailedState", "").lower()
if abstract_game_state == "final":
status = "final"
elif "postponed" in detailed_state:
status = "postponed"
elif "cancelled" in detailed_state or "canceled" in detailed_state:
status = "cancelled"
else:
status = "scheduled"
# Check for doubleheader
game_number = game.get("gameNumber")
if game.get("doubleHeader") == "Y":
game_number = game.get("gameNumber", 1)
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
game_number=game_number if game.get("doubleHeader") == "Y" else None,
)
def _scrape_espn(self) -> list[RawGameData]:
"""Scrape games from ESPN API using date range query."""
# Build date range for entire season (March-November)
season_months = self._get_season_months()
start_year, start_month = season_months[0]
end_year, end_month = season_months[-1]
# Get last day of end month
if end_month == 12:
end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
else:
end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
start_date = date(start_year, start_month, 1)
date_range = f"{start_date.strftime('%Y%m%d')}-{end_date.strftime('%Y%m%d')}"
url = f"https://site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard?limit=3000&dates={date_range}"
self._logger.info(f"Fetching MLB schedule: {date_range}")
try:
data = self.session.get_json(url)
return self._parse_espn_response(data, url)
except Exception as e:
self._logger.error(f"ESPN error: {e}")
return []
def _parse_espn_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse ESPN API response."""
games: list[RawGameData] = []
events = data.get("events", [])
for event in events:
try:
game = self._parse_espn_event(event, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse ESPN event: {e}")
continue
return games
def _parse_espn_event(
self,
event: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single ESPN event."""
# Get date
date_str = event.get("date", "")
if not date_str:
return None
try:
game_date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
except ValueError:
return None
# Get competitions
competitions = event.get("competitions", [])
if not competitions:
return None
competition = competitions[0]
# Get teams
competitors = competition.get("competitors", [])
if len(competitors) != 2:
return None
home_team = None
away_team = None
home_score = None
away_score = None
for competitor in competitors:
team_info = competitor.get("team", {})
team_name = team_info.get("displayName", "")
is_home = competitor.get("homeAway") == "home"
score = competitor.get("score")
if score:
try:
score = int(score)
except (ValueError, TypeError):
score = None
if is_home:
home_team = team_name
home_score = score
else:
away_team = team_name
away_score = score
if not home_team or not away_team:
return None
# Get venue
venue = competition.get("venue", {})
stadium = venue.get("fullName")
# Get status
status_info = competition.get("status", {})
status_type = status_info.get("type", {})
status_name = status_type.get("name", "").lower()
if status_name == "status_final":
status = "final"
elif status_name == "status_postponed":
status = "postponed"
elif status_name == "status_canceled":
status = "cancelled"
else:
status = "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _normalize_games(
self,
raw_games: list[RawGameData],
) -> tuple[list[Game], list[ManualReviewItem]]:
"""Normalize raw games to Game objects with canonical IDs."""
games: list[Game] = []
review_items: list[ManualReviewItem] = []
# Track games by date/matchup for doubleheader detection
games_by_matchup: dict[str, list[RawGameData]] = {}
for raw in raw_games:
date_key = raw.game_date.strftime("%Y%m%d")
matchup_key = f"{date_key}_{raw.away_team_raw}_{raw.home_team_raw}"
if matchup_key not in games_by_matchup:
games_by_matchup[matchup_key] = []
games_by_matchup[matchup_key].append(raw)
# Process games with doubleheader detection
for matchup_key, matchup_games in games_by_matchup.items():
is_doubleheader = len(matchup_games) > 1
# Sort by time if doubleheader
if is_doubleheader:
matchup_games.sort(key=lambda g: g.game_date)
for i, raw in enumerate(matchup_games):
# Use provided game_number or calculate from order
game_number = raw.game_number or ((i + 1) if is_doubleheader else None)
game, item_reviews = self._normalize_single_game(raw, game_number)
if game:
games.append(game)
log_game(
self.sport,
game.id,
game.home_team_id,
game.away_team_id,
game.game_date.strftime("%Y-%m-%d"),
game.status,
)
review_items.extend(item_reviews)
return games, review_items
def _normalize_single_game(
self,
raw: RawGameData,
game_number: Optional[int],
) -> tuple[Optional[Game], list[ManualReviewItem]]:
"""Normalize a single raw game."""
review_items: list[ManualReviewItem] = []
# Resolve home team
home_result = self._team_resolver.resolve(
raw.home_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if home_result.review_item:
review_items.append(home_result.review_item)
if not home_result.canonical_id:
log_warning(f"Could not resolve home team: {raw.home_team_raw}")
return None, review_items
# Resolve away team
away_result = self._team_resolver.resolve(
raw.away_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if away_result.review_item:
review_items.append(away_result.review_item)
if not away_result.canonical_id:
log_warning(f"Could not resolve away team: {raw.away_team_raw}")
return None, review_items
# Resolve stadium
stadium_id = None
if raw.stadium_raw:
stadium_result = self._stadium_resolver.resolve(
raw.stadium_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if stadium_result.review_item:
review_items.append(stadium_result.review_item)
stadium_id = stadium_result.canonical_id
# Get abbreviations for game ID
home_abbrev = self._get_abbreviation(home_result.canonical_id)
away_abbrev = self._get_abbreviation(away_result.canonical_id)
# Generate canonical game ID
game_id = generate_game_id(
sport=self.sport,
season=self.season,
away_abbrev=away_abbrev,
home_abbrev=home_abbrev,
game_date=raw.game_date,
game_number=game_number,
)
game = Game(
id=game_id,
sport=self.sport,
season=self.season,
home_team_id=home_result.canonical_id,
away_team_id=away_result.canonical_id,
stadium_id=stadium_id or "",
game_date=raw.game_date,
game_number=game_number,
home_score=raw.home_score,
away_score=raw.away_score,
status=raw.status,
source_url=raw.source_url,
raw_home_team=raw.home_team_raw,
raw_away_team=raw.away_team_raw,
raw_stadium=raw.stadium_raw,
)
return game, review_items
def _get_abbreviation(self, team_id: str) -> str:
"""Extract abbreviation from team ID."""
# team_mlb_nyy -> nyy
parts = team_id.split("_")
return parts[-1] if parts else ""
def scrape_teams(self) -> list[Team]:
"""Get all MLB teams from hardcoded mappings."""
teams: list[Team] = []
seen: set[str] = set()
# MLB league/division structure
divisions = {
"AL East": ("American", ["BAL", "BOS", "NYY", "TB", "TOR"]),
"AL Central": ("American", ["CHW", "CLE", "DET", "KC", "MIN"]),
"AL West": ("American", ["HOU", "LAA", "OAK", "SEA", "TEX"]),
"NL East": ("National", ["ATL", "MIA", "NYM", "PHI", "WSN"]),
"NL Central": ("National", ["CHC", "CIN", "MIL", "PIT", "STL"]),
"NL West": ("National", ["ARI", "COL", "LAD", "SD", "SF"]),
}
# Build reverse lookup
team_divisions: dict[str, tuple[str, str]] = {}
for div, (league, abbrevs) in divisions.items():
for abbrev in abbrevs:
team_divisions[abbrev] = (league, div)
for abbrev, (team_id, full_name, city, stadium_id) in TEAM_MAPPINGS.get("mlb", {}).items():
if team_id in seen:
continue
seen.add(team_id)
# Parse team name from full name
parts = full_name.split()
if len(parts) >= 2:
team_name = parts[-1]
# Handle multi-word team names
if team_name in ["Sox", "Jays"]:
team_name = " ".join(parts[-2:])
else:
team_name = full_name
# Get league and division
league, div = team_divisions.get(abbrev, (None, None))
team = Team(
id=team_id,
sport="mlb",
city=city,
name=team_name,
full_name=full_name,
abbreviation=abbrev,
conference=league, # MLB uses "league" but we map to conference field
division=div,
stadium_id=stadium_id,
)
teams.append(team)
return teams
def scrape_stadiums(self) -> list[Stadium]:
"""Get all MLB stadiums from hardcoded mappings."""
stadiums: list[Stadium] = []
mlb_stadiums = STADIUM_MAPPINGS.get("mlb", {})
for stadium_id, info in mlb_stadiums.items():
stadium = Stadium(
id=stadium_id,
sport="mlb",
name=info.name,
city=info.city,
state=info.state,
country=info.country,
latitude=info.latitude,
longitude=info.longitude,
timezone=info.timezone,
surface="grass", # Most MLB stadiums
roof_type="open", # Most MLB stadiums
)
stadiums.append(stadium)
return stadiums
def create_mlb_scraper(season: int) -> MLBScraper:
"""Factory function to create an MLB scraper."""
return MLBScraper(season=season)

View File

@@ -1,405 +0,0 @@
"""MLS scraper implementation with multi-source fallback."""
from datetime import datetime, date, timedelta
from typing import Optional
from .base import BaseScraper, RawGameData, ScrapeResult
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem
from ..normalizers.canonical_id import generate_game_id
from ..normalizers.team_resolver import (
TeamResolver,
TEAM_MAPPINGS,
get_team_resolver,
)
from ..normalizers.stadium_resolver import (
StadiumResolver,
STADIUM_MAPPINGS,
get_stadium_resolver,
)
from ..utils.logging import get_logger, log_game, log_warning
class MLSScraper(BaseScraper):
"""MLS schedule scraper with multi-source fallback.
Sources (in priority order):
1. ESPN API - Most reliable for MLS
2. FBref - Backup option (not implemented)
Source Timezones:
- espn: UTC - ISO 8601 format with "Z" suffix
- fbref: Not implemented
"""
def __init__(self, season: int, **kwargs):
"""Initialize MLS scraper.
Args:
season: Season year (e.g., 2026 for 2026 season)
"""
super().__init__("mls", season, **kwargs)
self._team_resolver = get_team_resolver("mls")
self._stadium_resolver = get_stadium_resolver("mls")
def _get_sources(self) -> list[str]:
"""Return source list in priority order."""
# FBref scraper not yet implemented - TODO for future
return ["espn"]
def _get_source_url(self, source: str, **kwargs) -> str:
"""Build URL for a source."""
if source == "espn":
date_str = kwargs.get("date", "")
return f"https://site.api.espn.com/apis/site/v2/sports/soccer/usa.1/scoreboard?dates={date_str}"
elif source == "fbref":
return f"https://fbref.com/en/comps/22/{self.season}/schedule/{self.season}-Major-League-Soccer-Scores-and-Fixtures"
raise ValueError(f"Unknown source: {source}")
def _get_season_months(self) -> list[tuple[int, int]]:
"""Get the months to scrape for MLS season.
MLS season runs February/March through October/November.
"""
months = []
# MLS runs within a calendar year
for month in range(2, 12): # Feb-Nov
months.append((self.season, month))
return months
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
"""Scrape games from a specific source."""
if source == "espn":
return self._scrape_espn()
elif source == "fbref":
return self._scrape_fbref()
else:
raise ValueError(f"Unknown source: {source}")
def _scrape_espn(self) -> list[RawGameData]:
"""Scrape games from ESPN API using date range query."""
# Build date range for entire season (Feb-November)
season_months = self._get_season_months()
start_year, start_month = season_months[0]
end_year, end_month = season_months[-1]
# Get last day of end month
if end_month == 12:
end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
else:
end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
start_date = date(start_year, start_month, 1)
date_range = f"{start_date.strftime('%Y%m%d')}-{end_date.strftime('%Y%m%d')}"
url = f"https://site.api.espn.com/apis/site/v2/sports/soccer/usa.1/scoreboard?limit=1000&dates={date_range}"
self._logger.info(f"Fetching MLS schedule: {date_range}")
try:
data = self.session.get_json(url)
return self._parse_espn_response(data, url)
except Exception as e:
self._logger.error(f"ESPN error: {e}")
return []
def _parse_espn_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse ESPN API response."""
games: list[RawGameData] = []
events = data.get("events", [])
for event in events:
try:
game = self._parse_espn_event(event, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse ESPN event: {e}")
continue
return games
def _parse_espn_event(
self,
event: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single ESPN event."""
# Get date
date_str = event.get("date", "")
if not date_str:
return None
try:
game_date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
except ValueError:
return None
# Get competitions
competitions = event.get("competitions", [])
if not competitions:
return None
competition = competitions[0]
# Get teams
competitors = competition.get("competitors", [])
if len(competitors) != 2:
return None
home_team = None
away_team = None
home_score = None
away_score = None
for competitor in competitors:
team_info = competitor.get("team", {})
team_name = team_info.get("displayName", "")
is_home = competitor.get("homeAway") == "home"
score = competitor.get("score")
if score:
try:
score = int(score)
except (ValueError, TypeError):
score = None
if is_home:
home_team = team_name
home_score = score
else:
away_team = team_name
away_score = score
if not home_team or not away_team:
return None
# Get venue
venue = competition.get("venue", {})
stadium = venue.get("fullName")
# Get status
status_info = competition.get("status", {})
status_type = status_info.get("type", {})
status_name = status_type.get("name", "").lower()
if status_name == "status_final":
status = "final"
elif status_name == "status_postponed":
status = "postponed"
elif status_name == "status_canceled":
status = "cancelled"
else:
status = "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _scrape_fbref(self) -> list[RawGameData]:
"""Scrape games from FBref."""
# FBref scraping would go here
raise NotImplementedError("FBref scraper not implemented")
def _normalize_games(
self,
raw_games: list[RawGameData],
) -> tuple[list[Game], list[ManualReviewItem]]:
"""Normalize raw games to Game objects with canonical IDs."""
games: list[Game] = []
review_items: list[ManualReviewItem] = []
for raw in raw_games:
game, item_reviews = self._normalize_single_game(raw)
if game:
games.append(game)
log_game(
self.sport,
game.id,
game.home_team_id,
game.away_team_id,
game.game_date.strftime("%Y-%m-%d"),
game.status,
)
review_items.extend(item_reviews)
return games, review_items
def _normalize_single_game(
self,
raw: RawGameData,
) -> tuple[Optional[Game], list[ManualReviewItem]]:
"""Normalize a single raw game."""
review_items: list[ManualReviewItem] = []
# Resolve home team
home_result = self._team_resolver.resolve(
raw.home_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if home_result.review_item:
review_items.append(home_result.review_item)
if not home_result.canonical_id:
log_warning(f"Could not resolve home team: {raw.home_team_raw}")
return None, review_items
# Resolve away team
away_result = self._team_resolver.resolve(
raw.away_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if away_result.review_item:
review_items.append(away_result.review_item)
if not away_result.canonical_id:
log_warning(f"Could not resolve away team: {raw.away_team_raw}")
return None, review_items
# Resolve stadium
stadium_id = None
if raw.stadium_raw:
stadium_result = self._stadium_resolver.resolve(
raw.stadium_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if stadium_result.review_item:
review_items.append(stadium_result.review_item)
stadium_id = stadium_result.canonical_id
# Get abbreviations for game ID
home_abbrev = self._get_abbreviation(home_result.canonical_id)
away_abbrev = self._get_abbreviation(away_result.canonical_id)
# Generate canonical game ID
game_id = generate_game_id(
sport=self.sport,
season=self.season,
away_abbrev=away_abbrev,
home_abbrev=home_abbrev,
game_date=raw.game_date,
game_number=None,
)
game = Game(
id=game_id,
sport=self.sport,
season=self.season,
home_team_id=home_result.canonical_id,
away_team_id=away_result.canonical_id,
stadium_id=stadium_id or "",
game_date=raw.game_date,
game_number=None,
home_score=raw.home_score,
away_score=raw.away_score,
status=raw.status,
source_url=raw.source_url,
raw_home_team=raw.home_team_raw,
raw_away_team=raw.away_team_raw,
raw_stadium=raw.stadium_raw,
)
return game, review_items
def _get_abbreviation(self, team_id: str) -> str:
"""Extract abbreviation from team ID."""
parts = team_id.split("_")
return parts[-1] if parts else ""
def scrape_teams(self) -> list[Team]:
"""Get all MLS teams from hardcoded mappings."""
teams: list[Team] = []
seen: set[str] = set()
# MLS conference structure
conferences = {
"Eastern": ["ATL", "CLT", "CHI", "CIN", "CLB", "DC", "MIA", "MTL", "NE", "NYC", "RB", "ORL", "PHI", "TOR"],
"Western": ["AUS", "COL", "DAL", "HOU", "LAG", "LAFC", "MIN", "NSH", "POR", "SLC", "SD", "SJ", "SEA", "SKC", "STL", "VAN"],
}
# Build reverse lookup
team_conferences: dict[str, str] = {}
for conf, abbrevs in conferences.items():
for abbrev in abbrevs:
team_conferences[abbrev] = conf
for abbrev, (team_id, full_name, city, stadium_id) in TEAM_MAPPINGS.get("mls", {}).items():
if team_id in seen:
continue
seen.add(team_id)
# Parse team name
team_name = full_name
# Get conference
conf = team_conferences.get(abbrev)
team = Team(
id=team_id,
sport="mls",
city=city,
name=team_name,
full_name=full_name,
abbreviation=abbrev,
conference=conf,
division=None, # MLS doesn't have divisions
stadium_id=stadium_id,
)
teams.append(team)
return teams
def scrape_stadiums(self) -> list[Stadium]:
"""Get all MLS stadiums from hardcoded mappings."""
stadiums: list[Stadium] = []
mls_stadiums = STADIUM_MAPPINGS.get("mls", {})
for stadium_id, info in mls_stadiums.items():
stadium = Stadium(
id=stadium_id,
sport="mls",
name=info.name,
city=info.city,
state=info.state,
country=info.country,
latitude=info.latitude,
longitude=info.longitude,
timezone=info.timezone,
surface="grass",
roof_type="open",
)
stadiums.append(stadium)
return stadiums
def create_mls_scraper(season: int) -> MLSScraper:
"""Factory function to create an MLS scraper."""
return MLSScraper(season=season)

View File

@@ -1,687 +0,0 @@
"""NBA scraper implementation with multi-source fallback."""
from datetime import datetime, date, timezone
from typing import Optional
from zoneinfo import ZoneInfo
from bs4 import BeautifulSoup
import re
from .base import BaseScraper, RawGameData, ScrapeResult
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem
from ..normalizers.canonical_id import generate_game_id
from ..normalizers.team_resolver import (
TeamResolver,
TEAM_MAPPINGS,
get_team_resolver,
)
from ..normalizers.stadium_resolver import (
StadiumResolver,
STADIUM_MAPPINGS,
get_stadium_resolver,
)
from ..normalizers.timezone import parse_datetime
from ..utils.logging import get_logger, log_game, log_warning
# Month name to number mapping
MONTH_MAP = {
"january": 1, "february": 2, "march": 3, "april": 4,
"may": 5, "june": 6, "july": 7, "august": 8,
"september": 9, "october": 10, "november": 11, "december": 12,
}
# Basketball Reference month URLs
BR_MONTHS = [
"october", "november", "december",
"january", "february", "march", "april", "may", "june",
]
class NBAScraper(BaseScraper):
"""NBA schedule scraper with multi-source fallback.
Sources (in priority order):
1. Basketball-Reference - Most reliable, complete historical data
2. ESPN API - Good for current/future seasons
3. CBS Sports - Backup option (not implemented)
Source Timezones:
- basketball_reference: Eastern Time (ET) - times displayed as "7:30p"
- espn: UTC - ISO 8601 format with "Z" suffix
- cbs: Not implemented
"""
def __init__(self, season: int, **kwargs):
"""Initialize NBA scraper.
Args:
season: Season start year (e.g., 2025 for 2025-26)
"""
super().__init__("nba", season, **kwargs)
self._team_resolver = get_team_resolver("nba")
self._stadium_resolver = get_stadium_resolver("nba")
def _get_sources(self) -> list[str]:
"""Return source list in priority order."""
# CBS scraper not yet implemented - TODO for future
return ["basketball_reference", "espn"]
def _get_source_url(self, source: str, **kwargs) -> str:
"""Build URL for a source."""
if source == "basketball_reference":
month = kwargs.get("month", "october")
year = kwargs.get("year", self.season + 1)
return f"https://www.basketball-reference.com/leagues/NBA_{year}_games-{month}.html"
elif source == "espn":
date_str = kwargs.get("date", "")
return f"https://site.api.espn.com/apis/site/v2/sports/basketball/nba/scoreboard?dates={date_str}"
elif source == "cbs":
return "https://www.cbssports.com/nba/schedule/"
raise ValueError(f"Unknown source: {source}")
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
"""Scrape games from a specific source."""
if source == "basketball_reference":
return self._scrape_basketball_reference()
elif source == "espn":
return self._scrape_espn()
elif source == "cbs":
return self._scrape_cbs()
else:
raise ValueError(f"Unknown source: {source}")
def _scrape_basketball_reference(self) -> list[RawGameData]:
"""Scrape games from Basketball-Reference.
BR organizes games by month with separate pages.
Format: https://www.basketball-reference.com/leagues/NBA_YYYY_games-month.html
where YYYY is the ending year of the season.
Bails early if first few months have no data (season doesn't exist).
"""
all_games: list[RawGameData] = []
end_year = self.season + 1
consecutive_empty_months = 0
for month in BR_MONTHS:
url = self._get_source_url("basketball_reference", month=month, year=end_year)
try:
html = self.session.get_html(url)
games = self._parse_basketball_reference(html, url)
if games:
all_games.extend(games)
consecutive_empty_months = 0
self._logger.debug(f"Found {len(games)} games in {month}")
else:
consecutive_empty_months += 1
except Exception as e:
# Some months may not exist (e.g., no games in August)
self._logger.debug(f"No data for {month}: {e}")
consecutive_empty_months += 1
# If first 3 months (Oct, Nov, Dec) all have no data, season doesn't exist
if consecutive_empty_months >= 3 and not all_games:
self._logger.info(f"No games found in first {consecutive_empty_months} months, season likely doesn't exist")
break
return all_games
def _parse_basketball_reference(
self,
html: str,
source_url: str,
) -> list[RawGameData]:
"""Parse Basketball-Reference schedule HTML.
Table structure:
- th[data-stat="date_game"]: Date (e.g., "Tue, Oct 22, 2024")
- td[data-stat="visitor_team_name"]: Away team
- td[data-stat="home_team_name"]: Home team
- td[data-stat="visitor_pts"]: Away score
- td[data-stat="home_pts"]: Home score
- td[data-stat="arena_name"]: Arena/stadium name
"""
soup = BeautifulSoup(html, "lxml")
games: list[RawGameData] = []
# Find the schedule table
table = soup.find("table", id="schedule")
if not table:
return games
tbody = table.find("tbody")
if not tbody:
return games
for row in tbody.find_all("tr"):
# Skip header rows
if row.get("class") and "thead" in row.get("class", []):
continue
try:
game = self._parse_br_row(row, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse row: {e}")
continue
return games
def _parse_br_row(
self,
row,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single Basketball-Reference table row."""
# Get date
date_cell = row.find("th", {"data-stat": "date_game"})
if not date_cell:
return None
date_text = date_cell.get_text(strip=True)
if not date_text:
return None
# Parse date (format: "Tue, Oct 22, 2024")
try:
game_date = datetime.strptime(date_text, "%a, %b %d, %Y")
except ValueError:
# Try alternative format
try:
game_date = datetime.strptime(date_text, "%B %d, %Y")
except ValueError:
self._logger.debug(f"Could not parse date: {date_text}")
return None
# Get game start time (format: "7:30p" or "10:00p") - times are in ET
time_cell = row.find("td", {"data-stat": "game_start_time"})
if time_cell:
time_text = time_cell.get_text(strip=True)
if time_text:
try:
# Parse time like "7:30p" or "10:00p"
# Normalize: "7:30p" -> "7:30 PM", "10:00p" -> "10:00 PM"
time_normalized = time_text.replace("p", " PM").replace("a", " AM")
game_time = datetime.strptime(time_normalized, "%I:%M %p")
# Combine date and time with ET timezone (Basketball-Reference uses ET)
game_date = game_date.replace(
hour=game_time.hour,
minute=game_time.minute,
tzinfo=ZoneInfo("America/New_York"),
)
except ValueError:
self._logger.debug(f"Could not parse time: {time_text}, using midnight")
# Get teams
away_cell = row.find("td", {"data-stat": "visitor_team_name"})
home_cell = row.find("td", {"data-stat": "home_team_name"})
if not away_cell or not home_cell:
return None
away_team = away_cell.get_text(strip=True)
home_team = home_cell.get_text(strip=True)
if not away_team or not home_team:
return None
# Get scores (may be empty for future games)
away_score_cell = row.find("td", {"data-stat": "visitor_pts"})
home_score_cell = row.find("td", {"data-stat": "home_pts"})
away_score = None
home_score = None
if away_score_cell and away_score_cell.get_text(strip=True):
try:
away_score = int(away_score_cell.get_text(strip=True))
except ValueError:
pass
if home_score_cell and home_score_cell.get_text(strip=True):
try:
home_score = int(home_score_cell.get_text(strip=True))
except ValueError:
pass
# Get arena
arena_cell = row.find("td", {"data-stat": "arena_name"})
arena = arena_cell.get_text(strip=True) if arena_cell else None
# Determine status
status = "final" if home_score is not None else "scheduled"
# Check for postponed/cancelled
notes_cell = row.find("td", {"data-stat": "game_remarks"})
if notes_cell:
notes = notes_cell.get_text(strip=True).lower()
if "postponed" in notes:
status = "postponed"
elif "cancelled" in notes or "canceled" in notes:
status = "cancelled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=arena,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _scrape_espn(self) -> list[RawGameData]:
"""Scrape games from ESPN API.
ESPN API returns games for a specific date range.
We iterate through each day of the season.
Bails out early if no games found after checking first month.
"""
all_games: list[RawGameData] = []
consecutive_empty_days = 0
max_empty_days = 45 # Bail after ~1.5 months of no games
for year, month in self._get_season_months():
# Get number of days in month
if month == 12:
next_month = date(year + 1, 1, 1)
else:
next_month = date(year, month + 1, 1)
days_in_month = (next_month - date(year, month, 1)).days
for day in range(1, days_in_month + 1):
try:
game_date = date(year, month, day)
date_str = game_date.strftime("%Y%m%d")
url = self._get_source_url("espn", date=date_str)
data = self.session.get_json(url)
games = self._parse_espn_response(data, url)
if games:
all_games.extend(games)
consecutive_empty_days = 0
else:
consecutive_empty_days += 1
# Bail early if no games found for a long stretch
if consecutive_empty_days >= max_empty_days:
self._logger.info(f"No games found for {max_empty_days} consecutive days, stopping ESPN scrape")
return all_games
except Exception as e:
self._logger.debug(f"ESPN error for {year}-{month}-{day}: {e}")
consecutive_empty_days += 1
if consecutive_empty_days >= max_empty_days:
self._logger.info(f"Too many consecutive failures, stopping ESPN scrape")
return all_games
continue
return all_games
def _parse_espn_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse ESPN API response."""
games: list[RawGameData] = []
events = data.get("events", [])
for event in events:
try:
game = self._parse_espn_event(event, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse ESPN event: {e}")
continue
return games
def _parse_espn_event(
self,
event: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single ESPN event."""
# Get date
date_str = event.get("date", "")
if not date_str:
return None
try:
# ESPN uses ISO format
game_date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
except ValueError:
return None
# Get competitions (usually just one)
competitions = event.get("competitions", [])
if not competitions:
return None
competition = competitions[0]
# Get teams
competitors = competition.get("competitors", [])
if len(competitors) != 2:
return None
home_team = None
away_team = None
home_score = None
away_score = None
for competitor in competitors:
team_info = competitor.get("team", {})
team_name = team_info.get("displayName", "")
is_home = competitor.get("homeAway") == "home"
score = competitor.get("score")
if score:
try:
score = int(score)
except (ValueError, TypeError):
score = None
if is_home:
home_team = team_name
home_score = score
else:
away_team = team_name
away_score = score
if not home_team or not away_team:
return None
# Get venue
venue = competition.get("venue", {})
arena = venue.get("fullName")
# Get status
status_info = competition.get("status", {})
status_type = status_info.get("type", {})
status_name = status_type.get("name", "").lower()
if status_name == "status_final":
status = "final"
elif status_name == "status_postponed":
status = "postponed"
elif status_name == "status_canceled":
status = "cancelled"
else:
status = "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=arena,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _scrape_cbs(self) -> list[RawGameData]:
"""Scrape games from CBS Sports.
CBS Sports is a backup source with less structured data.
"""
# CBS Sports scraping would go here
# For now, return empty to fall back to other sources
raise NotImplementedError("CBS scraper not implemented")
def _normalize_games(
self,
raw_games: list[RawGameData],
) -> tuple[list[Game], list[ManualReviewItem]]:
"""Normalize raw games to Game objects with canonical IDs."""
games: list[Game] = []
review_items: list[ManualReviewItem] = []
# Track games by date for doubleheader detection
games_by_date: dict[str, list[RawGameData]] = {}
for raw in raw_games:
date_key = raw.game_date.strftime("%Y%m%d")
matchup_key = f"{date_key}_{raw.away_team_raw}_{raw.home_team_raw}"
if matchup_key not in games_by_date:
games_by_date[matchup_key] = []
games_by_date[matchup_key].append(raw)
# Process games with doubleheader detection
for matchup_key, matchup_games in games_by_date.items():
is_doubleheader = len(matchup_games) > 1
for i, raw in enumerate(matchup_games):
game_number = (i + 1) if is_doubleheader else None
game, item_reviews = self._normalize_single_game(raw, game_number)
if game:
games.append(game)
log_game(
self.sport,
game.id,
game.home_team_id,
game.away_team_id,
game.game_date.strftime("%Y-%m-%d"),
game.status,
)
review_items.extend(item_reviews)
return games, review_items
def _normalize_single_game(
self,
raw: RawGameData,
game_number: Optional[int],
) -> tuple[Optional[Game], list[ManualReviewItem]]:
"""Normalize a single raw game."""
review_items: list[ManualReviewItem] = []
# Resolve home team
home_result = self._team_resolver.resolve(
raw.home_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if home_result.review_item:
review_items.append(home_result.review_item)
if not home_result.canonical_id:
log_warning(f"Could not resolve home team: {raw.home_team_raw}")
return None, review_items
# Resolve away team
away_result = self._team_resolver.resolve(
raw.away_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if away_result.review_item:
review_items.append(away_result.review_item)
if not away_result.canonical_id:
log_warning(f"Could not resolve away team: {raw.away_team_raw}")
return None, review_items
# Resolve stadium (optional - use home team's stadium if not found)
stadium_id = None
if raw.stadium_raw:
stadium_result = self._stadium_resolver.resolve(
raw.stadium_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if stadium_result.review_item:
review_items.append(stadium_result.review_item)
stadium_id = stadium_result.canonical_id
# If no stadium found, use home team's default stadium
if not stadium_id:
# Look up home team's stadium from mappings
home_abbrev = home_result.canonical_id.split("_")[-1].upper()
team_info = self._team_resolver.get_team_info(home_abbrev)
if team_info:
# Try to find stadium by team's home arena
for sid, sinfo in STADIUM_MAPPINGS.get("nba", {}).items():
# Match by city
if sinfo.city.lower() in team_info[2].lower():
stadium_id = sid
break
# Get abbreviations for game ID
home_abbrev = self._get_abbreviation(home_result.canonical_id)
away_abbrev = self._get_abbreviation(away_result.canonical_id)
# Generate canonical game ID
game_id = generate_game_id(
sport=self.sport,
season=self.season,
away_abbrev=away_abbrev,
home_abbrev=home_abbrev,
game_date=raw.game_date,
game_number=game_number,
)
game = Game(
id=game_id,
sport=self.sport,
season=self.season,
home_team_id=home_result.canonical_id,
away_team_id=away_result.canonical_id,
stadium_id=stadium_id or "",
game_date=raw.game_date,
game_number=game_number,
home_score=raw.home_score,
away_score=raw.away_score,
status=raw.status,
source_url=raw.source_url,
raw_home_team=raw.home_team_raw,
raw_away_team=raw.away_team_raw,
raw_stadium=raw.stadium_raw,
)
return game, review_items
def _get_abbreviation(self, team_id: str) -> str:
"""Extract abbreviation from team ID."""
# team_nba_okc -> okc
parts = team_id.split("_")
return parts[-1] if parts else ""
def scrape_teams(self) -> list[Team]:
"""Get all NBA teams from hardcoded mappings."""
teams: list[Team] = []
seen: set[str] = set()
# NBA conference/division structure
divisions = {
"Atlantic": ("Eastern", ["BOS", "BKN", "NYK", "PHI", "TOR"]),
"Central": ("Eastern", ["CHI", "CLE", "DET", "IND", "MIL"]),
"Southeast": ("Eastern", ["ATL", "CHA", "MIA", "ORL", "WAS"]),
"Northwest": ("Western", ["DEN", "MIN", "OKC", "POR", "UTA"]),
"Pacific": ("Western", ["GSW", "LAC", "LAL", "PHX", "SAC"]),
"Southwest": ("Western", ["DAL", "HOU", "MEM", "NOP", "SAS"]),
}
# Build reverse lookup
team_divisions: dict[str, tuple[str, str]] = {}
for div, (conf, abbrevs) in divisions.items():
for abbrev in abbrevs:
team_divisions[abbrev] = (conf, div)
for abbrev, (team_id, full_name, city, stadium_id) in TEAM_MAPPINGS.get("nba", {}).items():
if team_id in seen:
continue
seen.add(team_id)
# Parse full name into city and name parts
parts = full_name.split()
if len(parts) >= 2:
# Handle special cases like "Oklahoma City Thunder"
if city == "Oklahoma City":
team_name = "Thunder"
elif city == "Golden State":
team_name = "Warriors"
elif city == "San Antonio":
team_name = "Spurs"
elif city == "New York":
team_name = parts[-1] # Knicks
elif city == "New Orleans":
team_name = "Pelicans"
elif city == "Los Angeles":
team_name = parts[-1] # Lakers or Clippers
else:
team_name = parts[-1]
else:
team_name = full_name
# Get conference and division
conf, div = team_divisions.get(abbrev, (None, None))
team = Team(
id=team_id,
sport="nba",
city=city,
name=team_name,
full_name=full_name,
abbreviation=abbrev,
conference=conf,
division=div,
stadium_id=stadium_id,
)
teams.append(team)
return teams
def scrape_stadiums(self) -> list[Stadium]:
"""Get all NBA stadiums from hardcoded mappings."""
stadiums: list[Stadium] = []
for stadium_id, info in STADIUM_MAPPINGS.get("nba", {}).items():
stadium = Stadium(
id=stadium_id,
sport="nba",
name=info.name,
city=info.city,
state=info.state,
country=info.country,
latitude=info.latitude,
longitude=info.longitude,
timezone=info.timezone,
surface="hardwood",
roof_type="dome",
)
stadiums.append(stadium)
return stadiums
def create_nba_scraper(season: int) -> NBAScraper:
"""Factory function to create an NBA scraper."""
return NBAScraper(season=season)

View File

@@ -1,605 +0,0 @@
"""NFL scraper implementation with multi-source fallback."""
from datetime import datetime, date
from typing import Optional
from zoneinfo import ZoneInfo
from bs4 import BeautifulSoup
from .base import BaseScraper, RawGameData, ScrapeResult
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem
from ..normalizers.canonical_id import generate_game_id
from ..normalizers.team_resolver import (
TeamResolver,
TEAM_MAPPINGS,
get_team_resolver,
)
from ..normalizers.stadium_resolver import (
StadiumResolver,
STADIUM_MAPPINGS,
get_stadium_resolver,
)
from ..utils.logging import get_logger, log_game, log_warning
# International game locations to filter out
INTERNATIONAL_LOCATIONS = {"London", "Mexico City", "Frankfurt", "Munich", "São Paulo"}
class NFLScraper(BaseScraper):
"""NFL schedule scraper with multi-source fallback.
Sources (in priority order):
1. ESPN API - Most reliable for NFL
2. Pro-Football-Reference - Complete historical data
3. CBS Sports - Backup option (not implemented)
Source Timezones:
- espn: UTC - ISO 8601 format with "Z" suffix
- pro_football_reference: Eastern Time (ET) - times displayed as "8:20PM"
- cbs: Not implemented
"""
def __init__(self, season: int, **kwargs):
"""Initialize NFL scraper.
Args:
season: Season year (e.g., 2025 for 2025 season)
"""
super().__init__("nfl", season, **kwargs)
self._team_resolver = get_team_resolver("nfl")
self._stadium_resolver = get_stadium_resolver("nfl")
def _get_sources(self) -> list[str]:
"""Return source list in priority order."""
# CBS scraper not yet implemented - TODO for future
return ["espn", "pro_football_reference"]
def _get_source_url(self, source: str, **kwargs) -> str:
"""Build URL for a source."""
if source == "espn":
week = kwargs.get("week", 1)
season_type = kwargs.get("season_type", 2) # 1=preseason, 2=regular, 3=postseason
return f"https://site.api.espn.com/apis/site/v2/sports/football/nfl/scoreboard?seasontype={season_type}&week={week}"
elif source == "pro_football_reference":
return f"https://www.pro-football-reference.com/years/{self.season}/games.htm"
elif source == "cbs":
return "https://www.cbssports.com/nfl/schedule/"
raise ValueError(f"Unknown source: {source}")
def _get_season_months(self) -> list[tuple[int, int]]:
"""Get the months to scrape for NFL season.
NFL season runs September through February.
"""
months = []
# Regular season months
for month in range(9, 13): # Sept-Dec
months.append((self.season, month))
# Playoff months
for month in range(1, 3): # Jan-Feb
months.append((self.season + 1, month))
return months
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
"""Scrape games from a specific source."""
if source == "espn":
return self._scrape_espn()
elif source == "pro_football_reference":
return self._scrape_pro_football_reference()
elif source == "cbs":
return self._scrape_cbs()
else:
raise ValueError(f"Unknown source: {source}")
def _scrape_espn(self) -> list[RawGameData]:
"""Scrape games from ESPN API.
ESPN NFL API uses week numbers.
"""
all_games: list[RawGameData] = []
# Scrape preseason (4 weeks)
for week in range(1, 5):
try:
url = self._get_source_url("espn", week=week, season_type=1)
data = self.session.get_json(url)
games = self._parse_espn_response(data, url)
all_games.extend(games)
except Exception as e:
self._logger.debug(f"ESPN preseason week {week} error: {e}")
continue
# Scrape regular season (18 weeks)
for week in range(1, 19):
try:
url = self._get_source_url("espn", week=week, season_type=2)
data = self.session.get_json(url)
games = self._parse_espn_response(data, url)
all_games.extend(games)
self._logger.debug(f"Found {len(games)} games in week {week}")
except Exception as e:
self._logger.debug(f"ESPN regular season week {week} error: {e}")
continue
# Scrape postseason (4 rounds)
for week in range(1, 5):
try:
url = self._get_source_url("espn", week=week, season_type=3)
data = self.session.get_json(url)
games = self._parse_espn_response(data, url)
all_games.extend(games)
except Exception as e:
self._logger.debug(f"ESPN postseason week {week} error: {e}")
continue
return all_games
def _parse_espn_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse ESPN API response."""
games: list[RawGameData] = []
events = data.get("events", [])
for event in events:
try:
game = self._parse_espn_event(event, source_url)
if game:
# Filter international games
if game.stadium_raw and any(loc in game.stadium_raw for loc in INTERNATIONAL_LOCATIONS):
self._logger.debug(f"Skipping international game: {game.stadium_raw}")
continue
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse ESPN event: {e}")
continue
return games
def _parse_espn_event(
self,
event: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single ESPN event."""
# Get date
date_str = event.get("date", "")
if not date_str:
return None
try:
game_date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
except ValueError:
return None
# Get competitions
competitions = event.get("competitions", [])
if not competitions:
return None
competition = competitions[0]
# Check for neutral site (international games)
if competition.get("neutralSite"):
venue = competition.get("venue", {})
venue_city = venue.get("address", {}).get("city", "")
if venue_city in INTERNATIONAL_LOCATIONS:
return None
# Get teams
competitors = competition.get("competitors", [])
if len(competitors) != 2:
return None
home_team = None
away_team = None
home_score = None
away_score = None
for competitor in competitors:
team_info = competitor.get("team", {})
team_name = team_info.get("displayName", "")
is_home = competitor.get("homeAway") == "home"
score = competitor.get("score")
if score:
try:
score = int(score)
except (ValueError, TypeError):
score = None
if is_home:
home_team = team_name
home_score = score
else:
away_team = team_name
away_score = score
if not home_team or not away_team:
return None
# Get venue
venue = competition.get("venue", {})
stadium = venue.get("fullName")
# Get status
status_info = competition.get("status", {})
status_type = status_info.get("type", {})
status_name = status_type.get("name", "").lower()
if status_name == "status_final":
status = "final"
elif status_name == "status_postponed":
status = "postponed"
elif status_name == "status_canceled":
status = "cancelled"
else:
status = "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _scrape_pro_football_reference(self) -> list[RawGameData]:
"""Scrape games from Pro-Football-Reference.
PFR has a single schedule page per season.
"""
url = self._get_source_url("pro_football_reference")
try:
html = self.session.get_html(url)
games = self._parse_pfr(html, url)
return games
except Exception as e:
self._logger.error(f"Failed to scrape Pro-Football-Reference: {e}")
raise
def _parse_pfr(
self,
html: str,
source_url: str,
) -> list[RawGameData]:
"""Parse Pro-Football-Reference schedule HTML."""
soup = BeautifulSoup(html, "lxml")
games: list[RawGameData] = []
# Find the schedule table
table = soup.find("table", id="games")
if not table:
return games
tbody = table.find("tbody")
if not tbody:
return games
for row in tbody.find_all("tr"):
# Skip header rows
if row.get("class") and "thead" in row.get("class", []):
continue
try:
game = self._parse_pfr_row(row, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse PFR row: {e}")
continue
return games
def _parse_pfr_row(
self,
row,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single Pro-Football-Reference table row."""
# Get date
date_cell = row.find("td", {"data-stat": "game_date"})
if not date_cell:
return None
date_text = date_cell.get_text(strip=True)
if not date_text:
return None
# Parse date
try:
# PFR uses YYYY-MM-DD format
game_date = datetime.strptime(date_text, "%Y-%m-%d")
except ValueError:
return None
# Get game start time (format: "8:20PM" or "1:00PM") - times are in ET
time_cell = row.find("td", {"data-stat": "gametime"})
if time_cell:
time_text = time_cell.get_text(strip=True)
if time_text:
try:
# Parse time like "8:20PM" or "1:00PM"
# Normalize: "8:20PM" -> "8:20 PM"
time_normalized = time_text.replace("PM", " PM").replace("AM", " AM")
game_time = datetime.strptime(time_normalized, "%I:%M %p")
# Combine date and time with ET timezone (Pro-Football-Reference uses ET)
game_date = game_date.replace(
hour=game_time.hour,
minute=game_time.minute,
tzinfo=ZoneInfo("America/New_York"),
)
except ValueError:
self._logger.debug(f"Could not parse time: {time_text}, using midnight")
# Get teams
winner_cell = row.find("td", {"data-stat": "winner"})
loser_cell = row.find("td", {"data-stat": "loser"})
if not winner_cell or not loser_cell:
return None
winner = winner_cell.get_text(strip=True)
loser = loser_cell.get_text(strip=True)
if not winner or not loser:
return None
# Determine home/away based on @ symbol
game_location = row.find("td", {"data-stat": "game_location"})
at_home = game_location and "@" in game_location.get_text()
if at_home:
home_team = loser
away_team = winner
else:
home_team = winner
away_team = loser
# Get scores
pts_win_cell = row.find("td", {"data-stat": "pts_win"})
pts_lose_cell = row.find("td", {"data-stat": "pts_lose"})
home_score = None
away_score = None
if pts_win_cell and pts_lose_cell:
try:
winner_pts = int(pts_win_cell.get_text(strip=True))
loser_pts = int(pts_lose_cell.get_text(strip=True))
if at_home:
home_score = loser_pts
away_score = winner_pts
else:
home_score = winner_pts
away_score = loser_pts
except ValueError:
pass
# Determine status
status = "final" if home_score is not None else "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=None, # PFR doesn't always have stadium
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _scrape_cbs(self) -> list[RawGameData]:
"""Scrape games from CBS Sports."""
raise NotImplementedError("CBS scraper not implemented")
def _normalize_games(
self,
raw_games: list[RawGameData],
) -> tuple[list[Game], list[ManualReviewItem]]:
"""Normalize raw games to Game objects with canonical IDs."""
games: list[Game] = []
review_items: list[ManualReviewItem] = []
for raw in raw_games:
game, item_reviews = self._normalize_single_game(raw)
if game:
games.append(game)
log_game(
self.sport,
game.id,
game.home_team_id,
game.away_team_id,
game.game_date.strftime("%Y-%m-%d"),
game.status,
)
review_items.extend(item_reviews)
return games, review_items
def _normalize_single_game(
self,
raw: RawGameData,
) -> tuple[Optional[Game], list[ManualReviewItem]]:
"""Normalize a single raw game."""
review_items: list[ManualReviewItem] = []
# Resolve home team
home_result = self._team_resolver.resolve(
raw.home_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if home_result.review_item:
review_items.append(home_result.review_item)
if not home_result.canonical_id:
log_warning(f"Could not resolve home team: {raw.home_team_raw}")
return None, review_items
# Resolve away team
away_result = self._team_resolver.resolve(
raw.away_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if away_result.review_item:
review_items.append(away_result.review_item)
if not away_result.canonical_id:
log_warning(f"Could not resolve away team: {raw.away_team_raw}")
return None, review_items
# Resolve stadium
stadium_id = None
if raw.stadium_raw:
stadium_result = self._stadium_resolver.resolve(
raw.stadium_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if stadium_result.review_item:
review_items.append(stadium_result.review_item)
stadium_id = stadium_result.canonical_id
# Get abbreviations for game ID
home_abbrev = self._get_abbreviation(home_result.canonical_id)
away_abbrev = self._get_abbreviation(away_result.canonical_id)
# Generate canonical game ID
game_id = generate_game_id(
sport=self.sport,
season=self.season,
away_abbrev=away_abbrev,
home_abbrev=home_abbrev,
game_date=raw.game_date,
game_number=None, # NFL doesn't have doubleheaders
)
game = Game(
id=game_id,
sport=self.sport,
season=self.season,
home_team_id=home_result.canonical_id,
away_team_id=away_result.canonical_id,
stadium_id=stadium_id or "",
game_date=raw.game_date,
game_number=None,
home_score=raw.home_score,
away_score=raw.away_score,
status=raw.status,
source_url=raw.source_url,
raw_home_team=raw.home_team_raw,
raw_away_team=raw.away_team_raw,
raw_stadium=raw.stadium_raw,
)
return game, review_items
def _get_abbreviation(self, team_id: str) -> str:
"""Extract abbreviation from team ID."""
parts = team_id.split("_")
return parts[-1] if parts else ""
def scrape_teams(self) -> list[Team]:
"""Get all NFL teams from hardcoded mappings."""
teams: list[Team] = []
seen: set[str] = set()
# NFL conference/division structure
divisions = {
"AFC East": ("AFC", ["BUF", "MIA", "NE", "NYJ"]),
"AFC North": ("AFC", ["BAL", "CIN", "CLE", "PIT"]),
"AFC South": ("AFC", ["HOU", "IND", "JAX", "TEN"]),
"AFC West": ("AFC", ["DEN", "KC", "LV", "LAC"]),
"NFC East": ("NFC", ["DAL", "NYG", "PHI", "WAS"]),
"NFC North": ("NFC", ["CHI", "DET", "GB", "MIN"]),
"NFC South": ("NFC", ["ATL", "CAR", "NO", "TB"]),
"NFC West": ("NFC", ["ARI", "LAR", "SF", "SEA"]),
}
# Build reverse lookup
team_divisions: dict[str, tuple[str, str]] = {}
for div, (conf, abbrevs) in divisions.items():
for abbrev in abbrevs:
team_divisions[abbrev] = (conf, div)
for abbrev, (team_id, full_name, city, stadium_id) in TEAM_MAPPINGS.get("nfl", {}).items():
if team_id in seen:
continue
seen.add(team_id)
# Parse team name
parts = full_name.split()
team_name = parts[-1] if parts else full_name
# Get conference and division
conf, div = team_divisions.get(abbrev, (None, None))
team = Team(
id=team_id,
sport="nfl",
city=city,
name=team_name,
full_name=full_name,
abbreviation=abbrev,
conference=conf,
division=div,
stadium_id=stadium_id,
)
teams.append(team)
return teams
def scrape_stadiums(self) -> list[Stadium]:
"""Get all NFL stadiums from hardcoded mappings."""
stadiums: list[Stadium] = []
nfl_stadiums = STADIUM_MAPPINGS.get("nfl", {})
for stadium_id, info in nfl_stadiums.items():
stadium = Stadium(
id=stadium_id,
sport="nfl",
name=info.name,
city=info.city,
state=info.state,
country=info.country,
latitude=info.latitude,
longitude=info.longitude,
timezone=info.timezone,
surface="turf", # Many NFL stadiums
roof_type="open", # Most outdoor
)
stadiums.append(stadium)
return stadiums
def create_nfl_scraper(season: int) -> NFLScraper:
"""Factory function to create an NFL scraper."""
return NFLScraper(season=season)

View File

@@ -1,683 +0,0 @@
"""NHL scraper implementation with multi-source fallback."""
from datetime import datetime, date
from typing import Optional
from zoneinfo import ZoneInfo
from bs4 import BeautifulSoup
from .base import BaseScraper, RawGameData, ScrapeResult
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem
from ..normalizers.canonical_id import generate_game_id
from ..normalizers.team_resolver import (
TeamResolver,
TEAM_MAPPINGS,
get_team_resolver,
)
from ..normalizers.stadium_resolver import (
StadiumResolver,
STADIUM_MAPPINGS,
get_stadium_resolver,
)
from ..utils.logging import get_logger, log_game, log_warning
# International game locations to filter out
INTERNATIONAL_LOCATIONS = {"Prague", "Stockholm", "Helsinki", "Tampere", "Gothenburg"}
# Hockey Reference month URLs
HR_MONTHS = [
"october", "november", "december",
"january", "february", "march", "april", "may", "june",
]
class NHLScraper(BaseScraper):
"""NHL schedule scraper with multi-source fallback.
Sources (in priority order):
1. Hockey-Reference - Most reliable for NHL
2. NHL API - Official NHL data
3. ESPN API - Backup option
Source Timezones:
- hockey_reference: Eastern Time (ET) - times displayed as "7:00p"
- nhl_api: UTC - ISO 8601 format with "Z" suffix (startTimeUTC field)
- espn: UTC - ISO 8601 format with "Z" suffix
"""
def __init__(self, season: int, **kwargs):
"""Initialize NHL scraper.
Args:
season: Season start year (e.g., 2025 for 2025-26)
"""
super().__init__("nhl", season, **kwargs)
self._team_resolver = get_team_resolver("nhl")
self._stadium_resolver = get_stadium_resolver("nhl")
def _get_sources(self) -> list[str]:
"""Return source list in priority order."""
return ["hockey_reference", "nhl_api", "espn"]
def _get_source_url(self, source: str, **kwargs) -> str:
"""Build URL for a source."""
if source == "hockey_reference":
month = kwargs.get("month", "october")
year = kwargs.get("year", self.season + 1)
return f"https://www.hockey-reference.com/leagues/NHL_{year}_games.html"
elif source == "nhl_api":
start_date = kwargs.get("start_date", "")
end_date = kwargs.get("end_date", "")
return f"https://api-web.nhle.com/v1/schedule/{start_date}"
elif source == "espn":
date_str = kwargs.get("date", "")
return f"https://site.api.espn.com/apis/site/v2/sports/hockey/nhl/scoreboard?dates={date_str}"
raise ValueError(f"Unknown source: {source}")
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
"""Scrape games from a specific source."""
if source == "hockey_reference":
return self._scrape_hockey_reference()
elif source == "nhl_api":
return self._scrape_nhl_api()
elif source == "espn":
return self._scrape_espn()
else:
raise ValueError(f"Unknown source: {source}")
def _scrape_hockey_reference(self) -> list[RawGameData]:
"""Scrape games from Hockey-Reference.
HR has a single schedule page per season.
"""
end_year = self.season + 1
url = self._get_source_url("hockey_reference", year=end_year)
try:
html = self.session.get_html(url)
games = self._parse_hockey_reference(html, url)
return games
except Exception as e:
self._logger.error(f"Failed to scrape Hockey-Reference: {e}")
raise
def _parse_hockey_reference(
self,
html: str,
source_url: str,
) -> list[RawGameData]:
"""Parse Hockey-Reference schedule HTML."""
soup = BeautifulSoup(html, "lxml")
games: list[RawGameData] = []
# Find the schedule table
table = soup.find("table", id="games")
if not table:
return games
tbody = table.find("tbody")
if not tbody:
return games
for row in tbody.find_all("tr"):
# Skip header rows
if row.get("class") and "thead" in row.get("class", []):
continue
try:
game = self._parse_hr_row(row, source_url)
if game:
# Filter international games
if game.stadium_raw and any(loc in game.stadium_raw for loc in INTERNATIONAL_LOCATIONS):
continue
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse HR row: {e}")
continue
return games
def _parse_hr_row(
self,
row,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single Hockey-Reference table row."""
# Get date
date_cell = row.find("th", {"data-stat": "date_game"})
if not date_cell:
return None
date_text = date_cell.get_text(strip=True)
if not date_text:
return None
# Parse date (format: "2025-10-15")
try:
game_date = datetime.strptime(date_text, "%Y-%m-%d")
except ValueError:
return None
# Get game start time (format: "7:00p" or "10:30p") - times are in ET
time_cell = row.find("td", {"data-stat": "time_game"})
if time_cell:
time_text = time_cell.get_text(strip=True)
if time_text:
try:
# Parse time like "7:00p" or "10:30p"
# Normalize: "7:00p" -> "7:00 PM", "10:30p" -> "10:30 PM"
time_normalized = time_text.replace("p", " PM").replace("a", " AM")
game_time = datetime.strptime(time_normalized, "%I:%M %p")
# Combine date and time with ET timezone (Hockey-Reference uses ET)
game_date = game_date.replace(
hour=game_time.hour,
minute=game_time.minute,
tzinfo=ZoneInfo("America/New_York"),
)
except ValueError:
self._logger.debug(f"Could not parse time: {time_text}, using midnight")
# Get teams
visitor_cell = row.find("td", {"data-stat": "visitor_team_name"})
home_cell = row.find("td", {"data-stat": "home_team_name"})
if not visitor_cell or not home_cell:
return None
away_team = visitor_cell.get_text(strip=True)
home_team = home_cell.get_text(strip=True)
if not away_team or not home_team:
return None
# Get scores
visitor_goals_cell = row.find("td", {"data-stat": "visitor_goals"})
home_goals_cell = row.find("td", {"data-stat": "home_goals"})
away_score = None
home_score = None
if visitor_goals_cell and visitor_goals_cell.get_text(strip=True):
try:
away_score = int(visitor_goals_cell.get_text(strip=True))
except ValueError:
pass
if home_goals_cell and home_goals_cell.get_text(strip=True):
try:
home_score = int(home_goals_cell.get_text(strip=True))
except ValueError:
pass
# Determine status
status = "final" if home_score is not None else "scheduled"
# Check for OT/SO
overtimes_cell = row.find("td", {"data-stat": "overtimes"})
if overtimes_cell:
ot_text = overtimes_cell.get_text(strip=True)
if ot_text:
status = "final" # OT games are still final
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=None, # HR doesn't have stadium
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _scrape_nhl_api(self) -> list[RawGameData]:
"""Scrape games from NHL API."""
all_games: list[RawGameData] = []
for year, month in self._get_season_months():
start_date = date(year, month, 1)
url = self._get_source_url("nhl_api", start_date=start_date.strftime("%Y-%m-%d"))
try:
data = self.session.get_json(url)
games = self._parse_nhl_api_response(data, url)
all_games.extend(games)
except Exception as e:
self._logger.debug(f"NHL API error for {year}-{month}: {e}")
continue
return all_games
def _parse_nhl_api_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse NHL API response."""
games: list[RawGameData] = []
game_weeks = data.get("gameWeek", [])
for week in game_weeks:
for game_day in week.get("games", []):
try:
game = self._parse_nhl_api_game(game_day, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse NHL API game: {e}")
continue
return games
def _parse_nhl_api_game(
self,
game: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single NHL API game."""
# Get date
start_time = game.get("startTimeUTC", "")
if not start_time:
return None
try:
game_date = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
except ValueError:
return None
# Get teams
away_team_data = game.get("awayTeam", {})
home_team_data = game.get("homeTeam", {})
away_team = away_team_data.get("placeName", {}).get("default", "")
home_team = home_team_data.get("placeName", {}).get("default", "")
if not away_team or not home_team:
# Try full name
away_team = away_team_data.get("name", {}).get("default", "")
home_team = home_team_data.get("name", {}).get("default", "")
if not away_team or not home_team:
return None
# Get scores
away_score = away_team_data.get("score")
home_score = home_team_data.get("score")
# Get venue
venue = game.get("venue", {})
stadium = venue.get("default")
# Get status
game_state = game.get("gameState", "").lower()
if game_state in ["final", "off"]:
status = "final"
elif game_state == "postponed":
status = "postponed"
elif game_state in ["cancelled", "canceled"]:
status = "cancelled"
else:
status = "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _scrape_espn(self) -> list[RawGameData]:
"""Scrape games from ESPN API."""
all_games: list[RawGameData] = []
for year, month in self._get_season_months():
# Get number of days in month
if month == 12:
next_month = date(year + 1, 1, 1)
else:
next_month = date(year, month + 1, 1)
days_in_month = (next_month - date(year, month, 1)).days
for day in range(1, days_in_month + 1):
try:
game_date = date(year, month, day)
date_str = game_date.strftime("%Y%m%d")
url = self._get_source_url("espn", date=date_str)
data = self.session.get_json(url)
games = self._parse_espn_response(data, url)
all_games.extend(games)
except Exception as e:
self._logger.debug(f"ESPN error for {year}-{month}-{day}: {e}")
continue
return all_games
def _parse_espn_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse ESPN API response."""
games: list[RawGameData] = []
events = data.get("events", [])
for event in events:
try:
game = self._parse_espn_event(event, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse ESPN event: {e}")
continue
return games
def _parse_espn_event(
self,
event: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single ESPN event."""
# Get date
date_str = event.get("date", "")
if not date_str:
return None
try:
game_date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
except ValueError:
return None
# Get competitions
competitions = event.get("competitions", [])
if not competitions:
return None
competition = competitions[0]
# Check for neutral site (international games like Global Series)
if competition.get("neutralSite"):
venue = competition.get("venue", {})
venue_city = venue.get("address", {}).get("city", "")
if venue_city in INTERNATIONAL_LOCATIONS:
return None
# Get teams
competitors = competition.get("competitors", [])
if len(competitors) != 2:
return None
home_team = None
away_team = None
home_score = None
away_score = None
for competitor in competitors:
team_info = competitor.get("team", {})
team_name = team_info.get("displayName", "")
is_home = competitor.get("homeAway") == "home"
score = competitor.get("score")
if score:
try:
score = int(score)
except (ValueError, TypeError):
score = None
if is_home:
home_team = team_name
home_score = score
else:
away_team = team_name
away_score = score
if not home_team or not away_team:
return None
# Get venue
venue = competition.get("venue", {})
stadium = venue.get("fullName")
# Get status
status_info = competition.get("status", {})
status_type = status_info.get("type", {})
status_name = status_type.get("name", "").lower()
if status_name == "status_final":
status = "final"
elif status_name == "status_postponed":
status = "postponed"
elif status_name == "status_canceled":
status = "cancelled"
else:
status = "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _normalize_games(
self,
raw_games: list[RawGameData],
) -> tuple[list[Game], list[ManualReviewItem]]:
"""Normalize raw games to Game objects with canonical IDs."""
games: list[Game] = []
review_items: list[ManualReviewItem] = []
for raw in raw_games:
game, item_reviews = self._normalize_single_game(raw)
if game:
games.append(game)
log_game(
self.sport,
game.id,
game.home_team_id,
game.away_team_id,
game.game_date.strftime("%Y-%m-%d"),
game.status,
)
review_items.extend(item_reviews)
return games, review_items
def _normalize_single_game(
self,
raw: RawGameData,
) -> tuple[Optional[Game], list[ManualReviewItem]]:
"""Normalize a single raw game."""
review_items: list[ManualReviewItem] = []
# Resolve home team
home_result = self._team_resolver.resolve(
raw.home_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if home_result.review_item:
review_items.append(home_result.review_item)
if not home_result.canonical_id:
log_warning(f"Could not resolve home team: {raw.home_team_raw}")
return None, review_items
# Resolve away team
away_result = self._team_resolver.resolve(
raw.away_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if away_result.review_item:
review_items.append(away_result.review_item)
if not away_result.canonical_id:
log_warning(f"Could not resolve away team: {raw.away_team_raw}")
return None, review_items
# Resolve stadium
stadium_id = None
if raw.stadium_raw:
stadium_result = self._stadium_resolver.resolve(
raw.stadium_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if stadium_result.review_item:
review_items.append(stadium_result.review_item)
stadium_id = stadium_result.canonical_id
# Fallback: Use home team's default stadium if no venue provided
# This is common for Hockey-Reference which doesn't have venue data
if not stadium_id:
home_team_data = TEAM_MAPPINGS.get("nhl", {})
home_abbrev = self._get_abbreviation(home_result.canonical_id)
for abbrev, (team_id, _, _, default_stadium) in home_team_data.items():
if team_id == home_result.canonical_id:
stadium_id = default_stadium
break
# Get abbreviations for game ID
home_abbrev = self._get_abbreviation(home_result.canonical_id)
away_abbrev = self._get_abbreviation(away_result.canonical_id)
# Generate canonical game ID
game_id = generate_game_id(
sport=self.sport,
season=self.season,
away_abbrev=away_abbrev,
home_abbrev=home_abbrev,
game_date=raw.game_date,
game_number=None, # NHL doesn't have doubleheaders
)
game = Game(
id=game_id,
sport=self.sport,
season=self.season,
home_team_id=home_result.canonical_id,
away_team_id=away_result.canonical_id,
stadium_id=stadium_id or "",
game_date=raw.game_date,
game_number=None,
home_score=raw.home_score,
away_score=raw.away_score,
status=raw.status,
source_url=raw.source_url,
raw_home_team=raw.home_team_raw,
raw_away_team=raw.away_team_raw,
raw_stadium=raw.stadium_raw,
)
return game, review_items
def _get_abbreviation(self, team_id: str) -> str:
"""Extract abbreviation from team ID."""
parts = team_id.split("_")
return parts[-1] if parts else ""
def scrape_teams(self) -> list[Team]:
"""Get all NHL teams from hardcoded mappings."""
teams: list[Team] = []
seen: set[str] = set()
# NHL conference/division structure
divisions = {
"Atlantic": ("Eastern", ["BOS", "BUF", "DET", "FLA", "MTL", "OTT", "TB", "TOR"]),
"Metropolitan": ("Eastern", ["CAR", "CBJ", "NJ", "NYI", "NYR", "PHI", "PIT", "WAS"]),
"Central": ("Western", ["ARI", "CHI", "COL", "DAL", "MIN", "NSH", "STL", "WPG"]),
"Pacific": ("Western", ["ANA", "CGY", "EDM", "LA", "SJ", "SEA", "VAN", "VGK"]),
}
# Build reverse lookup
team_divisions: dict[str, tuple[str, str]] = {}
for div, (conf, abbrevs) in divisions.items():
for abbrev in abbrevs:
team_divisions[abbrev] = (conf, div)
for abbrev, (team_id, full_name, city, stadium_id) in TEAM_MAPPINGS.get("nhl", {}).items():
if team_id in seen:
continue
seen.add(team_id)
# Parse team name
parts = full_name.split()
team_name = parts[-1] if parts else full_name
# Handle multi-word names
if team_name in ["Wings", "Jackets", "Knights", "Leafs"]:
team_name = " ".join(parts[-2:])
# Get conference and division
conf, div = team_divisions.get(abbrev, (None, None))
team = Team(
id=team_id,
sport="nhl",
city=city,
name=team_name,
full_name=full_name,
abbreviation=abbrev,
conference=conf,
division=div,
stadium_id=stadium_id,
)
teams.append(team)
return teams
def scrape_stadiums(self) -> list[Stadium]:
"""Get all NHL stadiums from hardcoded mappings."""
stadiums: list[Stadium] = []
nhl_stadiums = STADIUM_MAPPINGS.get("nhl", {})
for stadium_id, info in nhl_stadiums.items():
stadium = Stadium(
id=stadium_id,
sport="nhl",
name=info.name,
city=info.city,
state=info.state,
country=info.country,
latitude=info.latitude,
longitude=info.longitude,
timezone=info.timezone,
surface="ice",
roof_type="dome",
)
stadiums.append(stadium)
return stadiums
def create_nhl_scraper(season: int) -> NHLScraper:
"""Factory function to create an NHL scraper."""
return NHLScraper(season=season)

View File

@@ -1,377 +0,0 @@
"""NWSL scraper implementation with multi-source fallback."""
from datetime import datetime, date, timedelta
from typing import Optional
from .base import BaseScraper, RawGameData, ScrapeResult
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem
from ..normalizers.canonical_id import generate_game_id
from ..normalizers.team_resolver import (
TeamResolver,
TEAM_MAPPINGS,
get_team_resolver,
)
from ..normalizers.stadium_resolver import (
StadiumResolver,
STADIUM_MAPPINGS,
get_stadium_resolver,
)
from ..utils.logging import get_logger, log_game, log_warning
class NWSLScraper(BaseScraper):
"""NWSL schedule scraper with multi-source fallback.
Sources (in priority order):
1. ESPN API - Most reliable for NWSL
Source Timezones:
- espn: UTC - ISO 8601 format with "Z" suffix
"""
def __init__(self, season: int, **kwargs):
"""Initialize NWSL scraper.
Args:
season: Season year (e.g., 2026 for 2026 season)
"""
super().__init__("nwsl", season, **kwargs)
self._team_resolver = get_team_resolver("nwsl")
self._stadium_resolver = get_stadium_resolver("nwsl")
def _get_sources(self) -> list[str]:
"""Return source list in priority order."""
return ["espn"]
def _get_source_url(self, source: str, **kwargs) -> str:
"""Build URL for a source."""
if source == "espn":
date_str = kwargs.get("date", "")
return f"https://site.api.espn.com/apis/site/v2/sports/soccer/usa.nwsl/scoreboard?dates={date_str}"
raise ValueError(f"Unknown source: {source}")
def _get_season_months(self) -> list[tuple[int, int]]:
"""Get the months to scrape for NWSL season.
NWSL season runs March through November.
"""
months = []
# NWSL regular season + playoffs
for month in range(3, 12): # March-Nov
months.append((self.season, month))
return months
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
"""Scrape games from a specific source."""
if source == "espn":
return self._scrape_espn()
else:
raise ValueError(f"Unknown source: {source}")
def _scrape_espn(self) -> list[RawGameData]:
"""Scrape games from ESPN API using date range query."""
# Build date range for entire season (March-November)
season_months = self._get_season_months()
start_year, start_month = season_months[0]
end_year, end_month = season_months[-1]
# Get last day of end month
if end_month == 12:
end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
else:
end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
start_date = date(start_year, start_month, 1)
date_range = f"{start_date.strftime('%Y%m%d')}-{end_date.strftime('%Y%m%d')}"
url = f"https://site.api.espn.com/apis/site/v2/sports/soccer/usa.nwsl/scoreboard?limit=1000&dates={date_range}"
self._logger.info(f"Fetching NWSL schedule: {date_range}")
try:
data = self.session.get_json(url)
return self._parse_espn_response(data, url)
except Exception as e:
self._logger.error(f"ESPN error: {e}")
return []
def _parse_espn_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse ESPN API response."""
games: list[RawGameData] = []
events = data.get("events", [])
for event in events:
try:
game = self._parse_espn_event(event, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse ESPN event: {e}")
continue
return games
def _parse_espn_event(
self,
event: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single ESPN event."""
# Get date
date_str = event.get("date", "")
if not date_str:
return None
try:
game_date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
except ValueError:
return None
# Get competitions
competitions = event.get("competitions", [])
if not competitions:
return None
competition = competitions[0]
# Get teams
competitors = competition.get("competitors", [])
if len(competitors) != 2:
return None
home_team = None
away_team = None
home_score = None
away_score = None
for competitor in competitors:
team_info = competitor.get("team", {})
team_name = team_info.get("displayName", "")
is_home = competitor.get("homeAway") == "home"
score = competitor.get("score")
if score:
try:
score = int(score)
except (ValueError, TypeError):
score = None
if is_home:
home_team = team_name
home_score = score
else:
away_team = team_name
away_score = score
if not home_team or not away_team:
return None
# Get venue
venue = competition.get("venue", {})
stadium = venue.get("fullName")
# Get status
status_info = competition.get("status", {})
status_type = status_info.get("type", {})
status_name = status_type.get("name", "").lower()
if status_name == "status_final":
status = "final"
elif status_name == "status_postponed":
status = "postponed"
elif status_name == "status_canceled":
status = "cancelled"
else:
status = "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _normalize_games(
self,
raw_games: list[RawGameData],
) -> tuple[list[Game], list[ManualReviewItem]]:
"""Normalize raw games to Game objects with canonical IDs."""
games: list[Game] = []
review_items: list[ManualReviewItem] = []
for raw in raw_games:
game, item_reviews = self._normalize_single_game(raw)
if game:
games.append(game)
log_game(
self.sport,
game.id,
game.home_team_id,
game.away_team_id,
game.game_date.strftime("%Y-%m-%d"),
game.status,
)
review_items.extend(item_reviews)
return games, review_items
def _normalize_single_game(
self,
raw: RawGameData,
) -> tuple[Optional[Game], list[ManualReviewItem]]:
"""Normalize a single raw game."""
review_items: list[ManualReviewItem] = []
# Resolve home team
home_result = self._team_resolver.resolve(
raw.home_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if home_result.review_item:
review_items.append(home_result.review_item)
if not home_result.canonical_id:
log_warning(f"Could not resolve home team: {raw.home_team_raw}")
return None, review_items
# Resolve away team
away_result = self._team_resolver.resolve(
raw.away_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if away_result.review_item:
review_items.append(away_result.review_item)
if not away_result.canonical_id:
log_warning(f"Could not resolve away team: {raw.away_team_raw}")
return None, review_items
# Resolve stadium
stadium_id = None
if raw.stadium_raw:
stadium_result = self._stadium_resolver.resolve(
raw.stadium_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if stadium_result.review_item:
review_items.append(stadium_result.review_item)
stadium_id = stadium_result.canonical_id
# Get abbreviations for game ID
home_abbrev = self._get_abbreviation(home_result.canonical_id)
away_abbrev = self._get_abbreviation(away_result.canonical_id)
# Generate canonical game ID
game_id = generate_game_id(
sport=self.sport,
season=self.season,
away_abbrev=away_abbrev,
home_abbrev=home_abbrev,
game_date=raw.game_date,
game_number=None,
)
game = Game(
id=game_id,
sport=self.sport,
season=self.season,
home_team_id=home_result.canonical_id,
away_team_id=away_result.canonical_id,
stadium_id=stadium_id or "",
game_date=raw.game_date,
game_number=None,
home_score=raw.home_score,
away_score=raw.away_score,
status=raw.status,
source_url=raw.source_url,
raw_home_team=raw.home_team_raw,
raw_away_team=raw.away_team_raw,
raw_stadium=raw.stadium_raw,
)
return game, review_items
def _get_abbreviation(self, team_id: str) -> str:
"""Extract abbreviation from team ID."""
parts = team_id.split("_")
return parts[-1] if parts else ""
def scrape_teams(self) -> list[Team]:
"""Get all NWSL teams from hardcoded mappings."""
teams: list[Team] = []
seen: set[str] = set()
for abbrev, (team_id, full_name, city, stadium_id) in TEAM_MAPPINGS.get("nwsl", {}).items():
if team_id in seen:
continue
seen.add(team_id)
# Parse team name
team_name = full_name
team = Team(
id=team_id,
sport="nwsl",
city=city,
name=team_name,
full_name=full_name,
abbreviation=abbrev,
conference=None, # NWSL uses single table
division=None,
stadium_id=stadium_id,
)
teams.append(team)
return teams
def scrape_stadiums(self) -> list[Stadium]:
"""Get all NWSL stadiums from hardcoded mappings."""
stadiums: list[Stadium] = []
nwsl_stadiums = STADIUM_MAPPINGS.get("nwsl", {})
for stadium_id, info in nwsl_stadiums.items():
stadium = Stadium(
id=stadium_id,
sport="nwsl",
name=info.name,
city=info.city,
state=info.state,
country=info.country,
latitude=info.latitude,
longitude=info.longitude,
timezone=info.timezone,
surface="grass",
roof_type="open",
)
stadiums.append(stadium)
return stadiums
def create_nwsl_scraper(season: int) -> NWSLScraper:
"""Factory function to create an NWSL scraper."""
return NWSLScraper(season=season)

View File

@@ -1,378 +0,0 @@
"""WNBA scraper implementation with multi-source fallback."""
from datetime import datetime, date, timedelta
from typing import Optional
from .base import BaseScraper, RawGameData, ScrapeResult
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem
from ..normalizers.canonical_id import generate_game_id
from ..normalizers.team_resolver import (
TeamResolver,
TEAM_MAPPINGS,
get_team_resolver,
)
from ..normalizers.stadium_resolver import (
StadiumResolver,
STADIUM_MAPPINGS,
get_stadium_resolver,
)
from ..utils.logging import get_logger, log_game, log_warning
class WNBAScraper(BaseScraper):
"""WNBA schedule scraper with multi-source fallback.
Sources (in priority order):
1. ESPN API - Most reliable for WNBA
Source Timezones:
- espn: UTC - ISO 8601 format with "Z" suffix
"""
def __init__(self, season: int, **kwargs):
"""Initialize WNBA scraper.
Args:
season: Season year (e.g., 2026 for 2026 season)
"""
super().__init__("wnba", season, **kwargs)
self._team_resolver = get_team_resolver("wnba")
self._stadium_resolver = get_stadium_resolver("wnba")
def _get_sources(self) -> list[str]:
"""Return source list in priority order."""
return ["espn"]
def _get_source_url(self, source: str, **kwargs) -> str:
"""Build URL for a source."""
if source == "espn":
date_str = kwargs.get("date", "")
return f"https://site.api.espn.com/apis/site/v2/sports/basketball/wnba/scoreboard?dates={date_str}"
raise ValueError(f"Unknown source: {source}")
def _get_season_months(self) -> list[tuple[int, int]]:
"""Get the months to scrape for WNBA season.
WNBA season runs May through September/October.
"""
months = []
# WNBA regular season + playoffs
for month in range(5, 11): # May-Oct
months.append((self.season, month))
return months
def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
"""Scrape games from a specific source."""
if source == "espn":
return self._scrape_espn()
else:
raise ValueError(f"Unknown source: {source}")
def _scrape_espn(self) -> list[RawGameData]:
"""Scrape games from ESPN API using date range query."""
# Build date range for entire season (May-October)
season_months = self._get_season_months()
start_year, start_month = season_months[0]
end_year, end_month = season_months[-1]
# Get last day of end month
if end_month == 12:
end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
else:
end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
start_date = date(start_year, start_month, 1)
date_range = f"{start_date.strftime('%Y%m%d')}-{end_date.strftime('%Y%m%d')}"
url = f"https://site.api.espn.com/apis/site/v2/sports/basketball/wnba/scoreboard?limit=1000&dates={date_range}"
self._logger.info(f"Fetching WNBA schedule: {date_range}")
try:
data = self.session.get_json(url)
return self._parse_espn_response(data, url)
except Exception as e:
self._logger.error(f"ESPN error: {e}")
return []
def _parse_espn_response(
self,
data: dict,
source_url: str,
) -> list[RawGameData]:
"""Parse ESPN API response."""
games: list[RawGameData] = []
events = data.get("events", [])
for event in events:
try:
game = self._parse_espn_event(event, source_url)
if game:
games.append(game)
except Exception as e:
self._logger.debug(f"Failed to parse ESPN event: {e}")
continue
return games
def _parse_espn_event(
self,
event: dict,
source_url: str,
) -> Optional[RawGameData]:
"""Parse a single ESPN event."""
# Get date
date_str = event.get("date", "")
if not date_str:
return None
try:
game_date = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
except ValueError:
return None
# Get competitions
competitions = event.get("competitions", [])
if not competitions:
return None
competition = competitions[0]
# Get teams
competitors = competition.get("competitors", [])
if len(competitors) != 2:
return None
home_team = None
away_team = None
home_score = None
away_score = None
for competitor in competitors:
team_info = competitor.get("team", {})
team_name = team_info.get("displayName", "")
is_home = competitor.get("homeAway") == "home"
score = competitor.get("score")
if score:
try:
score = int(score)
except (ValueError, TypeError):
score = None
if is_home:
home_team = team_name
home_score = score
else:
away_team = team_name
away_score = score
if not home_team or not away_team:
return None
# Get venue
venue = competition.get("venue", {})
stadium = venue.get("fullName")
# Get status
status_info = competition.get("status", {})
status_type = status_info.get("type", {})
status_name = status_type.get("name", "").lower()
if status_name == "status_final":
status = "final"
elif status_name == "status_postponed":
status = "postponed"
elif status_name == "status_canceled":
status = "cancelled"
else:
status = "scheduled"
return RawGameData(
game_date=game_date,
home_team_raw=home_team,
away_team_raw=away_team,
stadium_raw=stadium,
home_score=home_score,
away_score=away_score,
status=status,
source_url=source_url,
)
def _normalize_games(
self,
raw_games: list[RawGameData],
) -> tuple[list[Game], list[ManualReviewItem]]:
"""Normalize raw games to Game objects with canonical IDs."""
games: list[Game] = []
review_items: list[ManualReviewItem] = []
for raw in raw_games:
game, item_reviews = self._normalize_single_game(raw)
if game:
games.append(game)
log_game(
self.sport,
game.id,
game.home_team_id,
game.away_team_id,
game.game_date.strftime("%Y-%m-%d"),
game.status,
)
review_items.extend(item_reviews)
return games, review_items
def _normalize_single_game(
self,
raw: RawGameData,
) -> tuple[Optional[Game], list[ManualReviewItem]]:
"""Normalize a single raw game."""
review_items: list[ManualReviewItem] = []
# Resolve home team
home_result = self._team_resolver.resolve(
raw.home_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if home_result.review_item:
review_items.append(home_result.review_item)
if not home_result.canonical_id:
log_warning(f"Could not resolve home team: {raw.home_team_raw}")
return None, review_items
# Resolve away team
away_result = self._team_resolver.resolve(
raw.away_team_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if away_result.review_item:
review_items.append(away_result.review_item)
if not away_result.canonical_id:
log_warning(f"Could not resolve away team: {raw.away_team_raw}")
return None, review_items
# Resolve stadium
stadium_id = None
if raw.stadium_raw:
stadium_result = self._stadium_resolver.resolve(
raw.stadium_raw,
check_date=raw.game_date.date(),
source_url=raw.source_url,
)
if stadium_result.review_item:
review_items.append(stadium_result.review_item)
stadium_id = stadium_result.canonical_id
# Get abbreviations for game ID
home_abbrev = self._get_abbreviation(home_result.canonical_id)
away_abbrev = self._get_abbreviation(away_result.canonical_id)
# Generate canonical game ID
game_id = generate_game_id(
sport=self.sport,
season=self.season,
away_abbrev=away_abbrev,
home_abbrev=home_abbrev,
game_date=raw.game_date,
game_number=None,
)
game = Game(
id=game_id,
sport=self.sport,
season=self.season,
home_team_id=home_result.canonical_id,
away_team_id=away_result.canonical_id,
stadium_id=stadium_id or "",
game_date=raw.game_date,
game_number=None,
home_score=raw.home_score,
away_score=raw.away_score,
status=raw.status,
source_url=raw.source_url,
raw_home_team=raw.home_team_raw,
raw_away_team=raw.away_team_raw,
raw_stadium=raw.stadium_raw,
)
return game, review_items
def _get_abbreviation(self, team_id: str) -> str:
"""Extract abbreviation from team ID."""
parts = team_id.split("_")
return parts[-1] if parts else ""
def scrape_teams(self) -> list[Team]:
"""Get all WNBA teams from hardcoded mappings."""
teams: list[Team] = []
seen: set[str] = set()
for abbrev, (team_id, full_name, city, stadium_id) in TEAM_MAPPINGS.get("wnba", {}).items():
if team_id in seen:
continue
seen.add(team_id)
# Parse team name
parts = full_name.split()
team_name = parts[-1] if parts else full_name
team = Team(
id=team_id,
sport="wnba",
city=city,
name=team_name,
full_name=full_name,
abbreviation=abbrev,
conference=None, # WNBA uses single table now
division=None,
stadium_id=stadium_id,
)
teams.append(team)
return teams
def scrape_stadiums(self) -> list[Stadium]:
"""Get all WNBA stadiums from hardcoded mappings."""
stadiums: list[Stadium] = []
wnba_stadiums = STADIUM_MAPPINGS.get("wnba", {})
for stadium_id, info in wnba_stadiums.items():
stadium = Stadium(
id=stadium_id,
sport="wnba",
name=info.name,
city=info.city,
state=info.state,
country=info.country,
latitude=info.latitude,
longitude=info.longitude,
timezone=info.timezone,
surface="hardwood",
roof_type="dome",
)
stadiums.append(stadium)
return stadiums
def create_wnba_scraper(season: int) -> WNBAScraper:
"""Factory function to create a WNBA scraper."""
return WNBAScraper(season=season)

View File

@@ -1 +0,0 @@
"""Unit tests for sportstime_parser."""

View File

@@ -1,48 +0,0 @@
"""Test fixtures for sportstime-parser tests."""
from pathlib import Path
FIXTURES_DIR = Path(__file__).parent
# NBA fixtures
NBA_FIXTURES_DIR = FIXTURES_DIR / "nba"
NBA_BR_OCTOBER_HTML = NBA_FIXTURES_DIR / "basketball_reference_october.html"
NBA_BR_EDGE_CASES_HTML = NBA_FIXTURES_DIR / "basketball_reference_edge_cases.html"
NBA_ESPN_SCOREBOARD_JSON = NBA_FIXTURES_DIR / "espn_scoreboard.json"
# MLB fixtures
MLB_FIXTURES_DIR = FIXTURES_DIR / "mlb"
MLB_ESPN_SCOREBOARD_JSON = MLB_FIXTURES_DIR / "espn_scoreboard.json"
# NFL fixtures
NFL_FIXTURES_DIR = FIXTURES_DIR / "nfl"
NFL_ESPN_SCOREBOARD_JSON = NFL_FIXTURES_DIR / "espn_scoreboard.json"
# NHL fixtures
NHL_FIXTURES_DIR = FIXTURES_DIR / "nhl"
NHL_ESPN_SCOREBOARD_JSON = NHL_FIXTURES_DIR / "espn_scoreboard.json"
# MLS fixtures
MLS_FIXTURES_DIR = FIXTURES_DIR / "mls"
MLS_ESPN_SCOREBOARD_JSON = MLS_FIXTURES_DIR / "espn_scoreboard.json"
# WNBA fixtures
WNBA_FIXTURES_DIR = FIXTURES_DIR / "wnba"
WNBA_ESPN_SCOREBOARD_JSON = WNBA_FIXTURES_DIR / "espn_scoreboard.json"
# NWSL fixtures
NWSL_FIXTURES_DIR = FIXTURES_DIR / "nwsl"
NWSL_ESPN_SCOREBOARD_JSON = NWSL_FIXTURES_DIR / "espn_scoreboard.json"
def load_fixture(path: Path) -> str:
"""Load a fixture file as text."""
with open(path, "r", encoding="utf-8") as f:
return f.read()
def load_json_fixture(path: Path) -> dict:
"""Load a JSON fixture file."""
import json
with open(path, "r", encoding="utf-8") as f:
return json.load(f)

View File

@@ -1,245 +0,0 @@
{
"leagues": [
{
"id": "10",
"uid": "s:1~l:10",
"name": "Major League Baseball",
"abbreviation": "MLB"
}
],
"season": {
"type": 2,
"year": 2026
},
"day": {
"date": "2026-04-15T00:00:00Z"
},
"events": [
{
"id": "401584801",
"uid": "s:1~l:10~e:401584801",
"date": "2026-04-15T23:05:00Z",
"name": "New York Yankees at Boston Red Sox",
"shortName": "NYY @ BOS",
"competitions": [
{
"id": "401584801",
"uid": "s:1~l:10~e:401584801~c:401584801",
"date": "2026-04-15T23:05:00Z",
"attendance": 37435,
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "3",
"fullName": "Fenway Park",
"address": {
"city": "Boston",
"state": "MA"
},
"capacity": 37755,
"indoor": false
},
"competitors": [
{
"id": "2",
"uid": "s:1~l:10~t:2",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "2",
"uid": "s:1~l:10~t:2",
"location": "Boston",
"name": "Red Sox",
"abbreviation": "BOS",
"displayName": "Boston Red Sox"
},
"score": "5",
"winner": true
},
{
"id": "10",
"uid": "s:1~l:10~t:10",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "10",
"uid": "s:1~l:10~t:10",
"location": "New York",
"name": "Yankees",
"abbreviation": "NYY",
"displayName": "New York Yankees"
},
"score": "3",
"winner": false
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 9,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401584802",
"uid": "s:1~l:10~e:401584802",
"date": "2026-04-15T20:10:00Z",
"name": "Chicago Cubs at St. Louis Cardinals",
"shortName": "CHC @ STL",
"competitions": [
{
"id": "401584802",
"uid": "s:1~l:10~e:401584802~c:401584802",
"date": "2026-04-15T20:10:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "87",
"fullName": "Busch Stadium",
"address": {
"city": "St. Louis",
"state": "MO"
},
"capacity": 45538,
"indoor": false
},
"competitors": [
{
"id": "24",
"uid": "s:1~l:10~t:24",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "24",
"uid": "s:1~l:10~t:24",
"location": "St. Louis",
"name": "Cardinals",
"abbreviation": "STL",
"displayName": "St. Louis Cardinals"
},
"score": "7",
"winner": true
},
{
"id": "16",
"uid": "s:1~l:10~t:16",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "16",
"uid": "s:1~l:10~t:16",
"location": "Chicago",
"name": "Cubs",
"abbreviation": "CHC",
"displayName": "Chicago Cubs"
},
"score": "4",
"winner": false
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 9,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401584803",
"uid": "s:1~l:10~e:401584803",
"date": "2026-04-16T00:10:00Z",
"name": "Los Angeles Dodgers at San Francisco Giants",
"shortName": "LAD @ SF",
"competitions": [
{
"id": "401584803",
"uid": "s:1~l:10~e:401584803~c:401584803",
"date": "2026-04-16T00:10:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "116",
"fullName": "Oracle Park",
"address": {
"city": "San Francisco",
"state": "CA"
},
"capacity": 41915,
"indoor": false
},
"competitors": [
{
"id": "26",
"uid": "s:1~l:10~t:26",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "26",
"uid": "s:1~l:10~t:26",
"location": "San Francisco",
"name": "Giants",
"abbreviation": "SF",
"displayName": "San Francisco Giants"
},
"score": null,
"winner": null
},
{
"id": "19",
"uid": "s:1~l:10~t:19",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "19",
"uid": "s:1~l:10~t:19",
"location": "Los Angeles",
"name": "Dodgers",
"abbreviation": "LAD",
"displayName": "Los Angeles Dodgers"
},
"score": null,
"winner": null
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 0,
"type": {
"id": "1",
"name": "STATUS_SCHEDULED",
"state": "pre",
"completed": false
}
}
}
]
}
]
}

View File

@@ -1,245 +0,0 @@
{
"leagues": [
{
"id": "19",
"uid": "s:600~l:19",
"name": "Major League Soccer",
"abbreviation": "MLS"
}
],
"season": {
"type": 2,
"year": 2026
},
"day": {
"date": "2026-03-15T00:00:00Z"
},
"events": [
{
"id": "401672001",
"uid": "s:600~l:19~e:401672001",
"date": "2026-03-15T22:00:00Z",
"name": "LA Galaxy at LAFC",
"shortName": "LA @ LAFC",
"competitions": [
{
"id": "401672001",
"uid": "s:600~l:19~e:401672001~c:401672001",
"date": "2026-03-15T22:00:00Z",
"attendance": 22000,
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "8909",
"fullName": "BMO Stadium",
"address": {
"city": "Los Angeles",
"state": "CA"
},
"capacity": 22000,
"indoor": false
},
"competitors": [
{
"id": "21295",
"uid": "s:600~l:19~t:21295",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "21295",
"uid": "s:600~l:19~t:21295",
"location": "Los Angeles",
"name": "FC",
"abbreviation": "LAFC",
"displayName": "Los Angeles FC"
},
"score": "3",
"winner": true
},
{
"id": "3610",
"uid": "s:600~l:19~t:3610",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "3610",
"uid": "s:600~l:19~t:3610",
"location": "Los Angeles",
"name": "Galaxy",
"abbreviation": "LA",
"displayName": "LA Galaxy"
},
"score": "2",
"winner": false
}
],
"status": {
"clock": 90,
"displayClock": "90'",
"period": 2,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401672002",
"uid": "s:600~l:19~e:401672002",
"date": "2026-03-15T23:00:00Z",
"name": "Seattle Sounders at Portland Timbers",
"shortName": "SEA @ POR",
"competitions": [
{
"id": "401672002",
"uid": "s:600~l:19~e:401672002~c:401672002",
"date": "2026-03-15T23:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "8070",
"fullName": "Providence Park",
"address": {
"city": "Portland",
"state": "OR"
},
"capacity": 25218,
"indoor": false
},
"competitors": [
{
"id": "5282",
"uid": "s:600~l:19~t:5282",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "5282",
"uid": "s:600~l:19~t:5282",
"location": "Portland",
"name": "Timbers",
"abbreviation": "POR",
"displayName": "Portland Timbers"
},
"score": "2",
"winner": false
},
{
"id": "4687",
"uid": "s:600~l:19~t:4687",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "4687",
"uid": "s:600~l:19~t:4687",
"location": "Seattle",
"name": "Sounders FC",
"abbreviation": "SEA",
"displayName": "Seattle Sounders FC"
},
"score": "2",
"winner": false
}
],
"status": {
"clock": 90,
"displayClock": "90'",
"period": 2,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401672003",
"uid": "s:600~l:19~e:401672003",
"date": "2026-03-16T00:00:00Z",
"name": "New York Red Bulls at Atlanta United",
"shortName": "NY @ ATL",
"competitions": [
{
"id": "401672003",
"uid": "s:600~l:19~e:401672003~c:401672003",
"date": "2026-03-16T00:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "8904",
"fullName": "Mercedes-Benz Stadium",
"address": {
"city": "Atlanta",
"state": "GA"
},
"capacity": 42500,
"indoor": true
},
"competitors": [
{
"id": "18626",
"uid": "s:600~l:19~t:18626",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "18626",
"uid": "s:600~l:19~t:18626",
"location": "Atlanta",
"name": "United FC",
"abbreviation": "ATL",
"displayName": "Atlanta United FC"
},
"score": null,
"winner": null
},
{
"id": "399",
"uid": "s:600~l:19~t:399",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "399",
"uid": "s:600~l:19~t:399",
"location": "New York",
"name": "Red Bulls",
"abbreviation": "NY",
"displayName": "New York Red Bulls"
},
"score": null,
"winner": null
}
],
"status": {
"clock": 0,
"displayClock": "0'",
"period": 0,
"type": {
"id": "1",
"name": "STATUS_SCHEDULED",
"state": "pre",
"completed": false
}
}
}
]
}
]
}

View File

@@ -1,79 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<title>2025-26 NBA Schedule - Edge Cases | Basketball-Reference.com</title>
</head>
<body>
<table id="schedule" class="stats_table">
<thead>
<tr>
<th data-stat="date_game">Date</th>
<th data-stat="game_start_time">Start (ET)</th>
<th data-stat="visitor_team_name">Visitor/Neutral</th>
<th data-stat="visitor_pts">PTS</th>
<th data-stat="home_team_name">Home/Neutral</th>
<th data-stat="home_pts">PTS</th>
<th data-stat="arena_name">Arena</th>
<th data-stat="game_remarks">Notes</th>
</tr>
</thead>
<tbody>
<!-- Postponed game -->
<tr>
<th data-stat="date_game">Sat, Jan 11, 2026</th>
<td data-stat="game_start_time">7:30p</td>
<td data-stat="visitor_team_name">Los Angeles Lakers</td>
<td data-stat="visitor_pts"></td>
<td data-stat="home_team_name">Phoenix Suns</td>
<td data-stat="home_pts"></td>
<td data-stat="arena_name">Footprint Center</td>
<td data-stat="game_remarks">Postponed - Weather</td>
</tr>
<!-- Neutral site game (Mexico City) -->
<tr>
<th data-stat="date_game">Sat, Nov 8, 2025</th>
<td data-stat="game_start_time">7:00p</td>
<td data-stat="visitor_team_name">Miami Heat</td>
<td data-stat="visitor_pts">105</td>
<td data-stat="home_team_name">Washington Wizards</td>
<td data-stat="home_pts">99</td>
<td data-stat="arena_name">Arena CDMX</td>
<td data-stat="game_remarks">NBA Mexico City Games</td>
</tr>
<!-- Cancelled game -->
<tr>
<th data-stat="date_game">Wed, Dec 3, 2025</th>
<td data-stat="game_start_time">8:00p</td>
<td data-stat="visitor_team_name">Portland Trail Blazers</td>
<td data-stat="visitor_pts"></td>
<td data-stat="home_team_name">Sacramento Kings</td>
<td data-stat="home_pts"></td>
<td data-stat="arena_name">Golden 1 Center</td>
<td data-stat="game_remarks">Cancelled</td>
</tr>
<!-- Regular completed game with high scores -->
<tr>
<th data-stat="date_game">Sun, Mar 15, 2026</th>
<td data-stat="game_start_time">3:30p</td>
<td data-stat="visitor_team_name">Indiana Pacers</td>
<td data-stat="visitor_pts">147</td>
<td data-stat="home_team_name">Atlanta Hawks</td>
<td data-stat="home_pts">150</td>
<td data-stat="arena_name">State Farm Arena</td>
<td data-stat="game_remarks">OT</td>
</tr>
<!-- Game at arena with special characters -->
<tr>
<th data-stat="date_game">Mon, Feb 2, 2026</th>
<td data-stat="game_start_time">10:30p</td>
<td data-stat="visitor_team_name">Golden State Warriors</td>
<td data-stat="visitor_pts">118</td>
<td data-stat="home_team_name">Los Angeles Clippers</td>
<td data-stat="home_pts">115</td>
<td data-stat="arena_name">Intuit Dome</td>
<td data-stat="game_remarks"></td>
</tr>
</tbody>
</table>
</body>
</html>

View File

@@ -1,94 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<title>2025-26 NBA Schedule - October | Basketball-Reference.com</title>
</head>
<body>
<table id="schedule" class="stats_table">
<thead>
<tr>
<th data-stat="date_game">Date</th>
<th data-stat="game_start_time">Start (ET)</th>
<th data-stat="visitor_team_name">Visitor/Neutral</th>
<th data-stat="visitor_pts">PTS</th>
<th data-stat="home_team_name">Home/Neutral</th>
<th data-stat="home_pts">PTS</th>
<th data-stat="arena_name">Arena</th>
<th data-stat="game_remarks">Notes</th>
</tr>
</thead>
<tbody>
<tr>
<th data-stat="date_game">Tue, Oct 22, 2025</th>
<td data-stat="game_start_time">7:30p</td>
<td data-stat="visitor_team_name">Boston Celtics</td>
<td data-stat="visitor_pts">112</td>
<td data-stat="home_team_name">Cleveland Cavaliers</td>
<td data-stat="home_pts">108</td>
<td data-stat="arena_name">Rocket Mortgage FieldHouse</td>
<td data-stat="game_remarks"></td>
</tr>
<tr>
<th data-stat="date_game">Tue, Oct 22, 2025</th>
<td data-stat="game_start_time">10:00p</td>
<td data-stat="visitor_team_name">Denver Nuggets</td>
<td data-stat="visitor_pts">119</td>
<td data-stat="home_team_name">Los Angeles Lakers</td>
<td data-stat="home_pts">127</td>
<td data-stat="arena_name">Crypto.com Arena</td>
<td data-stat="game_remarks"></td>
</tr>
<tr>
<th data-stat="date_game">Wed, Oct 23, 2025</th>
<td data-stat="game_start_time">7:00p</td>
<td data-stat="visitor_team_name">Houston Rockets</td>
<td data-stat="visitor_pts"></td>
<td data-stat="home_team_name">Oklahoma City Thunder</td>
<td data-stat="home_pts"></td>
<td data-stat="arena_name">Paycom Center</td>
<td data-stat="game_remarks"></td>
</tr>
<tr>
<th data-stat="date_game">Wed, Oct 23, 2025</th>
<td data-stat="game_start_time">7:30p</td>
<td data-stat="visitor_team_name">New York Knicks</td>
<td data-stat="visitor_pts"></td>
<td data-stat="home_team_name">Brooklyn Nets</td>
<td data-stat="home_pts"></td>
<td data-stat="arena_name">Barclays Center</td>
<td data-stat="game_remarks"></td>
</tr>
<tr>
<th data-stat="date_game">Thu, Oct 24, 2025</th>
<td data-stat="game_start_time">7:00p</td>
<td data-stat="visitor_team_name">Chicago Bulls</td>
<td data-stat="visitor_pts"></td>
<td data-stat="home_team_name">Miami Heat</td>
<td data-stat="home_pts"></td>
<td data-stat="arena_name">Kaseya Center</td>
<td data-stat="game_remarks"></td>
</tr>
<tr>
<th data-stat="date_game">Fri, Oct 25, 2025</th>
<td data-stat="game_start_time">7:30p</td>
<td data-stat="visitor_team_name">Toronto Raptors</td>
<td data-stat="visitor_pts"></td>
<td data-stat="home_team_name">Boston Celtics</td>
<td data-stat="home_pts"></td>
<td data-stat="arena_name">TD Garden</td>
<td data-stat="game_remarks"></td>
</tr>
<tr>
<th data-stat="date_game">Sat, Oct 26, 2025</th>
<td data-stat="game_start_time">8:00p</td>
<td data-stat="visitor_team_name">Minnesota Timberwolves</td>
<td data-stat="visitor_pts"></td>
<td data-stat="home_team_name">Dallas Mavericks</td>
<td data-stat="home_pts"></td>
<td data-stat="arena_name">American Airlines Center</td>
<td data-stat="game_remarks"></td>
</tr>
</tbody>
</table>
</body>
</html>

View File

@@ -1,245 +0,0 @@
{
"leagues": [
{
"id": "46",
"uid": "s:40~l:46",
"name": "National Basketball Association",
"abbreviation": "NBA"
}
],
"season": {
"type": 2,
"year": 2026
},
"day": {
"date": "2025-10-22T00:00:00Z"
},
"events": [
{
"id": "401584721",
"uid": "s:40~l:46~e:401584721",
"date": "2025-10-22T23:30:00Z",
"name": "Boston Celtics at Cleveland Cavaliers",
"shortName": "BOS @ CLE",
"competitions": [
{
"id": "401584721",
"uid": "s:40~l:46~e:401584721~c:401584721",
"date": "2025-10-22T23:30:00Z",
"attendance": 20562,
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "5064",
"fullName": "Rocket Mortgage FieldHouse",
"address": {
"city": "Cleveland",
"state": "OH"
},
"capacity": 19432,
"indoor": true
},
"competitors": [
{
"id": "5",
"uid": "s:40~l:46~t:5",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "5",
"uid": "s:40~l:46~t:5",
"location": "Cleveland",
"name": "Cavaliers",
"abbreviation": "CLE",
"displayName": "Cleveland Cavaliers"
},
"score": "108",
"winner": false
},
{
"id": "2",
"uid": "s:40~l:46~t:2",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "2",
"uid": "s:40~l:46~t:2",
"location": "Boston",
"name": "Celtics",
"abbreviation": "BOS",
"displayName": "Boston Celtics"
},
"score": "112",
"winner": true
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 4,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401584722",
"uid": "s:40~l:46~e:401584722",
"date": "2025-10-23T02:00:00Z",
"name": "Denver Nuggets at Los Angeles Lakers",
"shortName": "DEN @ LAL",
"competitions": [
{
"id": "401584722",
"uid": "s:40~l:46~e:401584722~c:401584722",
"date": "2025-10-23T02:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "5091",
"fullName": "Crypto.com Arena",
"address": {
"city": "Los Angeles",
"state": "CA"
},
"capacity": 19068,
"indoor": true
},
"competitors": [
{
"id": "13",
"uid": "s:40~l:46~t:13",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "13",
"uid": "s:40~l:46~t:13",
"location": "Los Angeles",
"name": "Lakers",
"abbreviation": "LAL",
"displayName": "Los Angeles Lakers"
},
"score": "127",
"winner": true
},
{
"id": "7",
"uid": "s:40~l:46~t:7",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "7",
"uid": "s:40~l:46~t:7",
"location": "Denver",
"name": "Nuggets",
"abbreviation": "DEN",
"displayName": "Denver Nuggets"
},
"score": "119",
"winner": false
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 4,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401584723",
"uid": "s:40~l:46~e:401584723",
"date": "2025-10-24T00:00:00Z",
"name": "Houston Rockets at Oklahoma City Thunder",
"shortName": "HOU @ OKC",
"competitions": [
{
"id": "401584723",
"uid": "s:40~l:46~e:401584723~c:401584723",
"date": "2025-10-24T00:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "4922",
"fullName": "Paycom Center",
"address": {
"city": "Oklahoma City",
"state": "OK"
},
"capacity": 18203,
"indoor": true
},
"competitors": [
{
"id": "25",
"uid": "s:40~l:46~t:25",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "25",
"uid": "s:40~l:46~t:25",
"location": "Oklahoma City",
"name": "Thunder",
"abbreviation": "OKC",
"displayName": "Oklahoma City Thunder"
},
"score": null,
"winner": null
},
{
"id": "10",
"uid": "s:40~l:46~t:10",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "10",
"uid": "s:40~l:46~t:10",
"location": "Houston",
"name": "Rockets",
"abbreviation": "HOU",
"displayName": "Houston Rockets"
},
"score": null,
"winner": null
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 0,
"type": {
"id": "1",
"name": "STATUS_SCHEDULED",
"state": "pre",
"completed": false
}
}
}
]
}
]
}

View File

@@ -1,245 +0,0 @@
{
"leagues": [
{
"id": "28",
"uid": "s:20~l:28",
"name": "National Football League",
"abbreviation": "NFL"
}
],
"season": {
"type": 2,
"year": 2025
},
"week": {
"number": 1
},
"events": [
{
"id": "401671801",
"uid": "s:20~l:28~e:401671801",
"date": "2025-09-07T20:00:00Z",
"name": "Kansas City Chiefs at Baltimore Ravens",
"shortName": "KC @ BAL",
"competitions": [
{
"id": "401671801",
"uid": "s:20~l:28~e:401671801~c:401671801",
"date": "2025-09-07T20:00:00Z",
"attendance": 71547,
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "3814",
"fullName": "M&T Bank Stadium",
"address": {
"city": "Baltimore",
"state": "MD"
},
"capacity": 71008,
"indoor": false
},
"competitors": [
{
"id": "33",
"uid": "s:20~l:28~t:33",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "33",
"uid": "s:20~l:28~t:33",
"location": "Baltimore",
"name": "Ravens",
"abbreviation": "BAL",
"displayName": "Baltimore Ravens"
},
"score": "20",
"winner": false
},
{
"id": "12",
"uid": "s:20~l:28~t:12",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "12",
"uid": "s:20~l:28~t:12",
"location": "Kansas City",
"name": "Chiefs",
"abbreviation": "KC",
"displayName": "Kansas City Chiefs"
},
"score": "27",
"winner": true
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 4,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401671802",
"uid": "s:20~l:28~e:401671802",
"date": "2025-09-08T17:00:00Z",
"name": "Philadelphia Eagles at Green Bay Packers",
"shortName": "PHI @ GB",
"competitions": [
{
"id": "401671802",
"uid": "s:20~l:28~e:401671802~c:401671802",
"date": "2025-09-08T17:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "3798",
"fullName": "Lambeau Field",
"address": {
"city": "Green Bay",
"state": "WI"
},
"capacity": 81441,
"indoor": false
},
"competitors": [
{
"id": "9",
"uid": "s:20~l:28~t:9",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "9",
"uid": "s:20~l:28~t:9",
"location": "Green Bay",
"name": "Packers",
"abbreviation": "GB",
"displayName": "Green Bay Packers"
},
"score": "34",
"winner": true
},
{
"id": "21",
"uid": "s:20~l:28~t:21",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "21",
"uid": "s:20~l:28~t:21",
"location": "Philadelphia",
"name": "Eagles",
"abbreviation": "PHI",
"displayName": "Philadelphia Eagles"
},
"score": "29",
"winner": false
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 4,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401671803",
"uid": "s:20~l:28~e:401671803",
"date": "2025-09-08T20:25:00Z",
"name": "Dallas Cowboys at Cleveland Browns",
"shortName": "DAL @ CLE",
"competitions": [
{
"id": "401671803",
"uid": "s:20~l:28~e:401671803~c:401671803",
"date": "2025-09-08T20:25:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "3653",
"fullName": "Cleveland Browns Stadium",
"address": {
"city": "Cleveland",
"state": "OH"
},
"capacity": 67431,
"indoor": false
},
"competitors": [
{
"id": "5",
"uid": "s:20~l:28~t:5",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "5",
"uid": "s:20~l:28~t:5",
"location": "Cleveland",
"name": "Browns",
"abbreviation": "CLE",
"displayName": "Cleveland Browns"
},
"score": null,
"winner": null
},
{
"id": "6",
"uid": "s:20~l:28~t:6",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "6",
"uid": "s:20~l:28~t:6",
"location": "Dallas",
"name": "Cowboys",
"abbreviation": "DAL",
"displayName": "Dallas Cowboys"
},
"score": null,
"winner": null
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 0,
"type": {
"id": "1",
"name": "STATUS_SCHEDULED",
"state": "pre",
"completed": false
}
}
}
]
}
]
}

View File

@@ -1,245 +0,0 @@
{
"leagues": [
{
"id": "90",
"uid": "s:70~l:90",
"name": "National Hockey League",
"abbreviation": "NHL"
}
],
"season": {
"type": 2,
"year": 2026
},
"day": {
"date": "2025-10-08T00:00:00Z"
},
"events": [
{
"id": "401671901",
"uid": "s:70~l:90~e:401671901",
"date": "2025-10-08T23:00:00Z",
"name": "Pittsburgh Penguins at Boston Bruins",
"shortName": "PIT @ BOS",
"competitions": [
{
"id": "401671901",
"uid": "s:70~l:90~e:401671901~c:401671901",
"date": "2025-10-08T23:00:00Z",
"attendance": 17850,
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "1823",
"fullName": "TD Garden",
"address": {
"city": "Boston",
"state": "MA"
},
"capacity": 17850,
"indoor": true
},
"competitors": [
{
"id": "1",
"uid": "s:70~l:90~t:1",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "1",
"uid": "s:70~l:90~t:1",
"location": "Boston",
"name": "Bruins",
"abbreviation": "BOS",
"displayName": "Boston Bruins"
},
"score": "4",
"winner": true
},
{
"id": "5",
"uid": "s:70~l:90~t:5",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "5",
"uid": "s:70~l:90~t:5",
"location": "Pittsburgh",
"name": "Penguins",
"abbreviation": "PIT",
"displayName": "Pittsburgh Penguins"
},
"score": "2",
"winner": false
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 3,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401671902",
"uid": "s:70~l:90~e:401671902",
"date": "2025-10-09T00:00:00Z",
"name": "Toronto Maple Leafs at Montreal Canadiens",
"shortName": "TOR @ MTL",
"competitions": [
{
"id": "401671902",
"uid": "s:70~l:90~e:401671902~c:401671902",
"date": "2025-10-09T00:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "1918",
"fullName": "Bell Centre",
"address": {
"city": "Montreal",
"state": "QC"
},
"capacity": 21302,
"indoor": true
},
"competitors": [
{
"id": "8",
"uid": "s:70~l:90~t:8",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "8",
"uid": "s:70~l:90~t:8",
"location": "Montreal",
"name": "Canadiens",
"abbreviation": "MTL",
"displayName": "Montreal Canadiens"
},
"score": "3",
"winner": false
},
{
"id": "10",
"uid": "s:70~l:90~t:10",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "10",
"uid": "s:70~l:90~t:10",
"location": "Toronto",
"name": "Maple Leafs",
"abbreviation": "TOR",
"displayName": "Toronto Maple Leafs"
},
"score": "5",
"winner": true
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 3,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401671903",
"uid": "s:70~l:90~e:401671903",
"date": "2025-10-09T02:00:00Z",
"name": "Vegas Golden Knights at Los Angeles Kings",
"shortName": "VGK @ LAK",
"competitions": [
{
"id": "401671903",
"uid": "s:70~l:90~e:401671903~c:401671903",
"date": "2025-10-09T02:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "1816",
"fullName": "Crypto.com Arena",
"address": {
"city": "Los Angeles",
"state": "CA"
},
"capacity": 18230,
"indoor": true
},
"competitors": [
{
"id": "26",
"uid": "s:70~l:90~t:26",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "26",
"uid": "s:70~l:90~t:26",
"location": "Los Angeles",
"name": "Kings",
"abbreviation": "LAK",
"displayName": "Los Angeles Kings"
},
"score": null,
"winner": null
},
{
"id": "54",
"uid": "s:70~l:90~t:54",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "54",
"uid": "s:70~l:90~t:54",
"location": "Vegas",
"name": "Golden Knights",
"abbreviation": "VGK",
"displayName": "Vegas Golden Knights"
},
"score": null,
"winner": null
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 0,
"type": {
"id": "1",
"name": "STATUS_SCHEDULED",
"state": "pre",
"completed": false
}
}
}
]
}
]
}

View File

@@ -1,245 +0,0 @@
{
"leagues": [
{
"id": "761",
"uid": "s:600~l:761",
"name": "National Women's Soccer League",
"abbreviation": "NWSL"
}
],
"season": {
"type": 2,
"year": 2026
},
"day": {
"date": "2026-04-10T00:00:00Z"
},
"events": [
{
"id": "401672201",
"uid": "s:600~l:761~e:401672201",
"date": "2026-04-10T23:00:00Z",
"name": "Angel City FC at Portland Thorns",
"shortName": "LA @ POR",
"competitions": [
{
"id": "401672201",
"uid": "s:600~l:761~e:401672201~c:401672201",
"date": "2026-04-10T23:00:00Z",
"attendance": 22000,
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "8070",
"fullName": "Providence Park",
"address": {
"city": "Portland",
"state": "OR"
},
"capacity": 25218,
"indoor": false
},
"competitors": [
{
"id": "15625",
"uid": "s:600~l:761~t:15625",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "15625",
"uid": "s:600~l:761~t:15625",
"location": "Portland",
"name": "Thorns FC",
"abbreviation": "POR",
"displayName": "Portland Thorns FC"
},
"score": "2",
"winner": true
},
{
"id": "19934",
"uid": "s:600~l:761~t:19934",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "19934",
"uid": "s:600~l:761~t:19934",
"location": "Los Angeles",
"name": "Angel City",
"abbreviation": "LA",
"displayName": "Angel City FC"
},
"score": "1",
"winner": false
}
],
"status": {
"clock": 90,
"displayClock": "90'",
"period": 2,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401672202",
"uid": "s:600~l:761~e:401672202",
"date": "2026-04-11T00:00:00Z",
"name": "Orlando Pride at North Carolina Courage",
"shortName": "ORL @ NC",
"competitions": [
{
"id": "401672202",
"uid": "s:600~l:761~e:401672202~c:401672202",
"date": "2026-04-11T00:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "8073",
"fullName": "WakeMed Soccer Park",
"address": {
"city": "Cary",
"state": "NC"
},
"capacity": 10000,
"indoor": false
},
"competitors": [
{
"id": "15618",
"uid": "s:600~l:761~t:15618",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "15618",
"uid": "s:600~l:761~t:15618",
"location": "North Carolina",
"name": "Courage",
"abbreviation": "NC",
"displayName": "North Carolina Courage"
},
"score": "3",
"winner": true
},
{
"id": "15626",
"uid": "s:600~l:761~t:15626",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "15626",
"uid": "s:600~l:761~t:15626",
"location": "Orlando",
"name": "Pride",
"abbreviation": "ORL",
"displayName": "Orlando Pride"
},
"score": "1",
"winner": false
}
],
"status": {
"clock": 90,
"displayClock": "90'",
"period": 2,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401672203",
"uid": "s:600~l:761~e:401672203",
"date": "2026-04-11T02:00:00Z",
"name": "San Diego Wave at Bay FC",
"shortName": "SD @ BAY",
"competitions": [
{
"id": "401672203",
"uid": "s:600~l:761~e:401672203~c:401672203",
"date": "2026-04-11T02:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "3945",
"fullName": "PayPal Park",
"address": {
"city": "San Jose",
"state": "CA"
},
"capacity": 18000,
"indoor": false
},
"competitors": [
{
"id": "25645",
"uid": "s:600~l:761~t:25645",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "25645",
"uid": "s:600~l:761~t:25645",
"location": "Bay Area",
"name": "FC",
"abbreviation": "BAY",
"displayName": "Bay FC"
},
"score": null,
"winner": null
},
{
"id": "22638",
"uid": "s:600~l:761~t:22638",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "22638",
"uid": "s:600~l:761~t:22638",
"location": "San Diego",
"name": "Wave FC",
"abbreviation": "SD",
"displayName": "San Diego Wave FC"
},
"score": null,
"winner": null
}
],
"status": {
"clock": 0,
"displayClock": "0'",
"period": 0,
"type": {
"id": "1",
"name": "STATUS_SCHEDULED",
"state": "pre",
"completed": false
}
}
}
]
}
]
}

View File

@@ -1,245 +0,0 @@
{
"leagues": [
{
"id": "59",
"uid": "s:40~l:59",
"name": "Women's National Basketball Association",
"abbreviation": "WNBA"
}
],
"season": {
"type": 2,
"year": 2026
},
"day": {
"date": "2026-05-20T00:00:00Z"
},
"events": [
{
"id": "401672101",
"uid": "s:40~l:59~e:401672101",
"date": "2026-05-20T23:00:00Z",
"name": "Las Vegas Aces at New York Liberty",
"shortName": "LV @ NY",
"competitions": [
{
"id": "401672101",
"uid": "s:40~l:59~e:401672101~c:401672101",
"date": "2026-05-20T23:00:00Z",
"attendance": 17732,
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "4346",
"fullName": "Barclays Center",
"address": {
"city": "Brooklyn",
"state": "NY"
},
"capacity": 17732,
"indoor": true
},
"competitors": [
{
"id": "9",
"uid": "s:40~l:59~t:9",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "9",
"uid": "s:40~l:59~t:9",
"location": "New York",
"name": "Liberty",
"abbreviation": "NY",
"displayName": "New York Liberty"
},
"score": "92",
"winner": true
},
{
"id": "20",
"uid": "s:40~l:59~t:20",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "20",
"uid": "s:40~l:59~t:20",
"location": "Las Vegas",
"name": "Aces",
"abbreviation": "LV",
"displayName": "Las Vegas Aces"
},
"score": "88",
"winner": false
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 4,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401672102",
"uid": "s:40~l:59~e:401672102",
"date": "2026-05-21T00:00:00Z",
"name": "Connecticut Sun at Chicago Sky",
"shortName": "CONN @ CHI",
"competitions": [
{
"id": "401672102",
"uid": "s:40~l:59~e:401672102~c:401672102",
"date": "2026-05-21T00:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "8086",
"fullName": "Wintrust Arena",
"address": {
"city": "Chicago",
"state": "IL"
},
"capacity": 10387,
"indoor": true
},
"competitors": [
{
"id": "6",
"uid": "s:40~l:59~t:6",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "6",
"uid": "s:40~l:59~t:6",
"location": "Chicago",
"name": "Sky",
"abbreviation": "CHI",
"displayName": "Chicago Sky"
},
"score": "78",
"winner": false
},
{
"id": "5",
"uid": "s:40~l:59~t:5",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "5",
"uid": "s:40~l:59~t:5",
"location": "Connecticut",
"name": "Sun",
"abbreviation": "CONN",
"displayName": "Connecticut Sun"
},
"score": "85",
"winner": true
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 4,
"type": {
"id": "3",
"name": "STATUS_FINAL",
"state": "post",
"completed": true
}
}
}
]
},
{
"id": "401672103",
"uid": "s:40~l:59~e:401672103",
"date": "2026-05-21T02:00:00Z",
"name": "Phoenix Mercury at Seattle Storm",
"shortName": "PHX @ SEA",
"competitions": [
{
"id": "401672103",
"uid": "s:40~l:59~e:401672103~c:401672103",
"date": "2026-05-21T02:00:00Z",
"type": {
"id": "1",
"abbreviation": "STD"
},
"venue": {
"id": "3097",
"fullName": "Climate Pledge Arena",
"address": {
"city": "Seattle",
"state": "WA"
},
"capacity": 18100,
"indoor": true
},
"competitors": [
{
"id": "11",
"uid": "s:40~l:59~t:11",
"type": "team",
"order": 0,
"homeAway": "home",
"team": {
"id": "11",
"uid": "s:40~l:59~t:11",
"location": "Seattle",
"name": "Storm",
"abbreviation": "SEA",
"displayName": "Seattle Storm"
},
"score": null,
"winner": null
},
{
"id": "8",
"uid": "s:40~l:59~t:8",
"type": "team",
"order": 1,
"homeAway": "away",
"team": {
"id": "8",
"uid": "s:40~l:59~t:8",
"location": "Phoenix",
"name": "Mercury",
"abbreviation": "PHX",
"displayName": "Phoenix Mercury"
},
"score": null,
"winner": null
}
],
"status": {
"clock": 0,
"displayClock": "0:00",
"period": 0,
"type": {
"id": "1",
"name": "STATUS_SCHEDULED",
"state": "pre",
"completed": false
}
}
}
]
}
]
}

View File

@@ -1,269 +0,0 @@
"""Tests for alias loaders."""
import pytest
import json
import tempfile
from datetime import date
from pathlib import Path
from sportstime_parser.normalizers.alias_loader import (
TeamAliasLoader,
StadiumAliasLoader,
)
from sportstime_parser.models.aliases import AliasType
class TestTeamAliasLoader:
"""Tests for TeamAliasLoader class."""
@pytest.fixture
def sample_aliases_file(self):
"""Create a temporary aliases file for testing."""
data = [
{
"id": "1",
"team_canonical_id": "nba_okc",
"alias_type": "name",
"alias_value": "Seattle SuperSonics",
"valid_from": "1967-01-01",
"valid_until": "2008-07-02",
},
{
"id": "2",
"team_canonical_id": "nba_okc",
"alias_type": "name",
"alias_value": "Oklahoma City Thunder",
"valid_from": "2008-07-03",
"valid_until": None,
},
{
"id": "3",
"team_canonical_id": "nba_okc",
"alias_type": "abbreviation",
"alias_value": "OKC",
"valid_from": "2008-07-03",
"valid_until": None,
},
]
with tempfile.NamedTemporaryFile(
mode="w", suffix=".json", delete=False
) as f:
json.dump(data, f)
return Path(f.name)
def test_load_aliases(self, sample_aliases_file):
"""Test loading aliases from file."""
loader = TeamAliasLoader(sample_aliases_file)
loader.load()
assert len(loader._aliases) == 3
def test_resolve_current_alias(self, sample_aliases_file):
"""Test resolving a current alias."""
loader = TeamAliasLoader(sample_aliases_file)
# Current date should resolve to Thunder
result = loader.resolve("Oklahoma City Thunder")
assert result == "nba_okc"
# Abbreviation should also work
result = loader.resolve("OKC")
assert result == "nba_okc"
def test_resolve_historical_alias(self, sample_aliases_file):
"""Test resolving a historical alias with date."""
loader = TeamAliasLoader(sample_aliases_file)
# Historical date should resolve SuperSonics
result = loader.resolve("Seattle SuperSonics", check_date=date(2007, 1, 1))
assert result == "nba_okc"
# After relocation, SuperSonics shouldn't resolve
result = loader.resolve("Seattle SuperSonics", check_date=date(2010, 1, 1))
assert result is None
def test_resolve_case_insensitive(self, sample_aliases_file):
"""Test case insensitive resolution."""
loader = TeamAliasLoader(sample_aliases_file)
result = loader.resolve("oklahoma city thunder")
assert result == "nba_okc"
result = loader.resolve("okc")
assert result == "nba_okc"
def test_resolve_with_type_filter(self, sample_aliases_file):
"""Test filtering by alias type."""
loader = TeamAliasLoader(sample_aliases_file)
# Should find when searching all types
result = loader.resolve("OKC")
assert result == "nba_okc"
# Should not find when filtering to name only
result = loader.resolve("OKC", alias_types=[AliasType.NAME])
assert result is None
def test_get_aliases_for_team(self, sample_aliases_file):
"""Test getting all aliases for a team."""
loader = TeamAliasLoader(sample_aliases_file)
aliases = loader.get_aliases_for_team("nba_okc")
assert len(aliases) == 3
# Filter by current date
aliases = loader.get_aliases_for_team(
"nba_okc", check_date=date(2020, 1, 1)
)
assert len(aliases) == 2 # Thunder name + OKC abbreviation
def test_missing_file(self):
"""Test handling of missing file."""
loader = TeamAliasLoader(Path("/nonexistent/file.json"))
loader.load() # Should not raise
assert len(loader._aliases) == 0
class TestStadiumAliasLoader:
"""Tests for StadiumAliasLoader class."""
@pytest.fixture
def sample_stadium_aliases(self):
"""Create a temporary stadium aliases file."""
data = [
{
"alias_name": "Crypto.com Arena",
"stadium_canonical_id": "crypto_arena_los_angeles_ca",
"valid_from": "2021-12-25",
"valid_until": None,
},
{
"alias_name": "Staples Center",
"stadium_canonical_id": "crypto_arena_los_angeles_ca",
"valid_from": "1999-10-17",
"valid_until": "2021-12-24",
},
]
with tempfile.NamedTemporaryFile(
mode="w", suffix=".json", delete=False
) as f:
json.dump(data, f)
return Path(f.name)
def test_load_stadium_aliases(self, sample_stadium_aliases):
"""Test loading stadium aliases."""
loader = StadiumAliasLoader(sample_stadium_aliases)
loader.load()
assert len(loader._aliases) == 2
def test_resolve_current_name(self, sample_stadium_aliases):
"""Test resolving current stadium name."""
loader = StadiumAliasLoader(sample_stadium_aliases)
result = loader.resolve("Crypto.com Arena")
assert result == "crypto_arena_los_angeles_ca"
def test_resolve_historical_name(self, sample_stadium_aliases):
"""Test resolving historical stadium name."""
loader = StadiumAliasLoader(sample_stadium_aliases)
# Staples Center in 2020
result = loader.resolve("Staples Center", check_date=date(2020, 1, 1))
assert result == "crypto_arena_los_angeles_ca"
# Staples Center after rename shouldn't resolve
result = loader.resolve("Staples Center", check_date=date(2023, 1, 1))
assert result is None
def test_date_boundary(self, sample_stadium_aliases):
"""Test exact date boundaries."""
loader = StadiumAliasLoader(sample_stadium_aliases)
# Last day of Staples Center
result = loader.resolve("Staples Center", check_date=date(2021, 12, 24))
assert result == "crypto_arena_los_angeles_ca"
# First day of Crypto.com Arena
result = loader.resolve("Crypto.com Arena", check_date=date(2021, 12, 25))
assert result == "crypto_arena_los_angeles_ca"
def test_get_all_names(self, sample_stadium_aliases):
"""Test getting all stadium names."""
loader = StadiumAliasLoader(sample_stadium_aliases)
names = loader.get_all_names()
assert len(names) == 2
assert "Crypto.com Arena" in names
assert "Staples Center" in names
class TestDateRangeHandling:
"""Tests for date range edge cases in aliases."""
@pytest.fixture
def date_range_aliases(self):
"""Create aliases with various date range scenarios."""
data = [
{
"id": "1",
"team_canonical_id": "test_team",
"alias_type": "name",
"alias_value": "Always Valid",
"valid_from": None,
"valid_until": None,
},
{
"id": "2",
"team_canonical_id": "test_team",
"alias_type": "name",
"alias_value": "Future Only",
"valid_from": "2030-01-01",
"valid_until": None,
},
{
"id": "3",
"team_canonical_id": "test_team",
"alias_type": "name",
"alias_value": "Past Only",
"valid_from": None,
"valid_until": "2000-01-01",
},
]
with tempfile.NamedTemporaryFile(
mode="w", suffix=".json", delete=False
) as f:
json.dump(data, f)
return Path(f.name)
def test_always_valid_alias(self, date_range_aliases):
"""Test alias with no date restrictions."""
loader = TeamAliasLoader(date_range_aliases)
result = loader.resolve("Always Valid", check_date=date(2025, 1, 1))
assert result == "test_team"
result = loader.resolve("Always Valid", check_date=date(1990, 1, 1))
assert result == "test_team"
def test_future_only_alias(self, date_range_aliases):
"""Test alias that starts in the future."""
loader = TeamAliasLoader(date_range_aliases)
# Before valid_from
result = loader.resolve("Future Only", check_date=date(2025, 1, 1))
assert result is None
# After valid_from
result = loader.resolve("Future Only", check_date=date(2035, 1, 1))
assert result == "test_team"
def test_past_only_alias(self, date_range_aliases):
"""Test alias that expired in the past."""
loader = TeamAliasLoader(date_range_aliases)
# Before valid_until
result = loader.resolve("Past Only", check_date=date(1990, 1, 1))
assert result == "test_team"
# After valid_until
result = loader.resolve("Past Only", check_date=date(2025, 1, 1))
assert result is None

View File

@@ -1,187 +0,0 @@
"""Tests for canonical ID generation."""
import pytest
from datetime import datetime, date
from sportstime_parser.normalizers.canonical_id import (
generate_game_id,
generate_team_id,
generate_team_id_from_abbrev,
generate_stadium_id,
parse_game_id,
normalize_string,
)
class TestNormalizeString:
"""Tests for normalize_string function."""
def test_basic_normalization(self):
"""Test basic string normalization."""
assert normalize_string("New York") == "new_york"
assert normalize_string("Los Angeles") == "los_angeles"
def test_removes_special_characters(self):
"""Test that special characters are removed."""
assert normalize_string("AT&T Stadium") == "att_stadium"
assert normalize_string("St. Louis") == "st_louis"
assert normalize_string("O'Brien Field") == "obrien_field"
def test_collapses_whitespace(self):
"""Test that multiple spaces are collapsed."""
assert normalize_string("New York") == "new_york"
assert normalize_string(" Los Angeles ") == "los_angeles"
def test_empty_string(self):
"""Test empty string handling."""
assert normalize_string("") == ""
assert normalize_string(" ") == ""
def test_unicode_normalization(self):
"""Test unicode characters are handled."""
assert normalize_string("Café") == "cafe"
assert normalize_string("José") == "jose"
class TestGenerateGameId:
"""Tests for generate_game_id function."""
def test_basic_game_id(self):
"""Test basic game ID generation."""
game_id = generate_game_id(
sport="nba",
season=2025,
away_abbrev="bos",
home_abbrev="lal",
game_date=date(2025, 12, 25),
)
assert game_id == "game_nba_2025_20251225_bos_lal"
def test_game_id_with_datetime(self):
"""Test game ID generation with datetime object."""
game_id = generate_game_id(
sport="mlb",
season=2026,
away_abbrev="nyy",
home_abbrev="bos",
game_date=datetime(2026, 4, 1, 19, 0),
)
assert game_id == "game_mlb_2026_20260401_nyy_bos"
def test_game_id_with_game_number(self):
"""Test game ID for doubleheader."""
game_id_1 = generate_game_id(
sport="mlb",
season=2026,
away_abbrev="nyy",
home_abbrev="bos",
game_date=date(2026, 7, 4),
game_number=1,
)
game_id_2 = generate_game_id(
sport="mlb",
season=2026,
away_abbrev="nyy",
home_abbrev="bos",
game_date=date(2026, 7, 4),
game_number=2,
)
assert game_id_1 == "game_mlb_2026_20260704_nyy_bos_1"
assert game_id_2 == "game_mlb_2026_20260704_nyy_bos_2"
def test_sport_lowercased(self):
"""Test that sport is lowercased."""
game_id = generate_game_id(
sport="NBA",
season=2025,
away_abbrev="BOS",
home_abbrev="LAL",
game_date=date(2025, 12, 25),
)
assert game_id == "game_nba_2025_20251225_bos_lal"
class TestParseGameId:
"""Tests for parse_game_id function."""
def test_parse_basic_game_id(self):
"""Test parsing a basic game ID."""
parsed = parse_game_id("game_nba_2025_20251225_bos_lal")
assert parsed["sport"] == "nba"
assert parsed["season"] == 2025
assert parsed["away_abbrev"] == "bos"
assert parsed["home_abbrev"] == "lal"
assert parsed["year"] == 2025
assert parsed["month"] == 12
assert parsed["day"] == 25
assert parsed["game_number"] is None
def test_parse_game_id_with_game_number(self):
"""Test parsing game ID with game number."""
parsed = parse_game_id("game_mlb_2026_20260704_nyy_bos_2")
assert parsed["sport"] == "mlb"
assert parsed["season"] == 2026
assert parsed["away_abbrev"] == "nyy"
assert parsed["home_abbrev"] == "bos"
assert parsed["year"] == 2026
assert parsed["month"] == 7
assert parsed["day"] == 4
assert parsed["game_number"] == 2
def test_parse_invalid_game_id(self):
"""Test parsing invalid game ID raises error."""
with pytest.raises(ValueError):
parse_game_id("invalid")
with pytest.raises(ValueError):
parse_game_id("nba_2025_bos") # Missing game_ prefix
with pytest.raises(ValueError):
parse_game_id("")
with pytest.raises(ValueError):
parse_game_id("game_nba_2025_bos_lal") # Missing date
class TestGenerateTeamId:
"""Tests for generate_team_id function."""
def test_basic_team_id(self):
"""Test basic team ID generation from city and name."""
team_id = generate_team_id(sport="nba", city="Los Angeles", name="Lakers")
assert team_id == "team_nba_los_angeles_lakers"
def test_team_id_normalizes_input(self):
"""Test that inputs are normalized."""
team_id = generate_team_id(sport="NBA", city="New York", name="Yankees")
assert team_id == "team_nba_new_york_yankees"
class TestGenerateTeamIdFromAbbrev:
"""Tests for generate_team_id_from_abbrev function."""
def test_basic_team_id_from_abbrev(self):
"""Test team ID from abbreviation."""
team_id = generate_team_id_from_abbrev(sport="nba", abbreviation="LAL")
assert team_id == "team_nba_lal"
def test_lowercases_abbreviation(self):
"""Test abbreviation is lowercased."""
team_id = generate_team_id_from_abbrev(sport="MLB", abbreviation="NYY")
assert team_id == "team_mlb_nyy"
class TestGenerateStadiumId:
"""Tests for generate_stadium_id function."""
def test_basic_stadium_id(self):
"""Test basic stadium ID generation."""
stadium_id = generate_stadium_id(sport="mlb", name="Fenway Park")
assert stadium_id == "stadium_mlb_fenway_park"
def test_stadium_id_special_characters(self):
"""Test stadium ID with special characters."""
stadium_id = generate_stadium_id(sport="nfl", name="AT&T Stadium")
assert stadium_id == "stadium_nfl_att_stadium"
def test_stadium_id_with_sponsor(self):
"""Test stadium ID with sponsor name."""
stadium_id = generate_stadium_id(sport="nba", name="Crypto.com Arena")
assert stadium_id == "stadium_nba_cryptocom_arena"

View File

@@ -1,194 +0,0 @@
"""Tests for fuzzy string matching utilities."""
import pytest
from sportstime_parser.normalizers.fuzzy import (
normalize_for_matching,
fuzzy_match_team,
fuzzy_match_stadium,
exact_match,
best_match,
calculate_similarity,
MatchCandidate,
)
class TestNormalizeForMatching:
"""Tests for normalize_for_matching function."""
def test_basic_normalization(self):
"""Test basic string normalization."""
assert normalize_for_matching("Los Angeles Lakers") == "los angeles lakers"
assert normalize_for_matching(" Boston Celtics ") == "boston celtics"
def test_removes_common_prefixes(self):
"""Test removal of common prefixes."""
assert normalize_for_matching("The Boston Celtics") == "boston celtics"
assert normalize_for_matching("Team Lakers") == "lakers"
def test_removes_stadium_suffixes(self):
"""Test removal of stadium-related suffixes."""
assert normalize_for_matching("Fenway Park") == "fenway"
assert normalize_for_matching("Madison Square Garden Arena") == "madison square garden"
assert normalize_for_matching("Wrigley Field") == "wrigley"
assert normalize_for_matching("TD Garden Center") == "td garden"
class TestExactMatch:
"""Tests for exact_match function."""
def test_exact_match_primary_name(self):
"""Test exact match on primary name."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", ["Lakers", "LAL"]),
MatchCandidate("nba_bos", "Boston Celtics", ["Celtics", "BOS"]),
]
assert exact_match("Los Angeles Lakers", candidates) == "nba_lal"
assert exact_match("Boston Celtics", candidates) == "nba_bos"
def test_exact_match_alias(self):
"""Test exact match on alias."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", ["Lakers", "LAL"]),
]
assert exact_match("Lakers", candidates) == "nba_lal"
assert exact_match("LAL", candidates) == "nba_lal"
def test_case_insensitive(self):
"""Test case insensitive matching."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", ["Lakers"]),
]
assert exact_match("los angeles lakers", candidates) == "nba_lal"
assert exact_match("LAKERS", candidates) == "nba_lal"
def test_no_match(self):
"""Test no match returns None."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", ["Lakers"]),
]
assert exact_match("New York Knicks", candidates) is None
class TestFuzzyMatchTeam:
"""Tests for fuzzy_match_team function."""
def test_close_match(self):
"""Test fuzzy matching finds close matches."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", ["Lakers", "LA Lakers"]),
MatchCandidate("nba_lac", "Los Angeles Clippers", ["Clippers", "LA Clippers"]),
]
matches = fuzzy_match_team("LA Lakers", candidates, threshold=70)
assert len(matches) > 0
assert matches[0].canonical_id == "nba_lal"
def test_partial_name_match(self):
"""Test matching on partial team name."""
candidates = [
MatchCandidate("nba_bos", "Boston Celtics", ["Celtics", "BOS"]),
]
matches = fuzzy_match_team("Celtics", candidates, threshold=80)
assert len(matches) > 0
assert matches[0].canonical_id == "nba_bos"
def test_threshold_filtering(self):
"""Test that threshold filters low-confidence matches."""
candidates = [
MatchCandidate("nba_bos", "Boston Celtics", []),
]
# Very different string should not match at high threshold
matches = fuzzy_match_team("xyz123", candidates, threshold=90)
assert len(matches) == 0
def test_returns_top_n(self):
"""Test that top_n parameter limits results."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", []),
MatchCandidate("nba_lac", "Los Angeles Clippers", []),
MatchCandidate("mlb_lad", "Los Angeles Dodgers", []),
]
matches = fuzzy_match_team("Los Angeles", candidates, threshold=50, top_n=2)
assert len(matches) <= 2
class TestFuzzyMatchStadium:
"""Tests for fuzzy_match_stadium function."""
def test_stadium_match(self):
"""Test fuzzy matching stadium names."""
candidates = [
MatchCandidate("fenway", "Fenway Park", ["Fenway"]),
MatchCandidate("td_garden", "TD Garden", ["Boston Garden"]),
]
matches = fuzzy_match_stadium("Fenway Park Boston", candidates, threshold=70)
assert len(matches) > 0
assert matches[0].canonical_id == "fenway"
def test_naming_rights_change(self):
"""Test matching old stadium names."""
candidates = [
MatchCandidate(
"chase_center",
"Chase Center",
["Oracle Arena", "Oakland Coliseum Arena"],
),
]
# Should match on alias
matches = fuzzy_match_stadium("Oracle Arena", candidates, threshold=70)
assert len(matches) > 0
class TestBestMatch:
"""Tests for best_match function."""
def test_prefers_exact_match(self):
"""Test that exact match is preferred over fuzzy."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", ["Lakers"]),
MatchCandidate("nba_bos", "Boston Celtics", ["Celtics"]),
]
result = best_match("Lakers", candidates)
assert result is not None
assert result.canonical_id == "nba_lal"
assert result.confidence == 100 # Exact match
def test_falls_back_to_fuzzy(self):
"""Test fallback to fuzzy when no exact match."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", ["Lakers"]),
]
result = best_match("LA Laker", candidates, threshold=70)
assert result is not None
assert result.confidence < 100 # Fuzzy match
def test_no_match_below_threshold(self):
"""Test returns None when no match above threshold."""
candidates = [
MatchCandidate("nba_lal", "Los Angeles Lakers", []),
]
result = best_match("xyz123", candidates, threshold=90)
assert result is None
class TestCalculateSimilarity:
"""Tests for calculate_similarity function."""
def test_identical_strings(self):
"""Test identical strings have 100% similarity."""
assert calculate_similarity("Boston Celtics", "Boston Celtics") == 100
def test_similar_strings(self):
"""Test similar strings have high similarity."""
score = calculate_similarity("Boston Celtics", "Celtics Boston")
assert score >= 90
def test_different_strings(self):
"""Test different strings have low similarity."""
score = calculate_similarity("Boston Celtics", "Los Angeles Lakers")
assert score < 50
def test_empty_string(self):
"""Test empty string handling."""
score = calculate_similarity("", "Boston Celtics")
assert score == 0

View File

@@ -1 +0,0 @@
"""Tests for scrapers module."""

View File

@@ -1,257 +0,0 @@
"""Tests for MLB scraper."""
from datetime import datetime
from unittest.mock import patch
import pytest
from sportstime_parser.scrapers.mlb import MLBScraper, create_mlb_scraper
from sportstime_parser.scrapers.base import RawGameData
from sportstime_parser.tests.fixtures import (
load_json_fixture,
MLB_ESPN_SCOREBOARD_JSON,
)
class TestMLBScraperInit:
"""Test MLBScraper initialization."""
def test_creates_scraper_with_season(self):
"""Test scraper initializes with correct season."""
scraper = MLBScraper(season=2026)
assert scraper.sport == "mlb"
assert scraper.season == 2026
def test_factory_function_creates_scraper(self):
"""Test factory function creates correct scraper."""
scraper = create_mlb_scraper(season=2026)
assert isinstance(scraper, MLBScraper)
assert scraper.season == 2026
def test_expected_game_count(self):
"""Test expected game count is correct for MLB."""
scraper = MLBScraper(season=2026)
assert scraper.expected_game_count == 2430
def test_sources_in_priority_order(self):
"""Test sources are returned in correct priority order."""
scraper = MLBScraper(season=2026)
sources = scraper._get_sources()
assert sources == ["baseball_reference", "mlb_api", "espn"]
class TestESPNParsing:
"""Test ESPN API response parsing."""
def test_parses_completed_games(self):
"""Test parsing completed games from ESPN."""
scraper = MLBScraper(season=2026)
data = load_json_fixture(MLB_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
completed = [g for g in games if g.status == "final"]
assert len(completed) == 2
# Yankees @ Red Sox
nyy_bos = next(g for g in completed if g.away_team_raw == "New York Yankees")
assert nyy_bos.home_team_raw == "Boston Red Sox"
assert nyy_bos.away_score == 3
assert nyy_bos.home_score == 5
assert nyy_bos.stadium_raw == "Fenway Park"
def test_parses_scheduled_games(self):
"""Test parsing scheduled games from ESPN."""
scraper = MLBScraper(season=2026)
data = load_json_fixture(MLB_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
scheduled = [g for g in games if g.status == "scheduled"]
assert len(scheduled) == 1
lad_sf = scheduled[0]
assert lad_sf.away_team_raw == "Los Angeles Dodgers"
assert lad_sf.home_team_raw == "San Francisco Giants"
assert lad_sf.stadium_raw == "Oracle Park"
def test_parses_venue_info(self):
"""Test venue information is extracted."""
scraper = MLBScraper(season=2026)
data = load_json_fixture(MLB_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
for game in games:
assert game.stadium_raw is not None
class TestGameNormalization:
"""Test game normalization and canonical ID generation."""
def test_normalizes_games_with_canonical_ids(self):
"""Test games are normalized with correct canonical IDs."""
scraper = MLBScraper(season=2026)
raw_games = [
RawGameData(
game_date=datetime(2026, 4, 15),
home_team_raw="Boston Red Sox",
away_team_raw="New York Yankees",
stadium_raw="Fenway Park",
home_score=5,
away_score=3,
status="final",
source_url="http://example.com",
)
]
games, review_items = scraper._normalize_games(raw_games)
assert len(games) == 1
game = games[0]
# Check canonical ID format
assert game.id == "mlb_2026_nyy_bos_0415"
assert game.sport == "mlb"
assert game.season == 2026
# Check team IDs
assert game.home_team_id == "team_mlb_bos"
assert game.away_team_id == "team_mlb_nyy"
# Check scores preserved
assert game.home_score == 5
assert game.away_score == 3
def test_creates_review_items_for_unresolved_teams(self):
"""Test review items are created for unresolved teams."""
scraper = MLBScraper(season=2026)
raw_games = [
RawGameData(
game_date=datetime(2026, 4, 15),
home_team_raw="Unknown Team XYZ",
away_team_raw="Boston Red Sox",
stadium_raw="Fenway Park",
status="scheduled",
),
]
games, review_items = scraper._normalize_games(raw_games)
# Game should not be created due to unresolved team
assert len(games) == 0
# But there should be a review item
assert len(review_items) >= 1
class TestTeamAndStadiumScraping:
"""Test team and stadium data scraping."""
def test_scrapes_all_mlb_teams(self):
"""Test all 30 MLB teams are returned."""
scraper = MLBScraper(season=2026)
teams = scraper.scrape_teams()
# 30 MLB teams
assert len(teams) == 30
# Check team IDs are unique
team_ids = [t.id for t in teams]
assert len(set(team_ids)) == 30
# Check all teams have required fields
for team in teams:
assert team.id.startswith("team_mlb_")
assert team.sport == "mlb"
assert team.city
assert team.name
assert team.full_name
assert team.abbreviation
def test_teams_have_leagues_and_divisions(self):
"""Test teams have league (conference) and division info."""
scraper = MLBScraper(season=2026)
teams = scraper.scrape_teams()
# Count teams by league
al = [t for t in teams if t.conference == "American"]
nl = [t for t in teams if t.conference == "National"]
assert len(al) == 15
assert len(nl) == 15
def test_scrapes_all_mlb_stadiums(self):
"""Test all MLB stadiums are returned."""
scraper = MLBScraper(season=2026)
stadiums = scraper.scrape_stadiums()
# Should have stadiums for all teams
assert len(stadiums) == 30
# Check stadium IDs are unique
stadium_ids = [s.id for s in stadiums]
assert len(set(stadium_ids)) == 30
# Check all stadiums have required fields
for stadium in stadiums:
assert stadium.id.startswith("stadium_mlb_")
assert stadium.sport == "mlb"
assert stadium.name
assert stadium.city
assert stadium.state
assert stadium.country in ["USA", "Canada"]
assert stadium.latitude != 0
assert stadium.longitude != 0
class TestScrapeFallback:
"""Test multi-source fallback behavior."""
def test_falls_back_to_next_source_on_failure(self):
"""Test scraper tries next source when first fails."""
scraper = MLBScraper(season=2026)
with patch.object(scraper, '_scrape_baseball_reference') as mock_br, \
patch.object(scraper, '_scrape_mlb_api') as mock_mlb, \
patch.object(scraper, '_scrape_espn') as mock_espn:
# Make BR and MLB API fail
mock_br.side_effect = Exception("Connection failed")
mock_mlb.side_effect = Exception("API error")
# Make ESPN return data
mock_espn.return_value = [
RawGameData(
game_date=datetime(2026, 4, 15),
home_team_raw="Boston Red Sox",
away_team_raw="New York Yankees",
stadium_raw="Fenway Park",
status="scheduled",
)
]
result = scraper.scrape_games()
assert result.success
assert result.source == "espn"
assert mock_br.called
assert mock_mlb.called
assert mock_espn.called
class TestSeasonMonths:
"""Test season month calculation."""
def test_gets_correct_season_months(self):
"""Test correct months are returned for MLB season."""
scraper = MLBScraper(season=2026)
months = scraper._get_season_months()
# MLB season is March-November
assert len(months) == 9 # Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov
# Check first month is March of season year
assert months[0] == (2026, 3)
# Check last month is November
assert months[-1] == (2026, 11)

View File

@@ -1,251 +0,0 @@
"""Tests for MLS scraper."""
from datetime import datetime
from unittest.mock import patch
import pytest
from sportstime_parser.scrapers.mls import MLSScraper, create_mls_scraper
from sportstime_parser.scrapers.base import RawGameData
from sportstime_parser.tests.fixtures import (
load_json_fixture,
MLS_ESPN_SCOREBOARD_JSON,
)
class TestMLSScraperInit:
"""Test MLSScraper initialization."""
def test_creates_scraper_with_season(self):
"""Test scraper initializes with correct season."""
scraper = MLSScraper(season=2026)
assert scraper.sport == "mls"
assert scraper.season == 2026
def test_factory_function_creates_scraper(self):
"""Test factory function creates correct scraper."""
scraper = create_mls_scraper(season=2026)
assert isinstance(scraper, MLSScraper)
assert scraper.season == 2026
def test_expected_game_count(self):
"""Test expected game count is correct for MLS."""
scraper = MLSScraper(season=2026)
assert scraper.expected_game_count == 493
def test_sources_in_priority_order(self):
"""Test sources are returned in correct priority order."""
scraper = MLSScraper(season=2026)
sources = scraper._get_sources()
assert sources == ["espn", "fbref"]
class TestESPNParsing:
"""Test ESPN API response parsing."""
def test_parses_completed_games(self):
"""Test parsing completed games from ESPN."""
scraper = MLSScraper(season=2026)
data = load_json_fixture(MLS_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
completed = [g for g in games if g.status == "final"]
assert len(completed) == 2
# Galaxy @ LAFC
la_lafc = next(g for g in completed if g.away_team_raw == "LA Galaxy")
assert la_lafc.home_team_raw == "Los Angeles FC"
assert la_lafc.away_score == 2
assert la_lafc.home_score == 3
assert la_lafc.stadium_raw == "BMO Stadium"
def test_parses_scheduled_games(self):
"""Test parsing scheduled games from ESPN."""
scraper = MLSScraper(season=2026)
data = load_json_fixture(MLS_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
scheduled = [g for g in games if g.status == "scheduled"]
assert len(scheduled) == 1
ny_atl = scheduled[0]
assert ny_atl.away_team_raw == "New York Red Bulls"
assert ny_atl.home_team_raw == "Atlanta United FC"
assert ny_atl.stadium_raw == "Mercedes-Benz Stadium"
def test_parses_venue_info(self):
"""Test venue information is extracted."""
scraper = MLSScraper(season=2026)
data = load_json_fixture(MLS_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
for game in games:
assert game.stadium_raw is not None
class TestGameNormalization:
"""Test game normalization and canonical ID generation."""
def test_normalizes_games_with_canonical_ids(self):
"""Test games are normalized with correct canonical IDs."""
scraper = MLSScraper(season=2026)
raw_games = [
RawGameData(
game_date=datetime(2026, 3, 15),
home_team_raw="Los Angeles FC",
away_team_raw="LA Galaxy",
stadium_raw="BMO Stadium",
home_score=3,
away_score=2,
status="final",
source_url="http://example.com",
)
]
games, review_items = scraper._normalize_games(raw_games)
assert len(games) == 1
game = games[0]
# Check canonical ID format
assert game.id == "mls_2026_lag_lafc_0315"
assert game.sport == "mls"
assert game.season == 2026
# Check team IDs
assert game.home_team_id == "team_mls_lafc"
assert game.away_team_id == "team_mls_lag"
# Check scores preserved
assert game.home_score == 3
assert game.away_score == 2
def test_creates_review_items_for_unresolved_teams(self):
"""Test review items are created for unresolved teams."""
scraper = MLSScraper(season=2026)
raw_games = [
RawGameData(
game_date=datetime(2026, 3, 15),
home_team_raw="Unknown Team XYZ",
away_team_raw="LA Galaxy",
stadium_raw="BMO Stadium",
status="scheduled",
),
]
games, review_items = scraper._normalize_games(raw_games)
# Game should not be created due to unresolved team
assert len(games) == 0
# But there should be a review item
assert len(review_items) >= 1
class TestTeamAndStadiumScraping:
"""Test team and stadium data scraping."""
def test_scrapes_all_mls_teams(self):
"""Test all MLS teams are returned."""
scraper = MLSScraper(season=2026)
teams = scraper.scrape_teams()
# MLS has 29+ teams
assert len(teams) >= 29
# Check team IDs are unique
team_ids = [t.id for t in teams]
assert len(set(team_ids)) == len(teams)
# Check all teams have required fields
for team in teams:
assert team.id.startswith("team_mls_")
assert team.sport == "mls"
assert team.city
assert team.name
assert team.full_name
assert team.abbreviation
def test_teams_have_conferences(self):
"""Test teams have conference info."""
scraper = MLSScraper(season=2026)
teams = scraper.scrape_teams()
# Count teams by conference
eastern = [t for t in teams if t.conference == "Eastern"]
western = [t for t in teams if t.conference == "Western"]
# MLS has two conferences
assert len(eastern) >= 14
assert len(western) >= 14
def test_scrapes_all_mls_stadiums(self):
"""Test all MLS stadiums are returned."""
scraper = MLSScraper(season=2026)
stadiums = scraper.scrape_stadiums()
# Should have stadiums for all teams
assert len(stadiums) >= 29
# Check all stadiums have required fields
for stadium in stadiums:
assert stadium.id.startswith("stadium_mls_")
assert stadium.sport == "mls"
assert stadium.name
assert stadium.city
assert stadium.state
assert stadium.country in ["USA", "Canada"]
assert stadium.latitude != 0
assert stadium.longitude != 0
class TestScrapeFallback:
"""Test multi-source fallback behavior."""
def test_falls_back_to_next_source_on_failure(self):
"""Test scraper tries next source when first fails."""
scraper = MLSScraper(season=2026)
with patch.object(scraper, '_scrape_espn') as mock_espn, \
patch.object(scraper, '_scrape_fbref') as mock_fbref:
# Make ESPN fail
mock_espn.side_effect = Exception("Connection failed")
# Make FBref return data
mock_fbref.return_value = [
RawGameData(
game_date=datetime(2026, 3, 15),
home_team_raw="Los Angeles FC",
away_team_raw="LA Galaxy",
stadium_raw="BMO Stadium",
status="scheduled",
)
]
result = scraper.scrape_games()
assert result.success
assert result.source == "fbref"
assert mock_espn.called
assert mock_fbref.called
class TestSeasonMonths:
"""Test season month calculation."""
def test_gets_correct_season_months(self):
"""Test correct months are returned for MLS season."""
scraper = MLSScraper(season=2026)
months = scraper._get_season_months()
# MLS season is February-November
assert len(months) == 10 # Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov
# Check first month is February of season year
assert months[0] == (2026, 2)
# Check last month is November
assert months[-1] == (2026, 11)

View File

@@ -1,428 +0,0 @@
"""Tests for NBA scraper."""
import json
from datetime import datetime
from unittest.mock import MagicMock, patch
import pytest
from sportstime_parser.scrapers.nba import NBAScraper, create_nba_scraper
from sportstime_parser.scrapers.base import RawGameData
from sportstime_parser.tests.fixtures import (
load_fixture,
load_json_fixture,
NBA_BR_OCTOBER_HTML,
NBA_BR_EDGE_CASES_HTML,
NBA_ESPN_SCOREBOARD_JSON,
)
class TestNBAScraperInit:
"""Test NBAScraper initialization."""
def test_creates_scraper_with_season(self):
"""Test scraper initializes with correct season."""
scraper = NBAScraper(season=2025)
assert scraper.sport == "nba"
assert scraper.season == 2025
def test_factory_function_creates_scraper(self):
"""Test factory function creates correct scraper."""
scraper = create_nba_scraper(season=2025)
assert isinstance(scraper, NBAScraper)
assert scraper.season == 2025
def test_expected_game_count(self):
"""Test expected game count is correct for NBA."""
scraper = NBAScraper(season=2025)
assert scraper.expected_game_count == 1230
def test_sources_in_priority_order(self):
"""Test sources are returned in correct priority order."""
scraper = NBAScraper(season=2025)
sources = scraper._get_sources()
assert sources == ["basketball_reference", "espn", "cbs"]
class TestBasketballReferenceParsing:
"""Test Basketball-Reference HTML parsing."""
def test_parses_completed_games(self):
"""Test parsing completed games with scores."""
scraper = NBAScraper(season=2025)
html = load_fixture(NBA_BR_OCTOBER_HTML)
games = scraper._parse_basketball_reference(html, "http://example.com")
# Should find all games in fixture
assert len(games) == 7
# Check first completed game
completed_games = [g for g in games if g.status == "final"]
assert len(completed_games) == 2
# Boston @ Cleveland
bos_cle = next(g for g in games if g.away_team_raw == "Boston Celtics")
assert bos_cle.home_team_raw == "Cleveland Cavaliers"
assert bos_cle.away_score == 112
assert bos_cle.home_score == 108
assert bos_cle.stadium_raw == "Rocket Mortgage FieldHouse"
assert bos_cle.status == "final"
def test_parses_scheduled_games(self):
"""Test parsing scheduled games without scores."""
scraper = NBAScraper(season=2025)
html = load_fixture(NBA_BR_OCTOBER_HTML)
games = scraper._parse_basketball_reference(html, "http://example.com")
scheduled_games = [g for g in games if g.status == "scheduled"]
assert len(scheduled_games) == 5
# Houston @ OKC
hou_okc = next(g for g in scheduled_games if g.away_team_raw == "Houston Rockets")
assert hou_okc.home_team_raw == "Oklahoma City Thunder"
assert hou_okc.away_score is None
assert hou_okc.home_score is None
assert hou_okc.stadium_raw == "Paycom Center"
def test_parses_game_dates_correctly(self):
"""Test game dates are parsed correctly."""
scraper = NBAScraper(season=2025)
html = load_fixture(NBA_BR_OCTOBER_HTML)
games = scraper._parse_basketball_reference(html, "http://example.com")
# Check first game date
first_game = games[0]
assert first_game.game_date.year == 2025
assert first_game.game_date.month == 10
assert first_game.game_date.day == 22
def test_tracks_source_url(self):
"""Test source URL is tracked for all games."""
scraper = NBAScraper(season=2025)
html = load_fixture(NBA_BR_OCTOBER_HTML)
source_url = "http://basketball-reference.com/test"
games = scraper._parse_basketball_reference(html, source_url)
for game in games:
assert game.source_url == source_url
class TestBasketballReferenceEdgeCases:
"""Test edge case handling in Basketball-Reference parsing."""
def test_parses_postponed_games(self):
"""Test postponed games are identified correctly."""
scraper = NBAScraper(season=2025)
html = load_fixture(NBA_BR_EDGE_CASES_HTML)
games = scraper._parse_basketball_reference(html, "http://example.com")
postponed = [g for g in games if g.status == "postponed"]
assert len(postponed) == 1
assert postponed[0].away_team_raw == "Los Angeles Lakers"
assert postponed[0].home_team_raw == "Phoenix Suns"
def test_parses_cancelled_games(self):
"""Test cancelled games are identified correctly."""
scraper = NBAScraper(season=2025)
html = load_fixture(NBA_BR_EDGE_CASES_HTML)
games = scraper._parse_basketball_reference(html, "http://example.com")
cancelled = [g for g in games if g.status == "cancelled"]
assert len(cancelled) == 1
assert cancelled[0].away_team_raw == "Portland Trail Blazers"
def test_parses_neutral_site_games(self):
"""Test neutral site games are parsed."""
scraper = NBAScraper(season=2025)
html = load_fixture(NBA_BR_EDGE_CASES_HTML)
games = scraper._parse_basketball_reference(html, "http://example.com")
# Mexico City game
mexico = next(g for g in games if g.stadium_raw == "Arena CDMX")
assert mexico.away_team_raw == "Miami Heat"
assert mexico.home_team_raw == "Washington Wizards"
assert mexico.status == "final"
def test_parses_overtime_games(self):
"""Test overtime games with high scores."""
scraper = NBAScraper(season=2025)
html = load_fixture(NBA_BR_EDGE_CASES_HTML)
games = scraper._parse_basketball_reference(html, "http://example.com")
# High scoring OT game
ot_game = next(g for g in games if g.away_score == 147)
assert ot_game.home_score == 150
assert ot_game.status == "final"
class TestESPNParsing:
"""Test ESPN API response parsing."""
def test_parses_completed_games(self):
"""Test parsing completed games from ESPN."""
scraper = NBAScraper(season=2025)
data = load_json_fixture(NBA_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
completed = [g for g in games if g.status == "final"]
assert len(completed) == 2
# Boston @ Cleveland
bos_cle = next(g for g in completed if g.away_team_raw == "Boston Celtics")
assert bos_cle.home_team_raw == "Cleveland Cavaliers"
assert bos_cle.away_score == 112
assert bos_cle.home_score == 108
assert bos_cle.stadium_raw == "Rocket Mortgage FieldHouse"
def test_parses_scheduled_games(self):
"""Test parsing scheduled games from ESPN."""
scraper = NBAScraper(season=2025)
data = load_json_fixture(NBA_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
scheduled = [g for g in games if g.status == "scheduled"]
assert len(scheduled) == 1
hou_okc = scheduled[0]
assert hou_okc.away_team_raw == "Houston Rockets"
assert hou_okc.home_team_raw == "Oklahoma City Thunder"
assert hou_okc.stadium_raw == "Paycom Center"
def test_parses_venue_info(self):
"""Test venue information is extracted."""
scraper = NBAScraper(season=2025)
data = load_json_fixture(NBA_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
# Check all games have venue info
for game in games:
assert game.stadium_raw is not None
class TestGameNormalization:
"""Test game normalization and canonical ID generation."""
def test_normalizes_games_with_canonical_ids(self):
"""Test games are normalized with correct canonical IDs."""
scraper = NBAScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 10, 22),
home_team_raw="Cleveland Cavaliers",
away_team_raw="Boston Celtics",
stadium_raw="Rocket Mortgage FieldHouse",
home_score=108,
away_score=112,
status="final",
source_url="http://example.com",
)
]
games, review_items = scraper._normalize_games(raw_games)
assert len(games) == 1
game = games[0]
# Check canonical ID format
assert game.id == "nba_2025_bos_cle_1022"
assert game.sport == "nba"
assert game.season == 2025
# Check team IDs
assert game.home_team_id == "team_nba_cle"
assert game.away_team_id == "team_nba_bos"
# Check scores preserved
assert game.home_score == 108
assert game.away_score == 112
def test_detects_doubleheaders(self):
"""Test doubleheaders get correct game numbers."""
scraper = NBAScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 4, 1, 13, 0),
home_team_raw="Boston Celtics",
away_team_raw="New York Knicks",
stadium_raw="TD Garden",
status="final",
home_score=105,
away_score=98,
),
RawGameData(
game_date=datetime(2025, 4, 1, 19, 0),
home_team_raw="Boston Celtics",
away_team_raw="New York Knicks",
stadium_raw="TD Garden",
status="final",
home_score=110,
away_score=102,
),
]
games, _ = scraper._normalize_games(raw_games)
assert len(games) == 2
game_numbers = sorted([g.game_number for g in games])
assert game_numbers == [1, 2]
# Check IDs include game number
game_ids = sorted([g.id for g in games])
assert game_ids == ["nba_2025_nyk_bos_0401_1", "nba_2025_nyk_bos_0401_2"]
def test_creates_review_items_for_unresolved_teams(self):
"""Test review items are created for unresolved teams."""
scraper = NBAScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 10, 22),
home_team_raw="Unknown Team XYZ",
away_team_raw="Boston Celtics",
stadium_raw="TD Garden",
status="scheduled",
),
]
games, review_items = scraper._normalize_games(raw_games)
# Game should not be created due to unresolved team
assert len(games) == 0
# But there should be a review item
assert len(review_items) >= 1
class TestTeamAndStadiumScraping:
"""Test team and stadium data scraping."""
def test_scrapes_all_nba_teams(self):
"""Test all 30 NBA teams are returned."""
scraper = NBAScraper(season=2025)
teams = scraper.scrape_teams()
# 30 NBA teams
assert len(teams) == 30
# Check team IDs are unique
team_ids = [t.id for t in teams]
assert len(set(team_ids)) == 30
# Check all teams have required fields
for team in teams:
assert team.id.startswith("team_nba_")
assert team.sport == "nba"
assert team.city
assert team.name
assert team.full_name
assert team.abbreviation
def test_teams_have_conferences_and_divisions(self):
"""Test teams have conference and division info."""
scraper = NBAScraper(season=2025)
teams = scraper.scrape_teams()
# Count teams by conference
eastern = [t for t in teams if t.conference == "Eastern"]
western = [t for t in teams if t.conference == "Western"]
assert len(eastern) == 15
assert len(western) == 15
def test_scrapes_all_nba_stadiums(self):
"""Test all NBA stadiums are returned."""
scraper = NBAScraper(season=2025)
stadiums = scraper.scrape_stadiums()
# Should have stadiums for all teams
assert len(stadiums) == 30
# Check stadium IDs are unique
stadium_ids = [s.id for s in stadiums]
assert len(set(stadium_ids)) == 30
# Check all stadiums have required fields
for stadium in stadiums:
assert stadium.id.startswith("stadium_nba_")
assert stadium.sport == "nba"
assert stadium.name
assert stadium.city
assert stadium.state
assert stadium.country in ["USA", "Canada"]
assert stadium.latitude != 0
assert stadium.longitude != 0
class TestScrapeFallback:
"""Test multi-source fallback behavior."""
def test_falls_back_to_next_source_on_failure(self):
"""Test scraper tries next source when first fails."""
scraper = NBAScraper(season=2025)
with patch.object(scraper, '_scrape_basketball_reference') as mock_br, \
patch.object(scraper, '_scrape_espn') as mock_espn:
# Make BR fail
mock_br.side_effect = Exception("Connection failed")
# Make ESPN return data
mock_espn.return_value = [
RawGameData(
game_date=datetime(2025, 10, 22),
home_team_raw="Cleveland Cavaliers",
away_team_raw="Boston Celtics",
stadium_raw="Rocket Mortgage FieldHouse",
status="scheduled",
)
]
result = scraper.scrape_games()
# Should have succeeded with ESPN
assert result.success
assert result.source == "espn"
assert mock_br.called
assert mock_espn.called
def test_returns_failure_when_all_sources_fail(self):
"""Test scraper returns failure when all sources fail."""
scraper = NBAScraper(season=2025)
with patch.object(scraper, '_scrape_basketball_reference') as mock_br, \
patch.object(scraper, '_scrape_espn') as mock_espn, \
patch.object(scraper, '_scrape_cbs') as mock_cbs:
mock_br.side_effect = Exception("BR failed")
mock_espn.side_effect = Exception("ESPN failed")
mock_cbs.side_effect = Exception("CBS failed")
result = scraper.scrape_games()
assert not result.success
assert "All sources failed" in result.error_message
assert "CBS failed" in result.error_message
class TestSeasonMonths:
"""Test season month calculation."""
def test_gets_correct_season_months(self):
"""Test correct months are returned for NBA season."""
scraper = NBAScraper(season=2025)
months = scraper._get_season_months()
# NBA season is Oct-Jun
assert len(months) == 9 # Oct, Nov, Dec, Jan, Feb, Mar, Apr, May, Jun
# Check first month is Oct of season year
assert months[0] == (2025, 10)
# Check last month is Jun of following year
assert months[-1] == (2026, 6)
# Check transition to new year
assert months[2] == (2025, 12) # December
assert months[3] == (2026, 1) # January

View File

@@ -1,310 +0,0 @@
"""Tests for NFL scraper."""
from datetime import datetime
from unittest.mock import patch
import pytest
from sportstime_parser.scrapers.nfl import NFLScraper, create_nfl_scraper
from sportstime_parser.scrapers.base import RawGameData
from sportstime_parser.tests.fixtures import (
load_json_fixture,
NFL_ESPN_SCOREBOARD_JSON,
)
class TestNFLScraperInit:
"""Test NFLScraper initialization."""
def test_creates_scraper_with_season(self):
"""Test scraper initializes with correct season."""
scraper = NFLScraper(season=2025)
assert scraper.sport == "nfl"
assert scraper.season == 2025
def test_factory_function_creates_scraper(self):
"""Test factory function creates correct scraper."""
scraper = create_nfl_scraper(season=2025)
assert isinstance(scraper, NFLScraper)
assert scraper.season == 2025
def test_expected_game_count(self):
"""Test expected game count is correct for NFL."""
scraper = NFLScraper(season=2025)
assert scraper.expected_game_count == 272
def test_sources_in_priority_order(self):
"""Test sources are returned in correct priority order."""
scraper = NFLScraper(season=2025)
sources = scraper._get_sources()
assert sources == ["espn", "pro_football_reference", "cbs"]
class TestESPNParsing:
"""Test ESPN API response parsing."""
def test_parses_completed_games(self):
"""Test parsing completed games from ESPN."""
scraper = NFLScraper(season=2025)
data = load_json_fixture(NFL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
completed = [g for g in games if g.status == "final"]
assert len(completed) == 2
# Chiefs @ Ravens
kc_bal = next(g for g in completed if g.away_team_raw == "Kansas City Chiefs")
assert kc_bal.home_team_raw == "Baltimore Ravens"
assert kc_bal.away_score == 27
assert kc_bal.home_score == 20
assert kc_bal.stadium_raw == "M&T Bank Stadium"
def test_parses_scheduled_games(self):
"""Test parsing scheduled games from ESPN."""
scraper = NFLScraper(season=2025)
data = load_json_fixture(NFL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
scheduled = [g for g in games if g.status == "scheduled"]
assert len(scheduled) == 1
dal_cle = scheduled[0]
assert dal_cle.away_team_raw == "Dallas Cowboys"
assert dal_cle.home_team_raw == "Cleveland Browns"
assert dal_cle.stadium_raw == "Cleveland Browns Stadium"
def test_parses_venue_info(self):
"""Test venue information is extracted."""
scraper = NFLScraper(season=2025)
data = load_json_fixture(NFL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
for game in games:
assert game.stadium_raw is not None
class TestGameNormalization:
"""Test game normalization and canonical ID generation."""
def test_normalizes_games_with_canonical_ids(self):
"""Test games are normalized with correct canonical IDs."""
scraper = NFLScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 9, 7),
home_team_raw="Baltimore Ravens",
away_team_raw="Kansas City Chiefs",
stadium_raw="M&T Bank Stadium",
home_score=20,
away_score=27,
status="final",
source_url="http://example.com",
)
]
games, review_items = scraper._normalize_games(raw_games)
assert len(games) == 1
game = games[0]
# Check canonical ID format
assert game.id == "nfl_2025_kc_bal_0907"
assert game.sport == "nfl"
assert game.season == 2025
# Check team IDs
assert game.home_team_id == "team_nfl_bal"
assert game.away_team_id == "team_nfl_kc"
# Check scores preserved
assert game.home_score == 20
assert game.away_score == 27
def test_creates_review_items_for_unresolved_teams(self):
"""Test review items are created for unresolved teams."""
scraper = NFLScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 9, 7),
home_team_raw="Unknown Team XYZ",
away_team_raw="Kansas City Chiefs",
stadium_raw="Arrowhead Stadium",
status="scheduled",
),
]
games, review_items = scraper._normalize_games(raw_games)
# Game should not be created due to unresolved team
assert len(games) == 0
# But there should be a review item
assert len(review_items) >= 1
class TestTeamAndStadiumScraping:
"""Test team and stadium data scraping."""
def test_scrapes_all_nfl_teams(self):
"""Test all 32 NFL teams are returned."""
scraper = NFLScraper(season=2025)
teams = scraper.scrape_teams()
# 32 NFL teams
assert len(teams) == 32
# Check team IDs are unique
team_ids = [t.id for t in teams]
assert len(set(team_ids)) == 32
# Check all teams have required fields
for team in teams:
assert team.id.startswith("team_nfl_")
assert team.sport == "nfl"
assert team.city
assert team.name
assert team.full_name
assert team.abbreviation
def test_teams_have_conferences_and_divisions(self):
"""Test teams have conference and division info."""
scraper = NFLScraper(season=2025)
teams = scraper.scrape_teams()
# Count teams by conference
afc = [t for t in teams if t.conference == "AFC"]
nfc = [t for t in teams if t.conference == "NFC"]
assert len(afc) == 16
assert len(nfc) == 16
def test_scrapes_all_nfl_stadiums(self):
"""Test all NFL stadiums are returned."""
scraper = NFLScraper(season=2025)
stadiums = scraper.scrape_stadiums()
# Should have stadiums for all teams (some share)
assert len(stadiums) >= 30
# Check all stadiums have required fields
for stadium in stadiums:
assert stadium.id.startswith("stadium_nfl_")
assert stadium.sport == "nfl"
assert stadium.name
assert stadium.city
assert stadium.state
assert stadium.country == "USA"
assert stadium.latitude != 0
assert stadium.longitude != 0
class TestScrapeFallback:
"""Test multi-source fallback behavior."""
def test_falls_back_to_next_source_on_failure(self):
"""Test scraper tries next source when first fails."""
scraper = NFLScraper(season=2025)
with patch.object(scraper, '_scrape_espn') as mock_espn, \
patch.object(scraper, '_scrape_pro_football_reference') as mock_pfr:
# Make ESPN fail
mock_espn.side_effect = Exception("Connection failed")
# Make PFR return data
mock_pfr.return_value = [
RawGameData(
game_date=datetime(2025, 9, 7),
home_team_raw="Baltimore Ravens",
away_team_raw="Kansas City Chiefs",
stadium_raw="M&T Bank Stadium",
status="scheduled",
)
]
result = scraper.scrape_games()
assert result.success
assert result.source == "pro_football_reference"
assert mock_espn.called
assert mock_pfr.called
class TestSeasonMonths:
"""Test season month calculation."""
def test_gets_correct_season_months(self):
"""Test correct months are returned for NFL season."""
scraper = NFLScraper(season=2025)
months = scraper._get_season_months()
# NFL season is September-February
assert len(months) == 6 # Sep, Oct, Nov, Dec, Jan, Feb
# Check first month is September of season year
assert months[0] == (2025, 9)
# Check last month is February of following year
assert months[-1] == (2026, 2)
# Check transition to new year
assert months[3] == (2025, 12) # December
assert months[4] == (2026, 1) # January
class TestInternationalFiltering:
"""Test international game filtering.
Note: Filtering happens in _parse_espn_response, not _normalize_games.
"""
def test_filters_london_games_during_parsing(self):
"""Test London games are filtered out during ESPN parsing."""
scraper = NFLScraper(season=2025)
# Create ESPN-like data with London game
espn_data = {
"events": [
{
"date": "2025-10-15T09:30:00Z",
"competitions": [
{
"neutralSite": True,
"venue": {
"fullName": "London Stadium",
"address": {"city": "London", "country": "UK"},
},
"competitors": [
{"homeAway": "home", "team": {"displayName": "Jacksonville Jaguars"}},
{"homeAway": "away", "team": {"displayName": "Buffalo Bills"}},
],
}
],
}
]
}
games = scraper._parse_espn_response(espn_data, "http://espn.com/api")
# London game should be filtered
assert len(games) == 0
def test_keeps_us_games(self):
"""Test US games are kept."""
scraper = NFLScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 9, 7),
home_team_raw="Baltimore Ravens",
away_team_raw="Kansas City Chiefs",
stadium_raw="M&T Bank Stadium",
status="scheduled",
),
]
games, _ = scraper._normalize_games(raw_games)
assert len(games) == 1

View File

@@ -1,317 +0,0 @@
"""Tests for NHL scraper."""
from datetime import datetime
from unittest.mock import patch
import pytest
from sportstime_parser.scrapers.nhl import NHLScraper, create_nhl_scraper
from sportstime_parser.scrapers.base import RawGameData
from sportstime_parser.tests.fixtures import (
load_json_fixture,
NHL_ESPN_SCOREBOARD_JSON,
)
class TestNHLScraperInit:
"""Test NHLScraper initialization."""
def test_creates_scraper_with_season(self):
"""Test scraper initializes with correct season."""
scraper = NHLScraper(season=2025)
assert scraper.sport == "nhl"
assert scraper.season == 2025
def test_factory_function_creates_scraper(self):
"""Test factory function creates correct scraper."""
scraper = create_nhl_scraper(season=2025)
assert isinstance(scraper, NHLScraper)
assert scraper.season == 2025
def test_expected_game_count(self):
"""Test expected game count is correct for NHL."""
scraper = NHLScraper(season=2025)
assert scraper.expected_game_count == 1312
def test_sources_in_priority_order(self):
"""Test sources are returned in correct priority order."""
scraper = NHLScraper(season=2025)
sources = scraper._get_sources()
assert sources == ["hockey_reference", "nhl_api", "espn"]
class TestESPNParsing:
"""Test ESPN API response parsing."""
def test_parses_completed_games(self):
"""Test parsing completed games from ESPN."""
scraper = NHLScraper(season=2025)
data = load_json_fixture(NHL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
completed = [g for g in games if g.status == "final"]
assert len(completed) == 2
# Penguins @ Bruins
pit_bos = next(g for g in completed if g.away_team_raw == "Pittsburgh Penguins")
assert pit_bos.home_team_raw == "Boston Bruins"
assert pit_bos.away_score == 2
assert pit_bos.home_score == 4
assert pit_bos.stadium_raw == "TD Garden"
def test_parses_scheduled_games(self):
"""Test parsing scheduled games from ESPN."""
scraper = NHLScraper(season=2025)
data = load_json_fixture(NHL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
scheduled = [g for g in games if g.status == "scheduled"]
assert len(scheduled) == 1
vgk_lak = scheduled[0]
assert vgk_lak.away_team_raw == "Vegas Golden Knights"
assert vgk_lak.home_team_raw == "Los Angeles Kings"
assert vgk_lak.stadium_raw == "Crypto.com Arena"
def test_parses_venue_info(self):
"""Test venue information is extracted."""
scraper = NHLScraper(season=2025)
data = load_json_fixture(NHL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
for game in games:
assert game.stadium_raw is not None
class TestGameNormalization:
"""Test game normalization and canonical ID generation."""
def test_normalizes_games_with_canonical_ids(self):
"""Test games are normalized with correct canonical IDs."""
scraper = NHLScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 10, 8),
home_team_raw="Boston Bruins",
away_team_raw="Pittsburgh Penguins",
stadium_raw="TD Garden",
home_score=4,
away_score=2,
status="final",
source_url="http://example.com",
)
]
games, review_items = scraper._normalize_games(raw_games)
assert len(games) == 1
game = games[0]
# Check canonical ID format
assert game.id == "nhl_2025_pit_bos_1008"
assert game.sport == "nhl"
assert game.season == 2025
# Check team IDs
assert game.home_team_id == "team_nhl_bos"
assert game.away_team_id == "team_nhl_pit"
# Check scores preserved
assert game.home_score == 4
assert game.away_score == 2
def test_creates_review_items_for_unresolved_teams(self):
"""Test review items are created for unresolved teams."""
scraper = NHLScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 10, 8),
home_team_raw="Unknown Team XYZ",
away_team_raw="Boston Bruins",
stadium_raw="TD Garden",
status="scheduled",
),
]
games, review_items = scraper._normalize_games(raw_games)
# Game should not be created due to unresolved team
assert len(games) == 0
# But there should be a review item
assert len(review_items) >= 1
class TestTeamAndStadiumScraping:
"""Test team and stadium data scraping."""
def test_scrapes_all_nhl_teams(self):
"""Test all 32 NHL teams are returned."""
scraper = NHLScraper(season=2025)
teams = scraper.scrape_teams()
# 32 NHL teams
assert len(teams) == 32
# Check team IDs are unique
team_ids = [t.id for t in teams]
assert len(set(team_ids)) == 32
# Check all teams have required fields
for team in teams:
assert team.id.startswith("team_nhl_")
assert team.sport == "nhl"
assert team.city
assert team.name
assert team.full_name
assert team.abbreviation
def test_teams_have_conferences_and_divisions(self):
"""Test teams have conference and division info."""
scraper = NHLScraper(season=2025)
teams = scraper.scrape_teams()
# Count teams by conference
eastern = [t for t in teams if t.conference == "Eastern"]
western = [t for t in teams if t.conference == "Western"]
assert len(eastern) == 16
assert len(western) == 16
def test_scrapes_all_nhl_stadiums(self):
"""Test all NHL stadiums are returned."""
scraper = NHLScraper(season=2025)
stadiums = scraper.scrape_stadiums()
# Should have stadiums for all teams
assert len(stadiums) == 32
# Check stadium IDs are unique
stadium_ids = [s.id for s in stadiums]
assert len(set(stadium_ids)) == 32
# Check all stadiums have required fields
for stadium in stadiums:
assert stadium.id.startswith("stadium_nhl_")
assert stadium.sport == "nhl"
assert stadium.name
assert stadium.city
assert stadium.state
assert stadium.country in ["USA", "Canada"]
assert stadium.latitude != 0
assert stadium.longitude != 0
class TestScrapeFallback:
"""Test multi-source fallback behavior."""
def test_falls_back_to_next_source_on_failure(self):
"""Test scraper tries next source when first fails."""
scraper = NHLScraper(season=2025)
with patch.object(scraper, '_scrape_hockey_reference') as mock_hr, \
patch.object(scraper, '_scrape_nhl_api') as mock_nhl, \
patch.object(scraper, '_scrape_espn') as mock_espn:
# Make HR and NHL API fail
mock_hr.side_effect = Exception("Connection failed")
mock_nhl.side_effect = Exception("API error")
# Make ESPN return data
mock_espn.return_value = [
RawGameData(
game_date=datetime(2025, 10, 8),
home_team_raw="Boston Bruins",
away_team_raw="Pittsburgh Penguins",
stadium_raw="TD Garden",
status="scheduled",
)
]
result = scraper.scrape_games()
assert result.success
assert result.source == "espn"
assert mock_hr.called
assert mock_nhl.called
assert mock_espn.called
class TestSeasonMonths:
"""Test season month calculation."""
def test_gets_correct_season_months(self):
"""Test correct months are returned for NHL season."""
scraper = NHLScraper(season=2025)
months = scraper._get_season_months()
# NHL season is October-June
assert len(months) == 9 # Oct, Nov, Dec, Jan, Feb, Mar, Apr, May, Jun
# Check first month is October of season year
assert months[0] == (2025, 10)
# Check last month is June of following year
assert months[-1] == (2026, 6)
# Check transition to new year
assert months[2] == (2025, 12) # December
assert months[3] == (2026, 1) # January
class TestInternationalFiltering:
"""Test international game filtering.
Note: Filtering happens in _parse_espn_response, not _normalize_games.
"""
def test_filters_european_games_during_parsing(self):
"""Test European games are filtered out during ESPN parsing."""
scraper = NHLScraper(season=2025)
# Create ESPN-like data with Prague game (Global Series)
espn_data = {
"events": [
{
"date": "2025-10-10T18:00:00Z",
"competitions": [
{
"neutralSite": True,
"venue": {
"fullName": "O2 Arena, Prague",
"address": {"city": "Prague", "country": "Czech Republic"},
},
"competitors": [
{"homeAway": "home", "team": {"displayName": "Florida Panthers"}},
{"homeAway": "away", "team": {"displayName": "Dallas Stars"}},
],
}
],
}
]
}
games = scraper._parse_espn_response(espn_data, "http://espn.com/api")
# Prague game should be filtered
assert len(games) == 0
def test_keeps_north_american_games(self):
"""Test North American games are kept."""
scraper = NHLScraper(season=2025)
raw_games = [
RawGameData(
game_date=datetime(2025, 10, 8),
home_team_raw="Boston Bruins",
away_team_raw="Pittsburgh Penguins",
stadium_raw="TD Garden",
status="scheduled",
),
]
games, _ = scraper._normalize_games(raw_games)
assert len(games) == 1

View File

@@ -1,226 +0,0 @@
"""Tests for NWSL scraper."""
from datetime import datetime
from unittest.mock import patch
import pytest
from sportstime_parser.scrapers.nwsl import NWSLScraper, create_nwsl_scraper
from sportstime_parser.scrapers.base import RawGameData
from sportstime_parser.tests.fixtures import (
load_json_fixture,
NWSL_ESPN_SCOREBOARD_JSON,
)
class TestNWSLScraperInit:
"""Test NWSLScraper initialization."""
def test_creates_scraper_with_season(self):
"""Test scraper initializes with correct season."""
scraper = NWSLScraper(season=2026)
assert scraper.sport == "nwsl"
assert scraper.season == 2026
def test_factory_function_creates_scraper(self):
"""Test factory function creates correct scraper."""
scraper = create_nwsl_scraper(season=2026)
assert isinstance(scraper, NWSLScraper)
assert scraper.season == 2026
def test_expected_game_count(self):
"""Test expected game count is correct for NWSL."""
scraper = NWSLScraper(season=2026)
assert scraper.expected_game_count == 182
def test_sources_in_priority_order(self):
"""Test sources are returned in correct priority order."""
scraper = NWSLScraper(season=2026)
sources = scraper._get_sources()
assert sources == ["espn"]
class TestESPNParsing:
"""Test ESPN API response parsing."""
def test_parses_completed_games(self):
"""Test parsing completed games from ESPN."""
scraper = NWSLScraper(season=2026)
data = load_json_fixture(NWSL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
completed = [g for g in games if g.status == "final"]
assert len(completed) == 2
# Angel City @ Thorns
la_por = next(g for g in completed if g.away_team_raw == "Angel City FC")
assert la_por.home_team_raw == "Portland Thorns FC"
assert la_por.away_score == 1
assert la_por.home_score == 2
assert la_por.stadium_raw == "Providence Park"
def test_parses_scheduled_games(self):
"""Test parsing scheduled games from ESPN."""
scraper = NWSLScraper(season=2026)
data = load_json_fixture(NWSL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
scheduled = [g for g in games if g.status == "scheduled"]
assert len(scheduled) == 1
sd_bay = scheduled[0]
assert sd_bay.away_team_raw == "San Diego Wave FC"
assert sd_bay.home_team_raw == "Bay FC"
assert sd_bay.stadium_raw == "PayPal Park"
def test_parses_venue_info(self):
"""Test venue information is extracted."""
scraper = NWSLScraper(season=2026)
data = load_json_fixture(NWSL_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
for game in games:
assert game.stadium_raw is not None
class TestGameNormalization:
"""Test game normalization and canonical ID generation."""
def test_normalizes_games_with_canonical_ids(self):
"""Test games are normalized with correct canonical IDs."""
scraper = NWSLScraper(season=2026)
raw_games = [
RawGameData(
game_date=datetime(2026, 4, 10),
home_team_raw="Portland Thorns FC",
away_team_raw="Angel City FC",
stadium_raw="Providence Park",
home_score=2,
away_score=1,
status="final",
source_url="http://example.com",
)
]
games, review_items = scraper._normalize_games(raw_games)
assert len(games) == 1
game = games[0]
# Check canonical ID format
assert game.id == "nwsl_2026_anf_por_0410"
assert game.sport == "nwsl"
assert game.season == 2026
# Check team IDs
assert game.home_team_id == "team_nwsl_por"
assert game.away_team_id == "team_nwsl_anf"
# Check scores preserved
assert game.home_score == 2
assert game.away_score == 1
def test_creates_review_items_for_unresolved_teams(self):
"""Test review items are created for unresolved teams."""
scraper = NWSLScraper(season=2026)
raw_games = [
RawGameData(
game_date=datetime(2026, 4, 10),
home_team_raw="Unknown Team XYZ",
away_team_raw="Portland Thorns FC",
stadium_raw="Providence Park",
status="scheduled",
),
]
games, review_items = scraper._normalize_games(raw_games)
# Game should not be created due to unresolved team
assert len(games) == 0
# But there should be a review item
assert len(review_items) >= 1
class TestTeamAndStadiumScraping:
"""Test team and stadium data scraping."""
def test_scrapes_all_nwsl_teams(self):
"""Test all NWSL teams are returned."""
scraper = NWSLScraper(season=2026)
teams = scraper.scrape_teams()
# NWSL has 14 teams
assert len(teams) == 14
# Check team IDs are unique
team_ids = [t.id for t in teams]
assert len(set(team_ids)) == 14
# Check all teams have required fields
for team in teams:
assert team.id.startswith("team_nwsl_")
assert team.sport == "nwsl"
assert team.city
assert team.name
assert team.full_name
assert team.abbreviation
def test_scrapes_all_nwsl_stadiums(self):
"""Test all NWSL stadiums are returned."""
scraper = NWSLScraper(season=2026)
stadiums = scraper.scrape_stadiums()
# Should have stadiums for all teams
assert len(stadiums) == 14
# Check stadium IDs are unique
stadium_ids = [s.id for s in stadiums]
assert len(set(stadium_ids)) == 14
# Check all stadiums have required fields
for stadium in stadiums:
assert stadium.id.startswith("stadium_nwsl_")
assert stadium.sport == "nwsl"
assert stadium.name
assert stadium.city
assert stadium.state
assert stadium.country == "USA"
assert stadium.latitude != 0
assert stadium.longitude != 0
class TestScrapeFallback:
"""Test fallback behavior (NWSL only has ESPN)."""
def test_returns_failure_when_espn_fails(self):
"""Test scraper returns failure when ESPN fails."""
scraper = NWSLScraper(season=2026)
with patch.object(scraper, '_scrape_espn') as mock_espn:
mock_espn.side_effect = Exception("ESPN failed")
result = scraper.scrape_games()
assert not result.success
assert "All sources failed" in result.error_message
class TestSeasonMonths:
"""Test season month calculation."""
def test_gets_correct_season_months(self):
"""Test correct months are returned for NWSL season."""
scraper = NWSLScraper(season=2026)
months = scraper._get_season_months()
# NWSL season is March-November
assert len(months) == 9 # Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov
# Check first month is March of season year
assert months[0] == (2026, 3)
# Check last month is November
assert months[-1] == (2026, 11)

View File

@@ -1,226 +0,0 @@
"""Tests for WNBA scraper."""
from datetime import datetime
from unittest.mock import patch
import pytest
from sportstime_parser.scrapers.wnba import WNBAScraper, create_wnba_scraper
from sportstime_parser.scrapers.base import RawGameData
from sportstime_parser.tests.fixtures import (
load_json_fixture,
WNBA_ESPN_SCOREBOARD_JSON,
)
class TestWNBAScraperInit:
"""Test WNBAScraper initialization."""
def test_creates_scraper_with_season(self):
"""Test scraper initializes with correct season."""
scraper = WNBAScraper(season=2026)
assert scraper.sport == "wnba"
assert scraper.season == 2026
def test_factory_function_creates_scraper(self):
"""Test factory function creates correct scraper."""
scraper = create_wnba_scraper(season=2026)
assert isinstance(scraper, WNBAScraper)
assert scraper.season == 2026
def test_expected_game_count(self):
"""Test expected game count is correct for WNBA."""
scraper = WNBAScraper(season=2026)
assert scraper.expected_game_count == 220
def test_sources_in_priority_order(self):
"""Test sources are returned in correct priority order."""
scraper = WNBAScraper(season=2026)
sources = scraper._get_sources()
assert sources == ["espn"]
class TestESPNParsing:
"""Test ESPN API response parsing."""
def test_parses_completed_games(self):
"""Test parsing completed games from ESPN."""
scraper = WNBAScraper(season=2026)
data = load_json_fixture(WNBA_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
completed = [g for g in games if g.status == "final"]
assert len(completed) == 2
# Aces @ Liberty
lv_ny = next(g for g in completed if g.away_team_raw == "Las Vegas Aces")
assert lv_ny.home_team_raw == "New York Liberty"
assert lv_ny.away_score == 88
assert lv_ny.home_score == 92
assert lv_ny.stadium_raw == "Barclays Center"
def test_parses_scheduled_games(self):
"""Test parsing scheduled games from ESPN."""
scraper = WNBAScraper(season=2026)
data = load_json_fixture(WNBA_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
scheduled = [g for g in games if g.status == "scheduled"]
assert len(scheduled) == 1
phx_sea = scheduled[0]
assert phx_sea.away_team_raw == "Phoenix Mercury"
assert phx_sea.home_team_raw == "Seattle Storm"
assert phx_sea.stadium_raw == "Climate Pledge Arena"
def test_parses_venue_info(self):
"""Test venue information is extracted."""
scraper = WNBAScraper(season=2026)
data = load_json_fixture(WNBA_ESPN_SCOREBOARD_JSON)
games = scraper._parse_espn_response(data, "http://espn.com/api")
for game in games:
assert game.stadium_raw is not None
class TestGameNormalization:
"""Test game normalization and canonical ID generation."""
def test_normalizes_games_with_canonical_ids(self):
"""Test games are normalized with correct canonical IDs."""
scraper = WNBAScraper(season=2026)
raw_games = [
RawGameData(
game_date=datetime(2026, 5, 20),
home_team_raw="New York Liberty",
away_team_raw="Las Vegas Aces",
stadium_raw="Barclays Center",
home_score=92,
away_score=88,
status="final",
source_url="http://example.com",
)
]
games, review_items = scraper._normalize_games(raw_games)
assert len(games) == 1
game = games[0]
# Check canonical ID format
assert game.id == "wnba_2026_lv_ny_0520"
assert game.sport == "wnba"
assert game.season == 2026
# Check team IDs
assert game.home_team_id == "team_wnba_ny"
assert game.away_team_id == "team_wnba_lv"
# Check scores preserved
assert game.home_score == 92
assert game.away_score == 88
def test_creates_review_items_for_unresolved_teams(self):
"""Test review items are created for unresolved teams."""
scraper = WNBAScraper(season=2026)
raw_games = [
RawGameData(
game_date=datetime(2026, 5, 20),
home_team_raw="Unknown Team XYZ",
away_team_raw="Las Vegas Aces",
stadium_raw="Barclays Center",
status="scheduled",
),
]
games, review_items = scraper._normalize_games(raw_games)
# Game should not be created due to unresolved team
assert len(games) == 0
# But there should be a review item
assert len(review_items) >= 1
class TestTeamAndStadiumScraping:
"""Test team and stadium data scraping."""
def test_scrapes_all_wnba_teams(self):
"""Test all WNBA teams are returned."""
scraper = WNBAScraper(season=2026)
teams = scraper.scrape_teams()
# WNBA has 13 teams (including Golden State Valkyries)
assert len(teams) == 13
# Check team IDs are unique
team_ids = [t.id for t in teams]
assert len(set(team_ids)) == 13
# Check all teams have required fields
for team in teams:
assert team.id.startswith("team_wnba_")
assert team.sport == "wnba"
assert team.city
assert team.name
assert team.full_name
assert team.abbreviation
def test_scrapes_all_wnba_stadiums(self):
"""Test all WNBA stadiums are returned."""
scraper = WNBAScraper(season=2026)
stadiums = scraper.scrape_stadiums()
# Should have stadiums for all teams
assert len(stadiums) == 13
# Check stadium IDs are unique
stadium_ids = [s.id for s in stadiums]
assert len(set(stadium_ids)) == 13
# Check all stadiums have required fields
for stadium in stadiums:
assert stadium.id.startswith("stadium_wnba_")
assert stadium.sport == "wnba"
assert stadium.name
assert stadium.city
assert stadium.state
assert stadium.country == "USA"
assert stadium.latitude != 0
assert stadium.longitude != 0
class TestScrapeFallback:
"""Test fallback behavior (WNBA only has ESPN)."""
def test_returns_failure_when_espn_fails(self):
"""Test scraper returns failure when ESPN fails."""
scraper = WNBAScraper(season=2026)
with patch.object(scraper, '_scrape_espn') as mock_espn:
mock_espn.side_effect = Exception("ESPN failed")
result = scraper.scrape_games()
assert not result.success
assert "All sources failed" in result.error_message
class TestSeasonMonths:
"""Test season month calculation."""
def test_gets_correct_season_months(self):
"""Test correct months are returned for WNBA season."""
scraper = WNBAScraper(season=2026)
months = scraper._get_season_months()
# WNBA season is May-October
assert len(months) == 6 # May, Jun, Jul, Aug, Sep, Oct
# Check first month is May of season year
assert months[0] == (2026, 5)
# Check last month is October
assert months[-1] == (2026, 10)

View File

@@ -1,187 +0,0 @@
"""Tests for timezone conversion utilities."""
import pytest
from datetime import datetime, date
from zoneinfo import ZoneInfo
from sportstime_parser.normalizers.timezone import (
detect_timezone_from_string,
detect_timezone_from_location,
parse_datetime,
convert_to_utc,
get_stadium_timezone,
TimezoneResult,
)
class TestDetectTimezoneFromString:
"""Tests for detect_timezone_from_string function."""
def test_eastern_time(self):
"""Test Eastern Time detection."""
assert detect_timezone_from_string("7:00 PM ET") == "America/New_York"
assert detect_timezone_from_string("7:00 PM EST") == "America/New_York"
assert detect_timezone_from_string("7:00 PM EDT") == "America/New_York"
def test_central_time(self):
"""Test Central Time detection."""
assert detect_timezone_from_string("8:00 PM CT") == "America/Chicago"
assert detect_timezone_from_string("8:00 PM CST") == "America/Chicago"
assert detect_timezone_from_string("8:00 PM CDT") == "America/Chicago"
def test_mountain_time(self):
"""Test Mountain Time detection."""
assert detect_timezone_from_string("7:00 PM MT") == "America/Denver"
assert detect_timezone_from_string("7:00 PM MST") == "America/Denver"
def test_pacific_time(self):
"""Test Pacific Time detection."""
assert detect_timezone_from_string("7:00 PM PT") == "America/Los_Angeles"
assert detect_timezone_from_string("7:00 PM PST") == "America/Los_Angeles"
assert detect_timezone_from_string("7:00 PM PDT") == "America/Los_Angeles"
def test_no_timezone(self):
"""Test string with no timezone."""
assert detect_timezone_from_string("7:00 PM") is None
assert detect_timezone_from_string("19:00") is None
def test_case_insensitive(self):
"""Test case insensitive matching."""
assert detect_timezone_from_string("7:00 PM et") == "America/New_York"
assert detect_timezone_from_string("7:00 PM Et") == "America/New_York"
class TestDetectTimezoneFromLocation:
"""Tests for detect_timezone_from_location function."""
def test_eastern_states(self):
"""Test Eastern timezone states."""
assert detect_timezone_from_location(state="NY") == "America/New_York"
assert detect_timezone_from_location(state="MA") == "America/New_York"
assert detect_timezone_from_location(state="FL") == "America/New_York"
def test_central_states(self):
"""Test Central timezone states."""
assert detect_timezone_from_location(state="TX") == "America/Chicago"
assert detect_timezone_from_location(state="IL") == "America/Chicago"
def test_mountain_states(self):
"""Test Mountain timezone states."""
assert detect_timezone_from_location(state="CO") == "America/Denver"
assert detect_timezone_from_location(state="AZ") == "America/Phoenix"
def test_pacific_states(self):
"""Test Pacific timezone states."""
assert detect_timezone_from_location(state="CA") == "America/Los_Angeles"
assert detect_timezone_from_location(state="WA") == "America/Los_Angeles"
def test_canadian_provinces(self):
"""Test Canadian provinces."""
assert detect_timezone_from_location(state="ON") == "America/Toronto"
assert detect_timezone_from_location(state="BC") == "America/Vancouver"
assert detect_timezone_from_location(state="AB") == "America/Edmonton"
def test_case_insensitive(self):
"""Test case insensitive matching."""
assert detect_timezone_from_location(state="ny") == "America/New_York"
assert detect_timezone_from_location(state="Ny") == "America/New_York"
def test_unknown_state(self):
"""Test unknown state returns None."""
assert detect_timezone_from_location(state="XX") is None
assert detect_timezone_from_location(state=None) is None
class TestParseDatetime:
"""Tests for parse_datetime function."""
def test_basic_date_time(self):
"""Test basic date and time parsing."""
result = parse_datetime("2025-12-25", "7:00 PM ET")
assert result.datetime_utc.year == 2025
assert result.datetime_utc.month == 12
assert result.datetime_utc.day == 26 # UTC is +5 hours ahead
assert result.source_timezone == "America/New_York"
assert result.confidence == "high"
def test_date_only(self):
"""Test date only parsing."""
result = parse_datetime("2025-10-21")
assert result.datetime_utc.year == 2025
assert result.datetime_utc.month == 10
assert result.datetime_utc.day == 21
def test_timezone_hint(self):
"""Test timezone hint is used when no timezone in string."""
result = parse_datetime(
"2025-10-21",
"7:00 PM",
timezone_hint="America/Chicago",
)
assert result.source_timezone == "America/Chicago"
assert result.confidence == "medium"
def test_location_inference(self):
"""Test timezone inference from location."""
result = parse_datetime(
"2025-10-21",
"7:00 PM",
location_state="CA",
)
assert result.source_timezone == "America/Los_Angeles"
assert result.confidence == "medium"
def test_default_to_eastern(self):
"""Test defaults to Eastern when no timezone info."""
result = parse_datetime("2025-10-21", "7:00 PM")
assert result.source_timezone == "America/New_York"
assert result.confidence == "low"
assert result.warning is not None
def test_invalid_date(self):
"""Test handling of invalid date."""
result = parse_datetime("not a date")
assert result.confidence == "low"
assert result.warning is not None
class TestConvertToUtc:
"""Tests for convert_to_utc function."""
def test_convert_naive_datetime(self):
"""Test converting naive datetime to UTC."""
dt = datetime(2025, 12, 25, 19, 0) # 7:00 PM
utc = convert_to_utc(dt, "America/New_York")
# In December, Eastern Time is UTC-5
assert utc.hour == 0 # Next day 00:00 UTC
assert utc.day == 26
def test_convert_aware_datetime(self):
"""Test converting timezone-aware datetime."""
tz = ZoneInfo("America/Los_Angeles")
dt = datetime(2025, 7, 4, 19, 0, tzinfo=tz) # 7:00 PM PT
utc = convert_to_utc(dt, "America/Los_Angeles")
# In July, Pacific Time is UTC-7
assert utc.hour == 2 # 02:00 UTC next day
assert utc.day == 5
class TestGetStadiumTimezone:
"""Tests for get_stadium_timezone function."""
def test_explicit_timezone(self):
"""Test explicit timezone override."""
tz = get_stadium_timezone("AZ", stadium_timezone="America/Phoenix")
assert tz == "America/Phoenix"
def test_state_inference(self):
"""Test timezone from state."""
tz = get_stadium_timezone("NY")
assert tz == "America/New_York"
def test_default_eastern(self):
"""Test default to Eastern for unknown state."""
tz = get_stadium_timezone("XX")
assert tz == "America/New_York"

View File

@@ -1 +0,0 @@
"""Tests for the uploaders module."""

View File

@@ -1,461 +0,0 @@
"""Tests for the CloudKit client."""
import json
import pytest
from datetime import datetime
from unittest.mock import Mock, patch, MagicMock
from sportstime_parser.uploaders.cloudkit import (
CloudKitClient,
CloudKitRecord,
CloudKitError,
CloudKitAuthError,
CloudKitRateLimitError,
CloudKitServerError,
RecordType,
OperationResult,
BatchResult,
)
class TestCloudKitRecord:
"""Tests for CloudKitRecord dataclass."""
def test_create_record(self):
"""Test creating a CloudKitRecord."""
record = CloudKitRecord(
record_name="nba_2025_hou_okc_1021",
record_type=RecordType.GAME,
fields={
"sport": "nba",
"season": 2025,
},
)
assert record.record_name == "nba_2025_hou_okc_1021"
assert record.record_type == RecordType.GAME
assert record.fields["sport"] == "nba"
assert record.record_change_tag is None
def test_to_cloudkit_dict(self):
"""Test converting to CloudKit API format."""
record = CloudKitRecord(
record_name="nba_2025_hou_okc_1021",
record_type=RecordType.GAME,
fields={
"sport": "nba",
"season": 2025,
},
)
data = record.to_cloudkit_dict()
assert data["recordName"] == "nba_2025_hou_okc_1021"
assert data["recordType"] == "Game"
assert "fields" in data
assert "recordChangeTag" not in data
def test_to_cloudkit_dict_with_change_tag(self):
"""Test converting with change tag for updates."""
record = CloudKitRecord(
record_name="nba_2025_hou_okc_1021",
record_type=RecordType.GAME,
fields={"sport": "nba"},
record_change_tag="abc123",
)
data = record.to_cloudkit_dict()
assert data["recordChangeTag"] == "abc123"
def test_format_string_field(self):
"""Test formatting string fields."""
record = CloudKitRecord(
record_name="test",
record_type=RecordType.GAME,
fields={"name": "Test Name"},
)
data = record.to_cloudkit_dict()
assert data["fields"]["name"]["value"] == "Test Name"
assert data["fields"]["name"]["type"] == "STRING"
def test_format_int_field(self):
"""Test formatting integer fields."""
record = CloudKitRecord(
record_name="test",
record_type=RecordType.GAME,
fields={"count": 42},
)
data = record.to_cloudkit_dict()
assert data["fields"]["count"]["value"] == 42
assert data["fields"]["count"]["type"] == "INT64"
def test_format_float_field(self):
"""Test formatting float fields."""
record = CloudKitRecord(
record_name="test",
record_type=RecordType.STADIUM,
fields={"latitude": 35.4634},
)
data = record.to_cloudkit_dict()
assert data["fields"]["latitude"]["value"] == 35.4634
assert data["fields"]["latitude"]["type"] == "DOUBLE"
def test_format_datetime_field(self):
"""Test formatting datetime fields."""
dt = datetime(2025, 10, 21, 19, 0, 0)
record = CloudKitRecord(
record_name="test",
record_type=RecordType.GAME,
fields={"game_date": dt},
)
data = record.to_cloudkit_dict()
expected_ms = int(dt.timestamp() * 1000)
assert data["fields"]["game_date"]["value"] == expected_ms
assert data["fields"]["game_date"]["type"] == "TIMESTAMP"
def test_format_location_field(self):
"""Test formatting location fields."""
record = CloudKitRecord(
record_name="test",
record_type=RecordType.STADIUM,
fields={
"location": {"latitude": 35.4634, "longitude": -97.5151},
},
)
data = record.to_cloudkit_dict()
assert data["fields"]["location"]["type"] == "LOCATION"
assert data["fields"]["location"]["value"]["latitude"] == 35.4634
assert data["fields"]["location"]["value"]["longitude"] == -97.5151
def test_skip_none_fields(self):
"""Test that None fields are skipped."""
record = CloudKitRecord(
record_name="test",
record_type=RecordType.GAME,
fields={
"sport": "nba",
"score": None, # Should be skipped
},
)
data = record.to_cloudkit_dict()
assert "sport" in data["fields"]
assert "score" not in data["fields"]
class TestOperationResult:
"""Tests for OperationResult dataclass."""
def test_successful_result(self):
"""Test creating a successful operation result."""
result = OperationResult(
record_name="test_record",
success=True,
record_change_tag="new_tag",
)
assert result.record_name == "test_record"
assert result.success is True
assert result.record_change_tag == "new_tag"
assert result.error_code is None
def test_failed_result(self):
"""Test creating a failed operation result."""
result = OperationResult(
record_name="test_record",
success=False,
error_code="SERVER_ERROR",
error_message="Internal server error",
)
assert result.success is False
assert result.error_code == "SERVER_ERROR"
assert result.error_message == "Internal server error"
class TestBatchResult:
"""Tests for BatchResult dataclass."""
def test_empty_batch_result(self):
"""Test empty batch result."""
result = BatchResult()
assert result.all_succeeded is True
assert result.success_count == 0
assert result.failure_count == 0
def test_batch_with_successes(self):
"""Test batch with successful operations."""
result = BatchResult()
result.successful.append(OperationResult("rec1", True))
result.successful.append(OperationResult("rec2", True))
assert result.all_succeeded is True
assert result.success_count == 2
assert result.failure_count == 0
def test_batch_with_failures(self):
"""Test batch with failed operations."""
result = BatchResult()
result.successful.append(OperationResult("rec1", True))
result.failed.append(OperationResult("rec2", False, error_message="Error"))
assert result.all_succeeded is False
assert result.success_count == 1
assert result.failure_count == 1
class TestCloudKitClient:
"""Tests for CloudKitClient."""
def test_not_configured_without_credentials(self):
"""Test that client reports not configured without credentials."""
with patch.dict("os.environ", {}, clear=True):
client = CloudKitClient()
assert client.is_configured is False
def test_configured_with_credentials(self):
"""Test that client reports configured with credentials."""
# Create a minimal mock for the private key
mock_key = MagicMock()
with patch.dict("os.environ", {
"CLOUDKIT_KEY_ID": "test_key_id",
"CLOUDKIT_PRIVATE_KEY": "-----BEGIN EC PRIVATE KEY-----\ntest\n-----END EC PRIVATE KEY-----",
}):
with patch("sportstime_parser.uploaders.cloudkit.serialization.load_pem_private_key") as mock_load:
mock_load.return_value = mock_key
client = CloudKitClient()
assert client.is_configured is True
def test_get_api_path(self):
"""Test API path construction."""
client = CloudKitClient(
container_id="iCloud.com.test.app",
environment="development",
)
path = client._get_api_path("records/query")
assert path == "/database/1/iCloud.com.test.app/development/public/records/query"
@patch("sportstime_parser.uploaders.cloudkit.requests.Session")
def test_fetch_records_query(self, mock_session_class):
"""Test fetching records with query."""
mock_session = MagicMock()
mock_session_class.return_value = mock_session
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"records": [
{"recordName": "rec1", "recordType": "Game"},
{"recordName": "rec2", "recordType": "Game"},
]
}
mock_session.request.return_value = mock_response
# Setup client with mocked auth
mock_key = MagicMock()
mock_key.sign.return_value = b"signature"
with patch.dict("os.environ", {
"CLOUDKIT_KEY_ID": "test_key",
"CLOUDKIT_PRIVATE_KEY": "-----BEGIN EC PRIVATE KEY-----\ntest\n-----END EC PRIVATE KEY-----",
}):
with patch("sportstime_parser.uploaders.cloudkit.serialization.load_pem_private_key") as mock_load:
with patch("sportstime_parser.uploaders.cloudkit.jwt.encode") as mock_jwt:
mock_load.return_value = mock_key
mock_jwt.return_value = "test_token"
client = CloudKitClient()
records = client.fetch_records(RecordType.GAME)
assert len(records) == 2
assert records[0]["recordName"] == "rec1"
@patch("sportstime_parser.uploaders.cloudkit.requests.Session")
def test_save_records_success(self, mock_session_class):
"""Test saving records successfully."""
mock_session = MagicMock()
mock_session_class.return_value = mock_session
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"records": [
{"recordName": "rec1", "recordChangeTag": "tag1"},
{"recordName": "rec2", "recordChangeTag": "tag2"},
]
}
mock_session.request.return_value = mock_response
mock_key = MagicMock()
mock_key.sign.return_value = b"signature"
with patch.dict("os.environ", {
"CLOUDKIT_KEY_ID": "test_key",
"CLOUDKIT_PRIVATE_KEY": "-----BEGIN EC PRIVATE KEY-----\ntest\n-----END EC PRIVATE KEY-----",
}):
with patch("sportstime_parser.uploaders.cloudkit.serialization.load_pem_private_key") as mock_load:
with patch("sportstime_parser.uploaders.cloudkit.jwt.encode") as mock_jwt:
mock_load.return_value = mock_key
mock_jwt.return_value = "test_token"
client = CloudKitClient()
records = [
CloudKitRecord("rec1", RecordType.GAME, {"sport": "nba"}),
CloudKitRecord("rec2", RecordType.GAME, {"sport": "nba"}),
]
result = client.save_records(records)
assert result.success_count == 2
assert result.failure_count == 0
@patch("sportstime_parser.uploaders.cloudkit.requests.Session")
def test_save_records_partial_failure(self, mock_session_class):
"""Test saving records with some failures."""
mock_session = MagicMock()
mock_session_class.return_value = mock_session
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"records": [
{"recordName": "rec1", "recordChangeTag": "tag1"},
{"recordName": "rec2", "serverErrorCode": "QUOTA_EXCEEDED", "reason": "Quota exceeded"},
]
}
mock_session.request.return_value = mock_response
mock_key = MagicMock()
mock_key.sign.return_value = b"signature"
with patch.dict("os.environ", {
"CLOUDKIT_KEY_ID": "test_key",
"CLOUDKIT_PRIVATE_KEY": "-----BEGIN EC PRIVATE KEY-----\ntest\n-----END EC PRIVATE KEY-----",
}):
with patch("sportstime_parser.uploaders.cloudkit.serialization.load_pem_private_key") as mock_load:
with patch("sportstime_parser.uploaders.cloudkit.jwt.encode") as mock_jwt:
mock_load.return_value = mock_key
mock_jwt.return_value = "test_token"
client = CloudKitClient()
records = [
CloudKitRecord("rec1", RecordType.GAME, {"sport": "nba"}),
CloudKitRecord("rec2", RecordType.GAME, {"sport": "nba"}),
]
result = client.save_records(records)
assert result.success_count == 1
assert result.failure_count == 1
assert result.failed[0].error_code == "QUOTA_EXCEEDED"
@patch("sportstime_parser.uploaders.cloudkit.requests.Session")
def test_auth_error(self, mock_session_class):
"""Test handling authentication error."""
mock_session = MagicMock()
mock_session_class.return_value = mock_session
mock_response = MagicMock()
mock_response.status_code = 421
mock_session.request.return_value = mock_response
mock_key = MagicMock()
mock_key.sign.return_value = b"signature"
with patch.dict("os.environ", {
"CLOUDKIT_KEY_ID": "test_key",
"CLOUDKIT_PRIVATE_KEY": "-----BEGIN EC PRIVATE KEY-----\ntest\n-----END EC PRIVATE KEY-----",
}):
with patch("sportstime_parser.uploaders.cloudkit.serialization.load_pem_private_key") as mock_load:
with patch("sportstime_parser.uploaders.cloudkit.jwt.encode") as mock_jwt:
mock_load.return_value = mock_key
mock_jwt.return_value = "test_token"
client = CloudKitClient()
with pytest.raises(CloudKitAuthError):
client.fetch_records(RecordType.GAME)
@patch("sportstime_parser.uploaders.cloudkit.requests.Session")
def test_rate_limit_error(self, mock_session_class):
"""Test handling rate limit error."""
mock_session = MagicMock()
mock_session_class.return_value = mock_session
mock_response = MagicMock()
mock_response.status_code = 429
mock_session.request.return_value = mock_response
mock_key = MagicMock()
mock_key.sign.return_value = b"signature"
with patch.dict("os.environ", {
"CLOUDKIT_KEY_ID": "test_key",
"CLOUDKIT_PRIVATE_KEY": "-----BEGIN EC PRIVATE KEY-----\ntest\n-----END EC PRIVATE KEY-----",
}):
with patch("sportstime_parser.uploaders.cloudkit.serialization.load_pem_private_key") as mock_load:
with patch("sportstime_parser.uploaders.cloudkit.jwt.encode") as mock_jwt:
mock_load.return_value = mock_key
mock_jwt.return_value = "test_token"
client = CloudKitClient()
with pytest.raises(CloudKitRateLimitError):
client.fetch_records(RecordType.GAME)
@patch("sportstime_parser.uploaders.cloudkit.requests.Session")
def test_server_error(self, mock_session_class):
"""Test handling server error."""
mock_session = MagicMock()
mock_session_class.return_value = mock_session
mock_response = MagicMock()
mock_response.status_code = 503
mock_session.request.return_value = mock_response
mock_key = MagicMock()
mock_key.sign.return_value = b"signature"
with patch.dict("os.environ", {
"CLOUDKIT_KEY_ID": "test_key",
"CLOUDKIT_PRIVATE_KEY": "-----BEGIN EC PRIVATE KEY-----\ntest\n-----END EC PRIVATE KEY-----",
}):
with patch("sportstime_parser.uploaders.cloudkit.serialization.load_pem_private_key") as mock_load:
with patch("sportstime_parser.uploaders.cloudkit.jwt.encode") as mock_jwt:
mock_load.return_value = mock_key
mock_jwt.return_value = "test_token"
client = CloudKitClient()
with pytest.raises(CloudKitServerError):
client.fetch_records(RecordType.GAME)
class TestRecordType:
"""Tests for RecordType enum."""
def test_record_type_values(self):
"""Test that record type values match CloudKit schema."""
assert RecordType.GAME.value == "Game"
assert RecordType.TEAM.value == "Team"
assert RecordType.STADIUM.value == "Stadium"
assert RecordType.TEAM_ALIAS.value == "TeamAlias"
assert RecordType.STADIUM_ALIAS.value == "StadiumAlias"

View File

@@ -1,350 +0,0 @@
"""Tests for the record differ."""
import pytest
from datetime import datetime
from sportstime_parser.models.game import Game
from sportstime_parser.models.team import Team
from sportstime_parser.models.stadium import Stadium
from sportstime_parser.uploaders.diff import (
DiffAction,
RecordDiff,
DiffResult,
RecordDiffer,
game_to_cloudkit_record,
team_to_cloudkit_record,
stadium_to_cloudkit_record,
)
from sportstime_parser.uploaders.cloudkit import RecordType
class TestRecordDiff:
"""Tests for RecordDiff dataclass."""
def test_create_record_diff(self):
"""Test creating a RecordDiff."""
diff = RecordDiff(
record_name="nba_2025_hou_okc_1021",
record_type=RecordType.GAME,
action=DiffAction.CREATE,
)
assert diff.record_name == "nba_2025_hou_okc_1021"
assert diff.record_type == RecordType.GAME
assert diff.action == DiffAction.CREATE
class TestDiffResult:
"""Tests for DiffResult dataclass."""
def test_empty_result(self):
"""Test empty DiffResult."""
result = DiffResult()
assert result.create_count == 0
assert result.update_count == 0
assert result.delete_count == 0
assert result.unchanged_count == 0
assert result.total_changes == 0
def test_counts(self):
"""Test counting different change types."""
result = DiffResult()
result.creates.append(RecordDiff(
record_name="game_1",
record_type=RecordType.GAME,
action=DiffAction.CREATE,
))
result.creates.append(RecordDiff(
record_name="game_2",
record_type=RecordType.GAME,
action=DiffAction.CREATE,
))
result.updates.append(RecordDiff(
record_name="game_3",
record_type=RecordType.GAME,
action=DiffAction.UPDATE,
))
result.deletes.append(RecordDiff(
record_name="game_4",
record_type=RecordType.GAME,
action=DiffAction.DELETE,
))
result.unchanged.append(RecordDiff(
record_name="game_5",
record_type=RecordType.GAME,
action=DiffAction.UNCHANGED,
))
assert result.create_count == 2
assert result.update_count == 1
assert result.delete_count == 1
assert result.unchanged_count == 1
assert result.total_changes == 4 # excludes unchanged
class TestRecordDiffer:
"""Tests for RecordDiffer."""
@pytest.fixture
def differ(self):
"""Create a RecordDiffer instance."""
return RecordDiffer()
@pytest.fixture
def sample_game(self):
"""Create a sample Game."""
return Game(
id="nba_2025_hou_okc_1021",
sport="nba",
season=2025,
home_team_id="team_nba_okc",
away_team_id="team_nba_hou",
stadium_id="stadium_nba_paycom_center",
game_date=datetime(2025, 10, 21, 19, 0, 0),
status="scheduled",
)
@pytest.fixture
def sample_team(self):
"""Create a sample Team."""
return Team(
id="team_nba_okc",
sport="nba",
city="Oklahoma City",
name="Thunder",
full_name="Oklahoma City Thunder",
abbreviation="OKC",
conference="Western",
division="Northwest",
)
@pytest.fixture
def sample_stadium(self):
"""Create a sample Stadium."""
return Stadium(
id="stadium_nba_paycom_center",
sport="nba",
name="Paycom Center",
city="Oklahoma City",
state="OK",
country="USA",
latitude=35.4634,
longitude=-97.5151,
capacity=18203,
)
def test_diff_games_create(self, differ, sample_game):
"""Test detecting new games to create."""
local_games = [sample_game]
remote_records = []
result = differ.diff_games(local_games, remote_records)
assert result.create_count == 1
assert result.update_count == 0
assert result.delete_count == 0
assert result.creates[0].record_name == sample_game.id
def test_diff_games_delete(self, differ, sample_game):
"""Test detecting games to delete."""
local_games = []
remote_records = [
{
"recordName": sample_game.id,
"recordType": "Game",
"fields": {
"sport": {"value": "nba", "type": "STRING"},
"season": {"value": 2025, "type": "INT64"},
},
"recordChangeTag": "abc123",
}
]
result = differ.diff_games(local_games, remote_records)
assert result.create_count == 0
assert result.delete_count == 1
assert result.deletes[0].record_name == sample_game.id
def test_diff_games_unchanged(self, differ, sample_game):
"""Test detecting unchanged games."""
local_games = [sample_game]
remote_records = [
{
"recordName": sample_game.id,
"recordType": "Game",
"fields": {
"sport": {"value": "nba", "type": "STRING"},
"season": {"value": 2025, "type": "INT64"},
"home_team_id": {"value": "team_nba_okc", "type": "STRING"},
"away_team_id": {"value": "team_nba_hou", "type": "STRING"},
"stadium_id": {"value": "stadium_nba_paycom_center", "type": "STRING"},
"game_date": {"value": int(sample_game.game_date.timestamp() * 1000), "type": "TIMESTAMP"},
"game_number": {"value": None, "type": "INT64"},
"home_score": {"value": None, "type": "INT64"},
"away_score": {"value": None, "type": "INT64"},
"status": {"value": "scheduled", "type": "STRING"},
},
"recordChangeTag": "abc123",
}
]
result = differ.diff_games(local_games, remote_records)
assert result.create_count == 0
assert result.update_count == 0
assert result.unchanged_count == 1
def test_diff_games_update(self, differ, sample_game):
"""Test detecting games that need update."""
local_games = [sample_game]
# Remote has different status
remote_records = [
{
"recordName": sample_game.id,
"recordType": "Game",
"fields": {
"sport": {"value": "nba", "type": "STRING"},
"season": {"value": 2025, "type": "INT64"},
"home_team_id": {"value": "team_nba_okc", "type": "STRING"},
"away_team_id": {"value": "team_nba_hou", "type": "STRING"},
"stadium_id": {"value": "stadium_nba_paycom_center", "type": "STRING"},
"game_date": {"value": int(sample_game.game_date.timestamp() * 1000), "type": "TIMESTAMP"},
"game_number": {"value": None, "type": "INT64"},
"home_score": {"value": None, "type": "INT64"},
"away_score": {"value": None, "type": "INT64"},
"status": {"value": "postponed", "type": "STRING"}, # Different!
},
"recordChangeTag": "abc123",
}
]
result = differ.diff_games(local_games, remote_records)
assert result.update_count == 1
assert "status" in result.updates[0].changed_fields
assert result.updates[0].record_change_tag == "abc123"
def test_diff_teams_create(self, differ, sample_team):
"""Test detecting new teams to create."""
local_teams = [sample_team]
remote_records = []
result = differ.diff_teams(local_teams, remote_records)
assert result.create_count == 1
assert result.creates[0].record_name == sample_team.id
def test_diff_stadiums_create(self, differ, sample_stadium):
"""Test detecting new stadiums to create."""
local_stadiums = [sample_stadium]
remote_records = []
result = differ.diff_stadiums(local_stadiums, remote_records)
assert result.create_count == 1
assert result.creates[0].record_name == sample_stadium.id
def test_get_records_to_upload(self, differ, sample_game):
"""Test getting CloudKitRecords for upload."""
game2 = Game(
id="nba_2025_lal_lac_1022",
sport="nba",
season=2025,
home_team_id="team_nba_lac",
away_team_id="team_nba_lal",
stadium_id="stadium_nba_crypto_com",
game_date=datetime(2025, 10, 22, 19, 0, 0),
status="scheduled",
)
local_games = [sample_game, game2]
# Only game2 exists remotely with different status
remote_records = [
{
"recordName": game2.id,
"recordType": "Game",
"fields": {
"sport": {"value": "nba", "type": "STRING"},
"season": {"value": 2025, "type": "INT64"},
"home_team_id": {"value": "team_nba_lac", "type": "STRING"},
"away_team_id": {"value": "team_nba_lal", "type": "STRING"},
"stadium_id": {"value": "stadium_nba_crypto_com", "type": "STRING"},
"game_date": {"value": int(game2.game_date.timestamp() * 1000), "type": "TIMESTAMP"},
"status": {"value": "postponed", "type": "STRING"}, # Different!
},
"recordChangeTag": "xyz789",
}
]
result = differ.diff_games(local_games, remote_records)
records = result.get_records_to_upload()
assert len(records) == 2 # 1 create + 1 update
record_names = [r.record_name for r in records]
assert sample_game.id in record_names
assert game2.id in record_names
class TestConvenienceFunctions:
"""Tests for module-level convenience functions."""
def test_game_to_cloudkit_record(self):
"""Test converting Game to CloudKitRecord."""
game = Game(
id="nba_2025_hou_okc_1021",
sport="nba",
season=2025,
home_team_id="team_nba_okc",
away_team_id="team_nba_hou",
stadium_id="stadium_nba_paycom_center",
game_date=datetime(2025, 10, 21, 19, 0, 0),
status="scheduled",
)
record = game_to_cloudkit_record(game)
assert record.record_name == game.id
assert record.record_type == RecordType.GAME
assert record.fields["sport"] == "nba"
assert record.fields["season"] == 2025
def test_team_to_cloudkit_record(self):
"""Test converting Team to CloudKitRecord."""
team = Team(
id="team_nba_okc",
sport="nba",
city="Oklahoma City",
name="Thunder",
full_name="Oklahoma City Thunder",
abbreviation="OKC",
)
record = team_to_cloudkit_record(team)
assert record.record_name == team.id
assert record.record_type == RecordType.TEAM
assert record.fields["city"] == "Oklahoma City"
assert record.fields["name"] == "Thunder"
def test_stadium_to_cloudkit_record(self):
"""Test converting Stadium to CloudKitRecord."""
stadium = Stadium(
id="stadium_nba_paycom_center",
sport="nba",
name="Paycom Center",
city="Oklahoma City",
state="OK",
country="USA",
latitude=35.4634,
longitude=-97.5151,
)
record = stadium_to_cloudkit_record(stadium)
assert record.record_name == stadium.id
assert record.record_type == RecordType.STADIUM
assert record.fields["name"] == "Paycom Center"
assert record.fields["latitude"] == 35.4634

View File

@@ -1,472 +0,0 @@
"""Tests for the upload state manager."""
import json
import pytest
from datetime import datetime, timedelta
from pathlib import Path
from tempfile import TemporaryDirectory
from sportstime_parser.uploaders.state import (
RecordState,
UploadSession,
StateManager,
)
class TestRecordState:
"""Tests for RecordState dataclass."""
def test_create_record_state(self):
"""Test creating a RecordState with default values."""
state = RecordState(
record_name="nba_2025_hou_okc_1021",
record_type="Game",
)
assert state.record_name == "nba_2025_hou_okc_1021"
assert state.record_type == "Game"
assert state.status == "pending"
assert state.uploaded_at is None
assert state.record_change_tag is None
assert state.error_message is None
assert state.retry_count == 0
def test_record_state_to_dict(self):
"""Test serializing RecordState to dictionary."""
now = datetime.utcnow()
state = RecordState(
record_name="nba_2025_hou_okc_1021",
record_type="Game",
uploaded_at=now,
record_change_tag="abc123",
status="uploaded",
)
data = state.to_dict()
assert data["record_name"] == "nba_2025_hou_okc_1021"
assert data["record_type"] == "Game"
assert data["status"] == "uploaded"
assert data["uploaded_at"] == now.isoformat()
assert data["record_change_tag"] == "abc123"
def test_record_state_from_dict(self):
"""Test deserializing RecordState from dictionary."""
data = {
"record_name": "nba_2025_hou_okc_1021",
"record_type": "Game",
"uploaded_at": "2026-01-10T12:00:00",
"record_change_tag": "abc123",
"status": "uploaded",
"error_message": None,
"retry_count": 0,
}
state = RecordState.from_dict(data)
assert state.record_name == "nba_2025_hou_okc_1021"
assert state.record_type == "Game"
assert state.status == "uploaded"
assert state.uploaded_at == datetime.fromisoformat("2026-01-10T12:00:00")
assert state.record_change_tag == "abc123"
class TestUploadSession:
"""Tests for UploadSession dataclass."""
def test_create_upload_session(self):
"""Test creating an UploadSession."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
assert session.sport == "nba"
assert session.season == 2025
assert session.environment == "development"
assert session.total_count == 0
assert len(session.records) == 0
def test_add_record(self):
"""Test adding records to a session."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.add_record("game_2", "Game")
session.add_record("team_1", "Team")
assert session.total_count == 3
assert len(session.records) == 3
assert "game_1" in session.records
assert session.records["game_1"].record_type == "Game"
def test_mark_uploaded(self):
"""Test marking a record as uploaded."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.mark_uploaded("game_1", "change_tag_123")
assert session.records["game_1"].status == "uploaded"
assert session.records["game_1"].record_change_tag == "change_tag_123"
assert session.records["game_1"].uploaded_at is not None
def test_mark_failed(self):
"""Test marking a record as failed."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.mark_failed("game_1", "Server error")
assert session.records["game_1"].status == "failed"
assert session.records["game_1"].error_message == "Server error"
assert session.records["game_1"].retry_count == 1
def test_mark_failed_increments_retry_count(self):
"""Test that marking failed increments retry count."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.mark_failed("game_1", "Error 1")
session.mark_failed("game_1", "Error 2")
session.mark_failed("game_1", "Error 3")
assert session.records["game_1"].retry_count == 3
def test_counts(self):
"""Test session counts."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.add_record("game_2", "Game")
session.add_record("game_3", "Game")
session.mark_uploaded("game_1")
session.mark_failed("game_2", "Error")
assert session.uploaded_count == 1
assert session.failed_count == 1
assert session.pending_count == 1
def test_is_complete(self):
"""Test is_complete property."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.add_record("game_2", "Game")
assert not session.is_complete
session.mark_uploaded("game_1")
assert not session.is_complete
session.mark_uploaded("game_2")
assert session.is_complete
def test_progress_percent(self):
"""Test progress percentage calculation."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.add_record("game_2", "Game")
session.add_record("game_3", "Game")
session.add_record("game_4", "Game")
session.mark_uploaded("game_1")
assert session.progress_percent == 25.0
def test_get_pending_records(self):
"""Test getting pending record names."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.add_record("game_2", "Game")
session.add_record("game_3", "Game")
session.mark_uploaded("game_1")
session.mark_failed("game_2", "Error")
pending = session.get_pending_records()
assert pending == ["game_3"]
def test_get_failed_records(self):
"""Test getting failed record names."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.add_record("game_2", "Game")
session.add_record("game_3", "Game")
session.mark_failed("game_1", "Error 1")
session.mark_failed("game_3", "Error 3")
failed = session.get_failed_records()
assert set(failed) == {"game_1", "game_3"}
def test_get_retryable_records(self):
"""Test getting records eligible for retry."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.add_record("game_2", "Game")
session.add_record("game_3", "Game")
# Fail game_1 once
session.mark_failed("game_1", "Error")
# Fail game_2 three times (max retries)
session.mark_failed("game_2", "Error")
session.mark_failed("game_2", "Error")
session.mark_failed("game_2", "Error")
retryable = session.get_retryable_records(max_retries=3)
assert retryable == ["game_1"]
def test_to_dict_and_from_dict(self):
"""Test round-trip serialization."""
session = UploadSession(
sport="nba",
season=2025,
environment="development",
)
session.add_record("game_1", "Game")
session.add_record("game_2", "Game")
session.mark_uploaded("game_1", "tag_123")
data = session.to_dict()
restored = UploadSession.from_dict(data)
assert restored.sport == session.sport
assert restored.season == session.season
assert restored.environment == session.environment
assert restored.total_count == session.total_count
assert restored.uploaded_count == session.uploaded_count
assert restored.records["game_1"].status == "uploaded"
class TestStateManager:
"""Tests for StateManager."""
def test_create_session(self):
"""Test creating a new session."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
session = manager.create_session(
sport="nba",
season=2025,
environment="development",
record_names=[
("game_1", "Game"),
("game_2", "Game"),
("team_1", "Team"),
],
)
assert session.sport == "nba"
assert session.season == 2025
assert session.total_count == 3
# Check file was created
state_file = Path(tmpdir) / "upload_state_nba_2025_development.json"
assert state_file.exists()
def test_load_session(self):
"""Test loading an existing session."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
# Create and save a session
original = manager.create_session(
sport="nba",
season=2025,
environment="development",
record_names=[("game_1", "Game")],
)
original.mark_uploaded("game_1", "tag_123")
manager.save_session(original)
# Load it back
loaded = manager.load_session("nba", 2025, "development")
assert loaded is not None
assert loaded.sport == "nba"
assert loaded.records["game_1"].status == "uploaded"
def test_load_nonexistent_session(self):
"""Test loading a session that doesn't exist."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
session = manager.load_session("nba", 2025, "development")
assert session is None
def test_delete_session(self):
"""Test deleting a session."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
# Create a session
manager.create_session(
sport="nba",
season=2025,
environment="development",
record_names=[("game_1", "Game")],
)
# Delete it
result = manager.delete_session("nba", 2025, "development")
assert result is True
# Verify it's gone
loaded = manager.load_session("nba", 2025, "development")
assert loaded is None
def test_delete_nonexistent_session(self):
"""Test deleting a session that doesn't exist."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
result = manager.delete_session("nba", 2025, "development")
assert result is False
def test_list_sessions(self):
"""Test listing all sessions."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
# Create multiple sessions
manager.create_session(
sport="nba",
season=2025,
environment="development",
record_names=[("game_1", "Game")],
)
manager.create_session(
sport="mlb",
season=2026,
environment="production",
record_names=[("game_2", "Game"), ("game_3", "Game")],
)
sessions = manager.list_sessions()
assert len(sessions) == 2
sports = {s["sport"] for s in sessions}
assert sports == {"nba", "mlb"}
def test_get_session_or_create_new(self):
"""Test getting a session when none exists."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
session = manager.get_session_or_create(
sport="nba",
season=2025,
environment="development",
record_names=[("game_1", "Game")],
resume=False,
)
assert session.sport == "nba"
assert session.total_count == 1
def test_get_session_or_create_resume(self):
"""Test resuming an existing session."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
# Create initial session
original = manager.create_session(
sport="nba",
season=2025,
environment="development",
record_names=[("game_1", "Game"), ("game_2", "Game")],
)
original.mark_uploaded("game_1", "tag_123")
manager.save_session(original)
# Resume with additional records
session = manager.get_session_or_create(
sport="nba",
season=2025,
environment="development",
record_names=[("game_1", "Game"), ("game_2", "Game"), ("game_3", "Game")],
resume=True,
)
# Should have original progress plus new record
assert session.records["game_1"].status == "uploaded"
assert "game_3" in session.records
assert session.total_count == 3
def test_get_session_or_create_overwrite(self):
"""Test overwriting an existing session when not resuming."""
with TemporaryDirectory() as tmpdir:
manager = StateManager(state_dir=Path(tmpdir))
# Create initial session
original = manager.create_session(
sport="nba",
season=2025,
environment="development",
record_names=[("game_1", "Game"), ("game_2", "Game")],
)
original.mark_uploaded("game_1", "tag_123")
manager.save_session(original)
# Create new session (not resuming)
session = manager.get_session_or_create(
sport="nba",
season=2025,
environment="development",
record_names=[("game_3", "Game")],
resume=False,
)
# Should be a fresh session
assert session.total_count == 1
assert "game_1" not in session.records
assert "game_3" in session.records

View File

@@ -1,52 +0,0 @@
"""CloudKit uploaders for sportstime-parser."""
from .cloudkit import (
CloudKitClient,
CloudKitRecord,
CloudKitError,
CloudKitAuthError,
CloudKitRateLimitError,
CloudKitServerError,
RecordType,
OperationResult,
BatchResult,
)
from .state import (
RecordState,
UploadSession,
StateManager,
)
from .diff import (
DiffAction,
RecordDiff,
DiffResult,
RecordDiffer,
game_to_cloudkit_record,
team_to_cloudkit_record,
stadium_to_cloudkit_record,
)
__all__ = [
# CloudKit client
"CloudKitClient",
"CloudKitRecord",
"CloudKitError",
"CloudKitAuthError",
"CloudKitRateLimitError",
"CloudKitServerError",
"RecordType",
"OperationResult",
"BatchResult",
# State manager
"RecordState",
"UploadSession",
"StateManager",
# Differ
"DiffAction",
"RecordDiff",
"DiffResult",
"RecordDiffer",
"game_to_cloudkit_record",
"team_to_cloudkit_record",
"stadium_to_cloudkit_record",
]

View File

@@ -1,578 +0,0 @@
"""CloudKit Web Services client for sportstime-parser.
This module provides a client for uploading data to CloudKit using the
CloudKit Web Services API. It handles JWT authentication, request signing,
and batch operations.
Reference: https://developer.apple.com/documentation/cloudkitwebservices
"""
import base64
import hashlib
import json
import os
import time
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Optional
from enum import Enum
import jwt
import requests
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.backends import default_backend
from ..config import (
CLOUDKIT_CONTAINER_ID,
CLOUDKIT_ENVIRONMENT,
CLOUDKIT_BATCH_SIZE,
CLOUDKIT_KEY_ID,
CLOUDKIT_PRIVATE_KEY_PATH,
)
from ..utils.logging import get_logger
class RecordType(str, Enum):
"""CloudKit record types for SportsTime.
Must match CKRecordType constants in CKModels.swift.
"""
GAME = "Game"
TEAM = "Team"
STADIUM = "Stadium"
TEAM_ALIAS = "TeamAlias"
STADIUM_ALIAS = "StadiumAlias"
SPORT = "Sport"
LEAGUE_STRUCTURE = "LeagueStructure"
TRIP_POLL = "TripPoll"
POLL_VOTE = "PollVote"
ITINERARY_ITEM = "ItineraryItem"
@dataclass
class CloudKitRecord:
"""Represents a CloudKit record for upload.
Attributes:
record_name: Unique record identifier (canonical ID)
record_type: CloudKit record type
fields: Dictionary of field name -> field value
record_change_tag: Version tag for conflict detection (None for new records)
"""
record_name: str
record_type: RecordType
fields: dict[str, Any]
record_change_tag: Optional[str] = None
def to_cloudkit_dict(self) -> dict:
"""Convert to CloudKit API format."""
record = {
"recordName": self.record_name,
"recordType": self.record_type.value,
"fields": self._format_fields(),
}
if self.record_change_tag:
record["recordChangeTag"] = self.record_change_tag
return record
def _format_fields(self) -> dict:
"""Format fields for CloudKit API."""
formatted = {}
for key, value in self.fields.items():
if value is None:
continue
formatted[key] = self._format_field_value(value)
return formatted
def _format_field_value(self, value: Any) -> dict:
"""Format a single field value for CloudKit API."""
# Check bool BEFORE int (bool is a subclass of int in Python)
if isinstance(value, bool):
return {"value": 1 if value else 0, "type": "INT64"}
elif isinstance(value, str):
return {"value": value, "type": "STRING"}
elif isinstance(value, int):
return {"value": value, "type": "INT64"}
elif isinstance(value, float):
return {"value": value, "type": "DOUBLE"}
elif isinstance(value, datetime):
# CloudKit expects milliseconds since epoch
timestamp_ms = int(value.timestamp() * 1000)
return {"value": timestamp_ms, "type": "TIMESTAMP"}
elif isinstance(value, list):
return {"value": value, "type": "STRING_LIST"}
elif isinstance(value, dict) and "latitude" in value and "longitude" in value:
return {
"value": {
"latitude": value["latitude"],
"longitude": value["longitude"],
},
"type": "LOCATION",
}
else:
# Default to string
return {"value": str(value), "type": "STRING"}
@dataclass
class OperationResult:
"""Result of a CloudKit operation."""
record_name: str
success: bool
record_change_tag: Optional[str] = None
error_code: Optional[str] = None
error_message: Optional[str] = None
@dataclass
class BatchResult:
"""Result of a batch CloudKit operation."""
successful: list[OperationResult] = field(default_factory=list)
failed: list[OperationResult] = field(default_factory=list)
@property
def all_succeeded(self) -> bool:
return len(self.failed) == 0
@property
def success_count(self) -> int:
return len(self.successful)
@property
def failure_count(self) -> int:
return len(self.failed)
class CloudKitClient:
"""Client for CloudKit Web Services API.
Handles authentication via server-to-server JWT tokens and provides
methods for CRUD operations on CloudKit records.
Authentication requires:
- Key ID: CloudKit key identifier from Apple Developer Portal
- Private Key: EC private key in PEM format
Environment variables:
- CLOUDKIT_KEY_ID: The key identifier
- CLOUDKIT_PRIVATE_KEY_PATH: Path to the private key file
- CLOUDKIT_PRIVATE_KEY: The private key contents (alternative to path)
"""
BASE_URL = "https://api.apple-cloudkit.com"
TOKEN_EXPIRY_SECONDS = 3600 # 1 hour
def __init__(
self,
container_id: str = CLOUDKIT_CONTAINER_ID,
environment: str = CLOUDKIT_ENVIRONMENT,
key_id: Optional[str] = None,
private_key: Optional[str] = None,
private_key_path: Optional[str] = None,
):
"""Initialize the CloudKit client.
Args:
container_id: CloudKit container identifier
environment: 'development' or 'production'
key_id: CloudKit server-to-server key ID
private_key: PEM-encoded EC private key contents
private_key_path: Path to PEM-encoded EC private key file
"""
self.container_id = container_id
self.environment = environment
self.logger = get_logger()
# Load authentication credentials (config defaults > env vars > None)
self.key_id = key_id or os.environ.get("CLOUDKIT_KEY_ID") or CLOUDKIT_KEY_ID
if private_key:
self._private_key_pem = private_key
elif private_key_path:
self._private_key_pem = Path(private_key_path).read_text()
elif os.environ.get("CLOUDKIT_PRIVATE_KEY"):
self._private_key_pem = os.environ["CLOUDKIT_PRIVATE_KEY"]
elif os.environ.get("CLOUDKIT_PRIVATE_KEY_PATH"):
self._private_key_pem = Path(os.environ["CLOUDKIT_PRIVATE_KEY_PATH"]).read_text()
elif CLOUDKIT_PRIVATE_KEY_PATH.exists():
self._private_key_pem = CLOUDKIT_PRIVATE_KEY_PATH.read_text()
else:
self._private_key_pem = None
# Parse the private key if available
self._private_key = None
if self._private_key_pem:
self._private_key = serialization.load_pem_private_key(
self._private_key_pem.encode(),
password=None,
backend=default_backend(),
)
# Token cache
self._token: Optional[str] = None
self._token_expiry: float = 0
# Session for connection pooling
self._session = requests.Session()
@property
def is_configured(self) -> bool:
"""Check if the client has valid authentication credentials."""
return bool(self.key_id and self._private_key)
def _get_api_path(self, operation: str) -> str:
"""Build the full API path for an operation."""
return f"/database/1/{self.container_id}/{self.environment}/public/{operation}"
def _get_token(self) -> str:
"""Get a valid JWT token, generating a new one if needed."""
if not self.is_configured:
raise ValueError(
"CloudKit client not configured. Set CLOUDKIT_KEY_ID and "
"CLOUDKIT_PRIVATE_KEY_PATH environment variables."
)
now = time.time()
# Return cached token if still valid (with 5 min buffer)
if self._token and (self._token_expiry - now) > 300:
return self._token
# Generate new token
expiry = now + self.TOKEN_EXPIRY_SECONDS
payload = {
"iss": self.key_id,
"iat": int(now),
"exp": int(expiry),
"sub": self.container_id,
}
self._token = jwt.encode(
payload,
self._private_key,
algorithm="ES256",
)
self._token_expiry = expiry
return self._token
def _sign_request(self, method: str, path: str, body: Optional[bytes] = None) -> dict:
"""Generate request headers with authentication.
Args:
method: HTTP method
path: API path
body: Request body bytes
Returns:
Dictionary of headers to include in the request
"""
token = self._get_token()
# CloudKit uses date in ISO format
date_str = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
# Calculate body hash
if body:
body_hash = base64.b64encode(
hashlib.sha256(body).digest()
).decode()
else:
body_hash = base64.b64encode(
hashlib.sha256(b"").digest()
).decode()
# Build the message to sign
# Format: date:body_hash:path
message = f"{date_str}:{body_hash}:{path}"
# Sign the message
signature = self._private_key.sign(
message.encode(),
ec.ECDSA(hashes.SHA256()),
)
signature_b64 = base64.b64encode(signature).decode()
return {
"Authorization": f"Bearer {token}",
"X-Apple-CloudKit-Request-KeyID": self.key_id,
"X-Apple-CloudKit-Request-ISO8601Date": date_str,
"X-Apple-CloudKit-Request-SignatureV1": signature_b64,
"Content-Type": "application/json",
}
def _request(
self,
method: str,
operation: str,
body: Optional[dict] = None,
) -> dict:
"""Make a request to the CloudKit API.
Args:
method: HTTP method
operation: API operation path
body: Request body as dictionary
Returns:
Response data as dictionary
Raises:
CloudKitError: If the request fails
"""
path = self._get_api_path(operation)
url = f"{self.BASE_URL}{path}"
body_bytes = json.dumps(body).encode() if body else None
headers = self._sign_request(method, path, body_bytes)
response = self._session.request(
method=method,
url=url,
headers=headers,
data=body_bytes,
)
if response.status_code == 200:
return response.json()
elif response.status_code == 421:
# Authentication required - token may be expired
self._token = None
raise CloudKitAuthError("Authentication failed - check credentials")
elif response.status_code == 429:
raise CloudKitRateLimitError("Rate limit exceeded")
elif response.status_code >= 500:
raise CloudKitServerError(f"Server error: {response.status_code}")
else:
try:
error_data = response.json()
error_msg = error_data.get("serverErrorCode", str(response.status_code))
except (json.JSONDecodeError, KeyError):
error_msg = response.text
raise CloudKitError(f"Request failed: {error_msg}")
def fetch_records(
self,
record_type: RecordType,
record_names: Optional[list[str]] = None,
limit: int = 200,
) -> list[dict]:
"""Fetch records from CloudKit.
Args:
record_type: Type of records to fetch
record_names: Specific record names to fetch (optional)
limit: Maximum records to return (default 200)
Returns:
List of record dictionaries
"""
if record_names:
# Fetch specific records by name
body = {
"records": [{"recordName": name} for name in record_names],
}
response = self._request("POST", "records/lookup", body)
else:
# Query all records of type
body = {
"query": {
"recordType": record_type.value,
},
"resultsLimit": limit,
}
response = self._request("POST", "records/query", body)
records = response.get("records", [])
return [r for r in records if "recordName" in r]
def fetch_all_records(self, record_type: RecordType) -> list[dict]:
"""Fetch all records of a type using pagination.
Args:
record_type: Type of records to fetch
Returns:
List of all record dictionaries
"""
all_records = []
continuation_marker = None
while True:
body = {
"query": {
"recordType": record_type.value,
},
"resultsLimit": 200,
}
if continuation_marker:
body["continuationMarker"] = continuation_marker
response = self._request("POST", "records/query", body)
records = response.get("records", [])
all_records.extend([r for r in records if "recordName" in r])
continuation_marker = response.get("continuationMarker")
if not continuation_marker:
break
return all_records
def save_records(self, records: list[CloudKitRecord]) -> BatchResult:
"""Save records to CloudKit (create or update).
Args:
records: List of records to save
Returns:
BatchResult with success/failure details
"""
result = BatchResult()
# Process in batches
for i in range(0, len(records), CLOUDKIT_BATCH_SIZE):
batch = records[i:i + CLOUDKIT_BATCH_SIZE]
batch_result = self._save_batch(batch)
result.successful.extend(batch_result.successful)
result.failed.extend(batch_result.failed)
return result
def _save_batch(self, records: list[CloudKitRecord]) -> BatchResult:
"""Save a single batch of records.
Args:
records: List of records (max CLOUDKIT_BATCH_SIZE)
Returns:
BatchResult with success/failure details
"""
result = BatchResult()
operations = []
for record in records:
op = {
"operationType": "forceReplace",
"record": record.to_cloudkit_dict(),
}
operations.append(op)
body = {"operations": operations}
try:
response = self._request("POST", "records/modify", body)
except CloudKitError as e:
# Entire batch failed
for record in records:
result.failed.append(OperationResult(
record_name=record.record_name,
success=False,
error_message=str(e),
))
return result
# Process individual results
for record_data in response.get("records", []):
record_name = record_data.get("recordName", "unknown")
if "serverErrorCode" in record_data:
result.failed.append(OperationResult(
record_name=record_name,
success=False,
error_code=record_data.get("serverErrorCode"),
error_message=record_data.get("reason"),
))
else:
result.successful.append(OperationResult(
record_name=record_name,
success=True,
record_change_tag=record_data.get("recordChangeTag"),
))
return result
def delete_records(
self,
record_type: RecordType,
records: list[dict],
) -> BatchResult:
"""Delete records from CloudKit.
Args:
record_type: Type of records to delete
records: List of record dicts (must have recordName and recordChangeTag)
Returns:
BatchResult with success/failure details
"""
result = BatchResult()
# Process in batches
for i in range(0, len(records), CLOUDKIT_BATCH_SIZE):
batch = records[i:i + CLOUDKIT_BATCH_SIZE]
operations = []
for record in batch:
operations.append({
"operationType": "delete",
"record": {
"recordName": record["recordName"],
"recordChangeTag": record.get("recordChangeTag"),
},
})
body = {"operations": operations}
try:
response = self._request("POST", "records/modify", body)
except CloudKitError as e:
for record in batch:
result.failed.append(OperationResult(
record_name=record["recordName"],
success=False,
error_message=str(e),
))
continue
for record_data in response.get("records", []):
record_name = record_data.get("recordName", "unknown")
if "serverErrorCode" in record_data:
result.failed.append(OperationResult(
record_name=record_name,
success=False,
error_code=record_data.get("serverErrorCode"),
error_message=record_data.get("reason"),
))
else:
result.successful.append(OperationResult(
record_name=record_name,
success=True,
))
return result
class CloudKitError(Exception):
"""Base exception for CloudKit errors."""
pass
class CloudKitAuthError(CloudKitError):
"""Authentication error."""
pass
class CloudKitRateLimitError(CloudKitError):
"""Rate limit exceeded."""
pass
class CloudKitServerError(CloudKitError):
"""Server-side error."""
pass

View File

@@ -1,743 +0,0 @@
"""Record differ for CloudKit uploads.
This module compares local records with CloudKit records to determine
what needs to be created, updated, or deleted.
Field names must match CKModels.swift exactly:
- Stadium: stadiumId, canonicalId, name, city, state, location (CLLocation),
capacity, yearOpened, imageURL, sport
- Team: teamId, canonicalId, name, abbreviation, sport, city, stadiumCanonicalId,
logoURL, primaryColor, secondaryColor
- Game: gameId, canonicalId, homeTeamCanonicalId, awayTeamCanonicalId,
stadiumCanonicalId, dateTime, sport, season, isPlayoff, broadcastInfo
- TeamAlias: aliasId, teamCanonicalId, aliasType, aliasValue, validFrom, validUntil
- StadiumAlias: aliasName, stadiumCanonicalId, validFrom, validUntil
- Sport: sportId, abbreviation, displayName, iconName, colorHex,
seasonStartMonth, seasonEndMonth, isActive
- LeagueStructure: structureId, sport, type, name, abbreviation, parentId, displayOrder
"""
from dataclasses import dataclass, field
from datetime import datetime, date
from enum import Enum
from typing import Any, Optional
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import TeamAlias, StadiumAlias, AliasType
from ..models.sport import Sport, LeagueStructure
from .cloudkit import CloudKitRecord, RecordType
def _date_to_datetime(d: Optional[date]) -> Optional[datetime]:
"""Convert a date to a datetime at midnight UTC.
CloudKit TIMESTAMP fields require datetime, not date.
"""
if d is None:
return None
return datetime(d.year, d.month, d.day, 0, 0, 0)
class DiffAction(str, Enum):
"""Action to take for a record."""
CREATE = "create"
UPDATE = "update"
DELETE = "delete"
UNCHANGED = "unchanged"
@dataclass
class RecordDiff:
"""Represents the difference between local and remote records.
Attributes:
record_name: Canonical record ID
record_type: CloudKit record type
action: Action to take (create, update, delete, unchanged)
local_record: Local CloudKitRecord (None if delete)
remote_record: Remote record dict (None if create)
changed_fields: List of field names that changed (for update)
record_change_tag: Remote record's change tag (for update)
"""
record_name: str
record_type: RecordType
action: DiffAction
local_record: Optional[CloudKitRecord] = None
remote_record: Optional[dict] = None
changed_fields: list[str] = field(default_factory=list)
record_change_tag: Optional[str] = None
@dataclass
class DiffResult:
"""Result of diffing local and remote records.
Attributes:
creates: Records to create
updates: Records to update
deletes: Records to delete (record names)
unchanged: Records with no changes
"""
creates: list[RecordDiff] = field(default_factory=list)
updates: list[RecordDiff] = field(default_factory=list)
deletes: list[RecordDiff] = field(default_factory=list)
unchanged: list[RecordDiff] = field(default_factory=list)
@property
def create_count(self) -> int:
return len(self.creates)
@property
def update_count(self) -> int:
return len(self.updates)
@property
def delete_count(self) -> int:
return len(self.deletes)
@property
def unchanged_count(self) -> int:
return len(self.unchanged)
@property
def total_changes(self) -> int:
return self.create_count + self.update_count + self.delete_count
def get_records_to_upload(self) -> list[CloudKitRecord]:
"""Get all records that need to be uploaded (creates + updates)."""
records = []
for diff in self.creates:
if diff.local_record:
records.append(diff.local_record)
for diff in self.updates:
if diff.local_record:
# Add change tag for update
diff.local_record.record_change_tag = diff.record_change_tag
records.append(diff.local_record)
return records
class RecordDiffer:
"""Compares local records with CloudKit records.
Field names must match CKModels.swift field keys exactly (camelCase).
"""
# Fields to compare for each record type (matching CKModels.swift keys)
GAME_FIELDS = [
"gameId", "canonicalId", "sport", "season", "dateTime",
"homeTeamCanonicalId", "awayTeamCanonicalId", "stadiumCanonicalId",
"isPlayoff", "broadcastInfo",
]
TEAM_FIELDS = [
"teamId", "canonicalId", "sport", "city", "name", "abbreviation",
"stadiumCanonicalId", "logoURL", "primaryColor", "secondaryColor",
]
STADIUM_FIELDS = [
"stadiumId", "canonicalId", "sport", "name", "city", "state",
"location", "capacity", "yearOpened", "imageURL",
]
TEAM_ALIAS_FIELDS = [
"aliasId", "teamCanonicalId", "aliasType", "aliasValue",
"validFrom", "validUntil",
]
STADIUM_ALIAS_FIELDS = [
"aliasName", "stadiumCanonicalId", "validFrom", "validUntil",
]
SPORT_FIELDS = [
"sportId", "abbreviation", "displayName", "iconName",
"colorHex", "seasonStartMonth", "seasonEndMonth", "isActive",
]
LEAGUE_STRUCTURE_FIELDS = [
"structureId", "sport", "type", "name", "abbreviation",
"parentId", "displayOrder",
]
def diff_games(
self,
local_games: list[Game],
remote_records: list[dict],
) -> DiffResult:
"""Diff local games against remote CloudKit records.
Args:
local_games: List of local Game objects
remote_records: List of remote record dictionaries
Returns:
DiffResult with creates, updates, deletes
"""
local_records = [self._game_to_record(g) for g in local_games]
return self._diff_records(
local_records,
remote_records,
RecordType.GAME,
self.GAME_FIELDS,
)
def diff_teams(
self,
local_teams: list[Team],
remote_records: list[dict],
) -> DiffResult:
"""Diff local teams against remote CloudKit records.
Args:
local_teams: List of local Team objects
remote_records: List of remote record dictionaries
Returns:
DiffResult with creates, updates, deletes
"""
local_records = [self._team_to_record(t) for t in local_teams]
return self._diff_records(
local_records,
remote_records,
RecordType.TEAM,
self.TEAM_FIELDS,
)
def diff_stadiums(
self,
local_stadiums: list[Stadium],
remote_records: list[dict],
) -> DiffResult:
"""Diff local stadiums against remote CloudKit records.
Args:
local_stadiums: List of local Stadium objects
remote_records: List of remote record dictionaries
Returns:
DiffResult with creates, updates, deletes
"""
local_records = [self._stadium_to_record(s) for s in local_stadiums]
return self._diff_records(
local_records,
remote_records,
RecordType.STADIUM,
self.STADIUM_FIELDS,
)
def diff_team_aliases(
self,
local_aliases: list[TeamAlias],
remote_records: list[dict],
) -> DiffResult:
"""Diff local team aliases against remote CloudKit records.
Args:
local_aliases: List of local TeamAlias objects
remote_records: List of remote record dictionaries
Returns:
DiffResult with creates, updates, deletes
"""
local_records = [self._team_alias_to_record(a) for a in local_aliases]
return self._diff_records(
local_records,
remote_records,
RecordType.TEAM_ALIAS,
self.TEAM_ALIAS_FIELDS,
)
def diff_stadium_aliases(
self,
local_aliases: list[StadiumAlias],
remote_records: list[dict],
) -> DiffResult:
"""Diff local stadium aliases against remote CloudKit records.
Args:
local_aliases: List of local StadiumAlias objects
remote_records: List of remote record dictionaries
Returns:
DiffResult with creates, updates, deletes
"""
local_records = [self._stadium_alias_to_record(a) for a in local_aliases]
return self._diff_records(
local_records,
remote_records,
RecordType.STADIUM_ALIAS,
self.STADIUM_ALIAS_FIELDS,
)
def diff_sports(
self,
local_sports: list[Sport],
remote_records: list[dict],
) -> DiffResult:
"""Diff local sports against remote CloudKit records.
Args:
local_sports: List of local Sport objects
remote_records: List of remote record dictionaries
Returns:
DiffResult with creates, updates, deletes
"""
local_records = [self._sport_to_record(s) for s in local_sports]
return self._diff_records(
local_records,
remote_records,
RecordType.SPORT,
self.SPORT_FIELDS,
)
def diff_league_structures(
self,
local_structures: list[LeagueStructure],
remote_records: list[dict],
) -> DiffResult:
"""Diff local league structures against remote CloudKit records.
Args:
local_structures: List of local LeagueStructure objects
remote_records: List of remote record dictionaries
Returns:
DiffResult with creates, updates, deletes
"""
local_records = [self._league_structure_to_record(s) for s in local_structures]
return self._diff_records(
local_records,
remote_records,
RecordType.LEAGUE_STRUCTURE,
self.LEAGUE_STRUCTURE_FIELDS,
)
def _diff_records(
self,
local_records: list[CloudKitRecord],
remote_records: list[dict],
record_type: RecordType,
compare_fields: list[str],
) -> DiffResult:
"""Compare local and remote records.
Args:
local_records: List of local CloudKitRecord objects
remote_records: List of remote record dictionaries
record_type: Type of records being compared
compare_fields: List of field names to compare
Returns:
DiffResult with categorized differences
"""
result = DiffResult()
# Index remote records by name
remote_by_name: dict[str, dict] = {}
for record in remote_records:
name = record.get("recordName")
if name:
remote_by_name[name] = record
# Index local records by name
local_by_name: dict[str, CloudKitRecord] = {}
for record in local_records:
local_by_name[record.record_name] = record
# Find creates and updates
for local_record in local_records:
remote = remote_by_name.get(local_record.record_name)
if remote is None:
# New record
result.creates.append(RecordDiff(
record_name=local_record.record_name,
record_type=record_type,
action=DiffAction.CREATE,
local_record=local_record,
))
else:
# Check for changes
changed_fields = self._compare_fields(
local_record.fields,
remote.get("fields", {}),
compare_fields,
)
if changed_fields:
result.updates.append(RecordDiff(
record_name=local_record.record_name,
record_type=record_type,
action=DiffAction.UPDATE,
local_record=local_record,
remote_record=remote,
changed_fields=changed_fields,
record_change_tag=remote.get("recordChangeTag"),
))
else:
result.unchanged.append(RecordDiff(
record_name=local_record.record_name,
record_type=record_type,
action=DiffAction.UNCHANGED,
local_record=local_record,
remote_record=remote,
record_change_tag=remote.get("recordChangeTag"),
))
# Find deletes (remote records not in local)
local_names = set(local_by_name.keys())
for remote_name, remote in remote_by_name.items():
if remote_name not in local_names:
result.deletes.append(RecordDiff(
record_name=remote_name,
record_type=record_type,
action=DiffAction.DELETE,
remote_record=remote,
record_change_tag=remote.get("recordChangeTag"),
))
return result
def _compare_fields(
self,
local_fields: dict[str, Any],
remote_fields: dict[str, dict],
compare_fields: list[str],
) -> list[str]:
"""Compare field values between local and remote.
Args:
local_fields: Local field values
remote_fields: Remote field values (CloudKit format)
compare_fields: Fields to compare
Returns:
List of field names that differ
"""
changed = []
for field_name in compare_fields:
local_value = local_fields.get(field_name)
remote_field = remote_fields.get(field_name, {})
remote_value = remote_field.get("value") if remote_field else None
# Normalize values for comparison
local_normalized = self._normalize_value(local_value)
remote_normalized = self._normalize_remote_value(remote_value, remote_field)
if local_normalized != remote_normalized:
changed.append(field_name)
return changed
def _normalize_value(self, value: Any) -> Any:
"""Normalize a local value for comparison."""
if value is None:
return None
if isinstance(value, datetime):
# Convert to milliseconds since epoch
return int(value.timestamp() * 1000)
if isinstance(value, float):
# Round to 6 decimal places for coordinate comparison
return round(value, 6)
return value
def _normalize_remote_value(self, value: Any, field_data: dict) -> Any:
"""Normalize a remote CloudKit value for comparison."""
if value is None:
return None
field_type = field_data.get("type", "")
if field_type == "TIMESTAMP":
# Already in milliseconds
return value
if field_type == "DOUBLE":
return round(value, 6)
if field_type == "LOCATION":
# Return as tuple for comparison
if isinstance(value, dict):
return (
round(value.get("latitude", 0), 6),
round(value.get("longitude", 0), 6),
)
return value
def _game_to_record(self, game: Game) -> CloudKitRecord:
"""Convert a Game to a CloudKitRecord.
Field names match CKGame keys in CKModels.swift:
- gameId, canonicalId: Unique identifiers
- homeTeamCanonicalId, awayTeamCanonicalId, stadiumCanonicalId: References as strings
- dateTime: Game time as datetime (will be converted to TIMESTAMP)
- sport: Sport code uppercase (e.g., "MLB")
- season: Season string (e.g., "2025-26" or "2026")
- isPlayoff: Boolean as int (1 or 0)
- broadcastInfo: Optional broadcast network string
"""
# Format season as string
sport_lower = game.sport.lower()
if sport_lower in ("nba", "nhl"):
season_str = f"{game.season}-{str(game.season + 1)[-2:]}"
else:
season_str = str(game.season)
return CloudKitRecord(
record_name=game.id,
record_type=RecordType.GAME,
fields={
"gameId": game.id,
"canonicalId": game.id,
"sport": game.sport.upper(),
"season": season_str,
"dateTime": game.game_date,
"homeTeamCanonicalId": game.home_team_id,
"awayTeamCanonicalId": game.away_team_id,
"stadiumCanonicalId": game.stadium_id,
"isPlayoff": False, # Default, can be overridden
"broadcastInfo": None, # Default, can be overridden
},
)
def _team_to_record(self, team: Team) -> CloudKitRecord:
"""Convert a Team to a CloudKitRecord.
Field names match CKTeam keys in CKModels.swift:
- teamId, canonicalId: Unique identifiers
- name, abbreviation, city: Team info
- sport: Sport code uppercase (e.g., "NBA")
- stadiumCanonicalId: Home stadium canonical ID string
- logoURL: URL string for team logo
- primaryColor, secondaryColor: Hex color strings
"""
return CloudKitRecord(
record_name=team.id,
record_type=RecordType.TEAM,
fields={
"teamId": team.id,
"canonicalId": team.id,
"sport": team.sport.upper(),
"city": team.city,
"name": team.name,
"abbreviation": team.abbreviation,
"stadiumCanonicalId": team.stadium_id,
"logoURL": team.logo_url,
"primaryColor": team.primary_color,
"secondaryColor": team.secondary_color,
},
)
def _stadium_to_record(self, stadium: Stadium) -> CloudKitRecord:
"""Convert a Stadium to a CloudKitRecord.
Field names match CKStadium keys in CKModels.swift:
- stadiumId, canonicalId: Unique identifiers
- name, city, state: Location info
- location: CloudKit LOCATION type with latitude/longitude
- capacity: Seating capacity as int
- yearOpened: Year opened as int
- imageURL: URL string for stadium image
- sport: Sport code uppercase (e.g., "MLB")
- timezoneIdentifier: IANA timezone (e.g., "America/New_York")
"""
return CloudKitRecord(
record_name=stadium.id,
record_type=RecordType.STADIUM,
fields={
"stadiumId": stadium.id,
"canonicalId": stadium.id,
"sport": stadium.sport.upper(),
"name": stadium.name,
"city": stadium.city,
"state": stadium.state,
# CloudKit LOCATION type expects dict with latitude/longitude
"location": {
"latitude": stadium.latitude,
"longitude": stadium.longitude,
},
"capacity": stadium.capacity,
"yearOpened": stadium.opened_year,
"imageURL": stadium.image_url,
"timezoneIdentifier": stadium.timezone,
},
)
def _team_alias_to_record(self, alias: TeamAlias) -> CloudKitRecord:
"""Convert a TeamAlias to a CloudKitRecord.
Field names match CKTeamAlias keys in CKModels.swift:
- aliasId: Unique identifier
- teamCanonicalId: The canonical team this alias resolves to
- aliasType: Type of alias ("abbreviation", "name", "city")
- aliasValue: The alias value to match
- validFrom, validUntil: Optional date bounds
- schemaVersion, lastModified: Versioning fields
"""
return CloudKitRecord(
record_name=alias.id,
record_type=RecordType.TEAM_ALIAS,
fields={
"aliasId": alias.id,
"teamCanonicalId": alias.team_canonical_id,
"aliasType": alias.alias_type.value,
"aliasValue": alias.alias_value,
"validFrom": _date_to_datetime(alias.valid_from),
"validUntil": _date_to_datetime(alias.valid_until),
"schemaVersion": 1,
"lastModified": datetime.utcnow(),
},
)
def _stadium_alias_to_record(self, alias: StadiumAlias) -> CloudKitRecord:
"""Convert a StadiumAlias to a CloudKitRecord.
Field names match CKStadiumAlias keys in CKModels.swift:
- aliasName: The alias name (used as record name/primary key)
- stadiumCanonicalId: The canonical stadium this alias resolves to
- validFrom, validUntil: Optional date bounds
- schemaVersion, lastModified: Versioning fields
"""
# Record name must be unique - combine alias name with stadium ID
# to handle cases like "yankee stadium" mapping to both MLB and MLS stadiums
record_name = f"{alias.alias_name.lower()}|{alias.stadium_canonical_id}"
return CloudKitRecord(
record_name=record_name,
record_type=RecordType.STADIUM_ALIAS,
fields={
"aliasName": alias.alias_name.lower(),
"stadiumCanonicalId": alias.stadium_canonical_id,
"validFrom": _date_to_datetime(alias.valid_from),
"validUntil": _date_to_datetime(alias.valid_until),
"schemaVersion": 1,
"lastModified": datetime.utcnow(),
},
)
def _sport_to_record(self, sport: Sport) -> CloudKitRecord:
"""Convert a Sport to a CloudKitRecord.
Field names match CKSport keys in CKModels.swift:
- sportId: Unique identifier (e.g., 'MLB', 'NBA')
- abbreviation: Sport abbreviation
- displayName: Full display name
- iconName: SF Symbol name
- colorHex: Primary color as hex string
- seasonStartMonth, seasonEndMonth: Season boundary months (1-12)
- isActive: Whether sport is currently supported
- schemaVersion, lastModified: Versioning fields
"""
return CloudKitRecord(
record_name=sport.id,
record_type=RecordType.SPORT,
fields={
"sportId": sport.id,
"abbreviation": sport.abbreviation,
"displayName": sport.display_name,
"iconName": sport.icon_name,
"colorHex": sport.color_hex,
"seasonStartMonth": sport.season_start_month,
"seasonEndMonth": sport.season_end_month,
"isActive": sport.is_active,
"schemaVersion": 1,
"lastModified": datetime.utcnow(),
},
)
def _league_structure_to_record(self, structure: LeagueStructure) -> CloudKitRecord:
"""Convert a LeagueStructure to a CloudKitRecord.
Field names match CKLeagueStructure keys in CKModels.swift:
- structureId: Unique identifier (e.g., 'nba_eastern', 'mlb_al_east')
- sport: Sport code (e.g., 'NBA', 'MLB')
- type: Structure type ('conference', 'division', 'league')
- name: Full name
- abbreviation: Optional abbreviation
- parentId: Parent structure ID (e.g., division's parent is conference)
- displayOrder: Order for display (0-indexed)
- schemaVersion, lastModified: Versioning fields
"""
return CloudKitRecord(
record_name=structure.id,
record_type=RecordType.LEAGUE_STRUCTURE,
fields={
"structureId": structure.id,
"sport": structure.sport.upper(),
"type": structure.structure_type.value,
"name": structure.name,
"abbreviation": structure.abbreviation,
"parentId": structure.parent_id,
"displayOrder": structure.display_order,
"schemaVersion": 1,
"lastModified": datetime.utcnow(),
},
)
def game_to_cloudkit_record(game: Game) -> CloudKitRecord:
"""Convert a Game to a CloudKitRecord.
Convenience function for external use.
"""
differ = RecordDiffer()
return differ._game_to_record(game)
def team_to_cloudkit_record(team: Team) -> CloudKitRecord:
"""Convert a Team to a CloudKitRecord.
Convenience function for external use.
"""
differ = RecordDiffer()
return differ._team_to_record(team)
def stadium_to_cloudkit_record(stadium: Stadium) -> CloudKitRecord:
"""Convert a Stadium to a CloudKitRecord.
Convenience function for external use.
"""
differ = RecordDiffer()
return differ._stadium_to_record(stadium)
def team_alias_to_cloudkit_record(alias: TeamAlias) -> CloudKitRecord:
"""Convert a TeamAlias to a CloudKitRecord.
Convenience function for external use.
"""
differ = RecordDiffer()
return differ._team_alias_to_record(alias)
def stadium_alias_to_cloudkit_record(alias: StadiumAlias) -> CloudKitRecord:
"""Convert a StadiumAlias to a CloudKitRecord.
Convenience function for external use.
"""
differ = RecordDiffer()
return differ._stadium_alias_to_record(alias)
def sport_to_cloudkit_record(sport: Sport) -> CloudKitRecord:
"""Convert a Sport to a CloudKitRecord.
Convenience function for external use.
"""
differ = RecordDiffer()
return differ._sport_to_record(sport)
def league_structure_to_cloudkit_record(structure: LeagueStructure) -> CloudKitRecord:
"""Convert a LeagueStructure to a CloudKitRecord.
Convenience function for external use.
"""
differ = RecordDiffer()
return differ._league_structure_to_record(structure)

View File

@@ -1,384 +0,0 @@
"""Upload state manager for resumable uploads.
This module tracks upload progress to enable resuming interrupted uploads.
State is persisted to JSON files in the .parser_state directory.
"""
import json
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Optional
from ..config import STATE_DIR
@dataclass
class RecordState:
"""State of an individual record upload.
Attributes:
record_name: Canonical record ID
record_type: CloudKit record type
uploaded_at: Timestamp when successfully uploaded
record_change_tag: CloudKit version tag
status: 'pending', 'uploaded', 'failed'
error_message: Error message if failed
retry_count: Number of retry attempts
"""
record_name: str
record_type: str
uploaded_at: Optional[datetime] = None
record_change_tag: Optional[str] = None
status: str = "pending"
error_message: Optional[str] = None
retry_count: int = 0
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"record_name": self.record_name,
"record_type": self.record_type,
"uploaded_at": self.uploaded_at.isoformat() if self.uploaded_at else None,
"record_change_tag": self.record_change_tag,
"status": self.status,
"error_message": self.error_message,
"retry_count": self.retry_count,
}
@classmethod
def from_dict(cls, data: dict) -> "RecordState":
"""Create RecordState from dictionary."""
uploaded_at = data.get("uploaded_at")
if uploaded_at:
uploaded_at = datetime.fromisoformat(uploaded_at)
return cls(
record_name=data["record_name"],
record_type=data["record_type"],
uploaded_at=uploaded_at,
record_change_tag=data.get("record_change_tag"),
status=data.get("status", "pending"),
error_message=data.get("error_message"),
retry_count=data.get("retry_count", 0),
)
@dataclass
class UploadSession:
"""Tracks the state of an upload session.
Attributes:
sport: Sport code
season: Season start year
environment: CloudKit environment
started_at: When the upload session started
last_updated: When the state was last updated
records: Dictionary of record_name -> RecordState
total_count: Total number of records to upload
"""
sport: str
season: int
environment: str
started_at: datetime = field(default_factory=datetime.utcnow)
last_updated: datetime = field(default_factory=datetime.utcnow)
records: dict[str, RecordState] = field(default_factory=dict)
total_count: int = 0
@property
def uploaded_count(self) -> int:
"""Count of successfully uploaded records."""
return sum(1 for r in self.records.values() if r.status == "uploaded")
@property
def pending_count(self) -> int:
"""Count of pending records."""
return sum(1 for r in self.records.values() if r.status == "pending")
@property
def failed_count(self) -> int:
"""Count of failed records."""
return sum(1 for r in self.records.values() if r.status == "failed")
@property
def is_complete(self) -> bool:
"""Check if all records have been processed."""
return self.pending_count == 0
@property
def progress_percent(self) -> float:
"""Calculate upload progress as percentage."""
if self.total_count == 0:
return 100.0
return (self.uploaded_count / self.total_count) * 100
def get_pending_records(self) -> list[str]:
"""Get list of record names that still need to be uploaded."""
return [
name for name, state in self.records.items()
if state.status == "pending"
]
def get_failed_records(self) -> list[str]:
"""Get list of record names that failed to upload."""
return [
name for name, state in self.records.items()
if state.status == "failed"
]
def get_retryable_records(self, max_retries: int = 3) -> list[str]:
"""Get failed records that can be retried."""
return [
name for name, state in self.records.items()
if state.status == "failed" and state.retry_count < max_retries
]
def mark_uploaded(
self,
record_name: str,
record_change_tag: Optional[str] = None,
) -> None:
"""Mark a record as successfully uploaded."""
if record_name in self.records:
state = self.records[record_name]
state.status = "uploaded"
state.uploaded_at = datetime.utcnow()
state.record_change_tag = record_change_tag
state.error_message = None
self.last_updated = datetime.utcnow()
def mark_failed(self, record_name: str, error_message: str) -> None:
"""Mark a record as failed."""
if record_name in self.records:
state = self.records[record_name]
state.status = "failed"
state.error_message = error_message
state.retry_count += 1
self.last_updated = datetime.utcnow()
def mark_pending(self, record_name: str) -> None:
"""Mark a record as pending (for retry)."""
if record_name in self.records:
state = self.records[record_name]
state.status = "pending"
state.error_message = None
self.last_updated = datetime.utcnow()
def add_record(self, record_name: str, record_type: str) -> None:
"""Add a new record to track."""
if record_name not in self.records:
self.records[record_name] = RecordState(
record_name=record_name,
record_type=record_type,
)
self.total_count = len(self.records)
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return {
"sport": self.sport,
"season": self.season,
"environment": self.environment,
"started_at": self.started_at.isoformat(),
"last_updated": self.last_updated.isoformat(),
"total_count": self.total_count,
"records": {
name: state.to_dict()
for name, state in self.records.items()
},
}
@classmethod
def from_dict(cls, data: dict) -> "UploadSession":
"""Create UploadSession from dictionary."""
session = cls(
sport=data["sport"],
season=data["season"],
environment=data["environment"],
started_at=datetime.fromisoformat(data["started_at"]),
last_updated=datetime.fromisoformat(data["last_updated"]),
total_count=data.get("total_count", 0),
)
for name, record_data in data.get("records", {}).items():
session.records[name] = RecordState.from_dict(record_data)
return session
class StateManager:
"""Manages upload state persistence.
State files are stored in .parser_state/ with naming convention:
upload_state_{sport}_{season}_{environment}.json
"""
def __init__(self, state_dir: Optional[Path] = None):
"""Initialize the state manager.
Args:
state_dir: Directory for state files (default: .parser_state/)
"""
self.state_dir = state_dir or STATE_DIR
self.state_dir.mkdir(parents=True, exist_ok=True)
def _get_state_file(self, sport: str, season: int, environment: str) -> Path:
"""Get the path to a state file."""
return self.state_dir / f"upload_state_{sport}_{season}_{environment}.json"
def load_session(
self,
sport: str,
season: int,
environment: str,
) -> Optional[UploadSession]:
"""Load an existing upload session.
Args:
sport: Sport code
season: Season start year
environment: CloudKit environment
Returns:
UploadSession if exists, None otherwise
"""
state_file = self._get_state_file(sport, season, environment)
if not state_file.exists():
return None
try:
with open(state_file, "r", encoding="utf-8") as f:
data = json.load(f)
return UploadSession.from_dict(data)
except (json.JSONDecodeError, KeyError) as e:
# Corrupted state file
return None
def save_session(self, session: UploadSession) -> None:
"""Save an upload session to disk.
Args:
session: The session to save
"""
state_file = self._get_state_file(
session.sport,
session.season,
session.environment,
)
session.last_updated = datetime.utcnow()
with open(state_file, "w", encoding="utf-8") as f:
json.dump(session.to_dict(), f, indent=2)
def create_session(
self,
sport: str,
season: int,
environment: str,
record_names: list[tuple[str, str]], # (record_name, record_type)
) -> UploadSession:
"""Create a new upload session.
Args:
sport: Sport code
season: Season start year
environment: CloudKit environment
record_names: List of (record_name, record_type) tuples
Returns:
New UploadSession
"""
session = UploadSession(
sport=sport,
season=season,
environment=environment,
)
for record_name, record_type in record_names:
session.add_record(record_name, record_type)
self.save_session(session)
return session
def delete_session(self, sport: str, season: int, environment: str) -> bool:
"""Delete an upload session state file.
Args:
sport: Sport code
season: Season start year
environment: CloudKit environment
Returns:
True if deleted, False if not found
"""
state_file = self._get_state_file(sport, season, environment)
if state_file.exists():
state_file.unlink()
return True
return False
def list_sessions(self) -> list[dict]:
"""List all upload sessions.
Returns:
List of session summaries
"""
sessions = []
for state_file in self.state_dir.glob("upload_state_*.json"):
try:
with open(state_file, "r", encoding="utf-8") as f:
data = json.load(f)
session = UploadSession.from_dict(data)
sessions.append({
"sport": session.sport,
"season": session.season,
"environment": session.environment,
"started_at": session.started_at.isoformat(),
"last_updated": session.last_updated.isoformat(),
"progress": f"{session.uploaded_count}/{session.total_count}",
"progress_percent": f"{session.progress_percent:.1f}%",
"status": "complete" if session.is_complete else "in_progress",
"failed_count": session.failed_count,
})
except (json.JSONDecodeError, KeyError):
continue
return sessions
def get_session_or_create(
self,
sport: str,
season: int,
environment: str,
record_names: list[tuple[str, str]],
resume: bool = False,
) -> UploadSession:
"""Get existing session or create new one.
Args:
sport: Sport code
season: Season start year
environment: CloudKit environment
record_names: List of (record_name, record_type) tuples
resume: Whether to resume existing session
Returns:
UploadSession (existing or new)
"""
if resume:
existing = self.load_session(sport, season, environment)
if existing:
# Add any new records not in existing session
existing_names = set(existing.records.keys())
for record_name, record_type in record_names:
if record_name not in existing_names:
existing.add_record(record_name, record_type)
return existing
# Create new session (overwrites existing)
return self.create_session(sport, season, environment, record_names)

View File

@@ -1,58 +0,0 @@
"""Utility modules for sportstime-parser."""
from .logging import (
get_console,
get_logger,
is_verbose,
log_error,
log_failure,
log_game,
log_stadium,
log_success,
log_team,
log_warning,
set_verbose,
)
from .http import (
RateLimitedSession,
get_session,
fetch_url,
fetch_json,
fetch_html,
)
from .progress import (
create_progress,
create_spinner_progress,
progress_bar,
track_progress,
ProgressTracker,
ScrapeProgress,
)
__all__ = [
# Logging
"get_console",
"get_logger",
"is_verbose",
"log_error",
"log_failure",
"log_game",
"log_stadium",
"log_success",
"log_team",
"log_warning",
"set_verbose",
# HTTP
"RateLimitedSession",
"get_session",
"fetch_url",
"fetch_json",
"fetch_html",
# Progress
"create_progress",
"create_spinner_progress",
"progress_bar",
"track_progress",
"ProgressTracker",
"ScrapeProgress",
]

View File

@@ -1,276 +0,0 @@
"""HTTP utilities with rate limiting and exponential backoff."""
import random
import time
from typing import Optional
from urllib.parse import urlparse
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from ..config import (
DEFAULT_REQUEST_DELAY,
MAX_RETRIES,
BACKOFF_FACTOR,
INITIAL_BACKOFF,
)
from .logging import get_logger, log_warning
# User agents for rotation to avoid blocks
USER_AGENTS = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) Gecko/20100101 Firefox/121.0",
]
class RateLimitedSession:
"""HTTP session with rate limiting and exponential backoff.
Features:
- Configurable delay between requests
- Automatic 429 detection with exponential backoff
- User-agent rotation
- Connection pooling
- Automatic retries for transient errors
"""
def __init__(
self,
delay: float = DEFAULT_REQUEST_DELAY,
max_retries: int = MAX_RETRIES,
backoff_factor: float = BACKOFF_FACTOR,
initial_backoff: float = INITIAL_BACKOFF,
):
"""Initialize the rate-limited session.
Args:
delay: Minimum delay between requests in seconds
max_retries: Maximum number of retry attempts
backoff_factor: Multiplier for exponential backoff
initial_backoff: Initial backoff duration in seconds
"""
self.delay = delay
self.max_retries = max_retries
self.backoff_factor = backoff_factor
self.initial_backoff = initial_backoff
self.last_request_time: float = 0.0
self._domain_delays: dict[str, float] = {}
# Create session with retry adapter
self.session = requests.Session()
# Configure automatic retries for connection errors
retry_strategy = Retry(
total=max_retries,
backoff_factor=0.5,
status_forcelist=[500, 502, 503, 504],
allowed_methods=["GET", "HEAD"],
)
adapter = HTTPAdapter(max_retries=retry_strategy, pool_maxsize=10)
self.session.mount("http://", adapter)
self.session.mount("https://", adapter)
self._logger = get_logger()
def _get_user_agent(self) -> str:
"""Get a random user agent."""
return random.choice(USER_AGENTS)
def _get_domain(self, url: str) -> str:
"""Extract domain from URL."""
parsed = urlparse(url)
return parsed.netloc
def _wait_for_rate_limit(self, url: str) -> None:
"""Wait to respect rate limiting."""
domain = self._get_domain(url)
# Get domain-specific delay (if 429 was received)
domain_delay = self._domain_delays.get(domain, 0.0)
effective_delay = max(self.delay, domain_delay)
elapsed = time.time() - self.last_request_time
if elapsed < effective_delay:
sleep_time = effective_delay - elapsed
self._logger.debug(f"Rate limiting: sleeping {sleep_time:.2f}s")
time.sleep(sleep_time)
def _handle_429(self, url: str, attempt: int) -> float:
"""Handle 429 Too Many Requests with exponential backoff.
Returns the backoff duration in seconds.
"""
domain = self._get_domain(url)
backoff = self.initial_backoff * (self.backoff_factor ** attempt)
# Add jitter to prevent thundering herd
backoff += random.uniform(0, 1)
# Update domain-specific delay
self._domain_delays[domain] = min(backoff * 2, 60.0) # Cap at 60s
log_warning(f"Rate limited (429) for {domain}, backing off {backoff:.1f}s")
return backoff
def get(
self,
url: str,
headers: Optional[dict] = None,
params: Optional[dict] = None,
timeout: float = 30.0,
) -> requests.Response:
"""Make a rate-limited GET request with automatic retries.
Args:
url: URL to fetch
headers: Additional headers to include
params: Query parameters
timeout: Request timeout in seconds
Returns:
Response object
Raises:
requests.RequestException: If all retries fail
"""
# Prepare headers with user agent
request_headers = {"User-Agent": self._get_user_agent()}
if headers:
request_headers.update(headers)
last_exception: Optional[Exception] = None
for attempt in range(self.max_retries + 1):
try:
# Wait for rate limit
self._wait_for_rate_limit(url)
# Make request
self.last_request_time = time.time()
response = self.session.get(
url,
headers=request_headers,
params=params,
timeout=timeout,
)
# Handle 429
if response.status_code == 429:
if attempt < self.max_retries:
backoff = self._handle_429(url, attempt)
time.sleep(backoff)
continue
else:
response.raise_for_status()
# Return successful response
return response
except requests.RequestException as e:
last_exception = e
if attempt < self.max_retries:
backoff = self.initial_backoff * (self.backoff_factor ** attempt)
self._logger.warning(
f"Request failed (attempt {attempt + 1}): {e}, retrying in {backoff:.1f}s"
)
time.sleep(backoff)
else:
raise
# Should not reach here, but just in case
if last_exception:
raise last_exception
raise requests.RequestException("Max retries exceeded")
def get_json(
self,
url: str,
headers: Optional[dict] = None,
params: Optional[dict] = None,
timeout: float = 30.0,
) -> dict:
"""Make a rate-limited GET request and parse JSON response.
Args:
url: URL to fetch
headers: Additional headers to include
params: Query parameters
timeout: Request timeout in seconds
Returns:
Parsed JSON as dictionary
Raises:
requests.RequestException: If request fails
ValueError: If response is not valid JSON
"""
response = self.get(url, headers=headers, params=params, timeout=timeout)
response.raise_for_status()
return response.json()
def get_html(
self,
url: str,
headers: Optional[dict] = None,
params: Optional[dict] = None,
timeout: float = 30.0,
) -> str:
"""Make a rate-limited GET request and return HTML text.
Args:
url: URL to fetch
headers: Additional headers to include
params: Query parameters
timeout: Request timeout in seconds
Returns:
HTML text content
Raises:
requests.RequestException: If request fails
"""
response = self.get(url, headers=headers, params=params, timeout=timeout)
response.raise_for_status()
return response.text
def reset_domain_delays(self) -> None:
"""Reset domain-specific delays (e.g., after a long pause)."""
self._domain_delays.clear()
def close(self) -> None:
"""Close the session and release resources."""
self.session.close()
# Global session instance (lazy initialized)
_global_session: Optional[RateLimitedSession] = None
def get_session() -> RateLimitedSession:
"""Get the global rate-limited session instance."""
global _global_session
if _global_session is None:
_global_session = RateLimitedSession()
return _global_session
def fetch_url(url: str, **kwargs) -> requests.Response:
"""Convenience function to fetch a URL with rate limiting."""
return get_session().get(url, **kwargs)
def fetch_json(url: str, **kwargs) -> dict:
"""Convenience function to fetch JSON with rate limiting."""
return get_session().get_json(url, **kwargs)
def fetch_html(url: str, **kwargs) -> str:
"""Convenience function to fetch HTML with rate limiting."""
return get_session().get_html(url, **kwargs)

View File

@@ -1,149 +0,0 @@
"""Logging infrastructure for sportstime-parser."""
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional
from rich.console import Console
from rich.logging import RichHandler
from ..config import SCRIPTS_DIR
# Module-level state
_logger: Optional[logging.Logger] = None
_verbose: bool = False
_console: Optional[Console] = None
def get_console() -> Console:
"""Get the shared Rich console instance."""
global _console
if _console is None:
_console = Console()
return _console
def set_verbose(verbose: bool) -> None:
"""Set verbose mode globally."""
global _verbose
_verbose = verbose
logger = get_logger()
if verbose:
logger.setLevel(logging.DEBUG)
else:
logger.setLevel(logging.INFO)
def is_verbose() -> bool:
"""Check if verbose mode is enabled."""
return _verbose
def get_logger() -> logging.Logger:
"""Get or create the application logger."""
global _logger
if _logger is not None:
return _logger
_logger = logging.getLogger("sportstime_parser")
_logger.setLevel(logging.INFO)
# Prevent propagation to root logger
_logger.propagate = False
# Clear any existing handlers
_logger.handlers.clear()
# Console handler with Rich formatting
console_handler = RichHandler(
console=get_console(),
show_time=True,
show_path=False,
rich_tracebacks=True,
tracebacks_show_locals=True,
markup=True,
)
console_handler.setLevel(logging.DEBUG)
console_format = logging.Formatter("%(message)s")
console_handler.setFormatter(console_format)
_logger.addHandler(console_handler)
# File handler for persistent logs
log_dir = SCRIPTS_DIR / "logs"
log_dir.mkdir(exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_file = log_dir / f"parser_{timestamp}.log"
file_handler = logging.FileHandler(log_file, encoding="utf-8")
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter(
"%(asctime)s | %(levelname)-8s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
file_handler.setFormatter(file_format)
_logger.addHandler(file_handler)
return _logger
def log_game(
sport: str,
game_id: str,
home: str,
away: str,
date: str,
status: str = "parsed",
) -> None:
"""Log a game being processed (only in verbose mode)."""
if not is_verbose():
return
logger = get_logger()
logger.debug(f"[{sport.upper()}] {game_id}: {away} @ {home} ({date}) - {status}")
def log_team(sport: str, team_id: str, name: str, status: str = "resolved") -> None:
"""Log a team being processed (only in verbose mode)."""
if not is_verbose():
return
logger = get_logger()
logger.debug(f"[{sport.upper()}] Team: {name} -> {team_id} ({status})")
def log_stadium(sport: str, stadium_id: str, name: str, status: str = "resolved") -> None:
"""Log a stadium being processed (only in verbose mode)."""
if not is_verbose():
return
logger = get_logger()
logger.debug(f"[{sport.upper()}] Stadium: {name} -> {stadium_id} ({status})")
def log_error(message: str, exc_info: bool = False) -> None:
"""Log an error message."""
logger = get_logger()
logger.error(message, exc_info=exc_info)
def log_warning(message: str) -> None:
"""Log a warning message."""
logger = get_logger()
logger.warning(message)
def log_success(message: str) -> None:
"""Log a success message with green formatting."""
logger = get_logger()
logger.info(f"[green]✓[/green] {message}")
def log_failure(message: str) -> None:
"""Log a failure message with red formatting."""
logger = get_logger()
logger.info(f"[red]✗[/red] {message}")

View File

@@ -1,360 +0,0 @@
"""Progress utilities using Rich for visual feedback."""
from contextlib import contextmanager
from typing import Generator, Iterable, Optional, TypeVar
from rich.progress import (
Progress,
SpinnerColumn,
TextColumn,
BarColumn,
TaskProgressColumn,
TimeElapsedColumn,
TimeRemainingColumn,
MofNCompleteColumn,
)
from rich.console import Console
from .logging import get_console
T = TypeVar("T")
def create_progress() -> Progress:
"""Create a Rich progress bar with standard columns."""
return Progress(
SpinnerColumn(),
TextColumn("[bold blue]{task.description}"),
BarColumn(bar_width=40),
TaskProgressColumn(),
MofNCompleteColumn(),
TimeElapsedColumn(),
TimeRemainingColumn(),
console=get_console(),
transient=False,
)
def create_spinner_progress() -> Progress:
"""Create a Rich progress bar with spinner only (for indeterminate tasks)."""
return Progress(
SpinnerColumn(),
TextColumn("[bold blue]{task.description}"),
TimeElapsedColumn(),
console=get_console(),
transient=True,
)
@contextmanager
def progress_bar(
description: str,
total: Optional[int] = None,
) -> Generator[tuple[Progress, int], None, None]:
"""Context manager for a progress bar.
Args:
description: Task description to display
total: Total number of items (None for indeterminate)
Yields:
Tuple of (Progress instance, task_id)
Example:
with progress_bar("Scraping games", total=100) as (progress, task):
for item in items:
process(item)
progress.advance(task)
"""
if total is None:
progress = create_spinner_progress()
else:
progress = create_progress()
with progress:
task_id = progress.add_task(description, total=total)
yield progress, task_id
def track_progress(
iterable: Iterable[T],
description: str,
total: Optional[int] = None,
) -> Generator[T, None, None]:
"""Wrap an iterable with a progress bar.
Args:
iterable: Items to iterate over
description: Task description to display
total: Total number of items (auto-detected if iterable has len)
Yields:
Items from the iterable
Example:
for game in track_progress(games, "Processing games"):
process(game)
"""
# Try to get length if not provided
if total is None:
try:
total = len(iterable) # type: ignore
except TypeError:
pass
if total is None:
# Indeterminate progress
progress = create_spinner_progress()
with progress:
task_id = progress.add_task(description, total=None)
for item in iterable:
yield item
progress.update(task_id, advance=1)
else:
# Determinate progress
progress = create_progress()
with progress:
task_id = progress.add_task(description, total=total)
for item in iterable:
yield item
progress.advance(task_id)
class ProgressTracker:
"""Track progress across multiple phases with nested tasks.
Example:
tracker = ProgressTracker()
tracker.start("Scraping NBA")
with tracker.task("Fetching schedule", total=12) as advance:
for month in months:
fetch(month)
advance()
with tracker.task("Parsing games", total=1230) as advance:
for game in games:
parse(game)
advance()
tracker.finish("Completed NBA scrape")
"""
def __init__(self):
"""Initialize the progress tracker."""
self._console = get_console()
self._current_progress: Optional[Progress] = None
self._current_task: Optional[int] = None
def start(self, message: str) -> None:
"""Start a new tracking session with a message."""
self._console.print(f"\n[bold cyan]>>> {message}[/bold cyan]")
def finish(self, message: str) -> None:
"""Finish the tracking session with a message."""
self._console.print(f"[bold green]<<< {message}[/bold green]\n")
@contextmanager
def task(
self,
description: str,
total: Optional[int] = None,
) -> Generator[callable, None, None]:
"""Context manager for a tracked task.
Args:
description: Task description
total: Total items (None for indeterminate)
Yields:
Callable to advance the progress
Example:
with tracker.task("Processing", total=100) as advance:
for item in items:
process(item)
advance()
"""
with progress_bar(description, total) as (progress, task_id):
self._current_progress = progress
self._current_task = task_id
def advance(amount: int = 1) -> None:
progress.advance(task_id, advance=amount)
yield advance
self._current_progress = None
self._current_task = None
def log(self, message: str) -> None:
"""Log a message (will be displayed above progress bar if active)."""
if self._current_progress:
self._current_progress.console.print(f" {message}")
else:
self._console.print(f" {message}")
class ScrapeProgress:
"""Specialized progress tracker for scraping operations.
Tracks counts of games, teams, stadiums scraped and provides
formatted status updates.
"""
def __init__(self, sport: str, season: int):
"""Initialize scrape progress for a sport.
Args:
sport: Sport code (e.g., 'nba')
season: Season start year
"""
self.sport = sport
self.season = season
self.games_count = 0
self.teams_count = 0
self.stadiums_count = 0
self.errors_count = 0
self._tracker = ProgressTracker()
def start(self) -> None:
"""Start the scraping session."""
self._tracker.start(
f"Scraping {self.sport.upper()} {self.season}-{self.season + 1}"
)
def finish(self) -> None:
"""Finish the scraping session with summary."""
summary = (
f"Scraped {self.games_count} games, "
f"{self.teams_count} teams, "
f"{self.stadiums_count} stadiums"
)
if self.errors_count > 0:
summary += f" ({self.errors_count} errors)"
self._tracker.finish(summary)
@contextmanager
def scraping_schedule(
self,
total_months: Optional[int] = None,
) -> Generator[callable, None, None]:
"""Track schedule scraping progress."""
with self._tracker.task(
f"Fetching {self.sport.upper()} schedule",
total=total_months,
) as advance:
yield advance
@contextmanager
def parsing_games(
self,
total_games: Optional[int] = None,
) -> Generator[callable, None, None]:
"""Track game parsing progress."""
with self._tracker.task(
"Parsing games",
total=total_games,
) as advance:
def advance_and_count(amount: int = 1) -> None:
self.games_count += amount
advance(amount)
yield advance_and_count
@contextmanager
def resolving_teams(
self,
total_teams: Optional[int] = None,
) -> Generator[callable, None, None]:
"""Track team resolution progress."""
with self._tracker.task(
"Resolving teams",
total=total_teams,
) as advance:
def advance_and_count(amount: int = 1) -> None:
self.teams_count += amount
advance(amount)
yield advance_and_count
@contextmanager
def resolving_stadiums(
self,
total_stadiums: Optional[int] = None,
) -> Generator[callable, None, None]:
"""Track stadium resolution progress."""
with self._tracker.task(
"Resolving stadiums",
total=total_stadiums,
) as advance:
def advance_and_count(amount: int = 1) -> None:
self.stadiums_count += amount
advance(amount)
yield advance_and_count
def log_error(self, message: str) -> None:
"""Log an error during scraping."""
self.errors_count += 1
self._tracker.log(f"[red]Error: {message}[/red]")
def log_warning(self, message: str) -> None:
"""Log a warning during scraping."""
self._tracker.log(f"[yellow]Warning: {message}[/yellow]")
def log_info(self, message: str) -> None:
"""Log an info message during scraping."""
self._tracker.log(message)
class SimpleProgressBar:
"""Simple progress bar wrapper for batch operations.
Example:
with create_progress_bar(total=100, description="Uploading") as progress:
for item in items:
upload(item)
progress.advance()
"""
def __init__(self, progress: Progress, task_id: int):
self._progress = progress
self._task_id = task_id
def advance(self, amount: int = 1) -> None:
"""Advance the progress bar."""
self._progress.advance(self._task_id, advance=amount)
def update(self, completed: int) -> None:
"""Set the progress to a specific value."""
self._progress.update(self._task_id, completed=completed)
@contextmanager
def create_progress_bar(
total: int,
description: str = "Progress",
) -> Generator[SimpleProgressBar, None, None]:
"""Create a simple progress bar for batch operations.
Args:
total: Total number of items
description: Task description
Yields:
SimpleProgressBar with advance() and update() methods
Example:
with create_progress_bar(total=100, description="Uploading") as progress:
for item in items:
upload(item)
progress.advance()
"""
progress = create_progress()
with progress:
task_id = progress.add_task(description, total=total)
yield SimpleProgressBar(progress, task_id)

View File

@@ -1,32 +0,0 @@
"""Validators for scraped data."""
from .report import (
ValidationReport,
ValidationSummary,
generate_report,
detect_duplicate_games,
validate_games,
)
from .schema import (
SchemaValidationError,
validate_canonical_stadium,
validate_canonical_team,
validate_canonical_game,
validate_and_raise,
validate_batch,
)
__all__ = [
"ValidationReport",
"ValidationSummary",
"generate_report",
"detect_duplicate_games",
"validate_games",
"SchemaValidationError",
"validate_canonical_stadium",
"validate_canonical_team",
"validate_canonical_game",
"validate_and_raise",
"validate_batch",
]

View File

@@ -1,409 +0,0 @@
"""Validation report generator for scraped data."""
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Optional
from ..config import EXPECTED_GAME_COUNTS, OUTPUT_DIR
from ..models.game import Game
from ..models.team import Team
from ..models.stadium import Stadium
from ..models.aliases import ManualReviewItem, ReviewReason
@dataclass
class ValidationSummary:
"""Summary statistics for validation report.
Attributes:
total_games: Total number of games scraped
valid_games: Number of games with all data resolved
review_count: Number of items needing manual review
unresolved_teams: Count of unresolved team names
unresolved_stadiums: Count of unresolved stadium names
duplicate_games: Count of potential duplicate games
missing_data: Count of games with missing required data
expected_games: Expected number of games for this sport
"""
total_games: int = 0
valid_games: int = 0
review_count: int = 0
unresolved_teams: int = 0
unresolved_stadiums: int = 0
duplicate_games: int = 0
missing_data: int = 0
expected_games: int = 0
@property
def game_coverage(self) -> float:
"""Percentage of expected games scraped."""
if self.expected_games == 0:
return 100.0
return (self.total_games / self.expected_games) * 100
@property
def validity_rate(self) -> float:
"""Percentage of games that are valid."""
if self.total_games == 0:
return 100.0
return (self.valid_games / self.total_games) * 100
@property
def needs_review(self) -> bool:
"""Check if report requires manual review."""
return self.review_count > 0
@property
def status_emoji(self) -> str:
"""Get status emoji for report header."""
if self.review_count == 0 and self.game_coverage >= 95:
return ""
elif self.review_count <= 10 and self.game_coverage >= 80:
return "⚠️"
else:
return ""
@dataclass
class ValidationReport:
"""Complete validation report for a sport/season.
Attributes:
sport: Sport code
season: Season start year
source: Name of the data source used
summary: Summary statistics
review_items: Items requiring manual review
games: All scraped games
teams: All teams
stadiums: All stadiums
generated_at: Timestamp of report generation
"""
sport: str
season: int
source: str
summary: ValidationSummary
review_items: list[ManualReviewItem] = field(default_factory=list)
games: list[Game] = field(default_factory=list)
teams: list[Team] = field(default_factory=list)
stadiums: list[Stadium] = field(default_factory=list)
generated_at: datetime = field(default_factory=datetime.now)
def to_markdown(self) -> str:
"""Generate markdown report.
Returns:
Complete markdown report as string
"""
lines = []
# Header
season_str = f"{self.season}-{str(self.season + 1)[-2:]}"
lines.append(f"# Validation Report: {self.sport.upper()} {season_str}")
lines.append("")
lines.append(f"**Generated**: {self.generated_at.strftime('%Y-%m-%d %H:%M:%S')} UTC")
lines.append(f"**Source**: {self.source}")
lines.append(f"**Status**: {self.summary.status_emoji} {'Needs Review' if self.summary.needs_review else 'Ready'}")
lines.append("")
# Summary table
lines.append("## Summary")
lines.append("")
lines.append("| Metric | Count |")
lines.append("|--------|-------|")
lines.append(f"| Total Games | {self.summary.total_games:,} |")
lines.append(f"| Valid Games | {self.summary.valid_games:,} |")
lines.append(f"| Expected Games | {self.summary.expected_games:,} |")
lines.append(f"| Coverage | {self.summary.game_coverage:.1f}% |")
lines.append(f"| Manual Review | {self.summary.review_count} |")
lines.append(f"| Unresolved Teams | {self.summary.unresolved_teams} |")
lines.append(f"| Unresolved Stadiums | {self.summary.unresolved_stadiums} |")
lines.append(f"| Duplicate Games | {self.summary.duplicate_games} |")
lines.append(f"| Missing Data | {self.summary.missing_data} |")
lines.append("")
# Manual review section
if self.review_items:
lines.append("## Manual Review Required")
lines.append("")
# Group by reason
by_reason: dict[ReviewReason, list[ManualReviewItem]] = {}
for item in self.review_items:
if item.reason not in by_reason:
by_reason[item.reason] = []
by_reason[item.reason].append(item)
for reason, items in sorted(by_reason.items(), key=lambda x: x[0].value):
reason_title = reason.value.replace("_", " ").title()
lines.append(f"### {reason_title} ({len(items)})")
lines.append("")
for item in items[:10]: # Limit to first 10 per category
lines.append(item.to_markdown())
if len(items) > 10:
lines.append(f"*... and {len(items) - 10} more items*")
lines.append("")
# Teams section
lines.append("## Teams")
lines.append("")
lines.append(f"Total teams: {len(self.teams)}")
lines.append("")
if self.teams:
lines.append("| ID | Full Name | City | Conference | Division |")
lines.append("|-----|-----------|------|------------|----------|")
for team in sorted(self.teams, key=lambda t: t.full_name)[:20]:
lines.append(
f"| `{team.id}` | {team.full_name} | {team.city} | "
f"{team.conference or '-'} | {team.division or '-'} |"
)
if len(self.teams) > 20:
lines.append(f"*... and {len(self.teams) - 20} more teams*")
lines.append("")
# Stadiums section
lines.append("## Stadiums")
lines.append("")
lines.append(f"Total stadiums: {len(self.stadiums)}")
lines.append("")
if self.stadiums:
lines.append("| ID | Name | City | State |")
lines.append("|-----|------|------|-------|")
for stadium in sorted(self.stadiums, key=lambda s: s.name)[:20]:
lines.append(
f"| `{stadium.id}` | {stadium.name} | "
f"{stadium.city} | {stadium.state} |"
)
if len(self.stadiums) > 20:
lines.append(f"*... and {len(self.stadiums) - 20} more stadiums*")
lines.append("")
# Game samples section
lines.append("## Game Samples")
lines.append("")
if self.games:
# Show first 10 games
lines.append("### First 10 Games")
lines.append("")
lines.append("| ID | Date | Away | Home | Status |")
lines.append("|----|------|------|------|--------|")
for game in self.games[:10]:
date_str = game.game_date.strftime("%Y-%m-%d")
lines.append(
f"| `{game.id}` | {date_str} | {game.away_team_id} | "
f"{game.home_team_id} | {game.status} |"
)
lines.append("")
# Show games with issues
problem_games = [g for g in self.games if not g.stadium_id]
if problem_games:
lines.append("### Games Missing Stadium")
lines.append("")
lines.append("| ID | Date | Away | Home | Raw Stadium |")
lines.append("|----|------|------|------|-------------|")
for game in problem_games[:10]:
date_str = game.game_date.strftime("%Y-%m-%d")
lines.append(
f"| `{game.id}` | {date_str} | {game.away_team_id} | "
f"{game.home_team_id} | {game.raw_stadium or '-'} |"
)
if len(problem_games) > 10:
lines.append(f"*... and {len(problem_games) - 10} more*")
lines.append("")
lines.append("---")
lines.append("")
lines.append("*Generated by sportstime-parser*")
return "\n".join(lines)
def save(self, output_dir: Optional[Path] = None) -> Path:
"""Save report to markdown file.
Args:
output_dir: Directory to save to (default: OUTPUT_DIR)
Returns:
Path to saved file
"""
if output_dir is None:
output_dir = OUTPUT_DIR
output_dir.mkdir(parents=True, exist_ok=True)
filename = f"validation_{self.sport}_{self.season}.md"
filepath = output_dir / filename
with open(filepath, "w", encoding="utf-8") as f:
f.write(self.to_markdown())
return filepath
def generate_report(
sport: str,
season: int,
source: str,
games: list[Game],
teams: list[Team],
stadiums: list[Stadium],
review_items: list[ManualReviewItem],
) -> ValidationReport:
"""Generate a validation report from scraped data.
Args:
sport: Sport code
season: Season start year
source: Data source name
games: List of scraped games
teams: List of teams
stadiums: List of stadiums
review_items: Items requiring review
Returns:
Complete ValidationReport
"""
# Calculate summary
summary = ValidationSummary(
total_games=len(games),
expected_games=EXPECTED_GAME_COUNTS.get(sport, 0),
review_count=len(review_items),
)
# Count review item types
for item in review_items:
if item.reason == ReviewReason.UNRESOLVED_TEAM:
summary.unresolved_teams += 1
elif item.reason == ReviewReason.UNRESOLVED_STADIUM:
summary.unresolved_stadiums += 1
elif item.reason == ReviewReason.DUPLICATE_GAME:
summary.duplicate_games += 1
elif item.reason == ReviewReason.MISSING_DATA:
summary.missing_data += 1
# Count valid games (games with all required data)
valid_count = 0
for game in games:
if game.home_team_id and game.away_team_id:
valid_count += 1
summary.valid_games = valid_count
return ValidationReport(
sport=sport,
season=season,
source=source,
summary=summary,
review_items=review_items,
games=games,
teams=teams,
stadiums=stadiums,
)
def detect_duplicate_games(games: list[Game]) -> list[ManualReviewItem]:
"""Detect potential duplicate games.
Duplicates are identified by having the same:
- Home team
- Away team
- Date (ignoring time)
Args:
games: List of games to check
Returns:
List of ManualReviewItems for duplicates
"""
from uuid import uuid4
seen: dict[str, Game] = {}
duplicates: list[ManualReviewItem] = []
for game in games:
# Create a key for the game
key = (
f"{game.home_team_id}_{game.away_team_id}_"
f"{game.game_date.strftime('%Y%m%d')}"
)
if key in seen:
# Skip if it's a doubleheader (has game_number)
if game.game_number:
continue
existing = seen[key]
duplicates.append(
ManualReviewItem(
id=f"dup_{uuid4().hex[:8]}",
reason=ReviewReason.DUPLICATE_GAME,
sport=game.sport,
raw_value=f"{game.id} vs {existing.id}",
context={
"game1_id": existing.id,
"game2_id": game.id,
"date": game.game_date.strftime("%Y-%m-%d"),
"home": game.home_team_id,
"away": game.away_team_id,
},
game_date=game.game_date.date(),
)
)
else:
seen[key] = game
return duplicates
def validate_games(games: list[Game]) -> list[ManualReviewItem]:
"""Validate games and return issues found.
Checks:
- Missing stadium IDs
- Missing team IDs
- Invalid dates
- Duplicate games
Args:
games: List of games to validate
Returns:
List of ManualReviewItems for issues
"""
from uuid import uuid4
issues: list[ManualReviewItem] = []
for game in games:
# Check for missing stadium
if not game.stadium_id:
issues.append(
ManualReviewItem(
id=f"missing_{uuid4().hex[:8]}",
reason=ReviewReason.MISSING_DATA,
sport=game.sport,
raw_value=f"Game {game.id} missing stadium",
context={
"game_id": game.id,
"field": "stadium_id",
"raw_stadium": game.raw_stadium,
},
source_url=game.source_url,
game_date=game.game_date.date(),
)
)
# Check for duplicates
dup_issues = detect_duplicate_games(games)
issues.extend(dup_issues)
return issues

View File

@@ -1,246 +0,0 @@
"""JSON Schema validation for canonical output matching iOS app expectations.
This module defines schemas that match the Swift structs in BootstrapService.swift:
- JSONCanonicalStadium
- JSONCanonicalTeam
- JSONCanonicalGame
Validation is performed at runtime before outputting JSON to ensure
Python output matches what the iOS app expects.
"""
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union
class SchemaValidationError(Exception):
"""Raised when canonical output fails schema validation."""
def __init__(self, model_type: str, errors: list[str]):
self.model_type = model_type
self.errors = errors
super().__init__(f"{model_type} schema validation failed:\n" + "\n".join(f" - {e}" for e in errors))
# ISO8601 UTC datetime pattern: YYYY-MM-DDTHH:MM:SSZ
ISO8601_UTC_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")
# Season format patterns
SEASON_SPLIT_PATTERN = re.compile(r"^\d{4}-\d{2}$") # e.g., "2025-26"
SEASON_SINGLE_PATTERN = re.compile(r"^\d{4}$") # e.g., "2025"
@dataclass
class FieldSpec:
"""Specification for a field in the canonical schema."""
name: str
required: bool
field_type: Union[type, tuple]
validator: Optional[Callable] = None
# Schema definitions matching Swift structs in BootstrapService.swift
STADIUM_SCHEMA: list[FieldSpec] = [
FieldSpec("canonical_id", required=True, field_type=str),
FieldSpec("name", required=True, field_type=str),
FieldSpec("city", required=True, field_type=str),
FieldSpec("state", required=True, field_type=str),
FieldSpec("latitude", required=True, field_type=(int, float)),
FieldSpec("longitude", required=True, field_type=(int, float)),
FieldSpec("capacity", required=True, field_type=int),
FieldSpec("sport", required=True, field_type=str),
FieldSpec("primary_team_abbrevs", required=True, field_type=list),
FieldSpec("year_opened", required=False, field_type=(int, type(None))),
]
TEAM_SCHEMA: list[FieldSpec] = [
FieldSpec("canonical_id", required=True, field_type=str),
FieldSpec("name", required=True, field_type=str),
FieldSpec("abbreviation", required=True, field_type=str),
FieldSpec("sport", required=True, field_type=str),
FieldSpec("city", required=True, field_type=str),
FieldSpec("stadium_canonical_id", required=True, field_type=str),
FieldSpec("conference_id", required=False, field_type=(str, type(None))),
FieldSpec("division_id", required=False, field_type=(str, type(None))),
FieldSpec("primary_color", required=False, field_type=(str, type(None))),
FieldSpec("secondary_color", required=False, field_type=(str, type(None))),
]
GAME_SCHEMA: list[FieldSpec] = [
FieldSpec("canonical_id", required=True, field_type=str),
FieldSpec("sport", required=True, field_type=str),
FieldSpec(
"season",
required=True,
field_type=str,
validator=lambda v: SEASON_SPLIT_PATTERN.match(v) or SEASON_SINGLE_PATTERN.match(v),
),
FieldSpec(
"game_datetime_utc",
required=True,
field_type=str,
validator=lambda v: ISO8601_UTC_PATTERN.match(v),
),
FieldSpec("home_team_canonical_id", required=True, field_type=str),
FieldSpec("away_team_canonical_id", required=True, field_type=str),
FieldSpec("stadium_canonical_id", required=True, field_type=str),
FieldSpec("is_playoff", required=True, field_type=bool),
FieldSpec("broadcast", required=False, field_type=(str, type(None))),
]
def validate_field(data: dict[str, Any], spec: FieldSpec) -> list[str]:
"""Validate a single field against its specification.
Args:
data: The dictionary to validate
spec: The field specification
Returns:
List of error messages (empty if valid)
"""
errors = []
if spec.name not in data:
if spec.required:
errors.append(f"Missing required field: {spec.name}")
return errors
value = data[spec.name]
# Check type
if not isinstance(value, spec.field_type):
expected = spec.field_type.__name__ if isinstance(spec.field_type, type) else str(spec.field_type)
actual = type(value).__name__
errors.append(f"Field '{spec.name}' has wrong type: expected {expected}, got {actual} (value: {value!r})")
return errors
# Check custom validator
if spec.validator and value is not None:
if not spec.validator(value):
errors.append(f"Field '{spec.name}' failed validation: {value!r}")
return errors
def validate_canonical_stadium(data: dict[str, Any]) -> list[str]:
"""Validate a canonical stadium dictionary.
Args:
data: Stadium dictionary from to_canonical_dict()
Returns:
List of error messages (empty if valid)
"""
errors = []
for spec in STADIUM_SCHEMA:
errors.extend(validate_field(data, spec))
# Additional validation: primary_team_abbrevs should contain strings
if "primary_team_abbrevs" in data and isinstance(data["primary_team_abbrevs"], list):
for i, abbrev in enumerate(data["primary_team_abbrevs"]):
if not isinstance(abbrev, str):
errors.append(f"primary_team_abbrevs[{i}] must be string, got {type(abbrev).__name__}")
return errors
def validate_canonical_team(data: dict[str, Any]) -> list[str]:
"""Validate a canonical team dictionary.
Args:
data: Team dictionary from to_canonical_dict()
Returns:
List of error messages (empty if valid)
"""
errors = []
for spec in TEAM_SCHEMA:
errors.extend(validate_field(data, spec))
return errors
def validate_canonical_game(data: dict[str, Any]) -> list[str]:
"""Validate a canonical game dictionary.
Args:
data: Game dictionary from to_canonical_dict()
Returns:
List of error messages (empty if valid)
"""
errors = []
for spec in GAME_SCHEMA:
errors.extend(validate_field(data, spec))
return errors
def validate_and_raise(data: dict[str, Any], model_type: str) -> None:
"""Validate a canonical dictionary and raise on error.
Args:
data: Dictionary from to_canonical_dict()
model_type: One of 'stadium', 'team', 'game'
Raises:
SchemaValidationError: If validation fails
ValueError: If model_type is unknown
"""
validators = {
"stadium": validate_canonical_stadium,
"team": validate_canonical_team,
"game": validate_canonical_game,
}
if model_type not in validators:
raise ValueError(f"Unknown model type: {model_type}")
errors = validators[model_type](data)
if errors:
raise SchemaValidationError(model_type, errors)
def validate_batch(
items: list[dict[str, Any]],
model_type: str,
fail_fast: bool = True,
) -> list[tuple[int, list[str]]]:
"""Validate a batch of canonical dictionaries.
Args:
items: List of dictionaries from to_canonical_dict()
model_type: One of 'stadium', 'team', 'game'
fail_fast: If True, raise on first error; if False, collect all errors
Returns:
List of (index, errors) tuples for items with validation errors
Raises:
SchemaValidationError: If fail_fast=True and validation fails
"""
validators = {
"stadium": validate_canonical_stadium,
"team": validate_canonical_team,
"game": validate_canonical_game,
}
if model_type not in validators:
raise ValueError(f"Unknown model type: {model_type}")
validator = validators[model_type]
all_errors = []
for i, item in enumerate(items):
errors = validator(item)
if errors:
if fail_fast:
raise SchemaValidationError(
model_type,
[f"Item {i}: {e}" for e in errors],
)
all_errors.append((i, errors))
return all_errors

File diff suppressed because it is too large Load Diff

View File

@@ -1,634 +0,0 @@
[
{
"id": "alias_mlb_1",
"team_canonical_id": "team_mlb_wsn",
"alias_type": "name",
"alias_value": "Montreal Expos",
"valid_from": "1969-01-01",
"valid_until": "2004-12-31"
},
{
"id": "alias_mlb_2",
"team_canonical_id": "team_mlb_wsn",
"alias_type": "abbreviation",
"alias_value": "MON",
"valid_from": "1969-01-01",
"valid_until": "2004-12-31"
},
{
"id": "alias_mlb_3",
"team_canonical_id": "team_mlb_wsn",
"alias_type": "city",
"alias_value": "Montreal",
"valid_from": "1969-01-01",
"valid_until": "2004-12-31"
},
{
"id": "alias_mlb_4",
"team_canonical_id": "team_mlb_oak",
"alias_type": "name",
"alias_value": "Kansas City Athletics",
"valid_from": "1955-01-01",
"valid_until": "1967-12-31"
},
{
"id": "alias_mlb_5",
"team_canonical_id": "team_mlb_oak",
"alias_type": "abbreviation",
"alias_value": "KCA",
"valid_from": "1955-01-01",
"valid_until": "1967-12-31"
},
{
"id": "alias_mlb_6",
"team_canonical_id": "team_mlb_oak",
"alias_type": "city",
"alias_value": "Kansas City",
"valid_from": "1955-01-01",
"valid_until": "1967-12-31"
},
{
"id": "alias_mlb_7",
"team_canonical_id": "team_mlb_oak",
"alias_type": "name",
"alias_value": "Philadelphia Athletics",
"valid_from": "1901-01-01",
"valid_until": "1954-12-31"
},
{
"id": "alias_mlb_8",
"team_canonical_id": "team_mlb_oak",
"alias_type": "abbreviation",
"alias_value": "PHA",
"valid_from": "1901-01-01",
"valid_until": "1954-12-31"
},
{
"id": "alias_mlb_9",
"team_canonical_id": "team_mlb_oak",
"alias_type": "city",
"alias_value": "Philadelphia",
"valid_from": "1901-01-01",
"valid_until": "1954-12-31"
},
{
"id": "alias_mlb_10",
"team_canonical_id": "team_mlb_cle",
"alias_type": "name",
"alias_value": "Cleveland Indians",
"valid_from": "1915-01-01",
"valid_until": "2021-12-31"
},
{
"id": "alias_mlb_11",
"team_canonical_id": "team_mlb_tbr",
"alias_type": "name",
"alias_value": "Tampa Bay Devil Rays",
"valid_from": "1998-01-01",
"valid_until": "2007-12-31"
},
{
"id": "alias_mlb_12",
"team_canonical_id": "team_mlb_mia",
"alias_type": "name",
"alias_value": "Florida Marlins",
"valid_from": "1993-01-01",
"valid_until": "2011-12-31"
},
{
"id": "alias_mlb_13",
"team_canonical_id": "team_mlb_mia",
"alias_type": "city",
"alias_value": "Florida",
"valid_from": "1993-01-01",
"valid_until": "2011-12-31"
},
{
"id": "alias_mlb_14",
"team_canonical_id": "team_mlb_laa",
"alias_type": "name",
"alias_value": "Anaheim Angels",
"valid_from": "1997-01-01",
"valid_until": "2004-12-31"
},
{
"id": "alias_mlb_15",
"team_canonical_id": "team_mlb_laa",
"alias_type": "name",
"alias_value": "Los Angeles Angels of Anaheim",
"valid_from": "2005-01-01",
"valid_until": "2015-12-31"
},
{
"id": "alias_mlb_16",
"team_canonical_id": "team_mlb_laa",
"alias_type": "name",
"alias_value": "California Angels",
"valid_from": "1965-01-01",
"valid_until": "1996-12-31"
},
{
"id": "alias_mlb_17",
"team_canonical_id": "team_mlb_tex",
"alias_type": "name",
"alias_value": "Washington Senators",
"valid_from": "1961-01-01",
"valid_until": "1971-12-31"
},
{
"id": "alias_mlb_18",
"team_canonical_id": "team_mlb_tex",
"alias_type": "abbreviation",
"alias_value": "WS2",
"valid_from": "1961-01-01",
"valid_until": "1971-12-31"
},
{
"id": "alias_mlb_19",
"team_canonical_id": "team_mlb_tex",
"alias_type": "city",
"alias_value": "Washington",
"valid_from": "1961-01-01",
"valid_until": "1971-12-31"
},
{
"id": "alias_mlb_20",
"team_canonical_id": "team_mlb_mil",
"alias_type": "name",
"alias_value": "Seattle Pilots",
"valid_from": "1969-01-01",
"valid_until": "1969-12-31"
},
{
"id": "alias_mlb_21",
"team_canonical_id": "team_mlb_mil",
"alias_type": "abbreviation",
"alias_value": "SEP",
"valid_from": "1969-01-01",
"valid_until": "1969-12-31"
},
{
"id": "alias_mlb_22",
"team_canonical_id": "team_mlb_mil",
"alias_type": "city",
"alias_value": "Seattle",
"valid_from": "1969-01-01",
"valid_until": "1969-12-31"
},
{
"id": "alias_mlb_23",
"team_canonical_id": "team_mlb_hou",
"alias_type": "name",
"alias_value": "Houston Colt .45s",
"valid_from": "1962-01-01",
"valid_until": "1964-12-31"
},
{
"id": "alias_nba_24",
"team_canonical_id": "team_nba_brk",
"alias_type": "name",
"alias_value": "New Jersey Nets",
"valid_from": "1977-01-01",
"valid_until": "2012-04-30"
},
{
"id": "alias_nba_25",
"team_canonical_id": "team_nba_brk",
"alias_type": "abbreviation",
"alias_value": "NJN",
"valid_from": "1977-01-01",
"valid_until": "2012-04-30"
},
{
"id": "alias_nba_26",
"team_canonical_id": "team_nba_brk",
"alias_type": "city",
"alias_value": "New Jersey",
"valid_from": "1977-01-01",
"valid_until": "2012-04-30"
},
{
"id": "alias_nba_27",
"team_canonical_id": "team_nba_brk",
"alias_type": "name",
"alias_value": "New York Nets",
"valid_from": "1968-01-01",
"valid_until": "1977-12-31"
},
{
"id": "alias_nba_28",
"team_canonical_id": "team_nba_okc",
"alias_type": "name",
"alias_value": "Seattle SuperSonics",
"valid_from": "1967-01-01",
"valid_until": "2008-07-01"
},
{
"id": "alias_nba_29",
"team_canonical_id": "team_nba_okc",
"alias_type": "abbreviation",
"alias_value": "SEA",
"valid_from": "1967-01-01",
"valid_until": "2008-07-01"
},
{
"id": "alias_nba_30",
"team_canonical_id": "team_nba_okc",
"alias_type": "city",
"alias_value": "Seattle",
"valid_from": "1967-01-01",
"valid_until": "2008-07-01"
},
{
"id": "alias_nba_31",
"team_canonical_id": "team_nba_mem",
"alias_type": "name",
"alias_value": "Vancouver Grizzlies",
"valid_from": "1995-01-01",
"valid_until": "2001-05-31"
},
{
"id": "alias_nba_32",
"team_canonical_id": "team_nba_mem",
"alias_type": "abbreviation",
"alias_value": "VAN",
"valid_from": "1995-01-01",
"valid_until": "2001-05-31"
},
{
"id": "alias_nba_33",
"team_canonical_id": "team_nba_mem",
"alias_type": "city",
"alias_value": "Vancouver",
"valid_from": "1995-01-01",
"valid_until": "2001-05-31"
},
{
"id": "alias_nba_34",
"team_canonical_id": "team_nba_nop",
"alias_type": "name",
"alias_value": "New Orleans Hornets",
"valid_from": "2002-01-01",
"valid_until": "2013-04-30"
},
{
"id": "alias_nba_35",
"team_canonical_id": "team_nba_nop",
"alias_type": "abbreviation",
"alias_value": "NOH",
"valid_from": "2002-01-01",
"valid_until": "2013-04-30"
},
{
"id": "alias_nba_36",
"team_canonical_id": "team_nba_nop",
"alias_type": "name",
"alias_value": "New Orleans/Oklahoma City Hornets",
"valid_from": "2005-01-01",
"valid_until": "2007-12-31"
},
{
"id": "alias_nba_37",
"team_canonical_id": "team_nba_cho",
"alias_type": "name",
"alias_value": "Charlotte Bobcats",
"valid_from": "2004-01-01",
"valid_until": "2014-04-30"
},
{
"id": "alias_nba_38",
"team_canonical_id": "team_nba_cho",
"alias_type": "abbreviation",
"alias_value": "CHA",
"valid_from": "2004-01-01",
"valid_until": "2014-04-30"
},
{
"id": "alias_nba_39",
"team_canonical_id": "team_nba_was",
"alias_type": "name",
"alias_value": "Washington Bullets",
"valid_from": "1974-01-01",
"valid_until": "1997-05-31"
},
{
"id": "alias_nba_40",
"team_canonical_id": "team_nba_was",
"alias_type": "name",
"alias_value": "Capital Bullets",
"valid_from": "1973-01-01",
"valid_until": "1973-12-31"
},
{
"id": "alias_nba_41",
"team_canonical_id": "team_nba_was",
"alias_type": "name",
"alias_value": "Baltimore Bullets",
"valid_from": "1963-01-01",
"valid_until": "1972-12-31"
},
{
"id": "alias_nba_42",
"team_canonical_id": "team_nba_lac",
"alias_type": "name",
"alias_value": "San Diego Clippers",
"valid_from": "1978-01-01",
"valid_until": "1984-05-31"
},
{
"id": "alias_nba_43",
"team_canonical_id": "team_nba_lac",
"alias_type": "abbreviation",
"alias_value": "SDC",
"valid_from": "1978-01-01",
"valid_until": "1984-05-31"
},
{
"id": "alias_nba_44",
"team_canonical_id": "team_nba_lac",
"alias_type": "city",
"alias_value": "San Diego",
"valid_from": "1978-01-01",
"valid_until": "1984-05-31"
},
{
"id": "alias_nba_45",
"team_canonical_id": "team_nba_lac",
"alias_type": "name",
"alias_value": "Buffalo Braves",
"valid_from": "1970-01-01",
"valid_until": "1978-05-31"
},
{
"id": "alias_nba_46",
"team_canonical_id": "team_nba_lac",
"alias_type": "abbreviation",
"alias_value": "BUF",
"valid_from": "1970-01-01",
"valid_until": "1978-05-31"
},
{
"id": "alias_nba_47",
"team_canonical_id": "team_nba_lac",
"alias_type": "city",
"alias_value": "Buffalo",
"valid_from": "1970-01-01",
"valid_until": "1978-05-31"
},
{
"id": "alias_nba_48",
"team_canonical_id": "team_nba_sac",
"alias_type": "name",
"alias_value": "Kansas City Kings",
"valid_from": "1975-01-01",
"valid_until": "1985-05-31"
},
{
"id": "alias_nba_49",
"team_canonical_id": "team_nba_sac",
"alias_type": "abbreviation",
"alias_value": "KCK",
"valid_from": "1975-01-01",
"valid_until": "1985-05-31"
},
{
"id": "alias_nba_50",
"team_canonical_id": "team_nba_sac",
"alias_type": "city",
"alias_value": "Kansas City",
"valid_from": "1975-01-01",
"valid_until": "1985-05-31"
},
{
"id": "alias_nba_51",
"team_canonical_id": "team_nba_uta",
"alias_type": "name",
"alias_value": "New Orleans Jazz",
"valid_from": "1974-01-01",
"valid_until": "1979-05-31"
},
{
"id": "alias_nba_52",
"team_canonical_id": "team_nba_uta",
"alias_type": "city",
"alias_value": "New Orleans",
"valid_from": "1974-01-01",
"valid_until": "1979-05-31"
},
{
"id": "alias_nhl_53",
"team_canonical_id": "team_nhl_ari",
"alias_type": "name",
"alias_value": "Arizona Coyotes",
"valid_from": "2014-01-01",
"valid_until": "2024-04-30"
},
{
"id": "alias_nhl_54",
"team_canonical_id": "team_nhl_ari",
"alias_type": "name",
"alias_value": "Phoenix Coyotes",
"valid_from": "1996-01-01",
"valid_until": "2013-12-31"
},
{
"id": "alias_nhl_55",
"team_canonical_id": "team_nhl_ari",
"alias_type": "abbreviation",
"alias_value": "PHX",
"valid_from": "1996-01-01",
"valid_until": "2013-12-31"
},
{
"id": "alias_nhl_56",
"team_canonical_id": "team_nhl_ari",
"alias_type": "city",
"alias_value": "Phoenix",
"valid_from": "1996-01-01",
"valid_until": "2013-12-31"
},
{
"id": "alias_nhl_57",
"team_canonical_id": "team_nhl_ari",
"alias_type": "name",
"alias_value": "Winnipeg Jets",
"valid_from": "1979-01-01",
"valid_until": "1996-05-31"
},
{
"id": "alias_nhl_58",
"team_canonical_id": "team_nhl_car",
"alias_type": "name",
"alias_value": "Hartford Whalers",
"valid_from": "1979-01-01",
"valid_until": "1997-05-31"
},
{
"id": "alias_nhl_59",
"team_canonical_id": "team_nhl_car",
"alias_type": "abbreviation",
"alias_value": "HFD",
"valid_from": "1979-01-01",
"valid_until": "1997-05-31"
},
{
"id": "alias_nhl_60",
"team_canonical_id": "team_nhl_car",
"alias_type": "city",
"alias_value": "Hartford",
"valid_from": "1979-01-01",
"valid_until": "1997-05-31"
},
{
"id": "alias_nhl_61",
"team_canonical_id": "team_nhl_col",
"alias_type": "name",
"alias_value": "Quebec Nordiques",
"valid_from": "1979-01-01",
"valid_until": "1995-05-31"
},
{
"id": "alias_nhl_62",
"team_canonical_id": "team_nhl_col",
"alias_type": "abbreviation",
"alias_value": "QUE",
"valid_from": "1979-01-01",
"valid_until": "1995-05-31"
},
{
"id": "alias_nhl_63",
"team_canonical_id": "team_nhl_col",
"alias_type": "city",
"alias_value": "Quebec",
"valid_from": "1979-01-01",
"valid_until": "1995-05-31"
},
{
"id": "alias_nhl_64",
"team_canonical_id": "team_nhl_dal",
"alias_type": "name",
"alias_value": "Minnesota North Stars",
"valid_from": "1967-01-01",
"valid_until": "1993-05-31"
},
{
"id": "alias_nhl_65",
"team_canonical_id": "team_nhl_dal",
"alias_type": "abbreviation",
"alias_value": "MNS",
"valid_from": "1967-01-01",
"valid_until": "1993-05-31"
},
{
"id": "alias_nhl_66",
"team_canonical_id": "team_nhl_dal",
"alias_type": "city",
"alias_value": "Minnesota",
"valid_from": "1967-01-01",
"valid_until": "1993-05-31"
},
{
"id": "alias_nhl_67",
"team_canonical_id": "team_nhl_njd",
"alias_type": "name",
"alias_value": "Colorado Rockies",
"valid_from": "1976-01-01",
"valid_until": "1982-05-31"
},
{
"id": "alias_nhl_68",
"team_canonical_id": "team_nhl_njd",
"alias_type": "abbreviation",
"alias_value": "CLR",
"valid_from": "1976-01-01",
"valid_until": "1982-05-31"
},
{
"id": "alias_nhl_69",
"team_canonical_id": "team_nhl_njd",
"alias_type": "city",
"alias_value": "Colorado",
"valid_from": "1976-01-01",
"valid_until": "1982-05-31"
},
{
"id": "alias_nhl_70",
"team_canonical_id": "team_nhl_njd",
"alias_type": "name",
"alias_value": "Kansas City Scouts",
"valid_from": "1974-01-01",
"valid_until": "1976-05-31"
},
{
"id": "alias_nhl_71",
"team_canonical_id": "team_nhl_njd",
"alias_type": "abbreviation",
"alias_value": "KCS",
"valid_from": "1974-01-01",
"valid_until": "1976-05-31"
},
{
"id": "alias_nhl_72",
"team_canonical_id": "team_nhl_njd",
"alias_type": "city",
"alias_value": "Kansas City",
"valid_from": "1974-01-01",
"valid_until": "1976-05-31"
},
{
"id": "alias_nhl_73",
"team_canonical_id": "team_nhl_wpg",
"alias_type": "name",
"alias_value": "Atlanta Thrashers",
"valid_from": "1999-01-01",
"valid_until": "2011-05-31"
},
{
"id": "alias_nhl_74",
"team_canonical_id": "team_nhl_wpg",
"alias_type": "abbreviation",
"alias_value": "ATL",
"valid_from": "1999-01-01",
"valid_until": "2011-05-31"
},
{
"id": "alias_nhl_75",
"team_canonical_id": "team_nhl_wpg",
"alias_type": "city",
"alias_value": "Atlanta",
"valid_from": "1999-01-01",
"valid_until": "2011-05-31"
},
{
"id": "alias_nhl_76",
"team_canonical_id": "team_nhl_fla",
"alias_type": "city",
"alias_value": "Miami",
"valid_from": "1993-01-01",
"valid_until": "1998-12-31"
},
{
"id": "alias_nfl_77",
"team_canonical_id": "team_nfl_was",
"alias_type": "name",
"alias_value": "Washington Redskins",
"valid_from": "1937-01-01",
"valid_until": "2020-07-13"
},
{
"id": "alias_nfl_78",
"team_canonical_id": "team_nfl_was",
"alias_type": "name",
"alias_value": "Washington Football Team",
"valid_from": "2020-07-13",
"valid_until": "2022-02-02"
},
{
"id": "alias_nfl_79",
"team_canonical_id": "team_nfl_was",
"alias_type": "abbreviation",
"alias_value": "WFT",
"valid_from": "2020-07-13",
"valid_until": "2022-02-02"
}
]

View File

@@ -1,99 +0,0 @@
#!/usr/bin/env python3
"""Validate alias files for orphan references and format issues.
This script checks stadium_aliases.json and team_aliases.json for:
1. Orphan references (aliases pointing to non-existent canonical IDs)
2. JSON syntax errors
3. Required field presence
Usage:
python validate_aliases.py
Returns exit code 0 on success, 1 on failure.
"""
import json
import sys
from pathlib import Path
# Add parent to path for imports
sys.path.insert(0, str(Path(__file__).parent))
from sportstime_parser.normalizers.stadium_resolver import STADIUM_MAPPINGS
from sportstime_parser.normalizers.team_resolver import TEAM_MAPPINGS
def main() -> int:
"""Run validation checks on alias files."""
errors: list[str] = []
# Build valid stadium ID set
valid_stadium_ids: set[str] = set()
for sport_stadiums in STADIUM_MAPPINGS.values():
for stadium_id in sport_stadiums.keys():
valid_stadium_ids.add(stadium_id)
# Build valid team ID set
valid_team_ids: set[str] = set()
for sport_teams in TEAM_MAPPINGS.values():
for abbrev, team_data in sport_teams.items():
valid_team_ids.add(team_data[0]) # team_id is first element
print(f"Valid stadium IDs: {len(valid_stadium_ids)}")
print(f"Valid team IDs: {len(valid_team_ids)}")
print()
# Check stadium aliases
try:
stadium_aliases = json.load(open("stadium_aliases.json"))
print(f"✓ stadium_aliases.json: Valid JSON ({len(stadium_aliases)} aliases)")
for alias in stadium_aliases:
# Check required fields
if "alias_name" not in alias:
errors.append(f"Stadium alias missing 'alias_name': {alias}")
if "stadium_canonical_id" not in alias:
errors.append(f"Stadium alias missing 'stadium_canonical_id': {alias}")
elif alias["stadium_canonical_id"] not in valid_stadium_ids:
errors.append(
f"Orphan stadium alias: '{alias.get('alias_name', '?')}' -> "
f"'{alias['stadium_canonical_id']}'"
)
except FileNotFoundError:
errors.append("stadium_aliases.json not found")
except json.JSONDecodeError as e:
errors.append(f"stadium_aliases.json: Invalid JSON - {e}")
# Check team aliases
try:
team_aliases = json.load(open("team_aliases.json"))
print(f"✓ team_aliases.json: Valid JSON ({len(team_aliases)} aliases)")
for alias in team_aliases:
# Check required fields
if "team_canonical_id" not in alias:
errors.append(f"Team alias missing 'team_canonical_id': {alias}")
elif alias["team_canonical_id"] not in valid_team_ids:
errors.append(
f"Orphan team alias: '{alias.get('alias_value', '?')}' -> "
f"'{alias['team_canonical_id']}'"
)
except FileNotFoundError:
errors.append("team_aliases.json not found")
except json.JSONDecodeError as e:
errors.append(f"team_aliases.json: Invalid JSON - {e}")
# Report results
print()
if errors:
print(f"❌ Validation failed with {len(errors)} error(s):")
for error in errors:
print(f" - {error}")
return 1
print("✅ All aliases valid")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@@ -28,6 +28,7 @@ enum PlanningMode: String, Codable, CaseIterable, Identifiable {
case gameFirst // Pick games first, trip around those games
case locations // Start/end locations, optional games along route
case followTeam // Follow one team's schedule (home + away)
case teamFirst // Select teams, find optimal trip windows across season
var id: String { rawValue }
@@ -37,6 +38,7 @@ enum PlanningMode: String, Codable, CaseIterable, Identifiable {
case .gameFirst: return "By Games"
case .locations: return "By Route"
case .followTeam: return "Follow Team"
case .teamFirst: return "By Teams"
}
}
@@ -46,6 +48,7 @@ enum PlanningMode: String, Codable, CaseIterable, Identifiable {
case .gameFirst: return "Build trip around specific games"
case .locations: return "Plan route between locations"
case .followTeam: return "Follow your team on the road"
case .teamFirst: return "Select teams, find best trip windows"
}
}
@@ -55,6 +58,7 @@ enum PlanningMode: String, Codable, CaseIterable, Identifiable {
case .gameFirst: return "sportscourt"
case .locations: return "map"
case .followTeam: return "person.3.fill"
case .teamFirst: return "person.2.badge.gearshape"
}
}
}
@@ -258,6 +262,9 @@ struct TripPreferences: Codable, Hashable {
/// Trip duration for gameFirst mode sliding windows (number of days)
var gameFirstTripDuration: Int
/// Teams to visit (for Team-First mode) - canonical team IDs
var selectedTeamIds: Set<String> = []
init(
planningMode: PlanningMode = .dateRange,
startLocation: LocationInput? = nil,
@@ -281,7 +288,8 @@ struct TripPreferences: Codable, Hashable {
selectedRegions: Set<Region> = [.east, .central, .west],
followTeamId: String? = nil,
useHomeLocation: Bool = true,
gameFirstTripDuration: Int = 7
gameFirstTripDuration: Int = 7,
selectedTeamIds: Set<String> = []
) {
self.planningMode = planningMode
self.startLocation = startLocation
@@ -306,6 +314,7 @@ struct TripPreferences: Codable, Hashable {
self.followTeamId = followTeamId
self.useHomeLocation = useHomeLocation
self.gameFirstTripDuration = gameFirstTripDuration
self.selectedTeamIds = selectedTeamIds
}
var totalDriverHoursPerDay: Double {
@@ -318,4 +327,9 @@ struct TripPreferences: Codable, Hashable {
let days = Calendar.current.dateComponents([.day], from: startDate, to: endDate).day ?? 7
return max(1, days)
}
/// Maximum trip duration for Team-First mode (2 days per selected team)
var teamFirstMaxDays: Int {
selectedTeamIds.count * 2
}
}

View File

@@ -122,7 +122,6 @@ struct AnimatedSportsIcon: View {
(0.9, 0.85, "mappin.circle.fill", -8, 0.75),
(0.5, 0.03, "car.fill", 0, 0.7),
(0.5, 0.97, "map.fill", 3, 0.75),
(0.25, 0.93, "stadium.fill", -5, 0.7),
(0.75, 0.95, "flag.checkered", 7, 0.7),
// Middle area icons (will appear behind cards)
(0.35, 0.22, "tennisball.fill", -8, 0.65),

View File

@@ -28,7 +28,6 @@ struct SportsIconImageGenerator {
"figure.soccer",
// Venues/events
"sportscourt.fill",
"stadium.fill",
"trophy.fill",
"ticket.fill",
// Travel/navigation

View File

@@ -65,6 +65,11 @@ final class TripWizardViewModel {
var startLocation: LocationInput? = nil
var endLocation: LocationInput? = nil
// MARK: - Mode-Specific: teamFirst
var teamFirstSport: Sport? = nil
var teamFirstSelectedTeamIds: Set<String> = []
// MARK: - Planning State
var isPlanning: Bool = false
@@ -106,6 +111,10 @@ final class TripWizardViewModel {
planningMode == .locations
}
var showTeamFirstStep: Bool {
planningMode == .teamFirst
}
// MARK: - Validation
/// All required fields must be set before planning
@@ -124,6 +133,8 @@ final class TripWizardViewModel {
return startLocation != nil && endLocation != nil && hasSetDates && !selectedSports.isEmpty
case .followTeam:
return selectedTeamId != nil && hasSetDates
case .teamFirst:
return teamFirstSport != nil && teamFirstSelectedTeamIds.count >= 2
}
}
@@ -139,7 +150,9 @@ final class TripWizardViewModel {
selectedGames: selectedGameIds.isEmpty ? .missing : .valid,
selectedTeam: selectedTeamId == nil ? .missing : .valid,
startLocation: startLocation == nil ? .missing : .valid,
endLocation: endLocation == nil ? .missing : .valid
endLocation: endLocation == nil ? .missing : .valid,
teamFirstTeams: teamFirstSelectedTeamIds.count >= 2 ? .valid : .missing,
teamFirstTeamCount: teamFirstSelectedTeamIds.count
)
}
@@ -198,6 +211,10 @@ final class TripWizardViewModel {
// locations mode fields
startLocation = nil
endLocation = nil
// teamFirst mode fields
teamFirstSport = nil
teamFirstSelectedTeamIds = []
}
}
@@ -223,6 +240,8 @@ struct FieldValidation {
let selectedTeam: Status
let startLocation: Status
let endLocation: Status
let teamFirstTeams: Status
let teamFirstTeamCount: Int
/// Returns only the fields that are required for the current planning mode
var requiredFields: [(name: String, status: Status)] {
@@ -261,6 +280,12 @@ struct FieldValidation {
("Route Preference", routePreference),
("Repeat Cities", repeatCities)
]
case .teamFirst:
fields = [
("Teams", teamFirstTeams),
("Route Preference", routePreference),
("Repeat Cities", repeatCities)
]
}
return fields

View File

@@ -0,0 +1,189 @@
//
// TeamPickerView.swift
// SportsTime
//
// Multi-select team picker grid for Team-First planning mode.
// Users select multiple teams (>=2) and the app finds optimal trip windows.
//
import SwiftUI
struct TeamPickerView: View {
@Environment(\.colorScheme) private var colorScheme
let sport: Sport
@Binding var selectedTeamIds: Set<String>
@State private var searchText = ""
private let columns = [
GridItem(.flexible()),
GridItem(.flexible()),
GridItem(.flexible())
]
private var teams: [Team] {
let allTeams = AppDataProvider.shared.teams
.filter { $0.sport == sport }
.sorted { $0.fullName < $1.fullName }
if searchText.isEmpty {
return allTeams
}
return allTeams.filter {
$0.fullName.localizedCaseInsensitiveContains(searchText) ||
$0.city.localizedCaseInsensitiveContains(searchText) ||
$0.abbreviation.localizedCaseInsensitiveContains(searchText)
}
}
var body: some View {
VStack(spacing: Theme.Spacing.md) {
// Search bar
HStack {
Image(systemName: "magnifyingglass")
.foregroundStyle(Theme.textMuted(colorScheme))
TextField("Search teams...", text: $searchText)
.textFieldStyle(.plain)
if !searchText.isEmpty {
Button {
searchText = ""
} label: {
Image(systemName: "xmark.circle.fill")
.foregroundStyle(Theme.textMuted(colorScheme))
}
}
}
.padding(Theme.Spacing.sm)
.background(Theme.cardBackgroundElevated(colorScheme))
.clipShape(RoundedRectangle(cornerRadius: Theme.CornerRadius.medium))
// Selection count badge
if !selectedTeamIds.isEmpty {
HStack {
Text("\(selectedTeamIds.count) team\(selectedTeamIds.count == 1 ? "" : "s") selected")
.font(.caption)
.fontWeight(.medium)
.foregroundStyle(.white)
.padding(.horizontal, Theme.Spacing.sm)
.padding(.vertical, Theme.Spacing.xs)
.background(Theme.warmOrange)
.clipShape(Capsule())
Spacer()
Button("Clear all") {
withAnimation(Theme.Animation.spring) {
selectedTeamIds.removeAll()
}
}
.font(.caption)
.foregroundStyle(Theme.warmOrange)
}
}
// Team grid
ScrollView {
LazyVGrid(columns: columns, spacing: Theme.Spacing.sm) {
ForEach(teams) { team in
TeamCard(
team: team,
isSelected: selectedTeamIds.contains(team.id),
onTap: { toggleTeam(team) }
)
}
}
.padding(.bottom, Theme.Spacing.lg)
}
}
}
private func toggleTeam(_ team: Team) {
withAnimation(Theme.Animation.spring) {
if selectedTeamIds.contains(team.id) {
selectedTeamIds.remove(team.id)
} else {
selectedTeamIds.insert(team.id)
}
}
}
}
// MARK: - Team Card
private struct TeamCard: View {
@Environment(\.colorScheme) private var colorScheme
let team: Team
let isSelected: Bool
let onTap: () -> Void
var body: some View {
Button(action: onTap) {
VStack(spacing: Theme.Spacing.xs) {
// Team color circle with checkmark overlay
ZStack {
Circle()
.fill(teamColor)
.frame(width: 44, height: 44)
if isSelected {
Circle()
.fill(Color.black.opacity(0.3))
.frame(width: 44, height: 44)
Image(systemName: "checkmark")
.font(.title3)
.fontWeight(.bold)
.foregroundStyle(.white)
}
}
// Team name
Text(team.name)
.font(.caption)
.fontWeight(isSelected ? .semibold : .regular)
.foregroundStyle(isSelected ? Theme.warmOrange : Theme.textPrimary(colorScheme))
.lineLimit(1)
.minimumScaleFactor(0.8)
// City
Text(team.city)
.font(.caption2)
.foregroundStyle(Theme.textMuted(colorScheme))
.lineLimit(1)
}
.frame(maxWidth: .infinity)
.padding(.vertical, Theme.Spacing.sm)
.background(isSelected ? Theme.warmOrange.opacity(0.15) : Color.clear)
.clipShape(RoundedRectangle(cornerRadius: Theme.CornerRadius.medium))
.contentShape(Rectangle())
.overlay(
RoundedRectangle(cornerRadius: Theme.CornerRadius.medium)
.stroke(isSelected ? Theme.warmOrange : Theme.textMuted(colorScheme).opacity(0.3), lineWidth: isSelected ? 2 : 1)
)
}
.buttonStyle(.plain)
}
private var teamColor: Color {
if let hex = team.primaryColor {
return Color(hex: hex)
}
return team.sport.themeColor
}
}
// MARK: - Preview
#Preview {
TeamPickerView(
sport: .mlb,
selectedTeamIds: .constant(["team_mlb_bos", "team_mlb_nyy"])
)
.padding()
.themedBackground()
}

View File

@@ -27,6 +27,7 @@ struct ReviewStep: View {
var selectedTeamName: String? = nil
var startLocationName: String? = nil
var endLocationName: String? = nil
var teamFirstTeamCount: Int = 0
var body: some View {
VStack(alignment: .leading, spacing: Theme.Spacing.md) {
@@ -114,6 +115,8 @@ struct ReviewStep: View {
return startLocationName ?? "Not selected"
case "End Location":
return endLocationName ?? "Not selected"
case "Teams":
return teamFirstTeamCount >= 2 ? "\(teamFirstTeamCount) teams selected" : "Select at least 2 teams"
default:
return ""
}
@@ -180,7 +183,9 @@ private struct ReviewRow: View {
selectedGames: .valid,
selectedTeam: .valid,
startLocation: .valid,
endLocation: .valid
endLocation: .valid,
teamFirstTeams: .valid,
teamFirstTeamCount: 0
),
onPlan: {}
)
@@ -210,7 +215,9 @@ private struct ReviewRow: View {
selectedGames: .valid,
selectedTeam: .valid,
startLocation: .valid,
endLocation: .valid
endLocation: .valid,
teamFirstTeams: .valid,
teamFirstTeamCount: 0
),
onPlan: {}
)

View File

@@ -0,0 +1,347 @@
//
// TeamFirstWizardStep.swift
// SportsTime
//
// Wizard step for Team-First planning mode.
// Uses sheet-based drill-down matching TeamPickerStep: Sport Teams (multi-select).
//
import SwiftUI
struct TeamFirstWizardStep: View {
@Environment(\.colorScheme) private var colorScheme
@Binding var selectedSport: Sport?
@Binding var selectedTeamIds: Set<String>
@State private var showTeamPicker = false
private var selectedTeams: [Team] {
selectedTeamIds.compactMap { teamId in
AppDataProvider.shared.teams.first { $0.id == teamId }
}.sorted { $0.fullName < $1.fullName }
}
private var isValid: Bool {
selectedTeamIds.count >= 2
}
var body: some View {
VStack(alignment: .leading, spacing: Theme.Spacing.md) {
StepHeader(
title: "Which teams do you want to see?",
subtitle: "Select 2 or more teams to find optimal trip windows"
)
// Selection button
Button {
showTeamPicker = true
} label: {
HStack {
if !selectedTeams.isEmpty {
// Show selected teams
teamPreview
Spacer()
Button {
selectedTeamIds.removeAll()
selectedSport = nil
} label: {
Image(systemName: "xmark.circle.fill")
.foregroundStyle(Theme.textMuted(colorScheme))
}
} else {
// Empty state
Image(systemName: "person.2.fill")
.foregroundStyle(Theme.warmOrange)
Text("Select teams")
.font(.subheadline)
.foregroundStyle(Theme.textMuted(colorScheme))
Spacer()
Image(systemName: "chevron.right")
.font(.caption)
.foregroundStyle(Theme.textMuted(colorScheme))
}
}
.padding(Theme.Spacing.md)
.background(Theme.cardBackgroundElevated(colorScheme))
.clipShape(RoundedRectangle(cornerRadius: Theme.CornerRadius.medium))
.overlay(
RoundedRectangle(cornerRadius: Theme.CornerRadius.medium)
.stroke(isValid ? Theme.warmOrange : Theme.textMuted(colorScheme).opacity(0.3), lineWidth: isValid ? 2 : 1)
)
}
.buttonStyle(.plain)
// Validation message
if selectedTeamIds.isEmpty {
validationLabel(text: "Select at least 2 teams", isValid: false)
} else if selectedTeamIds.count == 1 {
validationLabel(text: "Select 1 more team (minimum 2)", isValid: false)
} else {
validationLabel(text: "Ready to find trips for \(selectedTeamIds.count) teams", isValid: true)
}
}
.padding(Theme.Spacing.lg)
.background(Theme.cardBackground(colorScheme))
.clipShape(RoundedRectangle(cornerRadius: Theme.CornerRadius.large))
.overlay {
RoundedRectangle(cornerRadius: Theme.CornerRadius.large)
.stroke(Theme.surfaceGlow(colorScheme), lineWidth: 1)
}
.sheet(isPresented: $showTeamPicker) {
TeamFirstPickerSheet(
selectedSport: $selectedSport,
selectedTeamIds: $selectedTeamIds
)
}
}
// MARK: - Team Preview
@ViewBuilder
private var teamPreview: some View {
if selectedTeams.count <= 3 {
// Show individual teams
HStack(spacing: Theme.Spacing.sm) {
ForEach(selectedTeams.prefix(3)) { team in
HStack(spacing: Theme.Spacing.xs) {
Circle()
.fill(team.primaryColor.map { Color(hex: $0) } ?? team.sport.themeColor)
.frame(width: 20, height: 20)
Text(team.abbreviation)
.font(.caption)
.fontWeight(.medium)
.foregroundStyle(Theme.textPrimary(colorScheme))
}
.padding(.horizontal, Theme.Spacing.sm)
.padding(.vertical, Theme.Spacing.xs)
.background(Theme.cardBackground(colorScheme))
.clipShape(Capsule())
}
}
} else {
// Show count with first few team colors
HStack(spacing: -8) {
ForEach(Array(selectedTeams.prefix(4).enumerated()), id: \.element.id) { index, team in
Circle()
.fill(team.primaryColor.map { Color(hex: $0) } ?? team.sport.themeColor)
.frame(width: 24, height: 24)
.overlay(
Circle()
.stroke(Theme.cardBackgroundElevated(colorScheme), lineWidth: 2)
)
.zIndex(Double(4 - index))
}
}
Text("\(selectedTeamIds.count) teams")
.font(.subheadline)
.fontWeight(.medium)
.foregroundStyle(Theme.textPrimary(colorScheme))
}
}
private func validationLabel(text: String, isValid: Bool) -> some View {
HStack(spacing: Theme.Spacing.xs) {
Image(systemName: isValid ? "checkmark.circle.fill" : "info.circle.fill")
.foregroundStyle(isValid ? .green : Theme.warmOrange)
Text(text)
.font(.caption)
.foregroundStyle(Theme.textMuted(colorScheme))
Spacer()
}
}
}
// MARK: - Team First Picker Sheet
private struct TeamFirstPickerSheet: View {
@Environment(\.dismiss) private var dismiss
@Environment(\.colorScheme) private var colorScheme
@Binding var selectedSport: Sport?
@Binding var selectedTeamIds: Set<String>
var body: some View {
NavigationStack {
List {
ForEach(Sport.supported, id: \.self) { sport in
NavigationLink {
TeamMultiSelectListView(
sport: sport,
selectedTeamIds: $selectedTeamIds,
onDone: {
selectedSport = sport
dismiss()
}
)
} label: {
HStack(spacing: Theme.Spacing.sm) {
Image(systemName: sport.iconName)
.font(.title2)
.foregroundStyle(sport.themeColor)
.frame(width: 32)
Text(sport.rawValue)
.font(.body)
.foregroundStyle(Theme.textPrimary(colorScheme))
Spacer()
Text("\(teamsCount(for: sport)) teams")
.font(.caption)
.foregroundStyle(Theme.textMuted(colorScheme))
}
.padding(.vertical, Theme.Spacing.xs)
}
}
}
.listStyle(.plain)
.navigationTitle("Select Sport")
.navigationBarTitleDisplayMode(.inline)
.toolbar {
ToolbarItem(placement: .cancellationAction) {
Button("Cancel") { dismiss() }
}
}
}
.presentationDetents([.large])
}
private func teamsCount(for sport: Sport) -> Int {
AppDataProvider.shared.teams.filter { $0.sport == sport }.count
}
}
// MARK: - Team Multi-Select List View
private struct TeamMultiSelectListView: View {
@Environment(\.colorScheme) private var colorScheme
let sport: Sport
@Binding var selectedTeamIds: Set<String>
let onDone: () -> Void
@State private var searchText = ""
private var teams: [Team] {
let allTeams = AppDataProvider.shared.teams
.filter { $0.sport == sport }
.sorted { $0.fullName < $1.fullName }
if searchText.isEmpty {
return allTeams
}
return allTeams.filter {
$0.fullName.localizedCaseInsensitiveContains(searchText) ||
$0.city.localizedCaseInsensitiveContains(searchText)
}
}
private var selectionCount: Int {
// Count only teams from the current sport
let sportTeamIds = Set(AppDataProvider.shared.teams.filter { $0.sport == sport }.map { $0.id })
return selectedTeamIds.intersection(sportTeamIds).count
}
var body: some View {
List {
ForEach(teams) { team in
Button {
toggleTeam(team)
} label: {
HStack(spacing: Theme.Spacing.sm) {
Circle()
.fill(team.primaryColor.map { Color(hex: $0) } ?? sport.themeColor)
.frame(width: 28, height: 28)
VStack(alignment: .leading, spacing: 2) {
Text(team.fullName)
.font(.body)
.foregroundStyle(Theme.textPrimary(colorScheme))
Text(team.city)
.font(.caption)
.foregroundStyle(Theme.textMuted(colorScheme))
}
Spacer()
if selectedTeamIds.contains(team.id) {
Image(systemName: "checkmark.circle.fill")
.foregroundStyle(Theme.warmOrange)
} else {
Image(systemName: "circle")
.foregroundStyle(Theme.textMuted(colorScheme).opacity(0.5))
}
}
.padding(.vertical, Theme.Spacing.xs)
}
.buttonStyle(.plain)
}
}
.listStyle(.plain)
.searchable(text: $searchText, prompt: "Search teams")
.navigationTitle(sport.rawValue)
.navigationBarTitleDisplayMode(.inline)
.toolbar {
ToolbarItem(placement: .confirmationAction) {
Button {
onDone()
} label: {
if selectionCount >= 2 {
Text("Done (\(selectionCount))")
.fontWeight(.semibold)
} else {
Text("Done")
}
}
.disabled(selectionCount < 2)
}
}
.onAppear {
// Clear selections from other sports when entering
let sportTeamIds = Set(AppDataProvider.shared.teams.filter { $0.sport == sport }.map { $0.id })
selectedTeamIds = selectedTeamIds.intersection(sportTeamIds)
}
}
private func toggleTeam(_ team: Team) {
withAnimation(.easeInOut(duration: 0.15)) {
if selectedTeamIds.contains(team.id) {
selectedTeamIds.remove(team.id)
} else {
selectedTeamIds.insert(team.id)
}
}
}
}
// MARK: - Preview
#Preview("Empty") {
TeamFirstWizardStep(
selectedSport: .constant(nil),
selectedTeamIds: .constant([])
)
.padding()
.themedBackground()
}
#Preview("Teams Selected") {
TeamFirstWizardStep(
selectedSport: .constant(.mlb),
selectedTeamIds: .constant(["team_mlb_bos", "team_mlb_nyy", "team_mlb_chc"])
)
.padding()
.themedBackground()
}

View File

@@ -60,6 +60,13 @@ struct TripWizardView: View {
)
}
if viewModel.showTeamFirstStep {
TeamFirstWizardStep(
selectedSport: $viewModel.teamFirstSport,
selectedTeamIds: $viewModel.teamFirstSelectedTeamIds
)
}
// Common steps (conditionally shown)
if viewModel.showDatesStep {
DatesStep(
@@ -116,7 +123,8 @@ struct TripWizardView: View {
selectedGameCount: viewModel.selectedGameIds.count,
selectedTeamName: selectedTeamName,
startLocationName: viewModel.startLocation?.name,
endLocationName: viewModel.endLocation?.name
endLocationName: viewModel.endLocation?.name,
teamFirstTeamCount: viewModel.teamFirstSelectedTeamIds.count
)
}
.transition(.opacity)
@@ -167,7 +175,12 @@ struct TripWizardView: View {
// For gameFirst mode, use the UI-selected date range (set by GamePickerStep)
// The date range is a 7-day span centered on the selected game(s)
var games: [Game]
if viewModel.planningMode == .gameFirst && !viewModel.selectedGameIds.isEmpty {
if viewModel.planningMode == .teamFirst {
// Team-First mode: fetch ALL games for the season
// ScenarioEPlanner will generate sliding windows across the full season
games = try await AppDataProvider.shared.allGames(for: preferences.sports)
print("🔍 TripWizard: Team-First mode - fetched \(games.count) games for \(preferences.sports)")
} else if viewModel.planningMode == .gameFirst && !viewModel.selectedGameIds.isEmpty {
// Fetch all games for the selected sports within the UI date range
// GamePickerStep already set viewModel.startDate/endDate to a 7-day span
let allGames = try await AppDataProvider.shared.allGames(for: preferences.sports)
@@ -248,6 +261,8 @@ struct TripWizardView: View {
sports = viewModel.gamePickerSports
} else if viewModel.planningMode == .followTeam, let sport = viewModel.teamPickerSport {
sports = [sport]
} else if viewModel.planningMode == .teamFirst, let sport = viewModel.teamFirstSport {
sports = [sport]
} else {
sports = viewModel.selectedSports
}
@@ -264,7 +279,8 @@ struct TripWizardView: View {
routePreference: viewModel.routePreference,
allowRepeatCities: viewModel.allowRepeatCities,
selectedRegions: viewModel.selectedRegions,
followTeamId: viewModel.selectedTeamId
followTeamId: viewModel.selectedTeamId,
selectedTeamIds: viewModel.teamFirstSelectedTeamIds
)
}

View File

@@ -0,0 +1,498 @@
//
// ScenarioEPlanner.swift
// SportsTime
//
// Scenario E: Team-First planning.
// User selects teams (not dates), we find optimal trip windows across the season.
//
// Key Features:
// - Selected teams determine which home games to visit
// - Sliding window logic generates all N-day windows across the season
// - Windows are filtered to those where ALL selected teams have home games
// - Routes are built and ranked by duration + miles
//
// Sliding Window Algorithm:
// 1. Fetch all home games for selected teams (full season)
// 2. Generate all N-day windows (N = selectedTeamIds.count * 2)
// 3. Filter to windows where each selected team has at least 1 home game
// 4. Cap at 50 windows (sample if more exist)
// 5. For each valid window, find routes using GameDAGRouter with team games as anchors
// 6. Rank by shortest duration + minimal miles
// 7. Return top 10 results
//
// Example:
// User selects: Yankees, Red Sox, Phillies
// Window duration: 3 * 2 = 6 days
// Valid windows: Any 6-day span where all 3 teams have a home game
// Output: Top 10 trip options, ranked by efficiency
//
import Foundation
import CoreLocation
/// Scenario E: Team-First planning
/// Input: selectedTeamIds, sports (for fetching games)
/// Output: Top 10 trip windows with routes visiting all selected teams' home stadiums
///
/// - Expected Behavior:
/// - No selected teams returns .failure with .missingTeamSelection
/// - Fewer than 2 teams returns .failure with .missingTeamSelection
/// - No home games found returns .failure with .noGamesInRange
/// - No valid windows (no overlap) returns .failure with .noValidRoutes
/// - Each returned route visits ALL selected teams' home stadiums
/// - Routes ranked by duration + miles (shorter is better)
/// - Returns maximum of 10 options
///
/// - Invariants:
/// - All routes contain at least one home game per selected team
/// - Window duration = selectedTeamIds.count * 2 days
/// - Maximum 50 windows evaluated (sampled if more exist)
/// - Maximum 10 results returned
///
final class ScenarioEPlanner: ScenarioPlanner {
// MARK: - Configuration
/// Maximum number of windows to evaluate (to prevent combinatorial explosion)
private let maxWindowsToEvaluate = 50
/// Maximum number of results to return
private let maxResultsToReturn = 10
// MARK: - ScenarioPlanner Protocol
func plan(request: PlanningRequest) -> ItineraryResult {
//
// Step 1: Validate team selection
//
let selectedTeamIds = request.preferences.selectedTeamIds
if selectedTeamIds.count < 2 {
return .failure(
PlanningFailure(
reason: .missingTeamSelection,
violations: [
ConstraintViolation(
type: .general,
description: "Team-First mode requires at least 2 teams selected",
severity: .error
)
]
)
)
}
//
// Step 2: Collect all HOME games for selected teams
//
// For Team-First mode, we only care about home games because
// the user wants to visit each team's home stadium.
var homeGamesByTeam: [String: [Game]] = [:]
var allHomeGames: [Game] = []
for game in request.allGames {
if selectedTeamIds.contains(game.homeTeamId) {
homeGamesByTeam[game.homeTeamId, default: []].append(game)
allHomeGames.append(game)
}
}
// Verify each team has at least one home game
for teamId in selectedTeamIds {
if homeGamesByTeam[teamId]?.isEmpty ?? true {
let teamName = request.teams[teamId]?.fullName ?? teamId
return .failure(
PlanningFailure(
reason: .noGamesInRange,
violations: [
ConstraintViolation(
type: .selectedGames,
description: "No home games found for \(teamName)",
severity: .error
)
]
)
)
}
}
print("🔍 ScenarioE Step 2: Found \(allHomeGames.count) total home games across \(selectedTeamIds.count) teams")
for (teamId, games) in homeGamesByTeam {
let teamName = request.teams[teamId]?.fullName ?? teamId
print("🔍 \(teamName): \(games.count) home games")
}
//
// Step 3: Generate sliding windows across the season
//
let windowDuration = request.preferences.teamFirstMaxDays
print("🔍 ScenarioE Step 3: Window duration = \(windowDuration) days")
let validWindows = generateValidWindows(
homeGamesByTeam: homeGamesByTeam,
windowDurationDays: windowDuration,
selectedTeamIds: selectedTeamIds
)
if validWindows.isEmpty {
return .failure(
PlanningFailure(
reason: .noValidRoutes,
violations: [
ConstraintViolation(
type: .dateRange,
description: "No \(windowDuration)-day windows found where all selected teams have home games",
severity: .error
)
]
)
)
}
print("🔍 ScenarioE Step 3: Found \(validWindows.count) valid windows")
// Sample windows if too many exist
let windowsToEvaluate: [DateInterval]
if validWindows.count > maxWindowsToEvaluate {
windowsToEvaluate = sampleWindows(validWindows, count: maxWindowsToEvaluate)
print("🔍 ScenarioE Step 3: Sampled down to \(windowsToEvaluate.count) windows")
} else {
windowsToEvaluate = validWindows
}
//
// Step 4: For each valid window, find routes
//
var allItineraryOptions: [ItineraryOption] = []
for (windowIndex, window) in windowsToEvaluate.enumerated() {
// Collect games in this window
// Use one home game per team as anchors (the best one for route efficiency)
var gamesInWindow: [Game] = []
var anchorGameIds = Set<String>()
for teamId in selectedTeamIds {
guard let teamGames = homeGamesByTeam[teamId] else { continue }
// Get all home games for this team in the window
let teamGamesInWindow = teamGames.filter { window.contains($0.startTime) }
if teamGamesInWindow.isEmpty {
// Window doesn't have a game for this team - skip this window
// This shouldn't happen since we pre-filtered windows
continue
}
// Add all games to the pool
gamesInWindow.append(contentsOf: teamGamesInWindow)
// Mark the earliest game as anchor (must visit this team)
if let earliestGame = teamGamesInWindow.sorted(by: { $0.startTime < $1.startTime }).first {
anchorGameIds.insert(earliestGame.id)
}
}
// Skip if we don't have anchors for all teams
guard anchorGameIds.count == selectedTeamIds.count else { continue }
// Remove duplicate games (same game could be added multiple times if team plays multiple home games)
let uniqueGames = Array(Set(gamesInWindow)).sorted { $0.startTime < $1.startTime }
// Find routes using GameDAGRouter with anchor games
let validRoutes = GameDAGRouter.findAllSensibleRoutes(
from: uniqueGames,
stadiums: request.stadiums,
anchorGameIds: anchorGameIds,
allowRepeatCities: request.preferences.allowRepeatCities,
stopBuilder: buildStops
)
// Build itineraries for valid routes
for routeGames in validRoutes {
let stops = buildStops(from: routeGames, stadiums: request.stadiums)
guard !stops.isEmpty else { continue }
// Use shared ItineraryBuilder with arrival time validator
guard let itinerary = ItineraryBuilder.build(
stops: stops,
constraints: request.drivingConstraints,
logPrefix: "[ScenarioE]",
segmentValidator: ItineraryBuilder.arrivalBeforeGameStart()
) else {
continue
}
// Format window dates for rationale
let dateFormatter = DateFormatter()
dateFormatter.dateFormat = "MMM d"
let windowDesc = "\(dateFormatter.string(from: window.start)) - \(dateFormatter.string(from: window.end))"
let teamsVisited = routeGames.compactMap { game -> String? in
if anchorGameIds.contains(game.id) {
return request.teams[game.homeTeamId]?.abbreviation ?? game.homeTeamId
}
return nil
}.joined(separator: ", ")
let cities = stops.map { $0.city }.joined(separator: " -> ")
let option = ItineraryOption(
rank: 0, // Will re-rank later
stops: itinerary.stops,
travelSegments: itinerary.travelSegments,
totalDrivingHours: itinerary.totalDrivingHours,
totalDistanceMiles: itinerary.totalDistanceMiles,
geographicRationale: "\(windowDesc): \(teamsVisited) - \(cities)"
)
allItineraryOptions.append(option)
}
// Early exit if we have enough options
if allItineraryOptions.count >= maxResultsToReturn * 5 {
print("🔍 ScenarioE: Early exit at window \(windowIndex + 1) with \(allItineraryOptions.count) options")
break
}
}
//
// Step 5: Rank and return top results
//
if allItineraryOptions.isEmpty {
return .failure(
PlanningFailure(
reason: .constraintsUnsatisfiable,
violations: [
ConstraintViolation(
type: .geographicSanity,
description: "No routes found that can visit all selected teams within driving constraints",
severity: .error
)
]
)
)
}
// Deduplicate options (same stops in same order)
let uniqueOptions = deduplicateOptions(allItineraryOptions)
// Sort by: shortest duration first, then fewest miles
let sortedOptions = uniqueOptions.sorted { optionA, optionB in
// Calculate trip duration in days
let daysA = tripDurationDays(for: optionA)
let daysB = tripDurationDays(for: optionB)
if daysA != daysB {
return daysA < daysB // Shorter trips first
}
return optionA.totalDistanceMiles < optionB.totalDistanceMiles // Fewer miles second
}
// Take top N and re-rank
let topOptions = Array(sortedOptions.prefix(maxResultsToReturn))
let rankedOptions = topOptions.enumerated().map { index, option in
ItineraryOption(
rank: index + 1,
stops: option.stops,
travelSegments: option.travelSegments,
totalDrivingHours: option.totalDrivingHours,
totalDistanceMiles: option.totalDistanceMiles,
geographicRationale: option.geographicRationale
)
}
print("🔍 ScenarioE: Returning \(rankedOptions.count) options")
return .success(rankedOptions)
}
// MARK: - Window Generation
/// Generates all valid N-day windows across the season where ALL selected teams have home games.
///
/// Algorithm:
/// 1. Find the overall date range (earliest to latest game across all teams)
/// 2. Slide a window across this range, one day at a time
/// 3. Keep windows where each team has at least one home game
///
private func generateValidWindows(
homeGamesByTeam: [String: [Game]],
windowDurationDays: Int,
selectedTeamIds: Set<String>
) -> [DateInterval] {
// Find overall date range
let allGames = homeGamesByTeam.values.flatMap { $0 }
guard let earliest = allGames.map({ $0.startTime }).min(),
let latest = allGames.map({ $0.startTime }).max() else {
return []
}
let calendar = Calendar.current
let earliestDay = calendar.startOfDay(for: earliest)
let latestDay = calendar.startOfDay(for: latest)
// Generate sliding windows
var validWindows: [DateInterval] = []
var currentStart = earliestDay
while let windowEnd = calendar.date(byAdding: .day, value: windowDurationDays, to: currentStart),
windowEnd <= calendar.date(byAdding: .day, value: 1, to: latestDay)! {
let window = DateInterval(start: currentStart, end: windowEnd)
// Check if all teams have at least one home game in this window
let allTeamsHaveGame = selectedTeamIds.allSatisfy { teamId in
guard let teamGames = homeGamesByTeam[teamId] else { return false }
return teamGames.contains { window.contains($0.startTime) }
}
if allTeamsHaveGame {
validWindows.append(window)
}
// Slide window by one day
guard let nextStart = calendar.date(byAdding: .day, value: 1, to: currentStart) else {
break
}
currentStart = nextStart
}
return validWindows
}
/// Samples windows evenly across the season to avoid clustering.
private func sampleWindows(_ windows: [DateInterval], count: Int) -> [DateInterval] {
guard windows.count > count else { return windows }
// Take evenly spaced samples
var sampled: [DateInterval] = []
let step = Double(windows.count) / Double(count)
for i in 0..<count {
let index = Int(Double(i) * step)
if index < windows.count {
sampled.append(windows[index])
}
}
return sampled
}
// MARK: - Stop Building
/// Converts a list of games into itinerary stops.
/// Groups consecutive games at the same stadium into one stop.
private func buildStops(
from games: [Game],
stadiums: [String: Stadium]
) -> [ItineraryStop] {
guard !games.isEmpty else { return [] }
// Sort games chronologically
let sortedGames = games.sorted { $0.startTime < $1.startTime }
// Group consecutive games at the same stadium
var stops: [ItineraryStop] = []
var currentStadiumId: String? = nil
var currentGames: [Game] = []
for game in sortedGames {
if game.stadiumId == currentStadiumId {
// Same stadium as previous game - add to current group
currentGames.append(game)
} else {
// Different stadium - finalize previous stop (if any) and start new one
if let stadiumId = currentStadiumId, !currentGames.isEmpty {
if let stop = createStop(from: currentGames, stadiumId: stadiumId, stadiums: stadiums) {
stops.append(stop)
}
}
currentStadiumId = game.stadiumId
currentGames = [game]
}
}
// Don't forget the last group
if let stadiumId = currentStadiumId, !currentGames.isEmpty {
if let stop = createStop(from: currentGames, stadiumId: stadiumId, stadiums: stadiums) {
stops.append(stop)
}
}
return stops
}
/// Creates an ItineraryStop from a group of games at the same stadium.
private func createStop(
from games: [Game],
stadiumId: String,
stadiums: [String: Stadium]
) -> ItineraryStop? {
guard !games.isEmpty else { return nil }
let sortedGames = games.sorted { $0.startTime < $1.startTime }
let stadium = stadiums[stadiumId]
let city = stadium?.city ?? "Unknown"
let state = stadium?.state ?? ""
let coordinate = stadium?.coordinate
let location = LocationInput(
name: city,
coordinate: coordinate,
address: stadium?.fullAddress
)
// departureDate is same day as last game
let lastGameDate = sortedGames.last?.gameDate ?? Date()
return ItineraryStop(
city: city,
state: state,
coordinate: coordinate,
games: sortedGames.map { $0.id },
arrivalDate: sortedGames.first?.gameDate ?? Date(),
departureDate: lastGameDate,
location: location,
firstGameStart: sortedGames.first?.startTime
)
}
// MARK: - Deduplication
/// Removes duplicate itinerary options (same stops in same order).
private func deduplicateOptions(_ options: [ItineraryOption]) -> [ItineraryOption] {
var seen = Set<String>()
var unique: [ItineraryOption] = []
for option in options {
// Create key from stop cities in order
let key = option.stops.map { $0.city }.joined(separator: "-")
if !seen.contains(key) {
seen.insert(key)
unique.append(option)
}
}
return unique
}
// MARK: - Trip Duration Calculation
/// Calculates trip duration in days for an itinerary option.
private func tripDurationDays(for option: ItineraryOption) -> Int {
guard let firstStop = option.stops.first,
let lastStop = option.stops.last else {
return 0
}
let calendar = Calendar.current
let days = calendar.dateComponents(
[.day],
from: calendar.startOfDay(for: firstStop.arrivalDate),
to: calendar.startOfDay(for: lastStop.departureDate)
).day ?? 0
return max(1, days + 1)
}
}

View File

@@ -8,7 +8,7 @@
import Foundation
/// Protocol that all scenario planners must implement.
/// Each scenario (A, B, C, D) has its own isolated implementation.
/// Each scenario (A, B, C, D, E) has its own isolated implementation.
///
/// - Invariants:
/// - Always returns either success or explicit failure, never throws
@@ -25,28 +25,39 @@ protocol ScenarioPlanner {
/// Factory for creating the appropriate scenario planner.
///
/// - Expected Behavior:
/// - planningMode == .teamFirst with >= 2 teams ScenarioEPlanner
/// - followTeamId != nil ScenarioDPlanner
/// - selectedGames not empty ScenarioBPlanner
/// - startLocation AND endLocation != nil ScenarioCPlanner
/// - Otherwise ScenarioAPlanner (default)
///
/// Priority order: D > B > C > A (first matching wins)
/// Priority order: E > D > B > C > A (first matching wins)
enum ScenarioPlannerFactory {
/// Creates the appropriate planner based on the request inputs.
///
/// - Expected Behavior:
/// - planningMode == .teamFirst with >= 2 teams ScenarioEPlanner
/// - followTeamId set ScenarioDPlanner
/// - selectedGames not empty ScenarioBPlanner
/// - Both start and end locations ScenarioCPlanner
/// - Otherwise ScenarioAPlanner
static func planner(for request: PlanningRequest) -> ScenarioPlanner {
print("🔍 ScenarioPlannerFactory: Selecting planner...")
print(" - planningMode: \(request.preferences.planningMode)")
print(" - selectedTeamIds.count: \(request.preferences.selectedTeamIds.count)")
print(" - followTeamId: \(request.preferences.followTeamId ?? "nil")")
print(" - selectedGames.count: \(request.selectedGames.count)")
print(" - startLocation: \(request.startLocation?.name ?? "nil")")
print(" - endLocation: \(request.endLocation?.name ?? "nil")")
// Scenario E: Team-First mode - user selects teams, finds optimal trip windows
if request.preferences.planningMode == .teamFirst &&
request.preferences.selectedTeamIds.count >= 2 {
print("🔍 ScenarioPlannerFactory: → ScenarioEPlanner (team-first)")
return ScenarioEPlanner()
}
// Scenario D: User wants to follow a specific team
if request.preferences.followTeamId != nil {
print("🔍 ScenarioPlannerFactory: → ScenarioDPlanner (follow team)")
@@ -73,11 +84,16 @@ enum ScenarioPlannerFactory {
/// Classifies which scenario applies to this request.
///
/// - Expected Behavior:
/// - planningMode == .teamFirst with >= 2 teams .scenarioE
/// - followTeamId set .scenarioD
/// - selectedGames not empty .scenarioB
/// - Both start and end locations .scenarioC
/// - Otherwise .scenarioA
static func classify(_ request: PlanningRequest) -> PlanningScenario {
if request.preferences.planningMode == .teamFirst &&
request.preferences.selectedTeamIds.count >= 2 {
return .scenarioE
}
if request.preferences.followTeamId != nil {
return .scenarioD
}

View File

@@ -17,11 +17,13 @@ import CoreLocation
/// - scenarioB: User selects specific games + date range
/// - scenarioC: User provides start/end locations, system plans route
/// - scenarioD: User follows a team's schedule
/// - scenarioE: User selects teams, system finds best trip windows across season
enum PlanningScenario: Equatable {
case scenarioA // Date range only
case scenarioB // Selected games + date range
case scenarioC // Start + end locations
case scenarioD // Follow team schedule
case scenarioE // Team-first (select teams, find trip windows)
}
// MARK: - Planning Failure
@@ -568,4 +570,25 @@ struct PlanningRequest {
func stadium(for game: Game) -> Stadium? {
stadiums[game.stadiumId]
}
// MARK: - Team-First Mode Properties
/// Teams selected for Team-First mode (resolved from selectedTeamIds)
@MainActor
var selectedTeams: [Team] {
preferences.selectedTeamIds.compactMap { teamId in
AppDataProvider.shared.team(for: teamId)
}
}
/// All home games for the selected teams (for Team-First mode)
/// Filters availableGames to only those where a selected team is the home team
var homeGamesForSelectedTeams: [Game] {
let selectedIds = preferences.selectedTeamIds
guard !selectedIds.isEmpty else { return [] }
return availableGames.filter { game in
selectedIds.contains(game.homeTeamId)
}
}
}

View File

@@ -0,0 +1,971 @@
//
// ScenarioEPlannerTests.swift
// SportsTimeTests
//
// TDD specification tests for ScenarioEPlanner (Team-First planning mode).
//
// Team-First mode allows users to select multiple teams and finds optimal
// trip windows across the season where all selected teams have home games.
//
import Testing
import CoreLocation
@testable import SportsTime
@Suite("ScenarioEPlanner")
struct ScenarioEPlannerTests {
// MARK: - Test Data
private let planner = ScenarioEPlanner()
// East Coast coordinates
private let nycCoord = CLLocationCoordinate2D(latitude: 40.7580, longitude: -73.9855)
private let bostonCoord = CLLocationCoordinate2D(latitude: 42.3467, longitude: -71.0972)
private let phillyCoord = CLLocationCoordinate2D(latitude: 39.9526, longitude: -75.1652)
// Central coordinates
private let chicagoCoord = CLLocationCoordinate2D(latitude: 41.8827, longitude: -87.6233)
// West Coast coordinates
private let laCoord = CLLocationCoordinate2D(latitude: 34.0430, longitude: -118.2673)
// MARK: - E1. Window Generator Tests
@Test("generateValidWindows: 2 teams with 4-day window finds valid windows")
func generateValidWindows_2Teams4DayWindow_findsValidWindows() {
// Setup: 2 teams (simpler case), each with home games in overlapping time periods
// NYC and Boston are only ~4 hours apart, making routes feasible
//
// With 2 teams, window duration = 4 days.
// The window algorithm checks: windowEnd <= latestGameDay + 1
// So with games on day 1 and day 4: latestDay=4, windowEnd=5 <= day 5 (4+1) - valid!
let calendar = Calendar.current
let baseDate = calendar.startOfDay(for: Date())
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
// Create games with evening times (7 PM) to give ample travel time
// Day 1 evening: Yankees home game
// Day 4 evening: Red Sox home game (spans 4 days, fits in 4-day window)
let day1Evening = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 1))!
let day4Evening = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 4))!
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: day1Evening
)
let redsoxGame = makeGame(
id: "redsox-home",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: day4Evening
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, redsoxGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium]
)
let result = planner.plan(request: request)
// Should succeed - NYC to Boston is easily drivable
guard case .success(let options) = result else {
Issue.record("Expected success with overlapping home games for nearby cities")
return
}
#expect(!options.isEmpty, "Should find valid windows when all teams have overlapping home games")
}
@Test("generateValidWindows: window with only 2 of 3 teams excluded")
func generateValidWindows_windowMissingTeam_excluded() {
// Setup: 3 teams selected, but games are spread so no single window covers all 3
let baseDate = Date()
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
let phillyStadium = makeStadium(id: "philly", city: "Philadelphia", coordinate: phillyCoord)
// Yankees game on day 1
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: baseDate.addingTimeInterval(86400 * 1)
)
// Red Sox game on day 2
let redsoxGame = makeGame(
id: "redsox-home",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: baseDate.addingTimeInterval(86400 * 2)
)
// Phillies game on day 20 (way outside any 6-day window with the others)
let philliesGame = makeGame(
id: "phillies-home",
homeTeamId: "phillies",
awayTeamId: "opponent",
stadiumId: "philly",
dateTime: baseDate.addingTimeInterval(86400 * 20)
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox", "phillies"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, redsoxGame, philliesGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox"),
"phillies": makeTeam(id: "phillies", name: "Phillies")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium, "philly": phillyStadium]
)
let result = planner.plan(request: request)
guard case .failure(let failure) = result else {
Issue.record("Expected failure when no valid window covers all teams")
return
}
#expect(failure.reason == .noValidRoutes, "Should return noValidRoutes when no window covers all teams")
}
@Test("generateValidWindows: empty season returns empty")
func generateValidWindows_emptySeason_returnsEmpty() {
// Setup: No games available
let baseDate = Date()
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [], // No games
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: [:]
)
let result = planner.plan(request: request)
guard case .failure(let failure) = result else {
Issue.record("Expected failure with no games")
return
}
#expect(failure.reason == .noGamesInRange, "Should return noGamesInRange when no home games exist")
}
@Test("generateValidWindows: sampling works when more than 50 windows")
func generateValidWindows_manyWindows_samplesProperly() {
// Setup: Create many overlapping games so there are >50 valid windows
let calendar = Calendar.current
let baseDate = calendar.startOfDay(for: Date())
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
// Create games on alternating days - Yankees on even days, Red Sox on odd days
// This ensures games don't conflict and routes are feasible
// With 90 days of games, we get ~45 games per team with plenty of valid 4-day windows
var games: [Game] = []
for day in 0..<90 {
let dayDate = baseDate.addingTimeInterval(Double(86400 * day))
let eveningTime = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: dayDate)!
if day % 2 == 0 {
// Yankees home games on even days
let yankeesGame = makeGame(
id: "yankees-\(day)",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: eveningTime
)
games.append(yankeesGame)
} else {
// Red Sox home games on odd days
let redsoxGame = makeGame(
id: "redsox-\(day)",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: eveningTime
)
games.append(redsoxGame)
}
}
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 90),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: games,
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium]
)
let result = planner.plan(request: request)
// Should succeed and return results (sampling internal behavior)
guard case .success(let options) = result else {
Issue.record("Expected success with many valid windows")
return
}
#expect(!options.isEmpty, "Should return options even with many windows (sampling applies internally)")
#expect(options.count <= 10, "Should return at most 10 results")
}
// MARK: - E2. ScenarioEPlanner Unit Tests
@Test("plan: returns PlanningResult with routes")
func plan_validRequest_returnsPlanningResultWithRoutes() {
let calendar = Calendar.current
let baseDate = calendar.startOfDay(for: Date())
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
// Use evening game times to allow travel during the day
// Games must span 4 days for 2-team window (day 1 to day 4)
let day1Evening = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 1))!
let day4Evening = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 4))!
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: day1Evening
)
let redsoxGame = makeGame(
id: "redsox-home",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: day4Evening
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, redsoxGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium]
)
let result = planner.plan(request: request)
guard case .success(let options) = result else {
Issue.record("Expected success with valid team-first request")
return
}
#expect(!options.isEmpty, "Should return route options")
#expect(options.first?.stops.count ?? 0 > 0, "Each option should have stops")
}
@Test("plan: all routes include all selected teams")
func plan_allRoutesIncludeAllSelectedTeams() {
// Use just 2 teams for a simpler, more reliable test
let calendar = Calendar.current
let baseDate = calendar.startOfDay(for: Date())
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
// Games within a 4-day window (2 teams * 2) with evening times
// Games must span 4 days for the window to be valid
let day1Evening = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 1))!
let day4Evening = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 4))!
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: day1Evening
)
let redsoxGame = makeGame(
id: "redsox-home",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: day4Evening
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, redsoxGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium]
)
let result = planner.plan(request: request)
guard case .success(let options) = result else {
Issue.record("Expected success with valid team-first request")
return
}
// Verify each route includes games from all 2 teams
for option in options {
let allGameIds = Set(option.stops.flatMap { $0.games })
// Check that at least one home game per team is included
let hasYankeesGame = allGameIds.contains("yankees-home")
let hasRedsoxGame = allGameIds.contains("redsox-home")
#expect(hasYankeesGame, "Every route must include Yankees home game")
#expect(hasRedsoxGame, "Every route must include Red Sox home game")
}
}
@Test("plan: routes sorted by duration ascending")
func plan_routesSortedByDurationAscending() {
let baseDate = Date()
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
// Create multiple windows with different durations
// Window 1: Games on day 1 and 2 (shorter trip)
// Window 2: Games on day 10 and 14 (longer trip within window)
let yankeesGame1 = makeGame(
id: "yankees-1",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: baseDate.addingTimeInterval(86400 * 1)
)
let redsoxGame1 = makeGame(
id: "redsox-1",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: baseDate.addingTimeInterval(86400 * 2)
)
let yankeesGame2 = makeGame(
id: "yankees-2",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: baseDate.addingTimeInterval(86400 * 10)
)
let redsoxGame2 = makeGame(
id: "redsox-2",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: baseDate.addingTimeInterval(86400 * 12)
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame1, redsoxGame1, yankeesGame2, redsoxGame2],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium]
)
let result = planner.plan(request: request)
guard case .success(let options) = result else {
Issue.record("Expected success")
return
}
// Verify ranking is assigned correctly
for (index, option) in options.enumerated() {
#expect(option.rank == index + 1, "Routes should be ranked 1, 2, 3...")
}
}
@Test("plan: respects max driving time constraint")
func plan_respectsMaxDrivingTimeConstraint() {
let baseDate = Date()
// NYC and LA are ~40 hours apart by car
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let laStadium = makeStadium(id: "la", city: "Los Angeles", coordinate: laCoord)
// Games on consecutive days - impossible to drive between
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: baseDate.addingTimeInterval(86400 * 1)
)
let dodgersGame = makeGame(
id: "dodgers-home",
homeTeamId: "dodgers",
awayTeamId: "opponent",
stadiumId: "la",
dateTime: baseDate.addingTimeInterval(86400 * 2) // Next day - impossible
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 1, // Single driver, 8 hours max
selectedTeamIds: ["yankees", "dodgers"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, dodgersGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"dodgers": makeTeam(id: "dodgers", name: "Dodgers")
],
stadiums: ["nyc": nycStadium, "la": laStadium]
)
let result = planner.plan(request: request)
// Should fail because driving constraint cannot be met
guard case .failure(let failure) = result else {
Issue.record("Expected failure when driving constraint cannot be met")
return
}
// Could be noValidRoutes or constraintsUnsatisfiable
let validFailures: [PlanningFailure.FailureReason] = [.noValidRoutes, .constraintsUnsatisfiable]
#expect(validFailures.contains { $0 == failure.reason }, "Should fail due to route constraints")
}
// MARK: - E4. Edge Case Tests
@Test("plan: teams with no overlapping games returns graceful error")
func plan_noOverlappingGames_returnsGracefulError() {
let baseDate = Date()
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
// Yankees plays in January, Red Sox plays in July - no overlap
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: baseDate // Day 0
)
let redsoxGame = makeGame(
id: "redsox-home",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: baseDate.addingTimeInterval(86400 * 180) // 6 months later
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 365),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, redsoxGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium]
)
let result = planner.plan(request: request)
guard case .failure(let failure) = result else {
Issue.record("Expected failure when teams have no overlapping game windows")
return
}
#expect(failure.reason == .noValidRoutes, "Should return noValidRoutes for non-overlapping schedules")
}
@Test("plan: single team selected returns validation error")
func plan_singleTeamSelected_returnsValidationError() {
let baseDate = Date()
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: baseDate.addingTimeInterval(86400 * 1)
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees"] // Only 1 team
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame],
teams: ["yankees": makeTeam(id: "yankees", name: "Yankees")],
stadiums: ["nyc": nycStadium]
)
let result = planner.plan(request: request)
guard case .failure(let failure) = result else {
Issue.record("Expected failure when only 1 team selected")
return
}
#expect(failure.reason == .missingTeamSelection, "Should return missingTeamSelection for single team")
}
@Test("plan: no teams selected returns validation error")
func plan_noTeamsSelected_returnsValidationError() {
let baseDate = Date()
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: [] // No teams
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [],
teams: [:],
stadiums: [:]
)
let result = planner.plan(request: request)
guard case .failure(let failure) = result else {
Issue.record("Expected failure when no teams selected")
return
}
#expect(failure.reason == .missingTeamSelection, "Should return missingTeamSelection for no teams")
}
@Test("plan: teams in same city treated as separate stops")
func plan_teamsInSameCity_treatedAsSeparateStops() {
// Setup: Yankees and Mets both play in NYC but at different stadiums
let calendar = Calendar.current
let baseDate = calendar.startOfDay(for: Date())
let yankeeStadiumCoord = CLLocationCoordinate2D(latitude: 40.8296, longitude: -73.9262)
let citiFieldCoord = CLLocationCoordinate2D(latitude: 40.7571, longitude: -73.8458)
let yankeeStadium = makeStadium(id: "yankee-stadium", city: "New York", coordinate: yankeeStadiumCoord)
let citiField = makeStadium(id: "citi-field", city: "New York", coordinate: citiFieldCoord)
// Games on different days in the same city with evening times
// Games must span 4 days for the 2-team window to be valid
let day1Evening = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 1))!
let day4Evening = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 4))!
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "yankee-stadium",
dateTime: day1Evening
)
let metsGame = makeGame(
id: "mets-home",
homeTeamId: "mets",
awayTeamId: "opponent",
stadiumId: "citi-field",
dateTime: day4Evening
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "mets"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, metsGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"mets": makeTeam(id: "mets", name: "Mets")
],
stadiums: ["yankee-stadium": yankeeStadium, "citi-field": citiField]
)
let result = planner.plan(request: request)
guard case .success(let options) = result else {
Issue.record("Expected success with same-city teams")
return
}
#expect(!options.isEmpty, "Should find routes for same-city teams")
// Verify both games are included in routes
for option in options {
let allGameIds = Set(option.stops.flatMap { $0.games })
#expect(allGameIds.contains("yankees-home"), "Should include Yankees game")
#expect(allGameIds.contains("mets-home"), "Should include Mets game")
}
}
@Test("plan: team with no home games returns error")
func plan_teamWithNoHomeGames_returnsError() {
let baseDate = Date()
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
// Only Yankees have a home game, Red Sox have no home games
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "redsox", // Red Sox are away
stadiumId: "nyc",
dateTime: baseDate.addingTimeInterval(86400 * 1)
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"] // Red Sox selected but no home games
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium]
)
let result = planner.plan(request: request)
guard case .failure(let failure) = result else {
Issue.record("Expected failure when selected team has no home games")
return
}
#expect(failure.reason == .noGamesInRange, "Should return noGamesInRange when team has no home games")
}
// MARK: - Invariant Tests
@Test("Invariant: window duration equals teams count times 2")
func invariant_windowDurationEqualsTeamsTimestwo() {
// Test that teamFirstMaxDays is calculated correctly
var prefs = TripPreferences()
prefs.selectedTeamIds = ["team1", "team2", "team3"]
#expect(prefs.teamFirstMaxDays == 6, "3 teams should result in 6-day window")
prefs.selectedTeamIds = ["team1", "team2"]
#expect(prefs.teamFirstMaxDays == 4, "2 teams should result in 4-day window")
prefs.selectedTeamIds = ["team1", "team2", "team3", "team4", "team5"]
#expect(prefs.teamFirstMaxDays == 10, "5 teams should result in 10-day window")
}
@Test("Invariant: maximum 10 results returned")
func invariant_maximum10ResultsReturned() {
let baseDate = Date()
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
// Create many games to generate many possible routes
var games: [Game] = []
for day in 0..<60 {
games.append(makeGame(
id: "yankees-\(day)",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: baseDate.addingTimeInterval(Double(86400 * day))
))
games.append(makeGame(
id: "redsox-\(day)",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: baseDate.addingTimeInterval(Double(86400 * day))
))
}
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 60),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: games,
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium]
)
let result = planner.plan(request: request)
if case .success(let options) = result {
#expect(options.count <= 10, "Should return at most 10 results")
}
}
@Test("Invariant: all routes contain home games from all selected teams")
func invariant_allRoutesContainAllSelectedTeams() {
let baseDate = Date()
let nycStadium = makeStadium(id: "nyc", city: "New York", coordinate: nycCoord)
let bostonStadium = makeStadium(id: "boston", city: "Boston", coordinate: bostonCoord)
let yankeesGame = makeGame(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "nyc",
dateTime: baseDate.addingTimeInterval(86400 * 1)
)
let redsoxGame = makeGame(
id: "redsox-home",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "boston",
dateTime: baseDate.addingTimeInterval(86400 * 2)
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, redsoxGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox")
],
stadiums: ["nyc": nycStadium, "boston": bostonStadium]
)
let result = planner.plan(request: request)
if case .success(let options) = result {
for option in options {
let allGameIds = Set(option.stops.flatMap { $0.games })
// At minimum, should have one game per selected team
let hasYankeesGame = allGameIds.contains { gameId in
// Check if any game in this route is a Yankees home game
request.availableGames.first { $0.id == gameId }?.homeTeamId == "yankees"
}
let hasRedsoxGame = allGameIds.contains { gameId in
request.availableGames.first { $0.id == gameId }?.homeTeamId == "redsox"
}
#expect(hasYankeesGame, "Every route must include a Yankees home game")
#expect(hasRedsoxGame, "Every route must include a Red Sox home game")
}
}
}
// MARK: - Helper Methods
private func makeStadium(
id: String,
city: String,
coordinate: CLLocationCoordinate2D
) -> Stadium {
Stadium(
id: id,
name: "\(city) Stadium",
city: city,
state: "XX",
latitude: coordinate.latitude,
longitude: coordinate.longitude,
capacity: 40000,
sport: .mlb
)
}
private func makeGame(
id: String,
homeTeamId: String,
awayTeamId: String,
stadiumId: String,
dateTime: Date
) -> Game {
Game(
id: id,
homeTeamId: homeTeamId,
awayTeamId: awayTeamId,
stadiumId: stadiumId,
dateTime: dateTime,
sport: .mlb,
season: "2026",
isPlayoff: false
)
}
private func makeTeam(id: String, name: String) -> Team {
Team(
id: id,
name: name,
abbreviation: String(name.prefix(3)).uppercased(),
sport: .mlb,
city: "Test City",
stadiumId: id,
primaryColor: "#000000",
secondaryColor: "#FFFFFF"
)
}
}

View File

@@ -0,0 +1,599 @@
//
// TeamFirstIntegrationTests.swift
// SportsTimeTests
//
// Integration tests for Team-First planning mode.
//
// These tests verify the end-to-end flow from team selection
// to route generation, ensuring all components work together.
//
import Testing
import CoreLocation
@testable import SportsTime
@Suite("TeamFirst Integration")
struct TeamFirstIntegrationTests {
// MARK: - Test Data
private let planner = ScenarioEPlanner()
// MLB stadiums with realistic coordinates
private let yankeeStadiumCoord = CLLocationCoordinate2D(latitude: 40.8296, longitude: -73.9262)
private let fenwayParkCoord = CLLocationCoordinate2D(latitude: 42.3467, longitude: -71.0972)
private let citizensBankCoord = CLLocationCoordinate2D(latitude: 39.9061, longitude: -75.1665)
// MARK: - E3. Full Flow Integration Tests
@Test("Integration: 3 MLB teams returns top 10 routes")
func integration_3MLBTeams_returnsTop10Routes() {
let baseDate = Date()
// Create realistic MLB stadiums
let yankeeStadium = Stadium(
id: "stadium_mlb_yankee_stadium",
name: "Yankee Stadium",
city: "New York",
state: "NY",
latitude: yankeeStadiumCoord.latitude,
longitude: yankeeStadiumCoord.longitude,
capacity: 47309,
sport: .mlb
)
let fenwayPark = Stadium(
id: "stadium_mlb_fenway_park",
name: "Fenway Park",
city: "Boston",
state: "MA",
latitude: fenwayParkCoord.latitude,
longitude: fenwayParkCoord.longitude,
capacity: 37755,
sport: .mlb
)
let citizensBank = Stadium(
id: "stadium_mlb_citizens_bank",
name: "Citizens Bank Park",
city: "Philadelphia",
state: "PA",
latitude: citizensBankCoord.latitude,
longitude: citizensBankCoord.longitude,
capacity: 42792,
sport: .mlb
)
// Create teams
let yankees = Team(
id: "team_mlb_nyy",
name: "Yankees",
abbreviation: "NYY",
sport: .mlb,
city: "New York",
stadiumId: "stadium_mlb_yankee_stadium"
)
let redsox = Team(
id: "team_mlb_bos",
name: "Red Sox",
abbreviation: "BOS",
sport: .mlb,
city: "Boston",
stadiumId: "stadium_mlb_fenway_park"
)
let phillies = Team(
id: "team_mlb_phi",
name: "Phillies",
abbreviation: "PHI",
sport: .mlb,
city: "Philadelphia",
stadiumId: "stadium_mlb_citizens_bank"
)
// Create games within a reasonable 6-day window (3 teams * 2 = 6 days)
// For the window algorithm to find valid windows, games must span at least 6 days
// Window check: windowEnd <= latestDay + 1, so with 6-day window from day 1,
// windowEnd = day 7, so latestDay must be >= day 6
// Day 1: Yankees home
// Day 3: Red Sox home
// Day 6: Phillies home (spans 6 days, window fits)
let calendar = Calendar.current
let day1 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 1))!
let day3 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 3))!
let day6 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 6))!
let yankeesGame = Game(
id: "game_mlb_2026_nyy_opp_0401",
homeTeamId: "team_mlb_nyy",
awayTeamId: "team_mlb_opp",
stadiumId: "stadium_mlb_yankee_stadium",
dateTime: day1,
sport: .mlb,
season: "2026",
isPlayoff: false
)
let redsoxGame = Game(
id: "game_mlb_2026_bos_opp_0403",
homeTeamId: "team_mlb_bos",
awayTeamId: "team_mlb_opp",
stadiumId: "stadium_mlb_fenway_park",
dateTime: day3,
sport: .mlb,
season: "2026",
isPlayoff: false
)
let philliesGame = Game(
id: "game_mlb_2026_phi_opp_0405",
homeTeamId: "team_mlb_phi",
awayTeamId: "team_mlb_opp",
stadiumId: "stadium_mlb_citizens_bank",
dateTime: day6,
sport: .mlb,
season: "2026",
isPlayoff: false
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["team_mlb_nyy", "team_mlb_bos", "team_mlb_phi"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, redsoxGame, philliesGame],
teams: [
"team_mlb_nyy": yankees,
"team_mlb_bos": redsox,
"team_mlb_phi": phillies
],
stadiums: [
"stadium_mlb_yankee_stadium": yankeeStadium,
"stadium_mlb_fenway_park": fenwayPark,
"stadium_mlb_citizens_bank": citizensBank
]
)
let result = planner.plan(request: request)
guard case .success(let options) = result else {
Issue.record("Expected success with 3 MLB teams in drivable region")
return
}
// Verify we get routes (may be less than 10 due to limited game combinations)
#expect(!options.isEmpty, "Should return at least one route")
#expect(options.count <= 10, "Should return at most 10 routes")
}
@Test("Integration: each route visits all 3 stadiums")
func integration_eachRouteVisitsAll3Stadiums() {
let baseDate = Date()
let yankeeStadium = makeStadium(
id: "yankee-stadium",
name: "Yankee Stadium",
city: "New York",
coordinate: yankeeStadiumCoord
)
let fenwayPark = makeStadium(
id: "fenway-park",
name: "Fenway Park",
city: "Boston",
coordinate: fenwayParkCoord
)
let citizensBank = makeStadium(
id: "citizens-bank",
name: "Citizens Bank Park",
city: "Philadelphia",
coordinate: citizensBankCoord
)
// Create multiple games per team to ensure routes can be found
var games: [Game] = []
// Yankees games
for dayOffset in [1, 7, 14] {
games.append(Game(
id: "yankees-\(dayOffset)",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "yankee-stadium",
dateTime: baseDate.addingTimeInterval(Double(86400 * dayOffset)),
sport: .mlb,
season: "2026",
isPlayoff: false
))
}
// Red Sox games
for dayOffset in [2, 8, 15] {
games.append(Game(
id: "redsox-\(dayOffset)",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "fenway-park",
dateTime: baseDate.addingTimeInterval(Double(86400 * dayOffset)),
sport: .mlb,
season: "2026",
isPlayoff: false
))
}
// Phillies games
for dayOffset in [3, 9, 16] {
games.append(Game(
id: "phillies-\(dayOffset)",
homeTeamId: "phillies",
awayTeamId: "opponent",
stadiumId: "citizens-bank",
dateTime: baseDate.addingTimeInterval(Double(86400 * dayOffset)),
sport: .mlb,
season: "2026",
isPlayoff: false
))
}
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox", "phillies"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: games,
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox"),
"phillies": makeTeam(id: "phillies", name: "Phillies")
],
stadiums: [
"yankee-stadium": yankeeStadium,
"fenway-park": fenwayPark,
"citizens-bank": citizensBank
]
)
let result = planner.plan(request: request)
guard case .success(let options) = result else {
Issue.record("Expected success")
return
}
// Verify each route visits all 3 stadiums
for option in options {
let citiesVisited = Set(option.stops.map { $0.city })
#expect(citiesVisited.contains("New York"), "Every route must visit New York")
#expect(citiesVisited.contains("Boston"), "Every route must visit Boston")
#expect(citiesVisited.contains("Philadelphia"), "Every route must visit Philadelphia")
}
}
@Test("Integration: total duration within 6 days (teams x 2)")
func integration_totalDurationWithinLimit() {
let baseDate = Date()
let yankeeStadium = makeStadium(
id: "yankee-stadium",
name: "Yankee Stadium",
city: "New York",
coordinate: yankeeStadiumCoord
)
let fenwayPark = makeStadium(
id: "fenway-park",
name: "Fenway Park",
city: "Boston",
coordinate: fenwayParkCoord
)
let citizensBank = makeStadium(
id: "citizens-bank",
name: "Citizens Bank Park",
city: "Philadelphia",
coordinate: citizensBankCoord
)
// Create games that fit within a 6-day window
// For 3 teams, window = 6 days. Games must span at least 6 days.
let calendar = Calendar.current
let day1 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 1))!
let day3 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 3))!
let day6 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 6))!
let yankeesGame = Game(
id: "yankees-home",
homeTeamId: "yankees",
awayTeamId: "opponent",
stadiumId: "yankee-stadium",
dateTime: day1,
sport: .mlb,
season: "2026",
isPlayoff: false
)
let redsoxGame = Game(
id: "redsox-home",
homeTeamId: "redsox",
awayTeamId: "opponent",
stadiumId: "fenway-park",
dateTime: day3,
sport: .mlb,
season: "2026",
isPlayoff: false
)
let philliesGame = Game(
id: "phillies-home",
homeTeamId: "phillies",
awayTeamId: "opponent",
stadiumId: "citizens-bank",
dateTime: day6,
sport: .mlb,
season: "2026",
isPlayoff: false
)
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox", "phillies"]
)
// 3 teams * 2 = 6 days max window
let maxDays = prefs.teamFirstMaxDays
#expect(maxDays == 6, "Window should be 6 days for 3 teams")
let request = PlanningRequest(
preferences: prefs,
availableGames: [yankeesGame, redsoxGame, philliesGame],
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox"),
"phillies": makeTeam(id: "phillies", name: "Phillies")
],
stadiums: [
"yankee-stadium": yankeeStadium,
"fenway-park": fenwayPark,
"citizens-bank": citizensBank
]
)
let result = planner.plan(request: request)
guard case .success(let options) = result else {
Issue.record("Expected success")
return
}
// Verify each route's duration is reasonable
for option in options {
guard let firstStop = option.stops.first,
let lastStop = option.stops.last else {
continue
}
let calendar = Calendar.current
let tripDays = calendar.dateComponents(
[.day],
from: calendar.startOfDay(for: firstStop.arrivalDate),
to: calendar.startOfDay(for: lastStop.departureDate)
).day ?? 0
// Trip duration should be within the window (allowing +1 for same-day start/end)
#expect(tripDays <= maxDays, "Trip duration (\(tripDays) days) should not exceed window (\(maxDays) days)")
}
}
@Test("Integration: factory selects ScenarioEPlanner for teamFirst mode")
func integration_factorySelectsScenarioEPlanner() {
let baseDate = Date()
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["team1", "team2"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: [],
teams: [:],
stadiums: [:]
)
let scenario = ScenarioPlannerFactory.classify(request)
#expect(scenario == .scenarioE, "Should classify as scenarioE for teamFirst mode with 2+ teams")
let planner = ScenarioPlannerFactory.planner(for: request)
#expect(planner is ScenarioEPlanner, "Should return ScenarioEPlanner for teamFirst mode")
}
@Test("Integration: factory requires 2+ teams for ScenarioE")
func integration_factoryRequires2TeamsForScenarioE() {
let baseDate = Date()
// With only 1 team, should NOT select ScenarioE
var prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["team1"] // Only 1 team
)
var request = PlanningRequest(
preferences: prefs,
availableGames: [],
teams: [:],
stadiums: [:]
)
var scenario = ScenarioPlannerFactory.classify(request)
#expect(scenario != .scenarioE, "Should NOT classify as scenarioE with only 1 team")
// With 0 teams, should also NOT select ScenarioE
prefs.selectedTeamIds = []
request = PlanningRequest(
preferences: prefs,
availableGames: [],
teams: [:],
stadiums: [:]
)
scenario = ScenarioPlannerFactory.classify(request)
#expect(scenario != .scenarioE, "Should NOT classify as scenarioE with 0 teams")
}
@Test("Integration: realistic east coast trip with 4 teams")
func integration_realisticEastCoastTrip() {
let baseDate = Date()
// East coast stadiums (NYC, Boston, Philly, Baltimore)
let yankeeStadium = makeStadium(
id: "yankee-stadium",
name: "Yankee Stadium",
city: "New York",
coordinate: CLLocationCoordinate2D(latitude: 40.8296, longitude: -73.9262)
)
let fenwayPark = makeStadium(
id: "fenway-park",
name: "Fenway Park",
city: "Boston",
coordinate: CLLocationCoordinate2D(latitude: 42.3467, longitude: -71.0972)
)
let citizensBank = makeStadium(
id: "citizens-bank",
name: "Citizens Bank Park",
city: "Philadelphia",
coordinate: CLLocationCoordinate2D(latitude: 39.9061, longitude: -75.1665)
)
let camdenYards = makeStadium(
id: "camden-yards",
name: "Camden Yards",
city: "Baltimore",
coordinate: CLLocationCoordinate2D(latitude: 39.2838, longitude: -76.6215)
)
// Create games spread across 8-day window (4 teams * 2 = 8 days)
// For 4 teams, window = 8 days. Games must span at least 8 days.
let calendar = Calendar.current
let day1 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 1))!
let day3 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 3))!
let day5 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 5))!
let day8 = calendar.date(bySettingHour: 19, minute: 0, second: 0, of: baseDate.addingTimeInterval(86400 * 8))!
let games = [
Game(id: "yankees-1", homeTeamId: "yankees", awayTeamId: "opp", stadiumId: "yankee-stadium",
dateTime: day1, sport: .mlb, season: "2026", isPlayoff: false),
Game(id: "redsox-1", homeTeamId: "redsox", awayTeamId: "opp", stadiumId: "fenway-park",
dateTime: day3, sport: .mlb, season: "2026", isPlayoff: false),
Game(id: "phillies-1", homeTeamId: "phillies", awayTeamId: "opp", stadiumId: "citizens-bank",
dateTime: day5, sport: .mlb, season: "2026", isPlayoff: false),
Game(id: "orioles-1", homeTeamId: "orioles", awayTeamId: "opp", stadiumId: "camden-yards",
dateTime: day8, sport: .mlb, season: "2026", isPlayoff: false)
]
let prefs = TripPreferences(
planningMode: .teamFirst,
sports: [.mlb],
startDate: baseDate,
endDate: baseDate.addingTimeInterval(86400 * 30),
leisureLevel: .moderate,
lodgingType: .hotel,
numberOfDrivers: 2,
selectedTeamIds: ["yankees", "redsox", "phillies", "orioles"]
)
let request = PlanningRequest(
preferences: prefs,
availableGames: games,
teams: [
"yankees": makeTeam(id: "yankees", name: "Yankees"),
"redsox": makeTeam(id: "redsox", name: "Red Sox"),
"phillies": makeTeam(id: "phillies", name: "Phillies"),
"orioles": makeTeam(id: "orioles", name: "Orioles")
],
stadiums: [
"yankee-stadium": yankeeStadium,
"fenway-park": fenwayPark,
"citizens-bank": citizensBank,
"camden-yards": camdenYards
]
)
let result = planner.plan(request: request)
guard case .success(let options) = result else {
Issue.record("Expected success with drivable east coast trip")
return
}
#expect(!options.isEmpty, "Should find routes for east coast trip")
// Verify each route visits all 4 cities
for option in options {
let citiesVisited = Set(option.stops.map { $0.city })
#expect(citiesVisited.count >= 4, "Each route should visit at least 4 cities (one per team)")
}
}
// MARK: - Helper Methods
private func makeStadium(
id: String,
name: String,
city: String,
coordinate: CLLocationCoordinate2D
) -> Stadium {
Stadium(
id: id,
name: name,
city: city,
state: "XX",
latitude: coordinate.latitude,
longitude: coordinate.longitude,
capacity: 40000,
sport: .mlb
)
}
private func makeTeam(id: String, name: String) -> Team {
Team(
id: id,
name: name,
abbreviation: String(name.prefix(3)).uppercased(),
sport: .mlb,
city: "Test City",
stadiumId: id
)
}
}

View File

@@ -14,3 +14,7 @@
- When there are no save trips Home Screen looks empty
- Trip detail when searching and trip detail after saving do not match
- when selecting by games need to be able to click anywhere on the row to select
- nwsl has names twice (houston houston dash)
- by games routing scenario needs to have a date selection. and when the user selects a particular game it should move the calendar and highlight that day with a 7 day span and the game being in the middle, spot #4. if the user selects multiple games the calendar selection should range from soonest to further away
- schedule is not showing all games - there are more baseketball games tonight than its showing, is it a data issue or client issue? nba san antonio @ houston is tonight and isnt showing

View File

@@ -0,0 +1,444 @@
# Customer Feedback Implementation Plan
**Created:** January 20, 2026
**Updated:** January 20, 2026
**Status:** In Progress (3 of 4 implemented)
**Priority:** High (User-reported issues)
---
## Executive Summary
This document outlines the implementation plan for four customer-reported issues:
| # | Issue | Type | Priority | Status |
|---|-------|------|----------|--------|
| 1 | Game row selection area too small | UX Bug | P1 | ✅ **DONE** |
| 2 | NWSL team names duplicated | Data Bug | P1 | ⏸️ Deferred (per user request) |
| 3 | "By Games" mode needs date selection UI | Feature | P2 | ✅ **DONE** |
| 4 | Schedule view missing NBA games | Investigation | P1 | ✅ **DONE** (debugging added) |
---
## Phase 1: Critical Bug Fixes (P1)
### 1.1 Game Row Selection Area
**Problem:** Users report they cannot click anywhere on the row to select a game.
**Investigation Findings:**
- File: `Features/Trip/Views/Wizard/Steps/GamePickerStep.swift`
- Lines 397-512: `GamesPickerSheet`
- Current implementation wraps content in `Button` with `.buttonStyle(.plain)`
- The entire row SHOULD be tappable
**Root Cause Hypothesis:**
The issue may be in a DIFFERENT view than `GamesPickerSheet`. Need to verify which screen users are referring to. Possible locations:
1. `ScheduleListView.swift` - GameRowView is display-only (no selection)
2. Game selection in other wizard steps
3. Possibly hitting the checkbox icon area feels required
**Tasks:**
- [ ] **Task 1.1.1:** Reproduce the issue - identify EXACT screen where selection fails
- [ ] **Task 1.1.2:** If `GamesPickerSheet` - Add `.contentShape(Rectangle())` to ensure full row hit testing
- [ ] **Task 1.1.3:** If `ScheduleListView` - This is intentionally display-only; clarify requirements
- [ ] **Task 1.1.4:** Add visual tap feedback (highlight on press) for better UX
- [ ] **Task 1.1.5:** Write regression test for row selection
**Proposed Fix (if GamesPickerSheet):**
```swift
Button {
// toggle selection
} label: {
HStack {
// ... content
}
.padding(.vertical, Theme.Spacing.xs)
.contentShape(Rectangle()) // ADD THIS
}
.buttonStyle(.plain)
```
**Files to Modify:**
- `SportsTime/Features/Trip/Views/Wizard/Steps/GamePickerStep.swift`
**Estimated Effort:** 1-2 hours (including investigation)
---
### 1.2 NWSL Team Name Duplication
**Problem:** NWSL teams display as "Houston Houston Dash" instead of "Houston Dash".
**Investigation Findings:**
- File: `Core/Models/Domain/Team.swift:59`
- The `fullName` computed property: `city.isEmpty ? name : "\(city) \(name)"`
- NWSL teams in JSON have:
- `city: "Houston"`
- `name: "Houston Dash"` (already includes city)
- Result: "Houston" + " " + "Houston Dash" = "Houston Houston Dash"
**Affected Teams:**
| Team | city | name | Current fullName |
|------|------|------|------------------|
| Houston Dash | Houston | Houston Dash | Houston Houston Dash |
| Portland Thorns | Portland | Portland Thorns | Portland Portland Thorns |
| Seattle Reign | Seattle | Seattle Reign | Seattle Seattle Reign |
| Orlando Pride | Orlando | Orlando Pride | Orlando Orlando Pride |
| Kansas City Current | Kansas City | Kansas City Current | Kansas City Kansas City Current |
| North Carolina Courage | North Carolina | North Carolina Courage | North Carolina North Carolina Courage |
| + other NWSL teams... | | | |
**Solution Options:**
**Option A: Fix Data (Recommended)**
Update `teams_canonical.json` and regenerate from scripts:
```json
// BEFORE
{ "city": "Houston", "name": "Houston Dash" }
// AFTER
{ "city": "Houston", "name": "Dash" }
```
Pros:
- Clean separation of city and nickname
- Consistent with MLB/NFL/NHL conventions
- `fullName` logic remains simple
Cons:
- Requires data regeneration
- Need to verify all NWSL teams
- Must update CloudKit canonical data
**Option B: Fix Code**
Detect city prefix in `fullName` and avoid duplication:
```swift
var fullName: String {
if city.isEmpty { return name }
if name.hasPrefix(city) { return name } // ADD THIS
return "\(city) \(name)"
}
```
Pros:
- No data migration needed
- Handles edge cases automatically
Cons:
- Hides data inconsistency
- Fragile (what about "New York City FC" vs "NYC"?)
- Adds runtime overhead
**Selected Approach: Option A (Data Fix)**
**Tasks:**
- [ ] **Task 1.2.1:** Update `Scripts/output/teams_nwsl.json` - change `name` to nickname only
- [ ] **Task 1.2.2:** Run data pipeline to regenerate `teams_canonical.json`
- [ ] **Task 1.2.3:** Verify all NWSL teams display correctly
- [ ] **Task 1.2.4:** Update CloudKit canonical data via admin tool
- [ ] **Task 1.2.5:** Write unit test to prevent regression: `test_NWSLTeamNames_DoNotDuplicateCity`
**Files to Modify:**
- `Scripts/output/teams_nwsl.json`
- `SportsTime/Resources/teams_canonical.json` (regenerated)
- CloudKit `CanonicalTeam` records
**Estimated Effort:** 2-3 hours
---
### 1.3 Schedule View Missing Games Investigation
**Problem:** User reports NBA San Antonio @ Houston game (tonight) not showing in schedule.
**Investigation Findings:**
**Date Filter Analysis (TODAY = Jan 20, 2026):**
- Default range: `Date()` to `Date() + 14 days`
- If "tonight" means Jan 20, the game SHOULD show
- Database contains: `game_nba_2025_20260120_sas_hou` dated 2026-01-20 06:00 UTC
**Potential Causes:**
| Cause | Likelihood | How to Verify |
|-------|------------|---------------|
| Timezone mismatch | High | Game stored as UTC, filter uses local time |
| Data not loaded | Medium | Check `AppDataProvider.shared.allGames(for: [.nba])` count |
| Sport filter unchecked | Medium | Verify NBA is in `selectedSports` |
| Team/Stadium lookup fail | Low | Check `teamsById["team_nba_sas"]` exists |
**Most Likely Issue: Timezone Bug**
Game time: `2026-01-20T06:00:00Z` (6 AM UTC = midnight CST on Jan 20)
If user's device is in CST (UTC-6) and filter is:
```swift
game.dateTime >= Date() // Using local time
```
At 7 PM CST on Jan 19, `Date()` = `2026-01-20T01:00:00Z`
Game is at `2026-01-20T06:00:00Z`
→ Game IS >= Date() → Should show
But if filter comparison isn't timezone-aware, edge cases occur.
**Tasks:**
- [ ] **Task 1.3.1:** Add debug logging to `filterGames()` - log date range and game count
- [ ] **Task 1.3.2:** Verify timezone handling in `CanonicalGame.dateTime` vs `Date()`
- [ ] **Task 1.3.3:** Check if games are loading - add game count to Schedule header temporarily
- [ ] **Task 1.3.4:** Verify NBA games exist in local SwiftData store
- [ ] **Task 1.3.5:** If data issue - check `CanonicalSyncService` last sync status
- [ ] **Task 1.3.6:** If client issue - fix timezone/filter logic
- [ ] **Task 1.3.7:** Add diagnostic screen in Settings showing:
- Total games per sport in database
- Last sync timestamp
- Date range being used for filtering
**Files to Investigate:**
- `SportsTime/Features/Schedule/ViewModels/ScheduleViewModel.swift`
- `SportsTime/Core/Services/DataProvider.swift`
- `SportsTime/Core/Services/CanonicalSyncService.swift`
**Estimated Effort:** 2-4 hours (investigation + fix)
---
## Phase 2: Feature Enhancement (P2)
### 2.1 "By Games" Mode - Date Selection with Auto-Scroll
**Problem:** When selecting games in "By Games" routing mode, users want:
1. A visible date selection calendar
2. When selecting a game, calendar auto-scrolls to that game's date
3. Selected game should be in the middle of a 7-day span (position 4 of 7)
4. Multiple games should show range from earliest to latest
**Current Behavior:**
- No calendar shown in `gameFirst` mode
- Dates are auto-derived AFTER user finishes selecting games
- Date range calculated: `min(gameDates)` to `max(gameDates) + 1 day`
**Proposed UX:**
```
┌─────────────────────────────────────────────────┐
│ BY GAMES MODE - STEP 2 │
├─────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────┐ │
│ │ JANUARY 2026 │ │
│ │ Su Mo Tu We Th Fr Sa │ │
│ │ 1 2 3 4 │ │
│ │ 5 6 7 8 9 10 11 │ │
│ │ 12 13 14 15 16 17 18 │ │
│ │ 19 20 [21][22][23][24][25] │ │ ← 7-day span
│ │ 26 27 28 29 30 31 │ │ centered on
│ └─────────────────────────────────────────┘ │ selected game
│ │
│ SELECTED GAMES (2): │
│ ┌─────────────────────────────────────────┐ │
│ │ 🏀 SAS @ HOU • Jan 22 • 7:00 PM ✕│ │ ← Game #1 (anchor)
│ │ ⚾ STL @ HOU • Jan 25 • 1:10 PM ✕│ │ ← Game #2
│ └─────────────────────────────────────────────┘ │
│ │
│ [+ Add More Games] │
│ │
└─────────────────────────────────────────────────┘
```
**Behavior Rules:**
1. **Single Game Selected:**
- Calendar centers on game date (position 4 of 7)
- Date range: `gameDate - 3 days` to `gameDate + 3 days`
- Visual: 7 consecutive days highlighted
2. **Multiple Games Selected:**
- Date range: `min(gameDates) - 1 day` to `max(gameDates) + 1 day`
- All game dates marked with sport-colored dots
- Calendar shows earliest month containing games
3. **User Manually Adjusts Range:**
- Allow expanding beyond auto-calculated range
- Prevent shrinking below game dates (show error)
**Implementation Tasks:**
- [ ] **Task 2.1.1:** Add `DateRangePicker` to `GamePickerStep.swift` (below game summary)
- [ ] **Task 2.1.2:** Create `@State var dateRange: ClosedRange<Date>` in ViewModel
- [ ] **Task 2.1.3:** Implement `updateDateRangeForSelectedGames()`:
```swift
func updateDateRangeForSelectedGames() {
guard !selectedGameIds.isEmpty else {
dateRange = nil // Clear range when no games
return
}
let gameDates = selectedGames.map { $0.dateTime }.sorted()
if gameDates.count == 1 {
// Single game: 7-day span centered on game
let gameDate = gameDates[0]
let start = Calendar.current.date(byAdding: .day, value: -3, to: gameDate)!
let end = Calendar.current.date(byAdding: .day, value: 3, to: gameDate)!
dateRange = start...end
} else {
// Multiple games: span from first to last with 1-day buffer
let start = Calendar.current.date(byAdding: .day, value: -1, to: gameDates.first!)!
let end = Calendar.current.date(byAdding: .day, value: 1, to: gameDates.last!)!
dateRange = start...end
}
}
```
- [ ] **Task 2.1.4:** Modify `DateRangePicker` to accept game dates for marking:
```swift
var gameDates: Set<Date> // Dates with selected games
```
- [ ] **Task 2.1.5:** Add game date markers to calendar (sport-colored dots)
- [ ] **Task 2.1.6:** Implement calendar auto-scroll to earliest game month
- [ ] **Task 2.1.7:** Add `.onChange(of: selectedGameIds)` to trigger `updateDateRangeForSelectedGames()`
- [ ] **Task 2.1.8:** Update `TripWizardView.planTrip()` to use the visible date range
- [ ] **Task 2.1.9:** Write UI tests for auto-scroll behavior
**Files to Modify:**
- `SportsTime/Features/Trip/Views/Wizard/Steps/GamePickerStep.swift`
- `SportsTime/Features/Trip/Views/Wizard/Steps/DateRangePicker.swift`
- `SportsTime/Features/Trip/ViewModels/TripWizardViewModel.swift`
**Estimated Effort:** 6-8 hours
---
## Implementation Schedule
### Week 1: Bug Fixes
| Day | Task | Owner |
|-----|------|-------|
| Day 1 | Task 1.3.1-1.3.4: Investigate schedule missing games | TBD |
| Day 1 | Task 1.2.1-1.2.2: Fix NWSL team data | TBD |
| Day 2 | Task 1.3.5-1.3.7: Complete schedule fix | TBD |
| Day 2 | Task 1.2.3-1.2.5: Verify NWSL fix + tests | TBD |
| Day 3 | Task 1.1.1-1.1.5: Fix row selection | TBD |
| Day 3 | Integration testing all Phase 1 fixes | TBD |
### Week 2: Feature Enhancement
| Day | Task | Owner |
|-----|------|-------|
| Day 4-5 | Tasks 2.1.1-2.1.4: Date picker integration | TBD |
| Day 6 | Tasks 2.1.5-2.1.7: Calendar markers + auto-scroll | TBD |
| Day 7 | Tasks 2.1.8-2.1.9: Integration + UI tests | TBD |
---
## Testing Checklist
### Phase 1 Tests
- [ ] **1.1:** Tap anywhere on game row → row selected
- [ ] **1.1:** Selected row shows checkmark, deselected shows empty circle
- [ ] **1.2:** Houston Dash displays as "Houston Dash" (not "Houston Houston Dash")
- [ ] **1.2:** All NWSL teams display correctly
- [ ] **1.3:** NBA games for today show in Schedule view
- [ ] **1.3:** Games from different timezones show on correct local date
### Phase 2 Tests
- [ ] **2.1:** Select single game → calendar shows 7-day range centered on game
- [ ] **2.1:** Select multiple games → calendar spans earliest to latest
- [ ] **2.1:** Game dates have colored markers on calendar
- [ ] **2.1:** Selecting game auto-scrolls calendar to that month
- [ ] **2.1:** Cannot shrink date range to exclude selected games
- [ ] **2.1:** Trip planning uses visible date range
---
## Risk Assessment
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| NWSL data fix breaks team lookups | Low | High | Test with full data load before deploy |
| Schedule timezone fix affects other sports | Medium | Medium | Test with games from multiple timezones |
| Date picker state conflicts with game selection | Low | Medium | Clear state isolation between components |
| CloudKit sync overwrites data fix | Medium | High | Update CloudKit admin data simultaneously |
---
## Success Criteria
1. **Zero** instances of duplicated team names
2. **All** today's NBA games visible in Schedule (when NBA filter enabled)
3. **100%** row area tappable for game selection
4. **Automatic** calendar updates when games selected in "By Games" mode
---
## Implementation Notes (January 20, 2026)
### Issue 1: Row Selection - COMPLETED
**Fix Applied:** Added `.contentShape(Rectangle())` to all picker sheet rows in `GamePickerStep.swift`:
- `GamesPickerSheet` (line 468)
- `SportsPickerSheet` (line 268)
- `TeamsPickerSheet` (line 366)
This ensures the entire row area is tappable, not just the visible content.
### Issue 2: NWSL Names - DEFERRED
Per user request, this fix is deferred for later investigation.
### Issue 3: By Games Date Selection - COMPLETED
**Implementation:**
- Added `startDate` and `endDate` bindings to `GamePickerStep`
- Added `dateRangeSection` view that appears when games are selected
- Added `updateDateRangeForSelectedGames()` function with auto-scroll logic:
- **Single game:** 7-day span centered on game (position 4 of 7)
- **Multiple games:** Range from earliest to latest with 1-day buffer
- Added game date markers legend showing selected games by date
- Updated `TripWizardView` to pass date bindings
**Files Modified:**
- `GamePickerStep.swift` (new date bindings, date section, auto-update logic)
- `TripWizardView.swift` (pass date bindings to GamePickerStep)
### Issue 4: Missing NBA Games - DEBUGGING ADDED
**Implementation:**
- Added `os.log` logging to `ScheduleViewModel.loadGames()` with:
- Query date range
- Sports filter
- Teams/stadiums loaded count
- Games returned count by sport
- Added `ScheduleDiagnostics` struct to capture query details
- Added `ScheduleDiagnosticsSheet` accessible via toolbar menu in Schedule view
- Shows: date range used, sports queried, data loaded counts, games by sport, troubleshooting steps
**Files Modified:**
- `ScheduleViewModel.swift` (logger, diagnostics tracking)
- `ScheduleListView.swift` (diagnostics menu item and sheet)
**How to Use:**
1. Open Schedule tab
2. Tap filter menu (three lines icon)
3. Tap "Diagnostics"
4. Review date range, loaded data counts, and games by sport
---
## Appendix: File Reference
| File | Purpose |
|------|---------|
| `Features/Trip/Views/Wizard/Steps/GamePickerStep.swift` | Game selection UI |
| `Features/Trip/Views/Wizard/Steps/DateRangePicker.swift` | Calendar component |
| `Features/Trip/ViewModels/TripWizardViewModel.swift` | Trip wizard state |
| `Features/Schedule/Views/ScheduleListView.swift` | Schedule display |
| `Features/Schedule/ViewModels/ScheduleViewModel.swift` | Schedule filtering |
| `Core/Models/Domain/Team.swift` | Team model with `fullName` |
| `Core/Services/DataProvider.swift` | Data loading and filtering |
| `Scripts/output/teams_nwsl.json` | NWSL team source data |
| `Resources/teams_canonical.json` | Bundled team data |

283
docs/stadiumGuide.md Normal file
View File

@@ -0,0 +1,283 @@
# Team-First Planning Mode Implementation Plan
## Overview
**Feature:** Select teams (not dates), find optimal trip windows across the entire season.
**Use Case:** User has 3 stadiums left on bucket list. Select those 3 teams, app finds all windows throughout the season where all 3 teams play at home within a reasonable trip duration.
## Requirements
| Constraint | Value |
|------------|-------|
| Sport scope | Single sport per search |
| Start city | Flexible (algorithm picks optimal) |
| Max duration | `teams × 2` days (3 teams = 6 days) |
| Drive limit | Existing settings value |
| Optimize for | Shortest duration + minimal backtracking |
| Results | Top 10 options with reasoning |
| Season | Current season only |
---
## Architecture
### Current Flow (Date-First)
```
User picks dates → Fetch games in range → Find routes → Rank → Display
```
### New Flow (Team-First)
```
User picks teams → Fetch ALL home games for those teams (full season)
→ Generate sliding windows (N-day chunks)
→ Filter to windows where all teams have ≥1 home game
→ For each valid window: find optimal routes
→ Rank by duration + miles → Return top 10
```
### Reuse Ratio: ~90%
| Component | Status | Notes |
|-----------|--------|-------|
| `allGames(for sports:)` | ✅ Exists | Full season fetch |
| Sliding window generator | ✅ Exists | In ScenarioBPlanner |
| GameDAGRouter | ✅ Exists | Supports anchor constraints |
| ItineraryBuilder | ✅ Exists | No changes needed |
| Max driving time | ✅ Exists | `UserPreferences.maxDrivingHoursPerDriver` |
| Multi-team selection UI | ❌ New | Currently single-team only |
| Team-first planner | ❌ New | ~300 lines adapting Scenario B |
| Window optimization | ⚠️ Adapt | Season = 180+ days → need sampling |
---
## Parallel Track Structure
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 1 (Parallel) │
├──────────────────────┬──────────────────────┬──────────────────────────────┤
│ Track A: Models │ Track B: Planner │ Track C: UI Components │
│ (Agent 1) │ (Agent 2) │ (Agent 3) │
├──────────────────────┼──────────────────────┼──────────────────────────────┤
│ A1. Add teamFirst │ B1. Create │ C1. Create TeamPickerView │
│ to PlanningMode │ ScenarioEPlanner │ (multi-select grid) │
│ │ skeleton │ │
│ A2. Add │ B2. Implement │ C2. Create │
│ selectedTeamIds │ window generator │ TeamFirstWizardStep │
│ to TripPrefs │ for teams │ │
│ │ │ │
│ A3. Update │ B3. Implement route │ C3. Update WizardViewModel │
│ PlanningRequest │ finding with │ for teamFirst mode │
│ computed props │ team anchors │ │
└──────────────────────┴──────────────────────┴──────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 2 (Parallel) │
├─────────────────────────────────┬───────────────────────────────────────────┤
│ Track D: Integration │ Track E: Tests │
│ (Agent 4) │ (Agent 5) │
├─────────────────────────────────┼───────────────────────────────────────────┤
│ D1. Update ScenarioPlannerFactory│ E1. Unit tests for window generator │
│ to detect teamFirst mode │ │
│ │ E2. Unit tests for ScenarioEPlanner │
│ D2. Wire TeamPickerView into │ │
│ existing wizard flow │ E3. Integration test: 3 teams → routes │
│ │ │
│ D3. Add teamFirst option to │ E4. Edge case tests (no windows, etc.) │
│ planning mode selector │ │
└─────────────────────────────────┴───────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ PHASE 3 (Sequential) │
├─────────────────────────────────────────────────────────────────────────────┤
│ F1. End-to-end testing & bug fixes │
│ F2. Performance optimization (if needed) │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Detailed Task Specifications
### Track A: Data Models (Agent 1)
**A1. Add `teamFirst` to PlanningMode**
```
File: SportsTime/Core/Models/Domain/TripPreferences.swift
Action: Add case to PlanningMode enum
```
**A2. Add `selectedTeamIds` to TripPreferences**
```
File: SportsTime/Core/Models/Domain/TripPreferences.swift
Action: Add `var selectedTeamIds: Set<String> = []`
Action: Add `var teamFirstMaxDays: Int` computed as `selectedTeamIds.count * 2`
```
**A3. Update PlanningRequest computed properties**
```
File: SportsTime/Planning/Models/PlanningModels.swift
Action: Add computed property to extract teams for selected IDs
Action: Add helper to get all home games for selected teams
```
---
### Track B: Scenario Planner (Agent 2)
**B1. Create ScenarioEPlanner skeleton**
```
File: SportsTime/Planning/Scenarios/ScenarioEPlanner.swift (NEW)
Action: Create class conforming to ScenarioPlanner protocol
Action: Implement required methods with TODOs
```
**B2. Implement window generator for teams**
```
File: SportsTime/Planning/Scenarios/ScenarioEPlanner.swift
Action: Generate all N-day windows across season
Action: Filter to windows where ALL selected teams have ≥1 home game
Action: Cap at 50 windows (sample if more)
```
**B3. Implement route finding with team anchors**
```
File: SportsTime/Planning/Scenarios/ScenarioEPlanner.swift
Action: For each valid window, collect all home games for selected teams
Action: Pass to GameDAGRouter with those games as anchors
Action: Build itineraries, rank by duration + miles
Action: Return top 10
```
---
### Track C: UI Components (Agent 3)
**C1. Create TeamPickerView**
```
File: SportsTime/Features/Trip/Views/TeamPickerView.swift (NEW)
Action: Grid of team logos/names for selected sport
Action: Multi-select with checkmarks
Action: Show count badge "3 selected"
Action: Binding to Set<String> for team IDs
```
**C2. Create TeamFirstWizardStep**
```
File: SportsTime/Features/Trip/Views/Wizard/TeamFirstWizardStep.swift (NEW)
Action: Wrapper that uses TeamPickerView
Action: Validation: require ≥2 teams selected
Action: "Find trips for these teams" CTA
```
**C3. Update WizardViewModel for teamFirst mode**
```
File: SportsTime/Features/Trip/ViewModels/TripWizardViewModel.swift
Action: Add teamFirst step configuration
Action: Handle selectedTeamIds state
Action: Skip date selection step when in teamFirst mode
```
---
### Track D: Integration (Agent 4)
**D1. Update ScenarioPlannerFactory**
```
File: SportsTime/Planning/Scenarios/ScenarioPlannerFactory.swift
Action: Add detection for teamFirst mode
Action: Return ScenarioEPlanner when selectedTeamIds.count >= 2
```
**D2. Wire TeamPickerView into wizard**
```
File: SportsTime/Features/Trip/Views/Wizard/TripWizardView.swift
Action: Add case for teamFirst step
Action: Navigation flow: Sport → Teams → (skip dates) → Results
```
**D3. Add teamFirst to mode selector**
```
File: SportsTime/Features/Trip/Views/Wizard/PlanningModeSelector.swift
Action: Add "Teams First" option with icon
Action: Description: "Pick teams, we'll find the best windows"
```
---
### Track E: Tests (Agent 5)
**E1. Unit tests for window generator**
```
File: SportsTimeTests/Planning/ScenarioEPlannerTests.swift (NEW)
Tests:
- 3 teams, 6-day window → finds valid windows
- Window with only 2 of 3 teams → excluded
- Empty season → returns empty
- Sampling works when >50 windows
```
**E2. Unit tests for ScenarioEPlanner**
```
File: SportsTimeTests/Planning/ScenarioEPlannerTests.swift
Tests:
- Returns PlanningResult with routes
- All routes include all selected teams
- Routes sorted by duration ascending
- Respects max driving time constraint
```
**E3. Integration test**
```
File: SportsTimeTests/Planning/TeamFirstIntegrationTests.swift (NEW)
Tests:
- Full flow: 3 MLB teams → top 10 routes
- Each route visits all 3 stadiums
- Total duration ≤ 6 days
```
**E4. Edge case tests**
```
File: SportsTimeTests/Planning/ScenarioEPlannerTests.swift
Tests:
- Teams with no overlapping games → graceful error
- Single team selected → validation error
- Teams in same city → treated as separate stops
```
---
## Risks & Mitigations
| Risk | Mitigation |
|------|------------|
| Season = 180+ days = too many windows | Sample every 2nd day, or limit to weekends |
| No valid windows (teams never align) | Early validation + user feedback |
| Slow computation (10+ windows × routing) | Cap at 50 windows, parallelize |
| Teams in same city (e.g., 2 NYC teams) | Treat as separate stops (different stadiums) |
---
## Test Strategy
1. **Unit:** Window generator finds correct multi-team windows
2. **Unit:** Anchor constraints include all teams' home games
3. **Integration:** 3 teams → returns routes visiting all 3
4. **Edge:** Teams with non-overlapping seasons → graceful "no trips found"
5. **Perf:** Full MLB season (2,430 games) completes in <5s
---
## Estimated Effort
| Track | New Code | Modified Code | Total |
|-------|----------|---------------|-------|
| A: Models | 50 lines | 20 lines | 70 lines |
| B: Planner | 300 lines | 0 lines | 300 lines |
| C: UI | 250 lines | 50 lines | 300 lines |
| D: Integration | 30 lines | 80 lines | 110 lines |
| E: Tests | 400 lines | 0 lines | 400 lines |
| **Total** | **1,030 lines** | **150 lines** | **1,180 lines** |

View File

@@ -0,0 +1,2 @@
# Get your Mapbox token at https://console.mapbox.com/account/access-tokens/
REMOTION_MAPBOX_TOKEN=pk.your-mapbox-access-token

16
marketing-videos/.gitignore vendored Normal file
View File

@@ -0,0 +1,16 @@
# Dependencies
node_modules/
# Build output
dist/
out/
# Remotion cache
.remotion/
# Environment
.env
.env.local
# OS files
.DS_Store

5280
marketing-videos/package-lock.json generated Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,29 @@
{
"name": "sportstime-marketing-videos",
"version": "1.0.0",
"description": "Marketing videos for SportsTime app",
"scripts": {
"start": "remotion studio",
"build": "remotion render",
"render:route": "remotion render TheRoute out/the-route.mp4 --gl=angle --concurrency=1",
"render:checklist": "remotion render TheChecklist out/the-checklist.mp4",
"render:bucketlist": "remotion render TheBucketList out/the-bucket-list.mp4",
"render:squad": "remotion render TheSquad out/the-squad.mp4",
"render:handoff": "remotion render TheHandoff out/the-handoff.mp4",
"render:all": "npm run render:route && npm run render:checklist && npm run render:bucketlist && npm run render:squad && npm run render:handoff"
},
"type": "module",
"dependencies": {
"@remotion/cli": "^4.0.410",
"@remotion/transitions": "^4.0.410",
"@turf/turf": "^7.3.2",
"@types/mapbox-gl": "^3.4.1",
"@types/react": "^19.2.9",
"@types/react-dom": "^19.2.3",
"mapbox-gl": "^3.18.1",
"react": "^19.2.4",
"react-dom": "^19.2.4",
"remotion": "^4.0.410",
"typescript": "^5.9.3"
}
}

View File

@@ -0,0 +1,4 @@
import { Config } from "@remotion/cli/config";
Config.setVideoImageFormat("jpeg");
Config.setOverwriteOutput(true);

Some files were not shown because too many files have changed in this diff Show More