# SportsTime Parser

A Python CLI tool for scraping sports schedules, normalizing data with canonical IDs, and uploading to CloudKit.

## Features

- Scrapes game schedules from multiple sources with automatic fallback
- Supports 7 major sports leagues: NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
- Generates deterministic canonical IDs for games, teams, and stadiums
- Produces validation reports with manual review lists
- Uploads to CloudKit with resumable, diff-based updates

## Requirements

- Python 3.11+
- CloudKit credentials (for upload functionality)

## Installation

```bash
# From the Scripts directory
cd Scripts

# Install in development mode
pip install -e ".[dev]"

# Or install dependencies only
pip install -r requirements.txt
```

## Quick Start

```bash
# Scrape NBA 2025-26 season
sportstime-parser scrape nba --season 2025

# Scrape all sports
sportstime-parser scrape all --season 2025

# Validate existing scraped data
sportstime-parser validate nba --season 2025

# Check status
sportstime-parser status

# Upload to CloudKit (development)
sportstime-parser upload nba --season 2025

# Upload to CloudKit (production)
sportstime-parser upload nba --season 2025 --environment production
```

## CLI Reference

### scrape

Scrape game schedules, teams, and stadiums from web sources.

```bash
sportstime-parser scrape <sport> [options]

Arguments:
  sport               Sport to scrape: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"

Options:
  --season, -s INT    Season start year (default: 2025)
  --dry-run           Parse and validate only; don't write output files
  --verbose, -v       Enable verbose output
```

**Examples:**

```bash
# Scrape NBA 2025-26 season
sportstime-parser scrape nba --season 2025

# Scrape all sports with verbose output
sportstime-parser scrape all --season 2025 --verbose

# Dry run to test without writing files
sportstime-parser scrape mlb --season 2026 --dry-run
```

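Multi-source fallback means each sport's scraper tries its sources in priority order and settles on the first one that yields games. A minimal sketch of that loop; the function and source names here are illustrative, not the package's actual API:

```python
# Illustrative multi-source fallback (not the package's actual API):
# try each source in priority order and return the first one that
# yields games, collecting failure reasons along the way.
def scrape_with_fallback(sources, fetchers):
    errors = {}
    for source in sources:
        try:
            games = fetchers[source]()
            if games:                      # an empty result counts as a miss
                return source, games
            errors[source] = "empty result"
        except Exception as exc:           # network failure, parse error, ...
            errors[source] = str(exc)
    raise RuntimeError(f"all sources failed: {errors}")


used, games = scrape_with_fallback(
    ["espn", "sports_reference"],
    {
        "espn": lambda: [],                # simulate a source with no data
        "sports_reference": lambda: [{"home": "okc", "away": "hou"}],
    },
)
```

Because a source that returns zero games is treated the same as one that raises, a silently broken endpoint still triggers the fallback.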
### validate

Run validation on existing scraped data and regenerate reports. Validation performs these checks:

1. **Game Coverage**: Compares scraped game count against expected totals per league (e.g., ~1,230 for NBA, ~2,430 for MLB)
2. **Team Resolution**: Identifies team names that couldn't be matched to canonical IDs using fuzzy matching
3. **Stadium Resolution**: Identifies venue names that couldn't be matched to canonical stadium IDs
4. **Duplicate Detection**: Finds games with the same home/away teams on the same date (potential doubleheader issues or data errors)
5. **Missing Data**: Flags games missing required fields (stadium_id, team IDs, valid dates)

The output is a Markdown report with:

- Summary statistics (total games, valid games, coverage percentage)
- Manual review items grouped by type (unresolved teams, unresolved stadiums, duplicates)
- Fuzzy match suggestions with confidence scores to help resolve unmatched names

```bash
sportstime-parser validate <sport> [options]

Arguments:
  sport               Sport to validate: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"

Options:
  --season, -s INT    Season start year (default: 2025)
```

**Examples:**

```bash
# Validate NBA data
sportstime-parser validate nba --season 2025

# Validate all sports
sportstime-parser validate all
```

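The coverage and duplicate checks above can be sketched roughly like this (illustrative code, not the package's implementation; the expected counts mirror `EXPECTED_GAME_COUNTS` in `config.py`):

```python
from collections import Counter

# Illustrative versions of the coverage and duplicate checks.
EXPECTED = {"nba": 1230, "mlb": 2430}  # expected regular-season game counts


def coverage(sport: str, games: list[dict]) -> float:
    """Percentage of the expected schedule actually scraped."""
    return 100.0 * len(games) / EXPECTED[sport]


def find_duplicates(games: list[dict]) -> list[tuple]:
    """Keys seen more than once. Doubleheaders differ by game_number,
    so a fully identical key usually means a data error."""
    def key(g):
        return (g["home"], g["away"], g["date"], g.get("game_number"))
    counts = Counter(key(g) for g in games)
    return [k for k, n in counts.items() if n > 1]


games = [
    {"home": "okc", "away": "hou", "date": "2025-10-21"},
    {"home": "okc", "away": "hou", "date": "2025-10-21"},  # duplicate entry
]
```

Including `game_number` in the key is what keeps legitimate doubleheaders out of the duplicates list.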
### upload

Upload scraped data to CloudKit with diff-based updates.

```bash
sportstime-parser upload <sport> [options]

Arguments:
  sport               Sport to upload: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"

Options:
  --season, -s INT    Season start year (default: 2025)
  --environment, -e   CloudKit environment: development or production (default: development)
  --resume            Resume interrupted upload from last checkpoint
```

**Examples:**

```bash
# Upload NBA to development
sportstime-parser upload nba --season 2025

# Upload to production
sportstime-parser upload nba --season 2025 --environment production

# Resume interrupted upload
sportstime-parser upload mlb --season 2026 --resume
```

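"Diff-based" means the uploader only sends what changed: local records are partitioned into creates, updates, and unchanged by comparing against what CloudKit already holds, keyed by canonical ID. A sketch (illustrative, not the package's actual uploader code):

```python
# Illustrative diff step: partition local records against remote state.
def diff_records(local: dict, remote: dict):
    creates = [rid for rid in local if rid not in remote]
    updates = [rid for rid in local if rid in remote and local[rid] != remote[rid]]
    unchanged = [rid for rid in local if rid in remote and local[rid] == remote[rid]]
    return creates, updates, unchanged


local = {
    "nba_2025_hou_okc_1021": {"start": "2025-10-21T19:30", "tv": "ESPN"},
    "nba_2025_gsw_lal_1022": {"start": "2025-10-22T19:00"},
}
remote = {
    "nba_2025_hou_okc_1021": {"start": "2025-10-21T19:00", "tv": "ESPN"},  # time changed
}
creates, updates, unchanged = diff_records(local, remote)
```

Only the `creates` and `updates` partitions are sent, which keeps re-runs cheap and effectively idempotent.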
### status

Show current scrape and upload status.

```bash
sportstime-parser status
```

### retry

Retry failed uploads from previous attempts.

```bash
sportstime-parser retry <sport> [options]

Arguments:
  sport               Sport to retry: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"

Options:
  --season, -s INT    Season start year (default: 2025)
  --environment, -e   CloudKit environment (default: development)
  --max-retries INT   Maximum retry attempts per record (default: 3)
```

### clear

Clear upload session state to start fresh.

```bash
sportstime-parser clear <sport> [options]

Arguments:
  sport               Sport to clear: nba, mlb, nfl, nhl, mls, wnba, nwsl, or "all"

Options:
  --season, -s INT    Season start year (default: 2025)
  --environment, -e   CloudKit environment (default: development)
```

## CloudKit Configuration

To upload data to CloudKit, you need to configure authentication credentials.

### 1. Get Credentials from Apple Developer Portal

1. Go to the [Apple Developer Portal](https://developer.apple.com)
2. Navigate to **Certificates, Identifiers & Profiles** > **Keys**
3. Create a new key with the **CloudKit** capability
4. Download the private key file (.p8)
5. Note the Key ID

### 2. Set Environment Variables

```bash
# Key ID from Apple Developer Portal
export CLOUDKIT_KEY_ID="your_key_id_here"

# Path to private key file
export CLOUDKIT_PRIVATE_KEY_PATH="/path/to/AuthKey_XXXXXX.p8"

# Or provide key content directly (useful for CI/CD)
export CLOUDKIT_PRIVATE_KEY="-----BEGIN EC PRIVATE KEY-----
...key content...
-----END EC PRIVATE KEY-----"
```

### 3. Verify Configuration

```bash
sportstime-parser status
```

The status output will show whether CloudKit is configured correctly.

## Output Files

Scraped data is saved to the `output/` directory:

```
output/
  games_nba_2025.json       # Game schedules
  teams_nba.json            # Team data
  stadiums_nba.json         # Stadium data
  validation_nba_2025.md    # Validation report
```

## Validation Reports

Validation reports are generated in Markdown format at `output/validation_{sport}_{season}.md`.

### Report Sections

**Summary Table**

| Metric | Description |
|--------|-------------|
| Total Games | Number of games scraped |
| Valid Games | Games with all required fields resolved |
| Coverage | Percentage of expected games found (based on league schedule) |
| Unresolved Teams | Team names that couldn't be matched |
| Unresolved Stadiums | Venue names that couldn't be matched |
| Duplicates | Potential duplicate game entries |

**Manual Review Items**

Items are grouped by type and include the raw value, source URL, and suggested fixes:

- **Unresolved Teams**: Team names not in the alias mapping. Add to `team_aliases.json` to resolve.
- **Unresolved Stadiums**: Venue names not recognized. Common for renamed arenas (naming rights changes). Add to `stadium_aliases.json`.
- **Duplicate Games**: Same matchup on same date. May indicate doubleheader parsing issues or duplicate entries from different sources.
- **Missing Data**: Games missing stadium coordinates or other required fields.

**Fuzzy Match Suggestions**

For each unresolved name, the validator provides the top fuzzy matches with confidence scores (0-100). High-confidence matches (>80) are likely correct; lower scores need manual verification.

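Scoring on the same 0-100 scale can be illustrated with the standard library's `SequenceMatcher`; the package itself uses Levenshtein-style matching, so this is only a sketch of the idea:

```python
from difflib import SequenceMatcher


# Illustrative suggestion scoring on the report's 0-100 scale.
def suggestions(raw: str, known: list[str], limit: int = 3):
    scored = [
        (name, round(100 * SequenceMatcher(None, raw.lower(), name.lower()).ratio()))
        for name in known
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


known = ["Paycom Center", "Toyota Center", "Crypto.com Arena"]
top = suggestions("Paycom Ctr", known)  # best match clears the 80 bar
```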
## Canonical IDs

Canonical IDs are stable, deterministic identifiers that enable cross-referencing between games, teams, and stadiums across different data sources.

### ID Formats

**Games**

```
{sport}_{season}_{away}_{home}_{MMDD}[_{game_number}]
```

Examples:

- `nba_2025_hou_okc_1021` - NBA 2025-26, Houston @ OKC, Oct 21
- `mlb_2026_nyy_bos_0401_1` - MLB 2026, Yankees @ Red Sox, Apr 1, Game 1 (doubleheader)

**Teams**

```
{sport}_{city}_{name}
```

Examples:

- `nba_la_lakers`
- `mlb_new_york_yankees`
- `nfl_new_york_giants`

**Stadiums**

```
{sport}_{normalized_name}
```

Examples:

- `mlb_yankee_stadium`
- `nba_crypto_com_arena`
- `nfl_sofi_stadium`

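Generation following these formats can be sketched like this (helper names are illustrative; the package's actual generator lives in `normalizers/canonical_id.py`). Lowercasing and collapsing non-alphanumerics to underscores is what makes the IDs deterministic:

```python
import re


# Illustrative helpers following the ID formats above.
def slug(text: str) -> str:
    """Lowercase and collapse non-alphanumerics to single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def game_id(sport, season, away, home, month, day, game_number=None):
    base = f"{sport}_{season}_{away}_{home}_{month:02d}{day:02d}"
    return f"{base}_{game_number}" if game_number else base


def team_id(sport, city, name):
    return f"{sport}_{slug(city)}_{slug(name)}"


def stadium_id(sport, name):
    return f"{sport}_{slug(name)}"
```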
### Generated vs Matched IDs

| Entity | Generated | Matched |
|--------|-----------|---------|
| **Teams** | Pre-defined in `team_resolver.py` mappings | Resolved from raw scraped names via aliases + fuzzy matching |
| **Stadiums** | Pre-defined in `stadium_resolver.py` mappings | Resolved from raw venue names via aliases + fuzzy matching |
| **Games** | Generated at scrape time from resolved team IDs + date | N/A (always generated, never matched) |

**Resolution Flow:**

```
Raw Name (from scraper)
  ↓
Exact Match (alias lookup in team_aliases.json / stadium_aliases.json)
  ↓ (if no match)
Fuzzy Match (Levenshtein distance against known names)
  ├─ (confidence > threshold) → Canonical ID assigned
  └─ (otherwise)              → Manual Review Item created
```

### Cross-References

Entities reference each other via canonical IDs:

```
┌───────────────────────────────────────────────────────┐
│ Game                                                  │
│ id: nba_2025_hou_okc_1021                             │
│ home_team_id: nba_oklahoma_city_thunder ──────────┐   │
│ away_team_id: nba_houston_rockets ────────────┐   │   │
│ stadium_id: nba_paycom_center ────────────┐   │   │   │
└───────────────────────────────────────────│───│───│───┘
                                            │   │   │
┌───────────────────────────────────────────│───│───│───┐
│ Stadium                                   │   │   │   │
│ id: nba_paycom_center ◄───────────────────┘   │   │   │
│ name: "Paycom Center"                         │   │   │
│ city: "Oklahoma City"                         │   │   │
│ latitude: 35.4634                             │   │   │
│ longitude: -97.5151                           │   │   │
└───────────────────────────────────────────────│───│───┘
                                                │   │
┌───────────────────────────────────────────────│───│───┐
│ Team                                          │   │   │
│ id: nba_houston_rockets ◄─────────────────────┘   │   │
│ name: "Rockets"                                   │   │
│ city: "Houston"                                   │   │
│ stadium_id: nba_toyota_center                     │   │
└───────────────────────────────────────────────────│───┘
                                                    │
┌───────────────────────────────────────────────────│───┐
│ Team                                              │   │
│ id: nba_oklahoma_city_thunder ◄───────────────────┘   │
│ name: "Thunder"                                       │
│ city: "Oklahoma City"                                 │
│ stadium_id: nba_paycom_center                         │
└───────────────────────────────────────────────────────┘
```

### Alias Files

Aliases map variant names to canonical IDs:

**`team_aliases.json`**

```json
{
  "nba": {
    "LA Lakers": "nba_la_lakers",
    "Los Angeles Lakers": "nba_la_lakers",
    "LAL": "nba_la_lakers"
  }
}
```

**`stadium_aliases.json`**

```json
{
  "nba": {
    "Crypto.com Arena": "nba_crypto_com_arena",
    "Staples Center": "nba_crypto_com_arena",
    "STAPLES Center": "nba_crypto_com_arena"
  }
}
```

When a scraper returns a raw name like "LA Lakers", the resolver:

1. Checks `team_aliases.json` for an exact match → finds `nba_la_lakers`
2. If no exact match, runs fuzzy matching against all known team names
3. If fuzzy match confidence > 80%, uses that canonical ID
4. Otherwise, creates a manual review item for human resolution

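The four steps above can be sketched as follows (illustrative code, not the package's actual resolver, which uses Levenshtein-based matching rather than `difflib`; the inline tables stand in for the JSON files):

```python
from difflib import get_close_matches

# Sample alias and known-name tables standing in for the JSON files.
ALIASES = {"LA Lakers": "nba_la_lakers", "LAL": "nba_la_lakers"}
KNOWN = {"Los Angeles Lakers": "nba_la_lakers", "Boston Celtics": "nba_boston_celtics"}


def resolve(raw: str, threshold: float = 0.8):
    if raw in ALIASES:                                  # 1. exact alias lookup
        return ALIASES[raw], "exact"
    close = get_close_matches(raw, KNOWN, n=1, cutoff=threshold)
    if close:                                           # 2-3. fuzzy match above threshold
        return KNOWN[close[0]], "fuzzy"
    return None, "manual_review"                        # 4. flag for human review
```

Raising the threshold trades fewer false matches for more manual review items, which is why it is configurable.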
## Adding a New Sport

To add support for a new sport (e.g., `cfb` for college football), update these files:

### 1. Configuration (`config.py`)

Add the sport to `SUPPORTED_SPORTS` and `EXPECTED_GAME_COUNTS`:

```python
SUPPORTED_SPORTS: list[str] = [
    "nba", "mlb", "nfl", "nhl", "mls", "wnba", "nwsl",
    "cfb",  # ← Add new sport
]

EXPECTED_GAME_COUNTS: dict[str, int] = {
    # ... existing sports ...
    "cfb": 900,  # ← Add expected game count for validation
}
```

### 2. Team Mappings (`normalizers/team_resolver.py`)

Add team definitions to `TEAM_MAPPINGS`. Each entry maps an abbreviation to `(canonical_id, full_name, city)`:

```python
TEAM_MAPPINGS: dict[str, dict[str, tuple[str, str, str]]] = {
    # ... existing sports ...
    "cfb": {
        "ALA": ("team_cfb_ala", "Alabama Crimson Tide", "Tuscaloosa"),
        "OSU": ("team_cfb_osu", "Ohio State Buckeyes", "Columbus"),
        # ... all teams ...
    },
}
```

### 3. Stadium Mappings (`normalizers/stadium_resolver.py`)

Add stadium definitions to `STADIUM_MAPPINGS`. Each entry is a `StadiumInfo` with coordinates:

```python
STADIUM_MAPPINGS: dict[str, dict[str, StadiumInfo]] = {
    # ... existing sports ...
    "cfb": {
        "stadium_cfb_bryant_denny": StadiumInfo(
            id="stadium_cfb_bryant_denny",
            name="Bryant-Denny Stadium",
            city="Tuscaloosa",
            state="AL",
            country="USA",
            sport="cfb",
            latitude=33.2083,
            longitude=-87.5503,
        ),
        # ... all stadiums ...
    },
}
```

### 4. Scraper Implementation (`scrapers/cfb.py`)

Create a new scraper class extending `BaseScraper`:

```python
from .base import BaseScraper, RawGameData, ScrapeResult
# Also import Game, Team, Stadium, ManualReviewItem and the resolver
# factories (get_team_resolver, get_stadium_resolver) from the models
# and normalizers packages; exact paths depend on the package layout.


class CFBScraper(BaseScraper):
    def __init__(self, season: int, **kwargs):
        super().__init__("cfb", season, **kwargs)
        self._team_resolver = get_team_resolver("cfb")
        self._stadium_resolver = get_stadium_resolver("cfb")

    def _get_sources(self) -> list[str]:
        return ["espn", "sports_reference"]  # Priority order

    def _get_source_url(self, source: str, **kwargs) -> str:
        # Return URL for each source
        ...

    def _scrape_games_from_source(self, source: str) -> list[RawGameData]:
        # Implement scraping logic
        ...

    def _normalize_games(self, raw_games: list[RawGameData]) -> tuple[list[Game], list[ManualReviewItem]]:
        # Convert raw data to Game objects using resolvers
        ...

    def scrape_teams(self) -> list[Team]:
        # Return Team objects from TEAM_MAPPINGS
        ...

    def scrape_stadiums(self) -> list[Stadium]:
        # Return Stadium objects from STADIUM_MAPPINGS
        ...


def create_cfb_scraper(season: int) -> CFBScraper:
    return CFBScraper(season=season)
```

### 5. Register Scraper (`scrapers/__init__.py`)

Export the new scraper:

```python
from .cfb import CFBScraper, create_cfb_scraper

__all__ = [
    # ... existing exports ...
    "CFBScraper",
    "create_cfb_scraper",
]
```

### 6. CLI Registration (`cli.py`)

Add the sport to `get_scraper()`:

```python
def get_scraper(sport: str, season: int):
    # ... existing sports ...
    elif sport == "cfb":
        from .scrapers.cfb import create_cfb_scraper
        return create_cfb_scraper(season)
```

### 7. Alias Files (`team_aliases.json`, `stadium_aliases.json`)

Add initial aliases for common name variants:

**`team_aliases.json`**

```json
{
  "cfb": {
    "Alabama": "team_cfb_ala",
    "Bama": "team_cfb_ala",
    "Roll Tide": "team_cfb_ala"
  }
}
```

**`stadium_aliases.json`**

```json
{
  "cfb": {
    "Bryant Denny Stadium": "stadium_cfb_bryant_denny",
    "Bryant-Denny": "stadium_cfb_bryant_denny"
  }
}
```

### 8. Documentation (`SOURCES.md`)

Document data sources with URLs, rate limits, and notes:

```markdown
## CFB (College Football)

**Teams**: 134 (FBS)
**Expected Games**: ~900 per season
**Season**: August - January

### Sources

| Priority | Source | URL Pattern | Data Type |
|----------|--------|-------------|-----------|
| 1 | ESPN API | `site.api.espn.com/apis/site/v2/sports/football/college-football/scoreboard` | JSON |
| 2 | Sports-Reference | `sports-reference.com/cfb/years/{YEAR}-schedule.html` | HTML |
```

### 9. Tests (`tests/test_scrapers/test_cfb.py`)

Create tests for the new scraper:

```python
import pytest

from sportstime_parser.scrapers.cfb import CFBScraper, create_cfb_scraper


class TestCFBScraper:
    def test_factory_creates_scraper(self):
        scraper = create_cfb_scraper(season=2025)
        assert scraper.sport == "cfb"
        assert scraper.season == 2025

    def test_get_sources_returns_priority_list(self):
        scraper = CFBScraper(season=2025)
        sources = scraper._get_sources()
        assert "espn" in sources

    # ... more tests ...
```

### Checklist

- [ ] Add to `SUPPORTED_SPORTS` in `config.py`
- [ ] Add to `EXPECTED_GAME_COUNTS` in `config.py`
- [ ] Add team mappings to `team_resolver.py`
- [ ] Add stadium mappings to `stadium_resolver.py`
- [ ] Create `scrapers/{sport}.py` with scraper class
- [ ] Export in `scrapers/__init__.py`
- [ ] Register in `cli.py` `get_scraper()`
- [ ] Add aliases to `team_aliases.json`
- [ ] Add aliases to `stadium_aliases.json`
- [ ] Document sources in `SOURCES.md`
- [ ] Create tests in `tests/test_scrapers/`
- [ ] Run `pytest` to verify all tests pass
- [ ] Run dry-run scrape: `sportstime-parser scrape {sport} --season 2025 --dry-run`

## Development

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=sportstime_parser --cov-report=html

# Run specific test file
pytest tests/test_scrapers/test_nba.py

# Run with verbose output
pytest -v
```

### Project Structure

```
sportstime_parser/
    __init__.py
    __main__.py             # CLI entry point
    cli.py                  # Subcommand definitions
    config.py               # Constants, defaults

    models/
        game.py             # Game dataclass
        team.py             # Team dataclass
        stadium.py          # Stadium dataclass
        aliases.py          # Alias dataclasses

    scrapers/
        base.py             # BaseScraper abstract class
        nba.py              # NBA scrapers
        mlb.py              # MLB scrapers
        nfl.py              # NFL scrapers
        nhl.py              # NHL scrapers
        mls.py              # MLS scrapers
        wnba.py             # WNBA scrapers
        nwsl.py             # NWSL scrapers

    normalizers/
        canonical_id.py     # ID generation
        team_resolver.py    # Team name resolution
        stadium_resolver.py # Stadium name resolution
        timezone.py         # Timezone conversion
        fuzzy.py            # Fuzzy matching

    validators/
        report.py           # Validation report generator

    uploaders/
        cloudkit.py         # CloudKit Web Services client
        state.py            # Resumable upload state
        diff.py             # Record comparison

    utils/
        http.py             # Rate-limited HTTP client
        logging.py          # Verbose logger
        progress.py         # Progress bars
```

## Troubleshooting

### "No games file found"

Run the scrape command first:

```bash
sportstime-parser scrape nba --season 2025
```

### "CloudKit not configured"

Set the required environment variables:

```bash
export CLOUDKIT_KEY_ID="your_key_id"
export CLOUDKIT_PRIVATE_KEY_PATH="/path/to/key.p8"
```

### Rate limit errors

The scraper includes automatic rate limiting and exponential backoff. If you encounter persistent rate limit errors:

1. Wait a few minutes before retrying
2. Try scraping one sport at a time instead of "all"
3. Check that you're not running multiple instances

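The backoff behavior can be sketched like this (illustrative; which errors count as retryable is up to the real client, `TimeoutError` just stands in for a rate-limit failure here):

```python
import random
import time


# Illustrative exponential backoff with jitter. Retries only on a
# retryable error type and re-raises once the attempt budget runs out.
def with_backoff(fetch, max_attempts=4, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the failure
            # 1x, 2x, 4x, ... the base delay, plus jitter so that
            # parallel clients don't retry in lockstep
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```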
### Scrape fails with no data

1. Check your internet connection
2. Run with `--verbose` to see detailed error messages
3. The scraper tries multiple sources; if all fail, the source websites may be temporarily unavailable

## License

MIT