# SportsTime Data Pipeline

A Django-based sports data pipeline that scrapes game schedules from official sources, normalizes the data, and syncs it to CloudKit for iOS app consumption.
## Features
- Multi-sport support: NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
- Automated scraping: Scheduled data collection from ESPN and league APIs
- Smart name resolution: Team/stadium aliases with date validity support
- CloudKit sync: Push data to iCloud for iOS app consumption
- Admin dashboard: Monitor scrapers, review items, manage data
- Import/Export: Bulk data management via JSON, CSV, XLSX
- Audit history: Track all changes with django-simple-history
## Quick Start

### Prerequisites
- Docker and Docker Compose
- (Optional) CloudKit credentials for sync
### Setup

1. Clone the repository:

   ```bash
   git clone <repo-url>
   cd SportsTimeScripts
   ```

2. Copy the environment template:

   ```bash
   cp .env.example .env
   ```

3. Start the containers:

   ```bash
   docker-compose up -d
   ```

4. Run migrations:

   ```bash
   docker-compose exec web python manage.py migrate
   ```

5. Create a superuser:

   ```bash
   docker-compose exec web python manage.py createsuperuser
   ```

6. Access the admin at http://localhost:8000/admin/

7. Access the dashboard at http://localhost:8000/dashboard/
## Architecture

```
┌─────────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────┐
│  Data Sources   │ ──▶ │   Scrapers   │ ──▶ │ PostgreSQL  │ ──▶ │ CloudKit │
│ (ESPN, leagues) │     │ (sportstime_ │     │  (Django)   │     │  (iOS)   │
└─────────────────┘     │   parser)    │     └─────────────┘     └──────────┘
                        └──────────────┘
```
### Components
| Component | Description |
|---|---|
| Django | Web framework, ORM, admin interface |
| PostgreSQL | Primary database |
| Redis | Celery message broker |
| Celery | Async task queue (scraping, syncing) |
| Celery Beat | Scheduled task runner |
| sportstime_parser | Standalone scraper library |
## Usage

### Dashboard

Visit http://localhost:8000/dashboard/ (staff login required) to:
- View scraper status and run scrapers
- Monitor CloudKit sync status
- Review items needing manual attention
- See statistics across all sports
### Running Scrapers

**Via Dashboard:**

1. Go to Dashboard → Scraper Status
2. Click "Run Now" for a specific sport or "Run All Enabled"

**Via Command Line:**

```bash
docker-compose exec web python manage.py shell
```

```python
>>> from scraper.tasks import run_scraper_task
>>> from scraper.models import ScraperConfig
>>> config = ScraperConfig.objects.get(sport__code='nba', season=2025)
>>> run_scraper_task.delay(config.id)
```
### Managing Aliases

When scrapers encounter unknown team or stadium names:

1. A Review Item is created for manual resolution
2. Add an alias via Admin → Team Aliases or Stadium Aliases
3. Re-run the scraper to pick up the new mapping

Aliases support validity dates, which is useful for:

- Historical team names (e.g., "Washington Redskins" valid until 2020)
- Stadium naming rights changes (e.g., "Staples Center" valid until 2021)
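The validity-window check behind date-aware aliases can be sketched in plain Python. The `Alias` class, its field names, and the cutoff dates below are illustrative, not the actual model API:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Alias:
    """Illustrative stand-in for a TeamAlias/StadiumAlias row."""
    name: str
    canonical_id: str
    valid_from: Optional[date] = None   # None = no lower bound
    valid_until: Optional[date] = None  # None = still valid today

    def is_valid_on(self, day: date) -> bool:
        """An alias only resolves for game dates inside its validity window."""
        if self.valid_from and day < self.valid_from:
            return False
        if self.valid_until and day > self.valid_until:
            return False
        return True

# Illustrative cutoff: the old name resolves for 2019 games but not 2022 games.
redskins = Alias("Washington Redskins", "team_nfl_was",
                 valid_until=date(2020, 12, 31))
print(redskins.is_valid_on(date(2019, 10, 1)))  # True
print(redskins.is_valid_on(date(2022, 1, 1)))   # False
```

Checking the game date against the window is what lets one raw name map to different canonical IDs across seasons.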
### Import/Export
All admin models support bulk import/export:
- Go to any admin list page (e.g., Teams)
- Click Export → Select format (JSON recommended) → Submit
- Modify the data as needed (e.g., ask Claude to update it)
- Click Import → Upload file → Preview → Confirm
Imports will update existing records and create new ones.
## Project Structure

```
SportsTimeScripts/
├── core/                      # Core Django models
│   ├── models/                # Sport, Team, Stadium, Game, Aliases
│   ├── admin/                 # Admin configuration with import/export
│   └── resources.py           # Import/export resource definitions
├── scraper/                   # Scraper orchestration
│   ├── engine/                # Adapter, DB alias loaders
│   │   ├── adapter.py         # Bridges sportstime_parser to Django
│   │   └── db_alias_loader.py # Database alias resolution
│   ├── models.py              # ScraperConfig, ScrapeJob, ManualReviewItem
│   └── tasks.py               # Celery tasks
├── sportstime_parser/         # Standalone scraper library
│   ├── scrapers/              # Per-sport scrapers (NBA, MLB, etc.)
│   ├── normalizers/           # Team/stadium name resolution
│   ├── models/                # Data classes
│   └── uploaders/             # CloudKit client (legacy)
├── cloudkit/                  # CloudKit sync
│   ├── client.py              # CloudKit API client
│   ├── models.py              # CloudKitConfiguration, SyncState, SyncJob
│   └── tasks.py               # Sync tasks
├── dashboard/                 # Staff dashboard
│   ├── views.py               # Dashboard views
│   └── urls.py                # Dashboard URLs
├── templates/                 # Django templates
│   ├── base.html              # Base template
│   └── dashboard/             # Dashboard templates
├── sportstime/                # Django project config
│   ├── settings.py            # Django settings
│   ├── urls.py                # URL routing
│   └── celery.py              # Celery configuration
├── docker-compose.yml         # Container orchestration
├── Dockerfile                 # Container image
├── requirements.txt           # Python dependencies
├── CLAUDE.md                  # Claude Code context
└── README.md                  # This file
```
## Data Models

### Model Hierarchy

```
Sport
├── Conference
│   └── Division
│       └── Team (has TeamAliases)
├── Stadium (has StadiumAliases)
└── Game (references Team, Stadium)
```
### Key Models
| Model | Description |
|---|---|
| Sport | Sports with season configuration |
| Team | Teams with division, colors, logos |
| Stadium | Venues with location, capacity |
| Game | Games with scores, status, teams |
| TeamAlias | Historical team names with validity dates |
| StadiumAlias | Historical stadium names with validity dates |
| ScraperConfig | Scraper settings per sport/season |
| ScrapeJob | Scrape execution logs |
| ManualReviewItem | Items needing human review |
| CloudKitSyncState | Per-record sync status |
## Configuration

### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `DEBUG` | Debug mode | `False` |
| `SECRET_KEY` | Django secret key | (required in prod) |
| `DATABASE_URL` | PostgreSQL connection | `postgresql://...` |
| `REDIS_URL` | Redis connection | `redis://localhost:6379/0` |
| `CLOUDKIT_CONTAINER` | CloudKit container ID | - |
| `CLOUDKIT_KEY_ID` | CloudKit key ID | - |
| `CLOUDKIT_PRIVATE_KEY_PATH` | Path to CloudKit private key | - |
### Scraper Settings

| Setting | Description | Default |
|---|---|---|
| `SCRAPER_REQUEST_DELAY` | Delay between requests (seconds) | 3.0 |
| `SCRAPER_MAX_RETRIES` | Max retry attempts | 3 |
| `SCRAPER_FUZZY_THRESHOLD` | Fuzzy match confidence threshold | 85 |
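How these knobs might be read at startup, with the documented defaults applied, can be sketched as follows. This is illustrative only; the real project may wire them through Django settings rather than reading the environment directly:

```python
import os

def scraper_settings() -> dict:
    """Collect scraper tuning values from the environment, falling back
    to the documented defaults. Key names match the table above; the
    returned dict shape is an assumption for illustration."""
    return {
        "request_delay": float(os.environ.get("SCRAPER_REQUEST_DELAY", "3.0")),
        "max_retries": int(os.environ.get("SCRAPER_MAX_RETRIES", "3")),
        "fuzzy_threshold": int(os.environ.get("SCRAPER_FUZZY_THRESHOLD", "85")),
    }
```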
## Supported Sports
| Code | League | Season Type | Games/Season | Data Sources |
|---|---|---|---|---|
| nba | NBA | Oct-Jun (split) | ~1,230 | ESPN, NBA.com |
| mlb | MLB | Mar-Nov (calendar) | ~2,430 | ESPN, MLB.com |
| nfl | NFL | Sep-Feb (split) | ~272 | ESPN, NFL.com |
| nhl | NHL | Oct-Jun (split) | ~1,312 | ESPN, NHL.com |
| mls | MLS | Feb-Nov (calendar) | ~544 | ESPN |
| wnba | WNBA | May-Oct (calendar) | ~228 | ESPN |
| nwsl | NWSL | Mar-Nov (calendar) | ~182 | ESPN |
## Development

### Useful Commands

```bash
# Start containers
docker-compose up -d

# Stop containers
docker-compose down

# Restart containers
docker-compose restart

# Rebuild after requirements change
docker-compose down && docker-compose up -d --build

# View logs
docker-compose logs -f web
docker-compose logs -f celery-worker

# Django shell
docker-compose exec web python manage.py shell

# Database shell
docker-compose exec db psql -U sportstime -d sportstime

# Run migrations
docker-compose exec web python manage.py migrate

# Create superuser
docker-compose exec web python manage.py createsuperuser
```
### Running Tests

```bash
docker-compose exec web pytest
```
### Adding a New Sport

1. Create a scraper in `sportstime_parser/scrapers/{sport}.py`
2. Add team mappings in `sportstime_parser/normalizers/team_resolver.py`
3. Add stadium mappings in `sportstime_parser/normalizers/stadium_resolver.py`
4. Register the scraper in `scraper/engine/adapter.py`
5. Add a Sport record via Django admin
6. Create a ScraperConfig for the sport/season
## sportstime_parser Library

The sportstime_parser package is a standalone library that handles:

- Scraping from multiple sources (ESPN, league APIs)
- Normalizing team/stadium names to canonical IDs
- Resolving names using exact match, aliases, and fuzzy matching

### Resolution Strategy

1. Exact match against canonical mappings
2. Alias lookup with date-aware validity
3. Fuzzy match with 85% confidence threshold
4. Manual review if unresolved
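The four tiers above can be sketched as a single function. This sketch uses stdlib `difflib` as a stand-in similarity scorer and omits the date-aware alias validity the real resolver applies; the function name and dict-based lookups are illustrative, not the library's API:

```python
from difflib import SequenceMatcher
from typing import Optional

def resolve_name(raw: str,
                 canonical: dict,
                 aliases: dict,
                 fuzzy_threshold: int = 85) -> Optional[str]:
    """Resolve a scraped name to a canonical ID using the tiers above.
    Returns None when the name needs manual review."""
    key = raw.strip().lower()
    # 1. Exact match against canonical mappings
    if key in canonical:
        return canonical[key]
    # 2. Alias lookup (the real resolver also checks validity dates)
    if key in aliases:
        return aliases[key]
    # 3. Fuzzy match, accepted only at or above the confidence threshold
    best_id, best_score = None, 0.0
    for name, cid in canonical.items():
        score = SequenceMatcher(None, key, name).ratio() * 100
        if score > best_score:
            best_id, best_score = cid, score
    if best_score >= fuzzy_threshold:
        return best_id
    # 4. Unresolved: the caller queues a ManualReviewItem
    return None
```

Ordering the tiers from cheapest to most permissive means a typo like "Los Angles Lakers" still resolves via the fuzzy tier, while a genuinely unknown name falls through to review rather than being guessed.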
### Canonical ID Format

```
team_nba_lal                      # Team: Los Angeles Lakers
stadium_nba_los_angeles_lakers    # Stadium: Crypto.com Arena
game_nba_2025_20251022_bos_lal    # Game: BOS @ LAL on Oct 22, 2025
```
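Composing a game ID from its parts might look like this; the helper is a sketch inferred from the examples above, not the library's actual function:

```python
from datetime import date

def game_id(sport: str, season: int, day: date, away: str, home: str) -> str:
    """Build a canonical game ID: game_{sport}_{season}_{YYYYMMDD}_{away}_{home}."""
    return f"game_{sport}_{season}_{day:%Y%m%d}_{away}_{home}"

print(game_id("nba", 2025, date(2025, 10, 22), "bos", "lal"))
# game_nba_2025_20251022_bos_lal
```

Because every component is lowercase and deterministic, the same game scraped twice always produces the same ID, which is what makes idempotent upserts and CloudKit sync-state tracking possible.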
## Troubleshooting

### Scraper fails with rate limiting

The system handles 429 errors automatically. If rate limiting persists, increase `SCRAPER_REQUEST_DELAY`.
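One common policy for spacing retries after a 429 is exponential backoff on the configured base delay. The sketch below shows that policy only; the project's actual retry logic may differ:

```python
def backoff_delay(attempt: int, base_delay: float = 3.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry `attempt` (1-based): double the
    configured base delay (SCRAPER_REQUEST_DELAY) each attempt, up to a cap."""
    return min(base_delay * (2 ** (attempt - 1)), cap)

# With the default 3.0s delay, attempts 1-4 wait 3.0, 6.0, 12.0, 24.0 seconds.
```

Raising `SCRAPER_REQUEST_DELAY` shifts the whole schedule up, which is why it is the first knob to turn when a source keeps rate-limiting.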
### Unknown team/stadium names

1. Check ManualReviewItem in admin
2. Add an alias via Team Aliases or Stadium Aliases
3. Re-run the scraper
### CloudKit sync errors

1. Verify credentials in CloudKitConfiguration
2. Check CloudKitSyncState for failed records
3. Use the "Retry failed syncs" action in admin
### Docker volume issues

If template changes don't appear, rebuild the containers:

```bash
docker-compose down && docker-compose up -d --build
```
## License
Private - All rights reserved.