
SportsTime Data Pipeline

A Django-based sports data pipeline that scrapes game schedules from official sources, normalizes data, and syncs to CloudKit for iOS app consumption.

Features

  • Multi-sport support: NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
  • Automated scraping: Scheduled data collection from ESPN and league APIs
  • Smart name resolution: Team/stadium aliases with date validity support
  • CloudKit sync: Push data to iCloud for iOS app consumption
  • Admin dashboard: Monitor scrapers, review items, manage data
  • Import/Export: Bulk data management via JSON, CSV, XLSX
  • Audit history: Track all changes with django-simple-history

Quick Start

Prerequisites

  • Docker and Docker Compose
  • (Optional) CloudKit credentials for sync

Setup

  1. Clone the repository:

    git clone <repo-url>
    cd SportsTimeScripts
    
  2. Copy environment template:

    cp .env.example .env
    
  3. Start the containers:

    docker-compose up -d
    
  4. Run migrations:

    docker-compose exec web python manage.py migrate
    
  5. Create a superuser:

    docker-compose exec web python manage.py createsuperuser
    
  6. Access the admin at http://localhost:8000/admin/

  7. Access the dashboard at http://localhost:8000/dashboard/

Architecture

┌─────────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────┐
│  Data Sources   │ ──▶ │   Scrapers   │ ──▶ │  PostgreSQL │ ──▶ │ CloudKit │
│ (ESPN, leagues) │     │ (sportstime_ │     │  (Django)   │     │  (iOS)   │
└─────────────────┘     │   parser)    │     └─────────────┘     └──────────┘
                        └──────────────┘

Components

Component           Description
Django              Web framework, ORM, admin interface
PostgreSQL          Primary database
Redis               Celery message broker
Celery              Async task queue (scraping, syncing)
Celery Beat         Scheduled task runner
sportstime_parser   Standalone scraper library

Usage

Dashboard

Visit http://localhost:8000/dashboard/ (staff login required) to:

  • View scraper status and run scrapers
  • Monitor CloudKit sync status
  • Review items needing manual attention
  • See statistics across all sports

Running Scrapers

Via Dashboard:

  1. Go to Dashboard → Scraper Status
  2. Click "Run Now" for a specific sport or "Run All Enabled"

Via Command Line:

docker-compose exec web python manage.py shell
>>> from scraper.tasks import run_scraper_task
>>> from scraper.models import ScraperConfig
>>> config = ScraperConfig.objects.get(sport__code='nba', season=2025)
>>> run_scraper_task.delay(config.id)

Managing Aliases

When scrapers encounter unknown team or stadium names:

  1. A Review Item is created for manual resolution
  2. Add an alias via Admin → Team Aliases or Stadium Aliases
  3. Re-run the scraper to pick up the new mapping

Aliases support validity date ranges, which is useful for:

  • Historical team names (e.g., "Washington Redskins" valid until 2020)
  • Stadium naming rights changes (e.g., "Staples Center" valid until 2021)
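
The date-aware lookup can be sketched as follows. This is a minimal illustration of the idea, not the project's actual models or resolver API; all names here are assumptions.

```python
from datetime import date

# Hypothetical alias table: name -> (canonical_id, valid_from, valid_until).
# None means "no bound on that side".
ALIASES = {
    "Washington Redskins": ("team_nfl_was", None, date(2020, 7, 13)),
    "Staples Center": ("stadium_nba_los_angeles_lakers", None, date(2021, 12, 25)),
}

def resolve_alias(name, game_date):
    """Return the canonical ID if the alias is valid on game_date, else None."""
    entry = ALIASES.get(name)
    if entry is None:
        return None
    canonical_id, valid_from, valid_until = entry
    if valid_from is not None and game_date < valid_from:
        return None
    if valid_until is not None and game_date > valid_until:
        return None
    return canonical_id
```

A 2020 game at "Staples Center" resolves; a 2023 game does not, because the alias expired when the venue was renamed.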

Import/Export

All admin models support bulk import/export:

  1. Go to any admin list page (e.g., Teams)
  2. Click Export → Select format (JSON recommended) → Submit
  3. Modify the data as needed (e.g., ask Claude to update it)
  4. Click Import → Upload file → Preview → Confirm

Imports will update existing records and create new ones.

Project Structure

SportsTimeScripts/
├── core/                   # Core Django models
│   ├── models/            # Sport, Team, Stadium, Game, Aliases
│   ├── admin/             # Admin configuration with import/export
│   └── resources.py       # Import/export resource definitions
├── scraper/               # Scraper orchestration
│   ├── engine/            # Adapter, DB alias loaders
│   │   ├── adapter.py     # Bridges sportstime_parser to Django
│   │   └── db_alias_loader.py  # Database alias resolution
│   ├── models.py          # ScraperConfig, ScrapeJob, ManualReviewItem
│   └── tasks.py           # Celery tasks
├── sportstime_parser/     # Standalone scraper library
│   ├── scrapers/          # Per-sport scrapers (NBA, MLB, etc.)
│   ├── normalizers/       # Team/stadium name resolution
│   ├── models/            # Data classes
│   └── uploaders/         # CloudKit client (legacy)
├── cloudkit/              # CloudKit sync
│   ├── client.py          # CloudKit API client
│   ├── models.py          # CloudKitConfiguration, SyncState, SyncJob
│   └── tasks.py           # Sync tasks
├── dashboard/             # Staff dashboard
│   ├── views.py           # Dashboard views
│   └── urls.py            # Dashboard URLs
├── templates/             # Django templates
│   ├── base.html          # Base template
│   └── dashboard/         # Dashboard templates
├── sportstime/            # Django project config
│   ├── settings.py        # Django settings
│   ├── urls.py            # URL routing
│   └── celery.py          # Celery configuration
├── docker-compose.yml     # Container orchestration
├── Dockerfile             # Container image
├── requirements.txt       # Python dependencies
├── CLAUDE.md              # Claude Code context
└── README.md              # This file

Data Models

Model Hierarchy

Sport
├── Conference
│   └── Division
│       └── Team (has TeamAliases)
├── Stadium (has StadiumAliases)
└── Game (references Team, Stadium)
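
The hierarchy above can be pictured as plain dataclasses. This is illustration only; the real project defines these as Django ORM models with foreign keys, and the field names here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Team:
    canonical_id: str
    name: str
    division: str

@dataclass
class Game:
    canonical_id: str
    home_team: Team
    away_team: Team
    stadium_id: str
    game_datetime_utc: str  # ISO 8601 UTC string

lakers = Team("team_nba_lal", "Los Angeles Lakers", "Pacific")
celtics = Team("team_nba_bos", "Boston Celtics", "Atlantic")
opener = Game("game_nba_2025_20251022_bos_lal", lakers, celtics,
              "stadium_nba_los_angeles_lakers", "2025-10-22T02:30:00Z")
```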

Key Models

Model               Description
Sport               Sports with season configuration
Team                Teams with division, colors, logos
Stadium             Venues with location, capacity
Game                Games with scores, status, teams
TeamAlias           Historical team names with validity dates
StadiumAlias        Historical stadium names with validity dates
ScraperConfig       Scraper settings per sport/season
ScrapeJob           Scrape execution logs
ManualReviewItem    Items needing human review
CloudKitSyncState   Per-record sync status

Configuration

Environment Variables

Variable                    Description                     Default
DEBUG                       Debug mode                      False
SECRET_KEY                  Django secret key               (required in prod)
DATABASE_URL                PostgreSQL connection           postgresql://...
REDIS_URL                   Redis connection                redis://localhost:6379/0
CLOUDKIT_CONTAINER          CloudKit container ID           -
CLOUDKIT_KEY_ID             CloudKit key ID                 -
CLOUDKIT_PRIVATE_KEY_PATH   Path to CloudKit private key    -

Scraper Settings

Setting                   Description                         Default
SCRAPER_REQUEST_DELAY     Delay between requests (seconds)    3.0
SCRAPER_MAX_RETRIES       Max retry attempts                  3
SCRAPER_FUZZY_THRESHOLD   Fuzzy match confidence threshold    85

Supported Sports

Code   League   Season Type          Games/Season   Data Sources
nba    NBA      Oct-Jun (split)      ~1,230         ESPN, NBA.com
mlb    MLB      Mar-Nov (calendar)   ~2,430         ESPN, MLB.com
nfl    NFL      Sep-Feb (split)      ~272           ESPN, NFL.com
nhl    NHL      Oct-Jun (split)      ~1,312         ESPN, NHL.com
mls    MLS      Feb-Nov (calendar)   ~544           ESPN
wnba   WNBA     May-Oct (calendar)   ~228           ESPN
nwsl   NWSL     Mar-Nov (calendar)   ~182           ESPN

Development

Useful Commands

# Start containers
docker-compose up -d

# Stop containers
docker-compose down

# Restart containers
docker-compose restart

# Rebuild after requirements change
docker-compose down && docker-compose up -d --build

# View logs
docker-compose logs -f web
docker-compose logs -f celery-worker

# Django shell
docker-compose exec web python manage.py shell

# Database shell
docker-compose exec db psql -U sportstime -d sportstime

# Run migrations
docker-compose exec web python manage.py migrate

# Create superuser
docker-compose exec web python manage.py createsuperuser

Running Tests

docker-compose exec web pytest

Adding a New Sport

  1. Create scraper in sportstime_parser/scrapers/{sport}.py
  2. Add team mappings in sportstime_parser/normalizers/team_resolver.py
  3. Add stadium mappings in sportstime_parser/normalizers/stadium_resolver.py
  4. Register scraper in scraper/engine/adapter.py
  5. Add Sport record via Django admin
  6. Create ScraperConfig for the sport/season
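
Steps 1 and 4 might look roughly like the following skeleton. This is a hypothetical sketch: the real base class, method names, and registration hook in sportstime_parser may differ.

```python
# Hypothetical scraper skeleton and registry; names are illustrative only.
SCRAPER_REGISTRY = {}

def register_scraper(cls):
    # Step 4: make the scraper discoverable by the orchestration adapter
    SCRAPER_REGISTRY[cls.sport_code] = cls
    return cls

class BaseScraper:
    sport_code = None

    def fetch_games(self, season):
        """Return a list of raw game dicts for the given season."""
        raise NotImplementedError

@register_scraper
class ExampleSportScraper(BaseScraper):
    # Step 1: one subclass per sport, keyed by its sport code
    sport_code = "example"

    def fetch_games(self, season):
        # Fetch the schedule from the source and return normalized rows
        return [{"sport": self.sport_code, "season": season}]
```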

sportstime_parser Library

The sportstime_parser package is a standalone library that handles:

  • Scraping from multiple sources (ESPN, league APIs)
  • Normalizing team/stadium names to canonical IDs
  • Resolving names using exact match, aliases, and fuzzy matching

Resolution Strategy

  1. Exact match against canonical mappings
  2. Alias lookup with date-aware validity
  3. Fuzzy match with 85% confidence threshold
  4. Manual review if unresolved
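
The four steps above can be sketched with stdlib difflib standing in for the fuzzy matcher (the real library may use a different matcher; the tables and names here are illustrative, and 0.85 mirrors the SCRAPER_FUZZY_THRESHOLD of 85):

```python
import difflib

CANONICAL = {"Los Angeles Lakers": "team_nba_lal", "Boston Celtics": "team_nba_bos"}
ALIASES = {"LA Lakers": "team_nba_lal"}

def resolve(name, fuzzy_threshold=0.85):
    # 1. Exact match against canonical mappings
    if name in CANONICAL:
        return CANONICAL[name]
    # 2. Alias lookup (date validity omitted for brevity)
    if name in ALIASES:
        return ALIASES[name]
    # 3. Fuzzy match above the confidence threshold
    match = difflib.get_close_matches(name, list(CANONICAL), n=1, cutoff=fuzzy_threshold)
    if match:
        return CANONICAL[match[0]]
    # 4. Unresolved: leave for manual review
    return None
```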

Canonical ID Format

team_nba_lal                    # Team: Los Angeles Lakers
stadium_nba_los_angeles_lakers  # Stadium: Crypto.com Arena
game_nba_2025_20251022_bos_lal  # Game: BOS @ LAL on Oct 22, 2025
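
A game ID in this format could be assembled like so (the helper name and signature are assumptions for illustration, not the library's API):

```python
from datetime import datetime, timezone

def make_game_id(sport, season, game_dt_utc, away, home):
    # game_{sport}_{season}_{YYYYMMDD}_{away}_{home}, date taken from UTC
    return f"game_{sport}_{season}_{game_dt_utc:%Y%m%d}_{away}_{home}"

game_dt = datetime(2025, 10, 22, 2, 30, tzinfo=timezone.utc)
print(make_game_id("nba", 2025, game_dt, "bos", "lal"))
# game_nba_2025_20251022_bos_lal
```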

Troubleshooting

Scraper fails with rate limiting

The system retries 429 (rate-limit) responses automatically. If throttling persists, increase SCRAPER_REQUEST_DELAY.
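
A minimal sketch of the retry-with-backoff pattern involved (illustrative only; the project's actual handling lives inside sportstime_parser, and these names are assumptions):

```python
import time

def fetch_with_retry(fetch, max_retries=3, base_delay=3.0, sleep=time.sleep):
    """Call fetch() until it stops returning 429, backing off exponentially."""
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return body
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 3s, 6s, 12s, ...
    raise RuntimeError("rate limited after retries")
```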

Unknown team/stadium names

  1. Check ManualReviewItem in admin
  2. Add alias via Team Aliases or Stadium Aliases
  3. Re-run scraper

CloudKit sync errors

  1. Verify credentials in CloudKitConfiguration
  2. Check CloudKitSyncState for failed records
  3. Use "Retry failed syncs" action in admin

Docker volume issues

If template changes don't appear:

docker-compose down && docker-compose up -d --build

License

Private - All rights reserved.
