feat(scripts): add sportstime-parser data pipeline

Complete Python package for scraping, normalizing, and uploading
sports schedule data to CloudKit. Includes:

- Multi-source scrapers for NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
- Canonical ID system for teams, stadiums, and games
- Fuzzy matching with manual alias support
- CloudKit uploader with batch operations and deduplication
- Comprehensive test suite with fixtures
- WNBA abbreviation aliases for improved team resolution
- Alias validation script to detect orphan references

All 5 phases of the data remediation plan are complete:
- Phase 1: Alias fixes (team/stadium alias additions)
- Phase 2: NHL stadium coordinate fixes
- Phase 3: Re-scrape validation
- Phase 4: iOS bundle update
- Phase 5: Code quality improvements (WNBA aliases)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Trey t
2026-01-20 18:56:25 -06:00
parent ac78042a7e
commit 52d445bca4
76 changed files with 25065 additions and 0 deletions

@@ -0,0 +1,32 @@
"""Validators for scraped data."""

from .report import (
    ValidationReport,
    ValidationSummary,
    generate_report,
    detect_duplicate_games,
    validate_games,
)

from .schema import (
    SchemaValidationError,
    validate_canonical_stadium,
    validate_canonical_team,
    validate_canonical_game,
    validate_and_raise,
    validate_batch,
)

__all__ = [
    "ValidationReport",
    "ValidationSummary",
    "generate_report",
    "detect_duplicate_games",
    "validate_games",
    "SchemaValidationError",
    "validate_canonical_stadium",
    "validate_canonical_team",
    "validate_canonical_game",
    "validate_and_raise",
    "validate_batch",
]
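
This hunk only re-exports the validators, so the behavior of `detect_duplicate_games` is not visible here. As a rough illustration of what duplicate-game detection could look like, here is a minimal sketch. The `Game` record, its fields, and the `find_duplicate_games` helper are all hypothetical and are not taken from the package; the grouping key (date, home team, away team) is likewise an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    # Hypothetical record shape; the package's real game model is not shown in this diff.
    game_id: str
    home_team: str
    away_team: str
    date: str  # ISO date string, e.g. "2026-01-20"


def find_duplicate_games(games):
    """Return {(date, home, away): [game_id, ...]} for keys shared by more than one game."""
    groups = defaultdict(list)
    for game in games:
        # Assumed identity key: two games on the same date with the same matchup.
        groups[(game.date, game.home_team, game.away_team)].append(game.game_id)
    # Keep only the keys that collected more than one game ID.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

A key this coarse would also flag legitimate doubleheaders, so a real implementation would likely need an extra discriminator such as start time or game number.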