feat(scripts): add sportstime-parser data pipeline

Complete Python package for scraping, normalizing, and uploading
sports schedule data to CloudKit. Includes:

- Multi-source scrapers for NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
- Canonical ID system for teams, stadiums, and games
- Fuzzy matching with manual alias support
- CloudKit uploader with batch operations and deduplication
- Comprehensive test suite with fixtures
- WNBA abbreviation aliases for improved team resolution
- Alias validation script to detect orphan references
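The alias-first, fuzzy-fallback team resolution listed above can be pictured roughly as follows. This is a minimal sketch, not the package's code. The canonical ID format (`team_wnba_lva`), the alias table, `resolve_team`, and the 0.8 cutoff are all hypothetical. The standard library's `difflib.get_close_matches` stands in for whatever fuzzy matcher the scrapers actually use.

```python
import difflib
from typing import Optional

# Hypothetical canonical IDs and display names; the real ID format may differ.
CANONICAL_TEAMS = {
    "team_wnba_lva": "Las Vegas Aces",
    "team_wnba_nyl": "New York Liberty",
    "team_wnba_sea": "Seattle Storm",
}

# Hypothetical manual aliases (e.g. WNBA abbreviations) keyed by normalized name.
MANUAL_ALIASES = {
    "lv": "team_wnba_lva",
    "las vegas": "team_wnba_lva",
    "ny": "team_wnba_nyl",
    "sea": "team_wnba_sea",
}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(name.lower().split())


def resolve_team(raw_name: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a canonical team ID for a scraped name, or None if unresolved.

    Manual aliases are checked first, so known abbreviations never depend on
    fuzzy scoring; only unknown names fall through to difflib matching.
    """
    key = normalize(raw_name)
    if key in MANUAL_ALIASES:
        return MANUAL_ALIASES[key]
    names_to_ids = {normalize(full): tid for tid, full in CANONICAL_TEAMS.items()}
    matches = difflib.get_close_matches(key, list(names_to_ids), n=1, cutoff=cutoff)
    return names_to_ids[matches[0]] if matches else None
```

Checking aliases before fuzzy matching keeps short abbreviations such as "LV" from being scored against full team names, where they would rarely clear any sensible cutoff.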

All 5 phases of the data remediation plan are complete:
- Phase 1: Alias fixes (team/stadium alias additions)
- Phase 2: NHL stadium coordinate fixes
- Phase 3: Re-scrape validation
- Phase 4: iOS bundle update
- Phase 5: Code quality improvements (WNBA aliases)
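Phase 1 added team and stadium aliases, and the alias validation script in the feature list guards against aliases whose targets no longer exist. A minimal sketch of such an orphan check, assuming aliases are stored as a flat alias-to-ID mapping. The function name and the stadium ID strings are hypothetical.

```python
from typing import Dict, Set


def find_orphan_aliases(aliases: Dict[str, str], canonical_ids: Set[str]) -> Dict[str, str]:
    """Return the aliases whose target is not a known canonical ID."""
    return {alias: target for alias, target in aliases.items() if target not in canonical_ids}


# Hypothetical example data: one alias still points at a retired stadium ID.
canonical = {"stadium_nhl_td_garden", "stadium_nhl_bell_centre"}
aliases = {
    "td garden": "stadium_nhl_td_garden",
    "fleetcenter": "stadium_nhl_fleet_center",
}
print(find_orphan_aliases(aliases, canonical))  # {'fleetcenter': 'stadium_nhl_fleet_center'}
```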

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Trey t
2026-01-20 18:56:25 -06:00
parent ac78042a7e
commit 52d445bca4
76 changed files with 25065 additions and 0 deletions


@@ -0,0 +1,52 @@
"""CloudKit uploaders for sportstime-parser."""

from .cloudkit import (
    CloudKitClient,
    CloudKitRecord,
    CloudKitError,
    CloudKitAuthError,
    CloudKitRateLimitError,
    CloudKitServerError,
    RecordType,
    OperationResult,
    BatchResult,
)
from .state import (
    RecordState,
    UploadSession,
    StateManager,
)
from .diff import (
    DiffAction,
    RecordDiff,
    DiffResult,
    RecordDiffer,
    game_to_cloudkit_record,
    team_to_cloudkit_record,
    stadium_to_cloudkit_record,
)

__all__ = [
    # CloudKit client
    "CloudKitClient",
    "CloudKitRecord",
    "CloudKitError",
    "CloudKitAuthError",
    "CloudKitRateLimitError",
    "CloudKitServerError",
    "RecordType",
    "OperationResult",
    "BatchResult",
    # State manager
    "RecordState",
    "UploadSession",
    "StateManager",
    # Differ
    "DiffAction",
    "RecordDiff",
    "DiffResult",
    "RecordDiffer",
    "game_to_cloudkit_record",
    "team_to_cloudkit_record",
    "stadium_to_cloudkit_record",
]
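This `__init__.py` re-exports a differ (`RecordDiffer`, `DiffAction`) alongside the client's `BatchResult`. That suggests uploads are planned by comparing local records with what already exists in CloudKit and are then sent in batches. The real signatures are not part of this file, so the sketch below uses hypothetical stand-ins (`Action`, `plan_changes`, `chunked`) to illustrate the idea under that assumption. Records are keyed by CloudKit record name, unchanged records are skipped (the deduplication step), and the remainder is split into fixed-size batches. Deletions are left out for brevity.

```python
from enum import Enum
from typing import Dict, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


class Action(Enum):
    """Hypothetical change kinds; the package's DiffAction may differ."""
    CREATE = "create"
    UPDATE = "update"
    UNCHANGED = "unchanged"


def plan_changes(local: Dict[str, dict], remote: Dict[str, dict]) -> List[Tuple[Action, str]]:
    """Compare records keyed by record name and decide what each one needs.

    Records whose fields already match the remote copy are marked UNCHANGED,
    so re-running an upload does not resend or duplicate them.
    """
    plan = []
    for name, fields in sorted(local.items()):
        if name not in remote:
            plan.append((Action.CREATE, name))
        elif remote[name] != fields:
            plan.append((Action.UPDATE, name))
        else:
            plan.append((Action.UNCHANGED, name))
    return plan


def chunked(items: List[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical records: game_a is identical, game_b changed, game_c is new.
local = {"game_a": {"home": "LVA"}, "game_b": {"home": "NYL"}, "game_c": {"home": "SEA"}}
remote = {"game_a": {"home": "LVA"}, "game_b": {"home": "CHI"}}
pending = [name for action, name in plan_changes(local, remote) if action is not Action.UNCHANGED]
BATCH_SIZE = 100  # chosen arbitrarily for illustration
for batch in chunked(pending, BATCH_SIZE):
    print(batch)  # ['game_b', 'game_c']
```

Diffing before uploading keeps the batch count proportional to what actually changed, rather than to the size of the full schedule.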