feat(scripts): add sportstime-parser data pipeline

Complete Python package for scraping, normalizing, and uploading
sports schedule data to CloudKit. Includes:

- Multi-source scrapers for NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
- Canonical ID system for teams, stadiums, and games
- Fuzzy matching with manual alias support
- CloudKit uploader with batch operations and deduplication
- Comprehensive test suite with fixtures
- WNBA abbreviation aliases for improved team resolution
- Alias validation script to detect orphan references
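
For illustration, alias-first resolution with a fuzzy fallback could look roughly like the sketch below. It is not the package's code: the IDs, alias table, `resolve_team` name, and 0.8 threshold are all hypothetical, and it swaps in the standard library's `difflib` for rapidfuzz so it runs without third-party installs.

```python
from difflib import SequenceMatcher

# Hypothetical data: these IDs and aliases are illustrative only.
CANONICAL_TEAMS = {
    "team_wnba_las": "Las Vegas Aces",
    "team_wnba_nyl": "New York Liberty",
    "team_wnba_sea": "Seattle Storm",
}
MANUAL_ALIASES = {
    "lva": "team_wnba_las",
    "ny liberty": "team_wnba_nyl",
}


def resolve_team(raw_name, threshold=0.8):
    """Return a canonical team ID for raw_name, or None if nothing is close enough."""
    key = raw_name.strip().lower()

    # 1. Manual aliases win outright: they encode human decisions.
    if key in MANUAL_ALIASES:
        return MANUAL_ALIASES[key]

    # 2. Otherwise fall back to the best fuzzy match above the threshold.
    best_id, best_score = None, 0.0
    for team_id, name in CANONICAL_TEAMS.items():
        score = SequenceMatcher(None, key, name.lower()).ratio()
        if score > best_score:
            best_id, best_score = team_id, score
    return best_id if best_score >= threshold else None
```

Checking aliases before fuzzy scoring keeps hand-curated mappings deterministic; the threshold only governs names that no alias covers.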

All 5 phases of the data remediation plan are complete:
- Phase 1: Alias fixes (team/stadium alias additions)
- Phase 2: NHL stadium coordinate fixes
- Phase 3: Re-scrape validation
- Phase 4: iOS bundle update
- Phase 5: Code quality improvements (WNBA aliases)
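
As a rough sketch of the orphan-reference check the alias validation performs, assuming aliases are stored as a mapping from alias text to canonical ID (the function name and sample data are hypothetical, not taken from the package):

```python
def find_orphan_aliases(aliases, canonical_ids):
    """Return (alias, target) pairs whose target is not a known canonical ID."""
    known = set(canonical_ids)
    return sorted(
        (alias, target) for alias, target in aliases.items() if target not in known
    )


# Hypothetical sample data for illustration only.
canonical_ids = {"stadium_nhl_td_garden", "stadium_nhl_ubs_arena"}
aliases = {
    "the garden": "stadium_nhl_td_garden",
    "ubs": "stadium_nhl_ubs_arena",
    "old barn": "stadium_nhl_nassau_coliseum",  # target is missing from canonical_ids
}

orphans = find_orphan_aliases(aliases, canonical_ids)
for alias, target in orphans:
    print(f"orphan alias {alias!r} -> {target!r}")
```

A script like this can exit non-zero when `orphans` is non-empty, so a stale alias fails validation before upload.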

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Trey t
2026-01-20 18:56:25 -06:00
parent ac78042a7e
commit 52d445bca4
76 changed files with 25065 additions and 0 deletions

requirements.txt Normal file

@@ -0,0 +1,15 @@
# Core dependencies
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=5.0.0
rapidfuzz>=3.5.0
python-dateutil>=2.8.0
pytz>=2024.1
rich>=13.7.0
pyjwt>=2.8.0
cryptography>=42.0.0

# Development dependencies
pytest>=8.0.0
pytest-cov>=4.1.0
responses>=0.25.0