feat(scripts): rewrite parser as modular Python CLI

Replace monolithic scraping scripts with sportstime_parser package:

- Multi-source scrapers with automatic fallback for 7 sports
- Canonical ID generation for games, teams, and stadiums
- Fuzzy matching with configurable thresholds for name resolution
- CloudKit Web Services uploader with JWT auth, diff-based updates
- Resumable uploads with checkpoint state persistence
- Validation reports with manual review items and suggested matches
- Comprehensive test suite (249 tests)
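
Illustrative sketch (not code from this commit) of threshold-based name
resolution with suggested matches for manual review. It assumes a 0-100
similarity scale and uses the standard library's difflib in place of
rapidfuzz so that it runs without extra dependencies. The names
MatchResult, similarity, and resolve_name, and the default threshold of
85, are hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Iterable, List, Optional, Tuple


@dataclass
class MatchResult:
    """Outcome of resolving one scraped name (hypothetical structure)."""
    query: str
    matched: Optional[str]  # resolved name, or None if below the threshold
    score: float            # best similarity found, on a 0-100 scale
    suggestions: List[Tuple[str, float]] = field(default_factory=list)


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib ratio * 100)."""
    return 100.0 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def resolve_name(
    query: str,
    candidates: Iterable[str],
    threshold: float = 85.0,   # assumed default, not taken from the commit
    max_suggestions: int = 3,
) -> MatchResult:
    """Return the best candidate if it clears the threshold.

    Otherwise leave the name unresolved and attach the top candidates
    as suggestions for a manual review item.
    """
    scored = sorted(
        ((name, similarity(query, name)) for name in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if not scored:
        return MatchResult(query, None, 0.0)

    best_name, best_score = scored[0]
    if best_score >= threshold:
        return MatchResult(query, best_name, best_score)
    return MatchResult(query, None, best_score, scored[:max_suggestions])
```

Calling resolve_name("Boston Celtic", ["Boston Celtics", "Los Angeles Lakers"])
resolves to "Boston Celtics"; a name with no close candidate comes back
with matched=None and a short suggestion list instead.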

CLI: sportstime-parser scrape|validate|upload|status|retry|clear
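
Illustrative sketch (not code from this commit) of how the six
subcommands above could be wired with argparse. Only the program name
and subcommand names come from this message; the --sport option and
the build_parser and main helpers are hypothetical.

```python
import argparse
from typing import List, Optional

# Subcommand names as listed in the commit message.
COMMANDS = ("scrape", "validate", "upload", "status", "retry", "clear")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one sub-parser per subcommand."""
    parser = argparse.ArgumentParser(prog="sportstime-parser")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        sub = subparsers.add_parser(name, help=f"run the {name} step")
        # Hypothetical option: the real CLI's flags are not shown in this commit.
        sub.add_argument("--sport", default="all", help="sport to operate on")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments and report which step would run (placeholder dispatch)."""
    args = build_parser().parse_args(argv)
    print(f"would run '{args.command}' for sport={args.sport}")
    return 0
```

For example, main(["scrape", "--sport", "nba"]) parses the arguments and
prints the step it would run; a real entry point would dispatch to the
package's own handlers instead of printing.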

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Trey t
2026-01-10 21:06:12 -06:00
parent 284a10d9e1
commit eeaf900e5a
109 changed files with 18415 additions and 266211 deletions


@@ -1,8 +1,15 @@
-# Sports Schedule Scraper Dependencies
-requests>=2.28.0
-beautifulsoup4>=4.11.0
-pandas>=2.0.0
-lxml>=4.9.0
+# Core dependencies
+requests>=2.31.0
+beautifulsoup4>=4.12.0
+lxml>=5.0.0
+rapidfuzz>=3.5.0
+python-dateutil>=2.8.0
+pytz>=2024.1
+rich>=13.7.0
+pyjwt>=2.8.0
+cryptography>=42.0.0

-# CloudKit Import (optional - only needed for cloudkit_import.py)
-cryptography>=41.0.0
+# Development dependencies
+pytest>=8.0.0
+pytest-cov>=4.1.0
+responses>=0.25.0