Commit Graph

3 Commits

Author SHA1 Message Date
Trey t
11c0ae70d2 docs(scripts): add comprehensive README for data scraping pipeline
Documents the complete sportstime_parser package including architecture,
multi-source scraping, name normalization with aliases, CloudKit uploads,
and workflows for manual review and adding new sports.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 13:22:33 -06:00
Trey t
eeaf900e5a feat(scripts): rewrite parser as modular Python CLI
Replace monolithic scraping scripts with sportstime_parser package:

- Multi-source scrapers with automatic fallback for 7 sports
- Canonical ID generation for games, teams, and stadiums
- Fuzzy matching with configurable thresholds for name resolution
- CloudKit Web Services uploader with JWT auth, diff-based updates
- Resumable uploads with checkpoint state persistence
- Validation reports with manual review items and suggested matches
- Comprehensive test suite (249 tests)

CLI: sportstime-parser scrape|validate|upload|status|retry|clear

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 21:06:12 -06:00
Trey t
d9f446bccb docs(07-01): create Scripts/README.md with pipeline documentation
- Overview and quick start commands
- ASCII architecture diagram showing data flow
- Module reference table for all Python scripts
- Sport modules table with stadium counts
- Data files and alias file documentation
- Pipeline commands for scraping, canonicalization, CloudKit

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 10:42:47 -06:00
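
Note on eeaf900e5a: that commit lists "fuzzy matching with configurable thresholds for name resolution" without showing how it works. The following is a minimal sketch of the general technique, not the sportstime_parser implementation. It assumes the standard-library `difflib.SequenceMatcher` as the similarity measure; the function names `similarity` and `resolve_name`, the default threshold of 0.85, and the sample stadium names are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def resolve_name(raw, canonical_names, threshold=0.85):
    """Find the closest canonical name for a scraped name (hypothetical helper).

    Returns (name, score) when the best score meets the threshold,
    otherwise (None, score) so the caller can queue the item for manual review.
    """
    best_name, best_score = None, 0.0
    for name in canonical_names:
        score = similarity(raw, name)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


if __name__ == "__main__":
    stadiums = ["Fenway Park", "Wrigley Field"]  # sample data, not from the repo
    print(resolve_name("Fenway Pk", stadiums))
    print(resolve_name("Fenway Pk", stadiums, threshold=0.95))
```

Returning the score even when no match clears the threshold lets a validation report show the closest candidate as a suggested match, which is one way the "manual review items and suggested matches" in the same commit could be fed.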