feat(scripts): add sportstime-parser data pipeline

Complete Python package for scraping, normalizing, and uploading
sports schedule data to CloudKit. Includes:

- Multi-source scrapers for NBA, MLB, NFL, NHL, MLS, WNBA, NWSL
- Canonical ID system for teams, stadiums, and games
- Fuzzy matching with manual alias support
- CloudKit uploader with batch operations and deduplication
- Comprehensive test suite with fixtures
- WNBA abbreviation aliases for improved team resolution
- Alias validation script to detect orphan references
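The canonical-ID, alias, and fuzzy-matching items above can be pictured as a two-step lookup. This is a minimal sketch under stated assumptions. The canonical ID format (`team_<league>_<abbr>`), the sample data, the `cutoff` value, and the helper names `resolve_team` and `find_orphan_aliases` are hypothetical and are not taken from the package. In the sketch, manual aliases are checked first, `difflib.get_close_matches` provides the fuzzy fallback, and an orphan check flags aliases whose target ID has no canonical entry.

```python
"""Hypothetical sketch: alias-first, fuzzy-fallback team resolution.

The canonical ID format ("team_<league>_<abbr>") and the data below are
illustrative assumptions, not sportstime-parser's actual scheme.
"""
from difflib import get_close_matches
from typing import Optional

# Canonical team IDs mapped to display names (illustrative data only).
CANONICAL_TEAMS: dict[str, str] = {
    "team_wnba_lva": "Las Vegas Aces",
    "team_wnba_ny": "New York Liberty",
    "team_nba_bos": "Boston Celtics",
}

# Manual aliases: normalized raw source strings -> canonical ID (illustrative).
MANUAL_ALIASES: dict[str, str] = {
    "lv": "team_wnba_lva",
    "lva": "team_wnba_lva",
    "nyl": "team_wnba_ny",
    "sea storm": "team_wnba_sea",  # deliberately orphaned: no canonical entry
}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(name.lower().split())


def resolve_team(raw: str, cutoff: float = 0.85) -> Optional[str]:
    """Return a canonical team ID for a raw name, or None if unresolved."""
    key = normalize(raw)
    # 1. Manual aliases win: they encode known, reviewed mappings.
    #    (An orphaned alias is returned as-is; find_orphan_aliases catches it.)
    if key in MANUAL_ALIASES:
        return MANUAL_ALIASES[key]
    # 2. Fall back to fuzzy matching against normalized display names.
    by_name = {normalize(n): team_id for team_id, n in CANONICAL_TEAMS.items()}
    matches = get_close_matches(key, list(by_name), n=1, cutoff=cutoff)
    return by_name[matches[0]] if matches else None


def find_orphan_aliases() -> list[str]:
    """List aliases whose target ID is missing from the canonical table."""
    return sorted(a for a, t in MANUAL_ALIASES.items() if t not in CANONICAL_TEAMS)
```

In this sketch, checking aliases before fuzzy matching keeps short abbreviations such as `LV` authoritative, because a two- or three-letter code rarely scores above the cutoff against a full team name.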

All five phases of the data remediation plan are complete:
- Phase 1: Alias fixes (team/stadium alias additions)
- Phase 2: NHL stadium coordinate fixes
- Phase 3: Re-scrape validation
- Phase 4: iOS bundle update
- Phase 5: Code quality improvements (WNBA aliases)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Trey t
2026-01-20 18:56:25 -06:00
parent ac78042a7e
commit 52d445bca4
76 changed files with 25065 additions and 0 deletions

@@ -0,0 +1,58 @@
"""Utility modules for sportstime-parser."""

from .logging import (
    get_console,
    get_logger,
    is_verbose,
    log_error,
    log_failure,
    log_game,
    log_stadium,
    log_success,
    log_team,
    log_warning,
    set_verbose,
)
from .http import (
    RateLimitedSession,
    get_session,
    fetch_url,
    fetch_json,
    fetch_html,
)
from .progress import (
    create_progress,
    create_spinner_progress,
    progress_bar,
    track_progress,
    ProgressTracker,
    ScrapeProgress,
)

__all__ = [
    # Logging
    "get_console",
    "get_logger",
    "is_verbose",
    "log_error",
    "log_failure",
    "log_game",
    "log_stadium",
    "log_success",
    "log_team",
    "log_warning",
    "set_verbose",
    # HTTP
    "RateLimitedSession",
    "get_session",
    "fetch_url",
    "fetch_json",
    "fetch_html",
    # Progress
    "create_progress",
    "create_spinner_progress",
    "progress_bar",
    "track_progress",
    "ProgressTracker",
    "ScrapeProgress",
]
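The module re-exports a `RateLimitedSession` from `.http`, but that class's constructor and behavior are not part of this diff. As a hedged illustration only, this sketch shows one common way to space out scraper requests: a per-host minimum interval. The class name `MinIntervalLimiter` and its `min_interval`, `clock`, and `sleep` parameters are hypothetical and are not the package's API.

```python
"""Hypothetical sketch of per-host request spacing.

Not the package's RateLimitedSession: its constructor and behavior are not
shown in this diff. `min_interval`, `clock`, and `sleep` are assumptions.
"""
import time
from typing import Callable
from urllib.parse import urlparse


class MinIntervalLimiter:
    """Enforce at least `min_interval` seconds between requests to one host."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: dict[str, float] = {}  # host -> time its last request was allowed

    def wait(self, url: str) -> float:
        """Block until a request to url's host is allowed; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            # Sleep only for whatever remains of the interval since the last request.
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

Injecting `clock` and `sleep` lets tests drive the limiter with a fake clock instead of real delays, and keying on the host means throttling one source does not slow requests to another.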