- Add ET timezone (America/New_York) to all Sports-Reference scrapers:
- NBA: Basketball-Reference times parsed as ET
- NFL: Pro-Football-Reference times parsed as ET
- NHL: Hockey-Reference times parsed as ET
- MLB: Baseball-Reference times parsed as ET
- Document source timezones in scraper docstrings
- Add 11 new stadiums to STADIUM_MAPPINGS:
- NFL: 5 international venues (Corinthians Arena, Croke Park,
Olympic Stadium Berlin, Santiago Bernabéu, Tom Benson Hall of Fame)
- MLS: 4 alternate venues (Miami Freedom Park, Citi Field,
LA Memorial Coliseum, M&T Bank Stadium)
- NWSL: 2 alternate venues (Northwestern Medicine Field, ONE Spokane)
- Add 15 stadium aliases for MLS/NWSL team-based lookups
- Fix CanonicalSyncService to sync timezone identifier to SwiftData
- Update debug logging to use stadium timezone for display
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added explicit timezones for ~100 stadiums that were defaulting to America/New_York
- Coverage by timezone:
- America/Chicago: TX, IL, MO, MN, WI, OK, TN, LA stadiums
- America/Los_Angeles: CA, WA, OR, NV stadiums
- America/Denver: CO, UT stadiums
- America/Phoenix: AZ stadiums (no DST)
- America/Toronto: ON, QC stadiums
- America/Vancouver: BC stadiums
- America/Edmonton: AB stadiums
- America/Winnipeg: MB stadiums
- All 7 sports leagues now have correct timezone data for scrapers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stadium timezones were always null because scrapers weren't passing
the timezone from STADIUM_MAPPINGS to the Stadium constructor. This
fix propagates timezone data through the entire pipeline: scrapers,
CloudKit uploader, and Swift CloudKit model.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Scripts changes:
- Add WNBA abbreviation aliases to team_resolver.py
- Fix NHL stadium coordinates in stadium_resolver.py
- Add validate_aliases.py script for orphan detection
- Update scrapers with improved error handling
- Add DATA_AUDIT.md and REMEDIATION_PLAN.md documentation
- Update alias JSON files with new mappings
iOS bundle updates:
- Update games_canonical.json with latest scraped data
- Update teams_canonical.json and stadiums_canonical.json
- Sync alias files with Scripts versions
All 5 remediation phases complete.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents the complete sportstime_parser package including architecture,
multi-source scraping, name normalization with aliases, CloudKit uploads,
and workflows for manual review and adding new sports.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
NBA scraper now resolves old stadium name to current Spurs arena.
The AT&T Center was renamed to Frost Bank Center in July 2024.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds timeZoneIdentifier to Stadium model and localGameTime/localGameTimeShort
computed properties to RichGame. Game times can now display in venue local
timezone. Also adds timezone field to Python StadiumInfo dataclass with
example entries for Pacific, Central, Mountain, and Toronto timezones.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace monolithic scraping scripts with sportstime_parser package:
- Multi-source scrapers with automatic fallback for 7 sports
- Canonical ID generation for games, teams, and stadiums
- Fuzzy matching with configurable thresholds for name resolution
- CloudKit Web Services uploader with JWT auth, diff-based updates
- Resumable uploads with checkpoint state persistence
- Validation reports with manual review items and suggested matches
- Comprehensive test suite (249 tests)
CLI: sportstime-parser scrape|validate|upload|status|retry|clear
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CBB (College Basketball) was deferred in Phase 2.1 due to 350+ D1 teams
requiring a separate scoped approach. Remove it from pipeline scripts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
After Phase 1 refactoring moved scraper functions to sport-specific
modules (nba.py, mlb.py, etc.), these pipeline scripts still imported
from scrape_schedules.py.
- run_pipeline.py: import from core.py and sport modules
- validate_data.py: import from core.py and sport modules
- run_canonicalization_pipeline.py: import from core.py and sport modules
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Overview and quick start commands
- ASCII architecture diagram showing data flow
- Module reference table for all Python scripts
- Sport modules table with stadium counts
- Data files and alias file documentation
- Pipeline commands for scraping, canonicalization, CloudKit
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --list-orphans flag with orphan detection by record type,
data completeness metrics (coordinates, capacity, team/stadium refs),
health score calculation (0-100), and actionable recommendations.
Includes JSON export and menu option 17.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --validate flag with local validation, CloudKit relationship
checking, and sync status comparison. Includes JSON export via
--output flag and menu option 16 for interactive mode.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add commands for managing individual CloudKit records:
- --get TYPE ID: Retrieve and display single record
- --list TYPE [--count]: List all recordNames for a type
- --update-record TYPE ID FIELD=VALUE: Update fields with conflict handling
- --delete-record TYPE ID [--force]: Delete with confirmation
Features:
- Type validation against VALID_RECORD_TYPES
- Triple lookup fallback: direct -> deterministic UUID -> canonicalId query
- Automatic type parsing for numeric field values
- Conflict detection with automatic forceReplace retry
- Deletion confirmation (skip with --force)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --verify flag for quick verification (counts + 5-record spot-check)
- Add --verify-deep flag for full field-by-field comparison
- Add verify_sync() function to compare CloudKit vs local data
- Add lookup() method to CloudKit class for record lookups
- Add menu options 14-15 for verify sync quick/deep
- sync_diff() for differential uploads
- update operation with recordChangeTag conflict handling
- --smart-sync and --delete-orphans flags
- Menu options 12-13 for smart sync
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- query_all() method with pagination
- compute_diff() returns new/updated/unchanged/deleted
- --diff flag shows report without importing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Generate canonical games with team/stadium links for 5760 games across
NBA, MLB, NHL, NFL, and MLS.
Added missing team aliases:
- NFL WSH -> team_nfl_was (Washington Commanders)
- MLS NY -> team_mls_nyrb (NY Red Bulls)
- MLS ATX -> team_mls_aus (Austin FC)
Remaining 8 warnings are expected NFL playoff placeholders (TBD/AFC/NFC).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Import WNBA_TEAMS from wnba module
- Add WNBA_DIVISIONS dict (single league structure, no divisions)
- Add WNBA to sport_mappings for team canonicalization
- Update arena_key to use 'arena' for WNBA (like NBA/NHL)
- Add WNBA team abbreviation aliases (LV, LAS, NYL, PHX, etc.)
- Add WNBA stadium aliases (Michelob Ultra Arena, Gateway Center, etc.)
Total teams: 167 (13 WNBA teams added)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add NFL entries to HISTORICAL_STADIUM_ALIASES dict:
- Caesars Superdome (Mercedes-Benz, Louisiana Superdome)
- Paycor Stadium (Paul Brown Stadium)
- Empower Field at Mile High (Broncos Stadium, Sports Authority, Invesco, Mile High)
- Acrisure Stadium (Heinz Field)
- EverBank Stadium (TIAA Bank, Alltel, Jacksonville Municipal)
- Northwest Stadium (FedExField, Jack Kent Cooke)
- Hard Rock Stadium (Sun Life, Land Shark, Dolphin, Pro Player, Joe Robbie)
- Highmark Stadium (Bills Stadium, New Era, Ralph Wilson, Rich Stadium)
- GEHA Field at Arrowhead Stadium (Arrowhead Stadium)
- AT&T Stadium (Cowboys Stadium)
- Lumen Field (CenturyLink, Qwest, Seahawks Stadium)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add NFL entries to TEAM_ABBREV_ALIASES dict:
- Historical relocations: OAK→LV, SD→LAC, STL→LAR
- Common 3-letter variations: JAC, GNB, KAN, NWE, NOR, TAM, SFO
- Direct match for WAS included for completeness
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add NFL support to canonicalize_teams.py:
- Import NFL_TEAMS from scrape_schedules
- Add NFL_DIVISIONS dict with all 32 teams mapped to conference/division
- Include NFL in sport_mappings for canonicalization
- Add NFL_DIVISIONS to division_map lookup
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update scrape_schedules.py to import NWSL stadium functionality from nwsl.py:
- Add import for NWSL_TEAMS, get_nwsl_team_abbrev, scrape_nwsl_stadiums
- Remove inline NWSL_TEAMS dict (now imported from nwsl.py)
- Remove stub scrape_nwsl_stadiums function (now using module implementation)
- Update docstrings and comments to reflect module structure
Stadium scraping now uses modules for all secondary sports:
- MLS: 30 stadiums from mls.py
- WNBA: 13 arenas from wnba.py
- NWSL: 13 stadiums from nwsl.py
Only CBB remains inline (350+ D1 teams requires separate scoped phase).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Create nwsl.py following the established sport module pattern:
- 13 NWSL teams matching current 2025 season roster
- All 13 stadiums with complete data (capacity, year_opened, coordinates)
- Cross-referenced MLS coordinates for shared stadiums (10 shared with MLS)
- 3 NWSL-specific stadiums: SeatGeek Stadium, CPKC Stadium, WakeMed Soccer Park
Module exports:
- NWSL_TEAMS dict
- get_nwsl_team_abbrev() function
- scrape_nwsl_stadiums_hardcoded() function
- scrape_nwsl_stadiums() function with fallback system
- NWSL_STADIUM_SOURCES configuration
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add complete MLS stadium data following established sport module pattern:
- 30 MLS stadiums with capacity (soccer configuration) and year_opened
- MLS_TEAMS dict with all 30 teams
- get_mls_team_abbrev() function for team abbreviation lookup
- scrape_mls_stadiums_hardcoded() as primary source
- scrape_mls_stadiums_gavinr() as fallback source
- MLS_STADIUM_SOURCES configuration for fallback system
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Ran scrape_schedules.py --stadiums-update
- Ran canonicalize_stadiums.py for canonical IDs
- Core sports: MLB:30, NBA:30, NHL:32, NFL:30 (122 total)
- MLS stadiums also included from comprehensive scrape (30)
- Stadium aliases generated for historical name mappings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added year_opened field to stadium data across all 4 sport modules:
- MLB: 30 ballparks (1912-2023)
- NBA: 30 arenas (1968-2024)
- NHL: 32 arenas (1968-2021)
- NFL: 30 stadiums (1924-2020)
Updated Stadium object creation in all modules to pass year_opened.
Stadium dataclass already supported the field.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract NFL scrapers from monolithic scrape_schedules.py into dedicated
sport module following established pattern from nba.py/nhl.py:
- NFL_TEAMS: 32 teams with stadiums
- Game scrapers: ESPN API, Pro-Football-Reference, CBS Sports
- Stadium scrapers: ScoreBot, GeoJSON gist, hardcoded fallback
- NFL_GAME_SOURCES and NFL_STADIUM_SOURCES configurations
- get_nfl_season_string() for cross-calendar-year format (2025-26)
- scrape_nfl_games() convenience function with fallback
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- MLB_TEAMS dictionary with all 30 teams
- Game scrapers: Baseball-Reference, MLB Stats API, ESPN
- Stadium scrapers: MLBScoreBot, GeoJSON, hardcoded fallback
- MLB_GAME_SOURCES and MLB_STADIUM_SOURCES configurations
- scrape_mlb_games() convenience function
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add RegionMapSelector UI for geographic trip filtering (East/Central/West)
- Add RouteFilters module for allowRepeatCities preference
- Improve GameDAGRouter to preserve route length diversity
- Routes now grouped by city count before scoring
- Ensures 2-city trips appear alongside longer trips
- Increased beam width and max options for better coverage
- Add TripOptionsView filters (max cities slider, pace filter)
- Remove TravelStyle section from trip creation (replaced by region selector)
- Clean up debug logging from DataProvider and ScenarioAPlanner
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>