- Added MILESTONES.md entry with key accomplishments
- Evolved PROJECT.md with validated requirements
- Reorganized ROADMAP.md with milestone grouping
- Created milestone archive: milestones/v1.0-ROADMAP.md
- Updated STATE.md for next milestone planning
- Tagged v1.0
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CBB (College Basketball) was deferred in Phase 2.1 because its 350+ D1 teams
require a separately scoped approach. Remove it from the pipeline scripts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
After Phase 1 refactoring moved scraper functions to sport-specific
modules (nba.py, mlb.py, etc.), these pipeline scripts still imported
from scrape_schedules.py.
- run_pipeline.py: import from core.py and sport modules
- validate_data.py: import from core.py and sport modules
- run_canonicalization_pipeline.py: import from core.py and sport modules
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tasks completed: 2/2
- Create Scripts/README.md with pipeline documentation
- Update PROJECT.md with completion status
SUMMARY: .planning/phases/07-testing-documentation/07-01-SUMMARY.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Mark all Active requirements as complete (7 items)
- Update Key Decisions outcomes (split by sport, validation reports, full CRUD)
- Update Current State to reflect resolved data quality and complete pipeline
- Update last updated date to 2026-01-10
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Overview and quick start commands
- ASCII architecture diagram showing data flow
- Module reference table for all Python scripts
- Sport modules table with stadium counts
- Data files and alias file documentation
- Pipeline commands for scraping, canonicalization, CloudKit
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 7: Testing & Documentation
- 1 plan created
- 2 tasks defined (README.md, PROJECT.md updates)
- Ready for execution
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add SUMMARY.md documenting validation capabilities:
- --validate flag with local/CloudKit/sync validation
- --list-orphans flag with completeness metrics and health score
- Menu options 16-17 for interactive mode
Update STATE.md: Phase 6 complete (14/14 plans, 100%)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --list-orphans flag with orphan detection by record type,
data completeness metrics (coordinates, capacity, team/stadium refs),
health score calculation (0-100), and actionable recommendations.
Includes JSON export and menu option 17.
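A minimal sketch of how per-field completeness and a 0-100 health score could be computed. It assumes records are plain dicts. The field names in REQUIRED_FIELDS and the scoring formula (rounded mean of field completeness) are hypothetical; this commit does not specify the script's actual formula.

```python
from typing import Any

# Hypothetical per-type field lists; the real script's checks may differ.
REQUIRED_FIELDS = {
    "Stadium": ["latitude", "longitude", "capacity"],
    "Game": ["homeTeamRef", "awayTeamRef", "stadiumRef"],
}


def is_populated(value: Any) -> bool:
    """Treat None, empty strings, and zero (e.g. capacity=0) as missing."""
    return value not in (None, "", 0)


def completeness(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Percentage of records that have each field populated."""
    if not records:
        return {f: 100.0 for f in fields}
    return {
        f: 100.0 * sum(is_populated(r.get(f)) for r in records) / len(records)
        for f in fields
    }


def health_score(metrics: dict[str, float]) -> int:
    """Hypothetical 0-100 score: the rounded mean of field completeness."""
    if not metrics:
        return 100
    return round(sum(metrics.values()) / len(metrics))


def report(record_type: str, records: list[dict[str, Any]]) -> tuple[dict[str, float], int]:
    metrics = completeness(records, REQUIRED_FIELDS[record_type])
    return metrics, health_score(metrics)
```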
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --validate flag with local validation, CloudKit relationship
checking, and sync status comparison. Includes JSON export via
--output flag and menu option 16 for interactive mode.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add commands for managing individual CloudKit records:
- --get TYPE ID: Retrieve and display single record
- --list TYPE [--count]: List all recordNames for a type
- --update-record TYPE ID FIELD=VALUE: Update fields with conflict handling
- --delete-record TYPE ID [--force]: Delete with confirmation
Features:
- Type validation against VALID_RECORD_TYPES
- Triple lookup fallback: direct -> deterministic UUID -> canonicalId query
- Automatic type parsing for numeric field values
- Conflict detection with automatic forceReplace retry
- Deletion confirmation (skip with --force)
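A minimal sketch of the triple lookup fallback and FIELD=VALUE parsing. It assumes the CloudKit fetch and query calls are passed in as callables. The UUID namespace and the helper names (find_record, parse_assignment) are hypothetical, because this commit does not show how the deterministic UUID is derived.

```python
import uuid
from typing import Any, Callable, Optional

Record = dict[str, Any]

# Hypothetical namespace; the script's real deterministic-UUID scheme is not shown.
RECORD_NAMESPACE = uuid.NAMESPACE_URL


def deterministic_uuid(record_id: str) -> str:
    """Map a canonical ID such as 'team_nfl_was' to a stable UUID string."""
    return str(uuid.uuid5(RECORD_NAMESPACE, record_id))


def find_record(
    record_id: str,
    fetch_by_name: Callable[[str], Optional[Record]],
    query_by_canonical_id: Callable[[str], Optional[Record]],
) -> Optional[Record]:
    # 1. Direct: the ID may already be the CloudKit recordName.
    record = fetch_by_name(record_id)
    if record is not None:
        return record
    # 2. Deterministic UUID derived from the canonical ID.
    record = fetch_by_name(deterministic_uuid(record_id))
    if record is not None:
        return record
    # 3. Last resort: query on the canonicalId field.
    return query_by_canonical_id(record_id)


def parse_field_value(raw: str) -> Any:
    """Return an int or float when the value looks numeric, else the raw string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_assignment(arg: str) -> tuple[str, Any]:
    """Split a FIELD=VALUE argument and type-convert the value."""
    field, sep, value = arg.partition("=")
    if not sep or not field:
        raise ValueError(f"expected FIELD=VALUE, got {arg!r}")
    return field, parse_field_value(value)
```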
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --verify flag for quick verification (counts + 5-record spot-check)
- Add --verify-deep flag for full field-by-field comparison
- Add verify_sync() function to compare CloudKit vs local data
- Add lookup() method to CloudKit class for record lookups
- Add menu options 14-15 for verify sync quick/deep
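A rough sketch of the two verification levels. It assumes both sides are already loaded as dicts keyed by recordName, with only user fields present. The function names and message formats are illustrative and are not the script's verify_sync() signature.

```python
import random
from typing import Any, Optional

Records = dict[str, dict[str, Any]]


def verify_quick(local: Records, remote: Records, sample_size: int = 5,
                 seed: Optional[int] = None) -> list[str]:
    """Compare record counts, then spot-check a random sample of records."""
    problems = []
    if len(local) != len(remote):
        problems.append(f"count mismatch: local={len(local)} cloudkit={len(remote)}")
    rng = random.Random(seed)
    for key in rng.sample(sorted(local), min(sample_size, len(local))):
        remote_rec = remote.get(key)
        if remote_rec is None:
            problems.append(f"{key}: missing in CloudKit")
        elif remote_rec != local[key]:
            problems.append(f"{key}: fields differ")
    return problems


def verify_deep(local: Records, remote: Records) -> list[str]:
    """Compare every record field by field."""
    problems = []
    for key in sorted(local.keys() | remote.keys()):
        a, b = local.get(key), remote.get(key)
        if a is None:
            problems.append(f"{key}: only in CloudKit")
        elif b is None:
            problems.append(f"{key}: missing in CloudKit")
        else:
            for field in sorted(a.keys() | b.keys()):
                if a.get(field) != b.get(field):
                    problems.append(f"{key}.{field}: {a.get(field)!r} != {b.get(field)!r}")
    return problems
```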
- sync_diff() for differential uploads
- update operation with recordChangeTag conflict handling
- --smart-sync and --delete-orphans flags
- Menu options 12-13 for smart sync
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- query_all() method with pagination
- compute_diff() returns new/updated/unchanged/deleted
- --diff flag shows report without importing
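A minimal sketch of paginated querying and the four-way diff. It assumes each page is a CloudKit Web Services query response, which carries a `continuationMarker` while more results remain. It also assumes remote records have been reduced to the same user fields as the local ones, so that system fields such as recordChangeTag do not count as changes.

```python
from typing import Any, Callable, Optional

Records = dict[str, dict[str, Any]]


def query_all(fetch_page: Callable[[Optional[str]], dict]) -> list[dict]:
    """Follow continuation markers until the last page is reached."""
    records: list[dict] = []
    marker: Optional[str] = None
    while True:
        page = fetch_page(marker)
        records.extend(page.get("records", []))
        marker = page.get("continuationMarker")
        if not marker:
            return records


def compute_diff(local: Records, remote: Records) -> dict[str, list[str]]:
    """Classify recordNames as new, updated, unchanged, or deleted."""
    diff: dict[str, list[str]] = {"new": [], "updated": [], "unchanged": [], "deleted": []}
    for name, fields in local.items():
        if name not in remote:
            diff["new"].append(name)
        elif remote[name] != fields:
            diff["updated"].append(name)
        else:
            diff["unchanged"].append(name)
    # Present in CloudKit but no longer in local data.
    diff["deleted"] = [name for name in remote if name not in local]
    return diff
```

With this shape, a differential upload only has to send `new` and `updated`. `deleted` holds the orphan candidates that a flag like --delete-orphans would act on.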
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 5: CloudKit CRUD
- 2 plans created
- 4 total tasks defined
- Ready for execution
Plan 05-01: Smart sync with change detection
- Change detection with diff reporting
- Differential sync (upload only changed records)
Plan 05-02: Verification and record management
- Sync verification (CloudKit vs local comparison)
- Individual record CRUD operations
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Generate canonical games with team/stadium links for 5760 games across
NBA, MLB, NHL, NFL, and MLS.
Added missing team aliases:
- NFL WSH -> team_nfl_was (Washington Commanders)
- MLS NY -> team_mls_nyrb (NY Red Bulls)
- MLS ATX -> team_mls_aus (Austin FC)
Remaining 8 warnings are expected NFL playoff placeholders (TBD/AFC/NFC).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Import WNBA_TEAMS from wnba module
- Add WNBA_DIVISIONS dict (single league structure, no divisions)
- Add WNBA to sport_mappings for team canonicalization
- Update arena_key to use 'arena' for WNBA (like NBA/NHL)
- Add WNBA team abbreviation aliases (LV, LAS, NYL, PHX, etc.)
- Add WNBA stadium aliases (Michelob Ultra Arena, Gateway Center, etc.)
Total teams: 167 (13 WNBA teams added)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add NFL entries to HISTORICAL_STADIUM_ALIASES dict:
- Caesars Superdome (Mercedes-Benz, Louisiana Superdome)
- Paycor Stadium (Paul Brown Stadium)
- Empower Field at Mile High (Broncos Stadium, Sports Authority, Invesco, Mile High)
- Acrisure Stadium (Heinz Field)
- EverBank Stadium (TIAA Bank, Alltel, Jacksonville Municipal)
- Northwest Stadium (FedExField, Jack Kent Cooke)
- Hard Rock Stadium (Sun Life, Land Shark, Dolphin, Pro Player, Joe Robbie)
- Highmark Stadium (Bills Stadium, New Era, Ralph Wilson, Rich Stadium)
- GEHA Field at Arrowhead Stadium (Arrowhead Stadium)
- AT&T Stadium (Cowboys Stadium)
- Lumen Field (CenturyLink, Qwest, Seahawks Stadium)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add NFL entries to TEAM_ABBREV_ALIASES dict:
- Historical relocations: OAK→LV, SD→LAC, STL→LAR
- Common 3-letter variations: JAC, GNB, KAN, NWE, NOR, TAM, SFO
- Direct match for WAS included for completeness
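A minimal sketch of resolving an abbreviation through the alias table to a canonical team ID. It assumes aliases are keyed by (sport, abbreviation) and that IDs follow the `team_<sport>_<abbrev>` pattern seen in IDs like team_nfl_was. The GNB/KAN targets and the key shape are illustrative assumptions, not the dict's actual contents.

```python
# Assumed shape: (sport, alias) -> canonical abbreviation.
TEAM_ABBREV_ALIASES = {
    ("NFL", "OAK"): "LV",   # relocation
    ("NFL", "SD"): "LAC",   # relocation
    ("NFL", "STL"): "LAR",  # relocation
    ("NFL", "GNB"): "GB",   # assumed canonical target
    ("NFL", "KAN"): "KC",   # assumed canonical target
    ("NFL", "WSH"): "WAS",
    ("NFL", "WAS"): "WAS",  # direct match
}


def canonical_team_id(sport: str, abbrev: str) -> str:
    """Apply the alias table (if any) and build a team_<sport>_<abbrev> ID."""
    key = (sport.upper(), abbrev.strip().upper())
    canonical = TEAM_ABBREV_ALIASES.get(key, key[1])
    return f"team_{sport.lower()}_{canonical.lower()}"
```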
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add NFL support to canonicalize_teams.py:
- Import NFL_TEAMS from scrape_schedules
- Add NFL_DIVISIONS dict with all 32 teams mapped to conference/division
- Include NFL in sport_mappings for canonicalization
- Add NFL_DIVISIONS to division_map lookup
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Current focus: Phase 3 - Alias Systems
- Phase planned, ready for execution
- Next action: Execute 03-01-PLAN.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 03: Alias Systems
- 2 plans created
- 6 total tasks defined
- Ready for execution
Plan 1: Add NFL to canonicalization pipeline with aliases
Plan 2: Add MLS, WNBA, NWSL to canonicalization pipeline
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update scrape_schedules.py to import NWSL stadium functionality from nwsl.py:
- Add import for NWSL_TEAMS, get_nwsl_team_abbrev, scrape_nwsl_stadiums
- Remove inline NWSL_TEAMS dict (now imported from nwsl.py)
- Remove stub scrape_nwsl_stadiums function (now using module implementation)
- Update docstrings and comments to reflect module structure
Stadium scraping now uses modules for all secondary sports:
- MLS: 30 stadiums from mls.py
- WNBA: 13 arenas from wnba.py
- NWSL: 13 stadiums from nwsl.py
Only CBB remains inline (its 350+ D1 teams require a separately scoped phase).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Create nwsl.py following the established sport module pattern:
- 13 NWSL teams matching current 2025 season roster
- All 13 stadiums with complete data (capacity, year_opened, coordinates)
- Cross-referenced MLS coordinates for shared stadiums (10 shared with MLS)
- 3 NWSL-specific stadiums: SeatGeek Stadium, CPKC Stadium, WakeMed Soccer Park
Module exports:
- NWSL_TEAMS dict
- get_nwsl_team_abbrev() function
- scrape_nwsl_stadiums_hardcoded() function
- scrape_nwsl_stadiums() function with fallback system
- NWSL_STADIUM_SOURCES configuration
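A rough sketch of a source fallback loop in the spirit of NWSL_STADIUM_SOURCES. It assumes the configuration is an ordered list of (name, scraper) pairs, with the hardcoded source first, and that a source counts as failed when it raises or returns too few stadiums. The real configuration shape is not shown in this commit, and scrape_with_fallback is a hypothetical name.

```python
from typing import Callable

StadiumList = list[dict]


def scrape_with_fallback(
    sources: list[tuple[str, Callable[[], StadiumList]]],
    min_count: int = 1,
) -> tuple[str, StadiumList]:
    """Try each source in order; return the first that yields enough stadiums."""
    errors = []
    for name, scrape in sources:
        try:
            stadiums = scrape()
        except Exception as exc:  # network or parsing failure: try the next source
            errors.append(f"{name}: {exc}")
            continue
        if len(stadiums) >= min_count:
            return name, stadiums
        errors.append(f"{name}: only {len(stadiums)} stadiums")
    raise RuntimeError("all stadium sources failed: " + "; ".join(errors))
```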
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tasks completed: 2/2
- Create MLS sport module with 30 hardcoded stadiums
- Integrate MLS module with scrape_schedules.py
SUMMARY: .planning/phases/2.1-add-stadium-data-mls-wnba-nwsl-cbb/02.1-01-SUMMARY.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add complete MLS stadium data following established sport module pattern:
- 30 MLS stadiums with capacity (soccer configuration) and year_opened
- MLS_TEAMS dict with all 30 teams
- get_mls_team_abbrev() function for team abbreviation lookup
- scrape_mls_stadiums_hardcoded() as primary source
- scrape_mls_stadiums_gavinr() as fallback source
- MLS_STADIUM_SOURCES configuration for fallback system
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add 02-02-SUMMARY.md documenting pipeline regeneration
- Update STATE.md: Phase 2 complete, next is Phase 2.1
- Update ROADMAP.md: Mark Phase 2 as complete (2/2 plans)
- Performance: 5 plans, 37 min total, 7.4 min average
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Filter bundled JSON to core 4 sports only (152 → 122 stadiums)
- Exclude MLS stadiums (incomplete data, deferred to Phase 2.1)
- Filter aliases to match (200 → 165 aliases)
- All fields populated: no empty state, zero capacity, or null year
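A minimal sketch of the filtering and field checks described above. It assumes stadium and alias entries are dicts with `sport`, `canonical_id`, and `stadium_canonical_id` keys. Those key names are hypothetical, since the bundled JSON schema is not shown here.

```python
from typing import Any

CORE_SPORTS = {"MLB", "NBA", "NHL", "NFL"}


def filter_core(stadiums: list[dict[str, Any]],
                aliases: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Keep core-sport stadiums, then only the aliases that point at them."""
    kept = [s for s in stadiums if s["sport"] in CORE_SPORTS]
    kept_ids = {s["canonical_id"] for s in kept}
    kept_aliases = [a for a in aliases if a["stadium_canonical_id"] in kept_ids]
    return kept, kept_aliases


def missing_fields(stadium: dict[str, Any]) -> list[str]:
    """Report an empty state, zero/absent capacity, or a null year_opened."""
    problems = []
    if not stadium.get("state"):
        problems.append("state")
    if not stadium.get("capacity"):
        problems.append("capacity")
    if stadium.get("year_opened") is None:
        problems.append("year_opened")
    return problems
```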
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Ran scrape_schedules.py --stadiums-update
- Ran canonicalize_stadiums.py for canonical IDs
- Core sports: MLB:30, NBA:30, NHL:32, NFL:30 (122 total)
- MLS stadiums also included from comprehensive scrape (30)
- Stadium aliases generated for historical name mappings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added year_opened field to stadium data across all 4 sport modules:
- MLB: 30 ballparks (1912-2023)
- NBA: 30 arenas (1968-2024)
- NHL: 32 arenas (1968-2021)
- NFL: 30 stadiums (1924-2020)
Updated Stadium object creation in all modules to pass year_opened.
Stadium dataclass already supported the field.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Phase 2: Stadium Foundation
- 2 plans created
- 5 total tasks defined
- Ready for execution
Plan 02-01: Audit & complete hardcoded stadium data
Plan 02-02: Regenerate canonical data and verify pipeline
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>