wip
@@ -29,375 +29,3 @@ read docs/TEST_PLAN.md in full
Do not write code until this summary is complete.
```

```
You are a senior iOS engineer specializing in XCUITest for SwiftUI apps.

Your job is to write production-grade, CI-stable XCUITests.
Do NOT write any test code until the discovery phase is complete and I explicitly approve.

PHASE 1 — DISCOVERY (MANDATORY)
Before writing any code, you must:

1. Read and understand:
   - The SwiftUI view code I provide
   - Navigation paths (NavigationStack, sheets, fullScreenCovers, tabs)
   - State sources (SwiftData, @State, @Binding, @Environment)
   - Accessibility identifiers (existing or missing)

2. Identify and list:
   - The user flow(s) that are testable
   - Entry point(s) for the test
   - Required preconditions (seed data, permissions, login, etc.)
   - All UI elements that must be interacted with or asserted
   - Any missing accessibility identifiers that should be added

3. Call out risks:
   - Flaky selectors
   - Timing issues
   - Conditional UI
   - Multiple matching elements
   - Localization-sensitive text

4. Ask me clarifying questions about:
   - Test intent (what behavior actually matters)
   - Happy path vs edge cases
   - Whether this is a UI test or integration-style UI test
   - Whether persistence should be mocked or in-memory

You MUST stop after Phase 1 and wait for my answers.
Do not speculate. Do not guess identifiers.

---

PHASE 2 — TEST PLAN (REQUIRED)
After I answer, produce a concise test plan that includes:
- Test name(s)
- Given / When / Then for each test
- Assertions (what must be true, not just visible)
- Setup and teardown strategy
- Any helper methods you intend to create

Stop and wait for approval before writing code.

---

PHASE 3 — IMPLEMENTATION (ONLY AFTER APPROVAL)
When approved, write the XCUITest with these rules:

- Prefer accessibilityIdentifier over labels
- No element(boundBy:)
- No firstMatch unless justified
- Use waitForExistence with explicit timeouts
- No sleep()
- Assert state changes, not just existence
- Tests must be deterministic and CI-safe
- Extract helpers for navigation and setup
- Comment non-obvious waits or assertions

If required information is missing at any point, STOP and ask.

Begin in Phase 1 once I provide the view code.
```
```
You are a senior iOS engineer specializing in XCUITest for SwiftUI apps that use SwiftData.

Your responsibility is to produce CI-stable, deterministic UI tests.
You must follow a gated workflow. Do NOT write test code until explicitly authorized.

================================================
PHASE 1 — SWIFTDATA DISCOVERY (MANDATORY)
================================================

Before writing any test code, you must:

1. Analyze how SwiftData is used:
   - Identify @Model types involved in this flow
   - Identify where the ModelContainer is created
   - Determine whether data is loaded from:
     - Persistent on-disk store
     - In-memory store
     - Preview / seeded data
   - Identify what data must exist for the UI to render correctly

2. Determine test data strategy:
   - In-memory ModelContainer (preferred for tests)
   - Explicit seeding via launch arguments
   - Explicit deletion/reset at test launch
   - Or production container with isolation safeguards

3. List all SwiftData-dependent UI assumptions:
   - Empty states
   - Default sorting
   - Conditional rendering
   - Relationship traversal
   - Timing risks (fetch delays, view recomposition)

4. Identify risks:
   - Schema mismatch crashes
   - Data leakage between tests
   - Non-deterministic ordering
   - UI depending on async data availability

5. Ask me clarifying questions about:
   - Whether this test is validating persistence or just UI behavior
   - Whether destructive resets are allowed
   - Whether test data should be injected or simulated

STOP after this phase and wait for my answers.
Do not speculate. Do not invent data.

================================================
PHASE 2 — TEST PLAN
================================================

After clarification, propose:
- Test name(s)
- Given / When / Then
- Required seeded data
- Assertions tied to SwiftData-backed state
- Cleanup strategy

WAIT for approval.

================================================
PHASE 3 — IMPLEMENTATION
================================================

Only after approval, write the XCUITest with these rules:
- Accessibility identifiers over labels
- No element(boundBy:)
- No firstMatch unless justified
- waitForExistence with explicit timeouts
- No sleep()
- Assert SwiftData-backed state transitions, not just visibility
- CI-safe and order-independent

Begin Phase 1 once I provide context.
```
```
You are a senior iOS engineer writing XCUITests for a SwiftUI app.

You will be given a sequence of screenshots representing a user flow:
A → B → C → D → E

IMPORTANT CONSTRAINTS:
- Screenshots are visual evidence, not implementation truth
- You may infer navigation flow, but you may NOT invent:
  - Accessibility identifiers
  - View names
  - File names
  - Data models
- You must ask clarifying questions before writing code

================================================
PHASE 1 — VISUAL FLOW ANALYSIS (MANDATORY)
================================================

From the screenshots, you must:

1. Infer and describe the user journey:
   - Entry screen
   - Navigation transitions (push, modal, tab, sheet)
   - User actions between screens
   - Visible state changes

2. Identify what is being verified at each step:
   - Screen identity
   - State change
   - Data presence
   - Navigation success

3. Explicitly list what CANNOT be known from screenshots:
   - Accessibility identifiers
   - SwiftUI view ownership
   - SwiftData models
   - Navigation container implementation

4. Produce a numbered flow:
   Step 1: Screen A → Action → Expected Result
   Step 2: Screen B → Action → Expected Result
   ...
   Step N: Screen E → Final Assertions

5. Ask me for:
   - Accessibility identifiers (or permission to propose them)
   - Whether labels are stable or localized
   - Whether persistence is involved (SwiftData)
   - Whether this is a happy-path or validation test

STOP here and wait.
Do NOT write code.

================================================
PHASE 2 — TEST STRATEGY PROPOSAL
================================================

After clarification, propose:
- Test intent
- Test name
- Required identifiers
- Assertions at each step
- SwiftData handling strategy (if applicable)

WAIT for approval.

================================================
PHASE 3 — XCUITEST IMPLEMENTATION
================================================

Only after approval:
- Write the test from A → E
- Verify state at each step
- Use explicit waits
- Avoid fragile selectors
- Extract helpers for navigation
- Comment assumptions derived from screenshots

If anything is ambiguous, STOP and ask.
```
```
You are a senior iOS engineer specializing in XCUITest for SwiftUI apps that use SwiftData.

Your task is to write production-grade, CI-stable XCUITests by analyzing:
- A sequence of screenshots representing a user flow (A → B → C → D → E)
- Any additional context I provide

You must follow a gated, multi-phase workflow.
DO NOT write any test code until I explicitly approve.

====================================================
GLOBAL RULES (NON-NEGOTIABLE)
====================================================

- Screenshots are visual evidence, not implementation truth
- Do NOT invent:
  - Accessibility identifiers
  - View names or file names
  - SwiftData models or relationships
- Prefer asking questions over guessing
- Treat SwiftData persistence as a source of flakiness unless constrained
- Assume tests run on CI across multiple simulators and iOS versions

====================================================
PHASE 1 — VISUAL FLOW ANALYSIS (MANDATORY)
====================================================

From the screenshots, you must:

1. Infer and describe the user journey:
   - Entry point (first screen)
   - Navigation transitions between screens:
     - push (NavigationStack)
     - modal
     - sheet
     - tab switch
   - User actions that cause each transition

2. Produce a numbered flow:
   Step 1: Screen A → User Action → Expected Result
   Step 2: Screen B → User Action → Expected Result
   ...
   Final Step: Screen E → Final Expected State

3. Identify what is being validated at each step:
   - Screen identity
   - Navigation success
   - Visible state change
   - Data presence or mutation

4. Explicitly list what CANNOT be known from screenshots:
   - Accessibility identifiers
   - SwiftUI view/file ownership
   - SwiftData schema details
   - Data source configuration

STOP after Phase 1 and wait for confirmation.

====================================================
PHASE 2 — SWIFTDATA DISCOVERY (MANDATORY)
====================================================

Before any test planning, you must:

1. Determine SwiftData involvement assumptions:
   - Which screens appear to depend on persisted data
   - Whether data is being created, edited, or merely displayed
   - Whether ordering, filtering, or relationships are visible

2. Identify SwiftData risks:
   - Empty vs non-empty states
   - Non-deterministic ordering
   - Schema migration crashes
   - Data leaking between tests
   - Async fetch timing affecting UI

3. Propose (do not assume) a test data strategy:
   - In-memory ModelContainer (preferred)
   - Explicit seeding via launch arguments
   - Destructive reset at test launch
   - Production container with isolation safeguards

4. Ask me clarifying questions about:
   - Whether persistence correctness is under test or just UI behavior
   - Whether destructive resets are allowed in tests
   - Whether test data may be injected or must use real flows

STOP after this phase and wait.

====================================================
PHASE 3 — SELECTORS & ACCESSIBILITY CONTRACT
====================================================

Before writing code, you must:

1. Request or propose accessibility identifiers for:
   - Screens
   - Interactive controls
   - Assertion targets

2. If proposing identifiers:
   - Clearly label them as PROPOSED
   - Group them by screen
   - Explain why each is needed for test stability

3. Identify selectors that would be fragile:
   - Static text labels
   - Localized strings
   - firstMatch or indexed access

STOP and wait for approval.

====================================================
PHASE 4 — TEST PLAN (REQUIRED)
====================================================

After clarification, produce a concise test plan containing:
- Test name(s)
- Given / When / Then for each test
- Preconditions and seeded data
- Assertions at each step (not just existence)
- Setup and teardown strategy
- Any helper abstractions you intend to create

WAIT for explicit approval before proceeding.

====================================================
PHASE 5 — XCUITEST IMPLEMENTATION
====================================================

Only after approval, write the XCUITest with these rules:

- Accessibility identifiers over labels
- No element(boundBy:)
- No firstMatch unless justified
- Explicit waitForExistence timeouts
- No sleep()
- Assert state changes, not just visibility
- Tests must be deterministic, isolated, and CI-safe
- Extract helpers for navigation and setup
- Comment assumptions inferred from screenshots

If any ambiguity remains, STOP and ask.

Begin Phase 1 once I provide the screenshots.
```

@@ -7,7 +7,16 @@
       "Skill(superpowers:subagent-driven-development)",
       "Bash(git add:*)",
       "Bash(git commit:*)",
-      "WebSearch"
+      "WebSearch",
+      "Bash(wc:*)",
+      "Bash(find:*)",
+      "Skill(lead-designer)",
+      "Skill(frontend-design:frontend-design)",
+      "Bash(python3:*)",
+      "Bash(ls:*)",
+      "Bash(xargs basename:*)",
+      "Bash(python -m sportstime_parser:*)",
+      "Bash(python -m py_compile:*)"
     ]
   }
 }

@@ -221,11 +221,11 @@ def get_scraper(sport: str, season: int):


def cmd_scrape(args: argparse.Namespace) -> int:
    """Execute the scrape command."""
    from .models.game import save_games
    from .models.team import save_teams
    from .models.stadium import save_stadiums
    """Execute the scrape command with canonical output format."""
    import json
    from .validators.report import generate_report, validate_games
    from .normalizers.timezone import get_stadium_timezone
    from .validators.schema import SchemaValidationError, validate_batch

    logger = get_logger()

@@ -282,14 +282,60 @@ def cmd_scrape(args: argparse.Namespace) -> int:
        logger.info(f"Review items: {report.summary.review_count}")

        if not args.dry_run:
            # Save output files
            # Build mappings for canonical conversion
            stadium_timezone_map: dict[str, str] = {}
            for stadium in result.stadiums:
                tz = get_stadium_timezone(stadium.state, stadium.timezone)
                stadium_timezone_map[stadium.id] = tz

            stadium_team_abbrevs: dict[str, list[str]] = {}
            for team in result.teams:
                if team.stadium_id:
                    if team.stadium_id not in stadium_team_abbrevs:
                        stadium_team_abbrevs[team.stadium_id] = []
                    stadium_team_abbrevs[team.stadium_id].append(team.abbreviation)

            # Convert to canonical format
            canonical_stadiums = [
                s.to_canonical_dict(primary_team_abbrevs=stadium_team_abbrevs.get(s.id, []))
                for s in result.stadiums
            ]
            canonical_teams = [t.to_canonical_dict() for t in result.teams]
            canonical_games = [
                g.to_canonical_dict(stadium_timezone=stadium_timezone_map.get(g.stadium_id, "America/New_York"))
                for g in result.games
            ]

            # Validate canonical output
            stadium_errors = validate_batch(canonical_stadiums, "stadium", fail_fast=False)
            team_errors = validate_batch(canonical_teams, "team", fail_fast=False)
            game_errors = validate_batch(canonical_games, "game", fail_fast=False)

            if stadium_errors or team_errors or game_errors:
                for idx, errors in stadium_errors:
                    for e in errors:
                        logger.error(f"Stadium {result.stadiums[idx].id}: {e}")
                for idx, errors in team_errors:
                    for e in errors:
                        logger.error(f"Team {result.teams[idx].id}: {e}")
                for idx, errors in game_errors[:10]:
                    for e in errors:
                        logger.error(f"Game {result.games[idx].id}: {e}")
                if len(game_errors) > 10:
                    logger.error(f"... and {len(game_errors) - 10} more game errors")
                raise SchemaValidationError("canonical", ["Schema validation failed"])

            # Save canonical output files
            games_file = OUTPUT_DIR / f"games_{sport}_{args.season}.json"
            teams_file = OUTPUT_DIR / f"teams_{sport}.json"
            stadiums_file = OUTPUT_DIR / f"stadiums_{sport}.json"

            save_games(result.games, str(games_file))
            save_teams(result.teams, str(teams_file))
            save_stadiums(result.stadiums, str(stadiums_file))
            with open(games_file, "w", encoding="utf-8") as f:
                json.dump(canonical_games, f, indent=2)
            with open(teams_file, "w", encoding="utf-8") as f:
                json.dump(canonical_teams, f, indent=2)
            with open(stadiums_file, "w", encoding="utf-8") as f:
                json.dump(canonical_stadiums, f, indent=2)

            # Save validation report
            report_path = report.save()
@@ -307,6 +353,11 @@ def cmd_scrape(args: argparse.Namespace) -> int:
            failure_count += 1
            continue

        except SchemaValidationError as e:
            log_failure(f"{sport.upper()}: {e}")
            failure_count += 1
            continue

        except Exception as e:
            log_failure(f"{sport.upper()}: {e}")
            logger.exception("Scraping failed")
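As an aside, the abbreviation-grouping loop in `cmd_scrape` above can be restated with `collections.defaultdict`, which removes the explicit membership check. A standalone sketch; the team data here is entirely made up for illustration:

```python
from collections import defaultdict

# Hypothetical minimal stand-ins for Team records (illustrative only)
teams = [
    {"abbreviation": "NYY", "stadium_id": "stadium_mlb_a"},
    {"abbreviation": "NYM", "stadium_id": "stadium_mlb_b"},
    {"abbreviation": "XXX", "stadium_id": "stadium_mlb_a"},  # shared-venue example
]

stadium_team_abbrevs: dict[str, list[str]] = defaultdict(list)
for team in teams:
    if team["stadium_id"]:
        stadium_team_abbrevs[team["stadium_id"]].append(team["abbreviation"])

print(dict(stadium_team_abbrevs))
# {'stadium_mlb_a': ['NYY', 'XXX'], 'stadium_mlb_b': ['NYM']}
```

Either form produces the same mapping; the diff's explicit-check version avoids passing a `defaultdict` downstream.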

@@ -3,6 +3,7 @@
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo
import json


@@ -64,9 +65,53 @@ class Game:
            "raw_stadium": self.raw_stadium,
        }

    def to_canonical_dict(
        self,
        stadium_timezone: str,
        is_playoff: bool = False,
        broadcast: Optional[str] = None,
    ) -> dict:
        """Convert to canonical dictionary format matching iOS app schema.

        Args:
            stadium_timezone: IANA timezone of the stadium (e.g., 'America/Chicago')
            is_playoff: Whether this is a playoff game
            broadcast: Broadcast network info (e.g., 'ESPN')

        Returns:
            Dictionary with field names matching JSONCanonicalGame in BootstrapService.swift
        """
        # Convert game_date to UTC
        if self.game_date.tzinfo is None:
            # Localize naive datetime to stadium timezone first
            local_tz = ZoneInfo(stadium_timezone)
            local_dt = self.game_date.replace(tzinfo=local_tz)
        else:
            local_dt = self.game_date

        utc_dt = local_dt.astimezone(ZoneInfo("UTC"))

        # Format season as string (e.g., 2025 -> "2025-26" for NBA/NHL, "2025" for MLB)
        if self.sport in ("nba", "nhl"):
            season_str = f"{self.season}-{str(self.season + 1)[-2:]}"
        else:
            season_str = str(self.season)

        return {
            "canonical_id": self.id,
            "sport": self.sport,
            "season": season_str,
            "game_datetime_utc": utc_dt.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "home_team_canonical_id": self.home_team_id,
            "away_team_canonical_id": self.away_team_id,
            "stadium_canonical_id": self.stadium_id,
            "is_playoff": is_playoff,
            "broadcast": broadcast,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Game":
        """Create a Game from a dictionary."""
        """Create a Game from a dictionary (internal format)."""
        game_date = data["game_date"]
        if isinstance(game_date, str):
            game_date = datetime.fromisoformat(game_date)
@@ -89,6 +134,26 @@ class Game:
            raw_stadium=data.get("raw_stadium"),
        )

    @classmethod
    def from_canonical_dict(cls, data: dict) -> "Game":
        """Create a Game from a canonical dictionary (iOS app format)."""
        game_date = datetime.fromisoformat(data["game_datetime_utc"])

        # Parse season string (e.g., "2025-26" -> 2025, or "2025" -> 2025)
        season_str = data["season"]
        season = int(season_str.split("-")[0])

        return cls(
            id=data["canonical_id"],
            sport=data["sport"],
            season=season,
            home_team_id=data["home_team_canonical_id"],
            away_team_id=data["away_team_canonical_id"],
            stadium_id=data["stadium_canonical_id"],
            game_date=game_date,
            status="scheduled",
        )

    def to_json(self) -> str:
        """Serialize to JSON string."""
        return json.dumps(self.to_dict(), indent=2)
@@ -106,7 +171,10 @@ def save_games(games: list[Game], filepath: str) -> None:


def load_games(filepath: str) -> list[Game]:
    """Load a list of games from a JSON file."""
    """Load a list of games from a JSON file (auto-detects format)."""
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Detect format: canonical has "canonical_id" and "game_datetime_utc", internal has "id"
    if data and "canonical_id" in data[0] and "game_datetime_utc" in data[0]:
        return [Game.from_canonical_dict(d) for d in data]
    return [Game.from_dict(d) for d in data]
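The timezone and season-string handling in `Game.to_canonical_dict` above reduces to two small pure functions. A minimal standalone sketch (the helper names are illustrative, not part of the module):

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_utc_string(naive_local: datetime, stadium_timezone: str) -> str:
    # Attach the stadium's zone to the naive datetime, then convert to UTC
    local_dt = naive_local.replace(tzinfo=ZoneInfo(stadium_timezone))
    return local_dt.astimezone(ZoneInfo("UTC")).strftime("%Y-%m-%dT%H:%M:%SZ")


def season_string(sport: str, season: int) -> str:
    # NBA/NHL seasons span two calendar years ("2025-26"); others use one year
    if sport in ("nba", "nhl"):
        return f"{season}-{str(season + 1)[-2:]}"
    return str(season)


# A 7:10 PM first pitch in Chicago (CDT, UTC-5) lands just after midnight UTC
print(to_utc_string(datetime(2025, 7, 4, 19, 10), "America/Chicago"))  # 2025-07-05T00:10:00Z
print(season_string("nba", 2025))  # 2025-26
print(season_string("mlb", 2025))  # 2025
```

Note that `replace(tzinfo=...)` matches what the diff does; for ambiguous local times around a DST transition it resolves via the default `fold=0`.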

@@ -60,9 +60,32 @@ class Stadium:
            "timezone": self.timezone,
        }

    def to_canonical_dict(self, primary_team_abbrevs: list[str] | None = None) -> dict:
        """Convert to canonical dictionary format matching iOS app schema.

        Args:
            primary_team_abbrevs: List of team abbreviations that play at this stadium.
                If None, defaults to empty list.

        Returns:
            Dictionary with field names matching JSONCanonicalStadium in BootstrapService.swift
        """
        return {
            "canonical_id": self.id,
            "name": self.name,
            "city": self.city,
            "state": self.state,
            "latitude": self.latitude,
            "longitude": self.longitude,
            "capacity": self.capacity if self.capacity is not None else 0,
            "sport": self.sport,
            "primary_team_abbrevs": primary_team_abbrevs or [],
            "year_opened": self.opened_year,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Stadium":
        """Create a Stadium from a dictionary."""
        """Create a Stadium from a dictionary (internal format)."""
        return cls(
            id=data["id"],
            sport=data["sport"],
@@ -80,6 +103,22 @@ class Stadium:
            timezone=data.get("timezone"),
        )

    @classmethod
    def from_canonical_dict(cls, data: dict) -> "Stadium":
        """Create a Stadium from a canonical dictionary (iOS app format)."""
        return cls(
            id=data["canonical_id"],
            sport=data["sport"],
            name=data["name"],
            city=data["city"],
            state=data["state"],
            country="USA",  # Canonical format doesn't include country
            latitude=data["latitude"],
            longitude=data["longitude"],
            capacity=data.get("capacity"),
            opened_year=data.get("year_opened"),
        )

    def to_json(self) -> str:
        """Serialize to JSON string."""
        return json.dumps(self.to_dict(), indent=2)
@@ -102,7 +141,10 @@ def save_stadiums(stadiums: list[Stadium], filepath: str) -> None:


def load_stadiums(filepath: str) -> list[Stadium]:
    """Load a list of stadiums from a JSON file."""
    """Load a list of stadiums from a JSON file (auto-detects format)."""
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Detect format: canonical has "canonical_id", internal has "id"
    if data and "canonical_id" in data[0]:
        return [Stadium.from_canonical_dict(d) for d in data]
    return [Stadium.from_dict(d) for d in data]

@@ -54,9 +54,28 @@ class Team:
            "stadium_id": self.stadium_id,
        }

    def to_canonical_dict(self) -> dict:
        """Convert to canonical dictionary format matching iOS app schema.

        Returns:
            Dictionary with field names matching JSONCanonicalTeam in BootstrapService.swift
        """
        return {
            "canonical_id": self.id,
            "name": self.name,
            "abbreviation": self.abbreviation,
            "sport": self.sport,
            "city": self.city,
            "stadium_canonical_id": self.stadium_id or "",
            "conference_id": self.conference,
            "division_id": self.division,
            "primary_color": self.primary_color,
            "secondary_color": self.secondary_color,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Team":
        """Create a Team from a dictionary."""
        """Create a Team from a dictionary (internal format)."""
        return cls(
            id=data["id"],
            sport=data["sport"],
@@ -72,6 +91,23 @@ class Team:
            stadium_id=data.get("stadium_id"),
        )

    @classmethod
    def from_canonical_dict(cls, data: dict) -> "Team":
        """Create a Team from a canonical dictionary (iOS app format)."""
        return cls(
            id=data["canonical_id"],
            sport=data["sport"],
            city=data["city"],
            name=data["name"],
            full_name=f"{data['city']} {data['name']}",  # Reconstruct full_name
            abbreviation=data["abbreviation"],
            conference=data.get("conference_id"),
            division=data.get("division_id"),
            primary_color=data.get("primary_color"),
            secondary_color=data.get("secondary_color"),
            stadium_id=data.get("stadium_canonical_id"),
        )

    def to_json(self) -> str:
        """Serialize to JSON string."""
        return json.dumps(self.to_dict(), indent=2)
@@ -89,7 +125,10 @@ def save_teams(teams: list[Team], filepath: str) -> None:


def load_teams(filepath: str) -> list[Team]:
    """Load a list of teams from a JSON file."""
    """Load a list of teams from a JSON file (auto-detects format)."""
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Detect format: canonical has "canonical_id", internal has "id"
    if data and "canonical_id" in data[0]:
        return [Team.from_canonical_dict(d) for d in data]
    return [Team.from_dict(d) for d in data]
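All three `load_*` helpers in this commit share the same format-sniffing rule: look at the first record's keys. Sketched in isolation (the sample records are made up; `detect_format` is an illustrative name, not a function in the codebase):

```python
def detect_format(records: list[dict]) -> str:
    # Canonical exports key records by "canonical_id"; the internal format uses "id"
    if records and "canonical_id" in records[0]:
        return "canonical"
    return "internal"


teams_canonical = [{"canonical_id": "team_mlb_xyz", "name": "Example"}]
teams_internal = [{"id": "team_mlb_xyz", "name": "Example"}]

print(detect_format(teams_canonical))  # canonical
print(detect_format(teams_internal))   # internal
print(detect_format([]))               # internal
```

One consequence worth noting: an empty file always falls back to the internal parser, which is harmless only because both branches return `[]` for empty input.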

@@ -240,6 +240,13 @@ STADIUM_MAPPINGS: dict[str, dict[str, StadiumInfo]] = {
        "stadium_nwsl_america_first_field": StadiumInfo("stadium_nwsl_america_first_field", "America First Field", "Sandy", "UT", "USA", "nwsl", 40.5830, -111.8933),
        "stadium_nwsl_audi_field": StadiumInfo("stadium_nwsl_audi_field", "Audi Field", "Washington", "DC", "USA", "nwsl", 38.8687, -77.0128),
        "stadium_nwsl_paypal_park": StadiumInfo("stadium_nwsl_paypal_park", "PayPal Park", "San Jose", "CA", "USA", "nwsl", 37.3511, -121.9250),
        # Boston Legacy FC venues
        "stadium_nwsl_gillette_stadium": StadiumInfo("stadium_nwsl_gillette_stadium", "Gillette Stadium", "Foxborough", "MA", "USA", "nwsl", 42.0909, -71.2643),
        "stadium_nwsl_centreville_bank_stadium": StadiumInfo("stadium_nwsl_centreville_bank_stadium", "Centreville Bank Stadium", "Pawtucket", "RI", "USA", "nwsl", 41.8770, -71.3910),
        # Denver Summit FC venues
        "stadium_nwsl_empower_field": StadiumInfo("stadium_nwsl_empower_field", "Empower Field at Mile High", "Denver", "CO", "USA", "nwsl", 39.7439, -105.0201, "America/Denver"),
        "stadium_nwsl_dicks_sporting_goods_park": StadiumInfo("stadium_nwsl_dicks_sporting_goods_park", "Dick's Sporting Goods Park", "Commerce City", "CO", "USA", "nwsl", 39.8056, -104.8922, "America/Denver"),
        "stadium_nwsl_centennial_stadium": StadiumInfo("stadium_nwsl_centennial_stadium", "Centennial Stadium", "Centennial", "CO", "USA", "nwsl", 39.6000, -104.8800, "America/Denver"),
    },
}


@@ -265,6 +265,8 @@ TEAM_MAPPINGS: dict[str, dict[str, tuple[str, str, str]]] = {
        "SLC": ("team_nwsl_slc", "Utah Royals", "Utah"),
        "WAS": ("team_nwsl_was", "Washington Spirit", "Washington"),
        "BFC": ("team_nwsl_bfc", "Bay FC", "San Francisco"),
        "BOS": ("team_nwsl_bos", "Boston Legacy FC", "Boston"),
        "DEN": ("team_nwsl_den", "Denver Summit FC", "Denver"),
    },
}


@@ -185,9 +185,12 @@ class BaseScraper(ABC):
        """
        sources = self._get_sources()
        last_error: Optional[str] = None
        sources_tried = 0
        max_sources_to_try = 2  # Don't try all sources if first few return nothing

        for source in sources:
            self._logger.info(f"Trying source: {source}")
            sources_tried += 1

            try:
                # Scrape raw data
@@ -195,6 +198,12 @@ class BaseScraper(ABC):

                if not raw_games:
                    log_warning(f"No games found from {source}")
                    # If multiple sources return nothing, the schedule likely doesn't exist
                    if sources_tried >= max_sources_to_try:
                        return ScrapeResult(
                            success=False,
                            error_message=f"No schedule data available (tried {sources_tried} sources)",
                        )
                    continue

                self._logger.info(f"Found {len(raw_games)} raw games from {source}")
@@ -216,7 +225,9 @@ class BaseScraper(ABC):
            except Exception as e:
                last_error = str(e)
                log_error(f"Failed to scrape from {source}: {e}", exc_info=True)
                # Discard partial data and try next source
                # If we've tried enough sources, bail out
                if sources_tried >= max_sources_to_try:
                    break
                continue

        # All sources failed
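The fallback loop in `BaseScraper` above, with its `max_sources_to_try` cap, can be sketched as a standalone function. The names and the dict-shaped result below are illustrative; the real method returns a `ScrapeResult`:

```python
def scrape_with_fallback(sources, fetch, max_sources_to_try=2):
    """Try sources in priority order, stopping early once enough come up empty or fail."""
    sources_tried = 0
    last_error = None
    for source in sources:
        sources_tried += 1
        try:
            games = fetch(source)
        except Exception as e:  # a failing source must not abort the whole run
            last_error = str(e)
            if sources_tried >= max_sources_to_try:
                break
            continue
        if not games:
            # Several empty sources suggest the schedule simply doesn't exist yet
            if sources_tried >= max_sources_to_try:
                return {"success": False,
                        "error": f"No schedule data available (tried {sources_tried} sources)"}
            continue
        return {"success": True, "games": games}
    return {"success": False, "error": last_error or "all sources exhausted"}


print(scrape_with_fallback(["espn"], lambda s: ["g1"]))
# {'success': True, 'games': ['g1']}
print(scrape_with_fallback(["espn", "mlb_api", "bref"], lambda s: []))
# {'success': False, 'error': 'No schedule data available (tried 2 sources)'}
```

The cap trades completeness for speed: a third source is never consulted once two have returned nothing, which is the behavior the diff introduces.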

@@ -1,6 +1,6 @@
"""MLB scraper implementation with multi-source fallback."""

from datetime import datetime, date
from datetime import datetime, date, timedelta
from typing import Optional
from bs4 import BeautifulSoup

@@ -45,7 +45,10 @@ class MLBScraper(BaseScraper):

    def _get_sources(self) -> list[str]:
        """Return source list in priority order."""
        return ["baseball_reference", "mlb_api", "espn"]
        # MLB API is best - returns full schedule in one request
        # ESPN caps at ~25 results for baseball
        # Baseball-Reference requires HTML parsing
        return ["mlb_api", "espn", "baseball_reference"]

    def _get_source_url(self, source: str, **kwargs) -> str:
        """Build URL for a source."""
@@ -215,43 +218,29 @@ class MLBScraper(BaseScraper):
        )

    def _scrape_mlb_api(self) -> list[RawGameData]:
        """Scrape games from MLB Stats API.
        """Scrape games from MLB Stats API using full season query."""
        # Build date range for entire season (March-November)
        season_months = self._get_season_months()
        start_year, start_month = season_months[0]
        end_year, end_month = season_months[-1]

        MLB API allows date range queries.
        """
        all_games: list[RawGameData] = []
        # Get last day of end month
        if end_month == 12:
            end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
        else:
            end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)

        # Query by month to avoid hitting API limits
        for year, month in self._get_season_months():
            start_date = date(year, month, 1)
        start_date = date(start_year, start_month, 1)

            # Get last day of month
            if month == 12:
                end_date = date(year + 1, 1, 1)
            else:
                end_date = date(year, month + 1, 1)
        url = f"https://statsapi.mlb.com/api/v1/schedule?sportId=1&startDate={start_date.strftime('%Y-%m-%d')}&endDate={end_date.strftime('%Y-%m-%d')}"
        self._logger.info(f"Fetching MLB schedule: {start_date} to {end_date}")

            # Adjust end date to last day of month
            from datetime import timedelta
            end_date = end_date - timedelta(days=1)

            url = self._get_source_url(
                "mlb_api",
                start_date=start_date.strftime("%Y-%m-%d"),
                end_date=end_date.strftime("%Y-%m-%d"),
            )

            try:
                data = self.session.get_json(url)
                games = self._parse_mlb_api_response(data, url)
                all_games.extend(games)
                self._logger.debug(f"Found {len(games)} games in {year}-{month:02d}")

            except Exception as e:
                self._logger.debug(f"MLB API error for {year}-{month}: {e}")
                continue

        return all_games
        try:
            data = self.session.get_json(url)
            return self._parse_mlb_api_response(data, url)
        except Exception as e:
            self._logger.error(f"MLB API error: {e}")
            return []
|
||||
|
||||
def _parse_mlb_api_response(
|
||||
self,
|
||||
@@ -345,33 +334,30 @@ class MLBScraper(BaseScraper):
         )
 
     def _scrape_espn(self) -> list[RawGameData]:
-        """Scrape games from ESPN API."""
-        all_games: list[RawGameData] = []
-
-        for year, month in self._get_season_months():
-            # Get number of days in month
-            if month == 12:
-                next_month = date(year + 1, 1, 1)
-            else:
-                next_month = date(year, month + 1, 1)
-
-            days_in_month = (next_month - date(year, month, 1)).days
-
-            for day in range(1, days_in_month + 1):
-                try:
-                    game_date = date(year, month, day)
-                    date_str = game_date.strftime("%Y%m%d")
-                    url = self._get_source_url("espn", date=date_str)
-
-                    data = self.session.get_json(url)
-                    games = self._parse_espn_response(data, url)
-                    all_games.extend(games)
-
-                except Exception as e:
-                    self._logger.debug(f"ESPN error for {year}-{month}-{day}: {e}")
-                    continue
-
-        return all_games
+        """Scrape games from ESPN API using date range query."""
+        # Build date range for entire season (March-November)
+        season_months = self._get_season_months()
+        start_year, start_month = season_months[0]
+        end_year, end_month = season_months[-1]
+
+        # Get last day of end month
+        if end_month == 12:
+            end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
+        else:
+            end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
+
+        start_date = date(start_year, start_month, 1)
+        date_range = f"{start_date.strftime('%Y%m%d')}-{end_date.strftime('%Y%m%d')}"
+
+        url = f"https://site.api.espn.com/apis/site/v2/sports/baseball/mlb/scoreboard?limit=3000&dates={date_range}"
+        self._logger.info(f"Fetching MLB schedule: {date_range}")
+
+        try:
+            data = self.session.get_json(url)
+            return self._parse_espn_response(data, url)
+        except Exception as e:
+            self._logger.error(f"ESPN error: {e}")
+            return []
 
     def _parse_espn_response(
         self,
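Both rewritten methods above compute the last day of a month as "first of the next month, minus one day". As a standalone sketch (not code from the commit, names are illustrative):

```python
from datetime import date, timedelta

def last_day_of_month(year: int, month: int) -> date:
    """Return the last calendar day of the given month."""
    # First day of the following month, minus one day; December wraps to January.
    if month == 12:
        return date(year + 1, 1, 1) - timedelta(days=1)
    return date(year, month + 1, 1) - timedelta(days=1)

print(last_day_of_month(2026, 2))  # 2026-02-28
print(last_day_of_month(2024, 2))  # 2024-02-29 (leap year handled for free)
```

Because `date` arithmetic handles month lengths and leap years, no day-count table is needed.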
@@ -1,6 +1,6 @@
 """MLS scraper implementation with multi-source fallback."""
 
-from datetime import datetime, date
+from datetime import datetime, date, timedelta
 from typing import Optional
 
 from .base import BaseScraper, RawGameData, ScrapeResult
@@ -78,33 +78,30 @@ class MLSScraper(BaseScraper):
         raise ValueError(f"Unknown source: {source}")
 
     def _scrape_espn(self) -> list[RawGameData]:
-        """Scrape games from ESPN API."""
-        all_games: list[RawGameData] = []
-
-        for year, month in self._get_season_months():
-            # Get number of days in month
-            if month == 12:
-                next_month = date(year + 1, 1, 1)
-            else:
-                next_month = date(year, month + 1, 1)
-
-            days_in_month = (next_month - date(year, month, 1)).days
-
-            for day in range(1, days_in_month + 1):
-                try:
-                    game_date = date(year, month, day)
-                    date_str = game_date.strftime("%Y%m%d")
-                    url = self._get_source_url("espn", date=date_str)
-
-                    data = self.session.get_json(url)
-                    games = self._parse_espn_response(data, url)
-                    all_games.extend(games)
-
-                except Exception as e:
-                    self._logger.debug(f"ESPN error for {year}-{month}-{day}: {e}")
-                    continue
-
-        return all_games
+        """Scrape games from ESPN API using date range query."""
+        # Build date range for entire season (Feb-November)
+        season_months = self._get_season_months()
+        start_year, start_month = season_months[0]
+        end_year, end_month = season_months[-1]
+
+        # Get last day of end month
+        if end_month == 12:
+            end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
+        else:
+            end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
+
+        start_date = date(start_year, start_month, 1)
+        date_range = f"{start_date.strftime('%Y%m%d')}-{end_date.strftime('%Y%m%d')}"
+
+        url = f"https://site.api.espn.com/apis/site/v2/sports/soccer/usa.1/scoreboard?limit=1000&dates={date_range}"
+        self._logger.info(f"Fetching MLS schedule: {date_range}")
+
+        try:
+            data = self.session.get_json(url)
+            return self._parse_espn_response(data, url)
+        except Exception as e:
+            self._logger.error(f"ESPN error: {e}")
+            return []
 
     def _parse_espn_response(
         self,
@@ -95,9 +95,11 @@ class NBAScraper(BaseScraper):
         BR organizes games by month with separate pages.
         Format: https://www.basketball-reference.com/leagues/NBA_YYYY_games-month.html
         where YYYY is the ending year of the season.
+        Bails early if first few months have no data (season doesn't exist).
         """
         all_games: list[RawGameData] = []
         end_year = self.season + 1
+        consecutive_empty_months = 0
 
         for month in BR_MONTHS:
             url = self._get_source_url("basketball_reference", month=month, year=end_year)
@@ -105,13 +107,23 @@ class NBAScraper(BaseScraper):
             try:
                 html = self.session.get_html(url)
                 games = self._parse_basketball_reference(html, url)
-                all_games.extend(games)
-                self._logger.debug(f"Found {len(games)} games in {month}")
+
+                if games:
+                    all_games.extend(games)
+                    consecutive_empty_months = 0
+                    self._logger.debug(f"Found {len(games)} games in {month}")
+                else:
+                    consecutive_empty_months += 1
 
             except Exception as e:
                 # Some months may not exist (e.g., no games in August)
                 self._logger.debug(f"No data for {month}: {e}")
-                continue
+                consecutive_empty_months += 1
+
+            # If first 3 months (Oct, Nov, Dec) all have no data, season doesn't exist
+            if consecutive_empty_months >= 3 and not all_games:
+                self._logger.info(f"No games found in first {consecutive_empty_months} months, season likely doesn't exist")
+                break
 
         return all_games
 
@@ -247,8 +259,11 @@ class NBAScraper(BaseScraper):
 
         ESPN API returns games for a specific date range.
         We iterate through each day of the season.
+        Bails out early if no games found after checking first month.
         """
         all_games: list[RawGameData] = []
+        consecutive_empty_days = 0
+        max_empty_days = 45  # Bail after ~1.5 months of no games
 
         for year, month in self._get_season_months():
             # Get number of days in month
@@ -267,10 +282,25 @@ class NBAScraper(BaseScraper):
 
                     data = self.session.get_json(url)
                     games = self._parse_espn_response(data, url)
-                    all_games.extend(games)
+
+                    if games:
+                        all_games.extend(games)
+                        consecutive_empty_days = 0
+                    else:
+                        consecutive_empty_days += 1
+
+                        # Bail early if no games found for a long stretch
+                        if consecutive_empty_days >= max_empty_days:
+                            self._logger.info(f"No games found for {max_empty_days} consecutive days, stopping ESPN scrape")
+                            return all_games
 
                 except Exception as e:
                     self._logger.debug(f"ESPN error for {year}-{month}-{day}: {e}")
+                    consecutive_empty_days += 1
+
+                    if consecutive_empty_days >= max_empty_days:
+                        self._logger.info(f"Too many consecutive failures, stopping ESPN scrape")
+                        return all_games
                     continue
 
         return all_games
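The bail-out added in both NBA hunks is an instance of a generic "stop after N consecutive empty batches" pattern. A standalone sketch (names are illustrative, not from the codebase):

```python
def scan_until_stale(batches, max_empty=3):
    """Collect items from batches, stopping after max_empty consecutive empty ones."""
    collected = []
    empty_run = 0
    for batch in batches:
        if batch:
            collected.extend(batch)
            empty_run = 0  # any hit resets the counter
        else:
            empty_run += 1
            if empty_run >= max_empty:
                break  # a long dry stretch: assume there is nothing left
    return collected

print(scan_until_stale([[1], [], [], [], [2]]))  # [1] - bails before reaching [2]
```

Resetting the counter on every non-empty batch is what distinguishes this from a simple "stop at first empty" check, so scattered off-days mid-season do not end the scrape.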
@@ -1,6 +1,6 @@
 """NWSL scraper implementation with multi-source fallback."""
 
-from datetime import datetime, date
+from datetime import datetime, date, timedelta
 from typing import Optional
 
 from .base import BaseScraper, RawGameData, ScrapeResult
@@ -73,33 +73,30 @@ class NWSLScraper(BaseScraper):
         raise ValueError(f"Unknown source: {source}")
 
     def _scrape_espn(self) -> list[RawGameData]:
-        """Scrape games from ESPN API."""
-        all_games: list[RawGameData] = []
-
-        for year, month in self._get_season_months():
-            # Get number of days in month
-            if month == 12:
-                next_month = date(year + 1, 1, 1)
-            else:
-                next_month = date(year, month + 1, 1)
-
-            days_in_month = (next_month - date(year, month, 1)).days
-
-            for day in range(1, days_in_month + 1):
-                try:
-                    game_date = date(year, month, day)
-                    date_str = game_date.strftime("%Y%m%d")
-                    url = self._get_source_url("espn", date=date_str)
-
-                    data = self.session.get_json(url)
-                    games = self._parse_espn_response(data, url)
-                    all_games.extend(games)
-
-                except Exception as e:
-                    self._logger.debug(f"ESPN error for {year}-{month}-{day}: {e}")
-                    continue
-
-        return all_games
+        """Scrape games from ESPN API using date range query."""
+        # Build date range for entire season (March-November)
+        season_months = self._get_season_months()
+        start_year, start_month = season_months[0]
+        end_year, end_month = season_months[-1]
+
+        # Get last day of end month
+        if end_month == 12:
+            end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
+        else:
+            end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
+
+        start_date = date(start_year, start_month, 1)
+        date_range = f"{start_date.strftime('%Y%m%d')}-{end_date.strftime('%Y%m%d')}"
+
+        url = f"https://site.api.espn.com/apis/site/v2/sports/soccer/usa.nwsl/scoreboard?limit=1000&dates={date_range}"
+        self._logger.info(f"Fetching NWSL schedule: {date_range}")
+
+        try:
+            data = self.session.get_json(url)
+            return self._parse_espn_response(data, url)
+        except Exception as e:
+            self._logger.error(f"ESPN error: {e}")
+            return []
 
     def _parse_espn_response(
         self,
@@ -1,6 +1,6 @@
 """WNBA scraper implementation with multi-source fallback."""
 
-from datetime import datetime, date
+from datetime import datetime, date, timedelta
 from typing import Optional
 
 from .base import BaseScraper, RawGameData, ScrapeResult
@@ -73,33 +73,30 @@ class WNBAScraper(BaseScraper):
         raise ValueError(f"Unknown source: {source}")
 
     def _scrape_espn(self) -> list[RawGameData]:
-        """Scrape games from ESPN API."""
-        all_games: list[RawGameData] = []
-
-        for year, month in self._get_season_months():
-            # Get number of days in month
-            if month == 12:
-                next_month = date(year + 1, 1, 1)
-            else:
-                next_month = date(year, month + 1, 1)
-
-            days_in_month = (next_month - date(year, month, 1)).days
-
-            for day in range(1, days_in_month + 1):
-                try:
-                    game_date = date(year, month, day)
-                    date_str = game_date.strftime("%Y%m%d")
-                    url = self._get_source_url("espn", date=date_str)
-
-                    data = self.session.get_json(url)
-                    games = self._parse_espn_response(data, url)
-                    all_games.extend(games)
-
-                except Exception as e:
-                    self._logger.debug(f"ESPN error for {year}-{month}-{day}: {e}")
-                    continue
-
-        return all_games
+        """Scrape games from ESPN API using date range query."""
+        # Build date range for entire season (May-October)
+        season_months = self._get_season_months()
+        start_year, start_month = season_months[0]
+        end_year, end_month = season_months[-1]
+
+        # Get last day of end month
+        if end_month == 12:
+            end_date = date(end_year + 1, 1, 1) - timedelta(days=1)
+        else:
+            end_date = date(end_year, end_month + 1, 1) - timedelta(days=1)
+
+        start_date = date(start_year, start_month, 1)
+        date_range = f"{start_date.strftime('%Y%m%d')}-{end_date.strftime('%Y%m%d')}"
+
+        url = f"https://site.api.espn.com/apis/site/v2/sports/basketball/wnba/scoreboard?limit=1000&dates={date_range}"
+        self._logger.info(f"Fetching WNBA schedule: {date_range}")
+
+        try:
+            data = self.session.get_json(url)
+            return self._parse_espn_response(data, url)
+        except Exception as e:
+            self._logger.error(f"ESPN error: {e}")
+            return []
 
     def _parse_espn_response(
         self,
@@ -8,10 +8,25 @@ from .report import (
     validate_games,
 )
 
+from .schema import (
+    SchemaValidationError,
+    validate_canonical_stadium,
+    validate_canonical_team,
+    validate_canonical_game,
+    validate_and_raise,
+    validate_batch,
+)
+
 __all__ = [
     "ValidationReport",
     "ValidationSummary",
     "generate_report",
     "detect_duplicate_games",
     "validate_games",
+    "SchemaValidationError",
+    "validate_canonical_stadium",
+    "validate_canonical_team",
+    "validate_canonical_game",
+    "validate_and_raise",
+    "validate_batch",
 ]
Scripts/sportstime_parser/validators/schema.py (new file, 246 lines)
@@ -0,0 +1,246 @@
"""JSON Schema validation for canonical output matching iOS app expectations.

This module defines schemas that match the Swift structs in BootstrapService.swift:
- JSONCanonicalStadium
- JSONCanonicalTeam
- JSONCanonicalGame

Validation is performed at runtime before outputting JSON to ensure
Python output matches what the iOS app expects.
"""

import re
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


class SchemaValidationError(Exception):
    """Raised when canonical output fails schema validation."""

    def __init__(self, model_type: str, errors: list[str]):
        self.model_type = model_type
        self.errors = errors
        super().__init__(f"{model_type} schema validation failed:\n" + "\n".join(f"  - {e}" for e in errors))


# ISO8601 UTC datetime pattern: YYYY-MM-DDTHH:MM:SSZ
ISO8601_UTC_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")

# Season format patterns
SEASON_SPLIT_PATTERN = re.compile(r"^\d{4}-\d{2}$")  # e.g., "2025-26"
SEASON_SINGLE_PATTERN = re.compile(r"^\d{4}$")  # e.g., "2025"


@dataclass
class FieldSpec:
    """Specification for a field in the canonical schema."""

    name: str
    required: bool
    field_type: Union[type, tuple]
    validator: Optional[Callable] = None


# Schema definitions matching Swift structs in BootstrapService.swift

STADIUM_SCHEMA: list[FieldSpec] = [
    FieldSpec("canonical_id", required=True, field_type=str),
    FieldSpec("name", required=True, field_type=str),
    FieldSpec("city", required=True, field_type=str),
    FieldSpec("state", required=True, field_type=str),
    FieldSpec("latitude", required=True, field_type=(int, float)),
    FieldSpec("longitude", required=True, field_type=(int, float)),
    FieldSpec("capacity", required=True, field_type=int),
    FieldSpec("sport", required=True, field_type=str),
    FieldSpec("primary_team_abbrevs", required=True, field_type=list),
    FieldSpec("year_opened", required=False, field_type=(int, type(None))),
]

TEAM_SCHEMA: list[FieldSpec] = [
    FieldSpec("canonical_id", required=True, field_type=str),
    FieldSpec("name", required=True, field_type=str),
    FieldSpec("abbreviation", required=True, field_type=str),
    FieldSpec("sport", required=True, field_type=str),
    FieldSpec("city", required=True, field_type=str),
    FieldSpec("stadium_canonical_id", required=True, field_type=str),
    FieldSpec("conference_id", required=False, field_type=(str, type(None))),
    FieldSpec("division_id", required=False, field_type=(str, type(None))),
    FieldSpec("primary_color", required=False, field_type=(str, type(None))),
    FieldSpec("secondary_color", required=False, field_type=(str, type(None))),
]

GAME_SCHEMA: list[FieldSpec] = [
    FieldSpec("canonical_id", required=True, field_type=str),
    FieldSpec("sport", required=True, field_type=str),
    FieldSpec(
        "season",
        required=True,
        field_type=str,
        validator=lambda v: SEASON_SPLIT_PATTERN.match(v) or SEASON_SINGLE_PATTERN.match(v),
    ),
    FieldSpec(
        "game_datetime_utc",
        required=True,
        field_type=str,
        validator=lambda v: ISO8601_UTC_PATTERN.match(v),
    ),
    FieldSpec("home_team_canonical_id", required=True, field_type=str),
    FieldSpec("away_team_canonical_id", required=True, field_type=str),
    FieldSpec("stadium_canonical_id", required=True, field_type=str),
    FieldSpec("is_playoff", required=True, field_type=bool),
    FieldSpec("broadcast", required=False, field_type=(str, type(None))),
]


def validate_field(data: dict[str, Any], spec: FieldSpec) -> list[str]:
    """Validate a single field against its specification.

    Args:
        data: The dictionary to validate
        spec: The field specification

    Returns:
        List of error messages (empty if valid)
    """
    errors = []

    if spec.name not in data:
        if spec.required:
            errors.append(f"Missing required field: {spec.name}")
        return errors

    value = data[spec.name]

    # Check type
    if not isinstance(value, spec.field_type):
        expected = spec.field_type.__name__ if isinstance(spec.field_type, type) else str(spec.field_type)
        actual = type(value).__name__
        errors.append(f"Field '{spec.name}' has wrong type: expected {expected}, got {actual} (value: {value!r})")
        return errors

    # Check custom validator
    if spec.validator and value is not None:
        if not spec.validator(value):
            errors.append(f"Field '{spec.name}' failed validation: {value!r}")

    return errors


def validate_canonical_stadium(data: dict[str, Any]) -> list[str]:
    """Validate a canonical stadium dictionary.

    Args:
        data: Stadium dictionary from to_canonical_dict()

    Returns:
        List of error messages (empty if valid)
    """
    errors = []
    for spec in STADIUM_SCHEMA:
        errors.extend(validate_field(data, spec))

    # Additional validation: primary_team_abbrevs should contain strings
    if "primary_team_abbrevs" in data and isinstance(data["primary_team_abbrevs"], list):
        for i, abbrev in enumerate(data["primary_team_abbrevs"]):
            if not isinstance(abbrev, str):
                errors.append(f"primary_team_abbrevs[{i}] must be string, got {type(abbrev).__name__}")

    return errors


def validate_canonical_team(data: dict[str, Any]) -> list[str]:
    """Validate a canonical team dictionary.

    Args:
        data: Team dictionary from to_canonical_dict()

    Returns:
        List of error messages (empty if valid)
    """
    errors = []
    for spec in TEAM_SCHEMA:
        errors.extend(validate_field(data, spec))
    return errors


def validate_canonical_game(data: dict[str, Any]) -> list[str]:
    """Validate a canonical game dictionary.

    Args:
        data: Game dictionary from to_canonical_dict()

    Returns:
        List of error messages (empty if valid)
    """
    errors = []
    for spec in GAME_SCHEMA:
        errors.extend(validate_field(data, spec))
    return errors


def validate_and_raise(data: dict[str, Any], model_type: str) -> None:
    """Validate a canonical dictionary and raise on error.

    Args:
        data: Dictionary from to_canonical_dict()
        model_type: One of 'stadium', 'team', 'game'

    Raises:
        SchemaValidationError: If validation fails
        ValueError: If model_type is unknown
    """
    validators = {
        "stadium": validate_canonical_stadium,
        "team": validate_canonical_team,
        "game": validate_canonical_game,
    }

    if model_type not in validators:
        raise ValueError(f"Unknown model type: {model_type}")

    errors = validators[model_type](data)
    if errors:
        raise SchemaValidationError(model_type, errors)


def validate_batch(
    items: list[dict[str, Any]],
    model_type: str,
    fail_fast: bool = True,
) -> list[tuple[int, list[str]]]:
    """Validate a batch of canonical dictionaries.

    Args:
        items: List of dictionaries from to_canonical_dict()
        model_type: One of 'stadium', 'team', 'game'
        fail_fast: If True, raise on first error; if False, collect all errors

    Returns:
        List of (index, errors) tuples for items with validation errors

    Raises:
        SchemaValidationError: If fail_fast=True and validation fails
    """
    validators = {
        "stadium": validate_canonical_stadium,
        "team": validate_canonical_team,
        "game": validate_canonical_game,
    }

    if model_type not in validators:
        raise ValueError(f"Unknown model type: {model_type}")

    validator = validators[model_type]
    all_errors = []

    for i, item in enumerate(items):
        errors = validator(item)
        if errors:
            if fail_fast:
                raise SchemaValidationError(
                    model_type,
                    [f"Item {i}: {e}" for e in errors],
                )
            all_errors.append((i, errors))

    return all_errors
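The `FieldSpec` pattern in schema.py can be exercised in isolation. The sketch below is self-contained for illustration (the trimmed three-field schema and the sample dict are invented, and `validate_field` is condensed from the version above):

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union

@dataclass
class FieldSpec:
    name: str
    required: bool
    field_type: Union[type, tuple]
    validator: Optional[Callable] = None

def validate_field(data: dict[str, Any], spec: FieldSpec) -> list[str]:
    # Missing field: an error only if the spec marks it required.
    if spec.name not in data:
        return [f"Missing required field: {spec.name}"] if spec.required else []
    value = data[spec.name]
    # Type check first; a wrong type short-circuits the custom validator.
    if not isinstance(value, spec.field_type):
        return [f"Field '{spec.name}' has wrong type: {type(value).__name__}"]
    if spec.validator and value is not None and not spec.validator(value):
        return [f"Field '{spec.name}' failed validation: {value!r}"]
    return []

# Trimmed, illustrative schema - the real TEAM_SCHEMA has ten fields.
MINI_TEAM_SCHEMA = [
    FieldSpec("canonical_id", required=True, field_type=str),
    FieldSpec("abbreviation", required=True, field_type=str),
    FieldSpec("primary_color", required=False, field_type=(str, type(None))),
]

team = {"canonical_id": "team_nba_lal", "abbreviation": 17}  # wrong type on purpose
errors = [e for spec in MINI_TEAM_SCHEMA for e in validate_field(team, spec)]
print(errors)  # ["Field 'abbreviation' has wrong type: int"]
```

A failing dict produces a list of messages rather than an exception; `validate_and_raise` wraps this into `SchemaValidationError` for pipeline use.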
TO-DOS.md (10 changed lines)
@@ -1,12 +1,16 @@
+//questions
 question: do we need sync schedules anymore in settings
 
 
 // new dev
 - Notification reminders - "Your trip starts in 3 days"
 
 // enhancements
 - need achievements for every supported league (how does this work with adding new sports in backed, might have to push with app update so if not achievements exist for that sport don’t show it as an option)
+- Preseason baseball
 
 // bugs
 - fucking game show at 7 am ... the fuck?
-all all trips view when choosing "packed" "moderate" "relaxed" the capsule the option is in does a weird animation that looks off.
+- group poll refreshed every time I go to screen, should update in bg and pull to refresh?
+- all all trips view when choosing "packed" "moderate" "relaxed" the capsule the option is in does a weird animation that looks off.
 - When there are no save trips Home Screen looks empty
 - Trip detail when searching and trip detail after saving do not match
docs/DATA_ARCHITECTURE.md (new file, 503 lines)
@@ -0,0 +1,503 @@
# SportsTime Data Architecture

> A plain-English guide for understanding how data flows through the app, from source to screen.

## Quick Summary

**The Big Picture:**
1. **Python scripts** scrape schedules from sports websites
2. **JSON files** get bundled into the app (so it works offline on day one)
3. **CloudKit** syncs updates in the background (so users get fresh data)
4. **AppDataProvider.shared** is the single source of truth the app uses

---

## Part 1: What Data Do We Have?

### Core Data Types

| Data Type | Description | Count | Update Frequency |
|-----------|-------------|-------|------------------|
| **Stadiums** | Venues where games are played | ~178 total | Rarely (new stadium every few years) |
| **Teams** | Professional sports teams | ~180 total | Rarely (expansion, relocation) |
| **Games** | Scheduled matches | ~5,000/season | Daily during season |
| **League Structure** | Conferences, divisions | ~50 entries | Rarely (realignment) |
| **Aliases** | Historical names (old stadium names, team relocations) | ~100 | As needed |

### Data by Sport

| Sport | Teams | Stadiums | Divisions | Conferences |
|-------|-------|----------|-----------|-------------|
| MLB | 30 | 30 | 6 | 2 |
| NBA | 30 | 30 | 6 | 2 |
| NHL | 32 | 32 | 4 | 2 |
| NFL | 32 | 30* | 8 | 2 |
| MLS | 30 | 30 | 0 | 2 |
| WNBA | 13 | 13 | 0 | 2 |
| NWSL | 13 | 13 | 0 | 0 |

*NFL: Giants/Jets share MetLife Stadium; Rams/Chargers share SoFi Stadium

---
## Part 2: Where Does Data Live?

Data exists in **four places**, each serving a different purpose:

```
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  1. BUNDLED JSON FILES (App Bundle)                             │
│     └─ Ships with app, works offline on first launch            │
│                                                                 │
│  2. SWIFTDATA (Local Database)                                  │
│     └─ Fast local storage, persists between launches            │
│                                                                 │
│  3. CLOUDKIT (Apple's Cloud)                                    │
│     └─ Remote sync, shared across all users                     │
│                                                                 │
│  4. APPDATAPROVIDER (In-Memory Cache)                           │
│     └─ What the app actually uses at runtime                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### How They Work Together

```
              ┌──────────────────┐
              │  Python Scripts  │
              │  (scrape data)   │
              └────────┬─────────┘
                       │
        ┌──────────────┴──────────────┐
        │                             │
        ▼                             ▼
┌────────────────┐          ┌────────────────┐
│   JSON Files   │          │    CloudKit    │
│  (bundled in   │          │   (remote      │
│   app)         │          │    sync)       │
└───────┬────────┘          └───────┬────────┘
        │                           │
        │ First launch              │ Background sync
        │ bootstrap                 │ (ongoing)
        ▼                           ▼
┌─────────────────────────────────────────────┐
│                  SwiftData                  │
│              (local database)               │
└────────────────────┬────────────────────────┘
                     │
                     │ App reads from here
                     ▼
┌─────────────────────────────────────────────┐
│            AppDataProvider.shared           │
│          (single source of truth)           │
└────────────────────┬────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────┐
│              Features & Views               │
│  (Trip planning, Progress tracking, etc.)   │
└─────────────────────────────────────────────┘
```

---
## Part 3: The JSON Files (What Ships With the App)

Located in: `SportsTime/Resources/`

### File Inventory

| File | Contains | Updated By |
|------|----------|------------|
| `stadiums_canonical.json` | All stadium data | `generate_missing_data.py` |
| `teams_canonical.json` | All team data | `generate_missing_data.py` |
| `games_canonical.json` | Game schedules | `scrape_schedules.py` |
| `league_structure.json` | Conferences & divisions | Manual edit |
| `stadium_aliases.json` | Historical stadium names | Manual edit |
| `team_aliases.json` | Historical team names | Manual edit |

### Why Bundle JSON?

1. **Offline-first**: App works immediately, even without internet
2. **Fast startup**: No network delay on first launch
3. **Fallback**: If CloudKit is down, app still functions
4. **Testing**: Deterministic data for development

### Example: Stadium Record

```json
{
  "canonical_id": "stadium_mlb_fenway_park",
  "name": "Fenway Park",
  "city": "Boston",
  "state": "Massachusetts",
  "latitude": 42.3467,
  "longitude": -71.0972,
  "capacity": 37755,
  "sport": "MLB",
  "year_opened": 1912,
  "primary_team_abbrevs": ["BOS"]
}
```

**Key concept: Canonical IDs**
- Every entity has a unique, permanent ID
- Format: `{type}_{sport}_{identifier}`
- Examples: `team_nba_lal`, `stadium_nfl_sofi_stadium`, `game_mlb_2026_20260401_bos_nyy`
- IDs never change, even if names do (that's what aliases are for)
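For illustration only (this helper does not exist in the codebase), a canonical ID can be split back into its parts; the regex is an assumption based on the examples above:

```python
import re

# Assumed shape, per the format note: {type}_{sport}_{identifier}
CANONICAL_ID = re.compile(r"^(stadium|team|game)_([a-z]+)_(.+)$")

def parse_canonical_id(canonical_id: str) -> dict:
    """Split a canonical ID into its type, sport, and identifier parts."""
    m = CANONICAL_ID.match(canonical_id)
    if not m:
        raise ValueError(f"Not a canonical ID: {canonical_id!r}")
    return {"type": m.group(1), "sport": m.group(2), "identifier": m.group(3)}

print(parse_canonical_id("team_nba_lal"))
# {'type': 'team', 'sport': 'nba', 'identifier': 'lal'}
```

Note the trailing `(.+)` is greedy by position, so identifiers that themselves contain underscores (like `game_mlb_2026_20260401_bos_nyy`) survive intact.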
---
## Part 4: The Python Scripts (How Data Gets Updated)
|
||||
|
||||
Located in: `Scripts/`
|
||||
|
||||
### Script Overview
|
||||
|
||||
| Script | Purpose | When to Use |
|
||||
|--------|---------|-------------|
|
||||
| `scrape_schedules.py` | Fetch game schedules from sports-reference sites | New season starts |
|
||||
| `generate_missing_data.py` | Add teams/stadiums for new sports | Adding new league |
|
||||
| `sportstime_parser/cli.py` | Full pipeline: scrape → normalize → upload | Production updates |
|
||||
|
||||
### Common Workflows
|
||||
|
||||
#### 1. Adding a New Season's Schedule
|
||||
|
||||
```bash
|
||||
cd Scripts
|
||||
|
||||
# Scrape all sports for 2026 season
|
||||
python scrape_schedules.py --sport all --season 2026
|
||||
|
||||
# Or one sport at a time
|
||||
python scrape_schedules.py --sport nba --season 2026
|
||||
python scrape_schedules.py --sport mlb --season 2026
|
||||
```
|
||||
|
||||
**Output:** Updates `games_canonical.json` with new games
|
||||
|
||||
#### 2. Adding a New Sport or League
|
||||
|
||||
```bash
|
||||
cd Scripts
|
||||
|
||||
# Generate missing teams/stadiums
|
||||
python generate_missing_data.py
|
||||
|
||||
# This will:
|
||||
# 1. Check existing teams_canonical.json
|
||||
# 2. Add missing teams with proper IDs
|
||||
# 3. Add missing stadiums
|
||||
# 4. Update conference/division assignments
|
||||
```
|
||||
|
||||
**After running:** Copy output files to Resources folder:
|
||||
```bash
|
||||
cp output/teams_canonical.json ../SportsTime/Resources/
|
||||
cp output/stadiums_canonical.json ../SportsTime/Resources/
|
||||
```
|
||||
|
||||
#### 3. Updating League Structure (Manual)

Edit `SportsTime/Resources/league_structure.json` directly:

```json
{
  "id": "nfl_afc_east",
  "sport": "NFL",
  "type": "division",
  "name": "AFC East",
  "abbreviation": "AFC East",
  "parent_id": "nfl_afc",
  "display_order": 1
}
```

**Also update:** `SportsTime/Core/Models/Domain/Division.swift` to match

#### 4. Uploading to CloudKit (Production)

```bash
cd Scripts

# Upload all canonical data to CloudKit
python -m sportstime_parser upload all

# Or specific entity type
python -m sportstime_parser upload games
python -m sportstime_parser upload teams
```

**Requires:** CloudKit credentials configured in `sportstime_parser/config.py`

---

## Part 5: How the App Uses This Data

### Startup Flow

```
1. App launches
   ↓
2. Check: Is this first launch?
   ↓
3. YES → BootstrapService loads bundled JSON into SwiftData
   NO  → Skip to step 4
   ↓
4. AppDataProvider.configure(modelContext)
   ↓
5. AppDataProvider.loadInitialData()
   - Reads from SwiftData
   - Populates in-memory cache
   ↓
6. App ready! (works offline)
   ↓
7. Background: CanonicalSyncService.syncAll()
   - Fetches updates from CloudKit
   - Merges into SwiftData
   - Refreshes AppDataProvider cache
```

### Accessing Data in Code

```swift
// ✅ CORRECT - Always use AppDataProvider
let stadiums = AppDataProvider.shared.stadiums
let teams = AppDataProvider.shared.teams
let games = try await AppDataProvider.shared.filterGames(
    sports: [.nba, .mlb],
    startDate: Date(),
    endDate: Date().addingTimeInterval(86400 * 30)
)

// ❌ WRONG - Never bypass AppDataProvider
let stadiums = try await CloudKitService.shared.fetchStadiums()
```

---

## Part 6: What Can Be Updated and How

### Update Matrix

| What | Bundled JSON | CloudKit | Swift Code | Requires App Update |
|------|--------------|----------|------------|---------------------|
| Add new games | ✅ | ✅ | No | No (CloudKit sync) |
| Change stadium name | ✅ + alias | ✅ | No | No (CloudKit sync) |
| Add new team | ✅ | ✅ | No | No (CloudKit sync) |
| Add new sport | ✅ | ✅ | ✅ Division.swift | **YES** |
| Add division | ✅ | ✅ | ✅ Division.swift | **YES** |
| Add achievements | N/A | N/A | ✅ AchievementDefinitions.swift | **YES** |
| Change team colors | ✅ | ✅ | No | No (CloudKit sync) |

### Why Some Changes Need App Updates

**CloudKit CAN handle:**
- New records (games, teams, stadiums)
- Updated records (name changes, etc.)
- Soft deletes (deprecation flags)

**CloudKit CANNOT handle:**
- New enum cases in Swift (Sport enum)
- New achievement definitions (compiled into app)
- UI for new sports (views reference Sport enum)
- Division/Conference structure (static in Division.swift)

### The "New Sport" Problem

If you add a new sport via CloudKit only:
1. ❌ App won't recognize the sport enum value
2. ❌ No achievements defined for that sport
3. ❌ UI filters won't show the sport
4. ❌ Division.swift won't have the structure

**Solution:** New sports require:
1. Add Sport enum case
2. Add Division.swift entries
3. Add AchievementDefinitions entries
4. Bundle JSON with initial data
5. Ship app update
6. THEN CloudKit can sync ongoing changes

---

## Part 7: Aliases (Handling Historical Changes)

### Stadium Aliases

When a stadium is renamed (e.g., "Candlestick Park" → "3Com Park" → "Monster Park"):

```json
// stadium_aliases.json
{
  "alias_name": "Candlestick Park",
  "stadium_canonical_id": "stadium_nfl_candlestick_park",
  "valid_from": "1960-01-01",
  "valid_until": "1995-12-31"
},
{
  "alias_name": "3Com Park",
  "stadium_canonical_id": "stadium_nfl_candlestick_park",
  "valid_from": "1996-01-01",
  "valid_until": "2002-12-31"
}
```

**Why this matters:**
- Old photos tagged "Candlestick Park" still resolve to correct stadium
- Historical game data uses name from that era
- User visits tracked correctly regardless of current name
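These alias records can be exercised with a tiny lookup, sketched here in Python for illustration (the app's real lookup lives in `StadiumIdentityService`; the field names follow `stadium_aliases.json`, and the helper name is hypothetical):

```python
from datetime import date
from typing import Optional

# Two alias records for the same stadium, mirroring stadium_aliases.json
# (names lowercased for matching).
STADIUM_ALIASES = [
    {"alias_name": "candlestick park",
     "stadium_canonical_id": "stadium_nfl_candlestick_park",
     "valid_from": date(1960, 1, 1), "valid_until": date(1995, 12, 31)},
    {"alias_name": "3com park",
     "stadium_canonical_id": "stadium_nfl_candlestick_park",
     "valid_from": date(1996, 1, 1), "valid_until": date(2002, 12, 31)},
]

def canonical_id_for(name: str, on: Optional[date] = None) -> Optional[str]:
    """Resolve any name (current or historical) to a canonical ID.

    Passing a date restricts the match to aliases valid at that time,
    which is what makes era-tagged photos and games resolve correctly.
    """
    needle = name.strip().lower()
    for alias in STADIUM_ALIASES:
        if alias["alias_name"] != needle:
            continue
        if on is None or alias["valid_from"] <= on <= alias["valid_until"]:
            return alias["stadium_canonical_id"]
    return None
```

Both "Candlestick Park" and "3Com Park" resolve to the same canonical ID, which is exactly why old photo tags and historical game data keep pointing at the right building.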

### Team Aliases

When teams relocate or rebrand:

```json
// team_aliases.json
{
  "id": "alias_nba_sea_supersonics",
  "team_canonical_id": "team_nba_okc",
  "alias_type": "name",
  "alias_value": "Seattle SuperSonics",
  "valid_from": "1967-01-01",
  "valid_until": "2008-06-30"
}
```

---

## Part 8: Sync State & Offline Behavior

### SyncState Tracking

The app tracks sync status in SwiftData:

```swift
struct SyncState {
    var bootstrapCompleted: Bool    // Has initial JSON been loaded?
    var lastSuccessfulSync: Date?   // When did CloudKit last succeed?
    var syncInProgress: Bool        // Is sync running now?
    var consecutiveFailures: Int    // How many failures in a row?
}
```
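A natural use for `consecutiveFailures` is retry backoff. A hedged sketch in Python (the helper, base delay, and cap are illustrative assumptions, not the app's actual retry policy):

```python
def next_retry_delay(consecutive_failures: int,
                     base_seconds: float = 30.0,
                     cap_seconds: float = 3600.0) -> float:
    """Delay before the next sync attempt, doubling per failure up to a cap."""
    if consecutive_failures <= 0:
        return 0.0  # last sync succeeded; no backoff needed
    return min(cap_seconds, base_seconds * 2 ** (consecutive_failures - 1))
```

Capping the delay keeps a long CloudKit outage from pushing the next attempt arbitrarily far out, while the exponential ramp avoids hammering the service while it is down.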

### Offline Scenarios

| Scenario | Behavior |
|----------|----------|
| First launch, no internet | Bootstrap from bundled JSON, app works |
| Returning user, no internet | Uses last synced SwiftData, app works |
| CloudKit partially fails | Partial sync saved, retry later |
| CloudKit down for days | App continues with local data |
| User deletes app, reinstalls | Fresh bootstrap from bundled JSON |

---

## Part 9: Quick Reference - File Locations

### JSON Data Files
```
SportsTime/Resources/
├── stadiums_canonical.json   # Stadium metadata
├── teams_canonical.json      # Team metadata
├── games_canonical.json      # Game schedules
├── league_structure.json     # Divisions & conferences
├── stadium_aliases.json      # Historical stadium names
└── team_aliases.json         # Historical team names
```

### Swift Code
```
SportsTime/Core/
├── Models/
│   ├── Domain/
│   │   ├── Stadium.swift                 # Stadium struct
│   │   ├── Team.swift                    # Team struct
│   │   ├── Game.swift                    # Game struct
│   │   ├── Division.swift                # LeagueStructure + static data
│   │   └── AchievementDefinitions.swift  # Achievement registry
│   ├── Local/
│   │   └── CanonicalModels.swift         # SwiftData models
│   └── CloudKit/
│       └── CKModels.swift                # CloudKit record types
└── Services/
    ├── DataProvider.swift          # AppDataProvider (source of truth)
    ├── BootstrapService.swift      # First-launch JSON → SwiftData
    ├── CanonicalSyncService.swift  # CloudKit → SwiftData sync
    └── CloudKitService.swift       # CloudKit API wrapper
```

### Python Scripts
```
Scripts/
├── scrape_schedules.py        # Game schedule scraper
├── generate_missing_data.py   # Team/stadium generator
├── sportstime_parser/
│   ├── cli.py                 # Main CLI
│   ├── scrapers/              # Sport-specific scrapers
│   ├── normalizers/           # Data standardization
│   └── uploaders/             # CloudKit upload
└── output/                    # Generated JSON files
```

---

## Part 10: Common Tasks Checklist

### Before a New Season

- [ ] Run `scrape_schedules.py --sport all --season YYYY`
- [ ] Verify `games_canonical.json` has expected game count
- [ ] Copy to Resources folder
- [ ] Test app locally
- [ ] Upload to CloudKit: `python -m sportstime_parser upload games`

### Adding a New Stadium

1. [ ] Add to `stadiums_canonical.json`
2. [ ] Add team reference in `teams_canonical.json`
3. [ ] Copy both to Resources folder
4. [ ] Upload to CloudKit

### Stadium Renamed

1. [ ] Add alias to `stadium_aliases.json` with date range
2. [ ] Update stadium name in `stadiums_canonical.json`
3. [ ] Copy both to Resources folder
4. [ ] Upload to CloudKit

### Adding a New Sport (App Update Required)

1. [ ] Add Sport enum case in `Sport.swift`
2. [ ] Add divisions/conferences to `Division.swift`
3. [ ] Add achievements to `AchievementDefinitions.swift`
4. [ ] Add teams to `teams_canonical.json`
5. [ ] Add stadiums to `stadiums_canonical.json`
6. [ ] Add league structure to `league_structure.json`
7. [ ] Run `generate_missing_data.py` to validate
8. [ ] Copy all JSON to Resources folder
9. [ ] Build and test
10. [ ] Ship app update
11. [ ] Upload data to CloudKit for sync

---

## Glossary

| Term | Definition |
|------|------------|
| **Canonical ID** | Permanent, unique identifier for an entity (e.g., `stadium_mlb_fenway_park`) |
| **Bootstrap** | First-launch process that loads bundled JSON into SwiftData |
| **Delta sync** | Only fetching changes since last sync (not full data) |
| **AppDataProvider** | The single source of truth for all canonical data in the app |
| **SwiftData** | Apple's local database framework (replacement for Core Data) |
| **CloudKit** | Apple's cloud database service |
| **Alias** | Historical name mapping (old name → canonical ID) |
| **Soft delete** | Marking record as deprecated instead of actually deleting |

---

*Last updated: January 2026*

`docs/STADIUM_IDENTITY_SYSTEM.md` (new file, 171 lines)

# Stadium Identity System

How SportsTime handles stadium renames, new stadiums, and team relocations while preserving user data.

## Architecture Overview

The system uses **immutable canonical IDs** for all references and **mutable display data** for names/locations. User data only stores canonical IDs, so it never needs migration when real-world changes occur.

```
User data → canonical IDs (stable) → current display names (via StadiumIdentityService)
```

## Data Models

### Canonical Data (Synced from CloudKit)

```swift
// Core identity - canonicalId NEVER changes
CanonicalStadium:
    canonicalId: "stadium_nba_los_angeles_lakers"  // Immutable
    name: "Crypto.com Arena"                       // Can change
    city: "Los Angeles"                            // Can change
    deprecatedAt: nil                              // Set when demolished

CanonicalTeam:
    canonicalId: "team_mlb_athletics"              // Immutable
    stadiumCanonicalId: "stadium_mlb_las_vegas"    // Can change (relocation)
    city: "Las Vegas"                              // Can change
    deprecatedAt: nil                              // Set if team ceases

// Historical name tracking
StadiumAlias:
    aliasName: "staples center"                    // Lowercase for matching
    stadiumCanonicalId: "stadium_nba_los_angeles_lakers"
    validFrom: 2021-01-01
    validUntil: 2022-12-25

TeamAlias:
    aliasValue: "Oakland"
    aliasType: .city
    teamCanonicalId: "team_mlb_athletics"
    validUntil: 2024-12-31
```

### User Data (Local Only)

```swift
StadiumVisit:
    stadiumId: String           // Canonical ID - stable reference
    stadiumNameAtVisit: String  // Frozen at visit time for history
    homeTeamId: String?         // Canonical ID
    awayTeamId: String?         // Canonical ID
    homeTeamName: String?       // Frozen at visit time
    awayTeamName: String?       // Frozen at visit time

Achievement:
    achievementTypeId: String   // League structure ID (e.g., "mlb_al_west")
    sport: String?
    visitIdsSnapshot: Data      // UUIDs of StadiumVisits - immutable
```

## Scenario Handling

### Stadium Renames

**Example:** Staples Center → Crypto.com Arena (December 2021)

**What happens:**
1. CloudKit updates `CanonicalStadium.name` to "Crypto.com Arena"
2. CloudKit adds `StadiumAlias` for "Staples Center" with validity dates
3. Next sync updates local SwiftData
4. `canonicalId` remains `"stadium_nba_los_angeles_lakers"`

**User impact:** None
- Existing `StadiumVisit.stadiumId` still resolves correctly
- `StadiumVisit.stadiumNameAtVisit` preserves "Staples Center" for historical display
- Searching "Staples Center" still finds the stadium via alias lookup

### New Stadium Built

**Example:** New Las Vegas A's stadium opens in 2028

**What happens:**
1. CloudKit adds new `CanonicalStadium` record with new canonical ID
2. Next sync creates record in SwiftData
3. Deterministic UUID generated: `SHA256(canonicalId) → UUID`
4. Stadium appears in app automatically

**User impact:** None
- New stadium available for visits and achievements
- No migration needed
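The `SHA256(canonicalId) → UUID` step above can be sketched in a few lines (Python for illustration; truncating the digest to the first 16 bytes is an assumption about the app's exact scheme):

```python
import hashlib
import uuid

def deterministic_uuid(canonical_id: str) -> uuid.UUID:
    """Derive a stable UUID from a canonical ID.

    Because the UUID is a pure function of the ID, every device computes
    the same identifier for the same stadium with no coordination.
    """
    digest = hashlib.sha256(canonical_id.encode("utf-8")).digest()
    return uuid.UUID(bytes=digest[:16])
```

This is why a freshly synced stadium needs no migration step: any client that knows the canonical ID can reconstruct the same local UUID independently.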

### Team Relocates

**Example:** Oakland A's → Las Vegas A's (2024-2028)

**What happens:**
1. CloudKit updates `CanonicalTeam`:
   - `canonicalId` stays `"team_mlb_athletics"` (never changes)
   - `stadiumCanonicalId` updated to Las Vegas stadium
   - `city` updated to "Las Vegas"
2. CloudKit adds `TeamAlias` for "Oakland" with end date
3. Old Oakland Coliseum gets `deprecatedAt` timestamp (soft delete)

**User impact:** None
- Old visits preserved: `StadiumVisit.stadiumId` = Oakland Coliseum (still valid)
- Old visits show historical context: "Oakland A's at Oakland Coliseum"
- Achievements adapt: "Complete AL West" now requires Las Vegas stadium

### Stadium Demolished / Team Ceases

**What happens:**
- Record gets `deprecatedAt` timestamp (soft delete, never hard delete)
- Filtered from active queries: `predicate { $0.deprecatedAt == nil }`
- Historical data fully preserved

**User impact:** None
- Visits remain in history, just not in active stadium lists
- Achievements not revoked - you earned it, you keep it
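The soft-delete filter in the Swift predicate above is the same one-liner in any language; a Python sketch over dict-shaped records (field name as in the canonical models):

```python
def active(records: list) -> list:
    """Active records only: deprecated ones stay stored but are hidden."""
    return [r for r in records if r.get("deprecatedAt") is None]
```

Nothing is ever hard-deleted, so a record filtered out here can still be resolved by ID from historical visits.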

## Identity Resolution

`StadiumIdentityService` handles all lookups:

```swift
// Find canonical ID from any name (current or historical)
StadiumIdentityService.shared.canonicalId(forName: "Staples Center")
// → "stadium_nba_los_angeles_lakers"

// Get current display name from canonical ID
StadiumIdentityService.shared.currentName(forCanonicalId: "stadium_nba_los_angeles_lakers")
// → "Crypto.com Arena"

// Get all historical names
StadiumIdentityService.shared.allNames(forCanonicalId: "stadium_nba_los_angeles_lakers")
// → ["Crypto.com Arena", "Staples Center", "Great Western Forum"]
```

## Sync Safety

During `CanonicalSyncService.mergeStadium()`:

```swift
if let existing = try context.fetch(descriptor).first {
    // PRESERVE user customizations
    let savedNickname = existing.userNickname
    let savedFavorite = existing.isFavorite

    // UPDATE system fields only
    existing.name = remote.name
    existing.city = remote.city
    // canonicalId is NOT updated - it's immutable

    // RESTORE user customizations
    existing.userNickname = savedNickname
    existing.isFavorite = savedFavorite
}
```

## Impact Summary

| Scenario | User Visits | Achievements | Historical Display |
|----------|-------------|--------------|--------------------|
| Stadium rename | ✅ Preserved | ✅ Preserved | Shows name at visit time |
| New stadium | N/A | Available to earn | N/A |
| Team relocates | ✅ Preserved | ✅ Logic adapts | Shows team + old stadium |
| Stadium demolished | ✅ Preserved | ✅ Not revoked | Marked deprecated, visible in history |

## Key Principle

**Immutable references, mutable display.** User data stores only canonical IDs. Display names are resolved at read time via `StadiumIdentityService`. This means we can update the real world without ever migrating user data.