docs: add sync reliability design for CloudKit pagination and cancellation

Design covers three improvements:
- Paginated CloudKit fetches to handle >400 records
- Graceful cancellation with per-entity progress saving
- NWPathMonitor for auto-sync on network restoration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Trey t
2026-01-13 18:49:12 -06:00
parent b99c25d8f6
commit b433e1dad5

View File

@@ -0,0 +1,304 @@
# Sync Reliability Design
**Date:** 2026-01-13
**Status:** Approved
**Author:** Claude Code + User collaboration
## Summary
Improve CloudKit sync reliability with three enhancements:
1. **Pagination** - Fetch all records, not just first ~400
2. **Cancellation** - Graceful background task expiration with progress saving
3. **Network monitoring** - Auto-sync when connection restored
## Problem Statement
The current sync system has a critical gap: CloudKit queries return max 400 records per request, but the code discards the pagination cursor. With ~5,000 games in the dataset, most records may never sync from CloudKit.
Additionally, background tasks can be terminated by iOS with no graceful handling, and there's no automatic sync when network connectivity is restored.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Sync Reliability System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────────────────────┐ │
│ │ NetworkMonitor │────►│ BackgroundSyncManager │ │
│ │ (NWPathMonitor) │ │ - Cancellation token support │ │
│ │ - Debounced │ │ - Progress persistence │ │
│ └──────────────────┘ └──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ CloudKitService │ │
│ │ - Paginated fetches │ │
│ │ - CKQueryOperation + cursors │ │
│ │ - Cancellation-aware │ │
│ └──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ SyncState (enhanced) │ │
│ │ - Per-entity lastSync dates │ │
│ │ - Partial sync progress │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
## Design Details
### 1. Paginated CloudKit Fetches
**New Protocol:**
```swift
/// Token checked between pages to support graceful cancellation
protocol SyncCancellationToken: Sendable {
var isCancelled: Bool { get }
}
```
**Paginated Fetch Pattern:**
Each fetch method (`fetchGamesForSync`, `fetchTeamsForSync`, etc.) refactored to:
```swift
func fetchGamesForSync(
since lastSync: Date?,
cancellationToken: SyncCancellationToken? = nil
) async throws -> [SyncGame] {
var allResults: [SyncGame] = []
var cursor: CKQueryOperation.Cursor? = nil
repeat {
// Check cancellation before each page
if cancellationToken?.isCancelled == true {
break // Return what we have so far
}
let (pageResults, nextCursor) = try await fetchGamesPage(
since: lastSync,
cursor: cursor
)
allResults.append(contentsOf: pageResults)
cursor = nextCursor
} while cursor != nil
return allResults
}
private func fetchGamesPage(
since lastSync: Date?,
cursor: CKQueryOperation.Cursor?
) async throws -> ([SyncGame], CKQueryOperation.Cursor?) {
// Uses CKQueryOperation for explicit cursor control
// Returns up to 400 records per page
}
```
**Rationale:**
- Cancellation checked between pages (not mid-fetch)
- Returns partial results on cancellation (not thrown away)
- Page size of 400 balances throughput vs. memory
### 2. Cancellation & Progress Saving
**Cancellation Token Implementation:**
```swift
final class BackgroundTaskCancellationToken: SyncCancellationToken, @unchecked Sendable {
private let _isCancelled = OSAllocatedUnfairLock(initialState: false)
var isCancelled: Bool { _isCancelled.withLock { $0 } }
func cancel() { _isCancelled.withLock { $0 = true } }
}
```
**Enhanced SyncState:**
Add per-entity sync timestamps for partial progress:
```swift
// Add to existing SyncState model
var lastStadiumSync: Date?
var lastTeamSync: Date?
var lastGameSync: Date?
var lastLeagueStructureSync: Date?
var lastTeamAliasSync: Date?
var lastStadiumAliasSync: Date?
var lastSportSync: Date?
```
**Modified Sync Flow:**
```swift
func syncAll(context: ModelContext, cancellationToken: SyncCancellationToken? = nil) async throws -> SyncResult {
// Sync each entity type, saving progress after each completes
let stadiums = try await syncStadiums(context: context, cancellationToken: cancellationToken)
syncState.lastStadiumSync = Date() // Save progress immediately
try context.save()
if cancellationToken?.isCancelled == true {
return partialResult(stadiums: stadiums)
}
let teams = try await syncTeams(context: context, cancellationToken: cancellationToken)
syncState.lastTeamSync = Date()
try context.save()
// ... continue for each entity type
}
```
**Key behaviors:**
- Progress saved after each entity type completes
- Cancellation returns partial results instead of throwing
- Next sync uses per-entity timestamps to resume
### 3. Network Monitor
**NetworkMonitor Singleton:**
```swift
import Network
@MainActor
final class NetworkMonitor: ObservableObject {
static let shared = NetworkMonitor()
@Published private(set) var isConnected: Bool = true
@Published private(set) var connectionType: ConnectionType = .unknown
enum ConnectionType { case wifi, cellular, wired, unknown }
private let monitor = NWPathMonitor()
private let queue = DispatchQueue(label: "NetworkMonitor")
private var debounceTask: Task<Void, Never>?
private init() {
monitor.pathUpdateHandler = { [weak self] path in
Task { @MainActor in
self?.handlePathUpdate(path)
}
}
monitor.start(queue: queue)
}
private func handlePathUpdate(_ path: NWPath) {
let wasConnected = isConnected
isConnected = (path.status == .satisfied)
connectionType = detectConnectionType(path)
// Debounce: only trigger sync if we GAINED connection
if !wasConnected && isConnected {
debounceTask?.cancel()
debounceTask = Task {
try? await Task.sleep(for: .seconds(2.5))
guard !Task.isCancelled else { return }
NotificationCenter.default.post(name: .networkRestored, object: nil)
}
}
}
}
extension Notification.Name {
static let networkRestored = Notification.Name("NetworkRestored")
}
```
**BackgroundSyncManager Integration:**
```swift
// In BackgroundSyncManager.configure(with:)
NotificationCenter.default.addObserver(
forName: .networkRestored,
object: nil,
queue: .main
) { [weak self] _ in
Task { @MainActor in
await self?.performSyncIfNeeded()
}
}
```
**Rationale:**
- 2.5 second debounce handles WiFi<->cellular transitions
- Only triggers on connection *gained* (not every path update)
- Uses NotificationCenter for loose coupling
## Error Handling
| Scenario | Behavior |
|----------|----------|
| Cancellation mid-page | Current page completes, then stops. Partial data saved. |
| Network drops during sync | CloudKit throws error, caught by existing handling. Per-entity progress preserved. |
| Multiple sync triggers overlap | Existing `syncInProgress` flag prevents duplicate syncs. |
| App killed during background task | Per-entity saves ensure most progress kept. |
| Network flapping | 2.5s debounce collapses rapid transitions into single sync. |
| First launch offline | Bootstrap from bundled JSON. NetworkMonitor triggers sync when connected. |
**Cancellation vs Error distinction:**
```swift
struct SyncResult {
// ... existing fields
let wasCancelled: Bool // NEW: distinguish clean cancel from error
}
```
When `wasCancelled == true`:
- Don't increment `consecutiveFailures`
- Don't set `lastSyncError`
- Do update per-entity timestamps
## Testing Strategy
### Unit Tests (CloudKitService)
| Test | Validates |
|------|-----------|
| `test_pagination_fetchesAllPages` | Mock returns 3 pages of 400, verifies all 1200 returned |
| `test_pagination_stopsOnCancellation` | Cancel after page 1, verify only ~400 returned |
| `test_pagination_handlesEmptyResult` | No records returns empty array, not error |
| `test_pagination_handlesSinglePage` | <400 records returns without extra fetch |
### Unit Tests (Cancellation)
| Test | Validates |
|------|-----------|
| `test_cancellationToken_threadSafe` | Concurrent read/write doesn't crash |
| `test_syncAll_savesProgressPerEntity` | Mock cancel after teams, verify stadiums saved but games not |
| `test_syncAll_doesNotIncrementFailuresOnCancel` | `consecutiveFailures` unchanged after clean cancel |
### Unit Tests (NetworkMonitor)
| Test | Validates |
|------|-----------|
| `test_debounce_collapsesRapidChanges` | 5 changes in 1s = 1 notification |
| `test_onlyNotifiesOnGainedConnection` | connected->disconnected doesn't trigger sync |
| `test_cancelsDebounceIfDisconnects` | connect, disconnect before 2.5s = no notification |
### Integration Test
| Test | Validates |
|------|-----------|
| `test_fullSyncFlow_withMockCloudKit` | Bootstrap -> sync -> cancel -> resume picks up correctly |
## Files to Modify
| File | Changes |
|------|---------|
| `CloudKitService.swift` | Add pagination to all fetch methods, accept cancellation token |
| `CanonicalSyncService.swift` | Per-entity progress saving, pass cancellation token, handle partial results |
| `BackgroundSyncManager.swift` | Create/pass cancellation token, subscribe to network notifications |
| `CanonicalModels.swift` | Add per-entity sync timestamps to SyncState |
| `NetworkMonitor.swift` | **NEW FILE** - NWPathMonitor wrapper with debouncing |
## Migration
No data migration required. New `SyncState` fields will be `nil` initially, causing a full sync on first run after update (existing behavior).