docs: add sync reliability design for CloudKit pagination and cancellation
Design covers three improvements: - Paginated CloudKit fetches to handle >400 records - Graceful cancellation with per-entity progress saving - NWPathMonitor for auto-sync on network restoration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
304
docs/plans/2026-01-13-sync-reliability-design.md
Normal file
304
docs/plans/2026-01-13-sync-reliability-design.md
Normal file
@@ -0,0 +1,304 @@
|
||||
# Sync Reliability Design
|
||||
|
||||
**Date:** 2026-01-13
|
||||
**Status:** Approved
|
||||
**Author:** Claude Code + User collaboration
|
||||
|
||||
## Summary
|
||||
|
||||
Improve CloudKit sync reliability with three enhancements:
|
||||
1. **Pagination** - Fetch all records, not just first ~400
|
||||
2. **Cancellation** - Graceful background task expiration with progress saving
|
||||
3. **Network monitoring** - Auto-sync when connection restored
|
||||
|
||||
## Problem Statement
|
||||
|
||||
The current sync system has a critical gap: CloudKit queries return max 400 records per request, but the code discards the pagination cursor. With ~5,000 games in the dataset, most records may never sync from CloudKit.
|
||||
|
||||
Additionally, background tasks can be terminated by iOS with no graceful handling, and there's no automatic sync when network connectivity is restored.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Sync Reliability System │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌──────────────────┐ ┌──────────────────────────────────┐ │
|
||||
│ │ NetworkMonitor │────►│ BackgroundSyncManager │ │
|
||||
│ │ (NWPathMonitor) │ │ - Cancellation token support │ │
|
||||
│ │ - Debounced │ │ - Progress persistence │ │
|
||||
│ └──────────────────┘ └──────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌──────────────────────────────────┐ │
|
||||
│ │ CloudKitService │ │
|
||||
│ │ - Paginated fetches │ │
|
||||
│ │ - CKQueryOperation + cursors │ │
|
||||
│ │ - Cancellation-aware │ │
|
||||
│ └──────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ ┌──────────────────────────────────┐ │
|
||||
│ │ SyncState (enhanced) │ │
|
||||
│ │ - Per-entity lastSync dates │ │
|
||||
│ │ - Partial sync progress │ │
|
||||
│ └──────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Design Details
|
||||
|
||||
### 1. Paginated CloudKit Fetches
|
||||
|
||||
**New Protocol:**
|
||||
|
||||
```swift
|
||||
/// Token checked between pages to support graceful cancellation
|
||||
protocol SyncCancellationToken: Sendable {
|
||||
var isCancelled: Bool { get }
|
||||
}
|
||||
```
|
||||
|
||||
**Paginated Fetch Pattern:**
|
||||
|
||||
Each fetch method (`fetchGamesForSync`, `fetchTeamsForSync`, etc.) refactored to:
|
||||
|
||||
```swift
|
||||
func fetchGamesForSync(
|
||||
since lastSync: Date?,
|
||||
cancellationToken: SyncCancellationToken? = nil
|
||||
) async throws -> [SyncGame] {
|
||||
var allResults: [SyncGame] = []
|
||||
var cursor: CKQueryOperation.Cursor? = nil
|
||||
|
||||
repeat {
|
||||
// Check cancellation before each page
|
||||
if cancellationToken?.isCancelled == true {
|
||||
break // Return what we have so far
|
||||
}
|
||||
|
||||
let (pageResults, nextCursor) = try await fetchGamesPage(
|
||||
since: lastSync,
|
||||
cursor: cursor
|
||||
)
|
||||
allResults.append(contentsOf: pageResults)
|
||||
cursor = nextCursor
|
||||
|
||||
} while cursor != nil
|
||||
|
||||
return allResults
|
||||
}
|
||||
|
||||
private func fetchGamesPage(
|
||||
since lastSync: Date?,
|
||||
cursor: CKQueryOperation.Cursor?
|
||||
) async throws -> ([SyncGame], CKQueryOperation.Cursor?) {
|
||||
// Uses CKQueryOperation for explicit cursor control
|
||||
// Returns up to 400 records per page
|
||||
}
|
||||
```
|
||||
|
||||
**Rationale:**
|
||||
- Cancellation checked between pages (not mid-fetch)
|
||||
- Returns partial results on cancellation (not thrown away)
|
||||
- Page size of 400 balances throughput vs. memory
|
||||
|
||||
### 2. Cancellation & Progress Saving
|
||||
|
||||
**Cancellation Token Implementation:**
|
||||
|
||||
```swift
|
||||
final class BackgroundTaskCancellationToken: SyncCancellationToken, @unchecked Sendable {
|
||||
private let _isCancelled = OSAllocatedUnfairLock(initialState: false)
|
||||
|
||||
var isCancelled: Bool { _isCancelled.withLock { $0 } }
|
||||
|
||||
func cancel() { _isCancelled.withLock { $0 = true } }
|
||||
}
|
||||
```
|
||||
|
||||
**Enhanced SyncState:**
|
||||
|
||||
Add per-entity sync timestamps for partial progress:
|
||||
|
||||
```swift
|
||||
// Add to existing SyncState model
|
||||
var lastStadiumSync: Date?
|
||||
var lastTeamSync: Date?
|
||||
var lastGameSync: Date?
|
||||
var lastLeagueStructureSync: Date?
|
||||
var lastTeamAliasSync: Date?
|
||||
var lastStadiumAliasSync: Date?
|
||||
var lastSportSync: Date?
|
||||
```
|
||||
|
||||
**Modified Sync Flow:**
|
||||
|
||||
```swift
|
||||
func syncAll(context: ModelContext, cancellationToken: SyncCancellationToken? = nil) async throws -> SyncResult {
|
||||
// Sync each entity type, saving progress after each completes
|
||||
|
||||
let stadiums = try await syncStadiums(context: context, cancellationToken: cancellationToken)
|
||||
syncState.lastStadiumSync = Date() // Save progress immediately
|
||||
try context.save()
|
||||
|
||||
if cancellationToken?.isCancelled == true {
|
||||
return partialResult(stadiums: stadiums)
|
||||
}
|
||||
|
||||
let teams = try await syncTeams(context: context, cancellationToken: cancellationToken)
|
||||
syncState.lastTeamSync = Date()
|
||||
try context.save()
|
||||
|
||||
// ... continue for each entity type
|
||||
}
|
||||
```
|
||||
|
||||
**Key behaviors:**
|
||||
- Progress saved after each entity type completes
|
||||
- Cancellation returns partial results instead of throwing
|
||||
- Next sync uses per-entity timestamps to resume
|
||||
|
||||
### 3. Network Monitor
|
||||
|
||||
**NetworkMonitor Singleton:**
|
||||
|
||||
```swift
|
||||
import Network
|
||||
|
||||
@MainActor
|
||||
final class NetworkMonitor: ObservableObject {
|
||||
static let shared = NetworkMonitor()
|
||||
|
||||
@Published private(set) var isConnected: Bool = true
|
||||
@Published private(set) var connectionType: ConnectionType = .unknown
|
||||
|
||||
enum ConnectionType { case wifi, cellular, wired, unknown }
|
||||
|
||||
private let monitor = NWPathMonitor()
|
||||
private let queue = DispatchQueue(label: "NetworkMonitor")
|
||||
private var debounceTask: Task<Void, Never>?
|
||||
|
||||
private init() {
|
||||
monitor.pathUpdateHandler = { [weak self] path in
|
||||
Task { @MainActor in
|
||||
self?.handlePathUpdate(path)
|
||||
}
|
||||
}
|
||||
monitor.start(queue: queue)
|
||||
}
|
||||
|
||||
private func handlePathUpdate(_ path: NWPath) {
|
||||
let wasConnected = isConnected
|
||||
isConnected = (path.status == .satisfied)
|
||||
connectionType = detectConnectionType(path)
|
||||
|
||||
// Debounce: only trigger sync if we GAINED connection
|
||||
if !wasConnected && isConnected {
|
||||
debounceTask?.cancel()
|
||||
debounceTask = Task {
|
||||
try? await Task.sleep(for: .seconds(2.5))
|
||||
guard !Task.isCancelled else { return }
|
||||
NotificationCenter.default.post(name: .networkRestored, object: nil)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
extension Notification.Name {
|
||||
static let networkRestored = Notification.Name("NetworkRestored")
|
||||
}
|
||||
```
|
||||
|
||||
**BackgroundSyncManager Integration:**
|
||||
|
||||
```swift
|
||||
// In BackgroundSyncManager.configure(with:)
|
||||
NotificationCenter.default.addObserver(
|
||||
forName: .networkRestored,
|
||||
object: nil,
|
||||
queue: .main
|
||||
) { [weak self] _ in
|
||||
Task { @MainActor in
|
||||
await self?.performSyncIfNeeded()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Rationale:**
|
||||
- 2.5 second debounce handles WiFi<->cellular transitions
|
||||
- Only triggers on connection *gained* (not every path update)
|
||||
- Uses NotificationCenter for loose coupling
|
||||
|
||||
## Error Handling
|
||||
|
||||
| Scenario | Behavior |
|
||||
|----------|----------|
|
||||
| Cancellation mid-page | Current page completes, then stops. Partial data saved. |
|
||||
| Network drops during sync | CloudKit throws error, caught by existing handling. Per-entity progress preserved. |
|
||||
| Multiple sync triggers overlap | Existing `syncInProgress` flag prevents duplicate syncs. |
|
||||
| App killed during background task | Per-entity saves ensure most progress kept. |
|
||||
| Network flapping | 2.5s debounce collapses rapid transitions into single sync. |
|
||||
| First launch offline | Bootstrap from bundled JSON. NetworkMonitor triggers sync when connected. |
|
||||
|
||||
**Cancellation vs Error distinction:**
|
||||
|
||||
```swift
|
||||
struct SyncResult {
|
||||
// ... existing fields
|
||||
let wasCancelled: Bool // NEW: distinguish clean cancel from error
|
||||
}
|
||||
```
|
||||
|
||||
When `wasCancelled == true`:
|
||||
- Don't increment `consecutiveFailures`
|
||||
- Don't set `lastSyncError`
|
||||
- Do update per-entity timestamps
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Tests (CloudKitService)
|
||||
|
||||
| Test | Validates |
|
||||
|------|-----------|
|
||||
| `test_pagination_fetchesAllPages` | Mock returns 3 pages of 400, verifies all 1200 returned |
|
||||
| `test_pagination_stopsOnCancellation` | Cancel after page 1, verify only ~400 returned |
|
||||
| `test_pagination_handlesEmptyResult` | No records returns empty array, not error |
|
||||
| `test_pagination_handlesSinglePage` | <400 records returns without extra fetch |
|
||||
|
||||
### Unit Tests (Cancellation)
|
||||
|
||||
| Test | Validates |
|
||||
|------|-----------|
|
||||
| `test_cancellationToken_threadSafe` | Concurrent read/write doesn't crash |
|
||||
| `test_syncAll_savesProgressPerEntity` | Mock cancel after teams, verify stadiums saved but games not |
|
||||
| `test_syncAll_doesNotIncrementFailuresOnCancel` | `consecutiveFailures` unchanged after clean cancel |
|
||||
|
||||
### Unit Tests (NetworkMonitor)
|
||||
|
||||
| Test | Validates |
|
||||
|------|-----------|
|
||||
| `test_debounce_collapsesRapidChanges` | 5 changes in 1s = 1 notification |
|
||||
| `test_onlyNotifiesOnGainedConnection` | connected->disconnected doesn't trigger sync |
|
||||
| `test_cancelsDebounceIfDisconnects` | connect, disconnect before 2.5s = no notification |
|
||||
|
||||
### Integration Test
|
||||
|
||||
| Test | Validates |
|
||||
|------|-----------|
|
||||
| `test_fullSyncFlow_withMockCloudKit` | Bootstrap -> sync -> cancel -> resume picks up correctly |
|
||||
|
||||
## Files to Modify
|
||||
|
||||
| File | Changes |
|
||||
|------|---------|
|
||||
| `CloudKitService.swift` | Add pagination to all fetch methods, accept cancellation token |
|
||||
| `CanonicalSyncService.swift` | Per-entity progress saving, pass cancellation token, handle partial results |
|
||||
| `BackgroundSyncManager.swift` | Create/pass cancellation token, subscribe to network notifications |
|
||||
| `CanonicalModels.swift` | Add per-entity sync timestamps to SyncState |
|
||||
| `NetworkMonitor.swift` | **NEW FILE** - NWPathMonitor wrapper with debouncing |
|
||||
|
||||
## Migration
|
||||
|
||||
No data migration required. New `SyncState` fields will be `nil` initially, causing a full sync on first run after update (existing behavior).
|
||||
Reference in New Issue
Block a user