Design covers three improvements: - Paginated CloudKit fetches to handle >400 records - Graceful cancellation with per-entity progress saving - NWPathMonitor for auto-sync on network restoration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
11 KiB
Sync Reliability Design
Date: 2026-01-13 Status: Approved Author: Claude Code + User collaboration
Summary
Improve CloudKit sync reliability with three enhancements:
- Pagination - Fetch all records, not just first ~400
- Cancellation - Graceful background task expiration with progress saving
- Network monitoring - Auto-sync when connection restored
Problem Statement
The current sync system has a critical gap: CloudKit queries return max 400 records per request, but the code discards the pagination cursor. With ~5,000 games in the dataset, most records may never sync from CloudKit.
Additionally, background tasks can be terminated by iOS with no graceful handling, and there's no automatic sync when network connectivity is restored.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Sync Reliability System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────────────────────┐ │
│ │ NetworkMonitor │────►│ BackgroundSyncManager │ │
│ │ (NWPathMonitor) │ │ - Cancellation token support │ │
│ │ - Debounced │ │ - Progress persistence │ │
│ └──────────────────┘ └──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ CloudKitService │ │
│ │ - Paginated fetches │ │
│ │ - CKQueryOperation + cursors │ │
│ │ - Cancellation-aware │ │
│ └──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ SyncState (enhanced) │ │
│ │ - Per-entity lastSync dates │ │
│ │ - Partial sync progress │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Design Details
1. Paginated CloudKit Fetches
New Protocol:
/// Token checked between pages to support graceful cancellation
protocol SyncCancellationToken: Sendable {
var isCancelled: Bool { get }
}
Paginated Fetch Pattern:
Each fetch method (fetchGamesForSync, fetchTeamsForSync, etc.) refactored to:
func fetchGamesForSync(
since lastSync: Date?,
cancellationToken: SyncCancellationToken? = nil
) async throws -> [SyncGame] {
var allResults: [SyncGame] = []
var cursor: CKQueryOperation.Cursor? = nil
repeat {
// Check cancellation before each page
if cancellationToken?.isCancelled == true {
break // Return what we have so far
}
let (pageResults, nextCursor) = try await fetchGamesPage(
since: lastSync,
cursor: cursor
)
allResults.append(contentsOf: pageResults)
cursor = nextCursor
} while cursor != nil
return allResults
}
private func fetchGamesPage(
since lastSync: Date?,
cursor: CKQueryOperation.Cursor?
) async throws -> ([SyncGame], CKQueryOperation.Cursor?) {
// Uses CKQueryOperation for explicit cursor control
// Returns up to 400 records per page
}
Rationale:
- Cancellation checked between pages (not mid-fetch)
- Returns partial results on cancellation (not thrown away)
- Page size of 400 balances throughput vs. memory
2. Cancellation & Progress Saving
Cancellation Token Implementation:
final class BackgroundTaskCancellationToken: SyncCancellationToken, @unchecked Sendable {
private let _isCancelled = OSAllocatedUnfairLock(initialState: false)
var isCancelled: Bool { _isCancelled.withLock { $0 } }
func cancel() { _isCancelled.withLock { $0 = true } }
}
Enhanced SyncState:
Add per-entity sync timestamps for partial progress:
// Add to existing SyncState model
var lastStadiumSync: Date?
var lastTeamSync: Date?
var lastGameSync: Date?
var lastLeagueStructureSync: Date?
var lastTeamAliasSync: Date?
var lastStadiumAliasSync: Date?
var lastSportSync: Date?
Modified Sync Flow:
func syncAll(context: ModelContext, cancellationToken: SyncCancellationToken? = nil) async throws -> SyncResult {
// Sync each entity type, saving progress after each completes
let stadiums = try await syncStadiums(context: context, cancellationToken: cancellationToken)
syncState.lastStadiumSync = Date() // Save progress immediately
try context.save()
if cancellationToken?.isCancelled == true {
return partialResult(stadiums: stadiums)
}
let teams = try await syncTeams(context: context, cancellationToken: cancellationToken)
syncState.lastTeamSync = Date()
try context.save()
// ... continue for each entity type
}
Key behaviors:
- Progress saved after each entity type completes
- Cancellation returns partial results instead of throwing
- Next sync uses per-entity timestamps to resume
3. Network Monitor
NetworkMonitor Singleton:
import Network
@MainActor
final class NetworkMonitor: ObservableObject {
static let shared = NetworkMonitor()
@Published private(set) var isConnected: Bool = true
@Published private(set) var connectionType: ConnectionType = .unknown
enum ConnectionType { case wifi, cellular, wired, unknown }
private let monitor = NWPathMonitor()
private let queue = DispatchQueue(label: "NetworkMonitor")
private var debounceTask: Task<Void, Never>?
private init() {
monitor.pathUpdateHandler = { [weak self] path in
Task { @MainActor in
self?.handlePathUpdate(path)
}
}
monitor.start(queue: queue)
}
private func handlePathUpdate(_ path: NWPath) {
let wasConnected = isConnected
isConnected = (path.status == .satisfied)
connectionType = detectConnectionType(path)
// Debounce: only trigger sync if we GAINED connection
if !wasConnected && isConnected {
debounceTask?.cancel()
debounceTask = Task {
try? await Task.sleep(for: .seconds(2.5))
guard !Task.isCancelled else { return }
NotificationCenter.default.post(name: .networkRestored, object: nil)
}
}
}
}
extension Notification.Name {
static let networkRestored = Notification.Name("NetworkRestored")
}
BackgroundSyncManager Integration:
// In BackgroundSyncManager.configure(with:)
NotificationCenter.default.addObserver(
forName: .networkRestored,
object: nil,
queue: .main
) { [weak self] _ in
Task { @MainActor in
await self?.performSyncIfNeeded()
}
}
Rationale:
- 2.5 second debounce handles WiFi<->cellular transitions
- Only triggers on connection gained (not every path update)
- Uses NotificationCenter for loose coupling
Error Handling
| Scenario | Behavior |
|---|---|
| Cancellation mid-page | Current page completes, then stops. Partial data saved. |
| Network drops during sync | CloudKit throws error, caught by existing handling. Per-entity progress preserved. |
| Multiple sync triggers overlap | Existing syncInProgress flag prevents duplicate syncs. |
| App killed during background task | Per-entity saves ensure most progress kept. |
| Network flapping | 2.5s debounce collapses rapid transitions into single sync. |
| First launch offline | Bootstrap from bundled JSON. NetworkMonitor triggers sync when connected. |
Cancellation vs Error distinction:
struct SyncResult {
// ... existing fields
let wasCancelled: Bool // NEW: distinguish clean cancel from error
}
When wasCancelled == true:
- Don't increment
consecutiveFailures - Don't set
lastSyncError - Do update per-entity timestamps
Testing Strategy
Unit Tests (CloudKitService)
| Test | Validates |
|---|---|
test_pagination_fetchesAllPages |
Mock returns 3 pages of 400, verifies all 1200 returned |
test_pagination_stopsOnCancellation |
Cancel after page 1, verify only ~400 returned |
test_pagination_handlesEmptyResult |
No records returns empty array, not error |
test_pagination_handlesSinglePage |
<400 records returns without extra fetch |
Unit Tests (Cancellation)
| Test | Validates |
|---|---|
test_cancellationToken_threadSafe |
Concurrent read/write doesn't crash |
test_syncAll_savesProgressPerEntity |
Mock cancel after teams, verify stadiums saved but games not |
test_syncAll_doesNotIncrementFailuresOnCancel |
consecutiveFailures unchanged after clean cancel |
Unit Tests (NetworkMonitor)
| Test | Validates |
|---|---|
test_debounce_collapsesRapidChanges |
5 changes in 1s = 1 notification |
test_onlyNotifiesOnGainedConnection |
connected->disconnected doesn't trigger sync |
test_cancelsDebounceIfDisconnects |
connect, disconnect before 2.5s = no notification |
Integration Test
| Test | Validates |
|---|---|
test_fullSyncFlow_withMockCloudKit |
Bootstrap -> sync -> cancel -> resume picks up correctly |
Files to Modify
| File | Changes |
|---|---|
CloudKitService.swift |
Add pagination to all fetch methods, accept cancellation token |
CanonicalSyncService.swift |
Per-entity progress saving, pass cancellation token, handle partial results |
BackgroundSyncManager.swift |
Create/pass cancellation token, subscribe to network notifications |
CanonicalModels.swift |
Add per-entity sync timestamps to SyncState |
NetworkMonitor.swift |
NEW FILE - NWPathMonitor wrapper with debouncing |
Migration
No data migration required. New SyncState fields will be nil initially, causing a full sync on first run after update (existing behavior).