Files
Sportstime/docs/plans/2026-01-13-sync-reliability-design.md
Trey t b433e1dad5 docs: add sync reliability design for CloudKit pagination and cancellation
Design covers three improvements:
- Paginated CloudKit fetches to handle >400 records
- Graceful cancellation with per-entity progress saving
- NWPathMonitor for auto-sync on network restoration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 18:49:12 -06:00

11 KiB

Sync Reliability Design

Date: 2026-01-13 Status: Approved Author: Claude Code + User collaboration

Summary

Improve CloudKit sync reliability with three enhancements:

  1. Pagination - Fetch all records, not just first ~400
  2. Cancellation - Graceful background task expiration with progress saving
  3. Network monitoring - Auto-sync when connection restored

Problem Statement

The current sync system has a critical gap: CloudKit queries return max 400 records per request, but the code discards the pagination cursor. With ~5,000 games in the dataset, most records may never sync from CloudKit.

Additionally, background tasks can be terminated by iOS with no graceful handling, and there's no automatic sync when network connectivity is restored.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Sync Reliability System                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────────┐     ┌──────────────────────────────────┐  │
│  │ NetworkMonitor   │────►│ BackgroundSyncManager            │  │
│  │ (NWPathMonitor)  │     │ - Cancellation token support     │  │
│  │ - Debounced      │     │ - Progress persistence           │  │
│  └──────────────────┘     └──────────────────────────────────┘  │
│                                        │                         │
│                                        ▼                         │
│                           ┌──────────────────────────────────┐  │
│                           │ CloudKitService                  │  │
│                           │ - Paginated fetches              │  │
│                           │ - CKQueryOperation + cursors     │  │
│                           │ - Cancellation-aware             │  │
│                           └──────────────────────────────────┘  │
│                                        │                         │
│                                        ▼                         │
│                           ┌──────────────────────────────────┐  │
│                           │ SyncState (enhanced)             │  │
│                           │ - Per-entity lastSync dates      │  │
│                           │ - Partial sync progress          │  │
│                           └──────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Design Details

1. Paginated CloudKit Fetches

New Protocol:

/// Token checked between pages to support graceful cancellation
protocol SyncCancellationToken: Sendable {
    var isCancelled: Bool { get }
}

Paginated Fetch Pattern:

Each fetch method (fetchGamesForSync, fetchTeamsForSync, etc.) refactored to:

func fetchGamesForSync(
    since lastSync: Date?,
    cancellationToken: SyncCancellationToken? = nil
) async throws -> [SyncGame] {
    var allResults: [SyncGame] = []
    var cursor: CKQueryOperation.Cursor? = nil

    repeat {
        // Check cancellation before each page
        if cancellationToken?.isCancelled == true {
            break  // Return what we have so far
        }

        let (pageResults, nextCursor) = try await fetchGamesPage(
            since: lastSync,
            cursor: cursor
        )
        allResults.append(contentsOf: pageResults)
        cursor = nextCursor

    } while cursor != nil

    return allResults
}

private func fetchGamesPage(
    since lastSync: Date?,
    cursor: CKQueryOperation.Cursor?
) async throws -> ([SyncGame], CKQueryOperation.Cursor?) {
    // Uses CKQueryOperation for explicit cursor control
    // Returns up to 400 records per page
}

Rationale:

  • Cancellation checked between pages (not mid-fetch)
  • Returns partial results on cancellation (not thrown away)
  • Page size of 400 balances throughput vs. memory

2. Cancellation & Progress Saving

Cancellation Token Implementation:

final class BackgroundTaskCancellationToken: SyncCancellationToken, @unchecked Sendable {
    private let _isCancelled = OSAllocatedUnfairLock(initialState: false)

    var isCancelled: Bool { _isCancelled.withLock { $0 } }

    func cancel() { _isCancelled.withLock { $0 = true } }
}

Enhanced SyncState:

Add per-entity sync timestamps for partial progress:

// Add to existing SyncState model
var lastStadiumSync: Date?
var lastTeamSync: Date?
var lastGameSync: Date?
var lastLeagueStructureSync: Date?
var lastTeamAliasSync: Date?
var lastStadiumAliasSync: Date?
var lastSportSync: Date?

Modified Sync Flow:

func syncAll(context: ModelContext, cancellationToken: SyncCancellationToken? = nil) async throws -> SyncResult {
    // Sync each entity type, saving progress after each completes

    let stadiums = try await syncStadiums(context: context, cancellationToken: cancellationToken)
    syncState.lastStadiumSync = Date()  // Save progress immediately
    try context.save()

    if cancellationToken?.isCancelled == true {
        return partialResult(stadiums: stadiums)
    }

    let teams = try await syncTeams(context: context, cancellationToken: cancellationToken)
    syncState.lastTeamSync = Date()
    try context.save()

    // ... continue for each entity type
}

Key behaviors:

  • Progress saved after each entity type completes
  • Cancellation returns partial results instead of throwing
  • Next sync uses per-entity timestamps to resume

3. Network Monitor

NetworkMonitor Singleton:

import Network

@MainActor
final class NetworkMonitor: ObservableObject {
    static let shared = NetworkMonitor()

    @Published private(set) var isConnected: Bool = true
    @Published private(set) var connectionType: ConnectionType = .unknown

    enum ConnectionType { case wifi, cellular, wired, unknown }

    private let monitor = NWPathMonitor()
    private let queue = DispatchQueue(label: "NetworkMonitor")
    private var debounceTask: Task<Void, Never>?

    private init() {
        monitor.pathUpdateHandler = { [weak self] path in
            Task { @MainActor in
                self?.handlePathUpdate(path)
            }
        }
        monitor.start(queue: queue)
    }

    private func handlePathUpdate(_ path: NWPath) {
        let wasConnected = isConnected
        isConnected = (path.status == .satisfied)
        connectionType = detectConnectionType(path)

        // Debounce: only trigger sync if we GAINED connection
        if !wasConnected && isConnected {
            debounceTask?.cancel()
            debounceTask = Task {
                try? await Task.sleep(for: .seconds(2.5))
                guard !Task.isCancelled else { return }
                NotificationCenter.default.post(name: .networkRestored, object: nil)
            }
        }
    }
}

extension Notification.Name {
    static let networkRestored = Notification.Name("NetworkRestored")
}

BackgroundSyncManager Integration:

// In BackgroundSyncManager.configure(with:)
NotificationCenter.default.addObserver(
    forName: .networkRestored,
    object: nil,
    queue: .main
) { [weak self] _ in
    Task { @MainActor in
        await self?.performSyncIfNeeded()
    }
}

Rationale:

  • 2.5 second debounce handles WiFi<->cellular transitions
  • Only triggers on connection gained (not every path update)
  • Uses NotificationCenter for loose coupling

Error Handling

Scenario Behavior
Cancellation mid-page Current page completes, then stops. Partial data saved.
Network drops during sync CloudKit throws error, caught by existing handling. Per-entity progress preserved.
Multiple sync triggers overlap Existing syncInProgress flag prevents duplicate syncs.
App killed during background task Per-entity saves ensure most progress kept.
Network flapping 2.5s debounce collapses rapid transitions into single sync.
First launch offline Bootstrap from bundled JSON. NetworkMonitor triggers sync when connected.

Cancellation vs Error distinction:

struct SyncResult {
    // ... existing fields
    let wasCancelled: Bool  // NEW: distinguish clean cancel from error
}

When wasCancelled == true:

  • Don't increment consecutiveFailures
  • Don't set lastSyncError
  • Do update per-entity timestamps

Testing Strategy

Unit Tests (CloudKitService)

Test Validates
test_pagination_fetchesAllPages Mock returns 3 pages of 400, verifies all 1200 returned
test_pagination_stopsOnCancellation Cancel after page 1, verify only ~400 returned
test_pagination_handlesEmptyResult No records returns empty array, not error
test_pagination_handlesSinglePage <400 records returns without extra fetch

Unit Tests (Cancellation)

Test Validates
test_cancellationToken_threadSafe Concurrent read/write doesn't crash
test_syncAll_savesProgressPerEntity Mock cancel after teams, verify stadiums saved but games not
test_syncAll_doesNotIncrementFailuresOnCancel consecutiveFailures unchanged after clean cancel

Unit Tests (NetworkMonitor)

Test Validates
test_debounce_collapsesRapidChanges 5 changes in 1s = 1 notification
test_onlyNotifiesOnGainedConnection connected->disconnected doesn't trigger sync
test_cancelsDebounceIfDisconnects connect, disconnect before 2.5s = no notification

Integration Test

Test Validates
test_fullSyncFlow_withMockCloudKit Bootstrap -> sync -> cancel -> resume picks up correctly

Files to Modify

File Changes
CloudKitService.swift Add pagination to all fetch methods, accept cancellation token
CanonicalSyncService.swift Per-entity progress saving, pass cancellation token, handle partial results
BackgroundSyncManager.swift Create/pass cancellation token, subscribe to network notifications
CanonicalModels.swift Add per-entity sync timestamps to SyncState
NetworkMonitor.swift NEW FILE - NWPathMonitor wrapper with debouncing

Migration

No data migration required. New SyncState fields will be nil initially, causing a full sync on first run after update (existing behavior).