---
phase: 05-cloudkit-crud
plan: 01
type: execute
domain: data-pipeline
---

Add smart sync with change detection to cloudkit_import.py.

Purpose: Enable differential uploads that only sync new/changed records, reducing CloudKit API calls and sync time.
Output: Enhanced cloudkit_import.py with --diff, --smart-sync, and --changes-only flags.

~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/04-canonical-linking/04-01-SUMMARY.md

**Relevant source files:**
@Scripts/cloudkit_import.py

**Tech stack available:**
Python 3, requests, cryptography, CloudKit server-to-server API

**Established patterns:**
forceReplace for create/update, query() for read, delete_all() for deletion, batch operations with BATCH_SIZE=200

**Constraining decisions:**
- Phase 04-01: 5760 games canonicalized with 100% team/stadium resolution
- Existing CloudKit import uses forceReplace (creates or replaces) for all operations
- recordChangeTag must be used for conflict detection in updates

Task 1: Add change detection with diff reporting

Scripts/cloudkit_import.py

Add change detection capability to compare local canonical data against CloudKit records.

1. Add `query_all(record_type, verbose)` method to CloudKit class:
   - Query with pagination (use continuationMarker for >200 records)
   - Return dict mapping recordName to record data (including recordChangeTag)
   - Handle query errors gracefully

2. Add `compute_diff(local_records, cloud_records)` function:
   - Returns dict with keys: 'new', 'updated', 'unchanged', 'deleted'
   - 'new': records in local but not in cloud (by recordName)
   - 'updated': records in both where fields differ (compare field values)
   - 'unchanged': records in both with same field values
   - 'deleted': records in cloud but not in local
   - Include count for each category

3. Add `--diff` flag to argparse:
   - When set, query CloudKit and show diff report for each record type
   - Format: "Stadiums: 32 unchanged, 2 new, 1 updated, 0 deleted"
   - Do NOT perform any imports, just report

4. Field comparison for 'updated' detection:
   - Compare string/int fields directly
   - For location fields, compare lat/lng with tolerance (0.0001)
   - For reference fields, compare recordName only
   - Ignore recordChangeTag and timestamps in comparison

Avoid: Using forceReplace for everything. The goal is to identify WHAT changed before deciding HOW to sync.

```bash
cd Scripts && python cloudkit_import.py --diff --verbose
```

Should output diff report showing counts for each record type (stadiums, teams, games, etc.)

--diff flag works and reports new/updated/unchanged/deleted counts for each record type

Task 2: Add differential sync with smart-sync flag

Scripts/cloudkit_import.py

Add differential sync capability that only uploads new and changed records.

1. Add `sync_diff(ck, diff, record_type, dry_run, verbose)` function:
   - For 'new' records: use forceReplace (creates new)
   - For 'updated' records: use 'update' operationType with recordChangeTag
   - For 'deleted' records: use 'delete' operationType (optional, controlled by flag)
   - Skip 'unchanged' records entirely
   - Return counts: created, updated, deleted, skipped

2. Add CloudKit `update` operation handling in modify():
   - update operationType requires recordChangeTag field
   - Handle CONFLICT error (409) - means record was modified since query
   - On conflict: re-query that record, recompute if still needs update

3. Add `--smart-sync` flag:
   - Query CloudKit first to get current state
   - Compute diff against local data
   - Sync only new and updated records
   - Print summary: "Created N, Updated M, Skipped K unchanged"

4. Add `--delete-orphans` flag (used with --smart-sync):
   - When set, also delete records in CloudKit but not in local
   - Default: do NOT delete orphans (safe mode)
   - Print warning: "Would delete N orphan records (use --delete-orphans to confirm)"

5. Menu integration:
   - Add option 12: "Smart sync (diff-based)"
   - Add option 13: "Smart sync + delete orphans"

Avoid: Deleting records without explicit flag. forceReplace on unchanged records.

```bash
cd Scripts && python cloudkit_import.py --smart-sync --dry-run --verbose
```

Should show what would be created/updated/skipped without making changes.

```bash
cd Scripts && python cloudkit_import.py --smart-sync --verbose
```

Should perform differential sync, reporting created/updated/skipped counts.

--smart-sync flag performs differential upload, skipping unchanged records. Created/updated counts are accurate.

Before declaring plan complete:
- [ ] `python cloudkit_import.py --diff` reports accurate counts for all record types
- [ ] `python cloudkit_import.py --smart-sync --dry-run` shows correct preview
- [ ] `python cloudkit_import.py --smart-sync` uploads only changed records
- [ ] Update with recordChangeTag handles conflicts gracefully
- [ ] Interactive menu has new options 12 and 13

- Change detection accurately identifies new/updated/unchanged/deleted records
- Smart sync reduces CloudKit API calls by skipping unchanged records
- Conflict handling prevents data loss on concurrent updates
- No regressions to existing import functionality

After completion, create `.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md`
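
For reference while executing Task 1, the `compute_diff` step and its field-comparison rules could look roughly like the sketch below. It is not the existing code in cloudkit_import.py. It assumes each side is a dict mapping recordName to a flat dict of already-unwrapped field values, that location values are dicts with `latitude`/`longitude` keys, and that reference values are dicts with a `recordName` key. The helper names `values_equal`, `fields_equal`, and `format_report`, the `IGNORED_KEYS` set, and the sample stadium records are hypothetical.

```python
import math

LOCATION_TOLERANCE = 0.0001  # lat/lng tolerance named in the plan
# Assumed key names for metadata that must not affect the comparison.
IGNORED_KEYS = {"recordChangeTag", "created", "modified"}
_MISSING = object()  # sentinel for a field present on only one side


def values_equal(local, cloud):
    """Compare two field values using the plan's comparison rules."""
    if isinstance(local, dict) and isinstance(cloud, dict):
        coords = {"latitude", "longitude"}
        # Location field (assumed shape): compare lat/lng within tolerance.
        if coords <= local.keys() and coords <= cloud.keys():
            return all(
                math.isclose(local[k], cloud[k], abs_tol=LOCATION_TOLERANCE)
                for k in coords
            )
        # Reference field (assumed shape): compare recordName only.
        if "recordName" in local and "recordName" in cloud:
            return local["recordName"] == cloud["recordName"]
    # Strings, ints, and anything else: direct comparison.
    return local == cloud


def fields_equal(local_fields, cloud_fields):
    """True when every non-ignored field matches on both sides."""
    keys = (set(local_fields) | set(cloud_fields)) - IGNORED_KEYS
    return all(
        values_equal(local_fields.get(k, _MISSING), cloud_fields.get(k, _MISSING))
        for k in keys
    )


def compute_diff(local_records, cloud_records):
    """Classify recordNames as new, updated, unchanged, or deleted."""
    diff = {"new": [], "updated": [], "unchanged": [], "deleted": []}
    for name, fields in local_records.items():
        if name not in cloud_records:
            diff["new"].append(name)
        elif fields_equal(fields, cloud_records[name]):
            diff["unchanged"].append(name)
        else:
            diff["updated"].append(name)
    diff["deleted"] = [n for n in cloud_records if n not in local_records]
    return diff


def format_report(label, diff):
    """Render one line in the report format given in the plan."""
    return (
        f"{label}: {len(diff['unchanged'])} unchanged, {len(diff['new'])} new, "
        f"{len(diff['updated'])} updated, {len(diff['deleted'])} deleted"
    )


# Hypothetical sample data, only to exercise the rules above.
local = {
    "stadium_a": {"name": "A", "location": {"latitude": 40.0, "longitude": -75.0}},
    "stadium_b": {"name": "B", "capacity": 50000},
    "stadium_c": {"name": "C"},
}
cloud = {
    "stadium_a": {
        "name": "A",
        "location": {"latitude": 40.00005, "longitude": -75.0},
        "recordChangeTag": "x1",
    },
    "stadium_b": {"name": "B", "capacity": 45000, "recordChangeTag": "x2"},
    "stadium_d": {"name": "D", "recordChangeTag": "x3"},
}
print(format_report("Stadiums", compute_diff(local, cloud)))
# prints "Stadiums: 1 unchanged, 1 new, 1 updated, 1 deleted"
```

Returning lists of recordNames rather than bare counts keeps the counts derivable with `len()` and gives `sync_diff` in Task 2 the names it needs. If the real records keep CloudKit's wrapped `{"value": ..., "type": ...}` field shape, the values would need unwrapping before comparison.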