diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index a152784..9b20ac8 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -78,12 +78,12 @@ Plans:
 ### Phase 5: CloudKit CRUD
 **Goal**: Implement full create, read, update, delete operations for CloudKit management
 **Depends on**: Phase 4
-**Research**: Likely (CloudKit server-to-server API)
-**Research topics**: CloudKit server-to-server authentication, record modification operations, batch operations, conflict resolution
-**Plans**: TBD
+**Research**: No (existing patterns in cloudkit_import.py sufficient)
+**Plans**: 2 plans
 
 Plans:
-- [ ] 05-01: TBD
+- [ ] 05-01: Smart sync with change detection (diff reporting, differential upload)
+- [ ] 05-02: Verification and record management (sync verification, individual CRUD)
 
 ### Phase 6: Validation Reports
 **Goal**: Generate validation reports showing record counts, data gaps, orphan records, and relationship integrity
@@ -106,5 +106,5 @@ Phases execute in numeric order: 1 → 2 → 2.1 → 3 → 4 → 5 → 6
 | 2.1. Additional Sports Stadiums | 3/3 | Complete | 2026-01-10 |
 | 3. Alias Systems | 2/2 | Complete | 2026-01-10 |
 | 4. Canonical Linking | 1/1 | Complete | 2026-01-10 |
-| 5. CloudKit CRUD | 0/TBD | Not started | - |
+| 5. CloudKit CRUD | 0/2 | In progress | - |
 | 6. Validation Reports | 0/TBD | Not started | - |
diff --git a/.planning/phases/05-cloudkit-crud/05-01-PLAN.md b/.planning/phases/05-cloudkit-crud/05-01-PLAN.md
new file mode 100644
index 0000000..7f4e51a
--- /dev/null
+++ b/.planning/phases/05-cloudkit-crud/05-01-PLAN.md
@@ -0,0 +1,151 @@
+---
+phase: 05-cloudkit-crud
+plan: 01
+type: execute
+domain: data-pipeline
+---
+
+
+Add smart sync with change detection to cloudkit_import.py.
+
+Purpose: Enable differential uploads that only sync new/changed records, reducing CloudKit API calls and sync time.
+Output: Enhanced cloudkit_import.py with --diff, --smart-sync, and --changes-only flags.
+
+
+
+~/.claude/get-shit-done/workflows/execute-phase.md
+~/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/04-canonical-linking/04-01-SUMMARY.md
+
+**Relevant source files:**
+@Scripts/cloudkit_import.py
+
+**Tech stack available:** Python 3, requests, cryptography, CloudKit server-to-server API
+**Established patterns:** forceReplace for create/update, query() for read, delete_all() for deletion, batch operations with BATCH_SIZE=200
+
+**Constraining decisions:**
+- Phase 04-01: 5760 games canonicalized with 100% team/stadium resolution
+- Existing CloudKit import uses forceReplace (creates or replaces) for all operations
+- recordChangeTag must be used for conflict detection in updates
+
+
+
+
+
+  Task 1: Add change detection with diff reporting
+  Scripts/cloudkit_import.py
+
+Add change detection capability to compare local canonical data against CloudKit records.
+
+1. Add `query_all(record_type, verbose)` method to CloudKit class:
+   - Query with pagination (use continuationMarker for >200 records)
+   - Return dict mapping recordName to record data (including recordChangeTag)
+   - Handle query errors gracefully
+
+2. Add `compute_diff(local_records, cloud_records)` function:
+   - Returns dict with keys: 'new', 'updated', 'unchanged', 'deleted'
+   - 'new': records in local but not in cloud (by recordName)
+   - 'updated': records in both where fields differ (compare field values)
+   - 'unchanged': records in both with same field values
+   - 'deleted': records in cloud but not in local
+   - Include count for each category
+
+3. Add `--diff` flag to argparse:
+   - When set, query CloudKit and show diff report for each record type
+   - Format: "Stadiums: 32 unchanged, 2 new, 1 updated, 0 deleted"
+   - Do NOT perform any imports, just report
+
+4. Field comparison for 'updated' detection:
+   - Compare string/int fields directly
+   - For location fields, compare lat/lng with tolerance (0.0001)
+   - For reference fields, compare recordName only
+   - Ignore recordChangeTag and timestamps in comparison
+
+Avoid: Using forceReplace for everything. The goal is to identify WHAT changed before deciding HOW to sync.
+
+
+```bash
+cd Scripts && python cloudkit_import.py --diff --verbose
+```
+Should output diff report showing counts for each record type (stadiums, teams, games, etc.)
+
+  --diff flag works and reports new/updated/unchanged/deleted counts for each record type
+
+
+
+  Task 2: Add differential sync with smart-sync flag
+  Scripts/cloudkit_import.py
+
+Add differential sync capability that only uploads new and changed records.
+
+1. Add `sync_diff(ck, diff, record_type, dry_run, verbose)` function:
+   - For 'new' records: use forceReplace (creates new)
+   - For 'updated' records: use 'update' operationType with recordChangeTag
+   - For 'deleted' records: use 'delete' operationType (optional, controlled by flag)
+   - Skip 'unchanged' records entirely
+   - Return counts: created, updated, deleted, skipped
+
+2. Add CloudKit `update` operation handling in modify():
+   - update operationType requires recordChangeTag field
+   - Handle CONFLICT error (409) - means record was modified since query
+   - On conflict: re-query that record, recompute if still needs update
+
+3. Add `--smart-sync` flag:
+   - Query CloudKit first to get current state
+   - Compute diff against local data
+   - Sync only new and updated records
+   - Print summary: "Created N, Updated M, Skipped K unchanged"
+
+4. Add `--delete-orphans` flag (used with --smart-sync):
+   - When set, also delete records in CloudKit but not in local
+   - Default: do NOT delete orphans (safe mode)
+   - Print warning: "Would delete N orphan records (use --delete-orphans to confirm)"
+
+5. Menu integration:
+   - Add option 12: "Smart sync (diff-based)"
+   - Add option 13: "Smart sync + delete orphans"
+
+Avoid: Deleting records without explicit flag. forceReplace on unchanged records.
+
+
+```bash
+cd Scripts && python cloudkit_import.py --smart-sync --dry-run --verbose
+```
+Should show what would be created/updated/skipped without making changes.
+
+```bash
+cd Scripts && python cloudkit_import.py --smart-sync --verbose
+```
+Should perform differential sync, reporting created/updated/skipped counts.
+
+  --smart-sync flag performs differential upload, skipping unchanged records. Created/updated counts are accurate.
+
+
+
+
+
+Before declaring plan complete:
+- [ ] `python cloudkit_import.py --diff` reports accurate counts for all record types
+- [ ] `python cloudkit_import.py --smart-sync --dry-run` shows correct preview
+- [ ] `python cloudkit_import.py --smart-sync` uploads only changed records
+- [ ] Update with recordChangeTag handles conflicts gracefully
+- [ ] Interactive menu has new options 12 and 13
+
+
+
+
+- Change detection accurately identifies new/updated/unchanged/deleted records
+- Smart sync reduces CloudKit API calls by skipping unchanged records
+- Conflict handling prevents data loss on concurrent updates
+- No regressions to existing import functionality
+
+
+
+After completion, create `.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md`
+
diff --git a/.planning/phases/05-cloudkit-crud/05-02-PLAN.md b/.planning/phases/05-cloudkit-crud/05-02-PLAN.md
new file mode 100644
index 0000000..a6df6ed
--- /dev/null
+++ b/.planning/phases/05-cloudkit-crud/05-02-PLAN.md
@@ -0,0 +1,165 @@
+---
+phase: 05-cloudkit-crud
+plan: 02
+type: execute
+domain: data-pipeline
+---
+
+
+Add sync verification and individual record management to cloudkit_import.py.
+
+Purpose: Enable verification that CloudKit matches local data, plus ability to manage individual records.
+Output: Enhanced cloudkit_import.py with --verify, --get, --update-record, and --delete-record flags.
+
+
+
+~/.claude/get-shit-done/workflows/execute-phase.md
+~/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/05-cloudkit-crud/05-01-PLAN.md
+
+**Relevant source files:**
+@Scripts/cloudkit_import.py
+
+**Tech stack available:** Python 3, requests, cryptography, CloudKit server-to-server API
+**Established patterns:** query_all() for full record retrieval, compute_diff() for change detection, sync_diff() for smart sync
+
+**Prior plan context:**
+- Plan 05-01 adds change detection and smart sync
+- query_all() returns dict of recordName -> record data
+- compute_diff() identifies new/updated/unchanged/deleted
+
+
+
+
+
+  Task 1: Add sync verification with --verify flag
+  Scripts/cloudkit_import.py
+
+Add sync verification to confirm CloudKit matches local canonical data.
+
+1. Add `--verify` flag to argparse:
+   - Query CloudKit for all record types
+   - Compare counts with local data
+   - Report discrepancies
+
+2. Add `verify_sync(ck, data_dir, verbose)` function:
+   - For each record type (Stadium, Team, Game, LeagueStructure, TeamAlias, StadiumAlias):
+     - Query CloudKit count
+     - Get local count from JSON files
+     - Compare and report
+   - Output format per type:
+     ```
+     Stadium: CloudKit=32, Local=32 [OK]
+     Game: CloudKit=5758, Local=5760 [MISMATCH: 2 missing in CloudKit]
+     ```
+
+3. Add spot-check verification:
+   - Random sample 5 records per type (if count matches)
+   - Verify field values match between CloudKit and local
+   - Report any field mismatches found
+
+4. Add `--verify-deep` flag:
+   - Full field-by-field comparison of ALL records (not just sample)
+   - Report each mismatch with recordName and field name
+   - Warning: "Deep verification may take several minutes for large datasets"
+
+5. Menu integration:
+   - Add option 14: "Verify sync (quick)"
+   - Add option 15: "Verify sync (deep)"
+
+Avoid: Modifying any data during verification. This is read-only.
+
+
+```bash
+cd Scripts && python cloudkit_import.py --verify --verbose
+```
+Should output verification report showing CloudKit vs local counts with OK/MISMATCH status.
+
+  --verify flag reports accurate comparison between CloudKit and local data for all record types
+
+
+
+  Task 2: Add individual record management commands
+  Scripts/cloudkit_import.py
+
+Add commands for managing individual records by ID.
+
+1. Add `--get ` flag:
+   - Query single record by recordName
+   - Print all field values in formatted output
+   - Example: `--get Stadium stadium_nba_td_garden`
+
+2. Add CloudKit `lookup(record_names)` method:
+   - Use records/lookup endpoint for efficient single/batch lookup
+   - Return list of records
+
+3. Add `--update-record =` flag:
+   - Lookup record to get current recordChangeTag
+   - Update specified field(s)
+   - Handle CONFLICT error with retry
+   - Example: `--update-record Stadium stadium_nba_td_garden capacity=19156`
+
+4. Add `--delete-record ` flag:
+   - Lookup record to get recordChangeTag
+   - Delete single record
+   - Confirm before delete (unless --force)
+   - Example: `--delete-record Game game_mlb_2025_ari_phi_0401`
+
+5. Add `--list ` flag:
+   - Query all records of type
+   - Print recordNames (one per line)
+   - Support `--list --count` for just the count
+   - Example: `--list Stadium` prints all stadium IDs
+
+6. Error handling:
+   - Record not found: "Error: No Stadium with id 'xyz' found in CloudKit"
+   - Invalid type: "Error: Unknown record type 'Foo'. Valid types: Stadium, Team, Game, ..."
+
+Avoid: Deleting records without confirmation. Updating records without checking conflict.
+
+
+```bash
+cd Scripts && python cloudkit_import.py --get Stadium stadium_nba_td_garden
+```
+Should print all fields for the specified stadium record.
+
+```bash
+cd Scripts && python cloudkit_import.py --list Game --count
+```
+Should print the count of Game records in CloudKit.
+
+  Individual record management commands work: --get, --list, --update-record, --delete-record
+
+
+
+
+
+Before declaring plan complete:
+- [ ] `--verify` reports accurate count comparison for all record types
+- [ ] `--verify-deep` performs full field comparison with mismatch reporting
+- [ ] `--get ` retrieves and displays single record
+- [ ] `--list ` lists all recordNames for type
+- [ ] `--update-record` updates field with conflict handling
+- [ ] `--delete-record` deletes with confirmation
+- [ ] Interactive menu has options 14-15
+- [ ] No regressions to existing functionality
+
+
+
+
+- Verification accurately detects sync discrepancies
+- Individual record operations work for all record types
+- Conflict handling prevents data corruption
+- All CRUD operations now fully supported
+- Phase 5 complete
+
+
+
+After completion, create `.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md`
+
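
Review note: the change detection that plan 05-01 describes (`compute_diff` plus the field comparison rules) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the implementation in `cloudkit_import.py`. It assumes both local and cloud records are dicts keyed by recordName, each shaped like `{"fields": {name: {"value": ...}}}`, that location values carry `latitude`/`longitude` keys, and that reference values carry a `recordName` key. The helpers `fields_equal`, `records_equal`, and `format_diff` are hypothetical names introduced here.

```python
# Tolerance taken from the plan: "compare lat/lng with tolerance (0.0001)".
LOCATION_TOLERANCE = 0.0001


def fields_equal(local_value, cloud_value):
    """Compare one field value (hypothetical helper).

    Assumes location values are dicts with latitude/longitude and
    reference values are dicts with recordName.
    """
    if isinstance(local_value, dict) and isinstance(cloud_value, dict):
        # Location fields: compare coordinates within the tolerance.
        if "latitude" in local_value and "latitude" in cloud_value:
            return (
                abs(local_value["latitude"] - cloud_value["latitude"]) <= LOCATION_TOLERANCE
                and abs(local_value["longitude"] - cloud_value["longitude"]) <= LOCATION_TOLERANCE
            )
        # Reference fields: compare the target recordName only.
        if "recordName" in local_value and "recordName" in cloud_value:
            return local_value["recordName"] == cloud_value["recordName"]
    # Strings, ints, and anything else: direct comparison.
    return local_value == cloud_value


def records_equal(local_record, cloud_record):
    """Compare only the 'fields' payload, so recordChangeTag and
    top-level timestamps are ignored (assumed record shape)."""
    local_fields = local_record.get("fields", {})
    cloud_fields = cloud_record.get("fields", {})
    if local_fields.keys() != cloud_fields.keys():
        return False
    return all(
        fields_equal(local_fields[name].get("value"), cloud_fields[name].get("value"))
        for name in local_fields
    )


def compute_diff(local_records, cloud_records):
    """Classify recordNames into new / updated / unchanged / deleted."""
    diff = {"new": [], "updated": [], "unchanged": [], "deleted": []}
    for name, local in local_records.items():
        cloud = cloud_records.get(name)
        if cloud is None:
            diff["new"].append(name)
        elif records_equal(local, cloud):
            diff["unchanged"].append(name)
        else:
            diff["updated"].append(name)
    diff["deleted"] = [name for name in cloud_records if name not in local_records]
    return diff


def format_diff(label, diff):
    """Render the one-line report format given in the plan."""
    return (
        f"{label}: {len(diff['unchanged'])} unchanged, {len(diff['new'])} new, "
        f"{len(diff['updated'])} updated, {len(diff['deleted'])} deleted"
    )
```

Keeping the comparison restricted to the `fields` payload is one simple way to satisfy the "ignore recordChangeTag and timestamps" rule, since in this assumed shape those values sit outside `fields`. Whether the real records match that shape would need to be confirmed against what `query_all()` returns.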