docs(05): create CloudKit CRUD phase plans

Phase 5: CloudKit CRUD
- 2 plans created
- 4 total tasks defined
- Ready for execution

Plan 05-01: Smart sync with change detection
- Change detection with diff reporting
- Differential sync (upload only changed records)

Plan 05-02: Verification and record management
- Sync verification (CloudKit vs local comparison)
- Individual record CRUD operations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Trey t
2026-01-10 10:02:06 -06:00
parent 1675e22b26
commit e5c6d0fec7
3 changed files with 321 additions and 5 deletions

@@ -78,12 +78,12 @@ Plans:
### Phase 5: CloudKit CRUD
**Goal**: Implement full create, read, update, delete operations for CloudKit management
**Depends on**: Phase 4
-**Research**: Likely (CloudKit server-to-server API)
-**Research topics**: CloudKit server-to-server authentication, record modification operations, batch operations, conflict resolution
-**Plans**: TBD
+**Research**: No (existing patterns in cloudkit_import.py sufficient)
+**Plans**: 2 plans
Plans:
-- [ ] 05-01: TBD
+- [ ] 05-01: Smart sync with change detection (diff reporting, differential upload)
+- [ ] 05-02: Verification and record management (sync verification, individual CRUD)
### Phase 6: Validation Reports
**Goal**: Generate validation reports showing record counts, data gaps, orphan records, and relationship integrity
@@ -106,5 +106,5 @@ Phases execute in numeric order: 1 → 2 → 2.1 → 3 → 4 → 5 → 6
| 2.1. Additional Sports Stadiums | 3/3 | Complete | 2026-01-10 |
| 3. Alias Systems | 2/2 | Complete | 2026-01-10 |
| 4. Canonical Linking | 1/1 | Complete | 2026-01-10 |
-| 5. CloudKit CRUD | 0/TBD | Not started | - |
+| 5. CloudKit CRUD | 0/2 | In progress | - |
| 6. Validation Reports | 0/TBD | Not started | - |

@@ -0,0 +1,151 @@
---
phase: 05-cloudkit-crud
plan: 01
type: execute
domain: data-pipeline
---
<objective>
Add smart sync with change detection to cloudkit_import.py.
Purpose: Enable differential uploads that only sync new/changed records, reducing CloudKit API calls and sync time.
Output: Enhanced cloudkit_import.py with --diff, --smart-sync, and --delete-orphans flags.
</objective>
<execution_context>
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/04-canonical-linking/04-01-SUMMARY.md
**Relevant source files:**
@Scripts/cloudkit_import.py
**Tech stack available:** Python 3, requests, cryptography, CloudKit server-to-server API
**Established patterns:** forceReplace for create/update, query() for read, delete_all() for deletion, batch operations with BATCH_SIZE=200
**Constraining decisions:**
- Phase 04-01: 5760 games canonicalized with 100% team/stadium resolution
- Existing CloudKit import uses forceReplace (creates or replaces) for all operations
- recordChangeTag must be used for conflict detection in updates
</context>
<tasks>
<task type="auto">
<name>Task 1: Add change detection with diff reporting</name>
<files>Scripts/cloudkit_import.py</files>
<action>
Add change detection capability to compare local canonical data against CloudKit records.
1. Add `query_all(record_type, verbose)` method to CloudKit class:
- Query with pagination (use continuationMarker for >200 records)
- Return dict mapping recordName to record data (including recordChangeTag)
- Handle query errors gracefully
2. Add `compute_diff(local_records, cloud_records)` function:
- Returns dict with keys: 'new', 'updated', 'unchanged', 'deleted'
- 'new': records in local but not in cloud (by recordName)
- 'updated': records in both where fields differ (compare field values)
- 'unchanged': records in both with same field values
- 'deleted': records in cloud but not in local
- Include a count for each category
3. Add `--diff` flag to argparse:
- When set, query CloudKit and show diff report for each record type
- Format: "Stadiums: 32 unchanged, 2 new, 1 updated, 0 deleted"
- Do NOT perform any imports, just report
4. Field comparison for 'updated' detection:
- Compare string/int fields directly
- For location fields, compare lat/lng with tolerance (0.0001)
- For reference fields, compare recordName only
- Ignore recordChangeTag and timestamps in comparison
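The pagination in step 1 could look roughly like the sketch below. It is not the planned `query_all(record_type, verbose)` method itself: `collect_all_pages` and the injected `fetch_page` callable are hypothetical names, and the sketch assumes each parsed query response is a dict with a `records` list and an optional `continuationMarker`.

```python
from typing import Callable, Optional


def collect_all_pages(fetch_page: Callable[[Optional[str]], dict]) -> dict:
    """Collect every record across pages, keyed by recordName.

    fetch_page is a hypothetical callable: it receives the previous
    continuationMarker (None for the first page) and returns one parsed
    query response.
    """
    records = {}
    marker = None
    while True:
        page = fetch_page(marker)
        for record in page.get("records", []):
            # Keep the whole record so recordChangeTag stays available later.
            records[record["recordName"]] = record
        marker = page.get("continuationMarker")
        if not marker:
            # No marker means the last page has been read.
            break
    return records
```

Injecting `fetch_page` keeps the loop testable without network access; the real method would presumably wrap the existing `query()` call instead.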
Avoid: Using forceReplace for everything. The goal is to identify WHAT changed before deciding HOW to sync.
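As an illustration of steps 2 to 4, here is a minimal sketch of the diff and its report line. It assumes both sides have already been flattened to `{recordName: {field: value}}`, that location values are dicts with `latitude`/`longitude`, and that reference values are dicts with `recordName`. The helper names `values_equal` and `format_report` are hypothetical.

```python
LOCATION_TOLERANCE = 0.0001  # tolerance named in the plan


def values_equal(local, cloud) -> bool:
    """Compare one field value, with special cases for locations and references."""
    if isinstance(local, dict) and isinstance(cloud, dict):
        if "latitude" in local and "latitude" in cloud:
            return (
                abs(local["latitude"] - cloud["latitude"]) <= LOCATION_TOLERANCE
                and abs(local["longitude"] - cloud["longitude"]) <= LOCATION_TOLERANCE
            )
        if "recordName" in local and "recordName" in cloud:
            # References: compare the target recordName only.
            return local["recordName"] == cloud["recordName"]
    return local == cloud


def compute_diff(local_records: dict, cloud_records: dict) -> dict:
    """Classify recordNames as new, updated, unchanged, or deleted."""
    diff = {"new": [], "updated": [], "unchanged": [], "deleted": []}
    for name, local_fields in local_records.items():
        if name not in cloud_records:
            diff["new"].append(name)
            continue
        cloud_fields = cloud_records[name]
        # Only local fields are checked, so cloud-only metadata such as
        # recordChangeTag or timestamps never triggers an update.
        same = all(
            field in cloud_fields and values_equal(value, cloud_fields[field])
            for field, value in local_fields.items()
        )
        diff["unchanged" if same else "updated"].append(name)
    diff["deleted"] = [name for name in cloud_records if name not in local_records]
    return diff


def format_report(label: str, diff: dict) -> str:
    """Render one report line in the format given in step 3."""
    return (
        f"{label}: {len(diff['unchanged'])} unchanged, {len(diff['new'])} new, "
        f"{len(diff['updated'])} updated, {len(diff['deleted'])} deleted"
    )
```

Returning lists of recordNames rather than bare counts lets the later sync step reuse the same result; the counts fall out of `len()`.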
</action>
<verify>
```bash
cd Scripts && python cloudkit_import.py --diff --verbose
```
Should output diff report showing counts for each record type (stadiums, teams, games, etc.)
</verify>
<done>--diff flag works and reports new/updated/unchanged/deleted counts for each record type</done>
</task>
<task type="auto">
<name>Task 2: Add differential sync with smart-sync flag</name>
<files>Scripts/cloudkit_import.py</files>
<action>
Add differential sync capability that only uploads new and changed records.
1. Add `sync_diff(ck, diff, record_type, dry_run, verbose)` function:
- For 'new' records: use forceReplace (creates new)
- For 'updated' records: use 'update' operationType with recordChangeTag
- For 'deleted' records: use 'delete' operationType (optional, controlled by flag)
- Skip 'unchanged' records entirely
- Return counts: created, updated, deleted, skipped
2. Add CloudKit `update` operation handling in modify():
- update operationType requires recordChangeTag field
- Handle the CONFLICT error (409), which means the record was modified after it was queried
- On conflict: re-query that record and recompute whether it still needs an update
3. Add `--smart-sync` flag:
- Query CloudKit first to get current state
- Compute diff against local data
- Sync only new and updated records
- Print summary: "Created N, Updated M, Skipped K unchanged"
4. Add `--delete-orphans` flag (used with --smart-sync):
- When set, also delete records in CloudKit but not in local
- Default: do NOT delete orphans (safe mode)
- Without the flag, print warning: "Would delete N orphan records (use --delete-orphans to confirm)"
5. Menu integration:
- Add option 12: "Smart sync (diff-based)"
- Add option 13: "Smart sync + delete orphans"
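The mapping from diff categories to operations in step 1 might be sketched as follows. The names `build_operations`, `cloud_tags`, and `chunked` are hypothetical, and the sketch assumes `local_records` already holds fields in the shape the existing import sends and that `cloud_tags` maps recordName to the recordChangeTag returned by the earlier query.

```python
BATCH_SIZE = 200  # batch size from the established patterns


def build_operations(diff, local_records, cloud_tags, record_type, delete_orphans=False):
    """Turn a diff into modify operations; unchanged records produce nothing."""
    operations = []
    for name in diff["new"]:
        operations.append({
            "operationType": "forceReplace",
            "record": {"recordName": name, "recordType": record_type,
                       "fields": local_records[name]},
        })
    for name in diff["updated"]:
        operations.append({
            "operationType": "update",
            "record": {"recordName": name, "recordType": record_type,
                       "fields": local_records[name],
                       # The tag lets the server reject a stale update.
                       "recordChangeTag": cloud_tags[name]},
        })
    if delete_orphans:
        # Deletes are emitted only when explicitly requested (safe mode by default).
        for name in diff["deleted"]:
            operations.append({
                "operationType": "delete",
                "record": {"recordName": name, "recordChangeTag": cloud_tags[name]},
            })
    return operations


def chunked(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch from `chunked(operations)` would then be handed to the existing `modify()` call.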
Avoid: Deleting records without an explicit flag, or using forceReplace on unchanged records.
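The conflict handling in step 2 could be approached along these lines. This is a sketch under assumptions, not the planned `modify()` change: `fetch_record`, `send_update`, and `needs_update` are hypothetical callables, and a failed update is assumed to come back as a dict carrying a `serverErrorCode` key.

```python
def update_with_retry(name, local_fields, fetch_record, send_update,
                      needs_update, max_attempts=3):
    """Update one record, re-querying and re-checking after each conflict.

    Returns "updated", "skipped" (cloud copy already matches), or
    "conflict" (attempts exhausted).
    """
    for _ in range(max_attempts):
        cloud = fetch_record(name)
        if not needs_update(local_fields, cloud):
            # Someone else already wrote matching values; nothing to do.
            return "skipped"
        result = send_update(name, local_fields, cloud["recordChangeTag"])
        code = result.get("serverErrorCode")
        if code is None:
            return "updated"
        if code != "CONFLICT":
            # Other errors are not retried here.
            raise RuntimeError(f"Update of {name} failed: {code}")
        # CONFLICT: loop again with a freshly fetched recordChangeTag.
    return "conflict"
```

Bounding the retries avoids spinning forever if another writer keeps modifying the same record.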
</action>
<verify>
```bash
cd Scripts && python cloudkit_import.py --smart-sync --dry-run --verbose
```
Should show what would be created/updated/skipped without making changes.
```bash
cd Scripts && python cloudkit_import.py --smart-sync --verbose
```
Should perform differential sync, reporting created/updated/skipped counts.
</verify>
<done>--smart-sync flag performs differential upload, skipping unchanged records. Created/updated counts are accurate.</done>
</task>
</tasks>
<verification>
Before declaring plan complete:
- [ ] `python cloudkit_import.py --diff` reports accurate counts for all record types
- [ ] `python cloudkit_import.py --smart-sync --dry-run` shows correct preview
- [ ] `python cloudkit_import.py --smart-sync` uploads only changed records
- [ ] Update with recordChangeTag handles conflicts gracefully
- [ ] Interactive menu has new options 12 and 13
</verification>
<success_criteria>
- Change detection accurately identifies new/updated/unchanged/deleted records
- Smart sync reduces CloudKit API calls by skipping unchanged records
- Conflict handling prevents data loss on concurrent updates
- No regressions to existing import functionality
</success_criteria>
<output>
After completion, create `.planning/phases/05-cloudkit-crud/05-01-SUMMARY.md`
</output>

@@ -0,0 +1,165 @@
---
phase: 05-cloudkit-crud
plan: 02
type: execute
domain: data-pipeline
---
<objective>
Add sync verification and individual record management to cloudkit_import.py.
Purpose: Enable verification that CloudKit matches local data, plus ability to manage individual records.
Output: Enhanced cloudkit_import.py with --verify, --verify-deep, --get, --list, --update-record, and --delete-record flags.
</objective>
<execution_context>
~/.claude/get-shit-done/workflows/execute-phase.md
~/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/05-cloudkit-crud/05-01-PLAN.md
**Relevant source files:**
@Scripts/cloudkit_import.py
**Tech stack available:** Python 3, requests, cryptography, CloudKit server-to-server API
**Established patterns:** query_all() for full record retrieval, compute_diff() for change detection, sync_diff() for smart sync
**Prior plan context:**
- Plan 05-01 adds change detection and smart sync
- query_all() returns dict of recordName -> record data
- compute_diff() identifies new/updated/unchanged/deleted
</context>
<tasks>
<task type="auto">
<name>Task 1: Add sync verification with --verify flag</name>
<files>Scripts/cloudkit_import.py</files>
<action>
Add sync verification to confirm CloudKit matches local canonical data.
1. Add `--verify` flag to argparse:
- Query CloudKit for all record types
- Compare counts with local data
- Report discrepancies
2. Add `verify_sync(ck, data_dir, verbose)` function:
- For each record type (Stadium, Team, Game, LeagueStructure, TeamAlias, StadiumAlias):
- Query CloudKit count
- Get local count from JSON files
- Compare and report
- Output format per type:
```
Stadium: CloudKit=32, Local=32 [OK]
Game: CloudKit=5758, Local=5760 [MISMATCH: 2 missing in CloudKit]
```
3. Add spot-check verification:
- Randomly sample 5 records per type (if count matches)
- Verify field values match between CloudKit and local
- Report any field mismatches found
4. Add `--verify-deep` flag:
- Full field-by-field comparison of ALL records (not just sample)
- Report each mismatch with recordName and field name
- Warning: "Deep verification may take several minutes for large datasets"
5. Menu integration:
- Add option 14: "Verify sync (quick)"
- Add option 15: "Verify sync (deep)"
Avoid: Modifying any data during verification. This is read-only.
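The count comparison and spot check could be sketched as below, assuming both sides are available as `{recordName: {field: value}}` dicts. The names `count_status` and `spot_check` are hypothetical, and the "extra in CloudKit" wording for the opposite mismatch is an assumption, since the plan only shows the "missing" case.

```python
import random


def count_status(record_type: str, cloud_count: int, local_count: int) -> str:
    """Format one line of the count report in the plan's output format."""
    if cloud_count == local_count:
        status = "OK"
    elif cloud_count < local_count:
        status = f"MISMATCH: {local_count - cloud_count} missing in CloudKit"
    else:
        # Wording for this branch is an assumption.
        status = f"MISMATCH: {cloud_count - local_count} extra in CloudKit"
    return f"{record_type}: CloudKit={cloud_count}, Local={local_count} [{status}]"


def spot_check(local_records: dict, cloud_records: dict, sample_size: int = 5, rng=None):
    """Compare fields for a random sample; return (recordName, field) mismatches."""
    rng = rng or random.Random()
    names = sorted(local_records)
    sample = rng.sample(names, min(sample_size, len(names)))
    mismatches = []
    for name in sample:
        cloud_fields = cloud_records.get(name, {})
        for field, value in local_records[name].items():
            if cloud_fields.get(field) != value:
                mismatches.append((name, field))
    return mismatches
```

Passing a seeded `random.Random` as `rng` makes a spot check reproducible; both functions only read their inputs, in line with the read-only requirement.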
</action>
<verify>
```bash
cd Scripts && python cloudkit_import.py --verify --verbose
```
Should output verification report showing CloudKit vs local counts with OK/MISMATCH status.
</verify>
<done>--verify flag reports accurate comparison between CloudKit and local data for all record types</done>
</task>
<task type="auto">
<name>Task 2: Add individual record management commands</name>
<files>Scripts/cloudkit_import.py</files>
<action>
Add commands for managing individual records by ID.
1. Add `--get <type> <id>` flag:
- Query single record by recordName
- Print all field values in formatted output
- Example: `--get Stadium stadium_nba_td_garden`
2. Add CloudKit `lookup(record_names)` method:
- Use records/lookup endpoint for efficient single/batch lookup
- Return list of records
3. Add `--update-record <type> <id> <field>=<value>` flag:
- Lookup record to get current recordChangeTag
- Update specified field(s)
- Handle CONFLICT error with retry
- Example: `--update-record Stadium stadium_nba_td_garden capacity=19156`
4. Add `--delete-record <type> <id>` flag:
- Lookup record to get recordChangeTag
- Delete single record
- Confirm before delete (unless --force)
- Example: `--delete-record Game game_mlb_2025_ari_phi_0401`
5. Add `--list <type>` flag:
- Query all records of type
- Print recordNames (one per line)
- Support `--list <type> --count` for just the count
- Example: `--list Stadium` prints all stadium IDs
6. Error handling:
- Record not found: "Error: No Stadium with id 'xyz' found in CloudKit"
- Invalid type: "Error: Unknown record type 'Foo'. Valid types: Stadium, Team, Game, ..."
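For steps 2 and 6, a rough sketch of the lookup request body and the not-found handling follows. It assumes the records/lookup response lists one entry per requested record and marks a missing record with a `serverErrorCode` such as `NOT_FOUND`; the helper names are hypothetical.

```python
def lookup_body(record_names):
    """Build the JSON body for a records/lookup request."""
    return {"records": [{"recordName": name} for name in record_names]}


def split_lookup_results(response: dict):
    """Separate found records from per-record errors.

    Returns (found, errors): found maps recordName to the record dict,
    errors maps recordName to its serverErrorCode.
    """
    found, errors = {}, {}
    for entry in response.get("records", []):
        if "serverErrorCode" in entry:
            errors[entry["recordName"]] = entry["serverErrorCode"]
        else:
            found[entry["recordName"]] = entry
    return found, errors


def not_found_message(record_type: str, record_id: str) -> str:
    """Error text in the format given in step 6."""
    return f"Error: No {record_type} with id '{record_id}' found in CloudKit"
```

Keeping the error split separate from printing lets `--get`, `--update-record`, and `--delete-record` share the same lookup path.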
Avoid: Deleting records without confirmation, or updating records without checking for conflicts.
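One possible argparse layout for these flags is sketched below. How the existing script structures its parser is not shown here, so `build_parser` and `parse_assignments` are hypothetical, and converting all-digit values to `int` is an assumption about how field values should be typed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser covering only the record-management flags from this task."""
    parser = argparse.ArgumentParser(prog="cloudkit_import.py")
    parser.add_argument("--get", nargs=2, metavar=("TYPE", "ID"))
    # TYPE, ID, then one or more field=value pairs.
    parser.add_argument("--update-record", nargs="+", metavar="ARG")
    parser.add_argument("--delete-record", nargs=2, metavar=("TYPE", "ID"))
    parser.add_argument("--list", metavar="TYPE")
    parser.add_argument("--count", action="store_true")
    parser.add_argument("--force", action="store_true")
    return parser


def parse_assignments(pairs):
    """Parse ["capacity=19156", ...] into {"capacity": 19156, ...}."""
    updates = {}
    for pair in pairs:
        field, sep, value = pair.partition("=")
        if not sep or not field:
            raise ValueError(f"Expected <field>=<value>, got '{pair}'")
        # Assumption: all-digit values (optionally negative) become ints.
        updates[field] = int(value) if value.lstrip("-").isdigit() else value
    return updates
```

With this layout, `--update-record Stadium stadium_nba_td_garden capacity=19156` arrives as a single list that unpacks as `record_type, record_id, *pairs = args.update_record`.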
</action>
<verify>
```bash
cd Scripts && python cloudkit_import.py --get Stadium stadium_nba_td_garden
```
Should print all fields for the specified stadium record.
```bash
cd Scripts && python cloudkit_import.py --list Game --count
```
Should print the count of Game records in CloudKit.
</verify>
<done>Individual record management commands work: --get, --list, --update-record, --delete-record</done>
</task>
</tasks>
<verification>
Before declaring plan complete:
- [ ] `--verify` reports accurate count comparison for all record types
- [ ] `--verify-deep` performs full field comparison with mismatch reporting
- [ ] `--get <type> <id>` retrieves and displays single record
- [ ] `--list <type>` lists all recordNames for type
- [ ] `--update-record` updates field with conflict handling
- [ ] `--delete-record` deletes with confirmation
- [ ] Interactive menu has options 14-15
- [ ] No regressions to existing functionality
</verification>
<success_criteria>
- Verification accurately detects sync discrepancies
- Individual record operations work for all record types
- Conflict handling prevents data corruption
- All CRUD operations now fully supported
- Phase 5 complete
</success_criteria>
<output>
After completion, create `.planning/phases/05-cloudkit-crud/05-02-SUMMARY.md`
</output>