Add PlantGuide iOS app with plant identification and care management

- Implement camera capture and plant identification workflow
- Add Core Data persistence for plants, care schedules, and cached API data
- Create collection view with grid/list layouts and filtering
- Build plant detail views with care information display
- Integrate Trefle botanical API for plant care data
- Add local image storage for captured plant photos
- Implement dependency injection container for testability
- Include accessibility support throughout the app

Bug fixes in this commit:
- Fix Trefle API decoding by removing duplicate CodingKeys
- Fix LocalCachedImage to load from correct PlantImages directory
- Set dateAdded when saving plants for proper collection sorting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 12:18:01 -06:00
parent d3ab29eb84
commit 136dfbae33
187 changed files with 69001 additions and 0 deletions
--- a/Docs/phase-1-implementation-plan.md
+++ b/Docs/phase-1-implementation-plan.md
@@ -0,0 +1,482 @@
+# Phase 1: Knowledge Base Creation - Implementation Plan
+
+## Overview
+
+**Goal:** Build structured plant knowledge from `data/houseplants_list.json`, enriching with taxonomy and characteristics.
+
+**Input:** `data/houseplants_list.json` (2,278 plants, 11 categories, 50 families)
+
+**Output:** Enriched plant knowledge base (JSON + SQLite) with ~1,500-2,000 validated entries (the phase exit criteria require ≥1,500)
+
+---
+
+## Current Data Assessment
+
+| Attribute | Current State | Required Enhancement |
+|-----------|---------------|---------------------|
+| Total Plants | 2,278 | Validate, deduplicate |
+| Scientific Names | Present | Validate binomial nomenclature |
+| Common Names | Array per plant | Normalize, cross-reference |
+| Family | 50 families | Validate against taxonomy |
+| Category | 11 categories | Map to target types |
+| Physical Characteristics | **Missing** | **Must add** |
+| Regional/Seasonal Info | **Missing** | **Must add** |
+
+---
+
+## Task Breakdown
+
+### Task 1.1: Load and Validate Plant List
+
+**Objective:** Parse JSON and validate data integrity
+
+**Actions:**
+- [ ] Create Python script `scripts/validate_plant_list.py`
+- [ ] Load `data/houseplants_list.json`
+- [ ] Validate JSON schema:
+  - Each plant has `scientific_name` (required, string)
+  - Each plant has `common_names` (required, array of strings)
+  - Each plant has `family` (required, string)
+  - Each plant has `category` (required, string)
+- [ ] Identify malformed entries (missing fields, wrong types)
+- [ ] Generate validation report: `output/validation_report.json`
+
+**Validation Criteria:**
+- 0 malformed entries
+- All required fields present
+- No null/empty scientific names
+
+**Output Files:**
+- `scripts/validate_plant_list.py`
+- `output/validation_report.json`
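+
+The per-entry checks above could be sketched as follows. This is a minimal illustration, assuming the file has already been parsed with `json.load` into a list of plant objects (the actual top-level layout of `data/houseplants_list.json` is not shown in this plan). The names `validate_entry` and `validate_plants` and the report layout are hypothetical.
+
+```python
+# Required fields and their expected Python types after json.load.
+REQUIRED_FIELDS = {
+    "scientific_name": str,
+    "common_names": list,
+    "family": str,
+    "category": str,
+}
+
+
+def validate_entry(entry):
+    """Return a list of problems found in one plant entry (empty if valid)."""
+    if not isinstance(entry, dict):
+        return ["entry is not an object"]
+    problems = []
+    for field, expected_type in REQUIRED_FIELDS.items():
+        if field not in entry:
+            problems.append(f"missing field: {field}")
+        elif not isinstance(entry[field], expected_type):
+            problems.append(f"wrong type for {field}")
+        elif expected_type is str and not entry[field].strip():
+            problems.append(f"empty value for {field}")
+    names = entry.get("common_names")
+    if isinstance(names, list) and not all(isinstance(n, str) for n in names):
+        problems.append("common_names must contain only strings")
+    return problems
+
+
+def validate_plants(plants):
+    """Build a report dict listing every malformed entry by its index."""
+    malformed = []
+    for index, entry in enumerate(plants):
+        problems = validate_entry(entry)
+        if problems:
+            malformed.append({"index": index, "problems": problems})
+    return {
+        "total": len(plants),
+        "malformed_count": len(malformed),
+        "malformed": malformed,
+    }
+```
+
+The returned dict can be written out with `json.dump` as the validation report; a `malformed_count` of 0 corresponds to the first validation criterion.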
+
+---
+
+### Task 1.2: Normalize and Standardize Plant Names
+
+**Objective:** Ensure consistent naming conventions
+
+**Actions:**
+- [ ] Create `scripts/normalize_names.py`
+- [ ] Scientific name normalization:
+  - Capitalize genus, lowercase species (e.g., "Philodendron hederaceum")
+  - Handle cultivar notation: 'Cultivar Name' in single quotes
+  - Validate binomial/trinomial format
+- [ ] Common name normalization:
+  - Title case standardization
+  - Remove leading/trailing whitespace
+  - Standardize punctuation
+- [ ] Handle hybrid notation (×) consistently
+- [ ] Flag names that don't match expected patterns
+
+**Validation Criteria:**
+- 100% of scientific names follow the binomial/trinomial pattern (optionally with cultivar or hybrid notation)
+- No leading/trailing whitespace in any names
+- Consistent cultivar notation
+
+**Output File:** `data/normalized_plants.json`
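+
+One way the scientific-name rules above might look in code. This is a sketch under assumptions: the function name `normalize_scientific_name`, both regular expressions, and the choice to treat a lone `x` between words as the hybrid sign are illustrative, not taken from the source data.
+
+```python
+import re
+
+# A trailing cultivar in straight or curly quotes, e.g. "Brasil" or 'Brasil'.
+CULTIVAR_RE = re.compile(r"\s*['\"‘“]([^'\"‘’“”]+)['\"’”]\s*$")
+# Genus + epithet, optional hybrid sign, optional further ranks (var., subsp.).
+BINOMIAL_RE = re.compile(r"^(× )?[A-Z][a-z]+ (× )?[a-z][a-z-]+( [a-z.×-]+)*$")
+
+
+def normalize_scientific_name(raw):
+    """Return (normalized_name, is_valid) for one raw scientific name."""
+    name = " ".join(raw.split())                  # trim and collapse whitespace
+    name = re.sub(r"(?<= )[xX](?= )", "×", name)  # lone ASCII x between words -> ×
+    cultivar = None
+    match = CULTIVAR_RE.search(name)
+    if match:
+        cultivar = match.group(1).strip()
+        name = name[: match.start()]
+    parts = name.lower().split()
+    # The genus is the first word, or the second when the name starts with ×.
+    genus_index = 1 if parts[:1] == ["×"] else 0
+    if len(parts) > genus_index:
+        parts[genus_index] = parts[genus_index].capitalize()
+    base = " ".join(parts)
+    is_valid = bool(BINOMIAL_RE.match(base))
+    if cultivar:
+        base = f"{base} '{cultivar}'"             # cultivar in single quotes
+    return base, is_valid
+```
+
+Names for which `is_valid` is false (for example a bare genus) are the ones to flag for review rather than silently rewrite.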
+
+---
+
+### Task 1.3: Create Deduplicated Master List
+
+**Objective:** Remove duplicates while preserving unique cultivars
+
+**Actions:**
+- [ ] Create `scripts/deduplicate_plants.py`
+- [ ] Define deduplication rules:
+  - Exact scientific name match = duplicate
+  - Different cultivars of same species = keep both
+  - Same plant, different common names = merge common names
+- [ ] Identify potential duplicates using fuzzy matching on:
+  - Scientific names (Levenshtein distance < 3)
+  - Common names that are identical
+- [ ] Generate duplicate candidates report for manual review
+- [ ] Merge duplicates: combine common names arrays
+- [ ] Assign unique plant IDs (`plant_001`, `plant_002`, etc.)
+
+**Validation Criteria:**
+- No exact scientific name duplicates
+- All plants have unique IDs
+- Merge log documenting all deduplication decisions
+
+**Output Files:**
+- `data/master_plant_list.json`
+- `output/deduplication_report.json`
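+
+The deduplication rules above could be sketched like this, using only the standard library. The function names are hypothetical, and `max_distance=2` is one reading of "Levenshtein distance < 3".
+
+```python
+def levenshtein(a, b):
+    """Edit distance between two strings (row-by-row dynamic programming)."""
+    previous = list(range(len(b) + 1))
+    for i, char_a in enumerate(a, start=1):
+        current = [i]
+        for j, char_b in enumerate(b, start=1):
+            cost = 0 if char_a == char_b else 1
+            current.append(min(previous[j] + 1,          # deletion
+                               current[j - 1] + 1,       # insertion
+                               previous[j - 1] + cost))  # substitution
+        previous = current
+    return previous[-1]
+
+
+def merge_exact_duplicates(plants):
+    """Merge entries with identical scientific names, combining common names."""
+    merged = {}
+    for plant in plants:
+        key = plant["scientific_name"]
+        if key not in merged:
+            merged[key] = {**plant, "common_names": list(plant["common_names"])}
+            continue
+        for name in plant["common_names"]:
+            if name not in merged[key]["common_names"]:
+                merged[key]["common_names"].append(name)
+    result = list(merged.values())
+    for number, plant in enumerate(result, start=1):
+        plant["id"] = f"plant_{number:03d}"
+    return result
+
+
+def near_duplicate_candidates(plants, max_distance=2):
+    """Pairs of names that differ by 1..max_distance edits, for manual review."""
+    names = [p["scientific_name"] for p in plants]
+    pairs = []
+    for i in range(len(names)):
+        for j in range(i + 1, len(names)):
+            if 0 < levenshtein(names[i], names[j]) <= max_distance:
+                pairs.append((names[i], names[j]))
+    return pairs
+```
+
+Because a cultivar name is part of the scientific name string, different cultivars of one species are kept as separate entries by the exact-match rule. The three-digit padding follows the `plant_001` example; with more than 999 entries a wider pad would be needed to keep IDs sortable as text.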
+
+---
+
+### Task 1.4: Enrich with Physical Characteristics
+
+**Objective:** Add visual and physical attributes for each plant
+
+**Actions:**
+- [ ] Create `scripts/enrich_characteristics.py`
+- [ ] Define characteristic schema:
+  ```json
+  {
+    "characteristics": {
+      "leaf_shape": ["heart", "oval", "linear", "palmate", "lobed", "needle", "rosette"],
+      "leaf_color": ["green", "variegated", "red", "purple", "silver", "yellow"],
+      "leaf_texture": ["glossy", "matte", "fuzzy", "waxy", "smooth", "rough"],
+      "growth_habit": ["upright", "trailing", "climbing", "rosette", "bushy", "tree-form"],
+      "mature_height_cm": "integer, 0-500",
+      "mature_width_cm": "integer, 0-300",
+      "flowering": "boolean",
+      "flower_colors": ["white", "pink", "red", "yellow", "orange", "purple", "blue"],
+      "bloom_season": ["spring", "summer", "fall", "winter", "year-round"]
+    }
+  }
+  ```
+- [ ] Source characteristics data:
+  - **Primary:** Web scraping from botanical databases (RHS, Missouri Botanical Garden)
+  - **Secondary:** Wikipedia API for plant descriptions
+  - **Fallback:** Family/genus-level defaults
+- [ ] Implement web fetching with rate limiting
+- [ ] Parse and extract characteristics from HTML/JSON responses
+- [ ] Store enrichment sources for traceability
+
+**Validation Criteria:**
+- ≥80% of plants have leaf_shape populated
+- ≥80% of plants have growth_habit populated
+- ≥60% of plants have height/width estimates
+- 100% of plants have flowering boolean
+
+**Output Files:**
+- `data/enriched_plants.json`
+- `output/enrichment_coverage_report.json`
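+
+The coverage thresholds above lend themselves to a small check. A minimal sketch, assuming each plant carries a `characteristics` object as in the schema; the names `coverage`, `coverage_report`, and `THRESHOLDS` are hypothetical.
+
+```python
+# Minimum share of plants that must have each field populated.
+THRESHOLDS = {
+    "leaf_shape": 0.80,
+    "growth_habit": 0.80,
+    "mature_height_cm": 0.60,
+    "mature_width_cm": 0.60,
+    "flowering": 1.00,
+}
+
+
+def coverage(plants, field):
+    """Fraction of plants whose characteristics[field] is populated."""
+    if not plants:
+        return 0.0
+    populated = 0
+    for plant in plants:
+        value = plant.get("characteristics", {}).get(field)
+        # False and 0 count as populated; None, "" and [] do not.
+        if value is not None and value != "" and value != []:
+            populated += 1
+    return populated / len(plants)
+
+
+def coverage_report(plants):
+    """Compare actual coverage against each threshold."""
+    report = {}
+    for field, minimum in THRESHOLDS.items():
+        actual = coverage(plants, field)
+        report[field] = {
+            "coverage": round(actual, 3),
+            "required": minimum,
+            "passed": actual >= minimum,
+        }
+    return report
+```
+
+Treating `False` as populated matters for `flowering`, where a non-flowering plant still has a known value.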
+
+---
+
+### Task 1.5: Categorize Plants by Type
+
+**Objective:** Map existing categories to target classification system
+
+**Actions:**
+- [ ] Create `scripts/categorize_plants.py`
+- [ ] Define target categories (per plan):
+  ```
+  - Flowering Plant
+  - Tree / Palm
+  - Shrub / Bush
+  - Succulent / Cactus
+  - Fern
+  - Vine / Trailing
+  - Herb
+  - Orchid
+  - Bromeliad
+  - Air Plant
+  ```
+- [ ] Create mapping from current 11 categories:
+  ```
+  Current → Target
+  ─────────────────────────────
+  Air Plant → Air Plant
+  Bromeliad → Bromeliad
+  Cactus → Succulent / Cactus
+  Fern → Fern
+  Flowering Houseplant → Flowering Plant
+  Herb → Herb
+  Orchid → Orchid
+  Palm → Tree / Palm
+  Succulent → Succulent / Cactus
+  Trailing/Climbing → Vine / Trailing
+  Tropical Foliage → [Requires secondary classification]
+  ```
+- [ ] Handle "Tropical Foliage" (largest category):
+  - Use growth_habit from Task 1.4 to sub-classify
+  - Cross-reference genus and family for tree-form species (e.g., Ficus → Tree / Palm)
+- [ ] Add `primary_category` and `secondary_categories` fields
+
+**Validation Criteria:**
+- 100% of plants have primary_category assigned
+- No plants remain as "Tropical Foliage" (all reclassified)
+- Category distribution documented
+
+**Output File:** `data/categorized_plants.json`
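+
+The mapping table above translates directly into a lookup. In the sketch below, `CATEGORY_MAP` mirrors that table, while `HABIT_TO_CATEGORY` is an assumed, illustrative rule set for sub-classifying "Tropical Foliage" by `growth_habit`; the plan does not specify those rules, so treat them as placeholders.
+
+```python
+# Direct mappings, copied from the Current → Target table.
+CATEGORY_MAP = {
+    "Air Plant": "Air Plant",
+    "Bromeliad": "Bromeliad",
+    "Cactus": "Succulent / Cactus",
+    "Fern": "Fern",
+    "Flowering Houseplant": "Flowering Plant",
+    "Herb": "Herb",
+    "Orchid": "Orchid",
+    "Palm": "Tree / Palm",
+    "Succulent": "Succulent / Cactus",
+    "Trailing/Climbing": "Vine / Trailing",
+}
+
+# ASSUMPTION: illustrative habit rules, not defined in the plan.
+HABIT_TO_CATEGORY = {
+    "trailing": "Vine / Trailing",
+    "climbing": "Vine / Trailing",
+    "tree-form": "Tree / Palm",
+    "bushy": "Shrub / Bush",
+    "upright": "Shrub / Bush",
+}
+
+
+def primary_category(plant):
+    """Return the target category, or None when manual review is needed."""
+    current = plant["category"]
+    if current in CATEGORY_MAP:
+        return CATEGORY_MAP[current]
+    if current == "Tropical Foliage":
+        habit = plant.get("characteristics", {}).get("growth_habit")
+        return HABIT_TO_CATEGORY.get(habit)
+    return None
+```
+
+Returning `None` instead of a default keeps unresolved plants visible, which makes the "100% of plants have primary_category assigned" criterion easy to audit.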
+
+---
+
+### Task 1.6: Map Common Names to Scientific Names
+
+**Objective:** Create bidirectional lookup for name resolution
+
+**Actions:**
+- [ ] Create `scripts/build_name_index.py`
+- [ ] Build scientific → common names map (already exists, validate)
+- [ ] Build common → scientific names map (reverse lookup)
+- [ ] Handle ambiguous common names (multiple plants share same common name):
+  - Flag conflicts
+  - Add disambiguation notes
+- [ ] Validate against external taxonomy:
+  - World Flora Online (WFO) API
+  - GBIF (Global Biodiversity Information Facility)
+- [ ] Add `taxonomy_verified` boolean for taxonomically confirmed names
+- [ ] Store alternative/deprecated scientific names as synonyms
+
+**Validation Criteria:**
+- Reverse lookup resolves ≥95% of common names unambiguously
+- ≥70% of scientific names verified against WFO/GBIF
+- Synonym list for deprecated names
+
+**Output Files:**
+- `data/name_index.json`
+- `output/name_ambiguity_report.json`
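+
+The reverse lookup and ambiguity check above could be sketched as follows. Matching common names case-insensitively is an assumption of this sketch, and the function names are hypothetical.
+
+```python
+from collections import defaultdict
+
+
+def build_reverse_index(plants):
+    """Map each lowercased common name to the scientific names that use it."""
+    index = defaultdict(list)
+    for plant in plants:
+        for common in plant["common_names"]:
+            key = common.strip().lower()
+            if plant["scientific_name"] not in index[key]:
+                index[key].append(plant["scientific_name"])
+    return dict(index)
+
+
+def ambiguity_summary(index):
+    """List common names shared by several plants and the unambiguous share."""
+    ambiguous = {name: sci for name, sci in index.items() if len(sci) > 1}
+    total = len(index)
+    share = (total - len(ambiguous)) / total if total else 1.0
+    return {"ambiguous": ambiguous, "unambiguous_share": share}
+```
+
+The `unambiguous_share` value is what the ≥95% criterion would be measured against, and the `ambiguous` mapping is the raw material for the ambiguity report.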
+
+---
+
+### Task 1.7: Add Regional/Seasonal Information
+
+**Objective:** Add native regions, hardiness zones, and seasonal behaviors
+
+**Actions:**
+- [ ] Create `scripts/add_regional_data.py`
+- [ ] Define regional schema:
+  ```json
+  {
+    "regional_info": {
+      "native_regions": ["South America", "Southeast Asia", "Africa", ...],
+      "native_countries": ["Brazil", "Thailand", ...],
+      "usda_hardiness_zones": ["9a", "9b", "10a", ...],
+      "indoor_outdoor": "indoor_only" | "outdoor_temperate" | "outdoor_tropical",
+      "seasonal_behavior": "evergreen" | "deciduous" | "dormant_winter"
+    }
+  }
+  ```
+- [ ] Source regional data:
+  - USDA Plants Database API
+  - Wikipedia (native range sections)
+  - Existing botanical databases
+- [ ] Map families to typical native regions as fallback
+- [ ] Add care-relevant seasonality (dormancy periods, bloom times)
+
+**Validation Criteria:**
+- ≥70% of plants have native_regions populated
+- ≥60% of plants have hardiness zones
+- 100% of plants have indoor_outdoor classification
+
+**Output File:** `data/final_knowledge_base.json`
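+
+A small validator for the regional schema above might look like this. It is a sketch under assumptions: zones are taken to be USDA zones 1 to 13 with an optional `a`/`b` half-zone suffix (the example record in this plan uses both `"10b"` and `"11"`), and the name `regional_info_problems` is hypothetical.
+
+```python
+import re
+
+# USDA zones 1-13, optionally followed by an a/b half-zone suffix.
+ZONE_RE = re.compile(r"^(1[0-3]|[1-9])[ab]?$")
+INDOOR_OUTDOOR = {"indoor_only", "outdoor_temperate", "outdoor_tropical"}
+SEASONAL_BEHAVIOR = {"evergreen", "deciduous", "dormant_winter"}
+
+
+def regional_info_problems(info):
+    """Return a list of problems in one regional_info object."""
+    problems = []
+    for zone in info.get("usda_hardiness_zones", []):
+        if not ZONE_RE.match(zone):
+            problems.append(f"invalid hardiness zone: {zone}")
+    # indoor_outdoor is required for every plant.
+    if info.get("indoor_outdoor") not in INDOOR_OUTDOOR:
+        problems.append("indoor_outdoor missing or not an allowed value")
+    # seasonal_behavior is optional, but must be an allowed value if present.
+    behavior = info.get("seasonal_behavior")
+    if behavior is not None and behavior not in SEASONAL_BEHAVIOR:
+        problems.append(f"invalid seasonal_behavior: {behavior}")
+    return problems
+```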
+
+---
+
+## Final Knowledge Base Schema
+
+```json
+{
+  "version": "1.0.0",
+  "generated_date": "YYYY-MM-DD",
+  "total_plants": 2000,
+  "plants": [
+    {
+      "id": "plant_001",
+      "scientific_name": "Philodendron hederaceum",
+      "common_names": ["Heartleaf Philodendron", "Sweetheart Plant"],
+      "synonyms": [],
+      "family": "Araceae",
+      "genus": "Philodendron",
+      "species": "hederaceum",
+      "cultivar": null,
+      "primary_category": "Vine / Trailing",
+      "secondary_categories": ["Tropical Foliage"],
+      "characteristics": {
+        "leaf_shape": "heart",
+        "leaf_color": ["green"],
+        "leaf_texture": "glossy",
+        "growth_habit": "trailing",
+        "mature_height_cm": 120,
+        "mature_width_cm": 60,
+        "flowering": true,
+        "flower_colors": ["white", "green"],
+        "bloom_season": "rarely indoors"
+      },
+      "regional_info": {
+        "native_regions": ["Central America", "South America"],
+        "native_countries": ["Mexico", "Brazil"],
+        "usda_hardiness_zones": ["10b", "11", "12"],
+        "indoor_outdoor": "indoor_only",
+        "seasonal_behavior": "evergreen"
+      },
+      "taxonomy_verified": true,
+      "data_sources": ["RHS", "Missouri Botanical Garden"],
+      "last_updated": "YYYY-MM-DD"
+    }
+  ]
+}
+```
+
+---
+
+## Output File Structure
+
+```
+PlantGuide/
+├── data/
+│   ├── houseplants_list.json          # Original input (unchanged)
+│   ├── normalized_plants.json         # Task 1.2 output
+│   ├── master_plant_list.json         # Task 1.3 output
+│   ├── enriched_plants.json           # Task 1.4 output
+│   ├── categorized_plants.json        # Task 1.5 output
+│   ├── name_index.json                # Task 1.6 output
+│   └── final_knowledge_base.json      # Task 1.7 output (FINAL)
+├── scripts/
+│   ├── validate_plant_list.py         # Task 1.1
+│   ├── normalize_names.py             # Task 1.2
+│   ├── deduplicate_plants.py          # Task 1.3
+│   ├── enrich_characteristics.py      # Task 1.4
+│   ├── categorize_plants.py           # Task 1.5
+│   ├── build_name_index.py            # Task 1.6
+│   └── add_regional_data.py           # Task 1.7
+├── output/
+│   ├── validation_report.json
+│   ├── deduplication_report.json
+│   ├── enrichment_coverage_report.json
+│   └── name_ambiguity_report.json
+└── knowledge_base/
+    ├── plants.db                       # SQLite database
+    └── schema.sql                      # Database schema
+```
+
+---
+
+## SQLite Database Schema
+
+```sql
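+-- Task: Create SQLite database alongside JSON
+-- Note: SQLite enforces REFERENCES constraints only when
+-- `PRAGMA foreign_keys = ON;` is set on each connection.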
+
+CREATE TABLE plants (
+    id TEXT PRIMARY KEY,
+    scientific_name TEXT NOT NULL UNIQUE,
+    family TEXT NOT NULL,
+    genus TEXT,
+    species TEXT,
+    cultivar TEXT,
+    primary_category TEXT NOT NULL,
+    taxonomy_verified BOOLEAN DEFAULT FALSE,
+    last_updated DATE
+);
+
+CREATE TABLE common_names (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    plant_id TEXT REFERENCES plants(id),
+    common_name TEXT NOT NULL,
+    is_primary BOOLEAN DEFAULT FALSE
+);
+
+CREATE TABLE characteristics (
+    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
+    leaf_shape TEXT,
+    leaf_color TEXT,  -- JSON array
+    leaf_texture TEXT,
+    growth_habit TEXT,
+    mature_height_cm INTEGER,
+    mature_width_cm INTEGER,
+    flowering BOOLEAN,
+    flower_colors TEXT,  -- JSON array
+    bloom_season TEXT
+);
+
+CREATE TABLE regional_info (
+    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
+    native_regions TEXT,  -- JSON array
+    native_countries TEXT,  -- JSON array
+    usda_hardiness_zones TEXT,  -- JSON array
+    indoor_outdoor TEXT,
+    seasonal_behavior TEXT
+);
+
+CREATE TABLE synonyms (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    plant_id TEXT REFERENCES plants(id),
+    synonym TEXT NOT NULL
+);
+
+-- Indexes for common queries
+CREATE INDEX idx_plants_family ON plants(family);
+CREATE INDEX idx_plants_category ON plants(primary_category);
+CREATE INDEX idx_common_names_name ON common_names(common_name);
+CREATE INDEX idx_characteristics_habit ON characteristics(growth_habit);
+```
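+
+To illustrate how records might be loaded and queried, here is a sketch using Python's built-in `sqlite3` module against a trimmed copy of the schema above (only `plants` and `common_names`, with fewer columns). The helper names `load_plants` and `find_by_common_name` are hypothetical, and the case-insensitive match is an assumption.
+
+```python
+import sqlite3
+
+# Trimmed schema for the sketch; 0 stands in for FALSE.
+SCHEMA = """
+CREATE TABLE plants (
+    id TEXT PRIMARY KEY,
+    scientific_name TEXT NOT NULL UNIQUE,
+    family TEXT NOT NULL,
+    primary_category TEXT NOT NULL
+);
+CREATE TABLE common_names (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    plant_id TEXT REFERENCES plants(id),
+    common_name TEXT NOT NULL,
+    is_primary BOOLEAN DEFAULT 0
+);
+CREATE INDEX idx_common_names_name ON common_names(common_name);
+"""
+
+
+def load_plants(conn, plants):
+    """Create the tables and insert plant records with their common names."""
+    conn.execute("PRAGMA foreign_keys = ON")
+    conn.executescript(SCHEMA)
+    for plant in plants:
+        conn.execute(
+            "INSERT INTO plants (id, scientific_name, family, primary_category) "
+            "VALUES (?, ?, ?, ?)",
+            (plant["id"], plant["scientific_name"], plant["family"],
+             plant["primary_category"]),
+        )
+        # The first common name in the list is marked as primary.
+        conn.executemany(
+            "INSERT INTO common_names (plant_id, common_name, is_primary) "
+            "VALUES (?, ?, ?)",
+            [(plant["id"], name, position == 0)
+             for position, name in enumerate(plant["common_names"])],
+        )
+    conn.commit()
+
+
+def find_by_common_name(conn, name):
+    """Return scientific names of all plants carrying this common name."""
+    rows = conn.execute(
+        "SELECT p.scientific_name FROM plants p "
+        "JOIN common_names c ON c.plant_id = p.id "
+        "WHERE c.common_name = ? COLLATE NOCASE "
+        "ORDER BY p.scientific_name",
+        (name,),
+    ).fetchall()
+    return [row[0] for row in rows]
+```
+
+Calling `find_by_common_name(conn, "sweetheart plant")` on a database loaded with the example record from the JSON schema corresponds to Query Test 2 in the validation checklist.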
+
+---
+
+## End Phase Validation Checklist
+
+### Data Quality Gates
+
+| Metric | Target | Validation Method |
+|--------|--------|-------------------|
+| Total validated plants | ≥1,500 | Count after deduplication |
+| Schema compliance | 100% | JSON schema validation |
+| Scientific name format | 100% valid | Regex: `^(× )?[A-Z][a-z]+ (× )?[a-z][a-z-]+` (allows hybrid notation) |
+| Plants with characteristics | ≥80% | Field coverage check |
+| Plants with regional data | ≥70% | Field coverage check |
+| Category coverage | 100% | No "Unknown" categories |
+| Name disambiguation | ≥95% | Ambiguity report review |
+| Taxonomy verification | ≥70% | WFO/GBIF cross-reference |
+
+### Functional Validation
+
+- [ ] **Query Test 1:** Lookup by scientific name returns full plant record
+- [ ] **Query Test 2:** Lookup by common name returns correct plant(s)
+- [ ] **Query Test 3:** Filter by category returns expected results
+- [ ] **Query Test 4:** Filter by characteristics (leaf_shape=heart) works
+- [ ] **Query Test 5:** Regional filter (`usda_hardiness_zones` contains 10a) works
+
+### Deliverable Checklist
+
+- [ ] `data/final_knowledge_base.json` exists and passes schema validation
+- [ ] `knowledge_base/plants.db` SQLite database is populated
+- [ ] All scripts in `scripts/` directory are functional
+- [ ] All reports in `output/` directory are generated
+- [ ] Data coverage meets minimum thresholds
+- [ ] No critical validation errors in reports
+
+### Phase Exit Criteria
+
+**Phase 1 is COMPLETE when:**
+
+1. ✅ Final knowledge base contains ≥1,500 validated plant entries
+2. ✅ ≥80% of plants have physical characteristics populated
+3. ✅ ≥70% of plants have regional information
+4. ✅ 100% of plants have valid categories (no "Unknown")
+5. ✅ SQLite database mirrors JSON knowledge base
+6. ✅ All validation tests pass
+7. ✅ Documentation updated with final counts and coverage metrics
+
+---
+
+## Execution Order
+
+```
+Task 1.1 (Validate)
+    ↓
+Task 1.2 (Normalize)
+    ↓
+Task 1.3 (Deduplicate)
+    ↓
+    ├─→ Task 1.4 (Characteristics) ─┐
+    │                               │
+    ├─→ Task 1.6 (Name Index) ──────┤
+    │                               │
+    └─→ Task 1.7 (Regional) ────────┤
+                                    ↓
+                            Task 1.5 (Categorize)
+                            [Depends on 1.4 for Tropical Foliage]
+                                    ↓
+                            Final Assembly
+                            (JSON + SQLite)
+                                    ↓
+                            Validation Suite
+```
+
+**Note:** Tasks 1.4, 1.6, and 1.7 can run in parallel after Task 1.3 completes. Task 1.5 depends on Task 1.4 output for sub-categorizing Tropical Foliage plants.
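+
+The ordering in the diagram can also be expressed as a dependency graph and resolved with the standard library's `graphlib` (Python 3.9+). In this sketch Task 1.5 waits for 1.4, 1.6, and 1.7, following the diagram's join; the `DEPENDENCIES` table and `execution_batches` are illustrative names.
+
+```python
+from graphlib import TopologicalSorter
+
+# Each task maps to the set of tasks that must finish before it starts.
+DEPENDENCIES = {
+    "1.1": set(),
+    "1.2": {"1.1"},
+    "1.3": {"1.2"},
+    "1.4": {"1.3"},
+    "1.6": {"1.3"},
+    "1.7": {"1.3"},
+    "1.5": {"1.4", "1.6", "1.7"},
+}
+
+
+def execution_batches(dependencies):
+    """Group tasks into batches; tasks within a batch can run in parallel."""
+    sorter = TopologicalSorter(dependencies)
+    sorter.prepare()
+    batches = []
+    while sorter.is_active():
+        ready = sorted(sorter.get_ready())
+        batches.append(ready)
+        sorter.done(*ready)
+    return batches
+```
+
+With the table above, the fourth batch contains Tasks 1.4, 1.6, and 1.7 together, matching the note that they can run in parallel once Task 1.3 completes.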
+
+---
+
+## Risk Mitigation
+
+| Risk | Mitigation |
+|------|------------|
+| External API rate limits | Implement caching, request throttling |
+| Incomplete enrichment data | Use family-level defaults, document gaps |
+| Ambiguous common names | Flag for manual review, prioritize top plants |
+| Taxonomy database mismatches | Trust WFO as primary source |
+| Large dataset processing | Process in batches, checkpoint progress |
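+
+As an illustration of the caching and throttling mitigation, the sketch below wraps any fetch function so that repeated keys are served from memory and new requests are spaced at least `min_interval` seconds apart. The class name `ThrottledCache` and its parameters are hypothetical; the actual fetch function (HTTP client, API endpoint) is left to the caller.
+
+```python
+import time
+
+
+class ThrottledCache:
+    """Cache responses by key and space out calls to the underlying fetcher."""
+
+    def __init__(self, fetch, min_interval=1.0,
+                 clock=time.monotonic, sleep=time.sleep):
+        self._fetch = fetch
+        self._min_interval = min_interval
+        self._clock = clock      # injectable for testing
+        self._sleep = sleep      # injectable for testing
+        self._cache = {}
+        self._last_call = None
+
+    def get(self, key):
+        """Return the cached value for key, fetching it at most once."""
+        if key in self._cache:
+            return self._cache[key]
+        if self._last_call is not None:
+            wait = self._min_interval - (self._clock() - self._last_call)
+            if wait > 0:
+                self._sleep(wait)
+        self._last_call = self._clock()
+        self._cache[key] = self._fetch(key)
+        return self._cache[key]
+```
+
+The cache here lives only in memory; persisting it to disk between runs would also cover the "checkpoint progress" mitigation, at the cost of deciding when cached entries go stale.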