Add PlantGuide iOS app with plant identification and care management

- Implement camera capture and plant identification workflow
- Add Core Data persistence for plants, care schedules, and cached API data
- Create collection view with grid/list layouts and filtering
- Build plant detail views with care information display
- Integrate Trefle botanical API for plant care data
- Add local image storage for captured plant photos
- Implement dependency injection container for testability
- Include accessibility support throughout the app

Bug fixes in this commit:
- Fix Trefle API decoding by removing duplicate CodingKeys
- Fix LocalCachedImage to load from correct PlantImages directory
- Set dateAdded when saving plants for proper collection sorting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Docs/phase-1-implementation-plan.md
@@ -0,0 +1,482 @@
# Phase 1: Knowledge Base Creation - Implementation Plan

## Overview

**Goal:** Build structured plant knowledge from `data/houseplants_list.json`, enriching with taxonomy and characteristics.

**Input:** `data/houseplants_list.json` (2,278 plants, 11 categories, 50 families)

**Output:** Enriched plant knowledge base (JSON + SQLite) with ~500-2000 validated entries

---

## Current Data Assessment

| Attribute | Current State | Required Enhancement |
|-----------|---------------|---------------------|
| Total Plants | 2,278 | Validate, deduplicate |
| Scientific Names | Present | Validate binomial nomenclature |
| Common Names | Array per plant | Normalize, cross-reference |
| Family | 50 families | Validate against taxonomy |
| Category | 11 categories | Map to target types |
| Physical Characteristics | **Missing** | **Must add** |
| Regional/Seasonal Info | **Missing** | **Must add** |

---

## Task Breakdown

### Task 1.1: Load and Validate Plant List

**Objective:** Parse JSON and validate data integrity

**Actions:**
- [ ] Create Python script `scripts/validate_plant_list.py`
- [ ] Load `data/houseplants_list.json`
- [ ] Validate JSON schema:
  - Each plant has `scientific_name` (required, string)
  - Each plant has `common_names` (required, array of strings)
  - Each plant has `family` (required, string)
  - Each plant has `category` (required, string)
- [ ] Identify malformed entries (missing fields, wrong types)
- [ ] Generate validation report: `output/validation_report.json`

**Validation Criteria:**
- 0 malformed entries
- All required fields present
- No null/empty scientific names

**Output File:** `scripts/validate_plant_list.py`
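
The schema checks above could be sketched roughly as follows. This is a minimal illustration rather than the final `validate_plant_list.py`: the function names `validate_plant` and `build_report` and the report layout are assumptions, as is the idea that the parsed input is a plain list of plant objects.

```python
# Required fields and their expected types, taken from the schema rules above.
REQUIRED_FIELDS = {
    "scientific_name": str,
    "common_names": list,
    "family": str,
    "category": str,
}


def validate_plant(plant):
    """Return a list of human-readable problems for one plant record."""
    if not isinstance(plant, dict):
        return ["entry is not an object"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in plant:
            problems.append(f"missing field: {field}")
        elif not isinstance(plant[field], expected_type):
            problems.append(f"wrong type for {field}")
    name = plant.get("scientific_name")
    if isinstance(name, str) and not name.strip():
        problems.append("empty scientific_name")
    names = plant.get("common_names")
    if isinstance(names, list) and not all(isinstance(n, str) for n in names):
        problems.append("common_names must contain only strings")
    return problems


def build_report(plants):
    """Summarize malformed entries (hypothetical report layout)."""
    malformed = []
    for index, plant in enumerate(plants):
        problems = validate_plant(plant)
        if problems:
            malformed.append({"index": index, "problems": problems})
    return {
        "total": len(plants),
        "malformed_count": len(malformed),
        "malformed": malformed,
    }
```

In this sketch, `build_report` would receive the parsed contents of `data/houseplants_list.json`, its result would be written to `output/validation_report.json`, and Task 1.1 would pass when `malformed_count` is 0.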

---

### Task 1.2: Normalize and Standardize Plant Names

**Objective:** Ensure consistent naming conventions

**Actions:**
- [ ] Create `scripts/normalize_names.py`
- [ ] Scientific name normalization:
  - Capitalize genus, lowercase species (e.g., "Philodendron hederaceum")
  - Handle cultivar notation: 'Cultivar Name' in single quotes
  - Validate binomial/trinomial format
- [ ] Common name normalization:
  - Title case standardization
  - Remove leading/trailing whitespace
  - Standardize punctuation
- [ ] Handle hybrid notation (×) consistently
- [ ] Flag names that don't match expected patterns

**Validation Criteria:**
- 100% of scientific names follow the expected binomial/trinomial pattern (with cultivar and hybrid notation allowed)
- No leading/trailing whitespace in any names
- Consistent cultivar notation

**Output File:** `data/normalized_plants.json`
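
As a hedged sketch of the normalization rules above: the helper names and the exact regular expression below are assumptions for illustration, not the project's agreed pattern. `string.capwords` is used as a simple stand-in for title casing and would lowercase acronyms such as "ZZ", so real data may need exceptions.

```python
import re
import string

HYBRID_MARKERS = {"x", "X", "×"}

# Assumed pattern: "Genus species", optional hybrid sign, optional
# subsp./var./f. epithet, optional cultivar in single quotes.
EXPECTED_PATTERN = re.compile(
    r"^(?:× )?[A-Z][a-z]+(?: ×)? [a-z-]+"
    r"(?: (?:subsp\.|var\.|f\.) [a-z-]+)?"
    r"(?: '[^']+')?$"
)


def normalize_scientific_name(raw):
    """Capitalize the genus, lowercase epithets, keep cultivar case as given."""
    # Unify curly quotes and collapse runs of whitespace.
    name = " ".join(raw.replace("\u2018", "'").replace("\u2019", "'").split())
    botanical, quote, cultivar = name.partition("'")
    tokens = []
    genus_seen = False
    for token in botanical.split():
        if token in HYBRID_MARKERS:
            tokens.append("×")
        elif not genus_seen:
            tokens.append(token.capitalize())
            genus_seen = True
        else:
            tokens.append(token.lower())
    result = " ".join(tokens)
    if quote:
        cultivar_name = cultivar.rstrip("'").strip()
        result = f"{result} '{cultivar_name}'"
    return result


def normalize_common_name(raw):
    """Trim, collapse whitespace, and capitalize each word."""
    return string.capwords(raw)


def matches_expected_pattern(name):
    """True when a normalized name fits the assumed pattern; others get flagged."""
    return EXPECTED_PATTERN.match(name) is not None
```

Names for which `matches_expected_pattern` returns `False` after normalization would be the ones to flag for review.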

---

### Task 1.3: Create Deduplicated Master List

**Objective:** Remove duplicates while preserving unique cultivars

**Actions:**
- [ ] Create `scripts/deduplicate_plants.py`
- [ ] Define deduplication rules:
  - Exact scientific name match = duplicate
  - Different cultivars of same species = keep both
  - Same plant, different common names = merge common names
- [ ] Identify potential duplicates using fuzzy matching on:
  - Scientific names (Levenshtein distance < 3)
  - Common names that are identical
- [ ] Generate duplicate candidates report for manual review
- [ ] Merge duplicates: combine common names arrays
- [ ] Assign unique plant IDs (`plant_001`, `plant_002`, etc.)

**Validation Criteria:**
- No exact scientific name duplicates
- All plants have unique IDs
- Merge log documenting all deduplication decisions

**Output Files:**
- `data/master_plant_list.json`
- `output/deduplication_report.json`
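
The deduplication rules above might look something like this in code. It is a sketch under assumptions: the function names are hypothetical, the edit distance is a plain standard-library implementation, and the pairwise comparison is quadratic, so on roughly 2,000 names it may be slow enough to warrant a dedicated library.

```python
from itertools import combinations


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def merge_exact_duplicates(plants):
    """Merge records with identical scientific names and assign plant IDs."""
    merged = {}
    for plant in plants:
        key = plant["scientific_name"]
        if key not in merged:
            merged[key] = {**plant, "common_names": list(plant["common_names"])}
            continue
        for name in plant["common_names"]:
            if name not in merged[key]["common_names"]:
                merged[key]["common_names"].append(name)
    result = list(merged.values())
    for number, plant in enumerate(result, start=1):
        plant["id"] = f"plant_{number:03d}"
    return result


def fuzzy_candidates(names, max_distance=2):
    """Pairs of distinct names within the distance limit (distance < 3)."""
    unique = sorted(set(names))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if levenshtein(a, b) <= max_distance
    ]
```

Pairs returned by `fuzzy_candidates` are only candidates for the manual-review report; nothing here merges them automatically, which keeps distinct cultivars with similar names intact.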

---

### Task 1.4: Enrich with Physical Characteristics

**Objective:** Add visual and physical attributes for each plant

**Actions:**
- [ ] Create `scripts/enrich_characteristics.py`
- [ ] Define characteristic schema (allowed values per field):
  ```json
  {
    "characteristics": {
      "leaf_shape": ["heart", "oval", "linear", "palmate", "lobed", "needle", "rosette"],
      "leaf_color": ["green", "variegated", "red", "purple", "silver", "yellow"],
      "leaf_texture": ["glossy", "matte", "fuzzy", "waxy", "smooth", "rough"],
      "growth_habit": ["upright", "trailing", "climbing", "rosette", "bushy", "tree-form"],
      "mature_height_cm": {"min": 0, "max": 500},
      "mature_width_cm": {"min": 0, "max": 300},
      "flowering": [true, false],
      "flower_colors": ["white", "pink", "red", "yellow", "orange", "purple", "blue"],
      "bloom_season": ["spring", "summer", "fall", "winter", "year-round"]
    }
  }
  ```
- [ ] Source characteristics data:
  - **Primary:** Web scraping from botanical databases (RHS, Missouri Botanical Garden)
  - **Secondary:** Wikipedia API for plant descriptions
  - **Fallback:** Family/genus-level defaults
- [ ] Implement web fetching with rate limiting
- [ ] Parse and extract characteristics from HTML/JSON responses
- [ ] Store enrichment sources for traceability

**Validation Criteria:**
- ≥80% of plants have leaf_shape populated
- ≥80% of plants have growth_habit populated
- ≥60% of plants have height/width estimates
- 100% of plants have flowering boolean

**Output Files:**
- `data/enriched_plants.json`
- `output/enrichment_coverage_report.json`
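
The coverage thresholds above lend themselves to a small check like the one below. This is only a sketch: the function names and report shape are assumptions, and the 60% target is applied to height and width separately, which is one possible reading of "height/width estimates".

```python
# Minimum coverage per field, from the validation criteria above.
THRESHOLDS = {
    "leaf_shape": 0.80,
    "growth_habit": 0.80,
    "mature_height_cm": 0.60,
    "mature_width_cm": 0.60,
    "flowering": 1.00,
}


def field_coverage(plants, field):
    """Fraction of plants whose characteristics[field] is populated."""
    if not plants:
        return 0.0
    populated = 0
    for plant in plants:
        value = plant.get("characteristics", {}).get(field)
        # False and 0 count as populated; None, "" and [] do not.
        if value is not None and value != "" and value != []:
            populated += 1
    return populated / len(plants)


def coverage_report(plants):
    """Compare measured coverage against each threshold."""
    report = {}
    for field, target in THRESHOLDS.items():
        coverage = field_coverage(plants, field)
        report[field] = {
            "coverage": coverage,
            "target": target,
            "passed": coverage >= target,
        }
    return report
```

A dictionary of this kind could be serialized directly into `output/enrichment_coverage_report.json`.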

---

### Task 1.5: Categorize Plants by Type

**Objective:** Map existing categories to target classification system

**Actions:**
- [ ] Create `scripts/categorize_plants.py`
- [ ] Define target categories (per plan):
  ```
  - Flowering Plant
  - Tree / Palm
  - Shrub / Bush
  - Succulent / Cactus
  - Fern
  - Vine / Trailing
  - Herb
  - Orchid
  - Bromeliad
  - Air Plant
  ```
- [ ] Create mapping from current 11 categories:
  ```
  Current              → Target
  ─────────────────────────────
  Air Plant            → Air Plant
  Bromeliad            → Bromeliad
  Cactus               → Succulent / Cactus
  Fern                 → Fern
  Flowering Houseplant → Flowering Plant
  Herb                 → Herb
  Orchid               → Orchid
  Palm                 → Tree / Palm
  Succulent            → Succulent / Cactus
  Trailing/Climbing    → Vine / Trailing
  Tropical Foliage     → [Requires secondary classification]
  ```
- [ ] Handle "Tropical Foliage" (largest category):
  - Use growth_habit from Task 1.4 to sub-classify
  - Cross-reference family and genus for tree-form species (e.g., genus Ficus → Tree / Palm)
- [ ] Add `primary_category` and `secondary_categories` fields

**Validation Criteria:**
- 100% of plants have primary_category assigned
- No plants keep "Tropical Foliage" as `primary_category` (all reclassified; it may remain in `secondary_categories`)
- Category distribution documented

**Output File:** `data/categorized_plants.json`
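
A rough sketch of the mapping and the "Tropical Foliage" sub-classification follows. The direct mapping comes from the table above; the `HABIT_TO_CATEGORY` rules, the `TREE_GENERA` set, and the function name `categorize` are assumptions made for illustration, and anything they cannot resolve is returned as `None` for manual review.

```python
# Direct mappings from the current categories to the target categories.
CATEGORY_MAP = {
    "Air Plant": "Air Plant",
    "Bromeliad": "Bromeliad",
    "Cactus": "Succulent / Cactus",
    "Fern": "Fern",
    "Flowering Houseplant": "Flowering Plant",
    "Herb": "Herb",
    "Orchid": "Orchid",
    "Palm": "Tree / Palm",
    "Succulent": "Succulent / Cactus",
    "Trailing/Climbing": "Vine / Trailing",
}

# Assumed sub-classification rules for "Tropical Foliage" (not from the plan).
HABIT_TO_CATEGORY = {
    "trailing": "Vine / Trailing",
    "climbing": "Vine / Trailing",
    "tree-form": "Tree / Palm",
    "bushy": "Shrub / Bush",
}

# Genera treated as tree-form when no growth habit is available (assumption).
TREE_GENERA = {"Ficus"}


def categorize(plant):
    """Return the target primary category, or None when review is needed."""
    current = plant["category"]
    if current in CATEGORY_MAP:
        return CATEGORY_MAP[current]
    if current != "Tropical Foliage":
        return None
    habit = plant.get("characteristics", {}).get("growth_habit")
    if habit in HABIT_TO_CATEGORY:
        return HABIT_TO_CATEGORY[habit]
    genus = plant["scientific_name"].split()[0]
    if genus in TREE_GENERA:
        return "Tree / Palm"
    return None
```

Checking the growth habit before the genus lets a trailing species in a mostly tree-form genus still land in "Vine / Trailing"; the reverse order would be an equally defensible choice.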

---

### Task 1.6: Map Common Names to Scientific Names

**Objective:** Create bidirectional lookup for name resolution

**Actions:**
- [ ] Create `scripts/build_name_index.py`
- [ ] Build scientific → common names map (already exists, validate)
- [ ] Build common → scientific names map (reverse lookup)
- [ ] Handle ambiguous common names (multiple plants share same common name):
  - Flag conflicts
  - Add disambiguation notes
- [ ] Validate against external taxonomy:
  - World Flora Online (WFO) API
  - GBIF (Global Biodiversity Information Facility)
- [ ] Add `taxonomy_verified` boolean for taxonomically confirmed names
- [ ] Store alternative/deprecated scientific names as synonyms

**Validation Criteria:**
- Reverse lookup resolves ≥95% of common names unambiguously
- ≥70% of scientific names verified against WFO/GBIF
- Synonym list for deprecated names

**Output Files:**
- `data/name_index.json`
- `output/name_ambiguity_report.json`
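
The bidirectional lookup and the ambiguity check could be sketched as below. The index layout, the case-insensitive key (`casefold`), and the function names are assumptions; the external WFO/GBIF verification step is deliberately left out.

```python
from collections import defaultdict


def build_name_index(plants):
    """Build forward and reverse name maps plus a list of ambiguous names."""
    scientific_to_common = {}
    common_to_scientific = defaultdict(list)
    for plant in plants:
        scientific = plant["scientific_name"]
        scientific_to_common[scientific] = list(plant["common_names"])
        for common in plant["common_names"]:
            key = common.casefold()  # assumption: lookups ignore case
            if scientific not in common_to_scientific[key]:
                common_to_scientific[key].append(scientific)
    ambiguous = {
        name: matches
        for name, matches in common_to_scientific.items()
        if len(matches) > 1
    }
    return {
        "scientific_to_common": scientific_to_common,
        "common_to_scientific": dict(common_to_scientific),
        "ambiguous": ambiguous,
    }


def unambiguous_ratio(index):
    """Share of common names that resolve to exactly one scientific name."""
    total = len(index["common_to_scientific"])
    if total == 0:
        return 1.0
    return 1 - len(index["ambiguous"]) / total
```

Under these assumptions, the ≥95% criterion corresponds to `unambiguous_ratio(index) >= 0.95`, and the `ambiguous` mapping is the raw material for `output/name_ambiguity_report.json`.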

---

### Task 1.7: Add Regional/Seasonal Information

**Objective:** Add native regions, hardiness zones, and seasonal behaviors

**Actions:**
- [ ] Create `scripts/add_regional_data.py`
- [ ] Define regional schema:
  ```json
  {
    "regional_info": {
      "native_regions": ["South America", "Southeast Asia", "Africa", ...],
      "native_countries": ["Brazil", "Thailand", ...],
      "usda_hardiness_zones": ["9a", "9b", "10a", ...],
      "indoor_outdoor": "indoor_only" | "outdoor_temperate" | "outdoor_tropical",
      "seasonal_behavior": "evergreen" | "deciduous" | "dormant_winter"
    }
  }
  ```
- [ ] Source regional data:
  - USDA Plants Database API
  - Wikipedia (native range sections)
  - Existing botanical databases
- [ ] Map families to typical native regions as fallback
- [ ] Add care-relevant seasonality (dormancy periods, bloom times)

**Validation Criteria:**
- ≥70% of plants have native_regions populated
- ≥60% of plants have hardiness zones
- 100% of plants have indoor_outdoor classification

**Output File:** `data/final_knowledge_base.json`
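
One way to check records against the regional schema above is sketched here. The allowed `indoor_outdoor` and `seasonal_behavior` values come from that schema; the zone pattern (zones 1 to 13 with optional "a"/"b" half-zones) and the function name are assumptions, and `indoor_outdoor` is treated as mandatory to match the 100% criterion.

```python
import re

INDOOR_OUTDOOR = {"indoor_only", "outdoor_temperate", "outdoor_tropical"}
SEASONAL_BEHAVIOR = {"evergreen", "deciduous", "dormant_winter"}

# Assumed format: USDA zones 1-13, optionally followed by "a" or "b".
ZONE_RE = re.compile(r"^(?:1[0-3]|[1-9])[ab]?$")


def validate_regional_info(info):
    """Return a list of problems found in one regional_info object."""
    problems = []
    if info.get("indoor_outdoor") not in INDOOR_OUTDOOR:
        problems.append("indoor_outdoor missing or not an allowed value")
    behavior = info.get("seasonal_behavior")
    if behavior is not None and behavior not in SEASONAL_BEHAVIOR:
        problems.append(f"unknown seasonal_behavior: {behavior}")
    for zone in info.get("usda_hardiness_zones", []):
        if not ZONE_RE.match(zone):
            problems.append(f"invalid hardiness zone: {zone}")
    return problems
```

An empty list means the record fits the schema as interpreted here; coverage of `native_regions` and hardiness zones would still be measured separately.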

---

## Final Knowledge Base Schema

```json
{
  "version": "1.0.0",
  "generated_date": "YYYY-MM-DD",
  "total_plants": 2000,
  "plants": [
    {
      "id": "plant_001",
      "scientific_name": "Philodendron hederaceum",
      "common_names": ["Heartleaf Philodendron", "Sweetheart Plant"],
      "synonyms": [],
      "family": "Araceae",
      "genus": "Philodendron",
      "species": "hederaceum",
      "cultivar": null,
      "primary_category": "Vine / Trailing",
      "secondary_categories": ["Tropical Foliage"],
      "characteristics": {
        "leaf_shape": "heart",
        "leaf_color": ["green"],
        "leaf_texture": "glossy",
        "growth_habit": "trailing",
        "mature_height_cm": 120,
        "mature_width_cm": 60,
        "flowering": true,
        "flower_colors": ["white", "green"],
        "bloom_season": "rarely indoors"
      },
      "regional_info": {
        "native_regions": ["Central America", "South America"],
        "native_countries": ["Mexico", "Brazil"],
        "usda_hardiness_zones": ["10b", "11", "12"],
        "indoor_outdoor": "indoor_only",
        "seasonal_behavior": "evergreen"
      },
      "taxonomy_verified": true,
      "data_sources": ["RHS", "Missouri Botanical Garden"],
      "last_updated": "YYYY-MM-DD"
    }
  ]
}
```

---

## Output File Structure

```
PlantGuide/
├── data/
│   ├── houseplants_list.json          # Original input (unchanged)
│   ├── normalized_plants.json         # Task 1.2 output
│   ├── master_plant_list.json         # Task 1.3 output
│   ├── enriched_plants.json           # Task 1.4 output
│   ├── categorized_plants.json        # Task 1.5 output
│   ├── name_index.json                # Task 1.6 output
│   └── final_knowledge_base.json      # Task 1.7 output (FINAL)
├── scripts/
│   ├── validate_plant_list.py         # Task 1.1
│   ├── normalize_names.py             # Task 1.2
│   ├── deduplicate_plants.py          # Task 1.3
│   ├── enrich_characteristics.py      # Task 1.4
│   ├── categorize_plants.py           # Task 1.5
│   ├── build_name_index.py            # Task 1.6
│   └── add_regional_data.py           # Task 1.7
├── output/
│   ├── validation_report.json
│   ├── deduplication_report.json
│   ├── enrichment_coverage_report.json
│   └── name_ambiguity_report.json
└── knowledge_base/
    ├── plants.db                      # SQLite database
    └── schema.sql                     # Database schema
```

---

## SQLite Database Schema

```sql
-- Task: Create SQLite database alongside JSON
-- Note: SQLite enforces REFERENCES only when PRAGMA foreign_keys = ON.

CREATE TABLE plants (
    id TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    family TEXT NOT NULL,
    genus TEXT,
    species TEXT,
    cultivar TEXT,
    primary_category TEXT NOT NULL,
    taxonomy_verified BOOLEAN DEFAULT FALSE,
    last_updated DATE
);

CREATE TABLE common_names (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    common_name TEXT NOT NULL,
    is_primary BOOLEAN DEFAULT FALSE
);

CREATE TABLE characteristics (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    leaf_shape TEXT,
    leaf_color TEXT,  -- JSON array
    leaf_texture TEXT,
    growth_habit TEXT,
    mature_height_cm INTEGER,
    mature_width_cm INTEGER,
    flowering BOOLEAN,
    flower_colors TEXT,  -- JSON array
    bloom_season TEXT
);

CREATE TABLE regional_info (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    native_regions TEXT,  -- JSON array
    native_countries TEXT,  -- JSON array
    usda_hardiness_zones TEXT,  -- JSON array
    indoor_outdoor TEXT,
    seasonal_behavior TEXT
);

CREATE TABLE synonyms (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    synonym TEXT NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_plants_family ON plants(family);
CREATE INDEX idx_plants_category ON plants(primary_category);
CREATE INDEX idx_common_names_name ON common_names(common_name);
CREATE INDEX idx_characteristics_habit ON characteristics(growth_habit);
```
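
To illustrate how the JSON knowledge base might be mirrored into this schema, here is a hedged sketch using Python's built-in `sqlite3` module. It embeds a trimmed copy of three tables so that it runs on its own (a real loader would execute `knowledge_base/schema.sql`); the function names, and the choice to mark the first common name as primary, are assumptions.

```python
import json
import sqlite3

# Trimmed copy of three tables from the schema above (illustration only).
SCHEMA = """
CREATE TABLE plants (
    id TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    family TEXT NOT NULL,
    primary_category TEXT NOT NULL
);
CREATE TABLE common_names (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    common_name TEXT NOT NULL,
    is_primary BOOLEAN DEFAULT 0
);
CREATE TABLE regional_info (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    usda_hardiness_zones TEXT  -- JSON array
);
"""


def open_database(path):
    """Open a database, enable foreign-key checks, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_plants(conn, plants):
    """Insert plant records in one transaction (rolled back on error)."""
    with conn:
        for plant in plants:
            conn.execute(
                "INSERT INTO plants (id, scientific_name, family, primary_category) "
                "VALUES (?, ?, ?, ?)",
                (plant["id"], plant["scientific_name"], plant["family"],
                 plant["primary_category"]),
            )
            conn.executemany(
                "INSERT INTO common_names (plant_id, common_name, is_primary) "
                "VALUES (?, ?, ?)",
                [(plant["id"], name, position == 0)
                 for position, name in enumerate(plant["common_names"])],
            )
            zones = plant.get("regional_info", {}).get("usda_hardiness_zones", [])
            conn.execute(
                "INSERT INTO regional_info (plant_id, usda_hardiness_zones) "
                "VALUES (?, ?)",
                (plant["id"], json.dumps(zones)),
            )


def lookup_by_common_name(conn, common_name):
    """Case-insensitive lookup of plants by one of their common names."""
    return conn.execute(
        "SELECT p.id, p.scientific_name FROM plants p "
        "JOIN common_names c ON c.plant_id = p.id "
        "WHERE c.common_name = ? COLLATE NOCASE ORDER BY p.id",
        (common_name,),
    ).fetchall()
```

Array-valued fields are stored as JSON text, matching the `-- JSON array` comments in the schema, so readers of those columns would decode them with `json.loads`.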

---

## End Phase Validation Checklist

### Data Quality Gates

| Metric | Target | Validation Method |
|--------|--------|-------------------|
| Total validated plants | ≥1,500 | Count after deduplication |
| Schema compliance | 100% | JSON schema validation |
| Scientific name format | 100% valid | Regex: `^[A-Z][a-z]+ [a-z]+` |
| Plants with characteristics | ≥80% | Field coverage check |
| Plants with regional data | ≥70% | Field coverage check |
| Category coverage | 100% | No "Unknown" categories |
| Name disambiguation | ≥95% | Ambiguity report review |
| Taxonomy verification | ≥70% | WFO/GBIF cross-reference |

### Functional Validation

- [ ] **Query Test 1:** Lookup by scientific name returns full plant record
- [ ] **Query Test 2:** Lookup by common name returns correct plant(s)
- [ ] **Query Test 3:** Filter by category returns expected results
- [ ] **Query Test 4:** Filter by characteristics (`leaf_shape` = `heart`) works
- [ ] **Query Test 5:** Regional filter (`usda_hardiness_zones` contains `10a`) works

### Deliverable Checklist

- [ ] `data/final_knowledge_base.json` exists and passes schema validation
- [ ] `knowledge_base/plants.db` SQLite database is populated
- [ ] All scripts in `scripts/` directory are functional
- [ ] All reports in `output/` directory are generated
- [ ] Data coverage meets minimum thresholds
- [ ] No critical validation errors in reports

### Phase Exit Criteria

**Phase 1 is COMPLETE when:**

1. ✅ Final knowledge base contains ≥1,500 validated plant entries
2. ✅ ≥80% of plants have physical characteristics populated
3. ✅ ≥70% of plants have regional information
4. ✅ 100% of plants have valid categories (no "Unknown")
5. ✅ SQLite database mirrors JSON knowledge base
6. ✅ All validation tests pass
7. ✅ Documentation updated with final counts and coverage metrics


---

## Execution Order

```
Task 1.1 (Validate)
        ↓
Task 1.2 (Normalize)
        ↓
Task 1.3 (Deduplicate)
        ↓
        ├─→ Task 1.4 (Characteristics) ─┐
        │                               │
        ├─→ Task 1.6 (Name Index) ──────┤
        │                               │
        └─→ Task 1.7 (Regional) ────────┤
                                        ↓
                              Task 1.5 (Categorize)
                    [Depends on 1.4 for Tropical Foliage]
                                        ↓
                                 Final Assembly
                                 (JSON + SQLite)
                                        ↓
                                Validation Suite
```

**Note:** Tasks 1.4, 1.6, and 1.7 can run in parallel after Task 1.3 completes. Task 1.5 depends on Task 1.4 output for sub-categorizing Tropical Foliage plants.
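
The ordering described in this note can be expressed as a small dependency graph. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into waves that may run in parallel; the `DEPENDENCIES` table is one reading of the diagram, and the `final_assembly` and `validation_suite` labels are hypothetical names for its last two boxes.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "1.1": set(),
    "1.2": {"1.1"},
    "1.3": {"1.2"},
    "1.4": {"1.3"},
    "1.6": {"1.3"},
    "1.7": {"1.3"},
    "1.5": {"1.4"},
    "final_assembly": {"1.5", "1.6", "1.7"},
    "validation_suite": {"final_assembly"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks within one wave have no mutual dependency."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With this graph, the fourth wave contains Tasks 1.4, 1.6, and 1.7 together, and Task 1.5 follows in a wave of its own.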

---

## Risk Mitigation

| Risk | Mitigation |
|------|------------|
| External API rate limits | Implement caching, request throttling |
| Incomplete enrichment data | Use family-level defaults, document gaps |
| Ambiguous common names | Flag for manual review, prioritize top plants |
| Taxonomy database mismatches | Trust WFO as primary source |
| Large dataset processing | Process in batches, checkpoint progress |
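
For the last mitigation in the table, batch processing with checkpoints might be sketched as follows. The function name, the checkpoint file format (a JSON object with a `next_index` key), and the default batch size are all assumptions for illustration.

```python
import json
from pathlib import Path


def process_in_batches(items, handler, checkpoint_path, batch_size=100):
    """Run handler on each batch, recording progress after every batch.

    If the checkpoint file already exists, processing resumes from the
    index stored in it instead of starting over.
    """
    checkpoint = Path(checkpoint_path)
    start = 0
    if checkpoint.exists():
        start = json.loads(checkpoint.read_text(encoding="utf-8"))["next_index"]
    for begin in range(start, len(items), batch_size):
        batch = items[begin:begin + batch_size]
        handler(batch)
        # Written only after the batch succeeds, so a crash repeats at most one batch.
        checkpoint.write_text(
            json.dumps({"next_index": begin + len(batch)}), encoding="utf-8"
        )
```

Because a batch interrupted midway is repeated on the next run, `handler` should be safe to call twice on the same items, for example by overwriting results keyed by plant ID.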