Trey t 136dfbae33 Add PlantGuide iOS app with plant identification and care management

- Implement camera capture and plant identification workflow
- Add Core Data persistence for plants, care schedules, and cached API data
- Create collection view with grid/list layouts and filtering
- Build plant detail views with care information display
- Integrate Trefle botanical API for plant care data
- Add local image storage for captured plant photos
- Implement dependency injection container for testability
- Include accessibility support throughout the app

Bug fixes in this commit:
- Fix Trefle API decoding by removing duplicate CodingKeys
- Fix LocalCachedImage to load from correct PlantImages directory
- Set dateAdded when saving plants for proper collection sorting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-23 12:18:01 -06:00

Phase 1: Knowledge Base Creation - Implementation Plan

Overview

Goal: Build a structured plant knowledge base from data/houseplants_list.json, enriched with taxonomy, physical characteristics, and regional data.

Input: data/houseplants_list.json (2,278 plants, 11 categories, 50 families)

Output: Enriched plant knowledge base (JSON + SQLite) with ~500-2,000 validated entries (the Phase Exit Criteria set the floor at ≥1,500)

Current Data Assessment

Attribute	Current State	Required Enhancement
Total Plants	2,278	Validate, deduplicate
Scientific Names	Present	Validate binomial nomenclature
Common Names	Array per plant	Normalize, cross-reference
Family	50 families	Validate against taxonomy
Category	11 categories	Map to target types
Physical Characteristics	Missing	Must add
Regional/Seasonal Info	Missing	Must add

Task Breakdown

Task 1.1: Load and Validate Plant List

Objective: Parse JSON and validate data integrity

Actions:

Create Python script scripts/validate_plant_list.py
Load data/houseplants_list.json
Validate JSON schema:
- Each plant has scientific_name (required, string)
- Each plant has common_names (required, array of strings)
- Each plant has family (required, string)
- Each plant has category (required, string)
Identify malformed entries (missing fields, wrong types)
Generate validation report: output/validation_report.json

Validation Criteria:

0 malformed entries
All required fields present
No null/empty scientific names

Output File: scripts/validate_plant_list.py
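
The checks above could be sketched roughly as follows. This is a minimal illustration, not the actual scripts/validate_plant_list.py: it assumes the input is a flat list of plant objects, and the names `validate_plant` and `build_report` are hypothetical.

```python
# Required fields and expected types, taken from the schema rules in Task 1.1.
REQUIRED_FIELDS = {
    "scientific_name": str,
    "common_names": list,
    "family": str,
    "category": str,
}


def validate_plant(plant):
    """Return a list of problems for one entry (an empty list means valid)."""
    if not isinstance(plant, dict):
        return ["entry is not an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in plant:
            problems.append(f"missing field: {field}")
        elif not isinstance(plant[field], expected):
            problems.append(f"wrong type for {field}")
    name = plant.get("scientific_name")
    if isinstance(name, str) and not name.strip():
        problems.append("empty scientific_name")
    names = plant.get("common_names")
    if isinstance(names, list) and not all(isinstance(n, str) for n in names):
        problems.append("common_names must contain only strings")
    return problems


def build_report(plants):
    """Summarize malformed entries in a shape suitable for a JSON report."""
    malformed = []
    for index, plant in enumerate(plants):
        problems = validate_plant(plant)
        if problems:
            malformed.append({"index": index, "problems": problems})
    return {
        "total": len(plants),
        "malformed_count": len(malformed),
        "malformed": malformed,
    }
```

The report dictionary can be written to output/validation_report.json with `json.dump`; a `malformed_count` of 0 corresponds to the first validation criterion.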

Task 1.2: Normalize and Standardize Plant Names

Objective: Ensure consistent naming conventions

Actions:

Create scripts/normalize_names.py
Scientific name normalization:
- Capitalize genus, lowercase species (e.g., "Philodendron hederaceum")
- Handle cultivar notation: 'Cultivar Name' in single quotes
- Validate binomial/trinomial format
Common name normalization:
- Title case standardization
- Remove leading/trailing whitespace
- Standardize punctuation
Handle hybrid notation (×) consistently
Flag names that don't match expected patterns

Validation Criteria:

100% of scientific names follow the binomial/trinomial pattern (with optional cultivar or hybrid notation)
No leading/trailing whitespace in any names
Consistent cultivar notation

Output File: data/normalized_plants.json
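
A rough sketch of the normalization rules, under a few assumptions: a lone "x" token is treated as the hybrid marker, any quoted fragment is treated as a cultivar, and infraspecific rank markers such as "var." are not handled and would simply be flagged. The function names and the pattern are illustrative, not part of the plan.

```python
import re
import string

QUOTES = "'\"‘’“”"
CULTIVAR_RE = re.compile(f"[{QUOTES}]([^{QUOTES}]+)[{QUOTES}]")
# Genus, epithet, optional third epithet, optional cultivar; "× " may precede
# the genus or the epithet. This pattern is an assumption for illustration.
NAME_RE = re.compile(r"^(× )?[A-Z][a-z]+ (× )?[a-z-]+( [a-z-]+)?( '[^']+')?$")


def normalize_scientific_name(raw):
    """Capitalize the genus, lowercase epithets, unify cultivar and hybrid notation."""
    text = " ".join(raw.split())  # trim and collapse whitespace
    cultivar = None
    match = CULTIVAR_RE.search(text)
    if match:
        cultivar = match.group(1).strip()
        text = (text[:match.start()] + text[match.end():]).strip()
    parts = []
    seen_genus = False
    for token in text.split():
        if token in ("x", "X", "×"):
            parts.append("×")
        elif not seen_genus:
            parts.append(token.capitalize())
            seen_genus = True
        else:
            parts.append(token.lower())
    name = " ".join(parts)
    if cultivar:
        name += f" '{cultivar}'"
    return name


def normalize_common_name(raw):
    """Trim, collapse whitespace, and capitalize each word."""
    return string.capwords(raw)


def matches_expected_pattern(name):
    """True when a normalized name fits the pattern; others get flagged."""
    return NAME_RE.match(name) is not None
```

`string.capwords` is used instead of `str.title` because `title` would also capitalize the letter after an apostrophe.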

Task 1.3: Create Deduplicated Master List

Objective: Remove duplicates while preserving unique cultivars

Actions:

Create scripts/deduplicate_plants.py
Define deduplication rules:
- Exact scientific name match = duplicate
- Different cultivars of same species = keep both
- Same plant, different common names = merge common names
Identify potential duplicates using fuzzy matching on:
- Scientific names (Levenshtein distance < 3)
- Common names that are identical
Generate duplicate candidates report for manual review
Merge duplicates: combine common names arrays
Assign unique plant IDs (plant_001, plant_002, etc.)

Validation Criteria:

No exact scientific name duplicates
All plants have unique IDs
Merge log documenting all deduplication decisions

Output Files:

data/master_plant_list.json
output/deduplication_report.json
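
The deduplication rules can be illustrated with a small pure-Python sketch. It assumes names have already been normalized (Task 1.2) and that cultivars are part of the scientific_name string, so different cultivars never match exactly. The function names are hypothetical, and the pairwise comparison is quadratic, which may be slow for the full list.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def deduplicate(plants):
    """Merge exact scientific-name duplicates and assign sequential IDs."""
    merged = {}
    merge_log = []
    for plant in plants:
        key = plant["scientific_name"]
        if key in merged:
            kept = merged[key]["common_names"]
            for name in plant["common_names"]:
                if name not in kept:
                    kept.append(name)
            merge_log.append({"scientific_name": key,
                              "action": "merged exact duplicate"})
        else:
            merged[key] = {**plant, "common_names": list(plant["common_names"])}
    master = [{"id": f"plant_{i:03d}", **plant}
              for i, plant in enumerate(merged.values(), 1)]
    return master, merge_log


def fuzzy_candidates(master, max_distance=3):
    """Pairs of distinct names with edit distance below max_distance (for manual review)."""
    names = [plant["scientific_name"] for plant in master]
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            distance = levenshtein(names[i], names[j])
            if 0 < distance < max_distance:
                pairs.append((names[i], names[j], distance))
    return pairs
```

Fuzzy matches are only reported, never merged automatically, which keeps the "manual review" step in the plan meaningful.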

Task 1.4: Enrich with Physical Characteristics

Objective: Add visual and physical attributes for each plant

Actions:

Create scripts/enrich_characteristics.py

Define characteristic schema:

{
  "characteristics": {
    "leaf_shape": ["heart", "oval", "linear", "palmate", "lobed", "needle", "rosette"],
    "leaf_color": ["green", "variegated", "red", "purple", "silver", "yellow"],
    "leaf_texture": ["glossy", "matte", "fuzzy", "waxy", "smooth", "rough"],
    "growth_habit": ["upright", "trailing", "climbing", "rosette", "bushy", "tree-form"],
    "mature_height_cm": {"min": 0, "max": 500},
    "mature_width_cm": {"min": 0, "max": 300},
    "flowering": [true, false],
    "flower_colors": ["white", "pink", "red", "yellow", "orange", "purple", "blue"],
    "bloom_season": ["spring", "summer", "fall", "winter", "year-round"]
  }
}

Source characteristics data:
- Primary: Web scraping from botanical databases (RHS, Missouri Botanical Garden)
- Secondary: Wikipedia API for plant descriptions
- Fallback: Family/genus-level defaults
Implement web fetching with rate limiting
Parse and extract characteristics from HTML/JSON responses
Store enrichment sources for traceability

Validation Criteria:

≥80% of plants have leaf_shape populated
≥80% of plants have growth_habit populated
≥60% of plants have height/width estimates
100% of plants have flowering boolean

Output Files:

data/enriched_plants.json
output/enrichment_coverage_report.json
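
One possible way to compute the coverage figures for the report is sketched below. It assumes each plant carries a `characteristics` object as in the schema, and that the "height/width" criterion is checked per field; `coverage_report` is a hypothetical name.

```python
# Minimum coverage per field, from the validation criteria in Task 1.4.
THRESHOLDS = {
    "leaf_shape": 0.80,
    "growth_habit": 0.80,
    "mature_height_cm": 0.60,
    "mature_width_cm": 0.60,
    "flowering": 1.00,
}


def is_populated(value):
    """None, empty strings, and empty lists count as missing; False and 0 do not."""
    return value is not None and value != "" and value != []


def coverage_report(plants):
    """Per-field coverage ratio and pass/fail against THRESHOLDS."""
    total = len(plants)
    report = {}
    for field, target in THRESHOLDS.items():
        populated = sum(
            1 for plant in plants
            if is_populated(plant.get("characteristics", {}).get(field))
        )
        ratio = populated / total if total else 0.0
        report[field] = {
            "populated": populated,
            "coverage": round(ratio, 4),
            "target": target,
            "passed": ratio >= target,
        }
    return report
```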

Task 1.5: Categorize Plants by Type

Objective: Map existing categories to target classification system

Actions:

Create scripts/categorize_plants.py

Define target categories (per plan):

- Flowering Plant
- Tree / Palm
- Shrub / Bush
- Succulent / Cactus
- Fern
- Vine / Trailing
- Herb
- Orchid
- Bromeliad
- Air Plant

Create mapping from current 11 categories:

Current → Target
─────────────────────────────
Air Plant → Air Plant
Bromeliad → Bromeliad
Cactus → Succulent / Cactus
Fern → Fern
Flowering Houseplant → Flowering Plant
Herb → Herb
Orchid → Orchid
Palm → Tree / Palm
Succulent → Succulent / Cactus
Trailing/Climbing → Vine / Trailing
Tropical Foliage → [Requires secondary classification]

Handle "Tropical Foliage" (largest category):
- Use growth_habit from Task 1.4 to sub-classify
- Cross-reference genus/family for tree-form species (e.g., genus Ficus → Tree / Palm)
Add primary_category and secondary_categories fields

Validation Criteria:

100% of plants have primary_category assigned
No plants remain as "Tropical Foliage" (all reclassified)
Category distribution documented

Output File: data/categorized_plants.json
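
A sketch of the mapping logic follows. The direct mapping is taken from the table above; the `HABIT_TO_CATEGORY` table and the `TREE_GENERA` set are assumptions made for illustration (the plan only names Ficus), and habits without an obvious target return None so they can be reviewed by hand.

```python
# Direct mapping from the Current → Target table.
CATEGORY_MAP = {
    "Air Plant": "Air Plant",
    "Bromeliad": "Bromeliad",
    "Cactus": "Succulent / Cactus",
    "Fern": "Fern",
    "Flowering Houseplant": "Flowering Plant",
    "Herb": "Herb",
    "Orchid": "Orchid",
    "Palm": "Tree / Palm",
    "Succulent": "Succulent / Cactus",
    "Trailing/Climbing": "Vine / Trailing",
}

# Assumed sub-classification of "Tropical Foliage" by growth_habit (Task 1.4).
HABIT_TO_CATEGORY = {
    "trailing": "Vine / Trailing",
    "climbing": "Vine / Trailing",
    "tree-form": "Tree / Palm",
    "bushy": "Shrub / Bush",
}

# Genera treated as trees regardless of recorded habit; only Ficus is named in the plan.
TREE_GENERA = {"Ficus"}


def primary_category(plant):
    """Return the target category, or None when manual review is needed."""
    current = plant["category"]
    if current != "Tropical Foliage":
        return CATEGORY_MAP.get(current)
    genus = plant["scientific_name"].split()[0]
    if genus in TREE_GENERA:
        return "Tree / Palm"
    habit = plant.get("characteristics", {}).get("growth_habit")
    return HABIT_TO_CATEGORY.get(habit)
```

Entries that come back as None would need a rule or a manual decision before the "100% of plants have primary_category assigned" criterion can pass.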

Task 1.6: Map Common Names to Scientific Names

Objective: Create bidirectional lookup for name resolution

Actions:

Create scripts/build_name_index.py
Build scientific → common names map (already exists, validate)
Build common → scientific names map (reverse lookup)
Handle ambiguous common names (multiple plants share same common name):
- Flag conflicts
- Add disambiguation notes
Validate against external taxonomy:
- World Flora Online (WFO) API
- GBIF (Global Biodiversity Information Facility)
Add verified boolean for taxonomically confirmed names
Store alternative/deprecated scientific names as synonyms

Validation Criteria:

Reverse lookup resolves ≥95% of common names unambiguously
≥70% of scientific names verified against WFO/GBIF
Synonym list for deprecated names

Output Files:

data/name_index.json
output/name_ambiguity_report.json
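
The bidirectional lookup and the ambiguity check might look like the sketch below. It covers only the local index, not the WFO/GBIF verification; matching common names case-insensitively is an assumption, and `build_name_index` is a hypothetical name.

```python
from collections import defaultdict


def build_name_index(plants):
    """Build forward and reverse name maps and collect ambiguous common names."""
    scientific_to_common = {
        plant["scientific_name"]: list(plant["common_names"]) for plant in plants
    }
    reverse = defaultdict(list)
    for plant in plants:
        for name in plant["common_names"]:
            key = name.casefold()  # case-insensitive lookup key
            if plant["scientific_name"] not in reverse[key]:
                reverse[key].append(plant["scientific_name"])
    ambiguous = {key: names for key, names in reverse.items() if len(names) > 1}
    ratio = (len(reverse) - len(ambiguous)) / len(reverse) if reverse else 1.0
    return {
        "scientific_to_common": scientific_to_common,
        "common_to_scientific": dict(reverse),
        "ambiguous": ambiguous,
        "unambiguous_ratio": ratio,
    }
```

The `unambiguous_ratio` value is what the ≥95% criterion would be compared against, and `ambiguous` is the natural content for output/name_ambiguity_report.json.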

Task 1.7: Add Regional/Seasonal Information

Objective: Add native regions, hardiness zones, and seasonal behaviors

Actions:

Create scripts/add_regional_data.py

Define regional schema:

{
  "regional_info": {
    "native_regions": ["South America", "Southeast Asia", "Africa", ...],
    "native_countries": ["Brazil", "Thailand", ...],
    "usda_hardiness_zones": ["9a", "9b", "10a", ...],
    "indoor_outdoor": ["indoor_only", "outdoor_temperate", "outdoor_tropical"],
    "seasonal_behavior": ["evergreen", "deciduous", "dormant_winter"]
  }
}

Source regional data:
- USDA Plants Database API
- Wikipedia (native range sections)
- Existing botanical databases
Map families to typical native regions as fallback
Add care-relevant seasonality (dormancy periods, bloom times)

Validation Criteria:

≥70% of plants have native_regions populated
≥60% of plants have hardiness zones
100% of plants have indoor_outdoor classification

Output File: data/final_knowledge_base.json
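
Before writing the output, each `regional_info` object could be checked against the allowed values in the schema. The sketch below assumes USDA zones run from 1 to 13 with an optional a/b half-zone suffix, which matches the "10b", "11", "12" style used in this plan; `validate_regional_info` is a hypothetical name.

```python
import re

# Allowed values from the regional schema in Task 1.7.
INDOOR_OUTDOOR = {"indoor_only", "outdoor_temperate", "outdoor_tropical"}
SEASONAL_BEHAVIOR = {"evergreen", "deciduous", "dormant_winter"}
# Zones 1-13, optionally followed by "a" or "b".
ZONE_RE = re.compile(r"(1[0-3]|[1-9])[ab]?")


def validate_regional_info(info):
    """Return a list of problems found in one regional_info object."""
    problems = []
    if info.get("indoor_outdoor") not in INDOOR_OUTDOOR:
        problems.append(f"invalid indoor_outdoor: {info.get('indoor_outdoor')!r}")
    behavior = info.get("seasonal_behavior")
    if behavior is not None and behavior not in SEASONAL_BEHAVIOR:
        problems.append(f"invalid seasonal_behavior: {behavior!r}")
    for zone in info.get("usda_hardiness_zones", []):
        if not ZONE_RE.fullmatch(zone):
            problems.append(f"invalid hardiness zone: {zone!r}")
    return problems
```

Only `indoor_outdoor` is treated as mandatory here, since it is the one field with a 100% coverage target.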

Final Knowledge Base Schema

{
  "version": "1.0.0",
  "generated_date": "YYYY-MM-DD",
  "total_plants": 2000,
  "plants": [
    {
      "id": "plant_001",
      "scientific_name": "Philodendron hederaceum",
      "common_names": ["Heartleaf Philodendron", "Sweetheart Plant"],
      "synonyms": [],
      "family": "Araceae",
      "genus": "Philodendron",
      "species": "hederaceum",
      "cultivar": null,
      "primary_category": "Vine / Trailing",
      "secondary_categories": ["Tropical Foliage"],
      "characteristics": {
        "leaf_shape": "heart",
        "leaf_color": ["green"],
        "leaf_texture": "glossy",
        "growth_habit": "trailing",
        "mature_height_cm": 120,
        "mature_width_cm": 60,
        "flowering": true,
        "flower_colors": ["white", "green"],
        "bloom_season": "rarely indoors"
      },
      "regional_info": {
        "native_regions": ["Central America", "South America"],
        "native_countries": ["Mexico", "Brazil"],
        "usda_hardiness_zones": ["10b", "11", "12"],
        "indoor_outdoor": "indoor_only",
        "seasonal_behavior": "evergreen"
      },
      "taxonomy_verified": true,
      "data_sources": ["RHS", "Missouri Botanical Garden"],
      "last_updated": "YYYY-MM-DD"
    }
  ]
}

Output File Structure

PlantGuide/
├── data/
│   ├── houseplants_list.json          # Original input (unchanged)
│   ├── normalized_plants.json         # Task 1.2 output
│   ├── master_plant_list.json         # Task 1.3 output
│   ├── enriched_plants.json           # Task 1.4 output
│   ├── categorized_plants.json        # Task 1.5 output
│   ├── name_index.json                # Task 1.6 output
│   └── final_knowledge_base.json      # Task 1.7 output (FINAL)
├── scripts/
│   ├── validate_plant_list.py         # Task 1.1
│   ├── normalize_names.py             # Task 1.2
│   ├── deduplicate_plants.py          # Task 1.3
│   ├── enrich_characteristics.py      # Task 1.4
│   ├── categorize_plants.py           # Task 1.5
│   ├── build_name_index.py            # Task 1.6
│   └── add_regional_data.py           # Task 1.7
├── output/
│   ├── validation_report.json
│   ├── deduplication_report.json
│   ├── enrichment_coverage_report.json
│   └── name_ambiguity_report.json
└── knowledge_base/
    ├── plants.db                       # SQLite database
    └── schema.sql                      # Database schema

SQLite Database Schema

-- Task: Create SQLite database alongside JSON

CREATE TABLE plants (
    id TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    family TEXT NOT NULL,
    genus TEXT,
    species TEXT,
    cultivar TEXT,
    primary_category TEXT NOT NULL,
    taxonomy_verified BOOLEAN DEFAULT FALSE,
    last_updated DATE
);

CREATE TABLE common_names (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    common_name TEXT NOT NULL,
    is_primary BOOLEAN DEFAULT FALSE
);

CREATE TABLE characteristics (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    leaf_shape TEXT,
    leaf_color TEXT,  -- JSON array
    leaf_texture TEXT,
    growth_habit TEXT,
    mature_height_cm INTEGER,
    mature_width_cm INTEGER,
    flowering BOOLEAN,
    flower_colors TEXT,  -- JSON array
    bloom_season TEXT
);

CREATE TABLE regional_info (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    native_regions TEXT,  -- JSON array
    native_countries TEXT,  -- JSON array
    usda_hardiness_zones TEXT,  -- JSON array
    indoor_outdoor TEXT,
    seasonal_behavior TEXT
);

CREATE TABLE synonyms (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    synonym TEXT NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_plants_family ON plants(family);
CREATE INDEX idx_plants_category ON plants(primary_category);
CREATE INDEX idx_common_names_name ON common_names(common_name);
CREATE INDEX idx_characteristics_habit ON characteristics(growth_habit);
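
As a rough illustration of how a JSON record might be loaded into these tables, the sketch below uses Python's built-in sqlite3 module with a trimmed copy of the schema (three tables, fewer columns). Treating the first common name as the primary one is an assumption, and `open_database` and `insert_plant` are hypothetical names.

```python
import json
import sqlite3

# Trimmed subset of the schema, kept short for the example.
SCHEMA = """
CREATE TABLE plants (
    id TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    family TEXT NOT NULL,
    primary_category TEXT NOT NULL
);
CREATE TABLE common_names (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    common_name TEXT NOT NULL,
    is_primary BOOLEAN DEFAULT 0
);
CREATE TABLE regional_info (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    usda_hardiness_zones TEXT  -- JSON array
);
"""


def open_database(path=":memory:"):
    """Create the tables; SQLite enforces foreign keys only when the pragma is on."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def insert_plant(conn, plant):
    """Insert one knowledge-base record; list fields are stored as JSON text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO plants (id, scientific_name, family, primary_category) "
            "VALUES (?, ?, ?, ?)",
            (plant["id"], plant["scientific_name"], plant["family"],
             plant["primary_category"]),
        )
        conn.executemany(
            "INSERT INTO common_names (plant_id, common_name, is_primary) "
            "VALUES (?, ?, ?)",
            [(plant["id"], name, int(i == 0))
             for i, name in enumerate(plant["common_names"])],
        )
        conn.execute(
            "INSERT INTO regional_info (plant_id, usda_hardiness_zones) VALUES (?, ?)",
            (plant["id"], json.dumps(plant["regional_info"]["usda_hardiness_zones"])),
        )
```

Wrapping the three inserts in one transaction keeps a plant from being half-written if any statement fails, for example on a duplicate scientific_name.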

End Phase Validation Checklist

Data Quality Gates

Metric	Target	Validation Method
Total validated plants	≥1,500	Count after deduplication
Schema compliance	100%	JSON schema validation
Scientific name format	100% valid	Regex: `^(× )?[A-Z][a-z]+ (× )?[a-z-]+` (allows the × hybrid marker from Task 1.2)
Plants with characteristics	≥80%	Field coverage check
Plants with regional data	≥70%	Field coverage check
Category coverage	100%	No "Unknown" categories
Name disambiguation	≥95%	Ambiguity report review
Taxonomy verification	≥70%	WFO/GBIF cross-reference

Functional Validation

Query Test 1: Lookup by scientific name returns full plant record
Query Test 2: Lookup by common name returns correct plant(s)
Query Test 3: Filter by category returns expected results
Query Test 4: Filter by characteristics (leaf_shape=heart) works
Query Test 5: Regional filter (hardiness_zone=10a) works
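
Query Tests 2, 4, and 5 could be exercised against the SQLite database along the lines below. This is a sketch assuming the table and column names from the schema in this plan; the helper names are hypothetical. The zone filter decodes the JSON array in Python rather than relying on SQLite's JSON functions, which may not be available in every build.

```python
import json
import sqlite3


def find_by_common_name(conn, name):
    """Query Test 2: case-insensitive lookup by common name."""
    rows = conn.execute(
        "SELECT p.scientific_name FROM plants p "
        "JOIN common_names c ON c.plant_id = p.id "
        "WHERE c.common_name = ? COLLATE NOCASE "
        "ORDER BY p.scientific_name",
        (name,),
    )
    return [row[0] for row in rows]


def filter_by_leaf_shape(conn, shape):
    """Query Test 4: filter by a characteristic column."""
    rows = conn.execute(
        "SELECT p.scientific_name FROM plants p "
        "JOIN characteristics ch ON ch.plant_id = p.id "
        "WHERE ch.leaf_shape = ? ORDER BY p.scientific_name",
        (shape,),
    )
    return [row[0] for row in rows]


def filter_by_zone(conn, zone):
    """Query Test 5: filter on the JSON-encoded hardiness zone list."""
    rows = conn.execute(
        "SELECT p.scientific_name, r.usda_hardiness_zones FROM plants p "
        "JOIN regional_info r ON r.plant_id = p.id "
        "ORDER BY p.scientific_name"
    )
    return [name for name, zones in rows if zone in json.loads(zones or "[]")]
```

Each helper returns a sorted list of scientific names, so a test can compare the result directly against an expected list.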

Deliverable Checklist

data/final_knowledge_base.json exists and passes schema validation
knowledge_base/plants.db SQLite database is populated
All scripts in scripts/ directory are functional
All reports in output/ directory are generated
Data coverage meets minimum thresholds
No critical validation errors in reports

Phase Exit Criteria

Phase 1 is COMPLETE when:

✅ Final knowledge base contains ≥1,500 validated plant entries
✅ ≥80% of plants have physical characteristics populated
✅ ≥70% of plants have regional information
✅ 100% of plants have valid categories (no "Unknown")
✅ SQLite database mirrors JSON knowledge base
✅ All validation tests pass
✅ Documentation updated with final counts and coverage metrics

Execution Order

Task 1.1 (Validate)
    ↓
Task 1.2 (Normalize)
    ↓
Task 1.3 (Deduplicate)
    ↓
    ├─→ Task 1.4 (Characteristics) ─┐
    │                               │
    ├─→ Task 1.6 (Name Index) ──────┤
    │                               │
    └─→ Task 1.7 (Regional) ────────┤
                                    ↓
                            Task 1.5 (Categorize)
                            [Depends on 1.4 for Tropical Foliage]
                                    ↓
                            Final Assembly
                            (JSON + SQLite)
                                    ↓
                            Validation Suite

Note: Tasks 1.4, 1.6, and 1.7 can run in parallel after Task 1.3 completes. Task 1.5 depends on Task 1.4 output for sub-categorizing Tropical Foliage plants.
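
The ordering in the diagram can be expressed as a dependency graph and grouped into waves of tasks that may run in parallel. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). It follows the diagram, where Task 1.5 starts after 1.4, 1.6, and 1.7 have all finished; the "assembly" and "validation" labels are shorthand chosen for this example.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it waits for, as drawn in the diagram.
DEPENDENCIES = {
    "1.1": set(),
    "1.2": {"1.1"},
    "1.3": {"1.2"},
    "1.4": {"1.3"},
    "1.6": {"1.3"},
    "1.7": {"1.3"},
    "1.5": {"1.4", "1.6", "1.7"},
    "assembly": {"1.5"},
    "validation": {"assembly"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

If Task 1.5 should start as soon as Task 1.4 finishes, as the note suggests it could, its entry would shrink to `{"1.4"}` and "assembly" would wait on 1.5, 1.6, and 1.7 instead.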

Risk Mitigation

Risk	Mitigation
External API rate limits	Implement caching, request throttling
Incomplete enrichment data	Use family-level defaults, document gaps
Ambiguous common names	Flag for manual review, prioritize top plants
Taxonomy database mismatches	Trust WFO as primary source
Large dataset processing	Process in batches, checkpoint progress
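
The "process in batches, checkpoint progress" mitigation might be sketched as follows. It assumes results are JSON-serializable and that a single checkpoint file is acceptable; `process_in_batches` and the checkpoint layout are hypothetical.

```python
import json
from pathlib import Path


def process_in_batches(items, process, checkpoint_path, batch_size=100):
    """Apply process() to items in batches, saving progress after each batch.

    On restart, work resumes from the last completed batch recorded in the
    checkpoint file instead of starting over.
    """
    path = Path(checkpoint_path)
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"done": 0, "results": []}
    for start in range(state["done"], len(items), batch_size):
        batch = items[start:start + batch_size]
        state["results"].extend(process(item) for item in batch)
        state["done"] = start + len(batch)
        path.write_text(json.dumps(state))  # checkpoint after every batch
    return state["results"]
```

If a run is interrupted mid-batch, only that batch is repeated on the next run, which limits repeated calls to rate-limited external APIs.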
