PlantGuide/Docs/phase-1-implementation-plan.md
Trey t 136dfbae33 Add PlantGuide iOS app with plant identification and care management
- Implement camera capture and plant identification workflow
- Add Core Data persistence for plants, care schedules, and cached API data
- Create collection view with grid/list layouts and filtering
- Build plant detail views with care information display
- Integrate Trefle botanical API for plant care data
- Add local image storage for captured plant photos
- Implement dependency injection container for testability
- Include accessibility support throughout the app

Bug fixes in this commit:
- Fix Trefle API decoding by removing duplicate CodingKeys
- Fix LocalCachedImage to load from correct PlantImages directory
- Set dateAdded when saving plants for proper collection sorting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 12:18:01 -06:00


Phase 1: Knowledge Base Creation - Implementation Plan

Overview

Goal: Build structured plant knowledge from data/houseplants_list.json, enriching with taxonomy and characteristics.

Input: data/houseplants_list.json (2,278 plants, 11 categories, 50 families)

Output: Enriched plant knowledge base (JSON + SQLite) with roughly 500-2,000 validated entries; the Phase Exit Criteria require at least 1,500.


Current Data Assessment

| Attribute | Current State | Required Enhancement |
|---|---|---|
| Total Plants | 2,278 | Validate, deduplicate |
| Scientific Names | Present | Validate binomial nomenclature |
| Common Names | Array per plant | Normalize, cross-reference |
| Family | 50 families | Validate against taxonomy |
| Category | 11 categories | Map to target types |
| Physical Characteristics | Missing | Must add |
| Regional/Seasonal Info | Missing | Must add |

Task Breakdown

Task 1.1: Load and Validate Plant List

Objective: Parse JSON and validate data integrity

Actions:

  • Create Python script scripts/validate_plant_list.py
  • Load data/houseplants_list.json
  • Validate JSON schema:
    • Each plant has scientific_name (required, string)
    • Each plant has common_names (required, array of strings)
    • Each plant has family (required, string)
    • Each plant has category (required, string)
  • Identify malformed entries (missing fields, wrong types)
  • Generate validation report: output/validation_report.json

Validation Criteria:

  • 0 malformed entries
  • All required fields present
  • No null/empty scientific names

Output File: scripts/validate_plant_list.py
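
The checks above could be sketched roughly as follows. This is a minimal illustration, not the actual contents of scripts/validate_plant_list.py: the function names, the report layout, and the assumption that records sit either in a top-level list or under a "plants" key are all hypothetical.

```python
import json
from pathlib import Path

# Required fields and their expected types, taken from the schema rules above.
REQUIRED_FIELDS = {
    "scientific_name": str,
    "common_names": list,
    "family": str,
    "category": str,
}


def validate_plant(plant):
    """Return a list of human-readable problems for one plant entry."""
    if not isinstance(plant, dict):
        return ["entry is not an object"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in plant:
            problems.append(f"missing field: {field}")
        elif not isinstance(plant[field], expected_type):
            problems.append(f"wrong type for {field}")
    name = plant.get("scientific_name")
    if isinstance(name, str) and not name.strip():
        problems.append("empty scientific_name")
    names = plant.get("common_names")
    if isinstance(names, list) and not all(isinstance(n, str) for n in names):
        problems.append("common_names must contain only strings")
    return problems


def build_report(plants):
    """Summarize malformed entries (hypothetical report layout)."""
    malformed = []
    for index, plant in enumerate(plants):
        problems = validate_plant(plant)
        if problems:
            malformed.append({"index": index, "problems": problems})
    return {
        "total": len(plants),
        "malformed_count": len(malformed),
        "malformed": malformed,
    }


def main(input_path="data/houseplants_list.json",
         report_path="output/validation_report.json"):
    # Assumption: the file is a list of records or wraps them in "plants".
    data = json.loads(Path(input_path).read_text(encoding="utf-8"))
    plants = data["plants"] if isinstance(data, dict) else data
    report = build_report(plants)
    Path(report_path).parent.mkdir(parents=True, exist_ok=True)
    Path(report_path).write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```

Calling main() from the repository root would write the report; a malformed_count of 0 corresponds to the first validation criterion.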


Task 1.2: Normalize and Standardize Plant Names

Objective: Ensure consistent naming conventions

Actions:

  • Create scripts/normalize_names.py
  • Scientific name normalization:
    • Capitalize genus, lowercase species (e.g., "Philodendron hederaceum")
    • Handle cultivar notation: 'Cultivar Name' in single quotes
    • Validate binomial/trinomial format
  • Common name normalization:
    • Title case standardization
    • Remove leading/trailing whitespace
    • Standardize punctuation
  • Handle hybrid notation (×) consistently
  • Flag names that don't match expected patterns

Validation Criteria:

  • 100% of scientific names follow the binomial/trinomial pattern (with optional cultivar or hybrid notation)
  • No leading/trailing whitespace in any names
  • Consistent cultivar notation

Output File: data/normalized_plants.json
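
One possible shape for the normalization rules is sketched below. The regular expression and helper names are illustrative assumptions rather than the plan's final pattern; in particular, how ranks such as "var." and intergeneric hybrids are handled is a guess, and names that fail the pattern are meant to be flagged, not rejected.

```python
import re
import string

# Illustrative pattern (an assumption): "Genus species", an optional hybrid
# sign, an optional third epithet with optional rank, an optional cultivar.
SCIENTIFIC_NAME_RE = re.compile(
    r"^[A-Z][a-z-]+"                          # genus
    r" (?:× )?[a-z-]+"                        # species, optional hybrid sign
    r"(?: (?:(?:subsp|var|f)\. )?[a-z-]+)?"   # optional infraspecific epithet
    r"(?: '[^']+')?$"                         # optional cultivar in quotes
)


def normalize_scientific_name(raw):
    """Normalize whitespace, case, hybrid sign, and cultivar quotes."""
    name = " ".join(raw.split())                         # collapse whitespace
    name = name.replace("\u2018", "'").replace("\u2019", "'")  # straighten quotes
    # Split off a trailing cultivar so its capitalization is preserved.
    cultivar = ""
    match = re.search(r"\s*'([^']+)'$", name)
    if match:
        cultivar = f" '{match.group(1).strip()}'"
        name = name[:match.start()]
    words = []
    genus_seen = False
    for word in name.split():
        if word in ("x", "X", "×"):
            words.append("×")                 # standardize the hybrid sign
        elif not genus_seen:
            words.append(word.capitalize())   # genus: leading capital only
            genus_seen = True
        else:
            words.append(word.lower())        # epithets and rank markers
    return " ".join(words) + cultivar


def normalize_common_name(raw):
    """Trim, collapse whitespace, and capitalize each word."""
    return string.capwords(raw)


def matches_expected_pattern(name):
    """True when a normalized name fits the illustrative pattern above."""
    return SCIENTIFIC_NAME_RE.match(name) is not None
```

string.capwords is used instead of str.title because the latter capitalizes the letter after an apostrophe ("Mother-In-Law'S Tongue").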


Task 1.3: Create Deduplicated Master List

Objective: Remove duplicates while preserving unique cultivars

Actions:

  • Create scripts/deduplicate_plants.py
  • Define deduplication rules:
    • Exact scientific name match = duplicate
    • Different cultivars of same species = keep both
    • Same plant, different common names = merge common names
  • Identify potential duplicates using fuzzy matching on:
    • Scientific names (Levenshtein distance < 3)
    • Common names that are identical
  • Generate duplicate candidates report for manual review
  • Merge duplicates: combine common names arrays
  • Assign unique plant IDs (plant_001, plant_002, etc.)

Validation Criteria:

  • No exact scientific name duplicates
  • All plants have unique IDs
  • Merge log documenting all deduplication decisions

Output Files:

  • data/master_plant_list.json
  • output/deduplication_report.json
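
The deduplication rules might look like the sketch below. It assumes cultivars are already part of scientific_name (so different cultivars never collide on an exact match) and uses a hand-written Levenshtein function to stay dependency-free; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def merge_exact_duplicates(plants):
    """Merge entries with identical scientific names, combining common names."""
    merged = {}
    for plant in plants:
        key = plant["scientific_name"]
        if key not in merged:
            merged[key] = {**plant, "common_names": list(plant["common_names"])}
            continue
        for name in plant["common_names"]:
            if name not in merged[key]["common_names"]:
                merged[key]["common_names"].append(name)
    return list(merged.values())


def fuzzy_candidates(plants, max_distance=3):
    """Pairs of distinct names with edit distance below max_distance."""
    names = [plant["scientific_name"] for plant in plants]
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if 0 < levenshtein(first, second) < max_distance:
                pairs.append((first, second))
    return pairs


def assign_ids(plants):
    """Assign sequential IDs in the plant_001 style used by the plan."""
    for number, plant in enumerate(plants, start=1):
        plant["id"] = f"plant_{number:03d}"
    return plants
```

The pairwise comparison is quadratic in the number of plants, so for the full list it may be worth comparing only names that share a genus. The candidate pairs are intended for the manual-review report, not for automatic merging.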

Task 1.4: Enrich with Physical Characteristics

Objective: Add visual and physical attributes for each plant

Actions:

  • Create scripts/enrich_characteristics.py
  • Define characteristic schema:
    {
      "characteristics": {
        "leaf_shape": ["heart", "oval", "linear", "palmate", "lobed", "needle", "rosette"],
        "leaf_color": ["green", "variegated", "red", "purple", "silver", "yellow"],
        "leaf_texture": ["glossy", "matte", "fuzzy", "waxy", "smooth", "rough"],
        "growth_habit": ["upright", "trailing", "climbing", "rosette", "bushy", "tree-form"],
        "mature_height_cm": "integer, 0-500",
        "mature_width_cm": "integer, 0-300",
        "flowering": "boolean",
        "flower_colors": ["white", "pink", "red", "yellow", "orange", "purple", "blue"],
        "bloom_season": ["spring", "summer", "fall", "winter", "year-round"]
      }
    }
    
  • Source characteristics data:
    • Primary: Web scraping from botanical databases (RHS, Missouri Botanical Garden)
    • Secondary: Wikipedia API for plant descriptions
    • Fallback: Family/genus-level defaults
  • Implement web fetching with rate limiting
  • Parse and extract characteristics from HTML/JSON responses
  • Store enrichment sources for traceability

Validation Criteria:

  • ≥80% of plants have leaf_shape populated
  • ≥80% of plants have growth_habit populated
  • ≥60% of plants have height/width estimates
  • 100% of plants have flowering boolean

Output Files:

  • data/enriched_plants.json
  • output/enrichment_coverage_report.json
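
The coverage thresholds can be checked with a small helper such as the one below. It is a sketch only: the report layout is hypothetical, and it reads "height/width estimates" as two separate fields that must each reach 60%, which is one possible interpretation.

```python
# Minimum coverage per field, taken from the validation criteria above.
THRESHOLDS = {
    "leaf_shape": 0.80,
    "growth_habit": 0.80,
    "mature_height_cm": 0.60,
    "mature_width_cm": 0.60,
    "flowering": 1.00,
}


def coverage(plants, field):
    """Fraction of plants whose characteristics[field] is populated."""
    if not plants:
        return 0.0
    populated = 0
    for plant in plants:
        value = (plant.get("characteristics") or {}).get(field)
        # False is a legitimate value for "flowering", so only None,
        # empty strings, and empty lists count as missing.
        if value is not None and value != "" and value != []:
            populated += 1
    return populated / len(plants)


def coverage_report(plants):
    """Per-field coverage with pass/fail against THRESHOLDS."""
    report = {}
    for field, target in THRESHOLDS.items():
        value = coverage(plants, field)
        report[field] = {
            "coverage": round(value, 3),
            "target": target,
            "passed": value >= target,
        }
    return report
```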

Task 1.5: Categorize Plants by Type

Objective: Map existing categories to target classification system

Actions:

  • Create scripts/categorize_plants.py
  • Define target categories (per plan):
    - Flowering Plant
    - Tree / Palm
    - Shrub / Bush
    - Succulent / Cactus
    - Fern
    - Vine / Trailing
    - Herb
    - Orchid
    - Bromeliad
    - Air Plant
    
  • Create mapping from current 11 categories:
    Current → Target
    ─────────────────────────────
    Air Plant → Air Plant
    Bromeliad → Bromeliad
    Cactus → Succulent / Cactus
    Fern → Fern
    Flowering Houseplant → Flowering Plant
    Herb → Herb
    Orchid → Orchid
    Palm → Tree / Palm
    Succulent → Succulent / Cactus
    Trailing/Climbing → Vine / Trailing
    Tropical Foliage → [Requires secondary classification]
    
  • Handle "Tropical Foliage" (largest category):
    • Use growth_habit from Task 1.4 to sub-classify
    • Cross-reference genus and family for tree-form species (e.g., Ficus → Tree / Palm)
  • Add primary_category and secondary_categories fields

Validation Criteria:

  • 100% of plants have primary_category assigned
  • No plants remain as "Tropical Foliage" (all reclassified)
  • Category distribution documented

Output File: data/categorized_plants.json
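
The mapping above translates directly into a lookup table, with a second step for "Tropical Foliage". In the sketch below, DIRECT_MAPPING mirrors the plan, while HABIT_TO_CATEGORY and TREE_GENERA are assumptions made for illustration: the plan does not specify how each growth_habit value maps to a target category, so unmapped habits return None and are left for manual review.

```python
# Direct mapping, copied from the Current → Target table above.
DIRECT_MAPPING = {
    "Air Plant": "Air Plant",
    "Bromeliad": "Bromeliad",
    "Cactus": "Succulent / Cactus",
    "Fern": "Fern",
    "Flowering Houseplant": "Flowering Plant",
    "Herb": "Herb",
    "Orchid": "Orchid",
    "Palm": "Tree / Palm",
    "Succulent": "Succulent / Cactus",
    "Trailing/Climbing": "Vine / Trailing",
}

# Assumption: one plausible habit-to-category table, not part of the plan.
HABIT_TO_CATEGORY = {
    "trailing": "Vine / Trailing",
    "climbing": "Vine / Trailing",
    "tree-form": "Tree / Palm",
    "bushy": "Shrub / Bush",
}

# Assumption: genera treated as tree-form; the plan names only Ficus.
TREE_GENERA = {"Ficus"}


def categorize(plant):
    """Return the primary_category, or None when manual review is needed."""
    current = plant["category"]
    if current in DIRECT_MAPPING:
        return DIRECT_MAPPING[current]
    if current == "Tropical Foliage":
        genus = plant["scientific_name"].split()[0]
        if genus in TREE_GENERA:
            return "Tree / Palm"
        habit = (plant.get("characteristics") or {}).get("growth_habit")
        return HABIT_TO_CATEGORY.get(habit)
    return None
```

Counting the None results gives a direct check on the "100% of plants have primary_category assigned" criterion.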


Task 1.6: Map Common Names to Scientific Names

Objective: Create bidirectional lookup for name resolution

Actions:

  • Create scripts/build_name_index.py
  • Build scientific → common names map (already exists, validate)
  • Build common → scientific names map (reverse lookup)
  • Handle ambiguous common names (multiple plants share same common name):
    • Flag conflicts
    • Add disambiguation notes
  • Validate against external taxonomy:
    • World Flora Online (WFO) API
    • GBIF (Global Biodiversity Information Facility)
  • Add verified boolean for taxonomically confirmed names
  • Store alternative/deprecated scientific names as synonyms

Validation Criteria:

  • Reverse lookup resolves ≥95% of common names unambiguously
  • ≥70% of scientific names verified against WFO/GBIF
  • Synonym list for deprecated names

Output Files:

  • data/name_index.json
  • output/name_ambiguity_report.json
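
A bidirectional index with conflict flagging could be built along these lines. This sketch covers only the local lookup; the WFO and GBIF verification steps are left out, and the case-insensitive key and the returned dictionary layout are assumptions.

```python
from collections import defaultdict


def build_name_index(plants):
    """Build forward and reverse name maps and flag shared common names."""
    scientific_to_common = {}
    common_to_scientific = defaultdict(list)
    for plant in plants:
        scientific = plant["scientific_name"]
        scientific_to_common[scientific] = list(plant["common_names"])
        for common in plant["common_names"]:
            key = common.casefold()          # case-insensitive lookup key
            if scientific not in common_to_scientific[key]:
                common_to_scientific[key].append(scientific)
    ambiguous = {
        name: sorted(matches)
        for name, matches in common_to_scientific.items()
        if len(matches) > 1
    }
    return {
        "scientific_to_common": scientific_to_common,
        "common_to_scientific": dict(common_to_scientific),
        "ambiguous": ambiguous,
    }


def unambiguous_rate(index):
    """Share of common names that resolve to exactly one scientific name."""
    total = len(index["common_to_scientific"])
    if total == 0:
        return 1.0
    return 1 - len(index["ambiguous"]) / total
```

The value returned by unambiguous_rate is what the ≥95% criterion would be measured against, and the "ambiguous" mapping is a natural starting point for output/name_ambiguity_report.json.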

Task 1.7: Add Regional/Seasonal Information

Objective: Add native regions, hardiness zones, and seasonal behaviors

Actions:

  • Create scripts/add_regional_data.py
  • Define regional schema:
    {
      "regional_info": {
        "native_regions": ["South America", "Southeast Asia", "Africa", ...],
        "native_countries": ["Brazil", "Thailand", ...],
        "usda_hardiness_zones": ["9a", "9b", "10a", ...],
        "indoor_outdoor": "indoor_only" | "outdoor_temperate" | "outdoor_tropical",
        "seasonal_behavior": "evergreen" | "deciduous" | "dormant_winter"
      }
    }
    
  • Source regional data:
    • USDA Plants Database API
    • Wikipedia (native range sections)
    • Existing botanical databases
  • Map families to typical native regions as fallback
  • Add care-relevant seasonality (dormancy periods, bloom times)

Validation Criteria:

  • ≥70% of plants have native_regions populated
  • ≥60% of plants have hardiness zones
  • 100% of plants have indoor_outdoor classification

Output File: data/final_knowledge_base.json
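
Records produced by this task can be checked against the regional schema with a validator like the sketch below. The zone pattern assumes USDA zones 1 through 13 with an optional "a" or "b" half-zone, which also accepts the bare "11" and "12" used in the example record; the function name and messages are hypothetical.

```python
import re

# USDA zones run from 1 to 13, optionally split into "a" and "b" half-zones.
ZONE_RE = re.compile(r"^(?:1[0-3]|[1-9])[ab]?$")

# Allowed values, copied from the regional schema above.
INDOOR_OUTDOOR = {"indoor_only", "outdoor_temperate", "outdoor_tropical"}
SEASONAL_BEHAVIOR = {"evergreen", "deciduous", "dormant_winter"}


def validate_regional_info(info):
    """Return a list of problems found in one regional_info object."""
    problems = []
    for zone in info.get("usda_hardiness_zones") or []:
        if not ZONE_RE.match(str(zone)):
            problems.append(f"invalid hardiness zone: {zone}")
    # indoor_outdoor is required for every plant per the criteria above.
    if info.get("indoor_outdoor") not in INDOOR_OUTDOOR:
        problems.append("indoor_outdoor missing or not an allowed value")
    behavior = info.get("seasonal_behavior")
    if behavior is not None and behavior not in SEASONAL_BEHAVIOR:
        problems.append(f"unknown seasonal_behavior: {behavior}")
    return problems
```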


Final Knowledge Base Schema

{
  "version": "1.0.0",
  "generated_date": "YYYY-MM-DD",
  "total_plants": 2000,
  "plants": [
    {
      "id": "plant_001",
      "scientific_name": "Philodendron hederaceum",
      "common_names": ["Heartleaf Philodendron", "Sweetheart Plant"],
      "synonyms": [],
      "family": "Araceae",
      "genus": "Philodendron",
      "species": "hederaceum",
      "cultivar": null,
      "primary_category": "Vine / Trailing",
      "secondary_categories": ["Tropical Foliage"],
      "characteristics": {
        "leaf_shape": "heart",
        "leaf_color": ["green"],
        "leaf_texture": "glossy",
        "growth_habit": "trailing",
        "mature_height_cm": 120,
        "mature_width_cm": 60,
        "flowering": true,
        "flower_colors": ["white", "green"],
        "bloom_season": "rarely indoors"
      },
      "regional_info": {
        "native_regions": ["Central America", "South America"],
        "native_countries": ["Mexico", "Brazil"],
        "usda_hardiness_zones": ["10b", "11", "12"],
        "indoor_outdoor": "indoor_only",
        "seasonal_behavior": "evergreen"
      },
      "taxonomy_verified": true,
      "data_sources": ["RHS", "Missouri Botanical Garden"],
      "last_updated": "YYYY-MM-DD"
    }
  ]
}

Output File Structure

PlantGuide/
├── data/
│   ├── houseplants_list.json          # Original input (unchanged)
│   ├── normalized_plants.json         # Task 1.2 output
│   ├── master_plant_list.json         # Task 1.3 output
│   ├── enriched_plants.json           # Task 1.4 output
│   ├── categorized_plants.json        # Task 1.5 output
│   ├── name_index.json                # Task 1.6 output
│   └── final_knowledge_base.json      # Task 1.7 output (FINAL)
├── scripts/
│   ├── validate_plant_list.py         # Task 1.1
│   ├── normalize_names.py             # Task 1.2
│   ├── deduplicate_plants.py          # Task 1.3
│   ├── enrich_characteristics.py      # Task 1.4
│   ├── categorize_plants.py           # Task 1.5
│   ├── build_name_index.py            # Task 1.6
│   └── add_regional_data.py           # Task 1.7
├── output/
│   ├── validation_report.json
│   ├── deduplication_report.json
│   ├── enrichment_coverage_report.json
│   └── name_ambiguity_report.json
└── knowledge_base/
    ├── plants.db                       # SQLite database
    └── schema.sql                      # Database schema

SQLite Database Schema

-- Task: Create SQLite database alongside JSON

CREATE TABLE plants (
    id TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    family TEXT NOT NULL,
    genus TEXT,
    species TEXT,
    cultivar TEXT,
    primary_category TEXT NOT NULL,
    taxonomy_verified BOOLEAN DEFAULT FALSE,
    last_updated DATE
);

CREATE TABLE common_names (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    common_name TEXT NOT NULL,
    is_primary BOOLEAN DEFAULT FALSE
);

CREATE TABLE characteristics (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    leaf_shape TEXT,
    leaf_color TEXT,  -- JSON array
    leaf_texture TEXT,
    growth_habit TEXT,
    mature_height_cm INTEGER,
    mature_width_cm INTEGER,
    flowering BOOLEAN,
    flower_colors TEXT,  -- JSON array
    bloom_season TEXT
);

CREATE TABLE regional_info (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    native_regions TEXT,  -- JSON array
    native_countries TEXT,  -- JSON array
    usda_hardiness_zones TEXT,  -- JSON array
    indoor_outdoor TEXT,
    seasonal_behavior TEXT
);

CREATE TABLE synonyms (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    synonym TEXT NOT NULL
);

-- Indexes for common queries
CREATE INDEX idx_plants_family ON plants(family);
CREATE INDEX idx_plants_category ON plants(primary_category);
CREATE INDEX idx_common_names_name ON common_names(common_name);
CREATE INDEX idx_characteristics_habit ON characteristics(growth_habit);
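
To illustrate how a JSON record might be mirrored into this schema, the sketch below loads one plant into an in-memory database using Python's built-in sqlite3 module. It uses a trimmed subset of the tables above to stay short, stores array fields as JSON text as the schema comments suggest, and assumes the first common name is the primary one; the helper names are hypothetical.

```python
import json
import sqlite3

# Trimmed subset of the schema above, enough for the example.
SCHEMA = """
CREATE TABLE plants (
    id TEXT PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    family TEXT NOT NULL,
    primary_category TEXT NOT NULL
);
CREATE TABLE common_names (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    plant_id TEXT REFERENCES plants(id),
    common_name TEXT NOT NULL,
    is_primary BOOLEAN DEFAULT 0
);
CREATE TABLE regional_info (
    plant_id TEXT PRIMARY KEY REFERENCES plants(id),
    usda_hardiness_zones TEXT  -- JSON array
);
"""


def open_database(path=":memory:"):
    """Open a database and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_plant(conn, plant):
    """Insert one knowledge-base record into the relational tables."""
    conn.execute(
        "INSERT INTO plants (id, scientific_name, family, primary_category) "
        "VALUES (?, ?, ?, ?)",
        (plant["id"], plant["scientific_name"], plant["family"],
         plant["primary_category"]),
    )
    for position, name in enumerate(plant["common_names"]):
        # Assumption: the first common name is treated as the primary one.
        conn.execute(
            "INSERT INTO common_names (plant_id, common_name, is_primary) "
            "VALUES (?, ?, ?)",
            (plant["id"], name, int(position == 0)),
        )
    zones = plant.get("regional_info", {}).get("usda_hardiness_zones", [])
    conn.execute(
        "INSERT INTO regional_info (plant_id, usda_hardiness_zones) VALUES (?, ?)",
        (plant["id"], json.dumps(zones)),
    )
    conn.commit()


def find_by_common_name(conn, name):
    """Scientific names of all plants carrying the given common name."""
    rows = conn.execute(
        "SELECT p.scientific_name FROM plants p "
        "JOIN common_names c ON c.plant_id = p.id "
        "WHERE c.common_name = ? ORDER BY p.scientific_name",
        (name,),
    )
    return [row[0] for row in rows]


def plants_in_zone(conn, zone):
    """IDs of plants whose JSON zone array contains the given zone."""
    rows = conn.execute(
        "SELECT plant_id, usda_hardiness_zones FROM regional_info ORDER BY plant_id"
    )
    return [plant_id for plant_id, zones in rows if zone in json.loads(zones)]
```

Decoding the zone array in Python keeps the sketch independent of SQLite's JSON functions; a dedicated zones table would be an alternative if zone filters turn out to be frequent.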

End Phase Validation Checklist

Data Quality Gates

| Metric | Target | Validation Method |
|---|---|---|
| Total validated plants | ≥1,500 | Count after deduplication |
| Schema compliance | 100% | JSON schema validation |
| Scientific name format | 100% valid | Regex: ^[A-Z][a-z]+ [a-z]+ |
| Plants with characteristics | ≥80% | Field coverage check |
| Plants with regional data | ≥70% | Field coverage check |
| Category coverage | 100% | No "Unknown" categories |
| Name disambiguation | ≥95% | Ambiguity report review |
| Taxonomy verification | ≥70% | WFO/GBIF cross-reference |

Functional Validation

  • Query Test 1: Lookup by scientific name returns full plant record
  • Query Test 2: Lookup by common name returns correct plant(s)
  • Query Test 3: Filter by category returns expected results
  • Query Test 4: Filter by characteristics (leaf_shape=heart) works
  • Query Test 5: Regional filter (hardiness_zone=10a) works

Deliverable Checklist

  • data/final_knowledge_base.json exists and passes schema validation
  • knowledge_base/plants.db SQLite database is populated
  • All scripts in scripts/ directory are functional
  • All reports in output/ directory are generated
  • Data coverage meets minimum thresholds
  • No critical validation errors in reports

Phase Exit Criteria

Phase 1 is COMPLETE when:

  1. Final knowledge base contains ≥1,500 validated plant entries
  2. ≥80% of plants have physical characteristics populated
  3. ≥70% of plants have regional information
  4. 100% of plants have valid categories (no "Unknown")
  5. SQLite database mirrors JSON knowledge base
  6. All validation tests pass
  7. Documentation updated with final counts and coverage metrics

Execution Order

Task 1.1 (Validate)
    ↓
Task 1.2 (Normalize)
    ↓
Task 1.3 (Deduplicate)
    ↓
    ├─→ Task 1.4 (Characteristics) ─┐
    │                               │
    ├─→ Task 1.6 (Name Index) ──────┤
    │                               │
    └─→ Task 1.7 (Regional) ────────┤
                                    ↓
                            Task 1.5 (Categorize)
                            [Depends on 1.4 for Tropical Foliage]
                                    ↓
                            Final Assembly
                            (JSON + SQLite)
                                    ↓
                            Validation Suite

Note: Tasks 1.4, 1.6, and 1.7 can run in parallel after Task 1.3 completes. Task 1.5 depends on Task 1.4 output for sub-categorizing Tropical Foliage plants.
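
The ordering can also be expressed as a dependency graph and resolved with the standard library's graphlib module (Python 3.9+). The sketch below follows the note above in making Task 1.5 depend only on Task 1.4; the "final_assembly" and "validation_suite" labels and the function name are illustrative.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "1.1": set(),
    "1.2": {"1.1"},
    "1.3": {"1.2"},
    "1.4": {"1.3"},
    "1.6": {"1.3"},
    "1.7": {"1.3"},
    "1.5": {"1.4"},
    "final_assembly": {"1.4", "1.5", "1.6", "1.7"},
    "validation_suite": {"final_assembly"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks within a wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                      # raises CycleError on a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With these dependencies, Tasks 1.4, 1.6, and 1.7 land in the same wave, and Task 1.5 follows in the next one.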


Risk Mitigation

| Risk | Mitigation |
|---|---|
| External API rate limits | Implement caching, request throttling |
| Incomplete enrichment data | Use family-level defaults, document gaps |
| Ambiguous common names | Flag for manual review, prioritize top plants |
| Taxonomy database mismatches | Trust WFO as primary source |
| Large dataset processing | Process in batches, checkpoint progress |
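
As one way to approach the first mitigation, caching and throttling can be combined in a thin wrapper around whatever function performs the request. The class below is a hypothetical sketch: the name ThrottledCache, the one-second default interval, and the in-memory cache are assumptions, and the fetch function is supplied by the caller.

```python
import time


class ThrottledCache:
    """Cache results per key and space out calls to the underlying fetcher."""

    def __init__(self, fetch, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # callable that performs the request
        self._min_interval = min_interval  # seconds between real requests
        self._clock = clock                # injectable for testing
        self._sleep = sleep                # injectable for testing
        self._cache = {}
        self._last_call = None

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # cache hit: no request, no wait
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)          # throttle before the next request
        self._last_call = self._clock()
        value = self._fetch(key)
        self._cache[key] = value
        return value
```

Persisting the cache to disk between runs would also serve the checkpointing goal in the last row, since already-fetched plants would not be requested again.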