Add PlantGuide iOS app with plant identification and care management
- Implement camera capture and plant identification workflow
- Add Core Data persistence for plants, care schedules, and cached API data
- Create collection view with grid/list layouts and filtering
- Build plant detail views with care information display
- Integrate Trefle botanical API for plant care data
- Add local image storage for captured plant photos
- Implement dependency injection container for testability
- Include accessibility support throughout the app

Bug fixes in this commit:
- Fix Trefle API decoding by removing duplicate CodingKeys
- Fix LocalCachedImage to load from correct PlantImages directory
- Set dateAdded when saving plants for proper collection sorting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Phase 3: Dataset Preprocessing & Augmentation - Implementation Plan

## Overview

**Goal:** Prepare images for training with consistent formatting and an augmentation pipeline.

**Prerequisites:** Phase 2 complete - `datasets/train/`, `datasets/val/`, `datasets/test/` directories with manifests

**Target Deliverable:** Training-ready dataset with standardized dimensions, normalized values, and augmentation pipeline

---
## Task Breakdown

### Task 3.1: Standardize Image Dimensions

**Objective:** Resize all images to consistent dimensions for model input.

**Actions:**

1. Create `scripts/phase3/standardize_dimensions.py` to:
   - Load images from train/val/test directories
   - Resize to target dimension (224x224 for MobileNetV3, 299x299 for EfficientNet)
   - Preserve aspect ratio with center crop or letterboxing
   - Save resized images to new directory structure

2. Support multiple output sizes:

   ```python
   TARGET_SIZES = {
       "mobilenet": (224, 224),
       "efficientnet": (299, 299),
       "vit": (384, 384)
   }
   ```

3. Implement resize strategies:
   - **center_crop:** Crop to square, then resize (preserves detail)
   - **letterbox:** Pad to square, then resize (preserves full image)
   - **stretch:** Direct resize (fastest, may distort)

4. Output directory structure:

   ```
   datasets/
   ├── processed/
   │   └── 224x224/
   │       ├── train/
   │       ├── val/
   │       └── test/
   ```
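The first two strategies come down to a few lines of Pillow each; a minimal sketch (function names and the black `fill` default are illustrative, not part of the planned script):

```python
from PIL import Image

def resize_center_crop(img: Image.Image, size: int) -> Image.Image:
    """Crop the largest centered square, then resize (preserves detail)."""
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size))

def resize_letterbox(img: Image.Image, size: int, fill=(0, 0, 0)) -> Image.Image:
    """Scale to fit inside a square, then pad (preserves the full image)."""
    w, h = img.size
    scale = size / max(w, h)
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    scaled = img.resize((new_w, new_h))
    canvas = Image.new("RGB", (size, size), fill)
    canvas.paste(scaled, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```

The **stretch** strategy is simply `img.resize((size, size))` with no cropping or padding.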
**Output:**
- `datasets/processed/{size}/` directories
- `output/phase3/dimension_report.json` - Processing statistics

**Validation:**
- [ ] All images in processed directory are exactly target dimensions
- [ ] No corrupt images (all readable by PIL)
- [ ] Image count matches source (no images lost)
- [ ] Processing time logged for performance baseline
---
### Task 3.2: Normalize Color Channels

**Objective:** Standardize pixel values and handle format variations.

**Actions:**

1. Create `scripts/phase3/normalize_images.py` to:
   - Convert all images to RGB (handle RGBA, grayscale, CMYK)
   - Apply ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
   - Handle various input formats (JPEG, PNG, WebP, HEIC)
   - Save as a consistent format (JPEG at quality 95, or PNG for lossless)

2. Implement color normalization:

   ```python
   import numpy as np

   def normalize_image(image: np.ndarray) -> np.ndarray:
       """Normalize image for model input."""
       image = image.astype(np.float32) / 255.0
       mean = np.array([0.485, 0.456, 0.406])
       std = np.array([0.229, 0.224, 0.225])
       return (image - mean) / std
   ```

3. Create a preprocessing pipeline class:

   ```python
   import numpy as np
   from PIL import Image

   IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
   IMAGENET_STD = np.array([0.229, 0.224, 0.225])

   class ImagePreprocessor:
       def __init__(self, target_size, normalize=True):
           self.target_size = target_size
           self.normalize = normalize

       def __call__(self, image_path: str) -> np.ndarray:
           # Load, resize to target, convert to RGB, normalize
           image = Image.open(image_path).convert("RGB").resize(self.target_size)
           array = np.asarray(image, dtype=np.float32) / 255.0
           if self.normalize:
               array = (array - IMAGENET_MEAN) / IMAGENET_STD
           return array
   ```

4. Handle edge cases:
   - Grayscale → convert to RGB by duplicating channels
   - RGBA → remove alpha channel, composite on white
   - CMYK → convert to RGB color space
   - 16-bit images → convert to 8-bit
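These edge cases fit in one small helper; a sketch using Pillow (the `to_rgb` name is illustrative). Compositing on white goes through an explicit paste so fully transparent pixels end up white rather than black:

```python
from PIL import Image

def to_rgb(img: Image.Image) -> Image.Image:
    """Force any input mode down to 3-channel, 8-bit RGB."""
    if img.mode == "P":
        img = img.convert("RGBA")  # palette images may carry transparency
    if img.mode in ("RGBA", "LA"):
        # Composite on a white background using the alpha channel as mask
        background = Image.new("RGB", img.size, (255, 255, 255))
        background.paste(img, mask=img.getchannel("A"))
        return background
    # Grayscale and CMYK reduce cleanly via convert(); true 16-bit
    # inputs may need an explicit point() scale to 8-bit first
    return img.convert("RGB")
```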
**Output:**
- Updated processed images with consistent color handling
- `output/phase3/color_conversion_log.json` - Format conversion statistics

**Validation:**
- [ ] All images have exactly 3 color channels (RGB)
- [ ] Pixel values in expected range after normalization
- [ ] No format conversion errors
- [ ] Color fidelity maintained (visual spot check on 50 random images)
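The "expected range" follows directly from the normalization formula: for pixels in [0, 1], channel c spans [(0 − mean_c)/std_c, (1 − mean_c)/std_c], roughly −2.12 to 2.64 across the ImageNet channels:

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

low = (0.0 - mean) / std   # per-channel minimum after normalization
high = (1.0 - mean) / std  # per-channel maximum after normalization
# low  ≈ [-2.118, -2.036, -1.804]
# high ≈ [ 2.249,  2.429,  2.640]
```

A range check in `normalize_images.py` can then flag any output falling outside these bounds as a processing error.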
---
### Task 3.3: Implement Data Augmentation Pipeline

**Objective:** Create augmentation transforms to increase training data variety.

**Actions:**

1. Create `scripts/phase3/augmentation_pipeline.py` with transforms:

   **Geometric Transforms:**
   - Random rotation: -30° to +30°
   - Random horizontal flip: 50% probability
   - Random vertical flip: 10% probability (some plants are naturally upside-down)
   - Random crop: 80-100% of image, then resize back
   - Random perspective: slight perspective distortion

   **Color Transforms:**
   - Random brightness: ±20%
   - Random contrast: ±20%
   - Random saturation: ±30%
   - Random hue shift: ±10%
   - Color jitter (combined)

   **Blur/Noise Transforms:**
   - Gaussian blur: kernel 3-7, 30% probability
   - Motion blur: 10% probability
   - Gaussian noise: σ=0.01-0.05, 20% probability

   **Occlusion Transforms:**
   - Random erasing (cutout): 10-30% area, 20% probability
   - Grid dropout: 10% probability

2. Implement using PyTorch or Albumentations:

   ```python
   import albumentations as A
   from albumentations.pytorch import ToTensorV2

   train_transform = A.Compose([
       A.RandomResizedCrop(224, 224, scale=(0.8, 1.0)),
       A.HorizontalFlip(p=0.5),
       A.Rotate(limit=30, p=0.5),
       A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.3, hue=0.1),
       A.GaussianBlur(blur_limit=(3, 7), p=0.3),
       A.CoarseDropout(max_holes=8, max_height=16, max_width=16, p=0.2),
       A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
       ToTensorV2(),
   ])

   val_transform = A.Compose([
       A.Resize(256, 256),
       A.CenterCrop(224, 224),
       A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
       ToTensorV2(),
   ])
   ```

3. Create a visualization tool for augmentation preview:

   ```python
   def visualize_augmentations(image_path, transform, n_samples=9):
       """Show grid of augmented versions of same image."""
       pass
   ```

4. Save the augmentation configuration to JSON for reproducibility

**Output:**
- `scripts/phase3/augmentation_pipeline.py` - Reusable transform classes
- `output/phase3/augmentation_config.json` - Transform parameters
- `output/phase3/augmentation_samples/` - Visual examples

**Validation:**
- [ ] All augmentations produce valid images (no NaN, no corruption)
- [ ] Augmented images visually reasonable (not over-augmented)
- [ ] Transforms are deterministic when seeded
- [ ] Pipeline runs at >100 images/second on CPU
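The "deterministic when seeded" check can be exercised with a toy stand-in for the pipeline; the same idea applies to the real transforms (Albumentations typically draws from the global `random`/NumPy state, so seed those before each run). The `toy_augment` function below is illustrative only:

```python
import numpy as np

def toy_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Stand-in transform: random horizontal flip plus ±20% brightness."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]            # horizontal flip
    out = out * rng.uniform(0.8, 1.2)  # brightness jitter
    return np.clip(out, 0.0, 1.0)

img = np.random.default_rng(0).random((8, 8, 3))
run_a = toy_augment(img, np.random.default_rng(seed=123))
run_b = toy_augment(img, np.random.default_rng(seed=123))
deterministic = np.array_equal(run_a, run_b)  # same seed, same output
```

Passing an explicit generator (rather than relying on hidden global state) is what makes the reproducibility check trivial to write.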
---
### Task 3.4: Balance Underrepresented Classes

**Objective:** Create augmented variants to address class imbalance.

**Actions:**

1. Create `scripts/phase3/analyze_class_balance.py` to:
   - Count images per class in the training set
   - Calculate imbalance ratio (max_class / min_class)
   - Identify underrepresented classes (below median - 1 std)
   - Visualize class distribution

2. Create `scripts/phase3/oversample_minority.py` to:
   - Define target samples per class (e.g., median count)
   - Generate augmented copies for minority classes
   - Apply stronger augmentation for synthetic samples
   - Track original vs. augmented counts

3. Implement oversampling strategies:

   ```python
   import numpy as np

   class BalancingStrategy:
       """Strategies for handling class imbalance."""

       @staticmethod
       def oversample_to_median(class_counts: dict) -> dict:
           """Oversample minority classes to median count."""
           median = np.median(list(class_counts.values()))
           targets = {}
           for cls, count in class_counts.items():
               targets[cls] = max(int(median), count)
           return targets

       @staticmethod
       def oversample_to_max(class_counts: dict, cap_ratio=5) -> dict:
           """Oversample to max, capped at cap_ratio times original."""
           max_count = max(class_counts.values())
           targets = {}
           for cls, count in class_counts.items():
               targets[cls] = min(max_count, count * cap_ratio)
           return targets
   ```

4. Generate a balanced training manifest:
   - Include original images
   - Add paths to augmented copies
   - Mark augmented images in manifest (for analysis)

**Output:**
- `datasets/processed/balanced/train/` - Balanced training set
- `output/phase3/class_balance_before.json` - Original distribution
- `output/phase3/class_balance_after.json` - Balanced distribution
- `output/phase3/balance_histogram.png` - Visual comparison

**Validation:**
- [ ] Imbalance ratio reduced to < 10:1 (max:min)
- [ ] No class has fewer than 50 training samples
- [ ] Augmented images are visually distinct from originals
- [ ] Total training set size documented
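A quick worked example of the median strategy (class names and counts are illustrative): the per-class targets translate into a count of augmented copies to generate for each class:

```python
import numpy as np

def copies_needed(class_counts: dict, targets: dict) -> dict:
    """Number of augmented images to generate per class."""
    return {cls: max(0, targets[cls] - n) for cls, n in class_counts.items()}

counts = {"quercus_robur": 40, "acer_palmatum": 120, "acer_rubrum": 300}
median = int(np.median(list(counts.values())))            # median is 120
targets = {cls: max(median, n) for cls, n in counts.items()}
extra = copies_needed(counts, targets)
# extra == {"quercus_robur": 80, "acer_palmatum": 0, "acer_rubrum": 0}
```

Only the minority class gets synthetic samples; classes at or above the median are left untouched, which keeps the augmented-to-original ratio bounded.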
---
### Task 3.5: Generate Image Manifest Files

**Objective:** Create mapping files for the training pipeline.

**Actions:**

1. Create `scripts/phase3/generate_manifests.py` to produce:

   **CSV Format (for a CSV-driven PyTorch Dataset):**

   ```csv
   path,label,scientific_name,plant_id,source,is_augmented
   train/images/quercus_robur_001.jpg,42,Quercus robur,QR001,inaturalist,false
   train/images/quercus_robur_002_aug.jpg,42,Quercus robur,QR001,augmented,true
   ```

   **JSON Format (detailed metadata):**

   ```json
   {
     "train": [
       {
         "path": "train/images/quercus_robur_001.jpg",
         "label": 42,
         "scientific_name": "Quercus robur",
         "common_name": "English Oak",
         "plant_id": "QR001",
         "source": "inaturalist",
         "is_augmented": false,
         "original_path": null
       }
     ]
   }
   ```

2. Generate a label mapping file:

   ```json
   {
     "label_to_name": {
       "0": "Acer palmatum",
       "1": "Acer rubrum",
       ...
     },
     "name_to_label": {
       "Acer palmatum": 0,
       "Acer rubrum": 1,
       ...
     },
     "label_to_common": {
       "0": "Japanese Maple",
       ...
     }
   }
   ```

3. Create split statistics:
   - Total images per split
   - Classes per split
   - Images per class per split

**Output:**
- `datasets/processed/train_manifest.csv`
- `datasets/processed/val_manifest.csv`
- `datasets/processed/test_manifest.csv`
- `datasets/processed/label_mapping.json`
- `output/phase3/manifest_statistics.json`

**Validation:**
- [ ] All image paths in manifests exist on disk
- [ ] Labels are consecutive integers starting from 0
- [ ] No duplicate entries in manifests
- [ ] Split sizes match expected counts
- [ ] Label mapping covers all classes
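A sketch of the label-assignment logic behind `generate_manifests.py` (the records are illustrative): sorting the unique class names before numbering is what makes labels consecutive integers from 0 and stable across runs:

```python
import csv
import io

records = [
    {"path": "train/images/quercus_robur_001.jpg", "scientific_name": "Quercus robur",
     "plant_id": "QR001", "source": "inaturalist", "is_augmented": "false"},
    {"path": "train/images/acer_palmatum_001.jpg", "scientific_name": "Acer palmatum",
     "plant_id": "AP001", "source": "inaturalist", "is_augmented": "false"},
]

# Stable label assignment: sorted unique class names, numbered from 0
classes = sorted({r["scientific_name"] for r in records})
name_to_label = {name: i for i, name in enumerate(classes)}
label_to_name = {str(i): name for name, i in name_to_label.items()}

fields = ["path", "label", "scientific_name", "plant_id", "source", "is_augmented"]
buf = io.StringIO()  # the real script would write to the manifest files
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
for r in records:
    writer.writerow({**r, "label": name_to_label[r["scientific_name"]]})
manifest_csv = buf.getvalue()
```

The same `name_to_label`/`label_to_name` pair, serialized with `json.dump`, becomes `label_mapping.json`.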
---
### Task 3.6: Validate Dataset Integrity

**Objective:** Final verification of the processed dataset.

**Actions:**

1. Create `scripts/phase3/validate_dataset.py` to run comprehensive checks:

   **File Integrity:**
   - All manifest paths exist
   - All images load without error
   - All images have correct dimensions
   - File permissions allow read access

   **Label Consistency:**
   - Labels match between manifest and directory structure
   - All labels have corresponding class names
   - No orphaned images (in directory but not manifest)
   - No missing images (in manifest but not directory)

   **Dataset Statistics:**
   - Per-class image counts
   - Train/val/test split ratios
   - Augmented vs. original ratio
   - File size distribution

   **Sample Verification:**
   - Random sample of 100 images per split
   - Verify image content matches label (using a pretrained model)
   - Flag potential mislabels for review

2. Create `scripts/phase3/repair_dataset.py` for common fixes:
   - Remove entries with missing files
   - Fix incorrect labels (with confirmation)
   - Regenerate corrupted augmentations

**Output:**
- `output/phase3/validation_report.json` - Full validation results
- `output/phase3/validation_summary.md` - Human-readable summary
- `output/phase3/flagged_for_review.json` - Potential issues

**Validation:**
- [ ] 0 missing files
- [ ] 0 corrupted images
- [ ] 0 dimension mismatches
- [ ] <1% potential mislabels flagged
- [ ] All metadata fields populated
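The file-existence and duplicate checks reduce to a few set operations; a minimal sketch (`check_manifest` is an illustrative name, and the real script would also open each image to catch corruption):

```python
from pathlib import Path

def check_manifest(entries: list, root: Path) -> dict:
    """Report missing files and duplicate entries in a manifest."""
    paths = [e["path"] for e in entries]
    missing = [p for p in paths if not (root / p).is_file()]
    seen = set()
    duplicates = []
    for p in paths:
        if p in seen:
            duplicates.append(p)
        seen.add(p)
    return {"missing": missing, "duplicates": duplicates,
            "ok": not missing and not duplicates}
```

The returned dict slots straight into `validation_report.json`; the "0 missing files" and "no duplicate entries" criteria are just `report["ok"]`.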
---
## End-of-Phase Validation Checklist

Run `scripts/phase3/validate_phase3.py` to verify all criteria:

### Image Processing Validation

| # | Criterion | Target | Status |
|---|-----------|--------|--------|
| 1 | All images standardized to target size | 100% at 224x224 (or configured size) | [ ] |
| 2 | All images in RGB format | 100% RGB, 3 channels | [ ] |
| 3 | No corrupted images | 0 unreadable files | [ ] |
| 4 | Normalization applied correctly | Values in expected range | [ ] |

### Augmentation Validation

| # | Criterion | Target | Status |
|---|-----------|--------|--------|
| 5 | Augmentation pipeline functional | All transforms produce valid output | [ ] |
| 6 | Augmentation reproducible | Same seed = same output | [ ] |
| 7 | Augmentation performance | >100 images/sec on CPU | [ ] |
| 8 | Visual quality | Spot check passes (50 random samples) | [ ] |

### Class Balance Validation

| # | Criterion | Target | Status |
|---|-----------|--------|--------|
| 9 | Class imbalance ratio | < 10:1 (max:min) | [ ] |
| 10 | Minimum class size | ≥50 images per class in train | [ ] |
| 11 | Augmentation ratio | Augmented ≤ 4x original per class | [ ] |

### Manifest Validation

| # | Criterion | Target | Status |
|---|-----------|--------|--------|
| 12 | Manifest completeness | 100% images have manifest entries | [ ] |
| 13 | Path validity | 100% manifest paths exist | [ ] |
| 14 | Label consistency | Labels match directory structure | [ ] |
| 15 | No duplicates | 0 duplicate entries | [ ] |
| 16 | Label mapping complete | All labels have names | [ ] |

### Dataset Statistics

| Metric | Expected | Actual | Status |
|--------|----------|--------|--------|
| Total processed images | 50,000 - 200,000 | | [ ] |
| Training set size | ~70% of total | | [ ] |
| Validation set size | ~15% of total | | [ ] |
| Test set size | ~15% of total | | [ ] |
| Number of classes | 200 - 500 | | [ ] |
| Avg images per class (train) | 100 - 400 | | [ ] |
| Image file size (avg) | 30-100 KB | | [ ] |
| Total dataset size | 10-50 GB | | [ ] |
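As one concrete piece of `validate_phase3.py` (function names are illustrative), the split-size rows reduce to a tolerance check on the ratios:

```python
def split_ratios(split_counts: dict) -> dict:
    """Fraction of the total dataset in each split."""
    total = sum(split_counts.values())
    return {name: n / total for name, n in split_counts.items()}

def splits_ok(split_counts: dict, tol: float = 0.03) -> bool:
    """Check train/val/test against the ~70/15/15 targets."""
    expected = {"train": 0.70, "val": 0.15, "test": 0.15}
    ratios = split_ratios(split_counts)
    return all(abs(ratios[name] - target) <= tol
               for name, target in expected.items())
```

The tolerance absorbs the rounding that class-stratified splitting introduces; a 3-point deviation is illustrative, not a hard requirement.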
---
## Phase 3 Completion Checklist

- [ ] Task 3.1: Images standardized to target dimensions
- [ ] Task 3.2: Color channels normalized and formats unified
- [ ] Task 3.3: Augmentation pipeline implemented and tested
- [ ] Task 3.4: Class imbalance addressed through oversampling
- [ ] Task 3.5: Manifest files generated for all splits
- [ ] Task 3.6: Dataset integrity validated
- [ ] All 16 validation criteria pass
- [ ] Dataset statistics documented
- [ ] Augmentation config saved for reproducibility
- [ ] Ready for Phase 4 (Model Architecture Selection)

---
## Scripts Summary

| Script | Task | Input | Output |
|--------|------|-------|--------|
| `standardize_dimensions.py` | 3.1 | Raw images | Resized images |
| `normalize_images.py` | 3.2 | Resized images | Normalized images |
| `augmentation_pipeline.py` | 3.3 | Images | Transform classes |
| `analyze_class_balance.py` | 3.4 | Train manifest | Balance report |
| `oversample_minority.py` | 3.4 | Imbalanced set | Balanced set |
| `generate_manifests.py` | 3.5 | Processed images | CSV/JSON manifests |
| `validate_dataset.py` | 3.6 | Full dataset | Validation report |
| `validate_phase3.py` | Final | All outputs | Pass/Fail report |

---
## Dependencies

```
# requirements-phase3.txt
Pillow>=9.0.0
numpy>=1.24.0
albumentations>=1.3.0
torch>=2.0.0
torchvision>=0.15.0
opencv-python>=4.7.0
pandas>=2.0.0
tqdm>=4.65.0
matplotlib>=3.7.0
scikit-learn>=1.2.0
imagehash>=4.3.0
```

---
## Directory Structure After Phase 3

```
datasets/
├── raw/                    # Original downloaded images (Phase 2)
├── organized/              # Organized by species (Phase 2)
├── verified/               # Quality-checked (Phase 2)
├── train/                  # Train split (Phase 2)
├── val/                    # Validation split (Phase 2)
├── test/                   # Test split (Phase 2)
└── processed/              # Phase 3 output
    ├── 224x224/            # Standardized size
    │   ├── train/
    │   │   └── images/
    │   ├── val/
    │   │   └── images/
    │   └── test/
    │       └── images/
    ├── balanced/           # Class-balanced training
    │   └── train/
    │       └── images/
    ├── train_manifest.csv
    ├── val_manifest.csv
    ├── test_manifest.csv
    ├── label_mapping.json
    └── augmentation_config.json

output/phase3/
├── dimension_report.json
├── color_conversion_log.json
├── augmentation_config.json
├── augmentation_samples/
├── class_balance_before.json
├── class_balance_after.json
├── balance_histogram.png
├── manifest_statistics.json
├── validation_report.json
├── validation_summary.md
└── flagged_for_review.json
```

---
## Risk Mitigation

| Risk | Mitigation |
|------|------------|
| Disk space exhaustion | Monitor disk usage, compress images, delete raw after processing |
| Memory errors with large batches | Process in batches of 1000, use memory-mapped files |
| Augmentation too aggressive | Visual review, conservative defaults, configurable parameters |
| Class imbalance persists | Multiple oversampling strategies, weighted loss in training |
| Slow processing | Multiprocessing, GPU acceleration for transforms |
| Reproducibility issues | Save all configs, use fixed random seeds, version control |

---
## Performance Optimization Tips

1. **Batch Processing:** Process images in parallel using multiprocessing
2. **Memory Efficiency:** Use generators; don't load all images at once
3. **Disk I/O:** Use an SSD, batch writes, and memory-mapped files
4. **Image Loading:** Use Pillow-SIMD or OpenCV for speed
5. **Augmentation:** Apply on-the-fly during training (saves disk space)

---
## Notes

- Consider saving the augmentation config separately from applying the augmentations
- On-the-fly augmentation during training is often preferred over pre-generating
- Keep the original unaugmented test set for fair evaluation
- Document any excluded images and the reasons for exclusion
- Save random seeds for all operations
- Phase 4 will select the model architecture based on the processed dataset size