Spanish/Conjuga/Scripts/guide-enrichment/PLAN.md

# Guide enrichment plan

**Trigger**: WEIRDO (the standard mnemonic for subjunctive triggers) was missing from the present-subjunctive guide. That gap is a symptom of a deeper problem: most tense guides are surface-level reference cards (2–3 usages plus examples) that lack the mnemonics, contrast tables, and exception lists a real Spanish teacher would hand out.

**Goal**: bring every tense guide and grammar note up to "teacher-handout" depth — enough that a learner could study from it alone and pass a quiz.

## Current state (audit, 2026-05-11)

| Surface | Items | Source of truth | Typical body length | Verdict |
|---|---|---|---|---|
| Tense guides | 20 | `Conjuga/Conjuga/conjuga_data.json` → `tenseGuides[]` | 500–1500 chars | **Shallow** — bare *Usages* + examples |
| Grammar notes | ~36 | `Conjuga/Conjuga/Models/GrammarNote.swift` (`GrammarNote.allNotes`, `generatedNotes`) | 1500–3000 chars | **Decent** — most have mnemonics and contrast examples |
| Reference store | — | `Conjuga/Conjuga/Services/ReferenceStore.swift` | varies | Not in scope for this pass |

Tense guides are the bulk of the work. Grammar notes need a smaller audit-and-fill pass.
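The body-length column can be re-derived with a few lines of Python. This is a minimal sketch, not the script used for the audit: the `id` and `body` field names are assumptions about the `tenseGuides[]` schema and may need adjusting to match the real JSON.

```python
# Assumed field names; the real schema of tenseGuides[] may differ.
ID_KEY = "id"
BODY_KEY = "body"


def guide_lengths(data: dict) -> list[tuple[str, int]]:
    """Return (guide id, body length in characters) pairs, shortest first."""
    guides = data.get("tenseGuides", [])
    lengths = [(guide[ID_KEY], len(guide[BODY_KEY])) for guide in guides]
    return sorted(lengths, key=lambda pair: pair[1])
```

Passing the parsed contents of `Conjuga/Conjuga/conjuga_data.json` to `guide_lengths` would list the shortest guides first, which is the order that matters for triage.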

## What "thorough" looks like

Every tense guide should include, at minimum:

1. **Quick TL;DR** — one sentence: what is this tense for?
2. **When to use it** — numbered usages, each with 2 contrast examples (a clear case and a borderline / common-mistake case).
3. **How to form it** — conjugation pattern for regular verbs (one table per AR/ER/IR if it differs), plus the irregular pattern callout if applicable. Cross-reference the conjugator screens if relevant.
4. **Common irregulars** — top 5–10 irregular verbs that learners will hit immediately in this tense (ser, estar, ir, tener, haber, dar, ver, decir, hacer, querer, poder, poner, saber, salir, traer, venir).
5. **Triggers / mnemonics** — words and structures that signal this tense. WEIRDO and ESCAPA for subjunctive; "yesterday / last X / specific time" for preterite; "used to / when I was a kid" for imperfect; etc.
6. **Pitfalls** — the top 3–5 mistakes English speakers make. e.g. preterite vs imperfect mixups, ir vs venir, ser vs estar overlap.
7. **Tense-vs-tense contrast** — pair with the closest neighbour and show 2 minimal pairs (preterite ↔ imperfect, present ↔ present-progressive, future ↔ ir-a + infinitive, subjunctive-presente ↔ subjunctive-imperfecto).
8. **Real-world feel** — 2–3 dialogue-style examples showing the tense in natural use, not just isolated sentences.
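The eight-item checklist above lends itself to a mechanical check. The sketch below assumes each draft is Markdown with one heading per checklist item, titled exactly as in the list; that heading convention and the `missing_sections` helper are illustrative assumptions, not an existing part of the repo.

```python
import re

# Section titles taken from the checklist; using them verbatim as
# Markdown headings in a draft is an assumed convention.
REQUIRED_SECTIONS = [
    "Quick TL;DR",
    "When to use it",
    "How to form it",
    "Common irregulars",
    "Triggers / mnemonics",
    "Pitfalls",
    "Tense-vs-tense contrast",
    "Real-world feel",
]

HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(draft_markdown: str) -> list[str]:
    """Return the required titles that have no matching heading in the draft."""
    headings = {m.group(1).lower() for m in HEADING_RE.finditer(draft_markdown)}
    return [title for title in REQUIRED_SECTIONS if title.lower() not in headings]
```

An empty result means every section heading is present. It says nothing about whether the content under each heading is any good, so it complements the human review rather than replacing it.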

Every grammar note should include, at minimum:
1. The core distinction in one line.
2. Each side of the distinction with 4–6 clear examples covering different positions in a sentence.
3. A mnemonic if one is standard in the language (DOCTOR/PLACE, WEIRDO, ESCAPA, etc.).
4. Edge cases / verbs that change meaning (e.g. ser/estar adjectives, conocer/saber overlap).
5. A practice prompt: "Try translating these 3 sentences, then check below."

## Priority order

Triaged by learner impact (frequency of use × typical confusion):

**Tier 1 — most-used, most-confused** (do first):
1. `ind_presente` (Present indicative) — already 1324 chars, the longest tense guide. Audit for gaps; probably needs irregular tables.
2. `ind_preterito` (Preterite) — currently 492 chars, the shortest. **Highest priority** — every learner hits this and gets it wrong.
3. `ind_imperfecto` (Imperfect) — 774 chars. Always taught alongside preterite; the contrast is the entire game.
4. `subj_presente` (Present subjunctive) — ✅ done in this pass.
5. `imp_afirmativo` + `imp_negativo` (Imperative pair) — combined 2037 chars. Needs the tú/usted/nosotros/vosotros table and the negative-flips-to-subjunctive rule highlighted.

**Tier 2 (common but often skimped)**:

6. `ind_futuro` (Simple future): needs contrast with ir-a + infinitive (already covered in grammar notes; cross-link).
7. `cond_presente` (Conditional): needs the "if-clause" patterns and the "softening request" usage ("¿Podrías…?").
8. `ind_perfecto` (Present perfect): needs the haber + past participle conjugation table and the "ya / todavía / alguna vez" trigger words.
9. `subj_imperfecto_1` + `subj_imperfecto_2` (Past subjunctive -ra / -se): needs the if-clause + condicional pairing.

**Tier 3 (compound and less-frequent; still must be thorough)**:

10. `ind_pluscuamperfecto`, `ind_futuro_perfecto`, `ind_preterito_anterior` (the last is literary)
11. `cond_perfecto`, `subj_perfecto`, `subj_pluscuamperfecto_1`, `subj_pluscuamperfecto_2`
12. `subj_futuro`, `subj_futuro_perfecto` (largely archaic; note they're rare but explain why they exist)

**Grammar notes audit**:
- Pass through all 36, score each on the "thorough" criteria above.
- Fill the gaps. Most already have mnemonics; some don't.

## Research sources

Cite explicitly in each draft so reviewers can verify. Order of trust:

1. **Real Academia Española (RAE) — Nueva gramática de la lengua española** — authoritative reference. Free online: `rae.es`.
2. **Studyspanish.com** and **SpanishDict.com** grammar references — best free per-topic explanations, well-curated example sentences.
3. **Practice Makes Perfect: Complete Spanish Grammar** (Gilda Nissenberg, McGraw-Hill): standard teaching reference. The PDF is already at the repo root for cross-reference.
4. **Lawless Spanish** (Laura Lawless) — accurate, concise, good on subjunctive nuances.
5. **The user's existing textbook** — *Complete Spanish Step-by-Step* (Bregstein) is already bundled. Cross-reference its chapter on each tense to keep voice consistent.
6. **YouTube — Butterfly Spanish (Ana), Spring Spanish, Dreaming Spanish (Pablo)** — for natural-use examples and the "feel" of when a native reaches for the tense. The repo already has a curated YouTube list at `Conjuga/Conjuga/youtube_videos.json` — pull from there when a topic has a matching video.

For mnemonics specifically: WEIRDO, ESCAPA, DOCTOR, PLACE are standard. Don't invent new ones unless we can't find a known one.

## Workflow per topic

This is what an enrichment "unit of work" looks like:

1. **Draft** — A research agent (Claude Code subagent, no API key, same pattern as the book translation pipeline) reads the current guide body, consults the sources listed above, drafts a new body following the "thorough" structure. Writes to `Conjuga/Scripts/guide-enrichment/drafts/<topicId>.md`.
2. **Self-review** — same agent re-reads its own draft against the checklist (TL;DR present? mnemonic present? contrast pair? top 3 pitfalls?). Notes anything it couldn't find a source for.
3. **Integrate** — a script reads the draft, swaps it into `conjuga_data.json` (for tense guides) or `GrammarNote.swift` (for grammar notes), bumps `courseDataVersion`, runs build to verify.
4. **Spot-check** — user opens the topic in the app on device, reads it, flags anything that feels wrong or missing.
5. **Commit** — one commit per topic, message: "Guide enrichment — <topic> (tier N)".
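The integrate step for a tense guide might look roughly like the sketch below. It rests on several assumptions that should be checked against the real data: that each entry in `tenseGuides[]` has `id` and `body` fields, and that `courseDataVersion` is an integer stored at the top level of the same JSON. The `integrate_draft` name is hypothetical.

```python
import copy


def integrate_draft(data: dict, topic_id: str, new_body: str) -> dict:
    """Return a copy of data with one guide body replaced and the version bumped."""
    updated = copy.deepcopy(data)  # leave the caller's data untouched
    for guide in updated["tenseGuides"]:
        if guide["id"] == topic_id:    # "id" is an assumed field name
            guide["body"] = new_body   # "body" is an assumed field name
            break
    else:
        # Fail loudly instead of silently writing an unchanged file.
        raise KeyError(f"no tense guide with id {topic_id!r}")
    # Assumption: courseDataVersion is an integer at the top level of the JSON.
    updated["courseDataVersion"] = int(updated.get("courseDataVersion", 0)) + 1
    return updated
```

Working on a copy keeps a failed run from leaving a half-edited structure in memory, and raising on an unknown topic id catches typos before anything is written back to disk.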

Batching: do tier-1 topics one at a time so the user can review and shape what "thorough enough" looks like. Tiers 2 and 3 can batch 3–5 topics per session once the format is dialed in.

## Tooling

Two small scripts will speed this up:

- **`enrich_topic.py <topicId>`** — opens the current body, writes a Markdown template at `drafts/<topicId>.md` with the section headers pre-filled, and prints a research prompt the user can hand to a subagent.
- **`apply_draft.py <topicId>`** — reads `drafts/<topicId>.md`, validates the section structure, swaps it into `conjuga_data.json` (or `GrammarNote.swift` for grammar notes), bumps `courseDataVersion`.
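As a starting point for `enrich_topic.py`, the template-writing half could be as small as the sketch below. The function names, the "Current body" reference section, and the `TODO` placeholders are all illustrative assumptions; the section titles are passed in by the caller and would come from the "thorough" checklist.

```python
from pathlib import Path


def build_template(topic_id: str, current_body: str, section_titles: list[str]) -> str:
    """Build a Markdown draft skeleton with one empty section per title."""
    parts = [f"# {topic_id}", "", "## Current body (for reference)", "", current_body.strip(), ""]
    for title in section_titles:
        parts += [f"## {title}", "", "TODO", ""]
    return "\n".join(parts)


def write_template(topic_id: str, current_body: str, section_titles: list[str],
                   drafts_dir: str = "drafts") -> Path:
    """Write the skeleton to <drafts_dir>/<topic_id>.md and return its path."""
    out = Path(drafts_dir) / f"{topic_id}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(build_template(topic_id, current_body, section_titles), encoding="utf-8")
    return out
```

Keeping `build_template` free of file I/O makes it easy to test and lets the research prompt reuse the same string.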

Build both when starting tier 1. Don't build them speculatively now.

## Effort estimate

- Tier 1 (5 topics): ~30 min research + 30 min draft + 15 min integrate = **~75 min per topic, ~6 hours total**.
- Tier 2 (4 topics): faster once the format is dialed in. ~45 min each, ~3 hours.
- Tier 3 (9 guides): ~30 min each (most are compound tenses with similar structure), ~4.5 hours.
- Grammar notes audit + fill: ~10 min audit each × 36 = 6 hours; ~30 min fill on the ~10 that need it = 5 hours. Total ~11 hours.

**Total scoped at ~25 hours.** Spread across sessions: maybe one tier-1 topic per session, two tier-2 or three tier-3 per session once the format's locked in.

## Ship plan

- Each commit is one topic enriched. Small, reviewable diffs.
- `courseDataVersion` bumps per commit so the change propagates on next launch.
- Once a commit hits gitea, the user can preview the new bodies in the in-app Guide tab after a rebuild and reinstall; no separate redeploy is needed.
- The plan doc itself lives here so future sessions can pick up where this one left off without needing to re-derive the structure.

## Out of scope (intentional)

- Audio recordings of example sentences (could be a future TTS pre-bake).
- Per-region variants (Latin American vs Castilian usage notes): flag them when they matter (vosotros, leísmo), but don't document them comprehensively.
- Interactive exercises tied to each guide (separate Tests/Quiz infrastructure exists; cross-link instead of duplicate).
- Translation of the guides into Spanish (current guides are English-explanation, Spanish-examples; keep that asymmetry).
- A complete grammar-textbook rewrite. Stop at "depth a teacher would hand out as supplementary material."