Spanish/Conjuga/Scripts/guide-enrichment/PLAN.md
Trey T de446b2301 Guide — enrich present-subjunctive entry with WEIRDO + ESCAPA + plan
The present-subjunctive guide was surface-level: two numbered usages
and a handful of examples, no mnemonic and no structural trigger cue.
That's the recurring problem with the tense guides — they're reference
cards, not teaching materials.

This commit fixes the immediate gap and lays out a plan to fix the
rest:

  Conjuga/conjuga_data.json — subj_presente body expanded from 794 to
  3670 chars. Adds the WEIRDO mnemonic with per-letter triggers and
  examples (Wishes, Emotions, Impersonal, Recommendations, Doubt,
  Ojalá), the ESCAPA adverbial-conjunction set, the "que + change of
  subject" structural rule, adjectival clauses with unknown
  antecedents, and the future-time-clause rule (cuando / hasta que /
  en cuanto).

  Scripts/guide-enrichment/PLAN.md (new) — audit of all 20 tense
  guides and 36 grammar notes, tier-1/2/3 prioritisation, "thorough"
  checklist (TL;DR, usages, conjugation, irregulars, mnemonic,
  pitfalls, contrast, dialogue example), research sources, per-topic
  workflow, effort estimate.

  DataLoader.swift — courseDataVersion 7 → 8 so existing installs
  re-seed the new body on next launch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 23:33:09 -05:00

# Guide enrichment plan
**Trigger**: WEIRDO was missing from the present-subjunctive guide. That's a perfect example of a deeper problem — most tense guides are surface-level reference cards (2-3 usages + examples), missing the mnemonics, contrast tables, and exception lists a real Spanish teacher would hand out.
**Goal**: bring every tense guide and grammar note up to "teacher-handout" depth — enough that a learner could study from it alone and pass a quiz.
## Current state (audit, 2026-05-11)
| Surface | Items | Source of truth | Typical body length | Verdict |
|---|---|---|---|---|
| Tense guides | 20 | `Conjuga/Conjuga/conjuga_data.json` → `tenseGuides[]` | 500–1500 chars | **Shallow** — bare *Usages* + examples |
| Grammar notes | ~36 | `Conjuga/Conjuga/Models/GrammarNote.swift` (`GrammarNote.allNotes`, `generatedNotes`) | 1500–3000 chars | **Decent** — most have mnemonics and contrast examples |
| Reference store | — | `Conjuga/Conjuga/Services/ReferenceStore.swift` | varies | Not in scope for this pass |

Tense guides are the bulk of the work. Grammar notes need a smaller audit-and-fill pass.
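
The length column in the audit can be reproduced with a few lines of Python. This is a minimal sketch, assuming each entry in `tenseGuides[]` carries `id` and `body` fields (those field names are not confirmed by this plan) and using 1500 chars as an illustrative "shallow" cutoff. The sample data is built in memory; real data would come from `json.load` on `conjuga_data.json`.

```python
# In-memory stand-in for conjuga_data.json. The "id" and "body" field names
# are assumptions; the lengths reuse figures quoted elsewhere in this plan.
SAMPLE = {
    "tenseGuides": [
        {"id": "subj_presente", "body": "x" * 3670},
        {"id": "ind_preterito", "body": "x" * 492},
    ]
}


def audit_lengths(data, shallow_below=1500):
    """Return (id, char_count, is_shallow) tuples, shortest guide first."""
    rows = []
    for guide in data["tenseGuides"]:
        length = len(guide["body"])
        rows.append((guide["id"], length, length < shallow_below))
    return sorted(rows, key=lambda row: row[1])


for topic_id, length, shallow in audit_lengths(SAMPLE):
    flag = "shallow" if shallow else "ok"
    print(f"{topic_id:<16} {length:>5} chars  {flag}")
```

Character count is only a rough proxy for depth, so the output is a starting point for the manual audit rather than a verdict.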
## What "thorough" looks like
Every tense guide should include, at minimum:
1. **Quick TL;DR** — one sentence: what is this tense for?
2. **When to use it** — numbered usages, each with 2 contrast examples (a clear case and a borderline / common-mistake case).
3. **How to form it** — conjugation pattern for regular verbs (one table per AR/ER/IR if it differs), plus the irregular pattern callout if applicable. Cross-reference the conjugator screens if relevant.
4. **Common irregulars** — top 5–10 irregular verbs that learners will hit immediately in this tense (ser, estar, ir, tener, haber, dar, ver, decir, hacer, querer, poder, poner, saber, salir, traer, venir).
5. **Triggers / mnemonics** — words and structures that signal this tense. WEIRDO and ESCAPA for subjunctive; "yesterday / last X / specific time" for preterite; "used to / when I was a kid" for imperfect; etc.
6. **Pitfalls** — the top 3–5 mistakes English speakers make. e.g. preterite vs imperfect mixups, ir vs venir, ser vs estar overlap.
7. **Tense-vs-tense contrast** — pair with the closest neighbour and show 2 minimal pairs (preterite ↔ imperfect, present ↔ present-progressive, future ↔ ir-a + infinitive, subjunctive-presente ↔ subjunctive-imperfecto).
8. **Real-world feel** — 2–3 dialogue-style examples showing the tense in natural use, not just isolated sentences.
Every grammar note should include, at minimum:
1. The core distinction in one line.
2. Each side of the distinction with 4–6 clear examples covering different positions in a sentence.
3. A mnemonic if one is standard in the language (DOCTOR/PLACE, WEIRDO, ESCAPA, etc.).
4. Edge cases / verbs that change meaning (e.g. ser/estar adjectives, conocer/saber overlap).
5. A practice prompt: "Try translating these 3 sentences, then check below."
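
The tense-guide checklist above lends itself to a mechanical check on a draft. The sketch below assumes each checklist item becomes a `## ` heading in the draft. The heading names in `REQUIRED_SECTIONS` are hypothetical placeholders, not wording fixed by this plan.

```python
import re

# Hypothetical heading names, one per tense-guide checklist item.
REQUIRED_SECTIONS = [
    "TL;DR",
    "When to use it",
    "How to form it",
    "Common irregulars",
    "Triggers and mnemonics",
    "Pitfalls",
    "Contrast",
    "Real-world examples",
]


def missing_sections(draft_markdown):
    """Return the required headings that do not appear as '## ' headings."""
    found = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##[ \t]+(.+)$", draft_markdown, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


draft = """## TL;DR
Completed past actions.

## When to use it
1. Single finished events.

## Pitfalls
Mixing it up with the imperfect.
"""

for name in missing_sections(draft):
    print(f"missing: {name}")
```

A check like this only confirms that a section exists. Whether the content is good enough still needs a human read.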
## Priority order
Triaged by learner impact (frequency of use × typical confusion):
**Tier 1 — most-used, most-confused** (do first):
1. `ind_presente` (Present indicative) — already 1324 chars, the longest tense guide. Audit for gaps; probably needs irregular tables.
2. `ind_preterito` (Preterite) — currently 492 chars, the shortest. **Highest priority** — every learner hits this and gets it wrong.
3. `ind_imperfecto` (Imperfect) — 774 chars. Always taught alongside preterite; the contrast is the entire game.
4. `subj_presente` (Present subjunctive) — ✅ done in this pass.
5. `imp_afirmativo` + `imp_negativo` (Imperative pair) — combined 2037 chars. Needs the tú/usted/nosotros/vosotros table and the negative-flips-to-subjunctive rule highlighted.
**Tier 2 — common but often skimped**:
6. `ind_futuro` (Simple future) — needs contrast with ir-a + infinitive (already covered in grammar notes; cross-link).
7. `cond_presente` (Conditional) — needs the "if-clause" patterns and the "softening request" usage ("¿Podrías…?").
8. `ind_perfecto` (Present perfect) — needs the haber + past participle conjugation table and the "ya / todavía / alguna vez" trigger words.
9. `subj_imperfecto_1` + `subj_imperfecto_2` (Past subjunctive -ra / -se) — needs the if-clause + condicional pairing.
**Tier 3 — compound and less-frequent** (still must be thorough):
10. `ind_pluscuamperfecto`, `ind_futuro_perfecto`, `ind_preterito_anterior` (literary)
11. `cond_perfecto`, `subj_perfecto`, `subj_pluscuamperfecto_1`, `subj_pluscuamperfecto_2`
12. `subj_futuro`, `subj_futuro_perfecto` (largely archaic — note they're rare but explain why they exist)
**Grammar notes audit**:
- Pass through all 36, score each on the "thorough" criteria above.
- Fill the gaps. Most already have mnemonics; some don't.
## Research sources
Cite explicitly in each draft so reviewers can verify. Order of trust:
1. **Real Academia Española (RAE) — Nueva gramática de la lengua española** — authoritative reference. Free online: `rae.es`.
2. **Studyspanish.com** and **SpanishDict.com** grammar references — best free per-topic explanations, well-curated example sentences.
3. **Practice Makes Perfect: Complete Spanish Grammar** (Dorothy Richmond, McGraw-Hill) — standard teaching reference. The PDF is already at the repo root for cross-reference.
4. **Lawless Spanish** (Laura Lawless) — accurate, concise, good on subjunctive nuances.
5. **The user's existing textbook**: *Complete Spanish Step-by-Step* (Bregstein) is already bundled. Cross-reference its chapter on each tense to keep voice consistent.
6. **YouTube — Butterfly Spanish (Ana), Spring Spanish, Dreaming Spanish (Pablo)** — for natural-use examples and the "feel" of when a native reaches for the tense. The repo already has a curated YouTube list at `Conjuga/Conjuga/youtube_videos.json` — pull from there when a topic has a matching video.
For mnemonics specifically: WEIRDO, ESCAPA, DOCTOR, PLACE are standard. Don't invent new ones unless we can't find a known one.
## Workflow per topic
This is what an enrichment "unit of work" looks like:
1. **Draft** — A research agent (Claude Code subagent, no API key, same pattern as the book translation pipeline) reads the current guide body, consults the sources listed above, drafts a new body following the "thorough" structure. Writes to `Conjuga/Scripts/guide-enrichment/drafts/<topicId>.md`.
2. **Self-review** — same agent re-reads its own draft against the checklist (TL;DR present? mnemonic present? contrast pair? top 3 pitfalls?). Notes anything it couldn't find a source for.
3. **Integrate** — a script reads the draft, swaps it into `conjuga_data.json` (for tense guides) or `GrammarNote.swift` (for grammar notes), bumps `courseDataVersion`, runs build to verify.
4. **Spot-check** — user opens the topic in the app on device, reads it, flags anything that feels wrong or missing.
5. **Commit** — one commit per topic, message: "Guide enrichment — <topic> (tier N)".
Batching: do tier-1 topics one at a time so the user can review and shape what "thorough enough" looks like. Tiers 2 and 3 can batch 3–5 topics per session once the format is dialed in.
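
The Integrate step of the workflow boils down to two edits: replacing one guide body and incrementing `courseDataVersion`. The following is a rough sketch, not the actual integration script. It assumes `tenseGuides[]` entries have `id` and `body` fields and that the Swift source contains a literal of the form `courseDataVersion = N`; neither detail is confirmed by this plan. It works on in-memory values and leaves file I/O and the build step out.

```python
import re


def swap_body(data, topic_id, new_body):
    """Replace the body of one tense guide in the loaded JSON data, in place."""
    for guide in data["tenseGuides"]:
        if guide["id"] == topic_id:
            guide["body"] = new_body
            return
    raise KeyError(f"no tense guide with id {topic_id!r}")


def bump_version(swift_source):
    """Increment the first 'courseDataVersion = N' literal in Swift source text."""
    match = re.search(r"courseDataVersion\s*=\s*(\d+)", swift_source)
    if match is None:
        raise ValueError("courseDataVersion assignment not found")
    bumped = str(int(match.group(1)) + 1)
    return swift_source[: match.start(1)] + bumped + swift_source[match.end(1):]


# Hypothetical inputs standing in for conjuga_data.json and DataLoader.swift.
data = {"tenseGuides": [{"id": "ind_preterito", "body": "old body"}]}
swap_body(data, "ind_preterito", "new enriched body")
print(data["tenseGuides"][0]["body"])                     # new enriched body
print(bump_version("static let courseDataVersion = 8"))   # static let courseDataVersion = 9
```

Raising on an unknown topic id or a missing version literal makes a bad draft fail loudly instead of producing a commit that changes nothing.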
## Tooling
Two small scripts will speed this up:
- **`enrich_topic.py <topicId>`** — opens the current body, writes a Markdown template at `drafts/<topicId>.md` with the section headers pre-filled, and prints a research prompt the user can hand to a subagent.
- **`apply_draft.py <topicId>`** — reads `drafts/<topicId>.md`, validates the section structure, swaps it into `conjuga_data.json` (or `GrammarNote.swift` for grammar notes), bumps `courseDataVersion`.
Build both when starting tier 1. Don't build them speculatively now.
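
For when that time comes, the template-writing half of `enrich_topic.py` could look roughly like this. It is a sketch under stated assumptions: the section headers are hypothetical placeholders for the checklist items, the current body is passed in as a string and not read from `conjuga_data.json`, and the research-prompt output is omitted.

```python
from pathlib import Path
import tempfile

# Hypothetical section headers, one per tense-guide checklist item.
SECTION_HEADERS = [
    "TL;DR",
    "When to use it",
    "How to form it",
    "Common irregulars",
    "Triggers and mnemonics",
    "Pitfalls",
    "Contrast",
    "Real-world examples",
]


def build_template(topic_id, current_body):
    """Return a Markdown skeleton with the current body quoted for reference."""
    quoted = "\n".join(f"> {line}" for line in current_body.splitlines())
    parts = [f"# {topic_id}", "", "Current body:", "", quoted, ""]
    for header in SECTION_HEADERS:
        parts += [f"## {header}", "", "TODO", ""]
    return "\n".join(parts)


def write_template(topic_id, current_body, drafts_dir):
    """Write <drafts_dir>/<topic_id>.md and return its path."""
    drafts_dir = Path(drafts_dir)
    drafts_dir.mkdir(parents=True, exist_ok=True)
    path = drafts_dir / f"{topic_id}.md"
    path.write_text(build_template(topic_id, current_body), encoding="utf-8")
    return path


# Demo in a temporary directory so nothing is written into the repo.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_template("ind_preterito", "Completed past actions.", Path(tmp) / "drafts")
    print(demo_path.name)
```

Quoting the current body at the top of the draft keeps the old text in view while the new sections are filled in.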
## Effort estimate
- Tier 1 (5 topics): ~30 min research + 30 min draft + 15 min integrate = **~75 min per topic, ~6 hours total**.
- Tier 2 (4 topics): faster once the format is dialed in. ~45 min each, ~3 hours.
- Tier 3 (9 guides): ~30 min each (most are compound tenses with similar structure), ~5 hours.
- Grammar notes audit + fill: ~10 min audit each × 36 = 6 hours; ~30 min fill on the ~10 that need it = 5 hours. Total ~11 hours.
**Total scoped at ~25 hours.** Spread across sessions: maybe one tier-1 topic per session, two tier-2 or three tier-3 per session once the format's locked in.
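
The arithmetic behind the total can be checked in a few lines. The sketch recomputes the Tier 1, Tier 2, and grammar-note figures from their per-item minutes and takes the rounded Tier 3 figure as stated.

```python
# Tier 1: research + draft + integrate, five topics.
tier1_minutes_per_topic = 30 + 30 + 15
tier1_hours = 5 * tier1_minutes_per_topic / 60

# Tier 2: four topics at ~45 minutes each.
tier2_hours = 4 * 45 / 60

# Tier 3: rounded figure taken directly from the estimate.
tier3_hours = 5

# Grammar notes: 10-minute audit for all 36, 30-minute fill for ~10 of them.
notes_hours = 36 * 10 / 60 + 10 * 30 / 60

total_hours = tier1_hours + tier2_hours + tier3_hours + notes_hours

print(f"tier 1: {tier1_hours} h")   # tier 1: 6.25 h
print(f"tier 2: {tier2_hours} h")   # tier 2: 3.0 h
print(f"notes:  {notes_hours} h")   # notes:  11.0 h
print(f"total:  {total_hours} h")   # total:  25.25 h
```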
## Ship plan
- Each commit is one topic enriched. Small, reviewable diffs.
- `courseDataVersion` bumps per commit so the change propagates on next launch.
- The user can preview new bodies in the in-app Guide tab once the commit hits Gitea: no separate deploy step is needed, only a rebuild and reinstall.
- The plan doc itself lives here so future sessions can pick up where this one left off without needing to re-derive the structure.
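
The reason a `courseDataVersion` bump propagates on next launch can be illustrated with a small version gate. This is a Python sketch of the general idea only. The real logic lives in `DataLoader.swift` and is not shown in this plan, so the comparison rule and the function names here are assumptions.

```python
def needs_reseed(stored_version, bundled_version):
    """True when the bundled data is newer than what this install last seeded.

    stored_version is None on a fresh install that has never seeded.
    """
    return stored_version is None or stored_version < bundled_version


def launch(stored_version, bundled_version):
    """Return (action, version to persist) for one app launch."""
    if needs_reseed(stored_version, bundled_version):
        return "reseed", bundled_version
    return "skip", stored_version


print(launch(7, 8))      # ('reseed', 8)
print(launch(8, 8))      # ('skip', 8)
print(launch(None, 8))   # ('reseed', 8)
```

Under this model, a commit that changes a guide body without bumping the version would leave existing installs on the old text, which is why the bump belongs in every enrichment commit.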
## Out of scope (intentional)
- Audio recordings of example sentences (could be a future TTS pre-bake).
- Per-region variants (Latin American vs Castilian usage notes) — flag when they matter (vosotros, leísmo), don't comprehensively document.
- Interactive exercises tied to each guide (separate Tests/Quiz infrastructure exists; cross-link instead of duplicate).
- Translation of the guides into Spanish (current guides are English-explanation, Spanish-examples; keep that asymmetry).
- A complete grammar-textbook rewrite. Stop at "depth a teacher would hand out as supplementary material."