Spanish

Author	SHA1	Message	Date
Trey T	05a367fdbe	Books — capture <li> vocab bullets the extractor was silently dropping extract_epub.py was walking <p> only, but every "Vocabulario" section in the Olly Richards EPUB lives inside <ul><li>...</li></ul>. That meant the heading made it through but the entries didn't — 680 vocab lines across 24 sections in this book were missing from the bundled JSON. Audit (text-node owner by closest block ancestor) confirmed <li> is the only silent drop: 5,260 nodes in <p>, 1,960 in <li>, 0 anywhere else. No <h1>-<h6>, tables, or blockquotes in this EPUB at all. Fix: walk find_all(["p", "li"]) in document order so bullet entries slot in right after their "Vocabulario" / list heading. Re-extracted (2,646 → 3,326 paragraphs), re-translated all 118 jobs in parallel Claude Code subagents. translate_chapters.py prompt template now tells subagents to keep bilingual `palabra = meaning` lines verbatim — both sides already coexist on the line. Bumped bookDataVersion to 2 so refreshBooksDataIfNeeded re-seeds. Verified in simulator: all 13 chapter row sizes grew (e.g. ch6 18,295→20,951 chars). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 10:10:34 -05:00
Trey T	09e49bda2c	Add Books — read EPUB-imported books in Practice with tap-to-define New "Books" row in the Practice tab opens a library of bundled bilingual books. Each chapter renders Spanish paragraph-by-paragraph; tap any word for a definition sheet (DictionaryService with on-device AI fallback), or toggle the toolbar button to swap to the pre-computed English translation inline. Local-only Book + BookChapter SwiftData models added to the local container schema (reset version bumped to 5). DataLoader.seedBooks walks the bundle for `book_*.json` resources, so future books drop in without touching app code — just bundle a new JSON and bump bookDataVersion. First book: Olly Richards' "Spanish Short Stories For Beginners Vol 2" — 13 chapters, 2,646 paragraphs, bilingual. Scripts/books/ is the repeatable pipeline for future EPUBs: extract_epub.py → translate_chapters.py (per-chapter resumable jobs) → bundle_book.py. Translation is done by parallel Claude Code subagents reading per-job input files and writing output files — no API key required, matching the pattern used for the textbook vocab vision pass. See Scripts/books/README.md for the full how-to. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 09:21:44 -05:00
Trey T	05a0cc0d17	Issue #32 cleanup — drop the last 5 mis-oriented vocab pairs Two small fixes after the LLM-vision pass: 1. merge_pdf_into_book.py — when the LLM classifies an image as 'hybrid' but extracts zero pairs (e.g., a conjugation table whose only English text is on the section header that was excluded by the prompt rules), respect that decision instead of falling through to the bbox/heuristic pipeline. Previously: 1 chapter-2 estar conjugation table generated 4 bad pairs from the heuristic fallback. 2. fix_vocab.py language_score — recognize Spanish present-perfect ('he tenido', 'He andado por este pueblo') as Spanish. The classifier was treating the auxiliary 'he'/'has'/'ha' as English subject pronouns, producing false-positive mis-orientation flags on 4 chapter-15/20/23 present-perfect example tables. Result: mis-oriented vocab pairs across the book go from 5 → 0. textbookDataVersion bumped to 14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:52:53 -05:00
Trey T	f368c24ad6	Fixes #32 — LLM vision pass for vocab pairs, fixes scrambled English/Spanish The bbox-OCR pipeline mis-paired ~114 vocab tables across the book — the chapter 7 "Other Idioms" image (issue #32) being the most visible. Three failure modes were collapsing the data: 1) classifier blind to subject pronouns ("yo", "I", etc.) 2) right-then-left OCR reads on 2-col tables 3) Y-cluster drift on multi-line cells in 4-col layouts Replaced the entire vocab-extraction tier with a Claude vision pass over all 931 vocab images. Output is keyed by image with three classifications: - pair_table (extract all Spanish↔English pairs) - reference_only (Spanish-only conjugation tables — no pairs, UI shows the flat OCR lines as a reference list instead) - hybrid (some header pairs + reference content beneath; only the genuine pairs become cards) merge_pdf_into_book.py now picks pair source by priority: llm-vision → bounding-box OCR → block-alternation heuristic. Numbers (across the whole book): - mis-oriented tables: 114 → 5 - quarantined cards: 250 → 2 - extracted pairs: 2832 → 4569 textbookDataVersion bumped to 13. Per-batch agent outputs gitignored under Conjuga/Scripts/textbook/paired_vocab_llm/ — only the merged paired_vocab_llm.json (also gitignored) is needed to rebuild. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:48:04 -05:00
Trey t	fcb907718a	Re-curate videos toward preferred channels Swap 24 tense-guide / grammar-note videos to The Language Tutor's numbered lesson series where a matching lesson exists, filling the two remaining gaps (ind_preterito_anterior → Lesson 65, estar-gerund-progressive → Lesson 113). All 32 TLT picks preserved on this pass. For the non-TLT slots, prefer BaseLang's beginner lesson series where a topic-specific video exists: ser-vs-estar, preterite-vs-imperfect, subjunctive-triggers, object-pronouns, conditional-if-clauses, tener-expressions, future-vs-ir-a, possessive-adjectives, irregular-yo-verbs, and stem-changing-verbs. Retire both Tell Me In Spanish videos (personal-a → castellano4U, types-of-irregular-verbs → Master IRREGULAR VERBS Complete Lesson). Generator header note clarifies that "not available on this app" rows are a transient yt-dlp extraction limit; videos still play when tapped in the app via the Stream button, which opens youtube.com externally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 07:24:53 -05:00
Trey t	9c7033d1b4	Add curated-videos markdown report + generator script youtube_videos.md lists every entry in youtube_videos.json with its tense-guide / grammar-note id, title, channel, upload date, duration, views, and likes (where public). Also flags the two topics with no curated video so the gap is auditable in one place. generate_videos_markdown.py queries yt-dlp in parallel for each unique videoId and writes the markdown. Rerun when curation changes. One current entry (saber-vs-conocer → j87i7MVCvIE) is now marked Private Video — needs re-curation as a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 07:07:41 -05:00
Trey T	5f90a01314	Render textbook vocab as paired Spanish→English grid Previously the chapter reader showed vocab tables as a flat list of OCR lines — because Vision reads columns top-to-bottom, the Spanish column appeared as one block followed by the English column, making pairings illegible. Now every vocab table renders as a 2-column grid with Spanish on the left and English on the right. Supporting changes: - New ocr_all_vocab.swift: bounding-box OCR over all 931 vocab images, cluster lines into rows by Y-coordinate, split rows by largest X-gap, detect 2- / 3- / 4-column layouts automatically. ~2800 pairs extracted this pass vs ~1100 from the old block-alternation heuristic. - merge_pdf_into_book.py now prefers bounding-box pairs when present, falls back to the heuristic, embeds the resulting pairs as vocab_table.cards in book.json. - DataLoader passes cards through to TextbookBlock on seed. - TextbookChapterView renders cards via SwiftUI Grid (2 cols). - fix_vocab.py quarantine rule relaxed — only mis-pairs where both sides are clearly the same language are removed. "unknown" sides stay (bbox pipeline already oriented them correctly). Textbook card count jumps from 1044 → 3118 active pairs. textbookDataVersion bumped to 9. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 15:58:41 -05:00
Trey T	63dfc5e41a	Add textbook reader, exercise grading, stem-change toggle, extraction pipeline Major changes: - Textbook UI: chapter list, reader, and interactive exercise view (keyboard + Apple Pencil) surfaced under the Course tab. 30 chapters, 251 exercises. - Stem-change conjugation toggle on Week 4 flashcard decks (E-IE, E-I, O-UE). Uses existing VerbForm + IrregularSpan data to render highlighted present tense conjugations inline. - Deterministic on-device answer grader with partial credit (correct / close for accent-stripped or single-char-typo / wrong). 11 unit tests cover it. - SharedModels: TextbookChapter (local), TextbookExerciseAttempt (cloud-synced), AnswerGrader helpers. Bumped schema. - DataLoader: textbook seeder (version 8) + refresh helpers that preserve LanGo course decks when textbook data is re-seeded. - Local extraction pipeline in Conjuga/Scripts/textbook/: XHTML chapter parser, answer-key parser, macOS Vision image OCR + PDF page OCR, merger, NSSpellChecker validator, language-aware auto-fixer, and repair pass that re-pairs quarantined vocab rows using bounding-box coordinates. - UI test target (ConjugaUITests) with three tests: end-to-end textbook flow, all-chapters screenshot audit, and stem-change toggle verification. Generated textbook content (textbook_data.json, textbook_vocab.json) and third-party source files are gitignored; re-run Scripts/textbook/run_pipeline.sh locally to regenerate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 15:12:55 -05:00
Trey T	5b69f3b630	Fixes #19 — Add English translations to exceptional yo form flashcards Cards now show "tengo — I have" instead of just "tengo", so learners see the English meaning alongside the Spanish yo form. Bumps course data version to 6 to trigger re-seed on next launch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 08:02:40 -05:00
Trey t	4b467ec136	Initial commit: Conjuga Spanish conjugation app Includes SwiftData dual-store architecture (local reference + CloudKit user data), JSON-based data seeding, 20 tense guides, 20 grammar notes, SRS review system, course vocabulary, and widget support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 20:58:33 -05:00

10 Commits
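
The fix described in commit 05a367fdbe, walking `<p>` and `<li>` together in document order instead of `<p>` alone, can be sketched roughly as below. This is an illustration only, not the contents of extract_epub.py: the commit message mentions `find_all(["p", "li"])`, which suggests BeautifulSoup, whereas this sketch swaps in Python's standard-library `html.parser` so it runs without dependencies. The names `BlockTextCollector` and `extract_blocks` and the sample markup are hypothetical, and the sketch assumes well-formed XHTML where every `<p>` and `<li>` is explicitly closed.

```python
from html.parser import HTMLParser

# Assumption: these are the only block tags that own text nodes in the EPUB.
# Passing {"p"} alone reproduces the silent drop described in the commit.
BLOCK_TAGS = frozenset({"p", "li"})


class BlockTextCollector(HTMLParser):
    """Hypothetical helper: collect <p>/<li> text in document order."""

    def __init__(self, block_tags=BLOCK_TAGS):
        super().__init__(convert_charrefs=True)
        self.block_tags = set(block_tags)
        self.blocks = []   # finished block texts, in document order
        self._depth = 0    # > 0 while inside a tracked block
        self._parts = []   # text fragments of the block being read

    def handle_starttag(self, tag, attrs):
        if tag in self.block_tags:
            if self._depth == 0:
                self._parts = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.block_tags and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace; skip empty blocks.
                text = " ".join("".join(self._parts).split())
                if text:
                    self.blocks.append(text)

    def handle_data(self, data):
        # Text outside any tracked block is ignored.
        if self._depth > 0:
            self._parts.append(data)


def extract_blocks(markup, block_tags=BLOCK_TAGS):
    """Return the text of every tracked block, in document order."""
    collector = BlockTextCollector(block_tags)
    collector.feed(markup)
    collector.close()
    return collector.blocks


# Hypothetical sample shaped like a "Vocabulario" section.
SAMPLE = """
<p>Vocabulario</p>
<ul>
  <li>el pueblo = town</li>
  <li>andar = to walk</li>
</ul>
<p>Preguntas</p>
"""

p_only = extract_blocks(SAMPLE, {"p"})   # heading survives, entries do not
p_and_li = extract_blocks(SAMPLE)        # entries slot in after the heading
```

With the sample above, the `<p>`-only walk keeps the "Vocabulario" heading but loses both bullet entries, while the combined walk returns them immediately after the heading. Nested tracked blocks are merged into their outermost block here, a simplification that the real extractor may handle differently.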