6 Commits

Author SHA1 Message Date
Trey T 179400b90d Course — Review Course Material row with bundled weekly PDFs
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 15:40:02 -05:00
Trey T f368c24ad6 Fixes #32 — LLM vision pass for vocab pairs, fixes scrambled English/Spanish
The bbox-OCR pipeline mis-paired ~114 vocab tables across the book — the
chapter 7 "Other Idioms" image (issue #32) being the most visible.
Three failure modes were corrupting the data:
  1) classifier blind to subject pronouns ("yo", "I", etc.)
  2) OCR reading 2-col tables right-to-left instead of left-to-right
  3) Y-cluster drift on multi-line cells in 4-col layouts
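
For context, failure modes 2 and 3 both concern how OCR boxes are grouped into table rows. A minimal sketch of that kind of grouping, assuming a hypothetical (text, x_left, y_center) box shape with y increasing down the page; the function name and tolerance are illustrative and not taken from the pipeline:

```python
from typing import List, Tuple

# Hypothetical box shape: (text, x_left, y_center), with y increasing downward.
Box = Tuple[str, float, float]


def rows_from_boxes(boxes: List[Box], y_tolerance: float = 0.01) -> List[List[str]]:
    """Group OCR boxes into rows by y, then order each row left to right."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[2]):
        # Compare against the first box of the current row so the row's
        # anchor does not drift as more boxes are appended.
        if rows and abs(box[2] - rows[-1][0][2]) <= y_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Sorting by x inside each row restores left-to-right column order
    # regardless of the order the OCR engine emitted the boxes in.
    return [[text for text, _, _ in sorted(row, key=lambda b: b[1])] for row in rows]
```

In this sketch a multi-line cell whose lines fall outside y_tolerance would be split into separate rows, which is one way a drift like failure mode 3 can arise.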

Replaced the entire vocab-extraction tier with a Claude vision pass over
all 931 vocab images. Output is keyed by image with three classifications:
  - pair_table       (extract all Spanish↔English pairs)
  - reference_only   (Spanish-only conjugation tables — no pairs, UI shows
                      the flat OCR lines as a reference list instead)
  - hybrid           (some header pairs + reference content beneath; only
                      the genuine pairs become cards)
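
The three classifications above could be consumed roughly as follows. This is a sketch only: the record keys ("classification", "pairs", "es", "en") are assumed for illustration, since the schema of paired_vocab_llm.json is not shown in this log.

```python
from typing import List, Tuple


def cards_for_image(entry: dict) -> List[Tuple[str, str]]:
    """Return (spanish, english) card pairs for one classified vocab image.

    `entry` is a hypothetical record; the key names are assumptions.
    """
    kind = entry["classification"]
    if kind == "reference_only":
        # Spanish-only conjugation tables yield no cards.
        return []
    if kind in ("pair_table", "hybrid"):
        # For hybrid images, "pairs" is assumed to hold only the genuine
        # header pairs; reference content beneath them is not included.
        return [(p["es"], p["en"]) for p in entry.get("pairs", [])]
    raise ValueError(f"unknown classification: {kind!r}")
```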

merge_pdf_into_book.py now picks pair source by priority:
  llm-vision → bounding-box OCR → block-alternation heuristic.
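
A minimal sketch of that priority fallback. The source keys and the function name are hypothetical, and merge_pdf_into_book.py may implement the choice differently. The sketch assumes None means "this source produced nothing for the image", while an empty list is a real answer (for example a reference_only image with no pairs).

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]

# Highest priority first, mirroring the order in the commit message.
SOURCE_PRIORITY = ("llm_vision", "bbox_ocr", "block_alternation")


def pick_pair_source(candidates: Dict[str, Optional[List[Pair]]]) -> Tuple[str, List[Pair]]:
    """Return the name and pairs of the highest-priority source with output."""
    for name in SOURCE_PRIORITY:
        pairs = candidates.get(name)
        if pairs is not None:
            return name, pairs
    # No source had any output for this image.
    return "none", []
```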

Numbers (across the whole book):
  - mis-oriented tables: 114 → 5
  - quarantined cards:   250 → 2
  - extracted pairs:     2832 → 4569

textbookDataVersion bumped to 13. Per-batch agent outputs are gitignored
under Conjuga/Scripts/textbook/paired_vocab_llm/; only the merged
paired_vocab_llm.json (also gitignored) is needed to rebuild.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:48:04 -05:00
Trey T cd491bd695 Bundle textbook JSON so fresh clones build without re-running pipeline
The pbxproj references textbook_data.json and textbook_vocab.json as Copy
Bundle Resources, so xcodebuild fails if they're missing. Committing the
generated output keeps the repo self-sufficient — regenerate via
Conjuga/Scripts/textbook/run_pipeline.sh when content changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:28:45 -05:00
Trey T 63dfc5e41a Add textbook reader, exercise grading, stem-change toggle, extraction pipeline
Major changes:
- Textbook UI: chapter list, reader, and interactive exercise view (keyboard
  + Apple Pencil) surfaced under the Course tab. 30 chapters, 251 exercises.
- Stem-change conjugation toggle on Week 4 flashcard decks (E-IE, E-I, O-UE).
  Uses existing VerbForm + IrregularSpan data to render highlighted present
  tense conjugations inline.
- Deterministic on-device answer grader with partial credit: correct, close
  (accents missing or a single-character typo), or wrong. Covered by 11 unit
  tests.
- SharedModels: TextbookChapter (local), TextbookExerciseAttempt (cloud-
  synced), AnswerGrader helpers. Bumped schema.
- DataLoader: textbook seeder (version 8) + refresh helpers that preserve
  LanGo course decks when textbook data is re-seeded.
- Local extraction pipeline in Conjuga/Scripts/textbook/ — XHTML chapter
  parser, answer-key parser, macOS Vision image OCR + PDF page OCR, merger,
  NSSpellChecker validator, language-aware auto-fixer, and repair pass that
  re-pairs quarantined vocab rows using bounding-box coordinates.
- UI test target (ConjugaUITests) with three tests: end-to-end textbook
  flow, all-chapters screenshot audit, and stem-change toggle verification.
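
The three-way grading described in the list above can be sketched as follows. The app's grader lives in Swift and is not shown here, so this Python version is an illustration only: the function names, the case and whitespace normalization, and the exact meaning of "single-char typo" (one insertion, deletion, or substitution) are assumptions.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks, so 'está' becomes 'esta' (and 'ñ' becomes 'n')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one insertion, deletion, or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # substitution: advance both strings
        j += 1      # otherwise skip the extra character in b
    # A trailing unmatched character in b counts as one more edit.
    return edits + (len(b) - j) <= 1


def grade(answer: str, expected: str) -> str:
    """Return 'correct', 'close', or 'wrong' for a typed answer."""
    def norm(s: str) -> str:
        # Assumed normalization: NFC form, collapsed whitespace, lowercase.
        return " ".join(unicodedata.normalize("NFC", s).split()).lower()

    a, e = norm(answer), norm(expected)
    if a == e:
        return "correct"
    if strip_accents(a) == strip_accents(e) or within_one_edit(a, e):
        return "close"
    return "wrong"
```

The grader is deterministic because it depends only on the two strings; a transposition such as "ab" versus "ba" counts as two edits here and is graded wrong.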

Generated textbook content (textbook_data.json, textbook_vocab.json) and
third-party source files are gitignored — re-run Scripts/textbook/run_pipeline.sh
locally to regenerate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:12:55 -05:00
Trey t 47a7871c38 Add 13 new grammar notes with 1010 exercises from video extraction
Scraped a 4h Spanish fundamentals YouTube video (transcript + OCR on
14810 frames), extracted structured content across 52 chapters, and
generated fill-in-the-blank quizzes for every grammar topic.

- 13 new GrammarNote entries (articles, possessives, demonstratives,
  greetings, poder, al/del, prepositional pronouns, irregular yo,
  stem-changing, stressed possessives, present/future perfect, present
  indicative conjugation)
- 1010 generated exercises across all 36 grammar notes (new + existing)
- Fix tense guide parser to handle unnumbered *Usages* blocks
- Rewrite 6 broken tense guide bodies (imperative, subj pluperfect,
  subj future) with numbered usage format
- Bump courseDataVersion 5→6 with TenseGuide refresh on upgrade
- Add docs/spanish-fundamentals/ with raw transcripts, polished notes,
  structured JSON, and exercise data
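
The version bump with refresh on upgrade follows a common version-gated reseed pattern, sketched below. The settings dict stands in for whatever persistent store the app uses, and the function name and callback are hypothetical; only the version numbers come from the commit message.

```python
from typing import Callable, Dict

BUNDLED_COURSE_DATA_VERSION = 6  # from the commit message: bumped 5 -> 6


def refresh_if_outdated(
    settings: Dict[str, int],
    refresh: Callable[[], None],
    bundled_version: int = BUNDLED_COURSE_DATA_VERSION,
) -> bool:
    """Run `refresh` once when the stored data version is older than the bundled one."""
    stored = settings.get("courseDataVersion", 0)
    if stored >= bundled_version:
        return False
    refresh()
    # Record the new version only after the refresh succeeded, so a failed
    # refresh is retried on the next launch.
    settings["courseDataVersion"] = bundled_version
    return True
```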

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:40:05 -05:00
Trey t 4b467ec136 Initial commit: Conjuga Spanish conjugation app
Includes SwiftData dual-store architecture (local reference + CloudKit user data),
JSON-based data seeding, 20 tense guides, 20 grammar notes, SRS review system,
course vocabulary, and widget support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 20:58:33 -05:00