Books — pre-computed per-book glossary for context-correct word lookup

The book reader's word lookup used DictionaryService, a verb-conjugation
index plus ~200 hand-typed words: ordinary nouns like "taza" returned
nothing, and homographs always lost (tapping "como" in "como siempre"
gave the verb "comer" because the verb index is checked first).

Add a glossary phase to the books pipeline (build_glossary.py): every
distinct Spanish word is translated once, in its sentence context, by
the same Claude-Code-subagent LLM step the pipeline already uses for
chapter translation. English front matter is excluded by an ES==EN
paragraph-ratio heuristic. The glossary is bundled into book_<slug>.json
and is now part of the pipeline for every book.

In the app, Book carries the decoded glossary and BookReaderView resolves
each tap automatically through cache -> glossary -> DictionaryService ->
on-device LLM, citing which source answered so a curated glossary hit
reads differently from a best-effort AI guess.

book_olly-vol2.json regenerated with a 3,658-word glossary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Trey T
2026-05-18 10:44:32 -05:00
parent d0582c4ce7
commit 3ee1563cb0
10 changed files with 18669 additions and 24 deletions
+16 -4
View File
@@ -23,11 +23,13 @@ fi
EPUB="$1"; shift
SLUG=""
BATCH_SIZE="30"
GLOSSARY_BATCH_SIZE="150"
while [[ $# -gt 0 ]]; do
case "$1" in
--slug) SLUG="$2"; shift 2 ;;
--batch-size) BATCH_SIZE="$2"; shift 2 ;;
--glossary-batch-size) GLOSSARY_BATCH_SIZE="$2"; shift 2 ;;
*) echo "unknown option: $1" >&2; exit 2 ;;
esac
done
@@ -53,12 +55,22 @@ python3 translate_chapters.py "$SLUG" --batch-size "$BATCH_SIZE"
PENDING_FILE="build/$SLUG/jobs/_pending.txt"
PENDING_COUNT=$(wc -l < "$PENDING_FILE" | tr -d ' ')
echo
echo "=== Phase 2b: build_glossary.py ==="
python3 build_glossary.py "$SLUG" --batch-size "$GLOSSARY_BATCH_SIZE"
GLOSS_PENDING_FILE="build/$SLUG/glossary/_pending.txt"
GLOSS_PENDING_COUNT=$(wc -l < "$GLOSS_PENDING_FILE" | tr -d ' ')
TOTAL_PENDING=$((PENDING_COUNT + GLOSS_PENDING_COUNT))
echo
echo "=== Phase 3: bundle_book.py ==="
if [[ "$PENDING_COUNT" -gt 0 ]]; then
echo " $PENDING_COUNT translation job(s) still pending."
echo " Run the Claude Code subagent translation step (see README.md), then re-run this script."
echo " Bundling with empty placeholders so you can preview app structure now."
if [[ "$TOTAL_PENDING" -gt 0 ]]; then
echo " $PENDING_COUNT translation job(s) and $GLOSS_PENDING_COUNT glossary job(s) still pending."
echo " Run the Claude Code subagent step (see README.md) for BOTH manifests:"
echo " build/$SLUG/jobs/_pending.txt (translation)"
echo " build/$SLUG/glossary/_pending.txt (glossary)"
echo " then re-run this script. Bundling with placeholders so you can preview now."
python3 bundle_book.py "$SLUG"
else
python3 bundle_book.py "$SLUG" --require-all