Books — pre-computed per-book glossary for context-correct word lookup
The book reader's word lookup used DictionaryService, a verb-conjugation index plus ~200 hand-typed words: ordinary nouns like "taza" returned nothing, and homographs always lost (tapping "como" in "como siempre" gave the verb "comer" because the verb index is checked first).

Add a glossary phase to the books pipeline (build_glossary.py): every distinct Spanish word is translated once, in its sentence context, by the same Claude-Code-subagent LLM step the pipeline already uses for chapter translation. English front matter is excluded by an ES==EN paragraph-ratio heuristic. The glossary is bundled into book_<slug>.json and is now part of the pipeline for every book.

In the app, Book carries the decoded glossary and BookReaderView resolves each tap automatically through cache -> glossary -> DictionaryService -> on-device LLM, citing which source answered so a curated glossary hit reads differently from a best-effort AI guess.

book_olly-vol2.json regenerated with a 3,658-word glossary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
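The lookup order described in the message (cache -> glossary -> DictionaryService -> on-device LLM, with the answering source reported) can be sketched as a simple fallback chain. The app code is not part of this diff, so this is only an illustration in Python: `Lookup`, `resolve_tap`, the tier labels, the word-keyed cache, and the toy dictionaries are all hypothetical stand-ins, not the app's actual types.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Lookup:
    word: str
    meaning: str
    source: str  # which tier answered, so the UI can label the result


# A resolver takes (word, sentence) and returns a meaning, or None on a miss.
Resolver = Callable[[str, str], Optional[str]]


def resolve_tap(
    word: str,
    sentence: str,
    cache: Dict[str, Lookup],
    tiers: List[Tuple[str, Resolver]],
) -> Optional[Lookup]:
    """Try the cache first, then each tier in order; remember the first hit."""
    key = word.lower()  # assumption: the cache is keyed by lowercased word
    if key in cache:
        return cache[key]
    for source, resolver in tiers:
        meaning = resolver(word, sentence)
        if meaning:
            result = Lookup(word, meaning, source)
            cache[key] = result
            return result
    return None


# Toy data standing in for the bundled glossary and the verb index.
glossary = {"taza": "cup", "como": "as, like"}
verb_index = {"como": "I eat (comer)", "hablo": "I speak (hablar)"}

tiers: List[Tuple[str, Resolver]] = [
    ("glossary", lambda w, s: glossary.get(w.lower())),
    ("dictionary", lambda w, s: verb_index.get(w.lower())),
    ("on-device LLM", lambda w, s: f"(best-effort guess for '{w}')"),
]

cache: Dict[str, Lookup] = {}
hit = resolve_tap("como", "como siempre", cache, tiers)
```

Because the glossary tier sits ahead of the verb index in this sketch, "como" resolves to the glossary entry instead of falling through to "comer", and `hit.source` records which tier answered.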
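The commit message does not spell out the ES==EN paragraph-ratio heuristic. One plausible reading, sketched here purely as an assumption, is that a chapter counts as English front matter when most of its "Spanish" paragraphs are identical to their English translations, since text that is already English translates to itself. The function names and the 0.8 threshold are hypothetical and are not taken from build_glossary.py.

```python
from typing import Iterable, List, Tuple


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def same_text_ratio(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of non-empty (es, en) paragraph pairs whose texts are identical."""
    kept: List[Tuple[str, str]] = [(es, en) for es, en in pairs if es.strip()]
    if not kept:
        return 0.0
    same = sum(1 for es, en in kept if _normalize(es) == _normalize(en))
    return same / len(kept)


def is_english_front_matter(
    pairs: Iterable[Tuple[str, str]], threshold: float = 0.8
) -> bool:
    """Hypothetical rule: skip a chapter when ES == EN for most paragraphs."""
    return same_text_ratio(pairs) >= threshold


front = [
    ("All rights reserved.", "All rights reserved."),
    ("How to use this book", "How to use this book"),
]
story = [
    ("La taza está vacía.", "The cup is empty."),
    ("Como siempre.", "As always."),
]
```

Under these assumptions `front` would be excluded from the glossary and `story` would be kept.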
@@ -23,11 +23,13 @@ fi
 EPUB="$1"; shift
 SLUG=""
 BATCH_SIZE="30"
+GLOSSARY_BATCH_SIZE="150"
 
 while [[ $# -gt 0 ]]; do
   case "$1" in
     --slug) SLUG="$2"; shift 2 ;;
     --batch-size) BATCH_SIZE="$2"; shift 2 ;;
+    --glossary-batch-size) GLOSSARY_BATCH_SIZE="$2"; shift 2 ;;
     *) echo "unknown option: $1" >&2; exit 2 ;;
   esac
 done
@@ -53,12 +55,22 @@ python3 translate_chapters.py "$SLUG" --batch-size "$BATCH_SIZE"
 PENDING_FILE="build/$SLUG/jobs/_pending.txt"
 PENDING_COUNT=$(wc -l < "$PENDING_FILE" | tr -d ' ')
 
+echo
+echo "=== Phase 2b: build_glossary.py ==="
+python3 build_glossary.py "$SLUG" --batch-size "$GLOSSARY_BATCH_SIZE"
+
+GLOSS_PENDING_FILE="build/$SLUG/glossary/_pending.txt"
+GLOSS_PENDING_COUNT=$(wc -l < "$GLOSS_PENDING_FILE" | tr -d ' ')
+TOTAL_PENDING=$((PENDING_COUNT + GLOSS_PENDING_COUNT))
+
 echo
 echo "=== Phase 3: bundle_book.py ==="
-if [[ "$PENDING_COUNT" -gt 0 ]]; then
-  echo " $PENDING_COUNT translation job(s) still pending."
-  echo " Run the Claude Code subagent translation step (see README.md), then re-run this script."
-  echo " Bundling with empty placeholders so you can preview app structure now."
+if [[ "$TOTAL_PENDING" -gt 0 ]]; then
+  echo " $PENDING_COUNT translation job(s) and $GLOSS_PENDING_COUNT glossary job(s) still pending."
+  echo " Run the Claude Code subagent step (see README.md) for BOTH manifests:"
+  echo " build/$SLUG/jobs/_pending.txt (translation)"
+  echo " build/$SLUG/glossary/_pending.txt (glossary)"
+  echo " then re-run this script. Bundling with placeholders so you can preview now."
   python3 bundle_book.py "$SLUG"
 else
   python3 bundle_book.py "$SLUG" --require-all
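Phase 2b hands build_glossary.py a batch size (150 by default in the script above). build_glossary.py itself is not shown in this diff, so the following is only a rough sketch of what "every distinct Spanish word, translated once, in its sentence context" and batching could look like. The tokenizer, the choice of the first sentence a word appears in as its context, and the names `collect_words` and `make_batches` are assumptions for illustration.

```python
import re
from typing import Dict, Iterable, List

# Runs of letters only (accented letters included; digits and underscores excluded).
WORD_RE = re.compile(r"[^\W\d_]+")


def collect_words(sentences: Iterable[str]) -> Dict[str, str]:
    """Map each distinct lowercased word to the first sentence it appears in."""
    first_context: Dict[str, str] = {}
    for sentence in sentences:
        for token in WORD_RE.findall(sentence):
            first_context.setdefault(token.lower(), sentence)
    return first_context


def make_batches(
    entries: Dict[str, str], batch_size: int = 150
) -> List[List[Dict[str, str]]]:
    """Split word/context entries into jobs of at most batch_size words each."""
    items = [{"word": w, "context": s} for w, s in entries.items()]
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


sentences = ["Como siempre, la taza está vacía.", "La taza es azul."]
words = collect_words(sentences)
batches = make_batches(words, batch_size=3)
```

Each batch would then correspond to one pending glossary job for the subagent step. A larger batch size means fewer jobs but more words per LLM call.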