05a0cc0d17
Two small fixes after the LLM-vision pass:
1. merge_pdf_into_book.py — when the LLM classifies an image as 'hybrid'
but extracts zero pairs (e.g., a conjugation table whose only English
text is on the section header that was excluded by the prompt rules),
respect that decision instead of falling through to the bbox/heuristic
pipeline. Previously: 1 chapter-2 estar conjugation table generated
4 bad pairs from the heuristic fallback.
2. fix_vocab.py language_score — recognize Spanish present-perfect
('he tenido', 'He andado por este pueblo') as Spanish. The classifier
was treating the auxiliary 'he'/'has'/'ha' as English subject pronouns,
producing false-positive mis-orientation flags on 4 chapter-15/20/23
present-perfect example tables.
Result: mis-oriented vocab pairs across the book go from 5 → 0.
textbookDataVersion bumped to 14.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>