Vocab study — noun & adjective flashcards with CEFR level toggles

Add SRS-driven noun and adjective flashcards modeled on the existing verb
flashcard flow:

- SharedModels/Lexeme: catalog of non-verb vocab, frequency-ranked, with
  gender for nouns and optional example sentences. Seeded from a bundled
  vocab_lexemes.json built by Scripts/vocab/build_lexemes.py, which joins
  frequency.csv + es-en.data from a pinned doozan/spanish_data commit
  (CC-BY-SA: hermitdave/FrequencyWords + Wiktionary). 1,449 nouns and 600
  adjectives, each with Wiktionary-sourced gender and (where available) an
  example sentence with English translation.
- LexemeReviewCard + LexemeReviewStore: cloud-synced SM-2 SRS, keyed by
  partOfSpeech + lexemeId + drillMode so future drill modes can coexist.
- LexemeSessionQueue + LexemePool: parallel to VocabSessionQueue; fresh
  cards sort by frequency rank.
- LexemeStudyGroup: cloud-synced resumable session per
  (partOfSpeech, drillMode).
- NounFlashcardPracticeView + AdjectiveFlashcardPracticeView: same flow as
  VocabFlashcardPracticeView (English prompt → tap to reveal Spanish →
  Again/Hard/Good/Easy). Nouns reveal with their article (la taza, el
  problema) so gender is taught alongside meaning, not as a separate quiz.
  Example sentence shown when present.

CEFR-style level toggles:

- LexemeLevel enum (A1/A2/B1/B2/C1+) derived from frequencyRank with
  standard Spanish-frequency-dictionary cutoffs (250/500/1000/2000).
- UserProgress.selectedLexemeLevels: cloud-synced multi-select, defaults to
  A1+A2 on first launch.
- SettingsView gains a "Vocabulary Levels" section with five toggles; the
  existing "Levels" section is renamed "Verb Levels" for clarity.
- Due SRS cards always surface regardless of toggles. Disabling a level only
  stops new cards from that band entering the pool.

PracticeView gets "Nouns" and "Adjectives" rows under "Books".

DataLoader: new lexemeDataVersion gate that re-seeds the Lexeme table from
vocab_lexemes.json independent of book seeding.

project.yml lists the new JSON resource and the existing book_olly-vol2.json
(which the previous build was silently excluding because xcodegen rewrote the
project from project.yml).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,62 @@
# Vocab catalog build

`build_lexemes.py` produces `Conjuga/vocab_lexemes.json`, the bundled catalog
of frequency-ranked Spanish nouns and adjectives that powers the Noun /
Adjective flashcard study modes.

## Run

```sh
python3 build_lexemes.py
```

Downloads `frequency.csv` + `es-en.data` from a pinned commit of
[`doozan/spanish_data`](https://github.com/doozan/spanish_data), caches them
under `.cache/<commit>/`, joins them, and writes the JSON. Re-running is
fast: after the first download, only the join step runs.
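
The download-and-cache step described above could look roughly like the sketch below. It is not the code in `build_lexemes.py`: the helper names (`cached_file`, `http_get`), the placeholder commit value, and the raw-file URL layout are assumptions made for illustration. The point it shows is that keying the cache directory by commit means a file is fetched at most once per pin.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

# Placeholder pin for illustration only; the real value is DOOZAN_COMMIT
# at the top of build_lexemes.py.
DOOZAN_COMMIT = "0123abc"
# Assumed URL layout for raw files in the doozan/spanish_data repository.
RAW_BASE = "https://raw.githubusercontent.com/doozan/spanish_data"


def http_get(url: str) -> bytes:
    """Fetch a URL and return the response body."""
    with urlopen(url) as response:
        return response.read()


def cached_file(
    name: str,
    commit: str = DOOZAN_COMMIT,
    cache_root: Path = Path(".cache"),
    fetch: Callable[[str], bytes] = http_get,
) -> Path:
    """Return the local path for `name`, downloading it only on a cache miss."""
    target = cache_root / commit / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(f"{RAW_BASE}/{commit}/{name}"))
    return target
```

Passing the downloader in as `fetch` keeps the cache logic testable without network access.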

Override defaults:

```sh
python3 build_lexemes.py --max-nouns 3000 --max-adjectives 1000
python3 build_lexemes.py --output /tmp/vocab.json
```
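
For readers who want to see how flags like these are typically wired up, here is a minimal `argparse` sketch. Only the flag names come from the commands above. The default caps and the default output path are placeholders, and `parse_args` is a hypothetical helper name; the script's real defaults may differ.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the three override flags shown above."""
    parser = argparse.ArgumentParser(description="Build the bundled vocab catalog.")
    # Placeholder caps; not the script's confirmed defaults.
    parser.add_argument("--max-nouns", type=int, default=1500)
    parser.add_argument("--max-adjectives", type=int, default=600)
    # Assumed default location, taken from the path the README names.
    parser.add_argument("--output", type=Path, default=Path("Conjuga/vocab_lexemes.json"))
    return parser.parse_args(argv)
```

`argparse` exposes `--max-nouns` as `args.max_nouns`, so the caps can be passed straight into the join step.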

## Data sources & attribution

All datasets are CC-licensed; the bundled catalog inherits CC-BY-SA. Credit
in the app's About screen must read:

> Vocabulary data: Wiktionary (CC-BY-SA), OpenSubtitles via FrequencyWords
> (CC-BY-SA 3.0).

- **`frequency.csv`**: derived from
  [hermitdave/FrequencyWords](https://github.com/hermitdave/FrequencyWords)
  (OpenSubtitles corpus), packaged by doozan. License: CC-BY-SA 3.0.
- **`es-en.data`**: Spanish→English Wiktionary export in the
  [`enwiktionary_wordlist`](https://github.com/doozan/enwiktionary_wordlist)
  format. License: CC-BY-SA.

The pinned doozan commit is at the top of `build_lexemes.py`
(`DOOZAN_COMMIT`). Bump it to refresh; the cache directory is keyed by
commit, so a new pin triggers a fresh download instead of reusing stale files.

## Output shape

```json
[
  {
    "baseForm": "casa",
    "english": "house",
    "partOfSpeech": "noun",
    "gender": "f",
    "frequencyRank": 142,
    "exampleES": "La casa es grande",
    "exampleEN": "The house is big"
  },
  ...
]
```
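
As a rough illustration of how records of this shape might come out of the join, the sketch below walks a frequency-ordered list and looks each word up in a dictionary index. The input structures are hypothetical stand-ins, not the actual `frequency.csv` or `es-en.data` formats, and `join_sources` is an invented name. Two further assumptions: `frequencyRank` is the position in the full frequency list, and optional fields are omitted when absent.

```python
def join_sources(ranked, dictionary, limits):
    """Join frequency order with dictionary data into catalog records.

    ranked:     iterable of (lemma, pos), most frequent first (hypothetical input)
    dictionary: maps (lemma, pos) -> dict with "english" plus optional
                "gender", "exampleES", "exampleEN" (hypothetical input)
    limits:     maps pos -> maximum number of entries to keep
    """
    kept = {pos: 0 for pos in limits}
    records = []
    for rank, (lemma, pos) in enumerate(ranked, start=1):
        entry = dictionary.get((lemma, pos))
        # Skip parts of speech we do not catalog, words without a
        # dictionary entry, and anything past the per-category cap.
        if pos not in limits or entry is None or kept[pos] >= limits[pos]:
            continue
        kept[pos] += 1
        record = {
            "baseForm": lemma,
            "english": entry["english"],
            "partOfSpeech": pos,
            "frequencyRank": rank,
        }
        # Copy optional fields only when the dictionary provides them.
        for key in ("gender", "exampleES", "exampleEN"):
            if entry.get(key):
                record[key] = entry[key]
        records.append(record)
    return records
```

Because the loop consumes `ranked` in frequency order, the resulting list is already sorted by `frequencyRank` ascending.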

Entries are sorted by `frequencyRank` ascending so the fresh-card path in
`LexemePool` surfaces the most frequent, and therefore most useful, words
first.
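
To make the ordering and the level bands concrete, here is a small Python sketch of how a consumer of the catalog could check the sort order and bucket entries by level. The cutoffs (250/500/1000/2000) and level names come from the commit message. Treating each cutoff as an inclusive upper bound is an assumption, and the function names are hypothetical; the app's own `LexemeLevel` and `LexemePool` are Swift types and may behave differently.

```python
from bisect import bisect_left

# Cutoffs and names as stated in the commit message.
LEVEL_CUTOFFS = [250, 500, 1000, 2000]
LEVEL_NAMES = ["A1", "A2", "B1", "B2", "C1+"]


def level_for_rank(rank: int) -> str:
    """Map a frequency rank to a level band.

    Assumes each cutoff is an inclusive upper bound (rank 250 is still A1).
    """
    return LEVEL_NAMES[bisect_left(LEVEL_CUTOFFS, rank)]


def is_sorted_by_rank(lexemes) -> bool:
    """Check that the catalog is in ascending frequencyRank order."""
    ranks = [lexeme["frequencyRank"] for lexeme in lexemes]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def new_card_candidates(lexemes, enabled_levels):
    """Keep catalog order, but only entries whose band is enabled.

    This filters new cards only; due review cards are outside its scope.
    """
    return [
        lexeme
        for lexeme in lexemes
        if level_for_rank(lexeme["frequencyRank"]) in enabled_levels
    ]
```

Under these assumptions, `level_for_rank(142)` places the `casa` entry from the sample above in the A1 band.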