Strange behaviour of LatinBackoffLemmatizer with plural nouns of the second declension #1198
Different lemmas can share the same form. For example, jus is the form of one lemma meaning "law" or "right" and of another lemma meaning "gravy" or "juice". To distinguish them, ambiguous lemmas get a trailing number; here they would be jus1 and jus2. As far as I know, the rule-based lemmatizer is this one (https://github.com/cltk/lat_models_cltk/blob/master/lemmata/latin_lemmata_cltk.py).
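If the trailing index is not needed downstream, it can be split off after lemmatization. Below is a minimal sketch, assuming the lemmatizer output is a list of (token, lemma) pairs and that the disambiguation index is always a run of digits at the end of the lemma (as in jus1/jus2). The helpers `split_lemma_index` and `strip_lemma_indices` are hypothetical names for illustration, not part of the cltk API.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: a disambiguation index is a run of digits at the very end of the
# lemma, preceded by at least one non-digit character (e.g. "jus1", "jus12").
_INDEX_RE = re.compile(r"^(?P<base>.*?\D)(?P<index>\d+)$")


def split_lemma_index(lemma: str) -> Tuple[str, Optional[int]]:
    """Split "jus2" into ("jus", 2); a lemma without an index gets None."""
    match = _INDEX_RE.match(lemma)
    if match is None:
        return lemma, None
    return match.group("base"), int(match.group("index"))


def strip_lemma_indices(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop the disambiguation index from every (token, lemma) pair."""
    return [(token, split_lemma_index(lemma)[0]) for token, lemma in pairs]


if __name__ == "__main__":
    # Illustrative pairs only; this is NOT actual LatinBackoffLemmatizer output.
    sample = [("jura", "jus1"), ("amici", "amicus")]
    print(strip_lemma_indices(sample))  # [('jura', 'jus'), ('amici', 'amicus')]
```

Note the trade-off: stripping the index merges homographs such as jus1 and jus2 into a single jus, so it only makes sense when that distinction does not matter for the task. `split_lemma_index` keeps the number available for cases where it does.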
@diyclassics can probably give you more details on how to tell which meaning is attached to which numbered lemma.
Thank you very much, Clément! So this isn't a bug but a deliberate choice: the final number is used to disambiguate. Good to know!
This is not a bug, but it should be better documented.
When processing plural Latin nouns of the second declension, the `LatinBackoffLemmatizer` sometimes adds a trailing digit to the lemma. I observed this strange behaviour with the term "lupus":
On the other hand, the term "amicus" does not show this behaviour:
I suspect the fault lies with the `DictLemmatizer`.
Environment: Windows 10, Python 3.9.15, cltk 1.1.6