The model trained for a non-English language is converting the single lowercase "i" into uppercase "I" #61
Comments
preniqivjosa changed the title twice on Jul 17, 2020
Hi, are you using the convert_to_readable.py or demo_play_with_model.py scripts? Both convert the first letter of the first word in each sentence to uppercase (title case, i.e. .title() in Python).
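To illustrate the behaviour described above, here is a minimal sketch of sentence-start title-casing. It is not the code from either script: the function name `capitalize_sentence_starts`, the `eos_marks` parameter and the sample tokens are assumptions made up for this example. It only shows that `.title()` turns a standalone "i" into "I" when that token begins a sentence, and leaves it alone elsewhere.

```python
def capitalize_sentence_starts(tokens, eos_marks=(".", "?", "!")):
    """Title-case only the token that follows a sentence-ending mark.

    Hypothetical helper for illustration; not taken from punctuator2.
    """
    out = []
    at_sentence_start = True  # the very first token starts a sentence
    for tok in tokens:
        if tok in eos_marks:
            out.append(tok)
            at_sentence_start = True
        else:
            out.append(tok.title() if at_sentence_start else tok)
            at_sentence_start = False
    return out


# Illustrative tokens only: a standalone "i" at a sentence start and mid-sentence.
sample = "i pari . libri i ri".split()
result = capitalize_sentence_starts(sample)
print(" ".join(result))
```

If the uppercase "I" also shows up in the middle of sentences, that would suggest the capitalization comes from somewhere other than a step like this one.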
Hi @ottokart,
Hi,
I am using the punctuator2 library to train a model for Albanian, an Indo-European language with a Latin-derived alphabet.
I use 206,000 articles from an Albanian magazine, so my corpus is large enough to train the model.
I have successfully trained the model and I am satisfied with the results. However, when I test the model on arbitrary text, it converts every standalone lowercase "i" into uppercase "I". In Albanian, a standalone "i" within a sentence is a conjunction and should be written in lowercase. This made me think that the model is somehow using something pre-trained or hardcoded for English (which I am not aware of).
I checked the code (data.py, models.py and main.py) but could not find anything hardcoded for this, except the "We.pcl" file referenced in the code, which does not exist on my path since I do not use it.
Do you have any suggestion or idea why this is happening?
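As a possible stopgap while the cause is unclear, the output text could be post-processed to put a standalone "I" back into lowercase whenever it does not start a sentence. The sketch below is an assumption-laden workaround, not part of punctuator2: the function name `restore_lowercase_i`, the `eos_marks` parameter and the sample sentence are all made up for illustration, and it assumes words are separated by single spaces with sentence-ending punctuation attached to the preceding word.

```python
def restore_lowercase_i(text, eos_marks=".?!"):
    """Lowercase a standalone "I" unless it is the first word of a sentence.

    Hypothetical post-processing helper; not part of punctuator2.
    """
    out = []
    at_sentence_start = True  # the first word of the text starts a sentence
    for word in text.split(" "):
        if word == "I" and not at_sentence_start:
            out.append("i")
        else:
            out.append(word)
        if word:  # skip empty strings produced by repeated spaces
            at_sentence_start = word[-1] in eos_marks
    return " ".join(out)


# Illustrative input only.
fixed = restore_lowercase_i("Libri I ri. I pari I madh.")
print(fixed)
```

This deliberately leaves a sentence-initial "I" capitalized and ignores an "I" with punctuation attached (for example "I,"), so it would need adjusting for real text.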