The model trained for a non-English language is converting the single lowercase "i" into uppercase "I" #61

Open
preniqivjosa opened this issue Jul 17, 2020 · 2 comments

Comments

@preniqivjosa

preniqivjosa commented Jul 17, 2020

Hi,
I am using the punctuator2 library to train a model for Albanian, an Indo-European language with a Latin-derived alphabet.

I use 206,000 articles from an Albanian magazine, so my corpus should be large enough to train the model.
I have successfully trained the model and I am satisfied with the results. However, when I test the model on random text, it converts every standalone lowercase "i" into uppercase "I". In Albanian, a standalone "i" within a sentence is a conjunction and should be written in lowercase. This made me think that the model is somehow using something pre-trained or hardcoded for English that I am not aware of.

I checked the code (data.py, models.py and main.py) but could not find anything hardcoded for this, apart from the "We.pcl" file referenced in the code, which does not exist on my path since I do not use it.
Do you have any suggestions or ideas about why this is happening?

@ottokart
Owner

Hi,

Are you using the convert_to_readable.py or demo_play_with_model.py scripts? These two convert the first letter of the first word in each sentence to uppercase ("Title" case, or .title() in Python).
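
To illustrate the distinction in the reply above, here is a minimal sketch of what Python's `str.title()` does to a standalone "i", next to a helper that uppercases only the first word of each sentence. This is not code from punctuator2: `capitalize_sentence_starts` and `SENTENCE_END` are hypothetical names, the sample text is made up, and the sketch assumes tokens are separated by whitespace with sentence-ending punctuation attached to the preceding word.

```python
# Hypothetical helper for illustration only; not taken from punctuator2.
SENTENCE_END = {".", "?", "!"}


def capitalize_sentence_starts(text):
    """Uppercase the first letter of each sentence, leaving all other tokens as-is."""
    out = []
    at_start = True  # the very first token opens a sentence
    for token in text.split():
        if at_start:
            token = token[0].upper() + token[1:]
        out.append(token)
        # The next token opens a sentence if this one ends with ., ? or !
        at_start = token[-1] in SENTENCE_END
    return " ".join(out)


sample = "libri i ri doli sot. ai eshte i mire."

# str.title() uppercases the first letter of every word, including a lone "i".
print(sample.title())

# Only sentence-initial words change; a mid-sentence "i" stays lowercase.
print(capitalize_sentence_starts(sample))
```

If a custom test script applies `.title()` (or similar casing) to every token rather than only to the first word of a sentence, every standalone "i" would come out as "I" regardless of the language the model was trained on.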

@preniqivjosa
Author

Hi @ottokart,
Thank you for the reply!
I was using a different script created for testing the model, but the problem is solved when using demo_play_with_model.py.
