Incorporate jmnedict database #3
Comments
Any updates on this? I don't know much about databases, but I feel like this wouldn't be too hard to do and it would make the parser much more useful, since it wouldn't just break whenever it came across proper nouns anymore. I'm just curious because the issue is still open, but it's from 5 years ago. If you've just been too busy, that's fine, or maybe it's harder to do than I thought.
I decided not to do this because it would likely degrade segmenting a lot. Proper nouns can't be consistently romanized anyway. I'll be adding things that can be romanized, such as place names, separately. For example, I already added all municipalities that currently exist in Japan. I'll be looking for other databases that I can incorporate without breaking too much stuff. That said, pull requests for jmnedict integration are welcome.
Lately the few place names and similar entries that exist in jmdict are being moved to jmnedict. If this continues, ichi.moe won't be able to recognize names like Tokyo, which is unacceptable. We need to incorporate jmnedict names without messing up the segmenting algorithm. Kanji names should be top priority; katakana names are not important and can be ignored for now. Names should score lower than regular words so as not to pollute the results.
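To make the "names score lower than regular words" idea concrete, here is a rough sketch. It is not the parser's actual scoring: the toy word lists, the length-squared score, and the `NAME_WEIGHT` penalty are all assumptions made up for illustration. It picks the best-scoring segmentation by dynamic programming and discounts any match that comes from the name list.

```python
from functools import lru_cache

# Toy stand-ins for dictionary data (hypothetical, not real JMdict/JMnedict dumps).
WORDS = {"今日", "子供", "と", "に", "行く"}   # regular words
NAMES = {"今日子", "東京"}                     # proper nouns
NAME_WEIGHT = 0.5  # assumed penalty: a name scores half of an equally long word


def candidates(text, start, name_weight):
    """Yield (end, kind, score) for every token that can start at `start`."""
    for end in range(start + 1, len(text) + 1):
        chunk = text[start:end]
        base = len(chunk) ** 2  # assumed heuristic: longer matches score higher
        if chunk in WORDS:
            yield end, "word", float(base)
        if chunk in NAMES:
            yield end, "name", base * name_weight
    # Fallback so segmentation never gets stuck: one unknown character, score 0.
    yield start + 1, "unknown", 0.0


def segment(text, name_weight=NAME_WEIGHT):
    """Return the highest-scoring list of (token, kind) pairs for `text`."""

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(text):
            return 0.0, ()
        options = []
        for end, kind, score in candidates(text, start, name_weight):
            rest_score, rest = best(end)
            options.append((score + rest_score, ((text[start:end], kind),) + rest))
        return max(options, key=lambda option: option[0])

    return list(best(0)[1])


if __name__ == "__main__":
    sentence = "今日子供と東京に行く"
    print(segment(sentence))                   # names penalized
    print(segment(sentence, name_weight=1.0))  # names scored like regular words
```

With the penalty in place, the given name 今日子 no longer beats the split 今日 + 子供, while 東京 is still recognized because no regular word competes for that span. Setting `name_weight=1.0` shows the kind of result pollution the penalty is meant to avoid.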