Incorporate jmnedict database #3

tshatrov · 2015-06-22T16:47:04Z

Lately, the few place names and similar entries that exist in jmdict are being moved to jmnedict. If this continues, ichi.moe won't be able to recognize words like Tokyo, which is unacceptable. We need to incorporate jmnedict names without messing up the segmenting algorithm. Kanji names should be the top priority; katakana names are not important and can be ignored for now. Incorporated names should score lower than regular words so as not to pollute the results.
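To illustrate the scoring idea, here is a minimal sketch of a segmenter that lets name entries fill gaps without outranking regular words. It is not the ichi.moe implementation: the toy lexicons, the `NAME_WEIGHT` penalty, and the length-squared score are all assumptions made for the example, and "今日は" is listed as a name only to create a collision.

```python
# Toy lexicons (hypothetical); real entries would come from jmdict and jmnedict.
WORDS = {"今日", "は", "に", "行く"}
NAMES = {"東京", "今日は"}

# Assumed penalty: a name scores half of what a regular word of equal length does.
NAME_WEIGHT = 0.5


def piece_score(piece):
    """Score one candidate segment, or return None if it is not allowed."""
    if piece in WORDS:
        return float(len(piece) ** 2)
    if piece in NAMES:
        return len(piece) ** 2 * NAME_WEIGHT
    if len(piece) == 1:
        return 0.0  # unknown single character: allowed, but worth nothing
    return None


def segment(text):
    """Return the highest-scoring split of text via dynamic programming."""
    # best[i] holds (score, segments) for the prefix text[:i]
    best = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(end):
            piece = text[start:end]
            score = piece_score(piece)
            if score is None:
                continue
            prev_score, prev_segments = best[start]
            candidates.append((prev_score + score, prev_segments + [piece]))
        # Never empty: the single-character piece always gets a score.
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]


print(segment("東京に行く"))
print(segment("今日は"))
```

In this sketch the name "東京" is picked up because no regular word covers it, while "今日は" is still split into "今日" and "は" because the penalized name scores 4.5 against 5 for the two regular words. With `NAME_WEIGHT = 1.0` the name would win instead, which is the kind of pollution the lower score is meant to prevent.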

buster-blue · 2020-10-20T23:52:16Z

Any updates on this? I don't know much about databases, but I feel like this wouldn't be too hard to do and it would make the parser much more useful, since it wouldn't just break whenever it came across proper nouns anymore. I'm just curious because the issue is still open, but it's from 5 years ago. If you've just been too busy, that's fine, or maybe it's harder to do than I thought.

tshatrov · 2020-10-21T08:38:30Z

I decided not to do this because it would likely degrade segmenting a lot. Proper nouns can't be consistently romanized anyway. I'll be adding things that can be romanized, such as place names, separately. For example, I already added all municipalities that currently exist in Japan. I'll be looking for other databases that I can incorporate without breaking too much stuff. That said, pull requests for jmnedict integration are by all means welcome.

tshatrov self-assigned this Jun 22, 2015
