Last segment of Thai script is always marked as not word-like #4446
Comments
Hi there, do I need to be assigned the issue or can I start working on this?
Consider the last part of the Thai script as a separate character.
The bug is likely in the interface between the (rule-based) break iterator and the LSTM. I think anyone can open a pull request to add a test case and fix the bug.
Fixes #4446. If end of text (EOT) comes right after SA, we should mark the last segment as Letter (SA).
The last segment of the following strings is always marked as not word-like:
Whereas ICU4C marks the last segment of all four strings as word-like.
CC: @aethanyc and @makotokato
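To make the reported behavior and the proposed fix concrete, here is a minimal toy model in Python. It is not the ICU4X implementation: the function names, the `fix_eot` flag, the Thai-block check used as a stand-in for the SA property, and the hand-supplied break offsets are all assumptions made for illustration. It only shows the idea that a segment ending at end of text after SA characters should still be reported as word-like.

```python
# Toy model only: names and logic are hypothetical, not ICU4X code.
THAI_START, THAI_END = 0x0E00, 0x0E7F  # Thai Unicode block, used as a stand-in for SA


def is_sa(ch: str) -> bool:
    """Rough stand-in for the SA property: true for Thai-block characters."""
    return THAI_START <= ord(ch) <= THAI_END


def classify_segments(text: str, boundaries: list[int], fix_eot: bool = True):
    """Return (segment, is_word_like) pairs.

    `boundaries` are sorted break offsets including 0 and len(text). They are
    assumed to come from some segmenter (for example an LSTM or dictionary).
    """
    result = []
    for start, end in zip(boundaries, boundaries[1:]):
        segment = text[start:end]
        ends_at_eot = end == len(text)
        if is_sa(segment[-1]):
            # Modeled bug: with fix_eot=False, a segment that ends at EOT
            # right after SA loses its word-like status.
            word_like = fix_eot or not ends_at_eot
        else:
            word_like = any(c.isalnum() for c in segment)
        result.append((segment, word_like))
    return result


# Illustrative input; the break offsets are supplied by hand.
text = "ภาษาไทย"
breaks = [0, 4, 7]
buggy = classify_segments(text, breaks, fix_eot=False)
fixed = classify_segments(text, breaks, fix_eot=True)
```

In this sketch `buggy` reports the final segment as not word-like while `fixed` reports both segments as word-like, which mirrors the difference from ICU4C described above.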