feat: Improve spell checker: use dict bundled in LT, drop jmyspell, and add morfologik #1036

Merged
merged 10 commits into from
Jun 2, 2024


miurahr (Member) commented May 22, 2024

LanguageTool now uses Morfologik as its spell-checker engine and bundles Morfologik dictionaries.
This PR adds Morfologik spell-checker support to OmegaT as a module.

When you run OmegaT on a project whose target language is en_AU, you can check spelling
with LT's internal en_AU.dict dictionary, which is automatically copied into the user's spelling dictionary directory.
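The automatic copy described above can be sketched as follows. This version assumes that the LT-bundled dictionary is already available as a set of files in some source directory. The class and method names (`BundledDictionaryInstaller`, `installIfAbsent`) are hypothetical and are not OmegaT's actual API. The sketch uses only `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper, not OmegaT's API: installs a bundled dictionary for the user. */
final class BundledDictionaryInstaller {

    private BundledDictionaryInstaller() {
    }

    /**
     * Copies all files of one dictionary (for example en_AU.dict and en_AU.info)
     * from bundledDir into spellingDir.
     *
     * @return true if the files were copied; false if the bundle is incomplete
     *         or the user already has any of the files
     */
    static boolean installIfAbsent(Path bundledDir, Path spellingDir, List<String> fileNames)
            throws IOException {
        for (String name : fileNames) {
            // Skip incomplete bundles so we never install half a dictionary.
            if (!Files.isRegularFile(bundledDir.resolve(name))) {
                return false;
            }
            // Never overwrite a file the user installed or edited.
            if (Files.exists(spellingDir.resolve(name))) {
                return false;
            }
        }
        Files.createDirectories(spellingDir);
        for (String name : fileNames) {
            Files.copy(bundledDir.resolve(name), spellingDir.resolve(name));
        }
        return true;
    }
}
```

For the en_AU case, `fileNames` would be `List.of("en_AU.dict", "en_AU.info")`. Refusing to overwrite existing files keeps a user-installed dictionary in place.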

Pull request type

  • Feature enhancement -> [enhancement]

Which ticket is resolved?

What does this PR change?

  • Add a dependency on morfologik-speller
  • Drop the dependency on jmyspell-core
  • Refactor into hunspell-spellchecker and morfologik-spellchecker modules

Other information

  • Unit tests


miurahr force-pushed the topic/miurahr/spell-checker/morfologik-module branch from 13f031d to e46cc45 on May 22, 2024 23:54



❌ Unit Tests, Quality checks, and Acceptance Tests failed.

Please see the Gradle Scan page for details:
https://gradle.com/s/lsjpenkyg443m

miurahr added this to the 6.1.0 (Require Java 11) milestone May 23, 2024
miurahr marked this pull request as ready for review May 24, 2024 01:14
miurahr changed the title from "feat: Add morfologik spell checker module" to "feat: Improve spell checker: use dict bundled in LT, drop jmyspell, and add morfologik" May 24, 2024
miurahr (Member Author) commented May 24, 2024

There are unrelated coding-style fixes in OConsts and several unwanted copyright-header changes. Unit tests for the features are still a TODO.

- Add morfologik spellcheck module
- Rename hunspell-jmyspell to hunspell module
- Drop jmyspell fallback
- ISpellChecker#initialize returns boolean

Signed-off-by: Hiroshi Miura <[email protected]>
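When `initialize` returns a boolean, the caller can try one engine and fall back to the next one without a jmyspell-style default. Below is a minimal sketch of that pattern using a simplified interface. `SpellEngine`, `HunspellEngine`, `MorfologikEngine` and `SpellEngineSelector` are hypothetical stand-ins, not OmegaT's real `ISpellChecker` classes. The stub engines only check that their dictionary files exist; they do not load a real dictionary.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified stand-in for a spell-checker engine contract. */
interface SpellEngine {
    /** Returns true only if a usable dictionary for the language could be loaded. */
    boolean initialize(Path spellingDir, String language);

    String name();
}

/** Stub engine: succeeds when <language>.aff and <language>.dic both exist. */
final class HunspellEngine implements SpellEngine {
    @Override
    public boolean initialize(Path spellingDir, String language) {
        return Files.isRegularFile(spellingDir.resolve(language + ".aff"))
                && Files.isRegularFile(spellingDir.resolve(language + ".dic"));
    }

    @Override
    public String name() {
        return "hunspell";
    }
}

/** Stub engine: succeeds when <language>.dict and <language>.info both exist. */
final class MorfologikEngine implements SpellEngine {
    @Override
    public boolean initialize(Path spellingDir, String language) {
        return Files.isRegularFile(spellingDir.resolve(language + ".dict"))
                && Files.isRegularFile(spellingDir.resolve(language + ".info"));
    }

    @Override
    public String name() {
        return "morfologik";
    }
}

final class SpellEngineSelector {
    private SpellEngineSelector() {
    }

    /** Returns the first engine whose initialize() reports success, in list order. */
    static Optional<SpellEngine> select(List<SpellEngine> engines, Path spellingDir, String language) {
        for (SpellEngine engine : engines) {
            if (engine.initialize(spellingDir, language)) {
                return Optional.of(engine);
            }
        }
        return Optional.empty();
    }
}
```

An empty result means that no engine could serve the language. The caller can then disable spell checking instead of silently using a fallback.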
- Add SpellCheckDictionaryType
- OConsts: define morfologik dict file extension

Signed-off-by: Hiroshi Miura <[email protected]>
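A dictionary-type enum paired with file-extension constants might look like the sketch below. The enum name comes from the commit message, but its constants, methods and extension values are assumptions. They are based on the file pairs in the spelling-directory listing in this thread: `.aff`/`.dic` for Hunspell and `.dict`/`.info` for Morfologik.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of a SpellCheckDictionaryType enum; the constants and
 * extensions are assumptions, not copied from OmegaT.
 */
enum SpellCheckDictionaryType {
    /** Hunspell: word list (.dic) plus affix rules (.aff). */
    HUNSPELL(".dic", ".aff"),
    /** Morfologik: binary FSA (.dict) plus metadata (.info). */
    MORFOLOGIK(".dict", ".info");

    private final String mainExtension;
    private final String companionExtension;

    SpellCheckDictionaryType(String mainExtension, String companionExtension) {
        this.mainExtension = mainExtension;
        this.companionExtension = companionExtension;
    }

    /** File name of the companion file, e.g. "en_GB.info" for MORFOLOGIK. */
    String companionFileName(String language) {
        return language + companionExtension;
    }

    /** Detects the type from a main dictionary file name such as "en_GB.dict". */
    static Optional<SpellCheckDictionaryType> fromFileName(String fileName) {
        for (SpellCheckDictionaryType type : values()) {
            // ".dic" does not match "x.dict" because endsWith compares the last 4 chars.
            if (fileName.endsWith(type.mainExtension)) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}
```

Only main files are recognized, so `fromFileName("de_DE.aff")` returns empty. Companion files are found through `companionFileName`.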
miurahr force-pushed the topic/miurahr/spell-checker/morfologik-module branch from 4cf054e to 2295108 on May 26, 2024 05:36
miurahr (Member Author) commented May 26, 2024

Rebased on master and recreated the commits for review.

Signed-off-by: Hiroshi Miura <[email protected]>
miurahr force-pushed the topic/miurahr/spell-checker/morfologik-module branch from 6b28a64 to 4e1eb8c on May 26, 2024 05:42
Signed-off-by: Hiroshi Miura <[email protected]>
LanguageTool also bundles Hunspell dictionaries for some languages, such as de_DE. This commit restores the feature that imports the LT-bundled dictionary when one exists.

Signed-off-by: Hiroshi Miura <[email protected]>
The Morfologik checker uses both *.dict and *.info files.
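The Morfologik checker needs both files, so a loader should treat a `.dict` without its `.info` as unusable. The `.info` file holds dictionary metadata in Java properties format. The sketch below checks for the pair and reads the metadata with `java.util.Properties`. Three things are assumptions: the class name `MorfologikDictionaryFiles` is hypothetical, the `.info` file is read as UTF-8, and the key `fsa.dict.encoding` is only an example of a metadata key. Check the key and the encoding against the morfologik-stemming documentation.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper: validates a Morfologik dictionary pair and reads its metadata. */
final class MorfologikDictionaryFiles {

    private MorfologikDictionaryFiles() {
    }

    /** Returns the .info metadata if both <language>.dict and <language>.info exist, else empty. */
    static Optional<Properties> readMetadata(Path spellingDir, String language) throws IOException {
        Path dict = spellingDir.resolve(language + ".dict");
        Path info = spellingDir.resolve(language + ".info");
        if (!Files.isRegularFile(dict) || !Files.isRegularFile(info)) {
            // A .dict without its .info (or vice versa) cannot be loaded.
            return Optional.empty();
        }
        Properties metadata = new Properties();
        // Assumption: the .info file is UTF-8 encoded.
        try (Reader reader = Files.newBufferedReader(info, StandardCharsets.UTF_8)) {
            metadata.load(reader);
        }
        return Optional.of(metadata);
    }
}
```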

Signed-off-by: Hiroshi Miura <[email protected]>
miurahr (Member Author) commented May 26, 2024

Unit tests are now added, and several issues are fixed.

miurahr (Member Author) commented May 26, 2024


The configuration screen is the same as in OmegaT 5.8/6.0, but it can handle both Hunspell and Morfologik dictionaries: de_DE is Hunspell and en_GB is Morfologik.

~/.omegat/spelling$ ls -l
-rw-rw-r-- 1 miurahr miurahr   19214 May 26 16:23 de_DE.aff
-rw-rw-r-- 1 miurahr miurahr 4868438 May 26 16:23 de_DE.dic
-rw-rw-r-- 1 miurahr miurahr  718340 May 26 16:27 en_GB.dict
-rw-rw-r-- 1 miurahr miurahr     506 May 26 16:27 en_GB.info

The en_GB files are copied automatically from LT.
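The spelling directory can hold both kinds of dictionaries side by side, so the configuration screen has to map each language code to the engine that can load it. Below is a minimal sketch of that scan. It assumes that a language is available only when its complete file pair is present. `DictionaryScanner` is a hypothetical name, and OmegaT's actual logic may differ. The rule that Morfologik wins when both kinds exist for one language is an arbitrary assumption, added to make the result deterministic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical helper: lists usable languages in the spelling directory and their engine. */
final class DictionaryScanner {

    private DictionaryScanner() {
    }

    /** Maps a language code such as "de_DE" to "hunspell" or "morfologik". */
    static Map<String, String> scan(Path spellingDir) throws IOException {
        Map<String, String> available = new TreeMap<>();
        try (Stream<Path> files = Files.list(spellingDir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                String name = file.getFileName().toString();
                if (name.endsWith(".dict") && hasSibling(spellingDir, name, ".dict", ".info")) {
                    // Assumption: Morfologik wins if both kinds exist for one language.
                    available.put(stripExtension(name, ".dict"), "morfologik");
                } else if (name.endsWith(".dic") && hasSibling(spellingDir, name, ".dic", ".aff")) {
                    available.putIfAbsent(stripExtension(name, ".dic"), "hunspell");
                }
            }
        }
        return available;
    }

    private static String stripExtension(String name, String ext) {
        return name.substring(0, name.length() - ext.length());
    }

    /** True if the companion file (same base name, other extension) exists. */
    private static boolean hasSibling(Path dir, String name, String ext, String siblingExt) {
        return Files.isRegularFile(dir.resolve(stripExtension(name, ext) + siblingExt));
    }
}
```

For the listing above, this would map de_DE to "hunspell" and en_GB to "morfologik".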

miurahr (Member Author) commented May 26, 2024

Unit test cases passed:

  • Hunspell checks spelling with the pre-installed de-DE dictionary.
  • Hunspell checks spelling with the bundled fr-FR dictionary.
  • Hunspell checks spelling with an LT-bundled dictionary.
  • Morfologik checks spelling with the pre-installed de-DE dictionary.
  • Morfologik checks spelling with the LT-bundled en-AU dictionary.


miurahr merged commit 20e43d1 into master on Jun 2, 2024
9 checks passed