Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some new rules to en-GB grammar.xml. #10596

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Tang567
Copy link

@Tang567 Tang567 commented May 15, 2024

No description provided.

Copy link
Collaborator

@evan-defran-lt evan-defran-lt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Tang567, thanks for contributing to LT!

I've reviewed your PR, and I've left some comments for you. Some things will need to be amended before this PR could be approved.

Some other things to note:

  • none of these rules are specific to en-GB, so these rules would have more impact in the general EN file: languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml.
  • please make sure all new rules have the attribute default="temp_off". This ensures that rules do not go into production before we've had the chance to test them fully.
  • before committing, it's good practice to run a Maven test in your IDE terminal to ensure that there are no errors that would break the build. Here's the command for English: mvn --projects languagetool-language-modules/en --also-make clean test.
  • I'm not sure if you already know this, but you can also test your rules in LT's "expert" RuleEditor here: https://community.languagetool.org/ruleEditor/expert?devMode=true. It will run your rule over a large corpus of sentences, allowing you to see how your rule performs, which means you can easily identify TPs and FPs. It will also flag some errors that would break the build. It's not perfect – you cannot test rule groups with it, for instance – but it is a helpful resource to start!

Thank you again for contributing to LT. If you have any questions, please let me know.
Evan

</category>
</category>

<rule id="YOUR_YOURE" name="Your/You're Usage">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LT already has rules for {your=>you're}. Additionally, as currently written, this rule will change every instance of your to you're, which will result in a large number of False Positives.
If you want to refine this rule, you should find instances where LT does not successfully correct {your=>you're} (i.e., False Negatives) and refine our already-existing rules for those cases.

<example correction="you're">I think <marker>your</marker> going to love it.</example>
</rule>

<rule id="WORD_COLLOCATION_1" name="Inappropriate Word Collocation">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would probably want a more descriptive id for this rule than just WORD_COLLOCATION_1. Additionally, this wouldn't really be a collocation issue (the classic example of a collocation error would be powerful coffee instead of strong coffee); it's more along the lines of an adverb/adjective morphology error.

<token>believe</token>
</pattern>
<message>Consider using <suggestion>strongly believe</suggestion> instead of 'strong believe'. "Strongly" is the adverb form of "strong" and is used to modify verbs like "believe."</message>
<example correction="strongly believe">I strong believe in your abilities.</example>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to include <marker>...</marker> here.


<rule id="WORD_COLLOCATION_1" name="Inappropriate Word Collocation">
<pattern>
<token>strong</token>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make this rule more robust so that it covers all (or at least more) instances where an adjective is incorrectly being used to modify a verb? Right now, this rule would apply only to the bi-gram strong believe.

<example correction="strongly believe">I strong believe in your abilities.</example>
</rule>

<rule id="WORD_COLLOCATION_3" name="Inappropriate Word Collocation">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. We'd want a more descriptive id, and this isn't really a collocation error.

<token>fast</token>
<token>meal</token>
</pattern>
<message>Consider using <suggesiton>fast food</suggestion> instead of 'fast meal'. "Fast food" is a specific type of food that is prepared and served quickly, whereas "fast meal" is not commonly used.</message>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a typo here: should be <suggestion>, not <suggesiton>.

<token postag="VBP">run</token>
</pattern>
<message>The verb "run" does not agree with the singular subject "dog". Consider using <suggestion>runs</suggestion> to match the singular subject.</message>
<example correction="runs">The dog <marker>run</marker> in the park.</example>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The <marker>...</marker> tags in all the examples in this rule are not placed correctly.

<example correction="fast food">Let's grab some fast meal.</example>
</rule>

<rule id="Singular_Subject_Verb_Agreement" name="Singular Subject-Verb Agreement">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LT already has rules for subject verb agreement. Additionally, as currently written, this rule will result in some false positives, like in the sentence, He let his dog run free in the field.
If you want to refine this rule, you should find instances where LT does not successfully correct the issue (i.e., False Negatives) and refine our already-existing rules for those cases.

<rule id="Singular_Subject_Verb_Agreement" name="Singular Subject-Verb Agreement">
<pattern>
<token postag="DT" min="0" max="1"/>
<token postag="NN">dog</token>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule will apply only when the subject is dog and the verb is run. Could you think of ways to make it more robust? (Just flagging this for you. As noted above, LT already has rules for subject-verb agreement issues, so it would be preferable to refine those rules than create new ones from scratch.)

@@ -31,6 +31,7 @@
</developer>
</developers>
<modules>
<module>languagetool-client-example</module>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you let me know why you added this here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants