Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGS#1236] fix: make CalcMatchStatistics ignores threshold #871

Merged
merged 7 commits into from
Dec 28, 2023

Conversation

miurahr
Copy link
Member

@miurahr miurahr commented Dec 22, 2023

Pull request type

  • Bug fix -> [bug]

Which ticket is resolved?

What does this PR change?

  • Add a reproduce case of the bug.
  • Add ignoreThreshold constructor boolean for FindMatches class.

Other information

@miurahr miurahr added the bug label Dec 22, 2023
Copy link

❌ Run Gradle test failed:

1 similar comment
Copy link

❌ Run Gradle test failed:

@miurahr miurahr force-pushed the topic/miurahr/statistics/calc-match-stat-test branch from 29fe93f to afaa222 Compare December 22, 2023 04:23
@miurahr
Copy link
Member Author

miurahr commented Dec 22, 2023

test file test/data/tmx/test-match-stat-en-ca.tmx seems too big.

Copy link

❌ Run Gradle test failed:

@miurahr miurahr force-pushed the topic/miurahr/statistics/calc-match-stat-test branch from afaa222 to d5d0fab Compare December 22, 2023 04:27
@miurahr miurahr marked this pull request as ready for review December 22, 2023 04:27
@miurahr miurahr changed the title feat: add unit test for ClacMatchStatistics class [BUGS#1236] fix: make CalcMatchStatistics ignores threshold Dec 22, 2023
@miurahr miurahr force-pushed the topic/miurahr/statistics/calc-match-stat-test branch from d5d0fab to 31b1caf Compare December 22, 2023 04:29
- When changing threshold, the stats result is changed.

Signed-off-by: Hiroshi Miura <[email protected]>
@miurahr miurahr force-pushed the topic/miurahr/statistics/calc-match-stat-test branch from 31b1caf to 9564e09 Compare December 22, 2023 04:30
Copy link

❌ Run Gradle test failed:

Signed-off-by: Hiroshi Miura <[email protected]>
@omegat-org omegat-org deleted a comment from github-actions bot Dec 22, 2023
@omegat-org omegat-org deleted a comment from github-actions bot Dec 22, 2023
@miurahr
Copy link
Member Author

miurahr commented Dec 22, 2023

Reviewer will see the commit 4bf1233 and may understand that the test case expects results in threshold is 70% as same as one when threashold is 30%. Also you will see the commit 9564e09 that add a field ignoreThreshold and skip a detection of threshold when specified.

@t-cordonnier
Copy link
Contributor

t-cordonnier commented Dec 22, 2023

If the goal is "not change statistics" the implementation is wrong.
If I look at Briac's commit https://sourceforge.net/p/omegat/code/ci/8ccaf8b1c5914fdab48d64b91ed67e020916f279/
I understand that

  • before the change, threshold was 30% everywhere including statistics
  • after the change threshold can be configured, and indeed it also affects statistics

So, if the goal is to be conservative, the correct implementation is not to remove threshold but to apply threshold = 30 in statistics even if changed in the UI, while keeping configurable threshold for matches pane

Personally I would prefer that threshold also applies on statistics: if statistics considers that a segment is 70% match but it does not appear in the matches pane, how could I say to the translator that I will not pay for this segment?
But I know that OmegaT team seems to be conservative. So if the goal is to reproduce old behaviour, then you must implement not "ignore threshold" but "apply threshold = 30%"

@brandelune
Copy link

If the goal is "not change statistics" the implementation is wrong. If I look at Briac's commit https://sourceforge.net/p/omegat/code/ci/8ccaf8b1c5914fdab48d64b91ed67e020916f279/ I understand that

  • before the change, threshold was 30% everywhere including statistics
  • after the change threshold can be configured, and indeed it also affects statistics

So, if the goal is to be conservative, the correct implementation is not to remove threshold but to apply threshold = 30 in statistics even if changed in the UI, while keeping configurable threshold for matches pane

Personally I would prefer that threshold also applies on statistics: if statistics considers that a segment is 70% match but it does not appear in the matches pane, how could I say to the translator that I will not pay for this segment? But I know that OmegaT team seems to be conservative. So if the goal is to reproduce old behaviour, then you must implement not "ignore threshold" but "apply threshold = 30%"

The fuzzy match threshold has nothing to do with the statistics. It here only to set which matches are relevant in the match pane.

@miurahr
Copy link
Member Author

miurahr commented Dec 22, 2023

Personally I would prefer that threshold also applies on statistics: if statistics considers that a segment is 70% match but it does not appear in the matches pane, how could I say to the translator that I will not pay for this segment? But I know that OmegaT team seems to be conservative. So if the goal is to reproduce old behaviour, then you must implement not "ignore threshold" but "apply threshold = 30%"

Thanks for giving me a good point.
I think we can report stats about a range above threshold. It is not "no-match" but "below-threshold" .
When threshold is 70%, a table will be

  • "100%"
  • "> 95%"
  • "> 85%"
  • "> 75%"
  • "> 70%"
  • "<= 70%" = no match

When it is default 30%, a table will be

  • "100%"
  • ">95%"
  • ">85%"
  • ">75%"
  • ">30%"
  • "<=30%" = no match

@brandelune
Copy link

Personally I would prefer that threshold also applies on statistics: if statistics considers that a segment is 70% match but it does not appear in the matches pane, how could I say to the translator that I will not pay for this segment? But I know that OmegaT team seems to be conservative. So if the goal is to reproduce old behaviour, then you must implement not "ignore threshold" but "apply threshold = 30%"

Thanks for giving me a good point. I think we can report stats about a range above threshold. It is not "no-match" but "below-threshold" . When threshold is 70%, a table will be

  • "100%"
  • "> 95%"
  • "> 85%"
  • "> 75%"
  • "> 70%"
  • "<= 70%" = no match

When it is default 30%, a table will be

  • "100%"
  • ">95%"
  • ">85%"
  • ">75%"
  • ">30%"
  • "<=30%" = no match

Do not make any change to the categories.

The threshold is only a convenience setting for the user to display only some matches above a given setting. It has NOTHING to do withe the statistics.

@miurahr
Copy link
Member Author

miurahr commented Dec 22, 2023

Current logic to give category is to compare with fixed values.

No match is not means "<=30%" but "<50%", see MatchStatCounts.java

    private int getRowByPercent(int percent) {
        if (percent == Statistics.PERCENT_EXACT_MATCH) {
            // exact match
            return BASE_FOR_PERCENTS;
        } else if (percent >= 95) {
            return BASE_FOR_PERCENTS + 1;
        } else if (percent >= 85) {
            return BASE_FOR_PERCENTS + 2;
        } else if (percent >= 75) {
            return BASE_FOR_PERCENTS + 3;
        } else if (percent >= 50) {
            return BASE_FOR_PERCENTS + 4;
        } else {
            return BASE_FOR_PERCENTS + 5;
        }
    }

revert variable meaning from ignoreThreshold to applyThreshold

Signed-off-by: Hiroshi Miura <[email protected]>
@miurahr
Copy link
Member Author

miurahr commented Dec 22, 2023

Change variable name "ignoreThreshold" to "applyThreshold"

Copy link

❌ Run Gradle test failed:

@brandelune
Copy link

Current logic to give category is to compare with fixed values.

No match is not means "<=30%" but "<50%", see MatchStatCounts.java

    private int getRowByPercent(int percent) {
        if (percent == Statistics.PERCENT_EXACT_MATCH) {
            // exact match
            return BASE_FOR_PERCENTS;
        } else if (percent >= 95) {
            return BASE_FOR_PERCENTS + 1;
        } else if (percent >= 85) {
            return BASE_FOR_PERCENTS + 2;
        } else if (percent >= 75) {
            return BASE_FOR_PERCENTS + 3;
        } else if (percent >= 50) {
            return BASE_FOR_PERCENTS + 4;
        } else {
            return BASE_FOR_PERCENTS + 5;
        }
    }

Yes. That's what I wrote. The threshold is a display threshold. It has absolutely nothing to do with the stats.

Copy link

❌ Run Gradle test failed:

@t-cordonnier
Copy link
Contributor

No match is not means "<=30%" but "<50%", see MatchStatCounts.java

Yes but the problem is that before the call to getRowByPercent a match which is < 30 % would already have been eliminated by the method FindMatch.haveChanceToAdd
Reading all your messages I realize that the various lines in matches statistics can be considered as thresholds, and I had forgotten that there was also a "no match" line. Now I understand that the old code (before 5.2) had a bug - matches < 30% would be totally ignored, not even considered as "no match".
After that check I agree with Hiroshi's solution.

@miurahr
Copy link
Member Author

miurahr commented Dec 23, 2023

@brandelune could you confirm the fix behave as same as what you claimed?

@brandelune
Copy link

Ok. I'm not seeing any difference in the stats between a 30% threshold and a 70% threshold on a relatively complex project with 50 000 words in 8000 segments. And the stats display is properly updated once the setting is modified. Thank you Hiroshi.

@miurahr
Copy link
Member Author

miurahr commented Dec 28, 2023

Ok. I'm not seeing any difference in the stats between a 30% threshold and a 70% threshold on a relatively complex project with 50 000 words in 8000 segments.

Good news. It has been already tested in unit test, but thank you for double check.

@miurahr miurahr merged commit 27e7355 into master Dec 28, 2023
9 checks passed
@miurahr miurahr deleted the topic/miurahr/statistics/calc-match-stat-test branch December 28, 2023 05:06
miurahr added a commit that referenced this pull request Jan 10, 2024
* feat: add unit test for ClacMatchStatistics class

Signed-off-by: Hiroshi Miura <[email protected]>

* [BUGS#1236] reproduce bug

- When changing threshold, the stats result is changed.

Signed-off-by: Hiroshi Miura <[email protected]>

* [BUGS#1236] fix: ignore threshold when show statistics

Signed-off-by: Hiroshi Miura <[email protected]>

* style: apply spotless

Signed-off-by: Hiroshi Miura <[email protected]>

* refactor: variable applyThreshold

revert variable meaning from ignoreThreshold to applyThreshold

Signed-off-by: Hiroshi Miura <[email protected]>

* chore: suppress checkstyle error in FindMatches class

Signed-off-by: Hiroshi Miura <[email protected]>

* refactor: revert ignore to apply

Signed-off-by: Hiroshi Miura <[email protected]>

---------

Signed-off-by: Hiroshi Miura <[email protected]>

(cherry picked from commit 27e7355)
Signed-off-by: Hiroshi Miura <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants