org.apache.lucene.analysis.tests.TestRandomChains.testRandomChains test failure #13271
Comments
git bisect says: 6cba773 It isn't immediately obvious why to me. @original-brownbear what do you think? |
@benwtrent how long ago did you see this test failure? I can't reproduce it with the latest So I think ICU fixed the bug... i'll try to dig it up, I'm fairly certain we reported a bug related to this. |
Oh dang, I might not have fetched latest 🤦 Let me try again. |
I just reproduced it on So I think the problem is that it doesn't always reproduce: this likely caused your git-bisect confusion. But at least it isn't NEWLY introduced by the icu upgrade, it is a pre-existing condition... |
OK, I simply commented out @original-brownbear 's change and always returned the meta and the test passes with the seed and settings. When I return the cached instance, it fails. I noticed that the offsets are not |
Looking into this now. Maybe it's the fact that for the all zeros case we now always have a single block meta ... let me check |
So, if I switch to just return |
I don't understand what that change has to do with the analysis chain... inconsistent offsets have to do with what the TokenStream is doing, not the index. Be sure that you aren't getting confused by the fact that this failure does not reproduce 100% of the time. as far as the ICU charfilter, but I added some prints so we can see what's happening:
So ultimately on this string, the ICU charfilter will only change one character (the arabic presentation form FB87 to 068E). it won't change the length of the string in UTF-16 nor impact any offsets:
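The claim above (one character changes, the UTF-16 length does not) can be checked in isolation. The sketch below is only an approximation: it uses the JDK's `java.text.Normalizer` with NFKC as a stand-in for the ICU charfilter, which is an assumption, since the charfilter in the failing chain may be configured with a different normalization mode. The class name `PresentationFormDemo` and the helper `fold` are hypothetical.

```java
import java.text.Normalizer;

class PresentationFormDemo {
    // Assumption: NFKC from the JDK stands in for the ICU normalization
    // that the charfilter applies; the real chain may use another mode.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String before = "\uFB87"; // Arabic presentation form (DUL, final form)
        String after = fold(before);
        // Print the code point and the UTF-16 length before and after folding.
        System.out.printf("U+%04X -> U+%04X, length %d -> %d%n",
            (int) before.charAt(0), (int) after.charAt(0),
            before.length(), after.length());
    }
}
```

Because both strings are a single UTF-16 code unit, a correct charfilter has no offset corrections to record for this input.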
But that charfilter tries to do this incrementally, so it could have some bugs based on how data is being "spoon-fed" to the charfilter (spoonfeeding is happening: that's the |
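For readers unfamiliar with "spoon-feeding", here is a minimal sketch of the idea using only the JDK: a `Reader` wrapper that hands out at most a few characters per `read` call, so a consumer that processes input incrementally sees many small chunks instead of one large buffer. `SpoonFeedReader` and `chunks` are hypothetical names for illustration, not the wrapper the test framework actually uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SpoonFeedReader extends Reader {
    private final Reader in;
    private final int maxChunk;

    SpoonFeedReader(Reader in, int maxChunk) {
        if (maxChunk < 1) {
            throw new IllegalArgumentException("maxChunk must be >= 1");
        }
        this.in = in;
        this.maxChunk = maxChunk;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Never return more than maxChunk chars, however many were requested.
        return in.read(buf, off, Math.min(len, maxChunk));
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Returns the pieces a consumer would see when reading `text` to the end.
    static List<String> chunks(String text, int maxChunk) {
        List<String> out = new ArrayList<>();
        char[] buf = new char[64];
        try (Reader r = new SpoonFeedReader(new StringReader(text), maxChunk)) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.add(new String(buf, 0, n));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

An incremental charfilter has to produce the same output and offsets regardless of where these chunk boundaries fall, which is why varying `maxChunk` is a cheap way to probe for boundary bugs.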
++ to @rmuir I can reproduce this after reverting my changes, this doesn't seem to be related. A failure rate of maybe ~20% for me just means it needs a couple iterations to show at times. |
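A rough way to quantify "a couple iterations": if each run fails independently with probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below solves that for n. It assumes independent runs and takes the ~20% figure from the comment above at face value; `FlakyRuns` and `runsNeeded` are hypothetical names.

```java
class FlakyRuns {
    // Smallest n with 1 - (1 - failureRate)^n >= confidence,
    // assuming every run fails independently with the same probability.
    static int runsNeeded(double failureRate, double confidence) {
        if (!(failureRate > 0 && failureRate < 1)
                || !(confidence > 0 && confidence < 1)) {
            throw new IllegalArgumentException("both arguments must be in (0, 1)");
        }
        return (int) Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate));
    }

    public static void main(String[] args) {
        // Runs per commit before a bisect step can call it "good" with 95% confidence.
        System.out.println(runsNeeded(0.2, 0.95));
    }
}
```

Under these assumptions a commit needs 14 clean runs before it can be marked good with 95% confidence, which explains how a bisect that runs the test once or twice per step can land on an unrelated commit.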
OK, apologies for the noise, this test keeps failing weirdly for me. Git bisect has failed me :) |
I'll try to dig into it to at least find the offending component. If we can narrow it down to the problematic charfilter, tokenizer, or tokenfilter, we can make an easier-to-reproduce case. In the past I've done this by creating a manual test (think: it's a custom analyzer of the exact components printed out) that consumes the exact string and added it to "TestBugInSomething", until I can whittle it down. Gonna need to move TestBugInSomething.java to the integration tests ( The fact that it only reproduces some of the time is also annoying and possibly a separate bug in the test of its own... |
This test is set up to reproduce complex failures from TestRandomChains, e.g. it has SopFilter and other tools for debugging. But it still resides in the analysis/common module and currently can't be used to debug any TestRandomChains failures that use other modules (e.g. icu). Relates to #13271
Attached is a reproducer in TestBugInSomething that seems to work. It is ugly due to bugs in
So far I have narrowed the failure down to just this chain:
This is nice as it involves no charfilter at all so we know ICU isn't involved. |
I'll debug it some more... just need a break. Mainly I wanted to make sure I didn't introduce this with the ICU upgrade... the two shinglefilters are suspect. |
Description
Haven't finished bisection to figure out the origin of the bug.
Gradle command to reproduce