Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for parsing brackets in range queries #13323

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

benchaplin
Copy link

Description

I've added some unit tests to protect/display the QueryParser's ability to parse brackets in range query terms when in quotes.

This issue raised a question about the QueryParser's ability to handle brackets in a range query's terms, even when escaped, for example:

queryParser.parse( "[ a\\[i\\] TO b\\[i\\] ]" );

/* 
org.apache.lucene.queryparser.classic.ParseException: Cannot parse '[a\[i\] TO b\[i\]]': Encountered " "]" "] "" at line 1, column 6.
Was expecting:
    "TO" ...
 */

You can get around this exception by wrapping each range query term in quotes:

queryParser.parse( "[\"a[i]\" TO \"b[i]\"]" );

This is because of this JavaCC grammar rule. <RANGE_GOOP> cannot handle a closing bracket, even an escaped one, while <RANGE_QUOTED> can.

I've added some unit tests to protect this functionality.


Any thoughts on whether queryParser.parse( "[ a\\[i\\] TO b\\[i\\] ]" ); should work? I did play around with updating the grammar, it should be possible but I need to think more about how it might break backwards compatibility.

@dweiss
Copy link
Contributor

dweiss commented Apr 25, 2024

Any thoughts on whether queryParser.parse( "[ a\[i\] TO b\[i\] ]" ); should work? I did play around with updating the grammar, it should be possible but I need to think more about how it might break backwards compatibility.

Looking at the javacc file it seems range terms are parsed differently compared to normal tokens. This is strange but I bet there was a reason for it - now long forgotten. I think, logically, it should accept the same term notation that is used elsewhere. It will break backward compatibility but maybe consistency is worth this cost (for the major release)?

@benchaplin
Copy link
Author

I was thinking a fix like this could work:

| <RANGE_GOOP:   (~[ " ", "]", "}" ] | "\\ " | "\\]" | "\\}" )+ >

simply allowing the range term to continue parsing through an escaped space or closing bracket.

"Breaking" changes:

  • This would permit some queries that previously threw exceptions (like the one described in the original issue)
  • I'm trying to think of a query that was permitted and now would break (all existing unit tests pass). My first thought was a range query ending with an escaped bracket, however those are disallowed by the Term can not end with escape character rule. Any thoughts?

@dweiss
Copy link
Contributor

dweiss commented Apr 30, 2024

The joys of escaping... Never easy (hello, Windows command-line users...). You may want to add a randomized test that constructs those terms using a mix of characters and "allowed" escape sequences - a corner case will pop up quickly, I think.

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label May 15, 2024
@benchaplin
Copy link
Author

Thanks for the advice @dweiss, I will play around with some test cases for this fix. In the meantime, I think we can merge this and I will open another PR with the grammar changes if I can get something working.

@github-actions github-actions bot removed the Stale label May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants