UnifiedHighlighter: DefaultPassageFormatter causes IndexOutOfBoundsException w/ setStoreTermVectorOffsets unless setStoreTermVectorPositions #12431

Open
hossman opened this issue Jul 10, 2023 · 0 comments · May be fixed by #13315
hossman commented Jul 10, 2023

Description

Summary of the mailing list thread:

https://lists.apache.org/[email protected]

  • Using UnifiedHighlighter w/ DefaultPassageFormatter
  • Highlighting fields that use setStoreTermVectors(true) + setStoreTermVectorOffsets(true)
    • but do NOT use setStoreTermVectorPositions(true)
  • IndexOutOfBoundsException can occur in DefaultPassageFormatter -> StringBuilder when the query and the field both include multiple terms
  • Problem caused by TermVectorOffsetStrategy producing Passage instances where matches are not in order (by start offset)
    • Not clear from Passage API if this is allowed
    • DefaultPassageFormatter does not expect this (it only allows for the end of one match overlapping the start of the next)
  • Problem started happening "by default" in 9.0 due to LUCENE-9431
  • Known workarounds:
    • Index Time Option: Add setStoreTermVectorPositions(true) to fields you wish to highlight that already use setStoreTermVectors(true)
    • Alternative Query Time Option: Subclass UnifiedHighlighter to override getFlags(String) and remove HighlightFlag.WEIGHT_MATCHES from the set returned by super.getFlags(field)
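
The two workarounds above might look roughly like the following. This is a minimal sketch rather than code from the mailing list thread: the class and method names (HighlightWorkarounds, highlightableFieldType, NoWeightMatchesHighlighter) are hypothetical, and it assumes a Lucene 9.x release in which the UnifiedHighlighter(IndexSearcher, Analyzer) constructor is available. Versions that construct the highlighter through its builder would need a matching constructor instead.

```java
import java.util.EnumSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;

// Hypothetical holder class for both workarounds; names are illustrative only.
class HighlightWorkarounds {

  // Index-time option: store positions alongside offsets in the term vectors.
  static FieldType highlightableFieldType() {
    FieldType type = new FieldType(TextField.TYPE_STORED);
    type.setStoreTermVectors(true);
    type.setStoreTermVectorOffsets(true);
    type.setStoreTermVectorPositions(true); // the setting the affected fields lack
    type.freeze();
    return type;
  }

  // Query-time option: drop WEIGHT_MATCHES from the flags used for every field.
  static class NoWeightMatchesHighlighter extends UnifiedHighlighter {

    // Assumption: this constructor exists in the Lucene version in use.
    NoWeightMatchesHighlighter(IndexSearcher searcher, Analyzer analyzer) {
      super(searcher, analyzer);
    }

    @Override
    protected Set<HighlightFlag> getFlags(String field) {
      // Copy into a fresh set so the set returned by super is never mutated.
      EnumSet<HighlightFlag> flags = EnumSet.noneOf(HighlightFlag.class);
      flags.addAll(super.getFlags(field));
      flags.remove(HighlightFlag.WEIGHT_MATCHES);
      return flags;
    }
  }
}
```

The index-time option only affects documents indexed after the change, so existing documents would need to be reindexed before it takes effect for them.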

A test patch demonstrating the problem is in the mailing list thread linked above.
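
Separately from that patch, the failure mode can be illustrated without Lucene. The sketch below is a simplified stand-in for a passage formatter, not the DefaultPassageFormatter source: PassageOrderDemo and its format method are hypothetical, and the only assumption carried over from the description is that the formatter copies text between consecutive matches into a StringBuilder and expects match start offsets to be non-decreasing.

```java
// Hypothetical, Lucene-free illustration of formatting matches in offset order.
class PassageOrderDemo {

  // Wraps each [starts[i], ends[i]) range of content in <b> tags,
  // copying the text between consecutive matches unchanged.
  static String format(String content, int[] starts, int[] ends) {
    StringBuilder sb = new StringBuilder();
    int pos = 0;
    for (int i = 0; i < starts.length; i++) {
      // StringBuilder.append(CharSequence, int, int) throws
      // IndexOutOfBoundsException when pos > starts[i], which is
      // what happens once the matches are not sorted by start offset.
      sb.append(content, pos, starts[i]);
      sb.append("<b>").append(content, starts[i], ends[i]).append("</b>");
      pos = ends[i];
    }
    sb.append(content, pos, content.length());
    return sb.toString();
  }

  public static void main(String[] args) {
    String content = "quick brown fox";

    // Matches sorted by start offset: formatting succeeds.
    System.out.println(format(content, new int[] {0, 12}, new int[] {5, 15}));

    // Same matches in reverse order: the second gap has start > end.
    try {
      format(content, new int[] {12, 0}, new int[] {15, 5});
    } catch (IndexOutOfBoundsException e) {
      System.out.println("out-of-order matches: " + e);
    }
  }
}
```

With the sorted input the first call produces `<b>quick</b> brown <b>fox</b>`. With the reversed input the second iteration asks the StringBuilder to copy from offset 15 to offset 0, which is the kind of call that fails when a Passage carries matches out of start-offset order.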

Version and environment details

No response
