
Add timeout support to AbstractKnnVectorQuery #13202

Merged: 6 commits, Apr 3, 2024

Conversation

kaivalnp
Contributor

Description

#927 added timeout support to IndexSearcher#search, where we wrap a scorer in a TimeLimitingBulkScorer which periodically checks whether a query timed out while collecting docs in fixed chunks.

However, as #11677 points out, this does not apply to query rewrites -- an important use case being vector searches (AbstractKnnVectorQuery), where the bulk of the computation (HNSW graph searches) happens in #rewrite.

If we had set a timeout and the HNSW searches turned out too expensive, we would not return results even after the search completes (because the timeout is checked before attempting to score docs -- at which point we have all possible results, but they aren't returned).

In this PR, we wrap the KnnCollector in another one which additionally checks QueryTimeout#shouldExit in the KnnCollector#earlyTerminated function. If the timeout is hit, we immediately return empty results.

We also extended this to exact searches, where we check for a timeout before scoring each document.
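The collector wrapping described above can be sketched as follows. The interfaces here are simplified stand-ins for Lucene's QueryTimeout and KnnCollector (the real interfaces have more methods), so treat this as an illustration of the idea rather than the actual implementation:

```java
// Simplified stand-ins for Lucene's interfaces -- illustrative only.
interface QueryTimeout {
  boolean shouldExit();
}

interface KnnCollector {
  boolean earlyTerminated();
}

// Wraps a KnnCollector so the graph search also stops once the timeout fires.
class TimeLimitingKnnCollector implements KnnCollector {
  private final KnnCollector delegate;
  private final QueryTimeout queryTimeout;

  TimeLimitingKnnCollector(KnnCollector delegate, QueryTimeout queryTimeout) {
    this.delegate = delegate;
    this.queryTimeout = queryTimeout;
  }

  @Override
  public boolean earlyTerminated() {
    // The graph searcher polls earlyTerminated() while traversing the graph,
    // so checking the timeout here preempts the approximate search.
    return queryTimeout.shouldExit() || delegate.earlyTerminated();
  }
}
```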

@kaivalnp
Contributor Author

kaivalnp commented Mar 22, 2024

Benchmarks

Used KnnGraphTester and applied a timeout by calling searcher.setTimeout before this line. Columns (in order) are: recall, latency per query in ms, number of docs, fanout, maxConn, beamWidth, number of nodes visited, reindex time in ms, selectivity, and pre/post-filter (topK = 100, numQueries = 10K)

baseline with null timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.965	 1.10	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.82	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.21	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.10	1000000	400	16	200	4504	0	1.00	post-filter

baseline with Long.MAX_VALUE timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.965	 1.10	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.82	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.21	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.09	1000000	400	16	200	4504	0	1.00	post-filter

baseline with 1 ms timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.483	 1.08	1000000	50	16	100	1675	0	1.00	post-filter
0.005	 2.78	1000000	400	16	100	4086	0	1.00	post-filter
0.418	 1.18	1000000	50	16	200	1853	0	1.00	post-filter
0.000	 3.05	1000000	400	16	200	4504	0	1.00	post-filter

candidate with null timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.965	 1.10	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.83	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.21	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.10	1000000	400	16	200	4504	0	1.00	post-filter

candidate with Long.MAX_VALUE timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.965	 1.17	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.99	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.28	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.28	1000000	400	16	200	4504	0	1.00	post-filter

candidate with 1 ms timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.445	 0.89	1000000	50	16	100	476	0	1.00	post-filter
0.001	 1.01	1000000	400	16	100	12	0	1.00	post-filter
0.374	 0.92	1000000	50	16	200	425	0	1.00	post-filter
0.000	 1.01	1000000	400	16	200	3	0	1.00	post-filter

Some observations:

  • There is almost no regression when a timeout is not set (a null value)
  • There is some regression (~6%) when a timeout is set but not met, because of the additional time check on every #earlyTerminated call
  • There are large gains when a timeout is set and met

@vigyasharma
Contributor

If we had set a timeout, and HNSW searches turned out too expensive -- we would not return results even after the search completes (because the timeout is checked before attempting to score docs -- at which point we have all possible results, but they aren't returned)

Is this because we have already exhausted our query timeout during rewrite, and so we throw a TimeExceededException and bail before scoring docs for even the first chunk?

@kaivalnp
Contributor Author

Exactly! This issue is also visible in the above benchmarks, where the baseline (having no early termination in #rewrite) performs the entire graph search (same number of nodes visited, similar latency) even with a 1 ms timeout, but the results are eventually discarded -- demonstrated by the low recall compared to a high or no timeout

@vigyasharma left a comment

Interesting find on KNN search behavior, @kaivalnp! I added some thoughts after a first pass at the PR.

Apart from those, one divergent behavior is that we won't raise some form of TimeExceededException when we time out and terminate the search. I wonder if we should have a way to surface that this was a timeout, vs. all the results we could collect within the visitLimit() node traversal.

Comment on lines 83 to 86
if (queryTimeout.shouldExit()) {
// Quickly exit
return TopDocsCollector.EMPTY_TOPDOCS;
}
Contributor

Most KnnCollector implementations (like this one) tend to return whatever results we've collected so far, and set the relation as TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO if we early terminated. Any reason why we don't want to keep the same behavior here?

Contributor

I think we should return whatever docs we were able to collect within the time budget. We have already done the heavy work of approximate search for those hits, what's left is only to return those cached results from the DocAndScoreQuery.

It's not the behavior today, as TimeLimitingBulkScorer sees the query as timed out and doesn't even run the first interval on DocAndScoreQuery. I need to think more about a good way around it. I'm thinking that if the scorer is doing the heavy work of iterating through an ImpactsEnum or PostingsEnum, then it makes sense to early terminate during scoring. But for queries with cached results like DocAndScoreQuery, maybe we shouldn't?

Contributor Author

Returning partial results makes sense, but I passed the empty results here purely because they would be discarded later (so we could save on computations used for creating the TopDocs)

TimeLimitingBulkScorer sees the query as timed out, and doesn't even run the first interval
But for queries with cached results like DocAndScoreQuery, maybe we shouldn't?

I don't think returning partial results would add value today, can we add a TODO to return them once the above behavior is fixed?

Most KnnCollector implementations (like this one) tend to return whatever results we've collected so far, and set the relation as TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO if we early terminated

On a separate note, this seems to be a no-op as well -- since we return fresh results from exact search if approximate results are incomplete. I wonder if we can return empty results here as well to save on some computations?

Contributor

I don't think returning partial results would add value today, can we add a TODO to return them once the above behavior is fixed?

Yes, we can handle it in a different change. I'll start a discussion on a separate issue for this.

Contributor

On a separate note, this seems to be a no-op as well -- since we return fresh results from exact search if approximate results are incomplete.

I think we only fall back to exact search if there is a filter provided, i.e. filterWeight != null (code). Otherwise we return the approx search result and callers can use TotalHits.Relation to figure out if search terminated early.

Contributor

I still think topDocs() from the collector should return whatever it collected, and set the relation to GTE. With EMPTY_TOPDOCS, the query gets rewritten as a MatchNoDocsQuery, which misleads debugging and further masks the timeout occurrence. We can let the results get discarded by TimeLimitingBulkScorer for now, while we handle that in a separate change (since it is on par with existing behavior).

I'm open to more opinions on this.

Contributor Author

callers can use TotalHits.Relation to figure out if search terminated early.

This information is internal to AbstractKnnVectorQuery -- even if we pass some TotalHits.Relation, the rewritten DocAndScoreQuery treats all results as "complete" (no distinction for the relation here, and does not pass it on to IndexSearcher#search)

Though I agree, we could pass those results right now and handle them properly in a follow up issue. I've added a commit to do this

return new KnnCollector() {
  @Override
  public boolean earlyTerminated() {
    return queryTimeout.shouldExit() || collector.earlyTerminated();
Contributor

Ah, so both searchLevel() and findBestEntryPoint() in HnswGraphSearcher check earlyTerminated() on their collector to abort the search, so configuring the timeout check allows us to preempt the approximate search.

@vigyasharma
Contributor

baseline with null timeout

0.965	 1.10	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.82	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.21	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.10	1000000	400	16	200	4504	0	1.00	post-filter

Could we add headers for each column with these benchmark results please? Thanks!

@vigyasharma
Contributor

Separately, should we deprecate TimeLimitingCollector ? It doesn't use QueryTimeout and I don't think we're using it anywhere.

@kaivalnp
Contributor Author

Thanks for the review @vigyasharma!

Apart from those, one divergent behavior is that we won't be raising some form of TimeExceededException when we timeout and terminate the search. Wonder if we should have a way to surface that this was a timeout, v/s all the results we could collect within visitLimit() node traversal.

I think this exception will be raised here automatically, because the timeout has already occurred? Though we won't know / raise this explicitly because the graph search timed out, the end goal of marking results as "incomplete" will still be satisfied?

I'm also not sure how we'd do it explicitly, perhaps we require lower-level API changes to allow rewrites to time out more gracefully?

Could we add headers for each column with these benchmark results please?

Oops, I missed adding the benchmark description and row headers earlier -- updated now.

Separately, should we deprecate TimeLimitingCollector ? It doesn't use QueryTimeout and I don't think we're using it anywhere.

+1, I only see occurrences of it in comments. I'll try to create a PR for this soon

@kaivalnp
Contributor Author

Separately, should we deprecate TimeLimitingCollector ? It doesn't use QueryTimeout and I don't think we're using it anywhere.

Created #13220 to discuss this

Comment on lines +85 to +91
// Mark results as partial if timeout is met
TotalHits.Relation relation =
queryTimeout.shouldExit()
? TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
: docs.totalHits.relation;

return new TopDocs(new TotalHits(docs.totalHits.value, relation), docs.scoreDocs);
Contributor

Can we simply return collector.topDocs(); and let collectors decide how to handle this? All implementations of AbstractKnnCollector already set relation to GTE based on earlyTerminated() check.

Contributor Author

I don't think collector.topDocs() will set the relation to GTE in case of timeouts (it will only set this when visitLimit() is crossed, because it uses its own internal earlyTerminated() function that does not have information about the QueryTimeout)

@@ -102,14 +103,34 @@ public void testVectorEncodingMismatch() throws IOException {
}
}

public void testTimeout() throws IOException {
Contributor

We could also add a test for the partial result case. You could create a mock query timeout that returns false only the first time it's called, and then flips to returning true.
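Such a mock could be sketched as follows (a hypothetical standalone sketch, not the actual test code in this PR -- the real mock would implement Lucene's QueryTimeout interface):

```java
// Hypothetical mock: reports no timeout on the first check, then reports
// timed-out on every later check, forcing the search down a partial-result path.
class FirstCallTimeout {
  private boolean firstCall = true;

  boolean shouldExit() {
    if (firstCall) {
      firstCall = false;
      return false; // let the first check through
    }
    return true; // every subsequent check sees a timeout
  }
}
```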

Contributor Author

Makes sense!

There was a consideration here that the number of levels in the HNSW graph should be == 1, because if the timeout is hit while finding the best entry point for the last level, we haven't collected any results yet. I think this should be fine as we're only indexing 3 vectors, and running these tests a few thousand times did not give an error. Added a note about this as well.

Also fixed another place where the timeout needs to be checked

- Refactor tests to base classes
- Also timeout exact searches in Lucene99HnswVectorsReader
Comment on lines 107 to +112
while (vectorScorer.nextParent() != DocIdSetIterator.NO_MORE_DOCS) {
// Mark results as partial if timeout is met
if (queryTimeout != null && queryTimeout.shouldExit()) {
relation = TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO;
break;
}
Contributor Author

Also wanted some opinions here: we're checking the timeout in exact search of DiversifyingChildren[Byte|Float]KnnVectorQuery once per-parent (as opposed to once per document elsewhere)

Should we update this to once per-child as well?

Contributor

Timeouts are inevitably approximate, so I guess it should be fine. Also seems like something that we can easily change in a follow-up PR if needed.

Contributor Author

Sounds good, thanks!

@benwtrent
Member

Looking at the benchmarking, we are adding a 5% overhead to all vector operations when using float32. As vector operations get faster (consider hamming distance, along with exploring more vectors), I expect this timeout mechanism to have an even bigger relative impact on search.

What was the dimension count & dataset for your performance testing @kaivalnp ?

Do we really think that timing out a couple of seconds faster is worth the overhead?

@kaivalnp
Contributor Author

What was the dimension count & dataset for your performance testing

It was the enwiki dataset (https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec), unit vectors, 100 dimensions, indexed first 1M vectors, searched next 10K vectors, with DOT_PRODUCT similarity function

Looking at the benchmarking, we are adding a 5% overhead to all vector operations when using float32

This overhead is only when a timeout is set but not met, there is no regression when it is not set at all (in the benchmarked case without any filter). IMO this additional check can be useful to time out unexpectedly expensive queries in a production system to keep overall CPU within limits, but looking forward to more opinions here..

@benwtrent
Member

unit vectors, 100 dimensions, indexed first 1M vectors,

@kaivalnp Ah, ok, only 100 dimensions -- this explains the overly large impact that checking for the timeout had, as there is very little work being done on distance calculations. I would expect the percentage of compute to drop as the amount of work done by vector comparisons increases.

I have a dumb question about timeouts. I am guessing the QueryTimeout object has to do some global syncing to determine the current runtime to check. Is that done on every single vector operation? If so, could we make sure it's only done every N operations?

Timeouts are already approximate; I understand the need for a stricter timeout, but I am not sure we should check for a timeout on every single vector.

@kaivalnp
Contributor Author

I am guessing the QueryTimeout object has to do some global syncing to determine the current runtime to check

I have used the default QueryTimeoutImpl class, which makes System.nanoTime() calls to check for a timeout at every shouldExit() call

but I am not sure we should check for timeout on every single vector.

Perhaps this can be configured by the end-user themselves, by making actual timeout checks only after every N calls, according to acceptable latency / accuracy tradeoffs?

For example, I modified the shouldExit() function here from:

return timeoutAt != null && nanoTime() - timeoutAt > 0;

to

// count is an integer initialized to 0
return timeoutAt != null && ++count % 100 == 0 && nanoTime() - timeoutAt > 0;

to check the timeout every 100 vectors, and the latency overhead with a Long.MAX_VALUE timeout got cut in half. Correspondingly, the latency with a 1 ms timeout increased because we check the timeout less frequently, but all of this is expected
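The experiment above can be captured in a self-contained sketch like the one below. The class name and constructor are illustrative (QueryTimeoutImpl itself does not expose a sampling knob); it only demonstrates the trade-off of consulting the clock every N calls:

```java
// Checks the wall clock only once every `interval` calls, trading timeout
// precision for fewer System.nanoTime() invocations on the hot path.
class SampledTimeout {
  private final long deadlineNanos;
  private final int interval;
  private int calls = 0;

  SampledTimeout(long timeoutNanos, int interval) {
    this.deadlineNanos = System.nanoTime() + timeoutNanos;
    this.interval = interval;
  }

  boolean shouldExit() {
    if (++calls % interval != 0) {
      return false; // skip the clock check on most calls
    }
    return System.nanoTime() - deadlineNanos > 0;
  }
}
```

With a larger interval, a query that times out is detected later (lower recall impact per check, higher latency past the deadline), which matches the benchmark behavior described above.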

@vigyasharma
Contributor

Perhaps this can be configured by the end-user themselves, by making actual timeout checks after every TK number of calls, according to acceptable latency / accuracy tradeoffs?

TimeLimitingBulkScorer already optimizes for timeout check frequency outside of QueryTimeout impl, by adjusting the interval passed to bulk scorer. I would rather we keep a compatible/similar vector timeout behavior by implementing this in the collector manager, so that users can use the same timeout impl in both places.

That being said, I'm okay if we do it in a separate change.

@vigyasharma
Contributor

I think this PR's changes are okay to be merged. @kaivalnp: can you please resolve the version conflicts and add a changes entry, and I can merge it in.

@kaivalnp
Contributor Author

kaivalnp commented Apr 2, 2024

TimeLimitingBulkScorer already optimizes for timeout check frequency outside of QueryTimeout impl

Ahh nice catch! You mean something like:

// counter is an integer initialized to 0
return (++counter % 100 == 0 && queryTimeout.shouldExit()) || collector.earlyTerminated();

in the TimeLimitingKnnCollectorManager itself?

This makes sense to me; we can add this in a follow-up and even make the interval grow, like in this commit?

@vigyasharma
Contributor

TimeLimitingBulkScorer already optimizes for timeout check frequency outside of QueryTimeout impl

Ahh nice catch! You mean something like:

// counter is an integer initialized to 0
return (++counter % 100 == 0 && queryTimeout.shouldExit()) || collector.earlyTerminated();

in the TimeLimitingKnnCollectorManager itself?

This makes sense to me; we can add this in a follow-up and even make the interval grow, like in this commit?

Yes. Let's do it in a follow-up change.
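A growing-interval variant could be sketched as below. This is a hypothetical illustration: the class name and the ~50% growth factor are assumptions, not TimeLimitingBulkScorer's exact policy, and a real version would plug into TimeLimitingKnnCollectorManager:

```java
import java.util.function.BooleanSupplier;

// Sketch: consult the clock after a growing number of calls (100, 150, 225, ...)
// so the per-call overhead shrinks the longer the search runs.
class GrowingIntervalTimeout {
  private final BooleanSupplier clockExpired; // e.g. () -> queryTimeout.shouldExit()
  private int interval = 100;  // calls between clock checks
  private int remaining = 100; // calls left before the next clock check

  GrowingIntervalTimeout(BooleanSupplier clockExpired) {
    this.clockExpired = clockExpired;
  }

  boolean shouldExit() {
    if (--remaining > 0) {
      return false; // no clock check on this call
    }
    interval += interval >> 1; // grow the gap by ~50% (illustrative policy)
    remaining = interval;
    return clockExpired.getAsBoolean();
  }
}
```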

@vigyasharma vigyasharma merged commit df154cd into apache:main Apr 3, 2024
3 checks passed
@kaivalnp kaivalnp deleted the timeout branch April 3, 2024 14:36
@kaivalnp
Contributor Author

kaivalnp commented Apr 4, 2024

I see a couple of recent failures related to the test added here:

  1. https://github.com/apache/lucene/actions/runs/8557766498/job/23450804532
  2. https://github.com/apache/lucene/actions/runs/8558104895/job/23451920688
  3. https://github.com/apache/lucene/actions/runs/8556647561/job/23446977354
org.apache.lucene.search.TestKnnByteVectorQuery > test suite's output saved to /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.search.TestKnnByteVectorQuery.txt, copied below:
   >     java.lang.AssertionError: expected:<1> but was:<0>
   >         at __randomizedtesting.SeedInfo.seed([7F16007D6F6A386B:2F2E8BC60A821A77]:0)
   >         at org.junit.Assert.fail(Assert.java:89)
   >         at org.junit.Assert.failNotEquals(Assert.java:835)
   >         at org.junit.Assert.assertEquals(Assert.java:647)
   >         at org.junit.Assert.assertEquals(Assert.java:633)
   >         at org.apache.lucene.search.BaseKnnVectorQueryTestCase.testTimeout(BaseKnnVectorQueryTestCase.java:790)

Sample command to reproduce as mentioned in (1):

./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestKnnByteVectorQuery.testTimeout" -Ptests.jvms=2 -Ptests.jvmargs= -Ptests.seed=7F16007D6F6A386B -Ptests.gui=true -Ptests.file.encoding=ISO-8859-1 -Ptests.vectorsize=256

However when I run it on my machine, I'm not getting the same error. Could someone double-check if this is deterministically reproducible?

Additionally, should we mark the test as @Ignored while we debug / fix it?

@benwtrent
Member

@kaivalnp I too have had this fail, but it is not deterministic. I created #13272 to track it.

I can mute the test while we wait for the fix.

vigyasharma pushed a commit that referenced this pull request Apr 11, 2024
* Add timeout support to graph searches in AbstractKnnVectorQuery

* Also timeout exact searches

* Return partial KNN results

* Add tests for partial KNN results

- Refactor tests to base classes
- Also timeout exact searches in Lucene99HnswVectorsReader

* Add CHANGES.txt entry and fix some comments

---------

Co-authored-by: Kaival Parikh <[email protected]>
@kaivalnp
Contributor Author

FYI I opened #13285 to add the same timeout support to AbstractVectorSimilarityQuery

vigyasharma pushed a commit that referenced this pull request Apr 17, 2024
* Add timeout support to graph searches in AbstractKnnVectorQuery

* Also timeout exact searches

* Return partial KNN results

* Add tests for partial KNN results

- Refactor tests to base classes
- Also timeout exact searches in Lucene99HnswVectorsReader

* Add CHANGES.txt entry and fix some comments

---------

Co-authored-by: Kaival Parikh <[email protected]>
@vigyasharma
Contributor

Backported to branch_9x
