
Add timeout support to AbstractKnnVectorQuery #13202

Merged: 6 commits, Apr 3, 2024

Conversation

kaivalnp
Contributor

Description

#927 added timeout support to IndexSearcher#search, where we wrap a scorer in a TimeLimitingBulkScorer which periodically checks whether a query timed out while collecting docs in fixed chunks.

However, as #11677 points out, this does not apply to query rewrites -- an important use case being vector searches (AbstractKnnVectorQuery), where the bulk of the computation (HNSW graph searches) happens in #rewrite.

If we had set a timeout and the HNSW searches turned out too expensive, we would not return results even after the search completes (because the timeout is checked before attempting to score docs -- at which point we have all possible results, but they aren't returned).

In this PR, we wrap the KnnCollector in another one which additionally checks QueryTimeout#shouldExit in the KnnCollector#earlyTerminated function. If the timeout is hit, we immediately return empty results.

We also extended this to exact searches, where we check for a timeout before scoring each document.
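The collector wrapping described above can be sketched as follows. The interfaces here are simplified stand-ins for Lucene's QueryTimeout and KnnCollector (the real interfaces have more methods), so treat this as an illustration of the idea rather than the actual implementation:

```java
// Simplified stand-ins for Lucene's interfaces -- illustrative only.
interface QueryTimeout {
  boolean shouldExit();
}

interface KnnCollector {
  boolean earlyTerminated();
}

// Wraps a KnnCollector so the graph search also stops once the timeout fires.
class TimeLimitingKnnCollector implements KnnCollector {
  private final KnnCollector delegate;
  private final QueryTimeout queryTimeout;

  TimeLimitingKnnCollector(KnnCollector delegate, QueryTimeout queryTimeout) {
    this.delegate = delegate;
    this.queryTimeout = queryTimeout;
  }

  @Override
  public boolean earlyTerminated() {
    // The graph searcher polls earlyTerminated() while traversing the graph,
    // so checking the timeout here preempts the approximate search.
    return queryTimeout.shouldExit() || delegate.earlyTerminated();
  }
}
```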

@kaivalnp
Contributor Author

kaivalnp commented Mar 22, 2024

Benchmarks

Used KnnGraphTester and applied a timeout by calling searcher.setTimeout before this line. Columns (in order) are: recall, latency per query in ms, number of docs, fanout, maxConn, beamWidth, number of nodes visited, reindex time in ms, selectivity, and pre/post-filter (topK = 100, numQueries = 10K)

baseline with null timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.965	 1.10	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.82	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.21	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.10	1000000	400	16	200	4504	0	1.00	post-filter

baseline with Long.MAX_VALUE timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.965	 1.10	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.82	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.21	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.09	1000000	400	16	200	4504	0	1.00	post-filter

baseline with 1 ms timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.483	 1.08	1000000	50	16	100	1675	0	1.00	post-filter
0.005	 2.78	1000000	400	16	100	4086	0	1.00	post-filter
0.418	 1.18	1000000	50	16	200	1853	0	1.00	post-filter
0.000	 3.05	1000000	400	16	200	4504	0	1.00	post-filter

candidate with null timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.965	 1.10	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.83	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.21	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.10	1000000	400	16	200	4504	0	1.00	post-filter

candidate with Long.MAX_VALUE timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.965	 1.17	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.99	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.28	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.28	1000000	400	16	200	4504	0	1.00	post-filter

candidate with 1 ms timeout

recall	latency(ms)	nDoc	fanout	maxConn	beamWidth	visited	reindex(ms)	selectivity	filterType
0.445	 0.89	1000000	50	16	100	476	0	1.00	post-filter
0.001	 1.01	1000000	400	16	100	12	0	1.00	post-filter
0.374	 0.92	1000000	50	16	200	425	0	1.00	post-filter
0.000	 1.01	1000000	400	16	200	3	0	1.00	post-filter

Some observations:

  • There is almost no regression when a timeout is not set (a null value)
  • There is some regression (~6%) when a timeout is set but not met, because of the additional time check on every #earlyTerminated call
  • There are large gains when a timeout is set and met

@vigyasharma
Contributor

If we had set a timeout, and HNSW searches turned out too expensive -- we would not return results even after the search completes (because the timeout is checked before attempting to score docs -- at which point we have all possible results, but they aren't returned)

Is this because we have already exhausted our query timeout during rewrite, and so we throw a TimeExceededException and bail before scoring docs for even the first chunk?

@kaivalnp
Contributor Author

Exactly! This issue is also visible in the above benchmarks, where the baseline (having no early termination in #rewrite) performs the entire graph search (same number of nodes visited, similar latency) even with a 1 ms timeout, but the results are eventually discarded -- demonstrated by the low recall compared to a high or no timeout

@vigyasharma left a comment

Interesting find on KNN search behavior, @kaivalnp! I added some thoughts after a first pass at the PR.

Apart from those, one divergent behavior is that we won't raise some form of TimeExceededException when we time out and terminate the search. I wonder if we should have a way to surface that this was a timeout, vs. all the results we could collect within the visitLimit() node traversal.

Comment on lines 83 to 86
if (queryTimeout.shouldExit()) {
// Quickly exit
return TopDocsCollector.EMPTY_TOPDOCS;
}
Contributor

Most KnnCollector implementations (like this one) tend to return whatever results we've collected so far, and set the relation as TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO if we early terminated. Any reason why we don't want to keep the same behavior here?

Contributor

I think we should return whatever docs we were able to collect within the time budget. We have already done the heavy work of approximate search for those hits, what's left is only to return those cached results from the DocAndScoreQuery.

It's not the behavior today, as TimeLimitingBulkScorer sees the query as timed out and doesn't even run the first interval on DocAndScoreQuery. I need to think more about a good way around it. I'm thinking that if the scorer is doing the heavy work of iterating through an ImpactsEnum or PostingsEnum, then it makes sense to early terminate during scoring. But for queries with cached results like DocAndScoreQuery, maybe we shouldn't?

Contributor Author

Returning partial results makes sense, but I passed the empty results here purely because they would be discarded later (so we could save on computations used for creating the TopDocs)

TimeLimitingBulkScorer sees the query as timed out, and doesn't even run the first interval
But for queries with cached results like DocAndScoreQuery, maybe we shouldn't?

I don't think returning partial results would add value today, can we add a TODO to return them once the above behavior is fixed?

Most KnnCollector implementations (like this one) tend to return whatever results we've collected so far, and set the relation as TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO if we early terminated

On a separate note, this seems to be a no-op as well -- since we return fresh results from exact search if approximate results are incomplete. I wonder if we can return empty results here as well to save on some computations?

Contributor

I don't think returning partial results would add value today, can we add a TODO to return them once the above behavior is fixed?

Yes, we can handle it in a different change. I'll start a discussion on a separate issue for this.

Contributor

On a separate note, this seems to be a no-op as well -- since we return fresh results from exact search if approximate results are incomplete.

I think we only fall back to exact search if there is a filter provided, i.e. filterWeight != null (code). Otherwise we return the approx search result and callers can use TotalHits.Relation to figure out if search terminated early.

Contributor

I still think topDocs() from the collector should return whatever it collected, and set the relation to GTE. With EMPTY_TOPDOCS, the query gets rewritten as a MatchNoDocsQuery, which misleads debugging and further masks the timeout occurrence. We can let the results get discarded by TimeLimitingBulkScorer for now, while we handle that in a separate change (since it is on par with existing behavior).

I'm open to more opinions on this.

Contributor Author

callers can use TotalHits.Relation to figure out if search terminated early.

This information is internal to AbstractKnnVectorQuery -- even if we pass some TotalHits.Relation, the rewritten DocAndScoreQuery treats all results as "complete" (no distinction for the relation here, and does not pass it on to IndexSearcher#search)

Though I agree, we could pass those results right now and handle them properly in a follow up issue. I've added a commit to do this

return new KnnCollector() {
  @Override
  public boolean earlyTerminated() {
    return queryTimeout.shouldExit() || collector.earlyTerminated();
Contributor

Ah, so both searchLevel() and findBestEntryPoint() in HnswGraphSearcher check earlyTerminated() on their collector to abort the search, so configuring the timeout check allows us to preempt the approximate search.

@vigyasharma
Contributor

baseline with null timeout

0.965	 1.10	1000000	50	16	100	1675	0	1.00	post-filter
0.992	 2.82	1000000	400	16	100	4086	0	1.00	post-filter
0.974	 1.21	1000000	50	16	200	1853	0	1.00	post-filter
0.995	 3.10	1000000	400	16	200	4504	0	1.00	post-filter

Could we add headers for each column with these benchmark results please? Thanks!

@vigyasharma
Contributor

Separately, should we deprecate TimeLimitingCollector ? It doesn't use QueryTimeout and I don't think we're using it anywhere.

@kaivalnp
Contributor Author

Thanks for the review @vigyasharma!

Apart from those, one divergent behavior is that we won't be raising some form of TimeExceededException when we timeout and terminate the search. Wonder if we should have a way to surface that this was a timeout, v/s all the results we could collect within visitLimit() node traversal.

I think this exception will be raised here automatically, because the timeout has already occurred? Though we won't know / raise this explicitly because the graph search timed out, the end goal of marking results as "incomplete" will still be satisfied?

I'm also not sure how we'd do it explicitly, perhaps we require lower-level API changes to allow rewrites to time out more gracefully?

Could we add headers for each column with these benchmark results please?

Oops, I missed adding the benchmark description and row headers earlier -- updated now.

Separately, should we deprecate TimeLimitingCollector ? It doesn't use QueryTimeout and I don't think we're using it anywhere.

+1, I only see occurrences of it in comments. I'll try to create a PR for this soon

@kaivalnp
Contributor Author

Separately, should we deprecate TimeLimitingCollector ? It doesn't use QueryTimeout and I don't think we're using it anywhere.

Created #13220 to discuss this

Comment on lines +85 to +91
// Mark results as partial if timeout is met
TotalHits.Relation relation =
queryTimeout.shouldExit()
? TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
: docs.totalHits.relation;

return new TopDocs(new TotalHits(docs.totalHits.value, relation), docs.scoreDocs);
Contributor

Can we simply return collector.topDocs(); and let collectors decide how to handle this? All implementations of AbstractKnnCollector already set relation to GTE based on earlyTerminated() check.

Contributor Author

I don't think collector.topDocs() will set the relation to GTE in case of timeouts (it will only set this when visitLimit() is crossed, because it uses its own internal earlyTerminated() function that does not have information about the QueryTimeout)

@@ -102,14 +103,34 @@ public void testVectorEncodingMismatch() throws IOException {
}
}

public void testTimeout() throws IOException {
Contributor

We could also add a test for the partial result case. You could create a mock query timeout that returns false only the first time it's called, and then flips to returning true.
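Such a mock could be sketched as follows (a hypothetical standalone sketch, not the actual test code in this PR -- the real mock would implement Lucene's QueryTimeout interface):

```java
// Hypothetical mock: reports no timeout on the first check, then reports
// timed-out on every later check, forcing the search down a partial-result path.
class FirstCallTimeout {
  private boolean firstCall = true;

  boolean shouldExit() {
    if (firstCall) {
      firstCall = false;
      return false; // let the first check through
    }
    return true; // every subsequent check sees a timeout
  }
}
```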

Contributor Author

Makes sense!

There was a consideration here that the number of levels in the HNSW graph should be == 1, because if the timeout is hit while finding the best entry point for the last level, we haven't collected any results yet. I think this should be fine as we're only indexing 3 vectors, and running these tests a few thousand times did not give an error. Added a note about this as well.

Also fixed another place where the timeout needs to be checked

- Refactor tests to base classes
- Also timeout exact searches in Lucene99HnswVectorsReader
Comment on lines 107 to +112
while (vectorScorer.nextParent() != DocIdSetIterator.NO_MORE_DOCS) {
// Mark results as partial if timeout is met
if (queryTimeout != null && queryTimeout.shouldExit()) {
relation = TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO;
break;
}
Contributor Author

Also wanted some opinions here: we're checking the timeout in exact search of DiversifyingChildren[Byte|Float]KnnVectorQuery once per-parent (as opposed to once per document elsewhere)

Should we update this to once per-child as well?

Contributor

Timeouts are inevitably approximate, so I guess it should be fine. Also seems like something that we can easily change in a follow-up PR if needed.

Contributor Author

Sounds good, thanks!

@benwtrent
Member

Looking at the benchmarking, we are adding a 5% overhead to all vector operations when using float32. As vector operations get faster (consider hamming distance, along with exploring more vectors), I expect this timeout mechanism to have an even bigger relative impact on search.

What was the dimension count & dataset for your performance testing @kaivalnp ?

Do we really think that timing out a couple of seconds faster is worth the overhead?

@kaivalnp
Contributor Author

What was the dimension count & dataset for your performance testing

It was the enwiki dataset (https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec), unit vectors, 100 dimensions, indexed first 1M vectors, searched next 10K vectors, with DOT_PRODUCT similarity function

Looking at the benchmarking, we are adding a 5% overhead to all vector operations when using float32

This overhead is only when a timeout is set but not met, there is no regression when it is not set at all (in the benchmarked case without any filter). IMO this additional check can be useful to time out unexpectedly expensive queries in a production system to keep overall CPU within limits, but looking forward to more opinions here..

@benwtrent
Member

unit vectors, 100 dimensions, indexed first 1M vectors,

@kaivalnp Ah, ok, only 100 dimensions -- this explains the overly large impact that checking for the timeout had, as there is very little work being done on distance calculations. I would expect the percentage of compute to drop as the amount of work done by vector comparisons increases.

I have a dumb question about timeouts. I am guessing the QueryTimeout object has to do some global syncing to determine the current runtime to check. Is that done on every single vector operation? If so, could we make sure it's only done every N operations?

Timeouts are already approximate; I understand the need for a stricter timeout, but I am not sure we should check for a timeout on every single vector.

@kaivalnp
Contributor Author

I am guessing the QueryTimeout object has to do some global syncing to determine the current runtime to check

I have used the default QueryTimeoutImpl class, which makes System.nanoTime() calls to check for a timeout at every shouldExit() call

but I am not sure we should check for timeout on every single vector.

Perhaps this can be configured by the end-user themselves, by making actual timeout checks only after every N calls, according to acceptable latency / accuracy tradeoffs?

For example, I modified the shouldExit() function here from:

return timeoutAt != null && nanoTime() - timeoutAt > 0;

to

// count is an integer initialized to 0
return timeoutAt != null && ++count % 100 == 0 && nanoTime() - timeoutAt > 0;

to check the timeout every 100 vectors, and the latency overhead with a Long.MAX_VALUE timeout got cut in half. Correspondingly, the latency with a 1 ms timeout increased because we check the timeout less frequently, but all of this is expected
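The experiment above can be captured in a self-contained sketch like the one below. The class name and constructor are illustrative (QueryTimeoutImpl itself does not expose a sampling knob); it only demonstrates the trade-off of consulting the clock every N calls:

```java
// Checks the wall clock only once every `interval` calls, trading timeout
// precision for fewer System.nanoTime() invocations on the hot path.
class SampledTimeout {
  private final long deadlineNanos;
  private final int interval;
  private int calls = 0;

  SampledTimeout(long timeoutNanos, int interval) {
    this.deadlineNanos = System.nanoTime() + timeoutNanos;
    this.interval = interval;
  }

  boolean shouldExit() {
    if (++calls % interval != 0) {
      return false; // skip the clock check on most calls
    }
    return System.nanoTime() - deadlineNanos > 0;
  }
}
```

With a larger interval, a query that times out is detected later (lower recall impact per check, higher latency past the deadline), which matches the benchmark behavior described above.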

@vigyasharma
Contributor

Perhaps this can be configured by the end-user themselves, by making actual timeout checks after every TK number of calls, according to acceptable latency / accuracy tradeoffs?

TimeLimitingBulkScorer already optimizes for timeout check frequency outside of QueryTimeout impl, by adjusting the interval passed to bulk scorer. I would rather we keep a compatible/similar vector timeout behavior by implementing this in the collector manager, so that users can use the same timeout impl in both places.

That being said, I'm okay if we do it in a separate change.

@vigyasharma
Contributor

I think this PR's changes are okay to be merged. @kaivalnp: can you please resolve the version conflicts and add a changes entry, and I can merge it in.

@kaivalnp
Contributor Author

kaivalnp commented Apr 2, 2024

TimeLimitingBulkScorer already optimizes for timeout check frequency outside of QueryTimeout impl

Ahh nice catch! You mean something like:

// counter is an integer initialized to 0
return (++counter % 100 == 0 && queryTimeout.shouldExit()) || collector.earlyTerminated();

in the TimeLimitingKnnCollectorManager itself?

This makes sense to me; we can add this in a follow-up and even make the interval grow, like in this commit?

@vigyasharma
Contributor

TimeLimitingBulkScorer already optimizes for timeout check frequency outside of QueryTimeout impl

Ahh nice catch! You mean something like:

// counter is an integer initialized to 0
return (++counter % 100 == 0 && queryTimeout.shouldExit()) || collector.earlyTerminated();

in the TimeLimitingKnnCollectorManager itself?

This makes sense to me; we can add this in a follow-up and even make the interval grow, like in this commit?

Yes. Let's do it in a follow-up change.
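A growing-interval variant could be sketched as below. This is a hypothetical illustration: the class name and the ~50% growth factor are assumptions, not TimeLimitingBulkScorer's exact policy, and a real version would plug into TimeLimitingKnnCollectorManager:

```java
import java.util.function.BooleanSupplier;

// Sketch: consult the clock after a growing number of calls (100, 150, 225, ...)
// so the per-call overhead shrinks the longer the search runs.
class GrowingIntervalTimeout {
  private final BooleanSupplier clockExpired; // e.g. () -> queryTimeout.shouldExit()
  private int interval = 100;  // calls between clock checks
  private int remaining = 100; // calls left before the next clock check

  GrowingIntervalTimeout(BooleanSupplier clockExpired) {
    this.clockExpired = clockExpired;
  }

  boolean shouldExit() {
    if (--remaining > 0) {
      return false; // no clock check on this call
    }
    interval += interval >> 1; // grow the gap by ~50% (illustrative policy)
    remaining = interval;
    return clockExpired.getAsBoolean();
  }
}
```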

@vigyasharma vigyasharma merged commit df154cd into apache:main Apr 3, 2024
3 checks passed
@kaivalnp kaivalnp deleted the timeout branch April 3, 2024 14:36
@kaivalnp
Contributor Author

kaivalnp commented Apr 4, 2024

I see a couple of recent failures related to the test added here:

  1. https://github.com/apache/lucene/actions/runs/8557766498/job/23450804532
  2. https://github.com/apache/lucene/actions/runs/8558104895/job/23451920688
  3. https://github.com/apache/lucene/actions/runs/8556647561/job/23446977354
org.apache.lucene.search.TestKnnByteVectorQuery > test suite's output saved to /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.search.TestKnnByteVectorQuery.txt, copied below:
   >     java.lang.AssertionError: expected:<1> but was:<0>
   >         at __randomizedtesting.SeedInfo.seed([7F16007D6F6A386B:2F2E8BC60A821A77]:0)
   >         at org.junit.Assert.fail(Assert.java:89)
   >         at org.junit.Assert.failNotEquals(Assert.java:835)
   >         at org.junit.Assert.assertEquals(Assert.java:647)
   >         at org.junit.Assert.assertEquals(Assert.java:633)
   >         at org.apache.lucene.search.BaseKnnVectorQueryTestCase.testTimeout(BaseKnnVectorQueryTestCase.java:790)

Sample command to reproduce as mentioned in (1):

./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestKnnByteVectorQuery.testTimeout" -Ptests.jvms=2 -Ptests.jvmargs= -Ptests.seed=7F16007D6F6A386B -Ptests.gui=true -Ptests.file.encoding=ISO-8859-1 -Ptests.vectorsize=256

However when I run it on my machine, I'm not getting the same error. Could someone double-check if this is deterministically reproducible?

Additionally, should we mark the test as @Ignored while we debug / fix it?

@benwtrent
Member

@kaivalnp I too have had this fail, but it is not deterministic. I created #13272 to track it.

I can mute the test while we wait for the fix.

vigyasharma pushed a commit that referenced this pull request Apr 11, 2024
* Add timeout support to graph searches in AbstractKnnVectorQuery

* Also timeout exact searches

* Return partial KNN results

* Add tests for partial KNN results

- Refactor tests to base classes
- Also timeout exact searches in Lucene99HnswVectorsReader

* Add CHANGES.txt entry and fix some comments

---------

Co-authored-by: Kaival Parikh <[email protected]>
@kaivalnp
Contributor Author

FYI I opened #13285 to add the same timeout support to AbstractVectorSimilarityQuery

vigyasharma pushed a commit that referenced this pull request Apr 17, 2024
* Add timeout support to graph searches in AbstractKnnVectorQuery

* Also timeout exact searches

* Return partial KNN results

* Add tests for partial KNN results

- Refactor tests to base classes
- Also timeout exact searches in Lucene99HnswVectorsReader

* Add CHANGES.txt entry and fix some comments

---------

Co-authored-by: Kaival Parikh <[email protected]>
@vigyasharma
Contributor

Backported to branch_9x
