Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(search): Multishard cutoffs #1924

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dranikpg
Copy link
Contributor

No description provided.

@dranikpg
Copy link
Contributor Author

My condolences to the one reviewing it 🙂

@kostasrim
Copy link
Contributor

kostasrim commented Sep 26, 2023

My condolences to the one reviewing it 🙂

I actually wrote Vlad - Kostas 1-1 only to not submit the comment because it would have been noise. Anyway, when you wrote this I could not resist not to write this msg 😛

I would argue mine was simpler because it was mostly boilerplate (I haven't looked yours TBH so it could be the same case) so it's not really 1-1 and probably you are getting the 👑 👨‍🍳

@kostasrim kostasrim self-requested a review September 26, 2023 07:09
@dranikpg dranikpg changed the title WIP feat(search): Multishard cutoffs feat(search): Multishard cutoffs Oct 29, 2023
@dranikpg
Copy link
Contributor Author

dranikpg commented Oct 29, 2023

@kostasrim @chakaz

Please take a quick look at this. There is still some polishment left but the general "structure" is ready.

I'd wish for this to be merged before we announce search because it's important for fast multishard queries

PS: Yes there is lots of code and it's complicated 🙂 I'd start looking from MultiShardSearch::Run because it's the main entry point w everything else being tailored to fit it's needs

Copy link
Collaborator

@chakaz chakaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through this PR, it looks good.
What is missing, except for that TODO around how many items to retrieve?
Also see some minor questions/nits

src/server/search/doc_index.h Outdated Show resolved Hide resolved
src/server/search/doc_index.h Show resolved Hide resolved
Comment on lines +64 to +74
size_t avg_shard_min = hits * intlog2(hits) / (12 + shard_set->size() / 10);
avg_shard_min -= min(avg_shard_min, min(hits, size_t(5)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the rationale here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimentally from #1892 😄 Those formulas might not be final

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd link to that, at least until you change it :)

// If all shards have at least avg min, keep the bare minimum needed to cover the request
size_t limit = requested / shards + 1;

// Aggregations like SORTBY and KNN reorder the result and thus introduce some variance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have to fetch all matching for SORTBY?

Copy link
Contributor Author

@dranikpg dranikpg Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No/Yes 🤓 We have to fetch all, but not serialize all. That is what the doc reference is for. If we find out when building the reply that some entries should have been included but are not serialized, we'll refill (see BuildSortedOrder())

Signed-off-by: Vladislav Oleshko <[email protected]>
Signed-off-by: Vladislav Oleshko <[email protected]>
Signed-off-by: Vladislav Oleshko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants