
[Enhancement]: local reduce can be optimized if only one segment is involved #32728

Open
1 task done
longjiquan opened this issue Apr 30, 2024 · 2 comments
Assignees
Labels
kind/enhancement Issues or changes related to enhancement

Comments

@longjiquan
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

What would you like to be added?

```cpp
int64_t
ReduceHelper::ReduceSearchResultForOneNQ(int64_t qi,
                                         int64_t topk,
                                         int64_t& offset) {
    while (!heap_.empty()) {
        heap_.pop();
    }
    pk_set_.clear();
    pairs_.clear();
    group_by_val_set_.clear();

    pairs_.reserve(num_segments_);
    for (int i = 0; i < num_segments_; i++) {
        auto search_result = search_results_[i];
        auto offset_beg = search_result->topk_per_nq_prefix_sum_[qi];
        auto offset_end = search_result->topk_per_nq_prefix_sum_[qi + 1];
        if (offset_beg == offset_end) {
            continue;
        }
        auto primary_key = search_result->primary_keys_[offset_beg];
        auto distance = search_result->distances_[offset_beg];
        if (search_result->group_by_values_.has_value()) {
            AssertInfo(
                search_result->group_by_values_.value().size() > offset_beg,
                "Wrong size for group_by_values size to "
                "ReduceSearchResultForOneNQ:{}, not enough for "
                "required offset_beg:{}",
                search_result->group_by_values_.value().size(),
                offset_beg);
        }
        pairs_.emplace_back(
            primary_key,
            distance,
            search_result,
            i,
            offset_beg,
            offset_end,
            search_result->group_by_values_.has_value() &&
                    search_result->group_by_values_.value().size() > offset_beg
                ? std::make_optional(
                      search_result->group_by_values_.value().at(offset_beg))
                : std::nullopt);
        heap_.push(&pairs_.back());
    }

    // nq has no results for all segments
    if (heap_.size() == 0) {
        return 0;
    }

    int64_t dup_cnt = 0;
    auto start = offset;
    while (offset - start < topk && !heap_.empty()) {
        auto pilot = heap_.top();
        heap_.pop();
        auto index = pilot->segment_index_;
        auto pk = pilot->primary_key_;
        // no valid search result for this nq, break to next
        if (pk == INVALID_PK) {
            break;
        }
        // remove duplicates
        if (pk_set_.count(pk) == 0) {
            bool skip_for_group_by = false;
            if (pilot->group_by_value_.has_value()) {
                if (group_by_val_set_.count(pilot->group_by_value_.value()) >
                    0) {
                    skip_for_group_by = true;
                }
            }
            if (!skip_for_group_by) {
                pilot->search_result_->result_offsets_.push_back(offset++);
                final_search_records_[index][qi].push_back(pilot->offset_);
                pk_set_.insert(pk);
                if (pilot->group_by_value_.has_value())
                    group_by_val_set_.insert(pilot->group_by_value_.value());
            }
        } else {
            // skip entity with same primary key
            dup_cnt++;
        }
        pilot->advance();
        if (pilot->primary_key_ != INVALID_PK) {
            heap_.push(pilot);
        }
    }
    return dup_cnt;
}
```

Why is this needed?


If only one segment is involved in the reduce phase, the CPU spent on the heap-based merge is wasted: that segment's results are already sorted by distance, so the reduce can be completed more simply.

Anything else?

No response

@longjiquan longjiquan added the kind/enhancement Issues or changes related to enhancement label Apr 30, 2024
@longjiquan
Contributor Author

the tracing can be found in #32734

@xiaofan-luan
Contributor

seems to be a promising optimization

sre-ci-robot pushed a commit that referenced this issue May 7, 2024
sre-ci-robot pushed a commit that referenced this issue May 7, 2024