Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Deadlock in concurrent update()s #396

Open
2 tasks done
chun0nick opened this issue Apr 12, 2024 · 5 comments
Open
2 tasks done

Bug: Deadlock in concurrent update()s #396

chun0nick opened this issue Apr 12, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@chun0nick
Copy link

chun0nick commented Apr 12, 2024

Describe the bug

Deadlock came up during testing, appears to be an issue in index.hpp. We're using the Rust bindings for v2.8.16 so through index_dense.hpp. This testing involved deleting a bunch of vectors and then, with multiple threads, calling add(), which then called update() on the deleted nodes.

In update we acquire the lock on the deleted node here. We then call connect_node_across_levels_ which invokes search_for_one_ and search_to_insert_, while locking and locking (respectively) every candidate node it looks at.

This update lock on the deleted node seems to be the problem. If another thread performing an update selects that deleted node as a "next" candidate, we have to wait until the other thread is completely done updating, as we don't drop the update lock until the end of the call. What we've seen somewhat consistently (with 16 threads updating) is a deadlock, where each of the threads needs the candidate lock that another thread updating already holds, and that thread in turn needs the candidate lock a different thread holds.

This may be the issue described in #354?

Steps to reproduce

Create an index, add a reasonable number of vectors. Delete them - then concurrently call add (update) with a bunch of threads.

Expected behavior

The concurrent updates should not conflict with each other.

USearch version

v2.8.16

Operating System

Amazon Linux 2

Hardware architecture

x86

Which interface are you using?

Other bindings

Contact Details

[email protected]

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@chun0nick chun0nick added the bug Something isn't working label Apr 12, 2024
@ashvardanian
Copy link
Contributor

Good catch! Looking into it! Feel free to submit a test-case or patches via PR while I do 🤗

@ashvardanian
Copy link
Contributor

Replicated it, will make part of the primary testing suite in C++. Thanks for finding and reporting the issue, @chun0nick! Would you be open to be mentioned as a contributor for this patch? 🤗

@chun0nick
Copy link
Author

Awesomely fast response! And sorry - was planning to give you a test case today. Sure, would be happy to be mentioned as a contributor.

@sef43
Copy link

sef43 commented Apr 23, 2024

I had what looked to be a deadlock happen when using the Python api (as part of usearch_molecules indexing step) when using 32 threads. Could this bug be the cause of that?

@ashvardanian
Copy link
Contributor

Yes, @sef43, the issue is coming from the core in update-heavy workloads. Haven't finished patching yet, was busy the last few days with new bindings for UForm and new algos in StringZilla.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants