[Bug]: Embeddings Deletion Causes "Delete of nonexisting embedding ID" #989
Comments
I have this problem too.
@tazarov Hi, could you please look at this problem? Thank you for your time!
@mickey-lyx, thanks for reporting this. I'll take a look at this soon. At a glance, the code looks fine, and the actual result seems to be fine: you have 61 docs once you remove 47 from the starting 107. All in all, this seems like a warning, not an actual bug. I will have a closer look and let you know.
@tazarov Really appreciate it. The result is right. I'm just wondering why there are warnings about deleting nonexisting embeddings. Is it because the embeddings were deleted multiple times?
I have the same issue, and every query on the DB triggers this warning. What I did was select items with a `where` filter (no IDs given) and remove them one by one:
Since then, the warning is shown every time I query it.
I'm having the same issue. This seems to occur even when an empty list is passed as `ids` to `Collection.delete`.
We'd love to get this fixed. Is anyone able to post a minimal repro?
```python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction


def main():
    client = chromadb.PersistentClient(path="./chroma_db")
    # OpenAIEmbeddingFunction needs an OpenAI API key to embed the documents
    collection = client.get_or_create_collection(
        name="test", embedding_function=OpenAIEmbeddingFunction()
    )

    num_1 = 47
    num_2 = 70
    texts_1 = [f"text_1.{i}" for i in range(num_1)]
    ids_1 = [f"1.{i}" for i in range(num_1)]
    texts_2 = [f"text_2.{i}" for i in range(num_2)]
    ids_2 = [f"2.{i}" for i in range(num_2)]

    # Two adds totalling 47 + 70 = 117 records
    collection.add(ids=ids_1, documents=texts_1)
    collection.add(ids=ids_2, documents=texts_2)
    print("count before", collection.count())

    # Deleting the first batch produces the "Delete of nonexisting embedding ID" warnings
    collection.delete(ids=ids_1)
    print("count after", collection.count())


if __name__ == "__main__":
    main()
```
I'm seeing similar warnings, but since it's only a warning I'm unsure whether I should be concerned. It would be good to get some insight into why this occurs: even after uploading just a few PDF files, the FastAPI server keeps logging the warning while idle.
Package versions: chromadb==0.4.10, running the Chroma client/server with the latest Docker version.
I am having this exact issue too.
@jeffchuber, @chrispangg, @timothymugayi, @mickey-lyx, as I mentioned above, the issue is benign. Chroma maintains a temporary index of embeddings and flushes it to disk once it reaches a certain threshold. In your example the threshold (100) is reached, so the temporary index is flushed and cleared, and subsequent entries are appended to it. When a delete comes right after the add, Chroma attempts to remove all of the requested embeddings from the temporary index, including ones that were already flushed, which leads to the warning you see. I have made a fix that checks whether the IDs to be removed are part of the temporary index; if they are not, Chroma will not attempt to delete them from it. A PR is on the way.
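To make the explanation above concrete, here is a toy Python model of the mechanism. It is a hypothetical sketch, not Chroma's implementation: `ToySegment`, the plain dicts standing in for the temporary (brute-force) index and the persistent index, and the exact log message format are all assumptions. Records are applied one at a time, the temporary buffer is flushed and cleared at `batch_size=100`, and `delete` always tries the buffer first, as described for the pre-fix behaviour. Replaying the 47 + 70 repro against the model leaves 70 records and produces one warning for each of the 47 deleted IDs that had already been flushed.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("toy_segment")


class ToySegment:
    """Hypothetical model of a buffered vector segment (not Chroma's code).

    New records go into a small in-memory "brute-force" buffer; once the buffer
    holds `batch_size` records it is flushed to the persistent index and cleared.
    """

    def __init__(self, batch_size: int = 100) -> None:
        self.batch_size = batch_size
        self.buffer: Dict[str, str] = {}      # temporary (brute-force) index
        self.persistent: Dict[str, str] = {}  # flushed, HNSW-like index
        self.warnings: List[str] = []

    def add(self, ids: List[str], docs: List[str]) -> None:
        # Records are applied one at a time, so a flush can happen mid-call.
        for id_, doc in zip(ids, docs):
            self.buffer[id_] = doc
            if len(self.buffer) >= self.batch_size:
                self.persistent.update(self.buffer)  # flush to the persistent index
                self.buffer.clear()                  # temporary index is emptied

    def delete(self, ids: List[str]) -> None:
        for id_ in ids:
            # Pre-fix behaviour: always try the buffer, even for flushed IDs.
            if id_ in self.buffer:
                del self.buffer[id_]
            else:
                msg = f"Delete of nonexisting embedding ID: {id_}"
                self.warnings.append(msg)
                logger.warning(msg)
            self.persistent.pop(id_, None)

    def count(self) -> int:
        # Union of keys, so an ID is never counted twice.
        return len(self.buffer.keys() | self.persistent.keys())


# Replay the repro: 47 records, then 70 more, then delete the first 47.
seg = ToySegment(batch_size=100)
ids_1 = [f"1.{i}" for i in range(47)]
ids_2 = [f"2.{i}" for i in range(70)]
seg.add(ids_1, [f"text_1.{i}" for i in range(47)])
seg.add(ids_2, [f"text_2.{i}" for i in range(70)])
print("count before", seg.count())  # count before 117
seg.delete(ids_1)
print("count after", seg.count())   # count after 70
print(len(seg.warnings))            # 47
```

The flush happens after the 53rd record of the second add, so all of `ids_1` end up in the persistent index and the buffer only holds the last 17 records of `ids_2`. That is why every deletion in the model misses the buffer even though the final count is correct.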
- When the BF (brute-force) index overflows (reaches batch_size upon insertion of a large batch), it is cleared. If a subsequent delete request targets IDs that were in the cleared BF index, a warning is raised for a non-existent embedding. The issue was resolved by separately checking whether the record exists in the BF index and executing the BF removal only if it does. Refs: chroma-core#989
- Remove ternary expression Refs: chroma-core#989
Refs: #989

## Description of changes

- Improvements & Bug fixes
  - When the BF (brute-force) index overflows (reaches batch_size upon insertion of a large batch), it is cleared. If a subsequent delete request targets IDs that were in the cleared BF index, a warning is raised for a non-existent embedding. The issue was resolved by separately checking whether the record exists in the BF index and executing the BF removal only if it does.

## Test plan

- [x] Tests pass locally with `pytest` for python

## Documentation Changes

N/A
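The guarded removal described in this PR can be sketched as a small standalone function. This is an illustrative assumption, not the actual code from the PR: `delete_with_guard` and the plain dicts standing in for the BF index and the persistent index are hypothetical. The key step is the membership check on the BF buffer before removing from it, so IDs that were already flushed are deleted from the persistent index only and no longer trigger a warning.

```python
from typing import Dict, List


def delete_with_guard(
    buffer: Dict[str, str], persistent: Dict[str, str], ids: List[str]
) -> List[str]:
    """Hypothetical sketch of a guarded delete (not Chroma's code).

    Removes each ID from the BF buffer only if it is actually there, and from
    the persistent index if present. Returns the IDs found in neither index,
    which are the only ones that would genuinely deserve a warning.
    """
    missing: List[str] = []
    for id_ in ids:
        in_buffer = id_ in buffer  # membership check before touching the buffer
        if in_buffer:
            del buffer[id_]
        in_persistent = persistent.pop(id_, None) is not None
        if not in_buffer and not in_persistent:
            missing.append(id_)
    return missing


buffer = {"2.99": "text_2.99"}                          # still-buffered record
persistent = {"1.0": "text_1.0", "1.1": "text_1.1"}    # already-flushed records
result = delete_with_guard(buffer, persistent, ["1.0", "1.1", "2.99"])
print(result)  # []
```

Returning the truly missing IDs, instead of logging for every buffer miss, keeps a warning available for genuine mistakes such as deleting an ID that was never added.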
@HammadB I think we can close this now.
I think this issue is still present. I've just stumbled upon it in my application, and I'm using the latest version of Chroma (0.4.24), so the fix from #1150 should already be merged.
I updated to chromadb==0.5.0, but I still have this problem:
@running-frog, @s-peryt, we have a bug in the HNSW binary index that, under certain conditions, can result in the above errors. There is a PR, #2062, that should resolve this.
What happened?
Hi there, I tried to upload two PDF files to a persistent collection and then delete one of them, but I received the warning message "Delete of nonexisting embedding ID". This warning only appears when I upload multiple files and delete one of them. Here are my test files and code.
alphabet-2023-q1-10q.pdf
Apple Inc.-10K.pdf
Versions
chromadb==0.4.5
langchain==0.0.264
python==3.10.12
MacOS==13.3.1
Relevant log output