[Bug]: Lack of transactionality mechanics in create_collection #2104

Open
tazarov opened this issue May 1, 2024 · 1 comment
Labels
bug Something isn't working

Comments

tazarov (Contributor) commented May 1, 2024

What happened?

Creating a collection in Chroma involves several steps:

  1. Create the collection in sysdb
  2. Create the segments (metadata and vector)
  3. Create the vector segment in sysdb
  4. Create the metadata segment in sysdb
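To make the failure mode concrete, here is a toy, in-memory model of the four steps above. `ToySysDB`, `build_segments` and `create_collection_naive` are hypothetical stand-ins, not Chroma's classes, and the accepted HNSW keys are an illustrative subset only. Because step 1 is written before step 2 runs, a validation error in step 2 leaves an orphaned collection behind.

```python
import uuid


class UniqueConstraintError(Exception):
    """Toy stand-in for chromadb.db.base.UniqueConstraintError."""


class ToySysDB:
    """Hypothetical in-memory sysdb: plain dicts, no transactions."""

    def __init__(self):
        self.collections = {}  # collection name -> collection id
        self.segments = {}     # segment id -> (collection id, scope)

    def create_collection(self, name):
        if name in self.collections:
            raise UniqueConstraintError(f"Collection {name} already exists")
        self.collections[name] = uuid.uuid4()
        return self.collections[name]

    def create_segment(self, collection_id, scope):
        segment_id = uuid.uuid4()
        self.segments[segment_id] = (collection_id, scope)
        return segment_id


ALLOWED_HNSW_KEYS = {"hnsw:space", "hnsw:M"}  # illustrative subset only


def build_segments(metadata):
    """Step 2 stand-in: rejects unknown HNSW keys, then returns segment scopes."""
    for key in metadata or {}:
        if key.startswith("hnsw:") and key not in ALLOWED_HNSW_KEYS:
            raise ValueError(f"Unknown HNSW parameter: {key}")
    return ["VECTOR", "METADATA"]


def create_collection_naive(sysdb, name, metadata=None):
    collection_id = sysdb.create_collection(name)  # 1. written immediately
    scopes = build_segments(metadata)              # 2. may raise
    for scope in scopes:                           # 3-4. segment rows
        sysdb.create_segment(collection_id, scope)
    return collection_id


db = ToySysDB()
try:
    create_collection_naive(db, "test", metadata={"hnsw:batch_size": 100})
except ValueError as e:
    print(e)  # Unknown HNSW parameter: hnsw:batch_size
print(list(db.collections), len(db.segments))  # ['test'] 0  (orphaned row)
try:
    create_collection_naive(db, "test")
except UniqueConstraintError as e:
    print(e)  # Collection test already exists
```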

If any of steps 2-4 fails, Chroma is left in an inconsistent state: the collection exists in sysdb, but some or all of its segments do not. A subsequent delete_collection or get_or_create_collection may fix the problem. However, a plain create_collection with the same name fails with a UniqueConstraintError, as the log below shows.
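Until creation is atomic, a caller can clean up after a failed create_collection itself. The following is a minimal sketch of that workaround, assuming a client that exposes `create_collection(name, metadata=...)` and `delete_collection(name)` as chromadb's Client does; `create_collection_with_cleanup` is a hypothetical helper, not part of chromadb.

```python
from typing import Any, Mapping, Optional, Tuple, Type


def create_collection_with_cleanup(
    client: Any,
    name: str,
    metadata: Optional[Mapping[str, Any]] = None,
    *,
    already_exists: Tuple[Type[BaseException], ...],
) -> Any:
    """Hypothetical helper: if create_collection fails part-way, delete the
    half-created collection before re-raising the original error."""
    try:
        return client.create_collection(name, metadata=metadata)
    except already_exists:
        # The name was already taken before this call: never delete it here.
        raise
    except Exception:
        try:
            # Drop the orphaned sysdb entry left behind by the failed create.
            client.delete_collection(name)
        except Exception:
            pass  # nothing to clean up (or cleanup failed); keep the original error
        raise
```

With the in-memory client from the log below, `already_exists` would presumably be `(UniqueConstraintError,)` imported from `chromadb.db.base`, the module named in the traceback; other client types may raise a different exception class. Requiring that argument is what keeps the helper from deleting a collection that existed before the call. The helper is not safe if another process creates the same collection concurrently.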

This is not a critical issue, as there are ways to work around it. However, it highlights the need for more robust error handling, including but not limited to rolling back earlier steps when a later one fails.
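One way to get rollback for the sysdb steps is to run steps 1, 3 and 4 inside a single database transaction, with step 2 executed before the commit. The sketch below assumes a SQLite-backed sysdb, as single-node Chroma uses, and a toy two-table schema that is not Chroma's real one; `create_collection_atomic`, `build_ok`, `build_fails` and `drop` are hypothetical names. Segment storage lives outside SQLite, so it still needs an explicit compensating cleanup.

```python
import sqlite3
import uuid


def init_schema(conn):
    # Toy schema loosely shaped like a sysdb; not Chroma's real tables.
    conn.execute("CREATE TABLE collections (id TEXT PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, collection TEXT, scope TEXT)")


def create_collection_atomic(conn, name, build_segments, drop_segments):
    """Steps 1, 3 and 4 share one SQLite transaction; step 2 runs inside it."""
    collection_id = str(uuid.uuid4())
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO collections VALUES (?, ?)", (collection_id, name))
            segments = build_segments(collection_id)  # step 2, may raise
            conn.executemany(
                "INSERT INTO segments VALUES (?, ?, ?)",
                [(seg_id, collection_id, scope) for seg_id, scope in segments],
            )
    except Exception:
        drop_segments(collection_id)  # compensate for storage outside SQLite
        raise
    return collection_id


storage = {}  # toy segment storage: collection id -> segment ids


def build_ok(collection_id):
    segments = [(str(uuid.uuid4()), "VECTOR"), (str(uuid.uuid4()), "METADATA")]
    storage[collection_id] = [seg_id for seg_id, _ in segments]
    return segments


def build_fails(collection_id):
    storage[collection_id] = ["partial"]  # some storage created before the error
    raise ValueError("Unknown HNSW parameter: hnsw:batch_size")


def drop(collection_id):
    storage.pop(collection_id, None)


conn = sqlite3.connect(":memory:")
init_schema(conn)
try:
    create_collection_atomic(conn, "test", build_fails, drop)
except ValueError as e:
    print(e)
print(conn.execute("SELECT COUNT(*) FROM collections").fetchone()[0])  # 0
create_collection_atomic(conn, "test", build_ok, drop)  # same name, no conflict
```

Python's `sqlite3` connection context manager commits when the block succeeds and rolls back when it raises, so the failed attempt leaves no collection row and the retry with the same name succeeds.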

Versions

Chroma 0.4.x and 0.5.x (single-node); any OS or Python version

Relevant log output

```
Python 3.11.7 (main, Dec 30 2023, 14:03:09) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import chromadb
>>> client = chromadb.Client()
>>> try:
...     client.create_collection("test",metadata={"hnsw:batch_size":100})
... except Exception as e:
...     print(e)
... 
Unknown HNSW parameter: hnsw:batch_size
>>> client.create_collection("test")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/api/client.py", line 198, in create_collection
    return self._server.create_collection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/api/segment.py", line 173, in create_collection
    coll, created = self._sysdb.create_collection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/tazarov/experiments/chroma/taz-sprint-14/chromadb/db/mixins/sysdb.py", line 220, in create_collection
    raise UniqueConstraintError(f"Collection {name} already exists")
chromadb.db.base.UniqueConstraintError: Collection test already exists
```

Note: The issue above is reproducible with in-memory, single-node Chroma, both as a local client and as a server (distributed mode not tested).

tazarov added the bug (Something isn't working) label on May 1, 2024
tazarov (Contributor, Author) commented May 1, 2024

delete_collection follows a similar multi-step sequence, so it can be left in a partially applied state in the same way.
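For delete_collection, one possible ordering (a sketch, not Chroma's current behavior) is to remove the collection and segment rows in one transaction and only then delete segment storage. A failure before the commit changes nothing; a failure afterwards leaves only orphaned storage for a later cleanup pass. The schema and `delete_collection_safely` are hypothetical, and a SQLite-backed sysdb is assumed.

```python
import sqlite3


def delete_collection_safely(conn, name, drop_segments):
    """Delete sysdb rows atomically first, then remove segment storage."""
    with conn:  # both DELETEs commit together or not at all
        row = conn.execute("SELECT id FROM collections WHERE name = ?", (name,)).fetchone()
        if row is None:
            raise ValueError(f"Collection {name} does not exist")
        collection_id = row[0]
        conn.execute("DELETE FROM segments WHERE collection = ?", (collection_id,))
        conn.execute("DELETE FROM collections WHERE id = ?", (collection_id,))
    # Only reached after the commit; a failure here leaves orphaned storage,
    # never a collection row that points at deleted segments.
    drop_segments(collection_id)
    return collection_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collections (id TEXT PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, collection TEXT, scope TEXT)")
with conn:
    conn.execute("INSERT INTO collections VALUES ('c1', 'test')")
    conn.executemany(
        "INSERT INTO segments VALUES (?, 'c1', ?)",
        [("s1", "VECTOR"), ("s2", "METADATA")],
    )
storage = {"c1": ["s1", "s2"]}  # toy segment storage

delete_collection_safely(conn, "test", lambda cid: storage.pop(cid, None))
```

The design choice here is to make the metadata change the single atomic commit point: orphaned storage is cheap to detect and remove later, whereas a sysdb entry pointing at missing segments is what produces errors like the one in the log above.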
