[Feature Request]: Langchain plugin for Chroma always tries to create the collection even if the collection already exists. #2163
Comments
@jeffchuber @tazarov need help with this.
@harshal-cuminai, thanks for the elaborate and deep exploration of the issue. Separating your ingestion and query/get flows makes sense for more than security reasons. Just off the top of my head, I see two options here:
Is auth something you can work with? If yes, then I can give you some configs to try out. It might be worth it until we figure out a more flexible solution.
Hi @tazarov, sure, we are open to any temporary solution until we can make some variant of the proposed solution a first-class integration in Langchain. Currently we have auth set up as a subprocess of the nginx proxy sitting in front of the chromadb service. But our use case requires rejecting collection creation altogether (even for authenticated clients), which is not possible with the current Langchain integration. So I am thinking we will probably have to redirect the POST call for collection creation (triggered by Langchain) after authentication (based on client role) as follows:
as they both have the same response schema and output when get_or_create is set to true, as it is in the current case. What approach are you suggesting?
Rewriting sounds like a sensible approach. However, you'll have to read the POST payload to get the
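The rewrite discussed above could be sketched roughly as follows. This is only an illustration, not a tested proxy configuration: the function name `rewrite_create_to_get` is hypothetical, and the paths `/api/v1/collections` (create) and `/api/v1/collections/{name}` (get) are assumptions about the Chroma HTTP API version in use, so check them against your server. The sketch reads the POST payload, and only when `get_or_create` is true does it turn the request into a GET for the named collection; a plain create request is rejected.

```python
import json
from urllib.parse import quote

# Assumed create-collection path; verify against the Chroma version you run.
CREATE_PATH = "/api/v1/collections"


def rewrite_create_to_get(method, path, body):
    """Return the (method, path, body) to forward upstream, or None to reject.

    Hypothetical helper: a proxy hook would call this after authentication.
    """
    route, _, query = path.partition("?")
    if method != "POST" or route.rstrip("/") != CREATE_PATH:
        # Not a create-collection call: forward unchanged.
        return method, path, body

    payload = json.loads(body or b"{}")
    if not isinstance(payload, dict):
        return None
    name = payload.get("name")
    # Only the get_or_create variant can be served by a read-only GET.
    if not name or not payload.get("get_or_create"):
        return None

    new_path = f"{CREATE_PATH}/{quote(name, safe='')}"
    if query:
        # Keep query parameters such as tenant and database.
        new_path += "?" + query
    return "GET", new_path, b""
```

The cost mentioned in the thread is visible here: the proxy has to buffer and parse the request body before it can decide where to route the call.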
Yes, correct. Is there any cheaper alternative you can suggest? On a side note, it would be best to have this as a first-class feature in langchain-chroma. wdyt?
I've already written up the Langchain🦜🔗 PR; I'm just adding tests, and off it goes. However, it might take a few days to merge and release it. Your problem is not uncommon, or at least shouldn't be, for publicly facing products where you'd want a modicum of control over who can write to the DB.
Adds the ability to either get_or_create or simply get a collection. This is useful when dealing with read-only Chroma instances where users can only get_collection. Targeted at Http/CloudClients mostly. Closes chroma-core/chroma#2163
@harshal-cuminai, PR in Langchain🦜🔗 created.
…ma constructor (#21420)

- **Description:** Adds the ability to either `get_or_create` or simply `get_collection`. This is useful when dealing with read-only Chroma instances where users are constrained to using `get_collection`. Targeted at Http/CloudClients mostly.
- **Issue:** chroma-core/chroma#2163
- **Dependencies:** N/A
- **Twitter handle:** `@t_azarov`

| Collection Exists | create_collection_if_not_exists | Outcome | Test |
|-------------------|---------------------------------|----------------------------------------------------------------|----------------------------------------------------------|
| True | False | No errors, collection state unchanged | `test_create_collection_if_not_exist_false_existing` |
| True | True | No errors, collection state unchanged | `test_create_collection_if_not_exist_true_existing` |
| False | False | Error, `get_collection()` fails | `test_create_collection_if_not_exist_false_non_existing` |
| False | True | No errors, `get_or_create_collection()` creates the collection | `test_create_collection_if_not_exist_true_non_existing` |
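The four outcomes in the table can be illustrated with a minimal sketch. It is not the code from the PR: `open_collection` and `FakeClient` are hypothetical names used only for illustration, and the fake client merely imitates the `get_collection` / `get_or_create_collection` calls named in the table so the example runs without a Chroma server.

```python
def open_collection(client, name, create_collection_if_not_exists=True):
    """Pick the client call according to the flag (illustrative helper)."""
    if create_collection_if_not_exists:
        # Needs the create endpoint to be reachable.
        return client.get_or_create_collection(name)
    # Read-only path: fails if the collection does not exist.
    return client.get_collection(name)


class FakeClient:
    """In-memory stand-in for a Chroma client, for demonstration only."""

    def __init__(self, existing=()):
        self.collections = set(existing)
        self.created = []  # names created through this client

    def get_collection(self, name):
        if name not in self.collections:
            raise ValueError(f"Collection {name} does not exist")
        return name

    def get_or_create_collection(self, name):
        if name not in self.collections:
            self.collections.add(name)
            self.created.append(name)
        return name


client = FakeClient(existing=["demo"])
open_collection(client, "demo", create_collection_if_not_exists=False)  # existing: no error
try:
    open_collection(client, "missing", create_collection_if_not_exists=False)
except ValueError as exc:
    print("read-only lookup failed:", exc)
```

With the flag set to false, a missing collection surfaces as an error instead of a create request, which is the behaviour the read-only deployment in this issue asks for.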
@harshal-cuminai The PR should be in the next release.
@tazarov is the package auto-published on release? https://pypi.org/project/langchain-chroma/#history
@harshal-cuminai, I think they do separate releases for partner libs. But you can always do the following:

With pip: `pip install git+https://github.com/langchain-ai/langchain.git@master#subdirectory=libs/partners/chroma`

In requirements.txt: `git+https://github.com/langchain-ai/langchain.git@master#subdirectory=libs/partners/chroma`

In pyproject.toml:

    [tool.poetry.dependencies]
    langchain-chroma = { git = "https://github.com/langchain-ai/langchain.git", branch = "master", subdirectory = "libs/partners/chroma" }
Perfect, this works. Thanks a ton, @tazarov. Closing this thread now.
@tazarov now that we have tested it locally, we are blocked from releasing our package until this change is rolled out in the langchain-chroma package (we can't ship packages with direct repo-based dependencies). I have left a comment on your Langchain PR, but is there a way you folks can expedite the release?
Closing, as 0.1.1 is released.
Describe the problem
Use Case:
Only allow querying a collection hosted on a remotely running Chroma server for similarity search. The assumption is that the triple (tenant, db, collection) always exists and that the client always passes values that already exist in the db. If not, we error out.
Problem:
We are trying to integrate the Chroma db server into an application. We use Chroma's Langchain plugin for client-side testing and wish to support client-side integration with Langchain with limited access to the Chroma server.
The problem is that we don't want to expose all the API endpoints of the Chroma server; we expose only the following in our app ingress rules:
(Note: We are not exposing the Create Collection endpoint.)
This works great when using chromadb directly, as shown below, assuming that the collection "demo" has already been created. The code uses only the 4 API calls mentioned above.
However, the Langchain plugin defaults get_or_create to true, so it always tries to create the collection and errors out because we are not exposing the Create Collection API.
Describe the proposed solution
We should have an option to set get_or_create to false.
Alternatives considered
No response
Importance
i cannot use Chroma without it
Additional Information
No response