community[minor]: Improve Azure Cosmos DB vector store support #5197

Merged
merged 11 commits into from May 13, 2024

Conversation

sinedied
Contributor

This PR improves the existing Azure Cosmos DB vector store in the following ways:

  • Add automatic index creation
  • Add automatic embeddings length detection when not provided
  • Allow deleting documents using MongoDB filters

It does not include any breaking changes.
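As a rough illustration of the second bullet, one way to detect the embedding length when it is not provided is to embed a short probe string and read the length of the resulting vector. The sketch below is not taken from the PR's diff: `EmbeddingsLike`, `resolveDimensions`, and `fakeEmbeddings` are hypothetical names, and only the `embedQuery(text)` method shape is assumed to match the embeddings object passed to the store.

```typescript
// Hypothetical minimal interface: only the embedQuery method is assumed.
interface EmbeddingsLike {
  embedQuery(text: string): Promise<number[]>;
}

// Returns the explicit dimension count when given; otherwise embeds a
// probe string and uses the length of the returned vector.
async function resolveDimensions(
  embeddings: EmbeddingsLike,
  dimensions?: number
): Promise<number> {
  if (dimensions !== undefined) {
    return dimensions;
  }
  const probe = await embeddings.embedQuery("test");
  return probe.length;
}

// Fake embeddings used only for illustration: always returns 4 numbers.
const fakeEmbeddings: EmbeddingsLike = {
  async embedQuery(text: string): Promise<number[]> {
    return [text.length, 0, 0, 0];
  },
};
```

The trade-off with this approach is one extra embedding call at initialization time, which is why an explicit value, when supplied, short-circuits the probe.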

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 24, 2024
@dosubot dosubot bot added the auto:improvement Medium size change to existing code to handle new use-cases label Apr 24, 2024
@@ -48,13 +48,18 @@ describe.skip("AzureCosmosDBVectorStore", () => {
process.env.AZURE_COSMOSDB_CONNECTION_STRING!

Hey there! I've reviewed the code and noticed that the recent changes explicitly access an environment variable using process.env. I've flagged this for your review to ensure it aligns with our environment variable handling practices. Let me know if you have any questions or need further clarification.

similarity: AzureCosmosDBSimilarityType = AzureCosmosDBSimilarityType.COS
): Promise<void> {
await this.initPromise;
await this.connectPromise;
Collaborator

If this is hypothetically called before connectPromise is initialized, could we throw instead?

Collaborator

Ah never mind, I see below. Could we always just rely on this.initPromise instead?

Contributor Author

Since init calls createIndex, we need two separate promises here; otherwise the two would end up waiting on each other (a deadlock):

  • connectPromise => create DB + collections clients
  • initPromise => connectPromise + createIndex
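The split described in the two bullets above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the PR: `SketchVectorStore`, `FakeCollection`, and the `log` array are hypothetical, and the fake collection stands in for a real database client.

```typescript
// Hypothetical stand-in for a database collection client.
interface FakeCollection {
  createIndex(name: string): Promise<void>;
}

class SketchVectorStore {
  // Resolves once the collection client exists.
  private readonly connectPromise: Promise<FakeCollection>;

  // Resolves once the connection AND the index creation are both done.
  readonly initPromise: Promise<void>;

  // Records the order of operations, for illustration only.
  readonly log: string[] = [];

  constructor() {
    // Both tasks start in the constructor, which cannot itself be awaited.
    this.connectPromise = this.connect();
    this.initPromise = this.init();
  }

  private async connect(): Promise<FakeCollection> {
    this.log.push("connect");
    return {
      createIndex: async (name: string): Promise<void> => {
        this.log.push(`createIndex:${name}`);
      },
    };
  }

  private async init(): Promise<void> {
    await this.connectPromise;
    await this.createIndex("vectorSearchIndex");
  }

  // Waits only for the connection. Awaiting initPromise here would never
  // resolve, because init() itself awaits this method.
  async createIndex(name: string): Promise<void> {
    const collection = await this.connectPromise;
    await collection.createIndex(name);
  }

  // Other operations wait for full initialization.
  async addDocuments(texts: string[]): Promise<number> {
    await this.initPromise;
    this.log.push(`add:${texts.length}`);
    return texts.length;
  }
}
```

With a single promise, `createIndex` would have to await the same promise that is waiting for `createIndex` to finish; separating the connection step removes that cycle while still letting every other method wait on full initialization.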

Contributor Author

As for the throw question, the issue is that the promise is created in the constructor, so the user has no way to wait for it or to know when the connect task is done.

@jacoblee93 jacoblee93 added the close PRs that need one or two touch-ups to be ready label Apr 26, 2024
Collaborator

@bracesproul bracesproul left a comment

Overall looks solid, with a couple of comments. Please request my review once they're implemented; these changes will be great to get released!

@jacoblee93 jacoblee93 added lgtm PRs that are ready to be merged as-is and removed close PRs that need one or two touch-ups to be ready labels May 13, 2024
@jacoblee93 jacoblee93 merged commit 2b3b194 into langchain-ai:main May 13, 2024
17 checks passed
@jacoblee93
Collaborator

Thank you!

bracesproul added a commit that referenced this pull request May 13, 2024
* core[minor]: RunnableLambda should consume (async) iterator if the wrapped function returns one (#5342)

* core: RunnableLambda should consume async iterator if the wrapped function returns one

* Consume iterators too

* Add tests

* Dont interpret arrays/sets/etc as iterators

* Implement in invoke too

* Fix async storage propagation

* Handle any async iterable

* Add more tests

* community[minor]: Improve Azure Cosmos DB vector store support (#5197)

* feat: add delete by filter

* feat: return added document ids

* fix: delete by id

* test: update integration tests

* feat: add automatic index creation

* Update azure_cosmosdb.ts

* refactor: separate ids and filter params for delete()

* refactor: use a single param for delete

* test: fix unit tests

* Address feedback

---------

Co-authored-by: Jacob Lee <[email protected]>

* Revert "Merge branch 'v0.1' into main" (#5345)

This reverts commit db5ab3f, reversing
changes made to 2b3b194.

---------

Co-authored-by: Nuno Campos <[email protected]>
Co-authored-by: Yohan Lasorsa <[email protected]>
Co-authored-by: Jacob Lee <[email protected]>
Labels
auto:improvement Medium size change to existing code to handle new use-cases lgtm PRs that are ready to be merged as-is size:L This PR changes 100-499 lines, ignoring generated files.
3 participants