Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(langchain4j-milvus): MilvusEmbeddingStore supports configure required index parameters #931

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Glarme
Copy link
Contributor

@Glarme Glarme commented Apr 12, 2024

Context

Added support for configure common custom index parameters

Fix: #860

Change

The constructor of MilvusEmbeddingStore now require a parameter of type IndexParam and provides parameter models required for common type indexes.

Checklist

Before submitting this PR, please check the following points:

  • I have added unit and integration tests for my change
  • All unit and integration tests in the module I have added/changed are green
  • All unit and integration tests in the core and main modules are green
  • I have added/updated the documentation
  • I have added an example in the examples repo (only for "big" features)
  • I have added my new module in the BOM (only when a new module is added)

Checklist for adding new embedding store integration

  • I have added a {NameOfIntegration}EmbeddingStoreIT that extends from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT

…ent builder

This commit addresses an issue where the baseUrl in the builder of ZhipuAiClient was not properly configured in Module langchain4j-zhipu-ai. This misconfiguration could potentially lead to confusion or errors when using the client directly. By correcting the baseUrl configuration, this commit ensures consistent and accurate behavior of the client.
…uired index parameters

Fix: [langchain4j#860](langchain4j#860)

Added support for configure common custom index parameters

BREAKING CHANGE: The constructor of MilvusEmbeddingStore now require a parameter of type IndexParam.
@Glarme
Copy link
Contributor Author

Glarme commented Apr 12, 2024

@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.

@jiangsier-xyz
Copy link
Contributor

@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.

You can use the @EnabledIfEnvironmentVariable annotation to conditionally enable a test class based on an environment variable. For example:

@EnabledIfEnvironmentVariable(named = "MILVUS_API_KEY", matches = ".+")
class MilvusEmbeddingStoreCloudIT extends EmbeddingStoreWithFilteringIT {
    // Test class content
}

Additionally, I suggest that MilvusEmbeddingStore should employ the 'upsert' operation instead of 'insert', similar to the approach taken by PgVector, Pinecone, and AstraDb. This change would help to alleviate issues arising from id conflicts. Furthermore, @langchain4j the EmbeddingStore interface contains a method public void add(String id, Embedding embedding). Would it be possible to include a public void add(String id, Embedding embedding, TextSegment textSegment), that enables the upserting of a record along with a text field?

@Glarme
Copy link
Contributor Author

Glarme commented Apr 13, 2024

@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.

You can use the @EnabledIfEnvironmentVariable annotation to conditionally enable a test class based on an environment variable. For example:

@EnabledIfEnvironmentVariable(named = "MILVUS_API_KEY", matches = ".+")
class MilvusEmbeddingStoreCloudIT extends EmbeddingStoreWithFilteringIT {
    // Test class content
}

Additionally, I suggest that MilvusEmbeddingStore should employ the 'upsert' operation instead of 'insert', similar to the approach taken by PgVector, Pinecone, and AstraDb. This change would help to alleviate issues arising from id conflicts. Furthermore, @langchain4j the EmbeddingStore interface contains a method public void add(String id, Embedding embedding). Would it be possible to include a public void add(String id, Embedding embedding, TextSegment textSegment), that enables the upserting of a record along with a text field?

@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.

You can use the @EnabledIfEnvironmentVariable annotation to conditionally enable a test class based on an environment variable. For example:

@EnabledIfEnvironmentVariable(named = "MILVUS_API_KEY", matches = ".+")
class MilvusEmbeddingStoreCloudIT extends EmbeddingStoreWithFilteringIT {
    // Test class content
}

Additionally, I suggest that MilvusEmbeddingStore should employ the 'upsert' operation instead of 'insert', similar to the approach taken by PgVector, Pinecone, and AstraDb. This change would help to alleviate issues arising from id conflicts. Furthermore, @langchain4j the EmbeddingStore interface contains a method public void add(String id, Embedding embedding). Would it be possible to include a public void add(String id, Embedding embedding, TextSegment textSegment), that enables the upserting of a record along with a text field?

This PR mainly aims to add parameters for Index creation. Collection and index creation are performed within the constructor of MilvusEmbeddingStore, contingent upon the collection not existing. Without a Milvus instance, unit tests cannot reach the location of this modification. If skipping the test based on conditions using the @EnabledIfEnvironmentVariable method without a Milvus instance, then this unit test would serve no purpose.

Additionally, regarding modifications to other methods in MilvusEmbeddingStore, it might be worth opening a separate issue for discussion. In addition to your suggestions, I also propose adding a vector deletion interface based on ID or filters to EmbeddingStore, enabling CRUD operations on vectors to be performed within EmbeddingStore. For updates, this can be achieved through a combination of deletion and creation, eliminating the need to create a separate client for maintaining this data functionality.

@Glarme
Copy link
Contributor Author

Glarme commented Apr 19, 2024

The Milvus instance in unit testing has been adjusted to be provided by @Testcontainers, and all unit tests have passed. It's worth noting that configuring Milvus to use @Testcontainers requires additional settings, as shown below.

MilvusContainer milvus = new MilvusContainer("milvusdb/milvus:v2.3.1")
            .withCreateContainerCmdModifier(cmd -> {
                cmd.withHostConfig(HostConfig.newHostConfig()
                        // This security opts needs to be configured;
                        // otherwise, the container will fail to start.
                        // Refer to the parameters from https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh.
                        .withSecurityOpts(Collections.singletonList("seccomp=unconfined"))
                        // By default, the Milvus Docker image doesn't expose the target port.
                        // Manual binding is required to expose it;
                        // otherwise, it will result in a failed detection during the test container inspection,
                        // leading to an inability to connect to the Milvus instance.
                        .withPortBindings(PortBinding.parse("19530:19530"), PortBinding.parse("9091:9091")
                        )
                );
            });

@langchain4j
Copy link
Owner

@langchain4j Is there a configured milvus instance available for CI unit testing? Attempting to connect using MILVUS_URI and MILVUS_API_KEY retrieved from environment variables seems to fail.

No, not for the main CI yet.
Only for releases: https://github.com/langchain4j/langchain4j/blob/main/.github/workflows/release.yaml#L53

@langchain4j langchain4j added the P3 Medium priority label Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 Medium priority
Projects
None yet
3 participants