Loading collection console log keeps looping with errors in milvus-sdk-java 2.4.0 #880

Open
CSi-CJ opened this issue Apr 26, 2024 · 3 comments

CSi-CJ commented Apr 26, 2024

Problem Description

When I call loadCollection after creating a collection with milvus-sdk-java 2.4.0, MilvusServiceClient keeps retrying the load in a loop and throwing errors. Oddly, milvus-attu shows that the collection has been loaded, yet the console log keeps looping with errors.

Error Log

2024-04-26T15:31:33.347+08:00 RID-b04b984f-fb0f-44d6-9ca7-780db24edb53  WARN 34720 --- [nio-8443-exec-1] i.m.client.AbstractMilvusGrpcClient      : Retry(6) with interval 2430ms. Reason: CANCELLED: Failed to read message.
2024-04-26T15:31:35.806+08:00 RID-b04b984f-fb0f-44d6-9ca7-780db24edb53 ERROR 34720 --- [nio-8443-exec-1] i.m.client.AbstractMilvusGrpcClient      : LoadCollectionRequest collectionName:Entity_100000001_Multi_Vector_3cf4e5916b0549b7ab79d6c0b71be4ce RPC failed! Exception:{}

io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:275) ~[grpc-stub-1.57.2.jar:1.57.2]
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:256) ~[grpc-stub-1.57.2.jar:1.57.2]
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:169) ~[grpc-stub-1.57.2.jar:1.57.2]
	at io.milvus.grpc.MilvusServiceGrpc$MilvusServiceBlockingStub.showCollections(MilvusServiceGrpc.java:4073) ~[milvus-sdk-java-2.4.0.jar:na]
	at io.milvus.client.AbstractMilvusGrpcClient.waitForLoadingCollection(AbstractMilvusGrpcClient.java:94) ~[milvus-sdk-java-2.4.0.jar:na]
	at io.milvus.client.AbstractMilvusGrpcClient.loadCollection(AbstractMilvusGrpcClient.java:565) ~[milvus-sdk-java-2.4.0.jar:na]
	at io.milvus.client.MilvusServiceClient.lambda$loadCollection$8(MilvusServiceClient.java:454) ~[milvus-sdk-java-2.4.0.jar:na]
	at io.milvus.client.MilvusServiceClient.retry(MilvusServiceClient.java:290) ~[milvus-sdk-java-2.4.0.jar:na]
	at io.milvus.client.MilvusServiceClient.loadCollection(MilvusServiceClient.java:454) ~[milvus-sdk-java-2.4.0.jar:na]
	at com.ot.ais.service.search.data.impl.MilvusDatabaseServiceImpl.loadCollection(MilvusDatabaseServiceImpl.java:250) ~[classes/:na]
	at com.ot.ais.service.search.data.impl.MilvusDatabaseServiceImpl.createIndexesAndLoadCollection(MilvusDatabaseServiceImpl.java:151) ~[classes/:na]
	at com.ot.ais.service.search.data.impl.MilvusDatabaseServiceImpl.createCollection(MilvusDatabaseServiceImpl.java:131) ~[classes/:na]

Environment

  • milvus-sdk-java version: 2.4.0
  • JDK version: 17
  • Operating System: Windows

Steps to Reproduce

  1. Define the loadCollection method:
    public R<RpcStatus> loadCollection(String collectionName) {
        return milvusServiceClient.loadCollection(
          LoadCollectionParam.newBuilder()
            .withCollectionName(collectionName)
            .build()
        );
    }
  2. Invoke loadCollection().

Expected Behavior

The collection should be loaded successfully without looping errors.

Additional Information

Hope someone can help me resolve this issue.

Additionally, I noticed that the MilvusServiceClient has a default retry mechanism for almost every database interaction with private int maxRetryTimes = 75. Why is the retry count set to 75? Is there any specific reason behind this number?


Contributor

yhmo commented Apr 26, 2024

The retry mechanism works as designed and is consistent with the Milvus Python SDK: https://github.com/milvus-io/pymilvus/blob/1081c49fcc21039300fec22e7b19805be8f198f0/pymilvus/decorators.py#L42
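
For reference, the retry cadence seen in the log can be reproduced with a small capped exponential-backoff calculation. This is a minimal sketch, not the SDK's code: the 10 ms initial interval, the multiplier of 3, and the 3000 ms cap are assumptions chosen to be consistent with the `Retry(6) with interval 2430ms` log line, and only the retry count of 75 comes from this thread. The class and method names are hypothetical.

```java
// Hypothetical sketch of a capped exponential backoff schedule; not the SDK's actual code.
final class BackoffSketch {
    static final int MAX_RETRY_TIMES = 75;      // value quoted in this thread
    static final long INITIAL_BACKOFF_MS = 10;  // assumption
    static final long BACKOFF_MULTIPLIER = 3;   // assumption
    static final long MAX_BACKOFF_MS = 3000;    // assumption

    /** Wait before retry number k (1-based), capped at MAX_BACKOFF_MS. */
    static long intervalMs(int k) {
        long interval = INITIAL_BACKOFF_MS;
        for (int i = 1; i < k; i++) {
            interval = Math.min(interval * BACKOFF_MULTIPLIER, MAX_BACKOFF_MS);
        }
        return interval;
    }

    /** Sum of all waits if every one of the MAX_RETRY_TIMES attempts fails. */
    static long worstCaseTotalMs() {
        long total = 0;
        for (int k = 1; k <= MAX_RETRY_TIMES; k++) {
            total += intervalMs(k);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Retry(6) interval: " + intervalMs(6) + "ms");
        System.out.println("Worst-case total wait: " + worstCaseTotalMs() + "ms");
    }
}
```

Under these assumed values, a call that never succeeds would spend roughly three and a half minutes sleeping between retries, which may be why the console log appears to loop.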

loadCollection() calls showCollections() to check the loading progress. It seems the showCollections() RPC failed.

"CANCELLED: Failed to read message" is a gRPC error; it indicates that the connection is broken or closed.
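
Because loading is checked by polling, a broken connection shows up as repeated poll failures rather than as a single error. The sketch below shows one way a caller could wrap such a poll: stop once progress reaches 100, and give up early after a few consecutive failures. The `LoadPoller` class, the `progressProbe` supplier, and the thresholds are hypothetical stand-ins; in a real application the probe would call the Milvus client.

```java
import java.util.function.IntSupplier;

// Hypothetical helper; the probe stands in for a real progress query.
final class LoadPoller {
    /**
     * Polls until the probe reports at least 100, returning true.
     * Returns false if maxPolls is exhausted or the thread is interrupted.
     * Throws IllegalStateException after maxConsecutiveFailures probe
     * exceptions in a row, since that suggests a broken connection.
     */
    static boolean waitUntilLoaded(IntSupplier progressProbe,
                                   int maxPolls,
                                   int maxConsecutiveFailures,
                                   long pollIntervalMs) {
        int failures = 0;
        for (int poll = 0; poll < maxPolls; poll++) {
            try {
                if (progressProbe.getAsInt() >= 100) {
                    return true;
                }
                failures = 0; // a successful probe resets the failure streak
            } catch (RuntimeException e) {
                failures++;
                if (failures >= maxConsecutiveFailures) {
                    throw new IllegalStateException(
                        "probe failed " + failures + " times in a row", e);
                }
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }
}
```

Failing fast on a streak of transport errors keeps a dead connection from being mistaken for a slow load.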

Author

CSi-CJ commented Apr 28, 2024

Yeah, it seems the gRPC connection has broken. I launched the Milvus standalone cluster in a local Ubuntu environment; the infrastructure is shown below.
I think I have nearly found the problem: my Milvus Helm chart installation failed, and the query-node pod was not found. Reinstalling the Milvus cluster may make it work normally. Could you help me confirm whether the cluster status is correct?
image

Contributor

yhmo commented Apr 29, 2024

The querycoord failed to initialize. The full log is needed to determine what the error is.
