[Bug]: Milvus Cluster encounters rpc error: code = DeadlineExceeded desc = context deadline exceeded probably when loading partitions #32763

Open · ganderaj opened this issue May 5, 2024 · 34 comments
Labels: kind/bug (Issues or changes related to a bug), triage/accepted (Indicates an issue or PR is ready to be actively worked on)

@ganderaj commented May 5, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.3.1
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): Pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): v2.3.6 or above
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Our Milvus cluster, which runs on AWS EKS, has not received any changes, and neither has the client application. However, we have lately noticed a sporadic issue where an application trying to load a partition hangs and fails with the following error message:

ERROR:pymilvus.decorators:grpc RpcError: [load_partitions], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-04-29 14:43:46.221321', 'gRPC error': '2024-04-29 14:44:46.222777'}>
ERROR:app.milvus_helper:Error in get_context Milvus:  <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Deadline Exceeded {created_time:"2024-04-29T14:44:46.222455187+00:00", grpc_status:4}"

We have also observed Milvus logging a similar error on its QueryCoord:

[ERROR] [querycoordv2/server.go:186] ["QueryCoord init failed"] [error="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:485 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/rootcoord/client/client.go:120 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]\n/go/src/github.com/milvus-io/milvus/internal/distributed/rootcoord/client/client.go:196 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/meta/coordinator_broker.go:76 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CoordinatorBroker).GetCollectionSchema\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/meta/collection_manager.go:133 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).Recover\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/server.go:311 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initMeta\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/server.go:229 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initQueryCoord\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/server.go:185 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Init.func1: attempt #0: rpc error: code = DeadlineExceeded desc = context deadline exceeded: context done during sleep after run#0: context deadline exceeded"] [stack="github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Init.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/server.go:186\ngithub.com/milvus-io/milvus/internal/util/sessionutil.(*Session).ProcessActiveStandBy\n\t/go/src/github.com/milvus-io/milvus/internal/util/sessionutil/session_util.go:1031\ngithub.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/server.go:139\ngithub.com/milvus-io/milvus/internal/distributed/querycoord.(*Server).start\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/service.go:262\ngithub.com/milvus-io/milvus/internal/distributed/querycoord.(*Server).Run\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/service.go:100\ngithub.com/milvus-io/milvus/cmd/components.(*QueryCoord).Run\n\t/go/src/github.com/milvus-io/milvus/cmd/components/query_coord.go:53\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:112"]

We have investigated Milvus resource utilization and found it is barely used (~5%), and every Pod / Deployment is in a healthy state. Although upgrading Milvus to one of the latest releases is on our roadmap, we would like to understand the root cause and how to ensure we don't encounter the same issue after the upgrade.

Expected Behavior

Loading a partition and performing queries should complete with minimal latency and without errors.

Steps To Reproduce

We are dealing with a Milvus database which comprises:
- Collections: 2144
- Partitions: 14643
- Vectors: 43713

Please note that the distribution of vectors and partitions across collections is not uniform, and it is our understanding that `partition.load(timeout=60)` is the call that leads to this error message (a minimal sketch of the call is shown below).
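For reference, the client-side call looks roughly like this — a minimal sketch assuming a standard PyMilvus setup; the endpoint, collection, and partition names are placeholders rather than our actual application values:

```python
import grpc
from pymilvus import connections, Collection, MilvusException

connections.connect(alias="default", host="localhost", port="19530")  # assumed endpoint

collection = Collection("my_collection")       # hypothetical collection
partition = collection.partition("tenant_42")  # hypothetical partition

try:
    # The 60s client-side timeout is what surfaces as StatusCode.DEADLINE_EXCEEDED
    # when the load is not acknowledged in time.
    partition.load(timeout=60)
except (MilvusException, grpc.RpcError) as e:
    # Depending on the pymilvus version, the deadline error may surface as a
    # MilvusException or as a raw grpc.RpcError.
    print(f"load_partitions failed: {e}")
```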

Milvus Log

The issue appears to be sporadic in nature, and at times the Milvus logs do not show any error while the client SDK still receives error messages. Attached is milvus-prod-logs.tar.gz, taken on May 05 and covering the past 2 days.

Anything else?

No response

@ganderaj added the kind/bug and needs-triage labels May 5, 2024
@yanliang567 (Contributor) commented:

/assign @congqixia
/unassign

@sre-ci-robot assigned congqixia and unassigned yanliang567 May 6, 2024
@yanliang567 added the triage/accepted label and removed the needs-triage label May 6, 2024
@yanliang567 added this to the 2.3.15 milestone May 6, 2024
@xiaofan-luan (Contributor) commented:

> We are dealing with a Milvus database which comprises:
>   • Collections: 2144
>   • Partitions: 14643
>   • Vectors: 43713

So you are saying the cluster has 2144 collections and 14643 partitions, but only ~43,000 vectors? Is there a reason you need to create so many collections instead of using the partition key feature to split the data?

@xiaofan-luan (Contributor) commented:

From the logs you provided, I don't see any DEADLINE_EXCEEDED entries; the only problem I see is too many collections and partitions.

@ganderaj (Author) commented May 6, 2024

> From the logs you provided, I don't see any DEADLINE_EXCEEDED entries; the only problem I see is too many collections and partitions.

Yes, as mentioned earlier, the issue is sporadic in nature, which has made it difficult to triage. Please find the logs from May 02, 0502-logs.zip, which include several DeadlineExceeded errors.

@ganderaj (Author) commented May 6, 2024

> We are dealing with a Milvus database which comprises:
>   • Collections: 2144
>   • Partitions: 14643
>   • Vectors: 43713
>
> So you are saying the cluster has 2144 collections and 14643 partitions, but only ~43,000 vectors? Is there a reason you need to create so many collections instead of using the partition key feature to split the data?

We understand this might not be a typical Milvus use case; however, per our application design we have to go with multiple collections and partitions for relatively few vectors. Please note that this is a production (PRD) application which had been working with no issues for at least 5 months.

@xiaofan-luan (Contributor) commented:

> We are dealing with a Milvus database which comprises:
>   • Collections: 2144
>   • Partitions: 14643
>   • Vectors: 43713
>
> So you are saying the cluster has 2144 collections and 14643 partitions, but only ~43,000 vectors? Is there a reason you need to create so many collections instead of using the partition key feature to split the data?
>
> We understand this might not be a typical Milvus use case; however, per our application design we have to go with multiple collections and partitions for relatively few vectors. Please note that this is a production (PRD) application which had been working with no issues for at least 5 months.

You can try putting the data in a single collection, using the partition key feature, and filtering on the data; see https://milvus.io/docs/multi_tenancy.md (a sketch of this approach follows below).

Filtering helps reduce the amount of meta information; the collection and partition counts cannot grow without limit.

We recommend having fewer than 10000 collections and fewer than 65536 partitions × collections.
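Roughly, the partition-key layout would look like the sketch below — a minimal example assuming PyMilvus; the field names, dimension, and partition count are illustrative placeholders, not a prescribed schema:

```python
from pymilvus import (
    connections, Collection, CollectionSchema, FieldSchema, DataType,
)

connections.connect(host="localhost", port="19530")  # assumed endpoint

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    # Tenant identifier used as the partition key instead of per-tenant collections.
    FieldSchema(name="tenant_id", dtype=DataType.VARCHAR, max_length=64,
                is_partition_key=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
]
schema = CollectionSchema(fields)

# One collection for all tenants; Milvus hashes tenant_id into num_partitions buckets.
collection = Collection("tenants", schema=schema, num_partitions=64)

# Queries then filter on the key instead of loading per-tenant partitions, e.g.:
#   collection.search(..., expr='tenant_id == "acme"')
```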

@ganderaj (Author) commented May 6, 2024

We appreciate you sharing your recommendations. We considered the Milvus limits while designing our application to ensure we do not hit any hard limits.

Can you please comment on what could have led our platform to encounter these DeadlineExceeded errors while our cluster reports indicate every K8s asset is healthy? Can you also explain what this error message actually means and how we can avoid it in the future?

@ganderaj (Author) commented May 6, 2024

> We recommend having fewer than 10000 collections and fewer than 65536 partitions × collections.

By this statement, you are recommending that we keep the product of partitions and collections under 65536, and collections alone under 10K. Can you please confirm?

@xiaofan-luan (Contributor) commented:

Yes, and the best practice is to stay well below even that.

@xiaofan-luan (Contributor) commented:

We are working on this; we have a 10000-collections campaign underway to improve performance.

@ganderaj (Author) commented May 6, 2024

Any insights on the error message and what's causing it?

It appears Milvus is indeed not working as expected, and we would like your support in finding the root cause. We have received a bunch of these errors while attempting `partition.load(timeout=60)`, while a few calls actually succeeded. Included below is a snippet from our Grafana dashboard indicating that a major proportion of load requests ended up failing.

[Grafana dashboard screenshot: most load requests failing]

@xiaofan-luan (Contributor) commented:

Did the load time out?

@ganderaj (Author) commented May 6, 2024

Yes, our clients using the PyMilvus SDK received the error message below:

grpc RpcError: [load_partitions], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-05-06 11:38:10.918914', 'gRPC error': '2024-05-06 11:39:10.932805'}>

However, unlike in the past, Milvus didn't log any errors today; you can find the logs at 0506-milvus-prod-logs.tar.gz, taken for the past 12 hours.

@ganderaj (Author) commented May 6, 2024

We have been receiving similar error messages while attempting to load partitions; however, the occurrence of this error is sporadic.

```json
{
    "log": "2024-05-06T19:39:47.138200174Z stdout F [2024/05/06 19:39:47.137 +00:00] [ERROR] [retry/retry.go:42] [\"retry func failed\"] [traceID=8fa85e754e7374538c2675ddd045bf6d] [\"retry time\"=0] [error=\"rpc error: code = Canceled desc = context canceled\"] [stack=\"github.com/milvus-io/milvus/pkg/util/retry.Do\\n\\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:42\\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call\\n\\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:405\\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\\n\\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:483\\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\\n\\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499\\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]\\n\\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:109\\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).LoadPartitions\\n\\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:197\\ngithub.com/milvus-io/milvus/internal/proxy.(*loadPartitionsTask).Execute\\n\\t/go/src/github.com/milvus-io/milvus/internal/proxy/task.go:1694\\ngithub.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\\n\\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:457\\ngithub.com/milvus-io/milvus/internal/proxy.(*taskScheduler).definitionLoop\\n\\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:484\"]",
    "kubernetes": {
        "pod_name": "milvus-proxy-85bd7dbd76-79pmd",
        "namespace_name": "milvus",
        "pod_id": "433a2fb8-f5e0-45ff-824b-f1f93815c005",
        "host": "ip-10-246-13-233.xxx.com",
        "container_name": "proxy",
        "docker_id": "458179f521efb951943b782facd495d1ccb45148bdf57d714db8659d6aec6e37",
        "container_hash": "docker.io/milvusdb/milvus@sha256:9f37c7100c44be8e7419c55f285c2963ee4d25c71951a2325614a4565a03d7e6",
        "container_image": "docker.io/milvusdb/milvus:v2.3.1"
    }
}
```

@ganderaj (Author) commented May 7, 2024

FYI @xiaofan-luan

@xiaofan-luan (Contributor) commented:

We need the full logs of the QueryNode and QueryCoord to understand what is happening.

This is how the logs should be collected:
https://github.com/milvus-io/milvus/blob/5038036ece044fe605de9323b0c9967cb4f20c57/deployments/export-log/README.md?plain=1

Also, I would say it is very risky to maintain this many collections and partitions in one cluster; it is not recommended.

@xiaofan-luan (Contributor) commented:

If you check the QueryNode warn logs, you may get some idea of why the load fails.

@ganderaj (Author) commented May 8, 2024

Hi @xiaofan-luan

You may find the Milvus logs in 0508-milvus-logs.tar.gz, taken today with the since argument set to all. Please refer to the QueryCoord entries in pod_log and pod_log_previous to find the deadline exceeded errors, which occur when we try to load a partition.

FYI, all the logs that I have shared earlier in the thread were captured using export-milvus-log.sh

@ganderaj (Author) commented May 9, 2024

@xiaofan-luan

We are seeing a lot of error messages on milvus-pulsar-recovery. We just wanted to get your thoughts on the error below:

14:30:00.040 [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi callback : -1
14:30:00.041 [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Read for failed on bookie milvus-pulsar-bookie-1.milvus-pulsar-bookie.milvus.svc.cluster.local:3181 code EIO
14:30:00.041 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Error while reading ledger while reading L838481 E119021 from bookie: milvus-pulsar-bookie-1.milvus-pulsar-bookie.milvus.svc.cluster.local:3181
14:30:00.041 [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L838481 E119021-E119021, Sent to [milvus-pulsar-bookie-1.milvus-pulsar-bookie.milvus.svc.cluster.local:3181, milvus-pulsar-bookie-3.milvus-pulsar-bookie.milvus.svc.cluster.local:3181], Heard from [] : bitset = {}, Error = 'Error while reading ledger'. First unread entry is (-1, rc = null)
14:30:00.041 [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger entry: 119021
org.apache.bookkeeper.client.BKException$BKReadException: Error while reading ledger
	at org.apache.bookkeeper.client.BKException.create(BKException.java:62) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
	at org.apache.bookkeeper.client.LedgerFragmentReplicator$3.readComplete(LedgerFragmentReplicator.java:364) [org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
	at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:820) [org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38) [org.apache.bookkeeper-bookkeeper-common-4.14.3.jar:4.14.3]
	at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) [org.apache.bookkeeper-bookkeeper-common-4.14.3.jar:4.14.3]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]
	at java.lang.Thread.run(Thread.java:829) [?:?]
14:30:00.041 [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR

Considering that error code EIO indicates an input/output error, I investigated possible disk / file system failures, but everything seems to be fine. I used the bookkeeper admin tool from the pulsar-bookie pods to validate whether the ledgers indeed exist. They show up in listledgers, but they throw the same error message when I attempt to read them with readledger.

@xiaofan-luan (Contributor) commented:

  1. Maybe restart the bookies and check whether the service recovers?
  2. Did you get a chance to check the IO usage for Pulsar?

@xiaofan-luan (Contributor) commented:

i.e., IO util and disk util.

@ganderaj (Author) commented May 9, 2024

I do not see any errors logged on the pulsar-bookie pods, only on pulsar-recovery. I have scaled out and up-sized pulsar-recovery and restarted both the bookies and recovery. No help.

We are on AWS; the EBS Average Queue Length has been barely 0.2-0.5 over the past 2 weeks. Can you inspect the error code and confirm whether we are dealing with a disk IO issue or something else?

@ganderaj (Author) commented May 9, 2024

I believe these errors are linked to the degraded performance of our application. It takes around 30-40s, and sometimes over 2 minutes, to flush a partition (a minimal client-side timing sketch follows below).
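For context, this is roughly how the flush latency can be measured from the client side — a minimal sketch with a placeholder collection name, capturing only client-observed time:

```python
import time
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")  # assumed endpoint
collection = Collection("my_collection")             # hypothetical collection

start = time.monotonic()
collection.flush(timeout=300)  # generous timeout so even slow flushes complete
print(f"flush took {time.monotonic() - start:.1f}s")
```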

@ganderaj (Author) commented May 9, 2024

@xiaofan-luan

Please refer to the ERROR messages in STDOUT for the listledgers and readledger outputs from one of the bookie pods, attached below.

bookkeeper_shell.txt

@xiaofan-luan (Contributor) commented:

From the logs you provided, everything on the Milvus side seems to be working as expected.
The reason flush is slow is likely related to having too many partitions and segments.

You can verify this by profiling the coordinator node with pprof and seeing how much CPU it uses (a hedged sketch of pulling a profile follows below).
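For example, a CPU profile could be pulled like this — a sketch assuming the coordinator exposes Go's net/http/pprof handlers on its HTTP metrics port (commonly 9091); the host, port, deployment name, and output path are placeholders to adapt to your cluster:

```python
import urllib.request

# Assumed endpoint: adjust to your deployment, e.g. after
#   kubectl port-forward deploy/milvus-datacoord 9091:9091
PPROF_URL = "http://localhost:9091/debug/pprof/profile?seconds=30"

# Download a 30-second CPU profile and save it locally.
with urllib.request.urlopen(PPROF_URL, timeout=90) as resp:
    with open("datacoord_cpu.pprof", "wb") as out:
        out.write(resp.read())

# Inspect locally with the Go toolchain, e.g.:
#   go tool pprof -http=:8080 datacoord_cpu.pprof
```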

@xiaofan-luan (Contributor) commented:

We fixed many of these issues in 2.4.1, e.g. #32831.

@ganderaj (Author) commented May 9, 2024

I'm guessing you are referring to DataCoord, correct? I haven't used pprof, but I gathered metrics from CloudWatch and Prometheus, which indicate we are using a good amount of CPU and memory across most of our coordinators and nodes, yet nowhere near what's available to them. We have a beefy environment, with EKS nodes on r6i.4xlarge.

@ganderaj (Author) commented May 9, 2024

@xiaofan-luan

When I initially opened this discussion, our application was on Milvus v2.3.1. Given the performance issues, we have since moved our application to our passive Milvus deployment on v2.3.10. Doesn't 2.3.10 have this bugfix / new design?

Fundamentally, we are trying to understand what is wrong. That will help us ensure we don't hit a similar issue on v2.4.1.

@xiaofan-luan (Contributor) commented:

The query and data coordinators can consume a lot of CPU.
2.3.14 already made some progress on reducing CPU usage, and 2.4.1 has more optimizations.

What is the current performance problem you are facing?

@ganderaj (Author) commented May 9, 2024

@xiaofan-luan

Flushing a partition with fewer than 15 vectors takes around 40s, and sometimes 270s. CPU and memory utilization for DataCoord / data nodes and the bookies indicates they are nowhere near their provisioned capacity.

Perhaps you could review our configuration for the bookie pods and JVM and share an optimal config. I am focused on the bookies since they are the only components currently throwing error messages, and the flush operation involves the bookies. Let me know if you think otherwise.

Your help is greatly appreciated.

@ganderaj (Author) commented:

@xiaofan-luan

I have deployed Milvus v2.4.1 and am currently in the process of migrating our PRD workloads to this new environment. I'll keep you posted on how it goes.

Thanks for your support.

@moinsaj commented May 10, 2024

> We are dealing with a Milvus database which comprises:
>   • Collections: 2144
>   • Partitions: 14643
>   • Vectors: 43713
>
> So you are saying the cluster has 2144 collections and 14643 partitions, but only ~43,000 vectors? Is there a reason you need to create so many collections instead of using the partition key feature to split the data?
>
> We understand this might not be a typical Milvus use case; however, per our application design we have to go with multiple collections and partitions for relatively few vectors. Please note that this is a production (PRD) application which had been working with no issues for at least 5 months.
>
> You can try putting the data in a single collection, using the partition key feature, and filtering on the data; see https://milvus.io/docs/multi_tenancy.md
>
> Filtering helps reduce the amount of meta information; the collection and partition counts cannot grow without limit.
>
> We recommend having fewer than 10000 collections and fewer than 65536 partitions × collections.

@xiaofan-luan When you say fewer than 65536 partitions × collections, do you mean physical partitions, or does that apply to partition-key-based storage too?

@ganderaj (Author) commented:

From https://milvus.io/docs/multi_tenancy.md:

> Once you set the partition key for a collection, you cannot bulk-insert entities into the collection.

@xiaofan-luan / @moinsaj

Considering the requirements of our CVP stack, we are left with the "partition-key-based" approach, which supports our requirements for the maximum number of tenants.

We have learned that bulk-inserting entities is not supported for collections with a partition key enabled. We rely on milvus-backup to back up our databases for disaster recovery purposes, and I believe the backup tool uses bulk insert for the restore operation.

Could you please comment on whether the backup tool would work if we back up/restore a collection with a partition key enabled? Please advise on any alternatives if it doesn't.

@yanliang567 (Contributor) commented:

@ganderaj In the latest Milvus releases, v2.3.15 or v2.4.x, you can bulk insert into a collection with partition key enabled (a hedged sketch follows below). We will update the doc https://milvus.io/docs/multi_tenancy.md immediately. BTW, you can also do backup and restore with the backup tool with partition key enabled in the latest release.
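For example, on a recent release a bulk insert into a partition-key collection would look roughly like the sketch below, using the standard pymilvus bulk-insert utility; the endpoint, collection name, and file path are placeholders, not values from this thread:

```python
from pymilvus import connections, utility

connections.connect(host="localhost", port="19530")  # assumed endpoint

# With a partition key, rows are routed by the key field's value,
# so no partition_name is passed here.
task_id = utility.do_bulk_insert(
    collection_name="cvp_documents",            # hypothetical collection
    files=["backups/cvp_documents/data.json"],  # hypothetical file in the configured object storage
)
print(utility.get_bulk_insert_state(task_id))   # poll the import task status
```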

@yanliang567 modified the milestone from 2.3.15 to 2.3.16 May 16, 2024