
[Bug]: [benchmark] No quotaAndLimits configured, some insert interfaces report error: quota exceeded[reason=rate type: DMLInsert] #32719

Closed
elstic opened this issue Apr 30, 2024 · 12 comments

Comments

elstic (Contributor) commented Apr 30, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240429-ac82cef0 
- Deployment mode(standalone or cluster):both
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

cases: test_concurrent_locust_diskann_compaction_standalone, test_concurrent_locust_diskann_compaction_cluster

server:

fouram-disk-sta13600-3-29-1428-etcd-0                             1/1     Running                       0                  8m35s   10.104.24.171   4am-node29   <none>           <none>
fouram-disk-sta13600-3-29-1428-etcd-1                             1/1     Running                       0                  8m35s   10.104.25.2     4am-node30   <none>           <none>
fouram-disk-sta13600-3-29-1428-etcd-2                             1/1     Running                       0                  8m35s   10.104.34.68    4am-node37   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-datacoord-5df45846d7xqj29   1/1     Running                       4 (7m9s ago)       8m35s   10.104.20.188   4am-node22   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-datanode-d7dd7799b-8w6dn    1/1     Running                       5 (2m5s ago)       8m35s   10.104.16.38    4am-node21   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-indexcoord-7df6fdd6bsr4vn   1/1     Running                       0                  8m35s   10.104.20.189   4am-node22   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-indexnode-bb44d659f-wvhh2   1/1     Running                       4 (7m2s ago)       8m35s   10.104.16.39    4am-node21   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-proxy-59dfdbd678-ks72f      1/1     Running                       5 (2m5s ago)       8m35s   10.104.30.208   4am-node38   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-querycoord-685ff7d6djs9n8   1/1     Running                       5 (2m5s ago)       8m35s   10.104.20.190   4am-node22   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-querynode-5c87f496f79pzbk   1/1     Running                       4 (7m11s ago)      8m35s   10.104.20.191   4am-node22   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-rootcoord-5889c654d7xhgnf   1/1     Running                       5 (2m21s ago)      8m35s   10.104.16.37    4am-node21   <none>           <none>
fouram-disk-sta13600-3-29-1428-minio-0                            1/1     Running                       0                  8m35s   10.104.24.166   4am-node29   <none>           <none>
fouram-disk-sta13600-3-29-1428-minio-1                            1/1     Running                       0                  8m35s   10.104.25.253   4am-node30   <none>           <none>
fouram-disk-sta13600-3-29-1428-minio-2                            1/1     Running                       0                  8m35s   10.104.23.28    4am-node27   <none>           <none>
fouram-disk-sta13600-3-29-1428-minio-3                            1/1     Running                       0                  8m35s   10.104.32.222   4am-node39   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-bookie-0                    1/1     Running                       0                  8m35s   10.104.32.219   4am-node39   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-bookie-1                    1/1     Running                       0                  8m35s   10.104.18.179   4am-node25   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-bookie-2                    1/1     Running                       0                  8m34s   10.104.24.174   4am-node29   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-bookie-init-mwztn           0/1     Completed                     0                  8m35s   10.104.4.119    4am-node11   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-broker-0                    1/1     Running                       0                  8m35s   10.104.6.192    4am-node13   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-proxy-0                     1/1     Running                       0                  8m35s   10.104.9.48     4am-node14   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-pulsar-init-8jhbf           0/1     Completed                     0                  8m35s   10.104.4.120    4am-node11   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-recovery-0                  1/1     Running                       0                  8m35s   10.104.14.18    4am-node18   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-zookeeper-0                 1/1     Running                       0                  8m35s   10.104.25.251   4am-node30   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-zookeeper-1                 1/1     Running                       0                  6m48s   10.104.24.187   4am-node29   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-zookeeper-2                 1/1     Running                       0                  5m27s   10.104.15.69    4am-node20   <none>           <none> (base.py:257)
[2024-04-29 23:14:22,676 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|fouram-disk-sta13600-3-29-1428-milvus|fouram-disk-sta13600-3-29-1428-minio|fouram-disk-sta13600-3-29-1428-etcd|fouram-disk-sta13600-3-29-1428-pulsar|fouram-disk-sta13600-3-29-1428-zookeeper|fouram-disk-sta13600-3-29-1428-kafka|fouram-disk-sta13600-3-29-1428-log|fouram-disk-sta13600-3-29-1428-tikv'  (util_cmd.py:14)
[2024-04-29 23:14:33,187 -  INFO - fouram]: [CliClient] pod details of release(fouram-disk-sta13600-3-29-1428): 
 I0429 23:14:24.319075    4058 request.go:665] Waited for 1.198730413s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/chaos-mesh.org/v1alpha1?timeout=32s
NAME                                                              READY   STATUS                        RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouram-disk-sta13600-3-29-1428-etcd-0                             1/1     Running                       0                 5h10m   10.104.24.171   4am-node29   <none>           <none>
fouram-disk-sta13600-3-29-1428-etcd-1                             1/1     Running                       0                 5h10m   10.104.25.2     4am-node30   <none>           <none>
fouram-disk-sta13600-3-29-1428-etcd-2                             1/1     Running                       0                 5h10m   10.104.34.68    4am-node37   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-datacoord-5df45846d7xqj29   1/1     Running                       4 (5h8m ago)      5h10m   10.104.20.188   4am-node22   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-datanode-d7dd7799b-8w6dn    1/1     Running                       5 (5h3m ago)      5h10m   10.104.16.38    4am-node21   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-indexcoord-7df6fdd6bsr4vn   1/1     Running                       0                 5h10m   10.104.20.189   4am-node22   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-indexnode-bb44d659f-wvhh2   1/1     Running                       4 (5h8m ago)      5h10m   10.104.16.39    4am-node21   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-proxy-59dfdbd678-ks72f      1/1     Running                       5 (5h3m ago)      5h10m   10.104.30.208   4am-node38   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-querycoord-685ff7d6djs9n8   1/1     Running                       5 (5h3m ago)      5h10m   10.104.20.190   4am-node22   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-querynode-5c87f496f79pzbk   1/1     Running                       4 (5h8m ago)      5h10m   10.104.20.191   4am-node22   <none>           <none>
fouram-disk-sta13600-3-29-1428-milvus-rootcoord-5889c654d7xhgnf   1/1     Running                       5 (5h4m ago)      5h10m   10.104.16.37    4am-node21   <none>           <none>
fouram-disk-sta13600-3-29-1428-minio-0                            1/1     Running                       0                 5h10m   10.104.24.166   4am-node29   <none>           <none>
fouram-disk-sta13600-3-29-1428-minio-1                            1/1     Running                       0                 5h10m   10.104.25.253   4am-node30   <none>           <none>
fouram-disk-sta13600-3-29-1428-minio-2                            1/1     Running                       0                 5h10m   10.104.23.28    4am-node27   <none>           <none>
fouram-disk-sta13600-3-29-1428-minio-3                            1/1     Running                       0                 5h10m   10.104.32.222   4am-node39   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-bookie-0                    1/1     Running                       0                 5h10m   10.104.32.219   4am-node39   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-bookie-1                    1/1     Running                       0                 5h10m   10.104.18.179   4am-node25   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-bookie-2                    1/1     Running                       0                 5h10m   10.104.24.174   4am-node29   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-bookie-init-mwztn           0/1     Completed                     0                 5h10m   10.104.4.119    4am-node11   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-broker-0                    1/1     Running                       0                 5h10m   10.104.6.192    4am-node13   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-proxy-0                     1/1     Running                       0                 5h10m   10.104.9.48     4am-node14   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-pulsar-init-8jhbf           0/1     Completed                     0                 5h10m   10.104.4.120    4am-node11   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-recovery-0                  1/1     Running                       0                 5h10m   10.104.14.18    4am-node18   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-zookeeper-0                 1/1     Running                       0                 5h10m   10.104.25.251   4am-node30   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-zookeeper-1                 1/1     Running                       0                 5h8m    10.104.24.187   4am-node29   <none>           <none>
fouram-disk-sta13600-3-29-1428-pulsar-zookeeper-2                 1/1     Running                       0                 5h7m    10.104.15.69    4am-node20   <none>           <none>

client pod: fouram-disk-stab-1714413600-861861569
client error log: (screenshot)

client result:
About 30% of insert interfaces fail

 'scene_insert_delete_flush': {'Requests': 9785,
           'Fails': 2977,
           'RPS': 0.54,
           'fail_s': 0.3,
           'RT_max': 8708.94,
           'RT_avg': 2614.99,
           'TP50': 3100.0,
           'TP99': 5500.0},

Expected Behavior

Milvus can insert normally and will not be limited.

Steps To Reproduce

  1. Create a collection
  2. Build a DiskANN index on the vector column
  3. Insert 100k vectors
  4. Flush the collection
  5. Build the index on the vector column with the same parameters
  6. Count the total number of rows
  7. Load the collection
  8. Execute concurrent search, query, load, insert, delete, and flush => fails (a pymilvus sketch of this workflow is below)
  9. Step 8 lasts 5h
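
A minimal pymilvus sketch of steps 1-8 (collection name, schema, dimension, and DiskANN parameters are illustrative; the benchmark itself drives step 8 concurrently via Locust for 5h):

    import random
    from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

    connections.connect(host="127.0.0.1", port="19530")
    dim = 128

    # 1. create a collection (name and schema are illustrative)
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=False),
        FieldSchema("vec", DataType.FLOAT_VECTOR, dim=dim),
    ]
    coll = Collection("diskann_compaction_repro", CollectionSchema(fields))

    # 2. build a DiskANN index on the vector column
    index_params = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}
    coll.create_index("vec", index_params)

    # 3./4. insert 100k vectors in batches, then flush
    for start in range(0, 100_000, 10_000):
        ids = list(range(start, start + 10_000))
        vecs = [[random.random() for _ in range(dim)] for _ in ids]
        coll.insert([ids, vecs])
    coll.flush()

    # 5./6./7. build the index again with the same parameters, count rows, load
    coll.create_index("vec", index_params)
    print("row count:", coll.num_entities)
    coll.load()

    # 8. the benchmark runs search/query/load/insert/delete/flush concurrently for 5h;
    #    a single round of those operations looks like this:
    coll.search([[random.random() for _ in range(dim)]], "vec",
                {"metric_type": "L2", "params": {"search_list": 30}}, limit=10)
    coll.query(expr="id < 10", output_fields=["id"])
    coll.insert([[100_000], [[random.random() for _ in range(dim)]]])
    coll.delete("id in [0]")
    coll.flush()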

Milvus Log

No response

Anything else?

No response

elstic added the kind/bug, needs-triage, and test/benchmark labels on Apr 30, 2024
elstic added this to the 2.4.1 milestone on Apr 30, 2024
SimFG (Contributor) commented Apr 30, 2024

This is mainly due to excessive memory usage.
(screenshot: memory usage)

yanliang567 (Contributor) commented:

@SimFG but Milvus did not enable the DML quota and limit, so why does it report that error?

/assign @SimFG
/unassign

sre-ci-robot assigned SimFG and unassigned yanliang567 on Apr 30, 2024
yanliang567 added the triage/accepted label and removed the needs-triage label on Apr 30, 2024
elstic assigned tedxu and unassigned SimFG on Apr 30, 2024
elstic (Contributor, Author) commented Apr 30, 2024

The concurrency test afterwards did not trigger compaction, so too many segments accumulated and memory usage grew to nearly 100%.
(screenshot)

yanliang567 (Contributor) commented:

Could be related to PR #32326.

elstic (Contributor, Author) commented May 6, 2024

This issue is not fixed.
Verification image: master-20240430-5bb672d7
server:

fouram-disk-sta72800-3-40-5704-etcd-0                             1/1     Running       0                5h8m    10.104.23.27    4am-node27   <none>           <none>
fouram-disk-sta72800-3-40-5704-etcd-1                             1/1     Running       0                5h8m    10.104.15.121   4am-node20   <none>           <none>
fouram-disk-sta72800-3-40-5704-etcd-2                             1/1     Running       0                5h8m    10.104.34.187   4am-node37   <none>           <none>
fouram-disk-sta72800-3-40-5704-milvus-datacoord-746568b9c88z2df   1/1     Running       4 (5h6m ago)     5h8m    10.104.25.156   4am-node30   <none>           <none>
fouram-disk-sta72800-3-40-5704-milvus-datanode-6dc8b86fbc-sxg6s   1/1     Running       4 (5h6m ago)     5h8m    10.104.16.153   4am-node21   <none>           <none>
fouram-disk-sta72800-3-40-5704-milvus-indexcoord-56c6d79b4xr885   1/1     Running       0                5h8m    10.104.25.154   4am-node30   <none>           <none>
fouram-disk-sta72800-3-40-5704-milvus-indexnode-5954d9b694h4vpp   1/1     Running       4 (5h6m ago)     5h8m    10.104.21.37    4am-node24   <none>           <none>
fouram-disk-sta72800-3-40-5704-milvus-proxy-64768cd77-fckmh       1/1     Running       4 (5h6m ago)     5h8m    10.104.25.155   4am-node30   <none>           <none>
fouram-disk-sta72800-3-40-5704-milvus-querycoord-5699467b9tcxqf   1/1     Running       4 (5h6m ago)     5h8m    10.104.25.153   4am-node30   <none>           <none>
fouram-disk-sta72800-3-40-5704-milvus-querynode-865c5b87c66vjjn   1/1     Running       4 (5h6m ago)     5h8m    10.104.18.8     4am-node25   <none>           <none>
fouram-disk-sta72800-3-40-5704-milvus-rootcoord-594c9cc978l78t6   1/1     Running       4 (5h6m ago)     5h8m    10.104.25.152   4am-node30   <none>           <none>
fouram-disk-sta72800-3-40-5704-minio-0                            1/1     Running       0                5h8m    10.104.33.109   4am-node36   <none>           <none>
fouram-disk-sta72800-3-40-5704-minio-1                            1/1     Running       0                5h8m    10.104.24.247   4am-node29   <none>           <none>
fouram-disk-sta72800-3-40-5704-minio-2                            1/1     Running       0                5h8m    10.104.34.184   4am-node37   <none>           <none>
fouram-disk-sta72800-3-40-5704-minio-3                            1/1     Running       0                5h8m    10.104.23.30    4am-node27   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-bookie-0                    1/1     Running       0                5h8m    10.104.25.170   4am-node30   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-bookie-1                    1/1     Running       0                5h8m    10.104.23.28    4am-node27   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-bookie-2                    1/1     Running       0                5h8m    10.104.34.185   4am-node37   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-bookie-init-rhlxf           0/1     Completed     0                5h8m    10.104.4.17     4am-node11   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-broker-0                    1/1     Running       0                5h8m    10.104.13.203   4am-node16   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-proxy-0                     1/1     Running       0                5h8m    10.104.4.18     4am-node11   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-pulsar-init-z9nx2           0/1     Completed     0                5h8m    10.104.13.204   4am-node16   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-recovery-0                  1/1     Running       0                5h8m    10.104.14.192   4am-node18   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-zookeeper-0                 1/1     Running       0                5h8m    10.104.25.169   4am-node30   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-zookeeper-1                 1/1     Running       0                5h6m    10.104.23.32    4am-node27   <none>           <none>
fouram-disk-sta72800-3-40-5704-pulsar-zookeeper-2                 1/1     Running       0                5h5m    10.104.32.67    4am-node39   <none>           <none>

(screenshot)

yanliang567 modified the milestones: 2.4.1, 2.4.2 on May 7, 2024
elstic (Contributor, Author) commented May 13, 2024

Issue fixed.
Verification image: master-20240511-8a9a4219

elstic closed this as completed on May 13, 2024
tianshihan818 commented:

Hello @elstic! I met the same issue in Milvus v2.4.1 (deployed by Helm using chart version milvus-4.1.30). I am wondering about the reason for this error. And for now, to fix it, I need to re-deploy Milvus with the image master-20240511-8a9a4219, right?
Thank you if you could offer any help!
Thank you if you could offer any help!

elstic (Contributor, Author) commented May 13, 2024

Hello, the essence of the problem I'm documenting here is that compaction did not run, so segments were not merged. For example, there is no data for compaction latency in the graph.
But as far as I know, v2.4.1 should not have this problem. Can you describe your problem in more detail? Is it either of the following?
1) The error "quota exceeded[reason=rate type: DMLInsert]": check whether your memory usage is high, or whether you have enabled rate limiting via quotaAndLimits.dml.enabled.
2) Check whether your instance has run compaction; if it has not, your problem is the same as mine.
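
For context: the throttling behind this error lives in the quotaAndLimits section of milvus.yaml. Even when quotaAndLimits.dml.enabled stays at its default (false), the memory-based write protection under limitWriting can still slow down and eventually reject inserts once DataNode/QueryNode memory crosses its water levels. A sketch of the relevant keys (key names follow milvus.yaml; the values are illustrative defaults, so check the file shipped with your version):

    quotaAndLimits:
      dml:
        enabled: false              # explicit DML rate limits; not configured in this benchmark
      limitWriting:
        memProtection:
          enabled: true             # write back-pressure driven by node memory usage
          dataNodeMemoryLowWaterLevel: 0.85
          dataNodeMemoryHighWaterLevel: 0.95
          queryNodeMemoryLowWaterLevel: 0.85
          queryNodeMemoryHighWaterLevel: 0.95

Roughly, write rates are reduced between the low and high water levels and rejected above the high water level, which is what surfaces to clients as quota exceeded[reason=rate type: DMLInsert].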

tianshihan818 commented:

First of all, thanks for your reply!

I am testing the bottleneck of Milvus insert performance. My scenario is continual batch inserts (10,000 rows of 768-dim random data at a time), and the insert API reported RPC error: [batch_insert], <MilvusException: (code=9, message=quota exceeded[reason=rate type: DMLInsert])> once the number of entities reached the 200 million level. I didn't set quotaAndLimits.dml at deployment; I guess the default value is False.
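
For reference, a minimal sketch of this kind of continual batch-insert loop (collection name and schema are illustrative), with a simple backoff where the quota-exceeded rejection would surface:

    import random
    import time
    from pymilvus import connections, Collection, MilvusException

    connections.connect(host="127.0.0.1", port="19530")
    coll = Collection("bench_insert")   # existing collection: int64 pk + 768-dim float vector

    BATCH, DIM = 10_000, 768
    next_id = 0
    while True:
        ids = list(range(next_id, next_id + BATCH))
        vecs = [[random.random() for _ in range(DIM)] for _ in ids]
        try:
            coll.insert([ids, vecs])
            next_id += BATCH
        except MilvusException as exc:
            # code=9, "quota exceeded[reason=rate type: DMLInsert]": the server is
            # throttling writes (e.g. memory protection); back off and retry.
            if "quota exceeded" in str(exc):
                time.sleep(5)
            else:
                raise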

I checked the rootcoord log at that time; it reported "QueryNode memory to low water level, limit writing rate". The querynode log also reported "no sufficient resource to load segments". So I tried to allocate more memory to the querynodes and restarted the pods, but it still raises the same error when inserting.

I think I should upgrade the whole milvus cluster then.

xiaofan-luan (Contributor) commented:

You need to calculate how much memory you need to load 200 million vectors. This could cost a couple of hundred gigabytes.
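
As a rough back-of-envelope, assuming 768-dim FP32 vectors and ignoring index and scalar-field overhead:

    entities = 200_000_000
    dim = 768
    bytes_per_float = 4
    raw_gib = entities * dim * bytes_per_float / 1024**3
    print(f"raw vectors: {raw_gib:.0f} GiB")  # ~572 GiB before any index overhead

Index structures and replicas add to this, so the total QueryNode memory needed is higher still.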

tianshihan818 commented:

Thanks for your advice!

I estimated the resource requests before deployment (I tried 16 querynodes of 16 CPU / 64 Gi each first). I guess the cause is that the high write rate leads to high memory usage.

Actually, I am wondering whether Milvus supports resizing the resource quota of worker nodes dynamically. In this scenario, I tried to resize and restart the querynodes with Helm, e.g.:

    helm upgrade -f values_custom.yaml my_milvus zilliztech/milvus --reuse-values --set queryNode.resources.requests.memory=128Gi
    helm upgrade -f values_custom.yaml my_milvus zilliztech/milvus --reuse-values --set queryNode.replicas=32

The querynodes did get more resources, but inserting still raised the above error. So I uninstalled and then installed the whole Milvus cluster again, and it finally works.

I have two questions, hoping you could give some advice :)

  1. How should Milvus be scaled correctly as the data grows towards its resource limit?
  2. When components crash and restart during insertion (for example, after disconnecting from etcd), how does Milvus recover and sync the data? Could you explain, or point me to reference documents describing the pipeline? I've run into this a few times before; sometimes it restarts and recovers automatically, and sometimes it crashes completely and makes the insert API unavailable.

xiaofan-luan (Contributor) commented:

Can you explain your use case a little bit? It seems to be a large deployment.
I'm glad to set up an offline meeting and offer some help.
Please contact me at [email protected]
