[Bug]: [DiskANN] Querynode oom occurs in concurrent search, query, load, flush, insert, delete tests #32674

Closed
1 task done
elstic opened this issue Apr 28, 2024 · 2 comments
Labels
kind/bug Issues or changes related a bug priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. test/benchmark benchmark test triage/accepted Indicates an issue or PR is ready to be actively worked on.

elstic commented Apr 28, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240426-c080dc16-amd64
- Deployment mode(standalone or cluster):both 
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

case: test_concurrent_locust_diskann_compaction_standalone, test_concurrent_locust_diskann_compaction_cluster

argo task : fouramf-concurrent-2c6ft

querynode oom:
image

image

client pod: fouramf-concurrent-2c6ft-426191921
server:

fouram-71-6993-etcd-0                                             1/1     Running                       0                  6m42s   10.104.34.254   4am-node37   <none>           <none>
fouram-71-6993-etcd-1                                             1/1     Running                       0                  6m42s   10.104.15.57    4am-node20   <none>           <none>
fouram-71-6993-etcd-2                                             1/1     Running                       0                  6m42s   10.104.32.99    4am-node39   <none>           <none>
fouram-71-6993-milvus-datacoord-7b999cd6f5-4nvwb                  1/1     Running                       0                  6m43s   10.104.30.71    4am-node38   <none>           <none>
fouram-71-6993-milvus-datanode-856f597978-gr4qx                   1/1     Running                       1 (2m11s ago)      6m43s   10.104.33.161   4am-node36   <none>           <none>
fouram-71-6993-milvus-indexcoord-56dd6f7958-zlsd7                 1/1     Running                       0                  6m42s   10.104.27.79    4am-node31   <none>           <none>
fouram-71-6993-milvus-indexnode-79c57cbff5-sn2rd                  1/1     Running                       0                  6m42s   10.104.30.72    4am-node38   <none>           <none>
fouram-71-6993-milvus-proxy-7bc465fb86-z758q                      1/1     Running                       1 (2m11s ago)      6m43s   10.104.33.160   4am-node36   <none>           <none>
fouram-71-6993-milvus-querycoord-65b7f57bf9-xj8bc                 1/1     Running                       1 (2m12s ago)      6m42s   10.104.27.78    4am-node31   <none>           <none>
fouram-71-6993-milvus-querynode-564db7b7c7-m65rx                  1/1     Running                       0                  6m43s   10.104.20.197   4am-node22   <none>           <none>
fouram-71-6993-milvus-rootcoord-84d5cf487-t7tth                   1/1     Running                       1 (2m12s ago)      6m43s   10.104.30.70    4am-node38   <none>           <none>
fouram-71-6993-minio-0                                            1/1     Running                       0                  6m42s   10.104.34.253   4am-node37   <none>           <none>
fouram-71-6993-minio-1                                            1/1     Running                       0                  6m42s   10.104.32.98    4am-node39   <none>           <none>
fouram-71-6993-minio-2                                            1/1     Running                       0                  6m42s   10.104.15.60    4am-node20   <none>           <none>
fouram-71-6993-minio-3                                            1/1     Running                       0                  6m42s   10.104.25.164   4am-node30   <none>           <none>
fouram-71-6993-pulsar-bookie-0                                    1/1     Running                       0                  6m42s   10.104.34.3     4am-node37   <none>           <none>
fouram-71-6993-pulsar-bookie-1                                    1/1     Running                       0                  6m42s   10.104.15.61    4am-node20   <none>           <none>
fouram-71-6993-pulsar-bookie-2                                    1/1     Running                       0                  6m41s   10.104.32.102   4am-node39   <none>           <none>
fouram-71-6993-pulsar-bookie-init-7dlwk                           0/1     Completed                     0                  6m42s   10.104.13.35    4am-node16   <none>           <none>
fouram-71-6993-pulsar-broker-0                                    1/1     Running                       0                  6m42s   10.104.13.38    4am-node16   <none>           <none>
fouram-71-6993-pulsar-proxy-0                                     1/1     Running                       0                  6m42s   10.104.13.37    4am-node16   <none>           <none>
fouram-71-6993-pulsar-pulsar-init-b44hc                           0/1     Completed                     0                  6m42s   10.104.13.36    4am-node16   <none>           <none>
fouram-71-6993-pulsar-recovery-0                                  1/1     Running                       0                  6m42s   10.104.9.195    4am-node14   <none>           <none>
fouram-71-6993-pulsar-zookeeper-0                                 1/1     Running                       0                  6m42s   10.104.15.56    4am-node20   <none>           <none>
fouram-71-6993-pulsar-zookeeper-1                                 1/1     Running                       0                  6m      10.104.32.104   4am-node39   <none>           <none>
fouram-71-6993-pulsar-zookeeper-2                                 1/1     Running                       0                  5m25s   10.104.25.168   4am-node30   <none>           <none> (cli_client.py:138)

After the case is executed:
image

Expected Behavior

No response

Steps To Reproduce

1. create a collection
2. build a DiskANN index on the vector column
3. insert 100k vectors
4. flush the collection
5. build the index on the vector column again with the same parameters
6. count the total number of rows
7. load the collection
8. execute concurrent search, query, load, insert, delete, and flush requests
9. keep step 8 running for 5 hours
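
The steps above could be sketched roughly as follows. This is not the benchmark code behind the test case (the issue does not include it, and the case name suggests the real run is driven by Locust). `StubCollection`, `prepare`, `one_operation`, `run_concurrent`, and the `DIM` value are all hypothetical names chosen for illustration, and the stub keeps rows in memory so the sketch runs without a Milvus server. To reproduce against a real deployment, the stub would need to be replaced by a thin wrapper around an actual client.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DIM = 8  # assumed vector dimension; the issue does not state one


class StubCollection:
    """Hypothetical in-memory stand-in for a collection (not a Milvus API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}
        self._next_id = 0
        self.index_type = None
        self.loaded = False

    def create_index(self, index_type):
        self.index_type = index_type

    def insert(self, vectors):
        with self._lock:
            ids = list(range(self._next_id, self._next_id + len(vectors)))
            self._next_id += len(vectors)
            self._rows.update(zip(ids, vectors))
        return ids

    def delete(self, ids):
        with self._lock:
            for row_id in ids:
                self._rows.pop(row_id, None)

    def flush(self):
        pass  # nothing to persist in the stub

    def load(self):
        self.loaded = True

    def count(self):
        with self._lock:
            return len(self._rows)

    def search(self, vector, limit):
        with self._lock:
            return list(self._rows)[:limit]  # placeholder, no real ANN search

    def query(self, ids):
        with self._lock:
            return [row_id for row_id in ids if row_id in self._rows]


def random_vectors(n):
    return [[random.random() for _ in range(DIM)] for _ in range(n)]


def prepare(col, rows):
    """Steps 2-7: index, insert, flush, re-index, count, load."""
    col.create_index("DISKANN")
    col.insert(random_vectors(rows))
    col.flush()
    col.create_index("DISKANN")  # same parameters as the first build
    total = col.count()
    col.load()
    return total


def one_operation(col):
    """Run one randomly chosen operation from step 8 and return its name."""
    op = random.choice(["search", "query", "load", "insert", "delete", "flush"])
    if op == "search":
        col.search(random_vectors(1)[0], limit=10)
    elif op == "query":
        col.query([0, 1, 2])
    elif op == "load":
        col.load()
    elif op == "insert":
        col.insert(random_vectors(10))
    elif op == "delete":
        col.delete([random.randrange(100)])
    else:
        col.flush()
    return op


def run_concurrent(col, workers, duration_s):
    """Steps 8-9: issue mixed operations from several threads until the deadline."""
    deadline = time.monotonic() + duration_s

    def worker():
        done = 0
        while time.monotonic() < deadline:
            one_operation(col)
            done += 1
        return done

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    col = StubCollection()  # step 1
    print("rows before load:", prepare(col, rows=1000))
    print("operations completed:", run_concurrent(col, workers=4, duration_s=0.2))
```

For the scenario in this report, `rows` would be 100,000 and `duration_s` would be 5 * 3600. The worker count and the per-operation batch sizes are not given in the issue, so the values above are placeholders.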

Milvus Log

No response

Anything else?

test env: 4am cluster

@elstic elstic added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. test/benchmark benchmark test labels Apr 28, 2024
@elstic elstic added this to the 2.4.1 milestone Apr 28, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 28, 2024
@yanliang567

/assign @weiliu1031
/unassign

@yanliang567 yanliang567 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Apr 28, 2024
@yanliang567 yanliang567 modified the milestones: 2.4.1, 2.4.2 May 7, 2024

elstic commented May 17, 2024

This issue hasn't come up recently.

@elstic elstic closed this as completed May 17, 2024