[Bug]: v2.4.0 datanode memory usage is too high #32695

Open
1 task done
yesyue opened this issue Apr 29, 2024 · 11 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@yesyue

yesyue commented Apr 29, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.4.0
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    kafka 
- SDK version(e.g. pymilvus v2.0.0rc2): 2.7
- OS(Ubuntu or CentOS):  CentOS
- CPU/Memory: 544c /4291.6 G at least
- GPU:  0 
- Others: datanode

Current Behavior

Following the sizing tool, the Data Nodes were allocated 2 cores / 8 GB x 2 pods. In actual operation they hit OOM, and after scaling up, memory usage reached 40 GB.

Expected Behavior

Following the sizing tool, the Data Nodes were allocated 2 cores / 8 GB x 2 pods. In actual operation they hit OOM, and after scaling up, memory usage reached 40 GB.

Steps To Reproduce

Following the sizing tool, the Data Nodes were allocated 2 cores / 8 GB x 2 pods. In actual operation they hit OOM, and after scaling up, memory usage reached 40 GB.

Milvus Log

No response

Anything else?

No response

@yesyue yesyue added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 29, 2024
Contributor

The title and description of this issue contain Chinese. Please use English to describe your issue.

@yesyue
Author

yesyue commented Apr 29, 2024

Following the sizing tool, I allocated the Data Nodes 2 cores / 8 GB x 2 pods.
However, in actual operation the Data Nodes hit OOM, and after scaling up, memory usage reached 40 GB.

@yesyue
Author

yesyue commented Apr 29, 2024

datanode log:

datanode.log

@yanliang567
Contributor

@yesyue please share more info about how you are using Milvus, e.g. what kinds of requests you send to Milvus, how many, and how frequently. Also, please share the logs of all the Milvus pods for investigation.

/assign @yesyue
/unassign

@sre-ci-robot sre-ci-robot assigned yesyue and unassigned yanliang567 Apr 29, 2024
@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 29, 2024
@yesyue
Author

yesyue commented Apr 29, 2024

About 100 million entities per day are written to Milvus.
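
To put that write rate in perspective, here is a rough back-of-the-envelope sketch. The vector dimension (768) and float32 storage are assumptions for illustration only; the thread does not state the schema, and scalar fields and overhead are ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def entities_per_second(entities_per_day):
    """Average sustained insert rate implied by a daily total."""
    return entities_per_day / SECONDS_PER_DAY

def raw_vector_bytes_per_day(entities_per_day, dim, bytes_per_value=4):
    """Raw vector payload per day, ignoring scalar fields and overhead."""
    return entities_per_day * dim * bytes_per_value

daily = 100_000_000   # figure from this thread
dim = 768             # ASSUMPTION: not stated anywhere in the issue

rate = entities_per_second(daily)
per_day = raw_vector_bytes_per_day(daily, dim)
print(f"average rate: {rate:.0f} entities/s")
print(f"raw vectors:  {per_day / 1e9:.1f} GB/day")
print(f"raw vectors:  {per_day / SECONDS_PER_DAY / 1e6:.2f} MB/s")
```

If flushing falls behind an ingest rate of this order for long, unflushed data accumulates in Data Node memory, which would be consistent with the OOM described above.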

@tadinhkien99

About 100 million entities per day are written to Milvus.

After I had inserted 10M entities in total, the Milvus Docker container stopped and crashed. I use an IVF_SQ8 index and installed Milvus with GPU support.
I insert in batches of 10,000 (inserting only once 10,000 entities have accumulated).

After the crash I can't connect again and can't use anything. Is there a solution?

@xiaofan-luan
Contributor

  1. It seems that flush cannot keep up with the read.
  2. How many partitions do you have? If you have many partitions or collections, flush and memory consumption will be larger than the estimate.
  3. There is a bunch of configs to tune, such as the concurrent flush number (dataNode.dataSync.maxParallelSyncMgrTasks in 2.4) and the memory used for growing segments.
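
For reference, a small sketch of how the dotted key named above expands into the nested structure used by a YAML config override. The key comes from the comment; the value 4 is only a placeholder, not a recommendation, and the `nest` helper is hypothetical.

```python
import json

def nest(dotted_key, value):
    """Expand 'a.b.c' into {'a': {'b': {'c': value}}}."""
    result = value
    for part in reversed(dotted_key.split(".")):
        result = {part: result}
    return result

# Key taken from the comment above; 4 is a PLACEHOLDER value.
override = nest("dataNode.dataSync.maxParallelSyncMgrTasks", 4)
print(json.dumps(override, indent=2))
```

The printed structure shows the nesting (dataNode, then dataSync, then maxParallelSyncMgrTasks) that the override needs in the config file.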

@xiaofan-luan
Contributor

About 100 million entities per day are written to Milvus.

After I had inserted 10M entities in total, the Milvus Docker container stopped and crashed. I use an IVF_SQ8 index and installed Milvus with GPU support. I insert in batches of 10,000 (inserting only once 10,000 entities have accumulated).

After the crash I can't connect again and can't use anything. Is there a solution?

How much GPU memory do you have?
Please open another issue with detailed logs so we can help.

@yesyue
Author

yesyue commented May 4, 2024

querynode (3).log

@xiaofan-luan
Contributor

querynode (3).log

1. Could you provide the datanode log?
2. It would be great if you had a datanode pprof, so you can see which part takes up your memory. Most likely the insert buffer is taking the memory, and you can tune the flush parameters.
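
A minimal sketch of grabbing a heap profile from a datanode over HTTP. It assumes the process serves the standard Go `net/http/pprof` handlers at `/debug/pprof/heap` on port 9091; both the port and the path are assumptions to verify against your deployment, and the function names are hypothetical.

```python
from urllib.request import urlopen

def heap_profile_url(host, port=9091):
    """Build the heap profile URL (ASSUMED port and path, see note above)."""
    return f"http://{host}:{port}/debug/pprof/heap"

def save_heap_profile(host, out_path="datanode_heap.pb.gz", port=9091, timeout=30):
    """Download the heap profile and write it to out_path."""
    with urlopen(heap_profile_url(host, port), timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return out_path
```

Calling `save_heap_profile("<datanode-pod-ip>")` would write a file that can then be inspected with `go tool pprof datanode_heap.pb.gz` to see which allocations dominate.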

@xiaofan-luan
Contributor

I saw you in many issues and we'd like to offer help.
feel free to contact me at [email protected] if necessary

4 participants