
Using the Milvus database, GPU memory blows up when initializing the database. #3971

Closed
zmwstu opened this issue May 8, 2024 · 2 comments
Labels
bug Something isn't working

Comments

zmwstu commented May 8, 2024

```
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Batches:   0%|          | 0/396 [00:00<?, ?it/s]
2024-05-08 13:41:01,091 - embeddings_api.py[line:39] - ERROR: CUDA out of memory. Tried to allocate 17.93 GiB. GPU 0 has a total capacty of 23.65 GiB of which 1.62 GiB is free. Including non-PyTorch memory, this process has 22.02 GiB memory in use. Of the allocated memory 21.55 GiB is allocated by PyTorch, and 15.99 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
AttributeError: 'NoneType' object has no attribute 'conjugate'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/init_database.py", line 107, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/migrate.py", line 128, in folder2db
    files2vs(kb_name, kb_files)
  File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/migrate.py", line 113, in files2vs
    kb.add_doc(kb_file=kb_file, not_refresh_vs_cache=True)
  File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 131, in add_doc
    doc_infos = self.do_add_doc(docs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/kb_service/milvus_kb_service.py", line 83, in do_add_doc
    ids = self.milvus.add_documents(docs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 119, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/langchain_community/vectorstores/milvus.py", line 531, in add_texts
    embeddings = self.embedding_func.embed_documents(texts)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 439, in embed_documents
    return normalize(embeddings).tolist()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 37, in normalize
    norm = np.linalg.norm(embeddings, axis=1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: loop of ufunc does not support argument 0 of type NoneType which has no callable conjugate method
```

I have multiple GPUs; I'm not sure whether they can be used together. Why does initializing the database require so much GPU memory? Is it a problem with the program?

@zmwstu zmwstu added the bug Something isn't working label May 8, 2024

dosubot bot commented May 8, 2024

The CUDA out of memory error you're experiencing is likely due to the high computational demands of indexing and searching with Milvus, especially with large datasets or complex vector embeddings. To mitigate this, consider processing data in smaller batches, adjusting Milvus's index parameters for efficiency, and configuring the index_file_size to manage segment sizes. Additionally, leveraging Milvus's support for distributed deployment can help distribute the workload across multiple GPUs.
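The batching suggestion can be sketched generically. The wrapper below is hypothetical, not Langchain-Chatchat's code: `embed_fn` stands in for whatever embedding call the project actually makes, and the point is simply that peak GPU memory is then bounded by `batch_size` rather than by the total document count.

```python
from typing import Callable, List

def embed_in_batches(texts: List[str],
                     embed_fn: Callable[[List[str]], List[List[float]]],
                     batch_size: int = 32) -> List[List[float]]:
    """Embed texts in small chunks so memory scales with batch_size,
    not with the total number of documents."""
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        embeddings.extend(embed_fn(texts[start:start + batch_size]))
    return embeddings

# Toy embedder for illustration: maps each text to a 1-d vector of its length.
vectors = embed_in_batches(["a", "bb", "ccc"],
                           lambda batch: [[float(len(t))] for t in batch],
                           batch_size=2)
# vectors == [[1.0], [2.0], [3.0]]
```

Tuning `batch_size` down until a single batch fits in free GPU memory is usually enough to get past an OOM during indexing.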

Regarding the TypeError related to a NoneType object, this error suggests that an operation is being applied to a None object, possibly due to the embedding function returning None or an unexpected None value in your data processing pipeline. Ensure your embedding model is correctly initialized, validate input data to the embedding function, and check for None values in your data processing logic.
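The `TypeError` in the traceback is exactly what NumPy raises when a `None` slips into the array handed to `np.linalg.norm`. A small guard before normalization (names here are illustrative, not the project's API) turns that obscure ufunc error into an explicit failure:

```python
import numpy as np

def safe_normalize(embeddings):
    """L2-normalize row vectors, failing fast if the embedding step returned None."""
    if embeddings is None or any(e is None for e in embeddings):
        raise ValueError("embedding function returned None; "
                         "check model initialization and input data")
    arr = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    return (arr / norms).tolist()

print(safe_normalize([[3.0, 4.0]]))  # [[0.6, 0.8]]
```

In this issue the `None` comes from the embedding call failing with CUDA OOM upstream, so the guard surfaces the real cause instead of the secondary `NoneType` error.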

For utilizing multiple GPUs, ensure your setup supports distributed processing, which can help manage memory usage more effectively across GPUs.


@ilovesouthpark

Is the embedding model in your model config set to cuda, or to auto?
Try cpu and see whether initialization still fails. For what it's worth, the sample documents in the project initialize correctly for me on both cpu and cuda, with one 24 GB card and one 10 GB card. Multi-GPU embedding was discussed earlier in this repo; check whether that approach works for you. Multi-GPU inference itself works fine.
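Assuming the usual Langchain-Chatchat layout where the device is selected in `configs/model_config.py` (the exact key name is a guess here), forcing the embedding model onto the CPU for a test run would look something like:

```python
# configs/model_config.py (hypothetical fragment) — rerun init_database.py
# with this setting to check whether initialization succeeds without the GPU.
EMBEDDING_DEVICE = "cpu"  # instead of "cuda" or "auto"
```

If initialization succeeds on CPU, the problem is GPU memory pressure rather than the data itself.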

@zmwstu zmwstu closed this as completed May 15, 2024