[BUG] Error when configured with the Milvus vector store; works fine with FAISS #3905

Closed
Sgzmust opened this issue Apr 26, 2024 · 5 comments

Sgzmust commented Apr 26, 2024

When initializing the vector store with the following command:
python init_database.py --recreate-vs
the following error is reported:
2024-04-26 10:25:07,084 - lang.py[line:346] - WARNING: Need to load profiles.
2024-04-26 10:25:07,727 - common.py[line:591] - INFO: HTML element instance has no attribute type
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例:page_content='大模型技术栈-算法与原理\n\ntokenizer方法\nword-level\nchar-level\nsubword-level\nBPE\nWordPiece\nUniLM\nSentencePiece\nByteBPE\n\nposition encoding\n绝对位置编码\nROPE\nAliBi\n\n\n相对位置编码\nTransformer-XL\nT5/TUPE\nDeBERTa\n\n\n其他位置编码\n\n注意力机制\n稀疏注意力\nflash-attention' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/大模型技术栈-算法与原理.md'}
2024-04-26 10:25:37,804 - utils.py[line:295] - INFO: RapidOCRLoader used for /home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型技术栈-算法与原理-幕布图片-19929-302935.jpg
正在将 samples/llm/大模型技术栈-算法与原理.md 添加到向量库,共包含56条文档
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例:page_content='Multi-head\n\nGrouped-query\n\nMulti-query\n\nValues\n\nKeys\n\n00000000\n\nQueries' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型推理优化策略-幕布图片-699343-219844.jpg'}
2024-04-26 10:25:37,837 - utils.py[line:295] - INFO: RapidOCRLoader used for /home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/分布式训练技术原理-幕布图片-906937-836104.jpg
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例:page_content='NVIDIA Megatron Trains LLM\n\nPipelineParallelism\n\nDevice 1\n\n101112\n\nLayer 1-4\n\nDevice 2\n\n9101112\n\n10\n\nLayer5-8\n\nDevice 3\n\n9101112\n\n13\n\n10\n\n11\n\nLayer 9-12\n\nDevice 4\n\n10\n\n10\n\n11\n\n11\n\n12\n\nLayer 13-16\n\nTime\n\nAssignmultiple stages\n\ntoeachdevice\n\nDevice 1' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/分布式训练技术原理-幕布图片-618350-869132.jpg'}

Sgzmust added the bug (Something isn't working) label Apr 26, 2024

dosubot bot commented Apr 26, 2024

The error you're encountering with importing AutoTokenizer from the transformers package in your Python 3.11 environment is likely due to a compatibility issue. To resolve this, you should upgrade the transformers package to a version that supports Python 3.11. This can be done by running pip install transformers --upgrade. Ensure the upgraded version is compatible with your project's dependencies.
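Before upgrading, it may help to confirm what is actually installed and where the import fails. The following is a minimal diagnostic sketch using only the standard library; the function name `diagnose_import` and the shape of the returned report are illustrative assumptions, not part of Langchain-Chatchat or transformers.

```python
import importlib
import importlib.metadata


def diagnose_import(package: str = "transformers", name: str = "AutoTokenizer") -> dict:
    """Report the installed version of `package` and whether `name` can be loaded from it.

    Hypothetical helper for troubleshooting; not part of any library.
    """
    report = {
        "package": package,
        "version": None,
        "module_file": None,
        "importable": False,
        "error": None,
    }
    try:
        # Version as recorded in the installed distribution's metadata.
        report["version"] = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        report["error"] = f"{package} is not installed"
        return report
    try:
        module = importlib.import_module(package)
        # The file path shows which copy of the package Python actually picked up.
        report["module_file"] = getattr(module, "__file__", None)
        getattr(module, name)
        report["importable"] = True
    except Exception as exc:  # broad on purpose: lazy imports can raise various errors
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in diagnose_import().items():
        print(f"{key}: {value}")
```

If `module_file` points somewhere other than the environment's `site-packages`, a local file or directory named `transformers` may be shadowing the real package.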

Upgrading transformers may affect other dependencies such as sentence_transformers, accelerate, and transformers_stream_generator, so it's important to check their compatibility with the new version of transformers and Python 3.11.
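To see which versions of these packages are currently installed before and after an upgrade, a short sketch like the one below can be used. The package names are taken as written in the comment above; the exact distribution names on the package index may be spelled differently (for example with hyphens), so treat them as assumptions. The helper `installed_versions` is hypothetical.

```python
import importlib.metadata

# Names as they appear in the comment; the published distribution names are an assumption.
PACKAGES = [
    "transformers",
    "sentence_transformers",
    "accelerate",
    "transformers_stream_generator",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not found."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


if __name__ == "__main__":
    for pkg, version in installed_versions(PACKAGES).items():
        print(f"{pkg}: {version if version is not None else 'not installed'}")
```

Comparing the output before and after the upgrade shows which dependencies changed along with transformers.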

The AutoTokenizer is crucial for handling different models' tokenization in the Langchain-Chatchat project, which is why this issue directly impacts your ability to add documents to the vector database.

For reproducing and further investigating the error, refer to the test_milvus_db.py file in the tests directory of the Langchain-Chatchat repository. This file outlines steps for initializing a vector database with Milvus, adding documents, performing searches, and deleting documents, which could be helpful for troubleshooting.

If the problem persists after upgrading, please provide more detailed error messages and the context in which they occur to better diagnose and address the issue.

jiusi9 commented Apr 29, 2024

I've run into the same problem. Has it been solved?

hanwsf commented Apr 29, 2024

cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
I suggest uninstalling transformers and then reinstalling it: install transformers==?.

hanwsf commented Apr 29, 2024

Use Milvus version 2.4.

zRzRzRzRzRzRzR self-assigned this May 5, 2024

zRzRzRzRzRzRzR (Collaborator) commented

Installing transformers 4.37 works without problems.
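To check whether an environment matches this suggestion, the installed version can be compared against the 4.37 series. This is a minimal sketch: the comment names only "4.37" and no patch release, so matching on major.minor is an assumption, and `in_release_series` is a hypothetical helper.

```python
def in_release_series(version: str, series: str = "4.37") -> bool:
    """Return True if `version` has the same major.minor prefix as `series`."""
    parts = version.split(".")
    return ".".join(parts[:2]) == series


if __name__ == "__main__":
    import importlib.metadata

    try:
        installed = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        print(f"transformers {installed}, in 4.37 series: {in_release_series(installed)}")
```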
