[BUG] Error when configured with the milvus vector store; works fine with faiss #3905

Sgzmust · 2024-04-26T10:31:52Z

When I initialize the vector store with the following command:
python init_database.py --recreate-vs
it fails with this output:
2024-04-26 10:25:07,084 - lang.py[line:346] - WARNING: Need to load profiles.
2024-04-26 10:25:07,727 - common.py[line:591] - INFO: HTML element instance has no attribute type
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例：page_content='大模型技术栈-算法与原理\n\ntokenizer方法\nword-level\nchar-level\nsubword-level\nBPE\nWordPiece\nUniLM\nSentencePiece\nByteBPE\n\nposition encoding\n绝对位置编码\nROPE\nAliBi\n\n\n相对位置编码\nTransformer-XL\nT5/TUPE\nDeBERTa\n\n\n其他位置编码\n\n注意力机制\n稀疏注意力\nflash-attention' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/大模型技术栈-算法与原理.md'}
2024-04-26 10:25:37,804 - utils.py[line:295] - INFO: RapidOCRLoader used for /home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型技术栈-算法与原理-幕布图片-19929-302935.jpg
正在将 samples/llm/大模型技术栈-算法与原理.md 添加到向量库，共包含56条文档
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例：page_content='Multi-head\n\nGrouped-query\n\nMulti-query\n\nValues\n\nKeys\n\n00000000\n\nQueries' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型推理优化策略-幕布图片-699343-219844.jpg'}
2024-04-26 10:25:37,837 - utils.py[line:295] - INFO: RapidOCRLoader used for /home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/分布式训练技术原理-幕布图片-906937-836104.jpg
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
文档切分示例：page_content='NVIDIA Megatron Trains LLM\n\nPipelineParallelism\n\nDevice 1\n\n101112\n\nLayer 1-4\n\nDevice 2\n\n9101112\n\n10\n\nLayer5-8\n\nDevice 3\n\n9101112\n\n13\n\n10\n\n11\n\nLayer 9-12\n\nDevice 4\n\n10\n\n10\n\n11\n\n11\n\n12\n\nLayer 13-16\n\nTime\n\nAssignmultiple stages\n\ntoeachdevice\n\nDevice 1' metadata={'source': '/home/Python/Langchain-Chatchat/knowledge_base/samples/content/llm/img/分布式训练技术原理-幕布图片-618350-869132.jpg'}

dosubot · 2024-04-26T10:32:29Z

The error you're encountering with importing AutoTokenizer from the transformers package in your Python 3.11 environment is likely due to a compatibility issue. To resolve this, you should upgrade the transformers package to a version that supports Python 3.11. This can be done by running pip install transformers --upgrade. Ensure the upgraded version is compatible with your project's dependencies.

Upgrading transformers may affect other dependencies such as sentence_transformers, accelerate, and transformers_stream_generator, so it's important to check their compatibility with the new version of transformers and Python 3.11.

The AutoTokenizer is crucial for handling different models' tokenization in the Langchain-Chatchat project, which is why this issue directly impacts your ability to add documents to the vector database.

For reproducing and further investigating the error, refer to the test_milvus_db.py file in the tests directory of the Langchain-Chatchat repository. This file outlines steps for initializing a vector database with Milvus, adding documents, performing searches, and deleting documents, which could be helpful for troubleshooting.

If the problem persists after upgrading, please provide more detailed error messages and the context in which they occur to better diagnose and address the issue.
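Before upgrading anything, it may help to confirm what the environment actually contains. The following is a minimal diagnostic sketch, not part of Langchain-Chatchat: the helper name `diagnose_import` is hypothetical, and it uses only the standard library to report the installed version, the file the package was loaded from, and the exact exception raised when the name is looked up.

```python
import importlib
from importlib import metadata


def diagnose_import(package: str, name: str) -> dict:
    """Report whether `name` can be loaded from `package`, with version and location."""
    report = {
        "package": package,
        "name": name,
        "version": None,
        "location": None,
        "ok": False,
        "error": None,
    }
    try:
        # Version as recorded by the installer (pip/conda) for this interpreter.
        report["version"] = metadata.version(package)
    except metadata.PackageNotFoundError:
        pass  # no installed distribution by that name (or a stdlib module)
    try:
        module = importlib.import_module(package)
        # The file path shows which copy of the package is being picked up.
        report["location"] = getattr(module, "__file__", None)
        # Attribute lookup also surfaces failures in packages that load names lazily.
        getattr(module, name)
        report["ok"] = True
    except Exception as exc:
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    print(diagnose_import("transformers", "AutoTokenizer"))
```

Running this with the same interpreter that runs init_database.py should show whether the failure comes from a missing or broken install rather than from the Milvus configuration itself.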


jiusi9 · 2024-04-29T04:03:02Z

I'm running into the same problem. Has it been resolved?

hanwsf · 2024-04-29T19:35:06Z

cannot import name 'AutoTokenizer' from 'transformers' (/root/anaconda3/envs/Langchain/lib/python3.11/site-packages/transformers/__init__.py)
I suggest uninstalling transformers and then reinstalling it pinned to a specific version: transformers==?.
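One way to make sure the uninstall and reinstall hit the same conda environment that runs the script is to call pip through the current interpreter. This is only a sketch under that assumption: `reinstall_commands` is a hypothetical helper, and the version is deliberately a required argument because the comment above leaves it unspecified.

```python
import subprocess
import sys


def reinstall_commands(package: str, version: str) -> list[list[str]]:
    """Build pip commands that target the interpreter running this script."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", package],          # remove the current install
        pip + ["install", f"{package}=={version}"],  # install the pinned version
    ]


if __name__ == "__main__":
    # The version must be supplied by the caller, e.g.:
    #   python reinstall.py <version>
    if len(sys.argv) != 2:
        sys.exit("usage: python reinstall.py <version>")
    for command in reinstall_commands("transformers", sys.argv[1]):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Using `sys.executable -m pip` avoids the case where a bare `pip` on the PATH belongs to a different Python installation than the one in the traceback.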

hanwsf · 2024-04-29T19:36:03Z

Use Milvus version 2.4.

zRzRzRzRzRzRzR · 2024-05-05T04:21:45Z

Installing transformers 4.37 works without problems.
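To check whether an environment already matches this, a small sketch like the following could be used. The helper names `in_series` and `check_installed` are hypothetical, and "4.37" is taken from the comment above as a release series; the exact patch version is not stated in this thread.

```python
from importlib import metadata


def in_series(version: str, series: str) -> bool:
    """True if `version` belongs to the release series, e.g. '4.37.2' is in '4.37'."""
    want = series.split(".")
    # Compare whole dot-separated components so '4.370.0' does not match '4.37'.
    return version.split(".")[: len(want)] == want


def check_installed(package: str = "transformers", series: str = "4.37") -> str:
    """Describe how the installed version of `package` relates to `series`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed in this environment"
    if in_series(installed, series):
        return f"{package} {installed} is in the {series} series"
    return f"{package} {installed} is outside the {series} series"


if __name__ == "__main__":
    print(check_installed())
```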

Sgzmust added the bug label Apr 26, 2024

zRzRzRzRzRzRzR self-assigned this May 5, 2024

zRzRzRzRzRzRzR closed this as completed May 5, 2024

This was referenced May 8, 2024

Error initializing data after switching to the milvus vector store #3970

Open

GPU memory blows up when initializing the database with milvus #3971

Open
