data did not match any variant of untagged enum PyPreTokenizerTypeWrapper #3910
The error "data did not match any variant of untagged enum PyPreTokenizerTypeWrapper" is most likely caused by a mismatch or misconfiguration in the tokenizer setup within your Docker-deployed project. It can arise from several factors, including Python version discrepancies, compatibility issues between Langchain, Fastchat, and the tokenizer libraries, or an incorrect tokenizer configuration.
After making any changes, rebuild your Docker image so they take effect. If you need further assistance or have more questions, feel free to ask.
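If the tokenizer configuration is the suspect, the model's `tokenizer.json` can be inspected directly with the standard library. This is a minimal sketch (the helper name and the example path are illustrative, not from the project):

```python
import json

def pre_tokenizer_type(tokenizer_json_path):
    """Return the pre_tokenizer 'type' field from a tokenizer.json file.

    The untagged-enum error is raised by the Rust deserializer in the
    `tokenizers` library when it encounters a pre_tokenizer variant it
    does not recognize; printing the type shows which variant the model
    file expects.
    """
    with open(tokenizer_json_path, encoding="utf-8") as f:
        config = json.load(f)
    pre = config.get("pre_tokenizer") or {}
    return pre.get("type")

# Example (hypothetical path, mirroring the model_path in the config below):
# pre_tokenizer_type("/opt/models/CodeQwen1.5-7B-Chat/tokenizer.json")
```

If the reported type is one your installed `tokenizers` version predates, upgrading that library is the likely fix.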
It may be that CodeQwen is incompatible with the newer version. See here: Qwen/CodeQwen1.5-7B-Chat · Having trouble loading this with transformers
Same as in that issue. I'll try other versions. Do you think this could be an incompatibility between the llm model and
This issue was resolved. Now, the version is: |
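Since the resolution came down to library versions (the comment above does not record the exact working ones), the installed versions of the relevant packages can be read programmatically before starting the full stack. A small sketch, assuming only that the packages in question are `transformers` and `tokenizers`:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages=("transformers", "tokenizers")):
    """Report installed versions of the libraries involved in tokenizer loading."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found
```

Comparing this output between a working and a failing environment narrows down which upgrade fixed the parse error.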
Problem Description
Describe the problem in a clear and concise manner.
The image builds successfully, but the container fails to start.
Could you help me figure out where the problem is? I haven't been able to locate it, or at least tell me which module is raising this error.
Steps to Reproduce
==============================Langchain-Chatchat Configuration==============================
OS: Linux-5.15.0-76-generic-x86_64-with-glibc2.29.
Python version: 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0]
Project version: v0.2.10
langchain version: 0.0.344. fastchat version: 0.2.36
Current text splitter: ChineseRecursiveTextSplitter
Currently running LLM models: ['CodeQwen1.5-7B-Chat', 'openai-api'] @ cuda
{'device': 'cuda',
'host': '0.0.0.0',
'infer_turbo': False,
'model_path': '/opt/models/CodeQwen1.5-7B-Chat',
'model_path_exists': True,
'port': 20002}
{'api_base_url': 'https://api.openai.com/v1',
'api_key': '',
'device': 'auto',
'host': '0.0.0.0',
'infer_turbo': False,
'model_name': 'gpt-3.5-turbo',
'online_api': True,
'openai_proxy': '',
'port': 20002}
Current embeddings model: bge-large-en-v1.5 @ cuda
==============================Langchain-Chatchat Configuration==============================
2024-04-28 06:29:43,432 - startup.py[line:650] - INFO: Starting services:
2024-04-28 06:29:43,433 - startup.py[line:651] - INFO: To view llm_api logs, go to /opt/Langchain-ChatChat/logs
2024-04-28 06:29:48 | ERROR | stderr | INFO: Started server process [475]
2024-04-28 06:29:48 | ERROR | stderr | INFO: Waiting for application startup.
2024-04-28 06:29:48 | ERROR | stderr | INFO: Application startup complete.
2024-04-28 06:29:48 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:20000 (Press CTRL+C to quit)
2024-04-28 06:29:48 | INFO | model_worker | Loading the model ['CodeQwen1.5-7B-Chat'] on worker 131939df ...
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:00<00:02, 1.13it/s]
Loading checkpoint shards:  50%|█████     | 2/4 [00:01<00:01, 1.10it/s]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:02<00:00, 1.07it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.21it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.16it/s]
2024-04-28 06:29:54 | ERROR | stderr |
2024-04-28 06:29:54 | ERROR | stderr | Process model_worker - CodeQwen1.5-7B-Chat:
2024-04-28 06:29:54 | ERROR | stderr | Traceback (most recent call last):
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
2024-04-28 06:29:54 | ERROR | stderr | self.run()
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
2024-04-28 06:29:54 | ERROR | stderr | self._target(*self._args, **self._kwargs)
2024-04-28 06:29:54 | ERROR | stderr | File "/opt/Langchain-ChatChat/startup.py", line 386, in run_model_worker
2024-04-28 06:29:54 | ERROR | stderr | app = create_model_worker_app(log_level=log_level, **kwargs)
2024-04-28 06:29:54 | ERROR | stderr | File "/opt/Langchain-ChatChat/startup.py", line 214, in create_model_worker_app
2024-04-28 06:29:54 | ERROR | stderr | worker = ModelWorker(
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/fastchat/serve/model_worker.py", line 77, in __init__
2024-04-28 06:29:54 | ERROR | stderr | self.model, self.tokenizer = load_model(
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/fastchat/model/model_adapter.py", line 353, in load_model
2024-04-28 06:29:54 | ERROR | stderr | model, tokenizer = adapter.load_model(model_path, kwargs)
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/fastchat/model/model_adapter.py", line 1706, in load_model
2024-04-28 06:29:54 | ERROR | stderr | tokenizer = AutoTokenizer.from_pretrained(
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 862, in from_pretrained
2024-04-28 06:29:54 | ERROR | stderr | return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
2024-04-28 06:29:54 | ERROR | stderr | return cls._from_pretrained(
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
2024-04-28 06:29:54 | ERROR | stderr | tokenizer = cls(*init_inputs, **init_kwargs)
2024-04-28 06:29:54 | ERROR | stderr | File "/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
2024-04-28 06:29:54 | ERROR | stderr | fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
2024-04-28 06:29:54 | ERROR | stderr | Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 12564 column 3
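The traceback above pinpoints line 12564, column 3 of the model's tokenizer.json as the spot where the Rust parser gave up. A quick way to see what it choked on is to print the surrounding lines; this is an illustrative stdlib sketch (the helper name and the example path are assumptions):

```python
def show_context(path, line_no, context=2):
    """Return the lines around a 1-based line number, e.g. the position
    reported by the tokenizers JSON parser in the traceback."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    start = max(0, line_no - 1 - context)
    end = min(len(lines), line_no + context)
    return lines[start:end]

# e.g. show_context("/opt/models/CodeQwen1.5-7B-Chat/tokenizer.json", 12564)
```

In cases like this, the offending span typically names a pre_tokenizer variant that the installed `tokenizers` release does not yet know how to deserialize.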
Environment Information
==============================Langchain-Chatchat Configuration==============================
OS: Linux-5.15.0-76-generic-x86_64-with-glibc2.29.
Python version: 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0]
Project version: v0.2.10
langchain version: 0.0.344. fastchat version: 0.2.36
Additional Information
Add any other information related to the issue.