
[BUG/Help] Running cli_demo on CPU under Windows; gcc is installed, but the tokenizer on the second line throws an error #1449

Open
1 task done
lixianqi opened this issue Jan 20, 2024 · 0 comments
Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

As the title says. This is my first time setting up a large model; the error output is below.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1958, in from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 221, in __init__
self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 64, in __init__
self.text_tokenizer = TextTokenizer(vocab_file)
File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 22, in __init__
self.sp.Load(model_path)
File "D:\Anaconda\envs\LLM\lib\site-packages\sentencepiece\__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "D:\Anaconda\envs\LLM\lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "C:\Users\username/.cache\huggingface\hub\models--THUDM--chatglm-6b\snapshots\8b7d33596d18c5e83e2da052d05ca4db02e60620\ice_text.model": Illegal byte sequence Error #42

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\username\ChatGLM-6B\cli_demo.py", line 7, in <module>
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)#C:/Users/李贤琦/Desktop/LLM THUDM/chatglm-6b
File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 679, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1804, in from_pretrained
return cls._from_pretrained(
File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1960, in _from_pretrained
raise OSError(
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
Hoping someone can help with this.
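One thing the traceback hints at: the sentencepiece `OSError` ("Illegal byte sequence") is raised while opening `ice_text.model` under `C:\Users\username\.cache\...`, and the commented-out path in `cli_demo.py` suggests the real Windows user name contains non-ASCII characters. A minimal diagnostic sketch (assuming that is the cause; `path_is_ascii` is a hypothetical helper, not part of transformers):

```python
def path_is_ascii(path: str) -> bool:
    """Return True if every character in the path is plain ASCII.

    Non-ASCII characters in a path (e.g. a Chinese user name under
    C:\\Users\\...) can trigger 'Illegal byte sequence' errors when a
    native library such as sentencepiece opens the file with a
    byte-oriented API on Windows.
    """
    try:
        path.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

# Assumed example paths, modeled on the traceback above:
cache_path = r"C:\Users\李贤琦\.cache\huggingface\hub\models--THUDM--chatglm-6b"
local_path = r"D:\models\chatglm-6b"
print(path_is_ascii(cache_path))  # False
print(path_is_ascii(local_path))  # True
```

If the check fails for your cache path, a common workaround is to point the Hugging Face cache at an ASCII-only directory (e.g. set the `HF_HOME` environment variable to `D:\hf_cache`) or to download the model files to a local ASCII-only folder and pass that folder to `AutoTokenizer.from_pretrained(r"D:\models\chatglm-6b", trust_remote_code=True)` instead of the hub ID.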

Expected Behavior

No response

Steps To Reproduce

I just followed the official documentation step by step.

Environment

- OS: Windows 10
- Python: 3.9
- Transformers: 4.27.1
- PyTorch: 23.3.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response
