
[BUG/Help] Running cli_demo on CPU under Windows; gcc is installed, but the tokenizer on the second line throws an error #1449

Open
1 task done
lixianqi opened this issue Jan 20, 2024 · 0 comments
Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

As the title says. This is my first time setting up a large model; the error output is below.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1958, in from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 221, in __init__
self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 64, in __init__
self.text_tokenizer = TextTokenizer(vocab_file)
File "C:\Users\username/.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b\8b7d33596d18c5e83e2da052d05ca4db02e60620\tokenization_chatglm.py", line 22, in __init__
self.sp.Load(model_path)
File "D:\Anaconda\envs\LLM\lib\site-packages\sentencepiece\__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "D:\Anaconda\envs\LLM\lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "C:\Users\username/.cache\huggingface\hub\models--THUDM--chatglm-6b\snapshots\8b7d33596d18c5e83e2da052d05ca4db02e60620\ice_text.model": Illegal byte sequence Error #42

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\username\ChatGLM-6B\cli_demo.py", line 7, in <module>
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)#C:/Users/李贤琦/Desktop/LLM THUDM/chatglm-6b
File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 679, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1804, in from_pretrained
return cls._from_pretrained(
File "D:\Anaconda\envs\LLM\lib\site-packages\transformers\tokenization_utils_base.py", line 1960, in _from_pretrained
raise OSError(
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
Hoping someone can help with this.
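One thing the traceback hints at: the sentencepiece `OSError` ("Illegal byte sequence") is raised while opening `ice_text.model` under `C:\Users\username\.cache\...`, and the commented-out path in `cli_demo.py` suggests the real Windows user name contains non-ASCII characters. A minimal diagnostic sketch (assuming that is the cause; `path_is_ascii` is a hypothetical helper, not part of transformers):

```python
def path_is_ascii(path: str) -> bool:
    """Return True if every character in the path is plain ASCII.

    Non-ASCII characters in a path (e.g. a Chinese user name under
    C:\\Users\\...) can trigger 'Illegal byte sequence' errors when a
    native library such as sentencepiece opens the file with a
    byte-oriented API on Windows.
    """
    try:
        path.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

# Assumed example paths, modeled on the traceback above:
cache_path = r"C:\Users\李贤琦\.cache\huggingface\hub\models--THUDM--chatglm-6b"
local_path = r"D:\models\chatglm-6b"
print(path_is_ascii(cache_path))  # False
print(path_is_ascii(local_path))  # True
```

If the check fails for your cache path, a common workaround is to point the Hugging Face cache at an ASCII-only directory (e.g. set the `HF_HOME` environment variable to `D:\hf_cache`) or to download the model files to a local ASCII-only folder and pass that folder to `AutoTokenizer.from_pretrained(r"D:\models\chatglm-6b", trust_remote_code=True)` instead of the hub ID.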

Expected Behavior

No response

Steps To Reproduce

I just followed the official documentation step by step.

Environment

- OS: Windows 10
- Python: 3.9
- Transformers: 4.27.1
- PyTorch: 23.3.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response
