
chatglm2 fails to load when my transformers version is 4.36.2 #651

Open
1 task done
Congcong-Song opened this issue Jan 5, 2024 · 2 comments


@Congcong-Song

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The error is as follows:
Traceback (most recent call last):
File "/home/inspur/scc/gpt/LLaMA-Factory/src/train_bash.py", line 14, in
main()
File "/home/inspur/scc/gpt/LLaMA-Factory/src/train_bash.py", line 5, in main
run_exp()
File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/train/tuner.py", line 26, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 29, in run_sft
model, tokenizer = load_model_and_tokenizer(model_args, finetuning_args, training_args.do_train)
File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/model/loader.py", line 49, in load_model_and_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
return cls._from_pretrained(
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 69, in init
super().init(padding_side=padding_side, **kwargs)
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in init
self._add_tokens(
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
current_vocab = self.get_vocab().copy()
File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 108, in get_vocab
vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 104, in vocab_size
return self.tokenizer.n_words
AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'. Did you mean: 'tokenize'?

Loading only works after downgrading transformers, but downgrading then causes other problems.
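
For context: starting around transformers 4.34, `PreTrainedTokenizer.__init__` registers special tokens eagerly and calls `self.get_vocab()` before the subclass constructor finishes, which is what the traceback shows. A minimal sketch of the failure pattern, using a hypothetical toy class rather than the actual ChatGLM code:

```python
from transformers import PreTrainedTokenizer

# Hypothetical minimal reproduction of the ordering bug (not the real
# ChatGLM code): transformers >= 4.34 calls self._add_tokens() ->
# self.get_vocab() from inside the base-class __init__, before the
# subclass has assigned the attribute that get_vocab() depends on.
class ToyTokenizer(PreTrainedTokenizer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)     # get_vocab() is invoked in here ...
        self.tokenizer = {"<pad>": 0}  # ... before this assignment runs

    @property
    def vocab_size(self):
        return len(self.tokenizer)

    def get_vocab(self):
        return dict(self.tokenizer)

# Under transformers 4.36.2 this raises:
# AttributeError: 'ToyTokenizer' object has no attribute 'tokenizer'
ToyTokenizer()
```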

Expected Behavior

No response

Steps To Reproduce

My command:
CUDA_VISIBLE_DEVICES=5 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path /path/THUDM/chatglm2-6b \
    --dataset alpaca_gpt4_zh \
    --template chatglm2 \
    --finetuning_type lora \
    --lora_target query_key_value \
    --output_dir /path/chatglm2 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Environment

Environment: installed per requirements. Hardware: A100. Python: 3.10, Transformers: 4.36.2, PyTorch: 2.1.2

Anything else?

No response

@Gaojun123123

Download the latest model from Hugging Face.
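
If you'd rather not re-download the whole checkpoint, a sketch of fetching just the updated tokenizer files with huggingface_hub; the file list and local path here are assumptions, so adjust them to your setup (and note that the copy cached under ~/.cache/huggingface/modules/transformers_modules/ is what actually executes):

```python
from huggingface_hub import snapshot_download

# Pull only the tokenizer-related files from the latest revision.
# allow_patterns and local_dir are assumptions: point local_dir at the
# directory you pass to --model_name_or_path.
snapshot_download(
    repo_id="THUDM/chatglm2-6b",
    allow_patterns=["tokenization_chatglm.py", "tokenizer_config.json", "tokenizer.model"],
    local_dir="/path/THUDM/chatglm2-6b",
)
```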

@mawenju203

Just update tokenization_chatglm.py and it works.
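
For reference, the updated tokenization_chatglm.py reorders __init__ so the SentencePiece wrapper exists before the base constructor runs. A rough sketch of the relevant change (simplified from the ChatGLM2 repo; exact details may differ between revisions, and SPTokenizer is defined in the same file):

```python
class ChatGLMTokenizer(PreTrainedTokenizer):
    def __init__(self, vocab_file, padding_side="left",
                 clean_up_tokenization_spaces=False, **kwargs):
        self.name = "GLMTokenizer"
        self.vocab_file = vocab_file
        # Key change: create the SentencePiece wrapper BEFORE calling
        # super().__init__(), because newer transformers versions call
        # self.get_vocab() from inside the base constructor.
        self.tokenizer = SPTokenizer(vocab_file)
        self.special_tokens = {
            "<bos>": self.tokenizer.bos_id,
            "<eos>": self.tokenizer.eos_id,
            "<pad>": self.tokenizer.pad_id,
        }
        super().__init__(
            padding_side=padding_side,
            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
            **kwargs,
        )
```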
