
IndexError: piece id is out of range. #4

Open
lv199882 opened this issue Nov 25, 2023 · 3 comments

Comments

@lv199882

Hi, p-tuning runs fine with the YAGO data, but with my own data it fails after processing only 61 examples. I am using the chatglm-6b model.
Running tokenizer on train dataset: 100%|██████████████████████████████████████| 61/61 [00:00<00:00, 2185.47 examples/s]
input_ids [5, 107883, 102011, 64744, 73948, 63826, 102011, 65407, 65267, 64379, 31, 71492, 63859, 65845, 63984, 64121, 66740, 12, 91831, 85, 65853, 85, 64174, 7, 150001, 150004, 5, 91831, 150005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
Traceback (most recent call last):
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 399, in <module>
main()
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 226, in main
print_dataset_example(train_dataset[0])
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 206, in print_dataset_example
print("inputs", tokenizer.decode(example["input_ids"]))
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3476, in decode
return self._decode(
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 285, in _decode
return super()._decode(token_ids, **kwargs)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
tokens.append(self._convert_id_to_token(index))
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 293, in _convert_id_to_token
return self.sp_tokenizer[index]
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 157, in __getitem__
return self.text_tokenizer.convert_id_to_token(x - self.num_image_tokens)
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 44, in convert_id_to_token
return self.sp.IdToPiece(idx)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
return _func(self, arg)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
What could be causing this? Thanks.
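For context, the traceback shows SentencePiece's `IdToPiece` being asked for an id it does not have: the decode crashes when a token id falls outside the piece vocabulary. As a debugging-only sketch (not the fix for the underlying tokenizer/version mismatch), one can filter out-of-range ids before calling `tokenizer.decode`; `filter_valid_ids` and the vocab size used below are illustrative, not part of the repo:

```python
def filter_valid_ids(token_ids, vocab_size):
    """Drop token ids that the tokenizer cannot map back to a piece.

    token_ids:  list of ints produced by the tokenizer
    vocab_size: size of the tokenizer's vocabulary (tokenizer.vocab_size)
    """
    return [i for i in token_ids if 0 <= i < vocab_size]


# Hypothetical usage with an assumed vocab size of 130528: ids at or above
# the vocab size are dropped, so tokenizer.decode(...) no longer raises
# "piece id is out of range" while you inspect the dataset.
print(filter_valid_ids([5, 150005, 3], 130528))
```

This only masks the symptom for printing examples; if ids like 150001/150005 are supposed to be valid special tokens, the real problem is usually a mismatch between the tokenizer code and the model/library versions.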

@yao8839836 (Owner)

@lv199882

Hi, please see:
THUDM/ChatGLM-6B#438

@lv199882 (Author)

Thank you very much.

> @lv199882
> Hi, please see: THUDM/ChatGLM-6B#438

Thank you very much for your answer. I'd like to ask about another problem. Since I couldn't find the original llama model online, I am using llama2-7b. After running lora_finetune_wn11.py, calling lora_infer_wn11.py raises the following error:
Traceback (most recent call last):
File "/home/yuan/kg-llm-main-KGC/lora_infer_wn11.py", line 38, in <module>
model = PeftModel.from_pretrained(
File "/home/yuan/.conda/envs/kgc/lib/python3.10/site-packages/peft/peft_model.py", line 332, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/home/yuan/.conda/envs/kgc/lib/python3.10/site-packages/peft/peft_model.py", line 629, in load_adapter
adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
File "/home/yuan/.conda/envs/kgc/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 222, in load_peft_weights
adapters_weights = safe_load_file(filename, device=device)
File "/home/yuan/.conda/envs/kgc/lib/python3.10/site-packages/safetensors/torch.py", line 308, in load_file
with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
Is this because llama2 cannot be fine-tuned with LoRA?
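`InvalidHeaderDeserialization` usually means the `adapter_model.safetensors` file on disk is empty, truncated, or not a real safetensors file: the format begins with an 8-byte little-endian length followed by a JSON header of that length. A small sketch for inspecting a suspect checkpoint file; `inspect_safetensors_header` is a hypothetical helper, not part of peft or safetensors:

```python
import json
import struct


def inspect_safetensors_header(path):
    """Return the parsed JSON header of a safetensors file, or None if the
    file is empty, truncated, or its header is not valid JSON -- the cases
    that surface as InvalidHeaderDeserialization when safetensors loads it.
    """
    with open(path, "rb") as f:
        raw = f.read(8)
        if len(raw) < 8:
            return None  # empty or truncated file
        (header_len,) = struct.unpack("<Q", raw)  # little-endian u64 length
        header = f.read(header_len)
        if len(header) < header_len:
            return None  # header truncated
        try:
            return json.loads(header)
        except (json.JSONDecodeError, UnicodeDecodeError):
            return None
```

If this returns None for the saved adapter, the checkpoint was written incompletely (for example, the run was interrupted mid-save, or an older peft version wrote a bad file), and re-saving the adapter after upgrading peft/safetensors is the usual remedy rather than anything specific to llama2.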

@JTWang722


I ran into the same problem. Fine-tuning llama2-7b on the wn11 dataset with LoRA fails with:
Traceback (most recent call last):
File "lora_finetune.py", line 179, in <module>
trainer.train()
File "/home/220/.conda/envs/kgellm/lib/python3.8/site-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/home/220/.conda/envs/kgellm/lib/python3.8/site-packages/transformers/trainer.py", line 1957, in _inner_training_loop
self._load_best_model()
File "/home/220/.conda/envs/kgellm/lib/python3.8/site-packages/transformers/trainer.py", line 2181, in _load_best_model
model.load_adapter(self.state.best_model_checkpoint, model.active_adapter)
File "/home/220/.conda/envs/kgellm/lib/python3.8/site-packages/peft/peft_model.py", line 689, in load_adapter
adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
File "/home/220/.conda/envs/kgellm/lib/python3.8/site-packages/peft/utils/save_and_load.py", line 270, in load_peft_weights
adapters_weights = safe_load_file(filename, device=device)
File "/home/220/.conda/envs/kgellm/lib/python3.8/site-packages/safetensors/torch.py", line 308, in load_file
with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

Have you solved this?
