
Please help take a look: after finetuning, running cli_demo.py hits a dimension mismatch; RuntimeError: The size of tensor a (12288) must match the size of tensor b (25165824) at non-singleton dimension 0 #339

Open
New-start-man opened this issue Jan 25, 2024 · 2 comments

Comments

@New-start-man

Finetuning completed successfully, so why does this error still occur when loading the model?
python cli_demo.py --from_pretrained "checkpoints/finetune-visualglm-6b-01-24-20-02/" --quant 4
[2024-01-25 10:40:45,884] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/anaconda3/envs/pytorch/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/anaconda3/envs/pytorch did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/anaconda3/envs/pytorch/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
[2024-01-25 10:40:53,953] [INFO] building FineTuneVisualGLMModel model ...
[2024-01-25 10:40:53,955] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-01-25 10:40:53,955] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-01-25 10:40:53,956] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
[2024-01-25 10:41:01,778] [INFO] [RANK 0] replacing layer 0 attention with lora
[2024-01-25 10:41:02,129] [INFO] [RANK 0] replacing layer 14 attention with lora
[2024-01-25 10:41:02,481] [INFO] [RANK 0] replacing chatglm linear layer with 4bit
[2024-01-25 10:41:30,816] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7802848768
[2024-01-25 10:41:33,995] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/finetune-visualglm-6b-01-24-20-02/300/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/home/PycharmProjects/VisualGLM-6B/cli_demo.py", line 103, in <module>
    main()
  File "/home/PycharmProjects/VisualGLM-6B/cli_demo.py", line 30, in main
    model, model_args = AutoModel.from_pretrained(
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/sat/model/base_model.py", line 338, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/sat/model/base_model.py", line 332, in from_pretrained_base
    load_checkpoint(model, args, load_path=model_path, prefix=prefix)
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/sat/training/model_io.py", line 273, in load_checkpoint
    missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2027, in load_state_dict
    load(self, state_dict)
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 3 more times]
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2009, in load
    module._load_from_state_dict(
  File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/sat/model/finetune/lora2.py", line 49, in _load_from_state_dict
    self.weight.data.copy_(state_dict[prefix+'weight'])
RuntimeError: The size of tensor a (12288) must match the size of tensor b (25165824) at non-singleton dimension 0
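One possible reading of the numbers (an assumption, not confirmed in this thread): a 12288 × 4096 projection weight holds 50,331,648 values, and packing two 4-bit values per byte yields exactly 25,165,824 bytes, which matches "tensor b". That pattern would arise if the checkpoint holds a 4-bit-packed weight while the model being loaded expects an unpacked one (i.e. the `--quant` setting at save time differed from the one at load time). A minimal sketch of the arithmetic, plus a defensive shape check of the kind that would turn the unconditional `copy_` in sat's lora2.py into a readable error (the shape `12288 × 4096` and the helper `safe_copy` are hypothetical, for illustration only):

```python
import torch

# Plausible projection-weight shape in a 6B GLM block (assumed, not from the log).
rows, cols = 12288, 4096
full_params = rows * cols          # 50_331_648 float weights
packed_4bit = full_params // 2     # two 4-bit values per byte

# This matches "tensor b (25165824)" in the RuntimeError above.
assert packed_4bit == 25_165_824

def safe_copy(dst: torch.Tensor, src: torch.Tensor, name: str = "weight") -> None:
    """Copy src into dst, but fail with a readable message on shape mismatch."""
    if dst.shape != src.shape:
        raise ValueError(
            f"{name}: model expects {tuple(dst.shape)} but checkpoint has "
            f"{tuple(src.shape)} - was the checkpoint saved with a different "
            f"--quant setting than the one used at load time?"
        )
    dst.data.copy_(src)
```

If this reading is right, the practical check is to load the finetuned checkpoint with the same quantization flag it was trained/saved with (e.g. try omitting `--quant 4`) and only quantize afterwards.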

@xiongxiaochu
Could you share the model files you used for training? On my Mac I can't install triton, so I can't download them via the sat code, and the Tsinghua open-source model has problems...

@New-start-man
Author

> Could you share the model files you used for training? On my Mac I can't install triton, so I can't download them via the sat code, and the Tsinghua open-source model has problems...

Hello, how would I send you the trained model? There doesn't seem to be a good channel for sharing the finetuned model files.
