Please take a look: after fine-tuning, running cli_demo.py fails with a dimension mismatch; RuntimeError: The size of tensor a (12288) must match the size of tensor b (25165824) at non-singleton dimension 0
#339 · Open
New-start-man opened this issue on Jan 25, 2024 · 2 comments
Fine-tuning finished successfully, so why does this error still occur when the model is loaded for inference?
python cli_demo.py --from_pretrained "checkpoints/finetune-visualglm-6b-01-24-20-02/" --quant 4
[2024-01-25 10:40:45,884] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/anaconda3/envs/pytorch/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/anaconda3/envs/pytorch did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/anaconda3/envs/pytorch/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
[2024-01-25 10:40:53,953] [INFO] building FineTuneVisualGLMModel model ...
[2024-01-25 10:40:53,955] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-01-25 10:40:53,955] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-01-25 10:40:53,956] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
[2024-01-25 10:41:01,778] [INFO] [RANK 0] replacing layer 0 attention with lora
[2024-01-25 10:41:02,129] [INFO] [RANK 0] replacing layer 14 attention with lora
[2024-01-25 10:41:02,481] [INFO] [RANK 0] replacing chatglm linear layer with 4bit
[2024-01-25 10:41:30,816] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7802848768
[2024-01-25 10:41:33,995] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/finetune-visualglm-6b-01-24-20-02/300/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "/home/PycharmProjects/VisualGLM-6B/cli_demo.py", line 103, in
main()
File "/home/PycharmProjects/VisualGLM-6B/cli_demo.py", line 30, in main
model, model_args = AutoModel.from_pretrained(
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/sat/model/base_model.py", line 338, in from_pretrained
return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/sat/model/base_model.py", line 332, in from_pretrained_base
load_checkpoint(model, args, load_path=model_path, prefix=prefix)
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/sat/training/model_io.py", line 273, in load_checkpoint
missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2027, in load_state_dict
load(self, state_dict)
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
load(child, child_state_dict, child_prefix)
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
load(child, child_state_dict, child_prefix)
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
load(child, child_state_dict, child_prefix)
[Previous line repeated 3 more times]
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2009, in load
module._load_from_state_dict(
File "/home/anaconda3/envs/pytorch/lib/python3.10/site-packages/sat/model/finetune/lora2.py", line 49, in load_from_state_dict
self.weight.data.copy(state_dict[prefix+'weight'])
RuntimeError: The size of tensor a (12288) must match the size of tensor b (25165824) at non-singleton dimension 0
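As a sanity check on the numbers: 25165824 = 12288 × 2048, and 12288 = 3 × 4096 is the fused query_key_value output dimension of ChatGLM-6B, so [12288, 2048] would be exactly a 4-bit-packed [12288, 4096] weight (two 4-bit values per byte). That pattern suggests a packed quantized buffer is being copied into (or from) an unquantized weight, i.e. the quantization state at load time (--quant 4) may not match how the checkpoint was saved. A minimal diagnostic sketch to list the stored shapes (my own script, not part of cli_demo.py; it assumes ChatGLM's fused attention parameter is named query_key_value):

```python
import torch

# Checkpoint path taken from the log above.
ckpt_path = "checkpoints/finetune-visualglm-6b-01-24-20-02/300/mp_rank_00_model_states.pt"

# 'module' is the same sub-dict that sat's load_checkpoint passes to load_state_dict.
sd = torch.load(ckpt_path, map_location="cpu")["module"]

# Print name, shape, and dtype of every attention-projection tensor so the
# 12288-vs-25165824 pair can be located by name. The filter string is an
# assumption based on ChatGLM's usual parameter naming.
for name, tensor in sd.items():
    if "query_key_value" in name:
        print(name, tuple(tensor.shape), tensor.dtype)
```

If the checkpoint stores full-precision [12288, 4096] weights while the demo quantizes the linear layers before loading (or the reverse, with an already-quantized checkpoint), the copy_ in _load_from_state_dict cannot line up. Running the demo once without --quant 4, or quantizing exactly the same way as during fine-tuning, should narrow down which side is mismatched.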