How to fix an inference error with the quantized model #940
Deploying the quantized model as a server also fails: CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir 'output/baichuan2-7b/v11-20240511-210615/checkpoint-1200-merged'
Describe the bug
What the bug is and how to reproduce it, preferably with screenshots
Inference with the quantized model raises an error:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type baichuan2-7b --model_id_or_path baichuan2-7b-gptq-int4
Your hardware and system info
Write your system info here, e.g. CUDA version, OS, GPU model, and torch version
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/quantization/gptq.py", line 208, in apply_weights
output = ops.gptq_gemm(reshaped_x, weights["qweight"],
RuntimeError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 22.19 GiB of which 14.50 MiB is free. Process 832 has 1.31 GiB memory in use. Process 3711 has 1.31 GiB memory in use. Including non-PyTorch memory, this process has 19.55 GiB memory in use. Of the allocated memory 17.90 GiB is allocated by PyTorch, and 155.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at ../c10/cuda/CUDACachingAllocator.cpp:1438 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f1f7070f617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: + 0x30f6c (0x7f1f707adf6c in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x3139e (0x7f1f707ae39e in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x3175e (0x7f1f707ae75e in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #4: + 0x16c1461 (0x7f1f2eaf7461 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_generic(c10::ArrayRef, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional&lt;c10::MemoryFormat&gt;) + 0x14 (0x7f1f2eaef674 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #6: at::detail::empty_cuda(c10::ArrayRef, c10::ScalarType, c10::optional&lt;c10::Device&gt;, c10::optional&lt;c10::MemoryFormat&gt;) + 0x111 (0x7f1f029a4061 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: at::detail::empty_cuda(c10::ArrayRef, c10::optional&lt;c10::ScalarType&gt;, c10::optional&lt;c10::Layout&gt;, c10::optional&lt;c10::Device&gt;, c10::optional, c10::optional&lt;c10::MemoryFormat&gt;) + 0x31 (0x7f1f029a4331 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #8: at::native::empty_cuda(c10::ArrayRef, c10::optional&lt;c10::ScalarType&gt;, c10::optional&lt;c10::Layout&gt;, c10::optional&lt;c10::Device&gt;, c10::optional, c10::optional&lt;c10::MemoryFormat&gt;) + 0x20 (0x7f1f02ad13c0 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #9: + 0x2d403a9 (0x7f1f048bc3a9 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0x2d4048b (0x7f1f048bc48b in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #11: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef&lt;c10::SymInt&gt;, c10::optional&lt;c10::ScalarType&gt;, c10::optional&lt;c10::Layout&gt;, c10::optional&lt;c10::Device&gt;, c10::optional, c10::optional&lt;c10::MemoryFormat&gt;) + 0xe7 (0x7f1f2fa1d277 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x295eaef (0x7f1f2fd94aef in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #13: at::_ops::empty_memory_format::call(c10::ArrayRef&lt;c10::SymInt&gt;, c10::optional&lt;c10::ScalarType&gt;, c10::optional&lt;c10::Layout&gt;, c10::optional&lt;c10::Device&gt;, c10::optional, c10::optional&lt;c10::MemoryFormat&gt;) + 0x1a3 (0x7f1f2fa613e3 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::empty(c10::ArrayRef, c10::TensorOptions, c10::optional&lt;c10::MemoryFormat&gt;) + 0x23d (0x7f1e2dc1ce0d in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #15: gptq_gemm(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, int) + 0x2dd (0x7f1e2dc18ffd in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #16: + 0x94f62 (0x7f1e2dc31f62 in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #17: + 0x90dac (0x7f1e2dc2ddac in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
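Not a confirmed fix, but two mitigations follow directly from the OOM message above: it reports that PIDs 832 and 3711 each hold about 1.3 GiB on GPU 0, and it suggests setting `max_split_size_mb` if reserved-but-unallocated memory indicates fragmentation. A minimal sketch (the 128 MiB value is an illustrative assumption, not taken from the report):

```shell
# Check which other processes are holding GPU memory; stopping them
# (or pointing CUDA_VISIBLE_DEVICES at a freer GPU) reclaims that space.
#   nvidia-smi

# Reduce allocator fragmentation, as the error message itself suggests.
# 128 is a starting point to tune, not a value from the report.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"

# Then rerun the failing command, e.g.:
#   CUDA_VISIBLE_DEVICES=0 swift infer --model_type baichuan2-7b \
#       --model_id_or_path baichuan2-7b-gptq-int4
```

If the model plus KV cache simply does not fit in the remaining ~20 GiB, no allocator setting will help; only freeing the other processes or using a larger GPU will.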
Additional context
Add any other context about the problem here