How to fix an inference error with the quantized model #940
Deploying the quantized model as a server also fails: CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir 'output/baichuan2-7b/v11-20240511-210615/checkpoint-1200-merged'
Describe the bug
What the bug is and how to reproduce it, preferably with screenshots
Inference with the quantized model raises an error:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type baichuan2-7b --model_id_or_path baichuan2-7b-gptq-int4
Your hardware and system info
Write your system info here, e.g. CUDA version, OS, GPU model, and torch version
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/quantization/gptq.py", line 208, in apply_weights
output = ops.gptq_gemm(reshaped_x, weights["qweight"],
RuntimeError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 22.19 GiB of which 14.50 MiB is free. Process 832 has 1.31 GiB memory in use. Process 3711 has 1.31 GiB memory in use. Including non-PyTorch memory, this process has 19.55 GiB memory in use. Of the allocated memory 17.90 GiB is allocated by PyTorch, and 155.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at ../c10/cuda/CUDACachingAllocator.cpp:1438 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f1f7070f617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: + 0x30f6c (0x7f1f707adf6c in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x3139e (0x7f1f707ae39e in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x3175e (0x7f1f707ae75e in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #4: + 0x16c1461 (0x7f1f2eaf7461 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #5: at::detail::empty_generic(c10::ArrayRef, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional&lt;c10::MemoryFormat&gt;) + 0x14 (0x7f1f2eaef674 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #6: at::detail::empty_cuda(c10::ArrayRef, c10::ScalarType, c10::optional&lt;c10::Device&gt;, c10::optional&lt;c10::MemoryFormat&gt;) + 0x111 (0x7f1f029a4061 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: at::detail::empty_cuda(c10::ArrayRef, c10::optional&lt;c10::ScalarType&gt;, c10::optional&lt;c10::Layout&gt;, c10::optional&lt;c10::Device&gt;, c10::optional, c10::optional&lt;c10::MemoryFormat&gt;) + 0x31 (0x7f1f029a4331 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #8: at::native::empty_cuda(c10::ArrayRef, c10::optional&lt;c10::ScalarType&gt;, c10::optional&lt;c10::Layout&gt;, c10::optional&lt;c10::Device&gt;, c10::optional, c10::optional&lt;c10::MemoryFormat&gt;) + 0x20 (0x7f1f02ad13c0 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #9: + 0x2d403a9 (0x7f1f048bc3a9 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0x2d4048b (0x7f1f048bc48b in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #11: at::_ops::empty_memory_format::redispatch(c10::DispatchKeySet, c10::ArrayRef&lt;c10::SymInt&gt;, c10::optional&lt;c10::ScalarType&gt;, c10::optional&lt;c10::Layout&gt;, c10::optional&lt;c10::Device&gt;, c10::optional, c10::optional&lt;c10::MemoryFormat&gt;) + 0xe7 (0x7f1f2fa1d277 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x295eaef (0x7f1f2fd94aef in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #13: at::_ops::empty_memory_format::call(c10::ArrayRef&lt;c10::SymInt&gt;, c10::optional&lt;c10::ScalarType&gt;, c10::optional&lt;c10::Layout&gt;, c10::optional&lt;c10::Device&gt;, c10::optional, c10::optional&lt;c10::MemoryFormat&gt;) + 0x1a3 (0x7f1f2fa613e3 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::empty(c10::ArrayRef, c10::TensorOptions, c10::optional&lt;c10::MemoryFormat&gt;) + 0x23d (0x7f1e2dc1ce0d in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #15: gptq_gemm(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, bool, int) + 0x2dd (0x7f1e2dc18ffd in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #16: + 0x94f62 (0x7f1e2dc31f62 in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
frame #17: + 0x90dac (0x7f1e2dc2ddac in /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so)
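Not a confirmed fix, but two mitigations follow directly from the OOM message above: it reports that PIDs 832 and 3711 each hold about 1.3 GiB on GPU 0, and it suggests setting `max_split_size_mb` if reserved-but-unallocated memory indicates fragmentation. A minimal sketch (the 128 MiB value is an illustrative assumption, not taken from the report):

```shell
# Check which other processes are holding GPU memory; stopping them
# (or pointing CUDA_VISIBLE_DEVICES at a freer GPU) reclaims that space.
#   nvidia-smi

# Reduce allocator fragmentation, as the error message itself suggests.
# 128 is a starting point to tune, not a value from the report.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"

# Then rerun the failing command, e.g.:
#   CUDA_VISIBLE_DEVICES=0 swift infer --model_type baichuan2-7b \
#       --model_id_or_path baichuan2-7b-gptq-int4
```

If the model plus KV cache simply does not fit in the remaining ~20 GiB, no allocator setting will help; only freeing the other processes or using a larger GPU will.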
Additional context
Add any other context about the problem here