[Bug] InternalError: Check failed: (res == VK_SUCCESS) is false: Vulkan Error, code=-4: VK_ERROR_DEVICE_LOST #2328

Open
aaaaaad333 opened this issue May 11, 2024 · 4 comments

@aaaaaad333

🐛 Bug

When the prompt is too long, MLC LLM briefly freezes the computer and then crashes.

To Reproduce

Steps to reproduce the behavior:

  1. Input a 1000-character-long string of the letter 'a' (see the sketch below).
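For reference, here is a rough sketch of the reproduction in Python, following the ChatModule path that appears in the traceback (the model string is my assumption here, since I actually launched it through `mlc_llm chat`; only the long prompt seems to matter):

# Sketch of the reproduction: a 1000-character prompt of the letter 'a'
# (model name assumed for illustration; I used the `mlc_llm chat` CLI).
from mlc_llm import ChatModule

prompt = "a" * 1000
cm = ChatModule(model="HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC")
print(cm.generate(prompt))

Running the chat with that prompt produces the following traceback: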

Traceback (most recent call last):
File "/home/username/miniconda3/envs/envformlc/bin/mlc_llm", line 8, in
sys.exit(main())
^^^^^^
File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/main.py", line 37, in main
cli.main(sys.argv[2:])
File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/cli/chat.py", line 42, in main
chat(
File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/interface/chat.py", line 160, in chat
cm.generate(
File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/chat_module.py", line 863, in generate
self._prefill(prompt, generation_config=generation_config)
File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/chat_module.py", line 1086, in _prefill
self._prefill_func(
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.call
File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
File "/workspace/mlc-llm/cpp/llm_chat.cc", line 1697, in mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtrtvm::runtime::Object const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
File "/workspace/mlc-llm/cpp/llm_chat.cc", line 1010, in mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, bool, bool, mlc::llm::PlaceInPrompt, tvm::runtime::String)
File "/workspace/mlc-llm/cpp/llm_chat.cc", line 1241, in mlc::llm::LLMChat::SampleTokenFromLogits(tvm::runtime::NDArray, picojson::object_with_ordered_keys)
File "/workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/ndarray.h", line 405, in mlc::llm::LLMChat::UpdateLogitsOrProbOnCPUSync(tvm::runtime::NDArray)
tvm.error.InternalError: Traceback (most recent call last):
8: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /workspace/mlc-llm/cpp/llm_chat.cc:1697
7: mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt, tvm::runtime::String)
at /workspace/mlc-llm/cpp/llm_chat.cc:1010
6: mlc::llm::LLMChat::SampleTokenFromLogits(tvm::runtime::NDArray, picojson::object_with_ordered_keys)
at /workspace/mlc-llm/cpp/llm_chat.cc:1241
5: mlc::llm::LLMChat::UpdateLogitsOrProbOnCPUSync(tvm::runtime::NDArray)
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/ndarray.h:405
4: tvm::runtime::NDArray::CopyFromTo(DLTensor const*, DLTensor*, void*)
3: tvm::runtime::DeviceAPI::CopyDataFromTo(DLTensor*, DLTensor*, void*)
2: tvm::runtime::vulkan::VulkanDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLDevice, DLDevice, DLDataType, void*)
1: tvm::runtime::vulkan::VulkanStream::Synchronize()
0: _ZN3tvm7runtime6deta
File "/workspace/tvm/src/runtime/vulkan/vulkan_stream.cc", line 155
InternalError: Check failed: (res == VK_SUCCESS) is false: Vulkan Error, code=-4: VK_ERROR_DEVICE_LOST
terminate called after throwing an instance of 'tvm::runtime::InternalError'
what(): [22:27:47] /workspace/tvm/src/runtime/vulkan/vulkan_device.cc:402: InternalError: Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-4: VK_ERROR_DEVICE_LOST
Stack trace:
0: _ZN3tvm7runtime6deta
1: tvm::runtime::vulkan::VulkanDevice::QueueSubmit(VkSubmitInfo, VkFence_T*) const
2: tvm::runtime::vulkan::VulkanStream::Synchronize()
3: tvm::runtime::vulkan::VulkanDeviceAPI::StreamSync(DLDevice, void*)
4: ZN3tvm7runtime6vulkan15VulkanDeviceAPI13FreeDataS
5: tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::Object*)
6: tvm::runtime::relax_vm::PagedAttentionKVCacheObj::~PagedAttentionKVCacheObj()
7: ZN3tvm7runtime18SimpleObjAllocator7H
8: tvm::runtime::ObjectPtr<tvm::runtime::Object>::reset()
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/object.h:455
9: tvm::runtime::ObjectPtr<tvm::runtime::Object>::~ObjectPtr()
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/object.h:404
10: tvm::runtime::ObjectRef::~ObjectRef()
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/object.h:519
11: mlc::llm::LLMChat::~LLMChat()
at /workspace/mlc-llm/cpp/llm_chat.cc:371
12: mlc::llm::LLMChatModule::~LLMChatModule()
at /workspace/mlc-llm/cpp/llm_chat.cc:1638
13: tvm::runtime::SimpleObjAllocator::Handler<mlc::llm::LLMChatModule>::Deleter
(tvm::runtime::Object*)
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/memory.h:138
14: tvm::runtime::Object::DecRef()
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/object.h:846
15: tvm::runtime::Object::DecRef()
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/object.h:842
16: tvm::runtime::ObjectPtr<tvm::runtime::Object>::reset()
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/object.h:455
17: tvm::runtime::ObjectPtr<tvm::runtime::Object>::~ObjectPtr()
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/object.h:404
18: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#12}::~TVMRetValue()
at /workspace/mlc-llm/cpp/llm_chat.cc:1757
19: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::PackedFuncSubObj<mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#12}> >::Deleter
(tvm::runtime::Object*)
at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/memory.h:138
20: TVMObjectFree
21: __pyx_tp_dealloc_3tvm_4_ffi_4_cy3_4core_PackedFuncBase(_object*)
22: 0x00000000005cab98
23: 0xffffffffffffffff

Aborted (core dumped)

Expected behavior

I expect the software to run normally and the model to answer.

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Vulkan
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Linux Mint
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): PC, Intel® Celeron® N5105 @ 2.00GHz × 4. Integrated GPU.
  • How you installed MLC-LLM (conda, source): conda
  • How you installed TVM-Unity (pip, source): I don't remember installing this. Maybe I already had it.
  • Python version (e.g. 3.10): 3.12.3
  • GPU driver version (if applicable):
  • CUDA/cuDNN version (if applicable):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
  • Any other relevant information:

Additional context

The all-'a' prompt consistently causes the application to crash, but this behavior is also observed with shorter, more complex prompts. I love MLC-LLM; it's so much faster than koboldcpp on my mini PC. Please let me know if there is anything I can do.

aaaaaad333 added the bug (Confirmed bugs) label on May 11, 2024
@tqchen
Contributor

tqchen commented May 11, 2024

Do you mind trying out the Python API (https://llm.mlc.ai/docs/deploy/python_engine.html) and providing a reproducible script that can trigger this error?

@aaaaaad333
Author

Thank you, this script causes the error every time it's run:

from mlc_llm import MLCEngine

# Create engine
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Run chat completion in OpenAI API.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": """What a profound and timeless question!

The meaning of life is a topic that has puzzled philosophers, theologians, and scientists for centuries. While there may not be a definitive answer, I can offer some perspectives and insights that might be helpful.

One approach is to consider the concept of purpose. What gives your life significance? What are your values, passions, and goals? For many people, finding meaning and purpose in life involves pursuing their values and interests, building meaningful relationships, and making a positive impact on the world.

Another perspective is to look at the human experience as a whole. We are social creatures, and our lives are intertwined with those of others. We have a natural desire for connection, community, and belonging. We also have a need for self-expression, creativity, and personal growth. These aspects of human nature can be seen as fundamental to our existence and provide a sense of meaning.

Some people find meaning in their lives through spirituality or religion. They may believe that their existence has a higher purpose, and that their experiences and challenges are part of a larger plan.

Others may find meaning through their work, hobbies, or activities that bring them joy and fulfillment. They may believe that their existence has a purpose because they are contributing to the greater good, making a positive impact, or leaving a lasting legacy.

Ultimately, the meaning of life is a highly personal and subjective concept. It can be influenced by our experiences, values, and perspectives. While there may not be a single, definitive answer, exploring these questions and reflecting on our own experiences can help us discover our own sense of purpose and meaning.

What are your thoughts on the meaning of life? What gives your life significance?
"""}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print("\n")

engine.terminate()

This is the full log:

(envformlc) 11:04:58 username@hostname:~/mlc$ python ./mlc.py 
[2024-05-12 11:09:23] INFO auto_device.py:88: Not found device: cuda:0
[2024-05-12 11:09:25] INFO auto_device.py:88: Not found device: rocm:0
[2024-05-12 11:09:26] INFO auto_device.py:88: Not found device: metal:0
[2024-05-12 11:09:28] INFO auto_device.py:79: Found device: vulkan:0
[2024-05-12 11:09:28] INFO auto_device.py:79: Found device: vulkan:1
[2024-05-12 11:09:30] INFO auto_device.py:88: Not found device: opencl:0
[2024-05-12 11:09:30] INFO auto_device.py:35: Using device: vulkan:0
[2024-05-12 11:09:30] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
[2024-05-12 11:09:30] INFO download.py:133: Weights already downloaded: /home/username/.cache/mlc_llm/model_weights/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
[2024-05-12 11:09:30] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-05-12 11:09:30] INFO jit.py:160: Using cached model lib: /home/username/.cache/mlc_llm/model_lib/dc91913de42964b1f58e63f0d45a691e.so
[2024-05-12 11:09:30] INFO engine_base.py:124: The selected engine mode is local. We choose small max batch size and KV cache capacity to use less GPU memory.
[2024-05-12 11:09:30] INFO engine_base.py:149: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
[2024-05-12 11:09:30] INFO engine_base.py:154: If you have high concurrent requests and want to maximize the GPU memory utilization, please select mode "server".
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:601: Under mode "local", max batch size will be set to 4, max KV cache token capacity will be set to 8192, prefill chunk size will be set to 1024. 
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:601: Under mode "interactive", max batch size will be set to 1, max KV cache token capacity will be set to 8192, prefill chunk size will be set to 1024. 
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:601: Under mode "server", max batch size will be set to 80, max KV cache token capacity will be set to 41512, prefill chunk size will be set to 1024. 
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:678: The actual engine mode is "local". So max batch size is 4, max KV cache token capacity is 8192, prefill chunk size is 1024.
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:683: Estimated total single GPU memory usage: 5736.325 MB (Parameters: 4308.133 MB. KVCache: 1092.268 MB. Temporary buffer: 335.925 MB). The actual usage might be slightly larger than the estimated number.
Exception in thread Thread-1 (_background_loop):
Traceback (most recent call last):
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/serve/engine_base.py", line 482, in _background_loop
    self._ffi["run_background_loop"]()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/workspace/mlc-llm/cpp/serve/threaded_engine.cc", line 168, in mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
  File "/workspace/mlc-llm/cpp/serve/engine.cc", line 328, in mlc::llm::serve::EngineImpl::Step()
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 233, in mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
  File "/workspace/mlc-llm/cpp/serve/sampler/cpu_sampler.cc", line 344, in mlc::llm::serve::CPUSampler::BatchRenormalizeProbsByTopP(tvm::runtime::NDArray, std::vector<int, std::allocator<int> > const&, tvm::runtime::Array<tvm::runtime::String, void> const&, tvm::runtime::Array<mlc::llm::serve::GenerationConfig, void> const&)
  File "/workspace/mlc-llm/cpp/serve/sampler/cpu_sampler.cc", line 560, in mlc::llm::serve::CPUSampler::CopyProbsToCPU(tvm::runtime::NDArray)
  File "/workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/ndarray.h", line 405, in tvm::runtime::NDArray::CopyFrom(tvm::runtime::NDArray const&)
tvm.error.InternalError: Traceback (most recent call last):
  10: mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
        at /workspace/mlc-llm/cpp/serve/threaded_engine.cc:168
  9: mlc::llm::serve::EngineImpl::Step()
        at /workspace/mlc-llm/cpp/serve/engine.cc:328
  8: mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:233
  7: mlc::llm::serve::CPUSampler::BatchRenormalizeProbsByTopP(tvm::runtime::NDArray, std::vector<int, std::allocator<int> > const&, tvm::runtime::Array<tvm::runtime::String, void> const&, tvm::runtime::Array<mlc::llm::serve::GenerationConfig, void> const&)
        at /workspace/mlc-llm/cpp/serve/sampler/cpu_sampler.cc:344
  6: mlc::llm::serve::CPUSampler::CopyProbsToCPU(tvm::runtime::NDArray)
        at /workspace/mlc-llm/cpp/serve/sampler/cpu_sampler.cc:560
  5: tvm::runtime::NDArray::CopyFrom(tvm::runtime::NDArray const&)
        at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/ndarray.h:405
  4: tvm::runtime::NDArray::CopyFromTo(DLTensor const*, DLTensor*, void*)
  3: tvm::runtime::DeviceAPI::CopyDataFromTo(DLTensor*, DLTensor*, void*)
  2: tvm::runtime::vulkan::VulkanDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLDevice, DLDevice, DLDataType, void*)
  1: tvm::runtime::vulkan::VulkanStream::Synchronize()
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/vulkan/vulkan_stream.cc", line 155
InternalError: Check failed: (res == VK_SUCCESS) is false: Vulkan Error, code=-4: VK_ERROR_DEVICE_LOST
^CTraceback (most recent call last):
  File "/home/username/mlc/./mlc.py", line 8, in <module>
    for response in engine.chat.completions.create(
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/serve/engine.py", line 1735, in _handle_chat_completion
    for delta_outputs in self._generate(prompts, generation_cfg, request_id):  # type: ignore
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/serve/engine.py", line 1858, in _generate
    delta_outputs = self.state.sync_output_queue.get()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/queue.py", line 171, in get
    self.not_empty.wait()
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py", line 355, in wait
    waiter.acquire()
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py'>
Traceback (most recent call last):
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py", line 1622, in _shutdown
    lock.acquire()
KeyboardInterrupt: 

@tqchen
Contributor

tqchen commented May 12, 2024

Thank you. Do you also mind commenting on the GPU you have and its VRAM size?

@aaaaaad333
Author

I don't have a discrete GPU; I'm using the Celeron N5105 with its integrated Intel UHD Graphics (24 EU, mobile), which has no dedicated VRAM of its own.

lspci -v | grep VGA -A 12
00:02.0 VGA compatible controller: Intel Corporation JasperLake [UHD Graphics] (rev 01) (prog-if 00 [VGA controller])
	DeviceName: Onboard - Video
	Subsystem: Intel Corporation JasperLake [UHD Graphics]
	Flags: bus master, fast devsel, latency 0, IRQ 129
	Memory at 6000000000 (64-bit, non-prefetchable) [size=16M]
	Memory at 4000000000 (64-bit, prefetchable) [size=256M]
	I/O ports at 5000 [size=64]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915
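
In case it helps, here is a small check I can run (my own sketch, using the tvm package installed in this environment; the attributes are standard TVM runtime device properties) to confirm which physical device vulkan:0 maps to:

# Sketch: query the Vulkan device that TVM/MLC selects.
import tvm

dev = tvm.vulkan(0)
print(dev.exist)        # True if the Vulkan device is usable
print(dev.device_name)  # adapter name, e.g. the JasperLake UHD Graphics shown by lspci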
