
[Bug]: Not able to do lora inference with phi-3 #4715

Closed

WeiXiaoSummer opened this issue May 9, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@WeiXiaoSummer

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

The following error appears when trying to do LoRA inference with Phi-3 using the newest vLLM version:

Exception while reading stream response: Loading lora data/loras/jt_snc_dpo failed

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 150, in _load_lora
    lora = self._lora_model_cls.from_local_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/models.py", line 225, in from_local_checkpoint
    raise ValueError(
ValueError: While loading data/loras/jt_snc_dpo, expected target modules in ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj', 'embed_tokens', 'lm_head'] but received ['gate_up_proj', 'qkv_proj']. Please verify that the loaded LoRA module is correct

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/model_wrapper.py", line 269, in write_response_to_queue
    async for chunk in generator:
  File "/app/model/model.py", line 50, in generator
    async for output in vllm_generator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 666, in generate
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 660, in generate
    async for request_output in stream:
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 77, in __anext__
    raise result
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 501, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 475, in engine_step
    request_outputs = await self.engine.step_async()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 221, in step_async
    output = await self.model_executor.execute_model_async(
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 148, in execute_model_async
    output = await make_async(self.driver_worker.execute_model
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 249, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 790, in execute_model
    self.set_active_loras(lora_requests, lora_mapping)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 901, in set_active_loras
    self.lora_manager.set_active_loras(lora_requests, lora_mapping)
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 113, in set_active_loras
    self._apply_loras(lora_requests)
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 235, in _apply_loras
    self.add_lora(lora)
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 243, in add_lora
    lora = self._load_lora(lora_request)
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 162, in _load_lora
    raise RuntimeError(
RuntimeError: Loading lora data/loras/jt_snc_dpo failed

Below is the config file of the adapter:

{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "microsoft/Phi-3-mini-128k-instruct",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 64,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "gate_up_proj",
    "down_proj",
    "qkv_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
WeiXiaoSummer added the bug label on May 9, 2024
@Raibows commented May 13, 2024

The reason is that vLLM treats Phi-3 as a Llama-style architecture, i.e., it splits the merged qkv_proj into separate q_proj, k_proj, and v_proj modules (and likewise gate_up_proj into gate_proj and up_proj). A simple workaround is to convert the tensor weights of your adapter/LoRA checkpoint to match.

Here is a tested script in the gist; feel free to use it.
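For reference, a minimal sketch of the same conversion is below. It is not the gist script itself: it assumes the adapter is stored as adapter_model.safetensors, uses the Phi-3-mini dimensions (3072 output features each for q/k/v, 8192 each for gate/up), and the paths and output directory names are placeholders, so adjust everything for your own checkpoint.

```python
# Minimal sketch: split a PEFT LoRA adapter trained against Phi-3's merged
# qkv_proj / gate_up_proj modules into the per-projection modules vLLM expects.
# Assumptions: safetensors adapter file, Phi-3-mini sizes (3072 per q/k/v,
# 8192 per gate/up), hypothetical output path.
import json
import os

from safetensors.torch import load_file, save_file

SRC = "data/loras/jt_snc_dpo"          # original adapter directory (from the error above)
DST = "data/loras/jt_snc_dpo_split"    # hypothetical output directory

# out_features of each slice of the merged projections, in concatenation order
SPLITS = {
    "qkv_proj": [("q_proj", 3072), ("k_proj", 3072), ("v_proj", 3072)],
    "gate_up_proj": [("gate_proj", 8192), ("up_proj", 8192)],
}

state = load_file(os.path.join(SRC, "adapter_model.safetensors"))
new_state = {}

for key, tensor in state.items():
    merged = next((m for m in SPLITS if f".{m}." in key), None)
    if merged is None:
        new_state[key] = tensor
    elif ".lora_A." in key:
        # lora_A has shape (r, in_features) and is shared: copy it to each split module.
        for name, _ in SPLITS[merged]:
            new_state[key.replace(merged, name)] = tensor.clone()
    else:
        # lora_B has shape (out_features, r): slice it along dim 0.
        offset = 0
        for name, size in SPLITS[merged]:
            new_state[key.replace(merged, name)] = tensor[offset:offset + size].clone()
            offset += size

os.makedirs(DST, exist_ok=True)
save_file(new_state, os.path.join(DST, "adapter_model.safetensors"))

# Point target_modules at the split names so vLLM's check passes.
with open(os.path.join(SRC, "adapter_config.json")) as f:
    cfg = json.load(f)
cfg["target_modules"] = ["q_proj", "k_proj", "v_proj", "o_proj",
                         "gate_proj", "up_proj", "down_proj"]
with open(os.path.join(DST, "adapter_config.json"), "w") as f:
    json.dump(cfg, f, indent=2)
```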

@SHIMURA0

@Raibows thanks for your helpful Python script! May I ask another question? I want to use Ollama with a fine-tuned Phi-3 model (trained with QLoRA). I have successfully converted the LoRA weights into a GGML file (using llama.cpp), but I think I need to merge the qkv_proj layer weights back so that I can use it with Ollama, because right now I get the error "Error: llama runner process has terminated: signal: abort trap error: failed to apply lora adapter". I would be grateful for any suggestions!
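A heavily hedged sketch of that reverse direction is below: it merges split q/k/v LoRA tensors back into a single qkv_proj, and it only makes sense if the three modules share the same lora_A matrix (as they do when the split adapter was itself produced by splitting a merged one). If q_proj, k_proj, and v_proj were trained as independent LoRA modules with different lora_A weights, a plain concatenation like this is not equivalent, and whether llama.cpp/Ollama actually expect the merged module names for Phi-3 is something to verify separately.

```python
# Sketch of merging split q/k/v LoRA tensors back into qkv_proj.
# Assumes q/k/v share the same lora_A (true for adapters produced by the
# splitting sketch above); paths are hypothetical.
import torch
from safetensors.torch import load_file, save_file

state = load_file("adapter_model.safetensors")
merged = {}

for key, tensor in state.items():
    if ".q_proj.lora_A." in key:
        # The shared lora_A becomes qkv_proj's lora_A.
        merged[key.replace("q_proj", "qkv_proj")] = tensor.clone()
    elif ".q_proj.lora_B." in key:
        # Stack the three lora_B slices back together in q, k, v order.
        k_b = state[key.replace("q_proj", "k_proj")]
        v_b = state[key.replace("q_proj", "v_proj")]
        merged[key.replace("q_proj", "qkv_proj")] = torch.cat([tensor, k_b, v_b], dim=0)
    elif ".k_proj." in key or ".v_proj." in key:
        continue  # already folded into qkv_proj above
    else:
        merged[key] = tensor

save_file(merged, "adapter_model_merged.safetensors")
```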

@WeiXiaoSummer (Author)

@Raibows thanks for the script! It worked like a charm!!!

@arunpatala

ERROR 05-20 08:02:25 async_llm_engine.py:43] ValueError: While loading /data/llm_resume_profiles_phi3_v1_split, expected target modules in ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj', 'embed_tokens', 'lm_head'] but received ['gate_up_proj']. Please verify that the loaded LoRA module is correct

Can we also fix gate_up_proj in a similar way? I am using the Phi-3 128k version.
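gate_up_proj follows the same recipe as qkv_proj: lora_A is shared by gate_proj and up_proj, and lora_B is split along dim 0. A minimal illustration is below; the function name and the default size are hypothetical (8192 is the intermediate_size of Phi-3-mini-128k, so adjust it for other variants).

```python
import torch

def split_gate_up(lora_a: torch.Tensor, lora_b: torch.Tensor, ffn_dim: int = 8192):
    """Split a merged gate_up_proj LoRA pair into (gate_proj, up_proj) pairs."""
    gate_b, up_b = lora_b[:ffn_dim], lora_b[ffn_dim:]
    # lora_A (shape r x hidden) is copied to both split modules.
    return (lora_a.clone(), gate_b.clone()), (lora_a.clone(), up_b.clone())
```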
