
[Bug]: Not able to do lora inference with phi-3 #4715

Closed

WeiXiaoSummer opened this issue May 9, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@WeiXiaoSummer

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

The following error appears when trying to do LoRA inference with Phi-3 using the newest vLLM version:

Exception while reading stream response: Loading lora data/loras/jt_snc_dpo failed

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 150, in _load_lora
    lora = self._lora_model_cls.from_local_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/models.py", line 225, in from_local_checkpoint
    raise ValueError(
ValueError: While loading data/loras/jt_snc_dpo, expected target modules in ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj', 'embed_tokens', 'lm_head'] but received ['gate_up_proj', 'qkv_proj']. Please verify that the loaded LoRA module is correct

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/model_wrapper.py", line 269, in write_response_to_queue
    async for chunk in generator:
  File "/app/model/model.py", line 50, in generator
    async for output in vllm_generator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 666, in generate
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 660, in generate
    async for request_output in stream:
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 77, in __anext__
    raise result
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 501, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 475, in engine_step
    request_outputs = await self.engine.step_async()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 221, in step_async
    output = await self.model_executor.execute_model_async(
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 148, in execute_model_async
    output = await make_async(self.driver_worker.execute_model
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 249, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 790, in execute_model
    self.set_active_loras(lora_requests, lora_mapping)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 901, in set_active_loras
    self.lora_manager.set_active_loras(lora_requests, lora_mapping)
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 113, in set_active_loras
    self._apply_loras(lora_requests)
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 235, in _apply_loras
    self.add_lora(lora)
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 243, in add_lora
    lora = self._load_lora(lora_request)
  File "/usr/local/lib/python3.10/dist-packages/vllm/lora/worker_manager.py", line 162, in _load_lora
    raise RuntimeError(
RuntimeError: Loading lora data/loras/jt_snc_dpo failed

Below is the config file of the adapter:

{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "microsoft/Phi-3-mini-128k-instruct",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 64,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "gate_up_proj",
    "down_proj",
    "qkv_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
WeiXiaoSummer added the bug label on May 9, 2024
@Raibows commented May 13, 2024

The reason is that vLLM treats Phi-3 as a Llama-style architecture, i.e., it splits the merged qkv_proj into separate q_proj, k_proj, and v_proj modules (and likewise gate_up_proj into gate_proj and up_proj). A simple workaround is to convert the tensor weights of your adapter/LoRA checkpoint to match.

Here is a tested script in the gist; feel free to use it.
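For reference, a minimal sketch of the same conversion is below. It is not the gist script itself: it assumes the adapter is stored as adapter_model.safetensors, uses the Phi-3-mini dimensions (3072 output features each for q/k/v, 8192 each for gate/up), and the paths and output directory names are placeholders, so adjust everything for your own checkpoint.

```python
# Minimal sketch: split a PEFT LoRA adapter trained against Phi-3's merged
# qkv_proj / gate_up_proj modules into the per-projection modules vLLM expects.
# Assumptions: safetensors adapter file, Phi-3-mini sizes (3072 per q/k/v,
# 8192 per gate/up), hypothetical output path.
import json
import os

from safetensors.torch import load_file, save_file

SRC = "data/loras/jt_snc_dpo"          # original adapter directory (from the error above)
DST = "data/loras/jt_snc_dpo_split"    # hypothetical output directory

# out_features of each slice of the merged projections, in concatenation order
SPLITS = {
    "qkv_proj": [("q_proj", 3072), ("k_proj", 3072), ("v_proj", 3072)],
    "gate_up_proj": [("gate_proj", 8192), ("up_proj", 8192)],
}

state = load_file(os.path.join(SRC, "adapter_model.safetensors"))
new_state = {}

for key, tensor in state.items():
    merged = next((m for m in SPLITS if f".{m}." in key), None)
    if merged is None:
        new_state[key] = tensor
    elif ".lora_A." in key:
        # lora_A has shape (r, in_features) and is shared: copy it to each split module.
        for name, _ in SPLITS[merged]:
            new_state[key.replace(merged, name)] = tensor.clone()
    else:
        # lora_B has shape (out_features, r): slice it along dim 0.
        offset = 0
        for name, size in SPLITS[merged]:
            new_state[key.replace(merged, name)] = tensor[offset:offset + size].clone()
            offset += size

os.makedirs(DST, exist_ok=True)
save_file(new_state, os.path.join(DST, "adapter_model.safetensors"))

# Point target_modules at the split names so vLLM's check passes.
with open(os.path.join(SRC, "adapter_config.json")) as f:
    cfg = json.load(f)
cfg["target_modules"] = ["q_proj", "k_proj", "v_proj", "o_proj",
                         "gate_proj", "up_proj", "down_proj"]
with open(os.path.join(DST, "adapter_config.json"), "w") as f:
    json.dump(cfg, f, indent=2)
```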

@SHIMURA0

@Raibows thanks for your helpful Python script! May I ask another question? I want to use Ollama with a fine-tuned Phi-3 model (trained with QLoRA). I have successfully converted the LoRA weights into a GGML file (using llama.cpp), but I think I need to merge the qkv_proj layer weights back so that I can use it with Ollama, because right now I get the error "Error: llama runner process has terminated: signal: abort trap error: failed to apply lora adapter". I would be grateful for any suggestions!
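A heavily hedged sketch of that reverse direction is below: it merges split q/k/v LoRA tensors back into a single qkv_proj, and it only makes sense if the three modules share the same lora_A matrix (as they do when the split adapter was itself produced by splitting a merged one). If q_proj, k_proj, and v_proj were trained as independent LoRA modules with different lora_A weights, a plain concatenation like this is not equivalent, and whether llama.cpp/Ollama actually expect the merged module names for Phi-3 is something to verify separately.

```python
# Sketch of merging split q/k/v LoRA tensors back into qkv_proj.
# Assumes q/k/v share the same lora_A (true for adapters produced by the
# splitting sketch above); paths are hypothetical.
import torch
from safetensors.torch import load_file, save_file

state = load_file("adapter_model.safetensors")
merged = {}

for key, tensor in state.items():
    if ".q_proj.lora_A." in key:
        # The shared lora_A becomes qkv_proj's lora_A.
        merged[key.replace("q_proj", "qkv_proj")] = tensor.clone()
    elif ".q_proj.lora_B." in key:
        # Stack the three lora_B slices back together in q, k, v order.
        k_b = state[key.replace("q_proj", "k_proj")]
        v_b = state[key.replace("q_proj", "v_proj")]
        merged[key.replace("q_proj", "qkv_proj")] = torch.cat([tensor, k_b, v_b], dim=0)
    elif ".k_proj." in key or ".v_proj." in key:
        continue  # already folded into qkv_proj above
    else:
        merged[key] = tensor

save_file(merged, "adapter_model_merged.safetensors")
```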

@WeiXiaoSummer (Author)

@Raibows thanks for the script! It worked like a charm!!!

@arunpatala

ERROR 05-20 08:02:25 async_llm_engine.py:43] ValueError: While loading /data/llm_resume_profiles_phi3_v1_split, expected target modules in ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj', 'embed_tokens', 'lm_head'] but received ['gate_up_proj']. Please verify that the loaded LoRA module is correct

Can we also fix gate_up_proj in a similar way? I am using the Phi-3 128k version.
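gate_up_proj follows the same recipe as qkv_proj: lora_A is shared by gate_proj and up_proj, and lora_B is split along dim 0. A minimal illustration is below; the function name and the default size are hypothetical (8192 is the intermediate_size of Phi-3-mini-128k, so adjust it for other variants).

```python
import torch

def split_gate_up(lora_a: torch.Tensor, lora_b: torch.Tensor, ffn_dim: int = 8192):
    """Split a merged gate_up_proj LoRA pair into (gate_proj, up_proj) pairs."""
    gate_b, up_b = lora_b[:ffn_dim], lora_b[ffn_dim:]
    # lora_A (shape r x hidden) is copied to both split modules.
    return (lora_a.clone(), gate_b.clone()), (lora_a.clone(), up_b.clone())
```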
