[Bug]: Not able to do LoRA inference with Phi-3 #4715
Comments
The reason is that the vLLM project treats Phi-3 as the Llama architecture, i.e., it expects the merged qkv_proj layer to be split into separate q_proj, k_proj, and v_proj modules. Here is a tested script in the gist that does the split. Feel free to use it.
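Since the gist itself is not reproduced in this thread, here is a minimal sketch of the splitting idea for readers without access to it. It assumes a PEFT adapter saved as `adapter_model.safetensors`, standard PEFT key names, and Phi-3-mini projection widths (q, k, and v each 3072 rows wide); the file names and sizes are assumptions, so adjust them for other variants.

```python
import json
import torch
from safetensors.torch import load_file, save_file

# Assumed Phi-3-mini projection widths; adjust for other Phi-3 variants.
Q_SIZE, K_SIZE, V_SIZE = 3072, 3072, 3072

def split_qkv(adapter_dir: str, out_dir: str) -> None:
    weights = load_file(f"{adapter_dir}/adapter_model.safetensors")
    new_weights = {}
    for key, tensor in weights.items():
        if "qkv_proj" not in key:
            new_weights[key] = tensor
            continue
        if "lora_A" in key:
            # lora_A (r x hidden) feeds all three projections from the same
            # input, so it is simply copied to each split module.
            for name in ("q_proj", "k_proj", "v_proj"):
                new_weights[key.replace("qkv_proj", name)] = tensor.clone()
        else:
            # lora_B ((q+k+v) x r): its output rows are the concatenated
            # q, k, v projections, so split it row-wise.
            q, k, v = torch.split(tensor, [Q_SIZE, K_SIZE, V_SIZE], dim=0)
            new_weights[key.replace("qkv_proj", "q_proj")] = q.contiguous()
            new_weights[key.replace("qkv_proj", "k_proj")] = k.contiguous()
            new_weights[key.replace("qkv_proj", "v_proj")] = v.contiguous()
    save_file(new_weights, f"{out_dir}/adapter_model.safetensors")

    # Point target_modules at the split names so vLLM accepts the adapter.
    with open(f"{adapter_dir}/adapter_config.json") as f:
        config = json.load(f)
    modules = [m for m in config["target_modules"] if m != "qkv_proj"]
    config["target_modules"] = modules + ["q_proj", "k_proj", "v_proj"]
    with open(f"{out_dir}/adapter_config.json", "w") as f:
        json.dump(config, f, indent=2)
```

The split is exact: because the merged layer computes all three projections from the same input, lora_A can be shared as-is, while lora_B partitions cleanly along its output rows.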
@Raibows thanks for your helpful Python script! May I ask another question? I want to use Ollama with a fine-tuned Phi-3 model (trained with QLoRA), and I have successfully converted the LoRA weights to a GGML file (using llama.cpp), but I think I need to merge the qkv_proj layer weights back so that I can use it with Ollama, because right now I just get the error "Error: llama runner process has terminated: signal: abort trap error:failed to apply lora adapter". I would be grateful for any suggestions!
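No definitive answer appears in the thread, but if the per-module weights were produced by a split like the one above (so q_proj, k_proj, and v_proj share the same lora_A), the inverse is just a row-wise concatenation of the lora_B blocks. A hedged sketch, under the same file-layout assumptions as above:

```python
import torch
from safetensors.torch import load_file, save_file

def merge_qkv(adapter_dir: str, out_dir: str) -> None:
    weights = load_file(f"{adapter_dir}/adapter_model.safetensors")
    new_weights = {}
    for key, tensor in weights.items():
        if "q_proj" in key and "lora_A" in key:
            # Shared lora_A: keep a single copy under the merged name.
            new_weights[key.replace("q_proj", "qkv_proj")] = tensor
        elif "q_proj" in key and "lora_B" in key:
            # Stack the three lora_B blocks row-wise in q, k, v order.
            k_t = weights[key.replace("q_proj", "k_proj")]
            v_t = weights[key.replace("q_proj", "v_proj")]
            new_weights[key.replace("q_proj", "qkv_proj")] = torch.cat(
                [tensor, k_t, v_t], dim=0
            )
        elif "k_proj" in key or "v_proj" in key:
            continue  # already folded into the qkv_proj entry above
        else:
            new_weights[key] = tensor
    save_file(new_weights, f"{out_dir}/adapter_model.safetensors")
```

Remember to set target_modules back to include "qkv_proj" in adapter_config.json. Note this only works when the three modules share one lora_A; independently trained q/k/v adapters cannot be merged this way without expanding the rank.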
@Raibows thanks for the script! It worked like a charm!!!
ERROR 05-20 08:02:25 async_llm_engine.py:43] ValueError: While loading /data/llm_resume_profiles_phi3_v1_split, expected target modules in ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj', 'embed_tokens', 'lm_head'] but received ['gate_up_proj']. Please verify that the loaded LoRA module is correct
Can we also fix gate_up_proj in a similar way? I am using the phi3-128k version.
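The same row-splitting trick applies to the merged MLP layer: lora_A is duplicated for gate_proj and up_proj, and lora_B is split in half along its output rows (the gate and up halves are equal-width in Phi-3). A sketch under the same assumptions as the qkv script above:

```python
import torch
from safetensors.torch import load_file, save_file

def split_gate_up(adapter_dir: str, out_dir: str) -> None:
    weights = load_file(f"{adapter_dir}/adapter_model.safetensors")
    new_weights = {}
    for key, tensor in weights.items():
        if "gate_up_proj" not in key:
            new_weights[key] = tensor
            continue
        if "lora_A" in key:
            # Shared input projection: duplicate for both split modules.
            new_weights[key.replace("gate_up_proj", "gate_proj")] = tensor.clone()
            new_weights[key.replace("gate_up_proj", "up_proj")] = tensor.clone()
        else:
            # lora_B rows are [gate; up] stacked, so split them in half.
            gate, up = torch.chunk(tensor, 2, dim=0)
            new_weights[key.replace("gate_up_proj", "gate_proj")] = gate.contiguous()
            new_weights[key.replace("gate_up_proj", "up_proj")] = up.contiguous()
    save_file(new_weights, f"{out_dir}/adapter_model.safetensors")
    # Also replace "gate_up_proj" with "gate_proj" and "up_proj" in the
    # target_modules list of adapter_config.json, as in the qkv script.
```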
Your current environment
🐛 Describe the bug
The following error appeared when trying to do LoRA inference with Phi-3 using the latest vLLM version:
Below is the config file of the adapter: