Can peft support ColumnParallelLinear? #1711

Open
2 of 4 tasks
wjn1996 opened this issue May 5, 2024 · 2 comments
Comments

wjn1996 commented May 5, 2024

System Info

I have a model whose architecture contains xxxParallel layers, which are used for tensor-parallel inference:

BaichuanForCausalLM(
  (model): BaiChuanModel(
    (embed_tokens): VocabParallelEmbedding()
    (layers): ModuleList(
      (0-31): 32 x BaiChuanDecoderLayer(
        (self_attn): BaiChuanAttention(
          (W_pack): ColumnParallelLinear()
          (o_proj): RowParallelLinear()
          (attn): PagedAttentionWithALiBi()
        )
        (mlp): BaiChuanMLP(
          (gate_up_proj): ColumnParallelLinear()
          (down_proj): RowParallelLinear()
          (act_fn): SiluAndMul()
        )
        (input_layernorm): RMSNorm()
        (post_attention_layernorm): RMSNorm()
      )
    )
    (norm): RMSNorm()
  )
  (lm_head): ColumnParallelLinear()
  (sampler): Sampler()
)

I want to load a LoRA adapter onto this model directly with PEFT, but it throws an error:

ValueError: Target module ColumnParallelLinear() is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.
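For reference, the standard PEFT attach path that hits this check looks like the sketch below (r/alpha values are arbitrary; the target module names are taken from the architecture printed above, and model stands for the loaded base model). It works when W_pack and o_proj are plain torch.nn.Linear layers, and raises the ValueError above when they are vLLM ColumnParallelLinear layers:

from peft import LoraConfig, get_peft_model

# Minimal sketch: attach LoRA to the attention projections.
# Fails with the ValueError above when these modules are
# ColumnParallelLinear / RowParallelLinear instead of nn.Linear.
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["W_pack", "o_proj"])
peft_model = get_peft_model(model, lora_config)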

So, how can I apply the LoRA adapter to this model without changing its architecture?
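For illustration, here is a minimal sketch of the kind of wrapper PEFT would need internally to support such layers: a module that adds a LoRA delta around a Linear-like layer. The class and argument names are hypothetical, and it assumes the wrapped layer stores an (out_features, in_features) weight for its local shard, as vLLM's parallel linear layers do; under real tensor parallelism (tp_size > 1) the lora_B weight would additionally have to be sharded to match the base layer's output shard:

import math
import torch
import torch.nn as nn

class LoRAParallelLinear(nn.Module):
    """Hypothetical sketch: LoRA delta around a Linear-like layer,
    e.g. vLLM's ColumnParallelLinear."""
    def __init__(self, base_layer: nn.Module, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base_layer = base_layer
        out_features, in_features = base_layer.weight.shape  # local shard shape
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        self.scaling = alpha / r
        nn.init.kaiming_uniform_(self.lora_A.weight, a=math.sqrt(5))
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op

    def forward(self, x: torch.Tensor):
        out = self.base_layer(x)
        delta = self.lora_B(self.lora_A(x)) * self.scaling
        if isinstance(out, tuple):  # vLLM layers may return (output, bias)
            return out[0] + delta, out[1]
        return out + delta

# Usage sketch: swap the wrapper into the model in place, e.g.
#   attn = model.model.layers[0].self_attn
#   attn.W_pack = LoRAParallelLinear(attn.W_pack)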

Who can help?

@pacman100 @younesbelkada @BenjaminBossan @sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

# LLM and SamplingParams
# pip install vllm==0.2.1 (CUDA 11.8)
from vllm import LLM, SamplingParams
from peft import PeftModel

# Load the LoRA adapter onto the base model with PEFT
def load_peft_model(model, peft_model_path):
    peft_model = PeftModel.from_pretrained(model, peft_model_path)
    return peft_model

prompts = [
    "xxx",
]

sampling_params = SamplingParams(temperature=1.0, top_p=0.9)

model_name = "baichuan2-7b-base"
origin_model_path = "xxx/pre-trained-lm/{}".format(model_name)
saved_model_path = "xxx/v2/{}/checkpoint-8000".format(model_name) # lora path
save_answer_path = "xxx/{}".format(model_name)

llm = LLM(model=origin_model_path, trust_remote_code=True)

# Reach into vLLM internals: grab the underlying PyTorch model from the
# first worker, wrap it with the LoRA adapter, and put it back.
model = llm.llm_engine.workers[0].model
model = load_peft_model(model, saved_model_path)
llm.llm_engine.workers[0].model = model


outputs = llm.generate(
    prompts,
    sampling_params,
    # lora_request=LoRARequest("headline-lora", 1, saved_model_path)
)


for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Expected behavior

PEFT should support wrapping vLLM's tensor-parallel layers (ColumnParallelLinear, RowParallelLinear) with LoRA, so that the adapter loads onto this model without errors.

github-actions bot commented Jun 4, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
