
[BUG/Help] chatglm2 first-token latency grows rapidly, multiplying with input length #664

Open
woaipichuli opened this issue Feb 8, 2024 · 0 comments


Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Verified with both chatglm2-6b and chatglm2-6b-int4: first-token latency grows rapidly, in step with input length. Going from an input length of 512 to 2048 tokens, first-token latency increases from 500 ms to 1.8 s.

Expected Behavior

The input (prefill) phase should be processed in parallel across tokens, so why does latency grow this noticeably?
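A likely explanation (my own sketch, not from this thread): prefill does run in parallel across tokens, but the *total* compute still grows with prompt length. For a standard transformer with hidden size d, the per-layer cost is roughly O(n·d²) for the linear projections plus O(n²·d) for attention, so going from n=512 to n=2048 the compute grows slightly more than 4x. Assuming ChatGLM2-6B's published shape (hidden size 4096, 28 layers; both values are assumptions here, and the FLOP constants are coarse approximations):

```python
def prefill_flops(n, d=4096, layers=28):
    """Very rough prefill FLOPs for a standard transformer.
    linear: QKV/output/MLP projections, ~8*n*d^2 per layer (approximation);
    attention: QK^T and scores@V, ~2*n^2*d per layer."""
    linear = 8 * n * d * d
    attention = 2 * n * n * d
    return layers * (linear + attention)

ratio = prefill_flops(2048) / prefill_flops(512)
print(round(ratio, 2))  # prints 4.36: compute grows a bit faster than the 4x token count
```

Under these assumptions, the observed 500 ms → 1.8 s (3.6x) for a 4x longer prompt is roughly what compute scaling predicts: parallelism removes the sequential dependency between prompt tokens, but not the total work.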

Steps To Reproduce

import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained(base_model_name_or_path, trust_remote_code=True)
# `revision=True` removed: `revision` expects a branch/tag/commit string, not a boolean
base_model = AutoModel.from_pretrained(base_model_name_or_path, trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, peft_model_id, torch_dtype=torch.float16)

text = "test text"  # renamed from `str` to avoid shadowing the built-in
pt_data = tokenizer(text, return_tensors="pt", padding=True).to('cuda')

# max_length = prompt length + 1 generates exactly one new token,
# which isolates the first-token (prefill) latency
gen_kwargs = {"max_length": pt_data["input_ids"].shape[-1] + 1,
              "num_beams": 1,
              "do_sample": False,  # greedy decoding; top_p/temperature=0 removed, they have no effect here
              "logits_processor": logits_processor}

outputs = model.generate(**pt_data, **gen_kwargs)
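To make the reported numbers reproducible, a minimal generic timing helper (a sketch in pure Python; the wrapped callable and run counts are my own choices, not from the report):

```python
import time

def median_latency(fn, n_runs=5, warmup=1):
    """Median wall-clock seconds of fn() over n_runs, after warmup calls.
    Warmup absorbs one-off costs (CUDA kernel selection, cache allocation)."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]
```

For example, `median_latency(lambda: model.generate(**pt_data, **gen_kwargs))` with the kwargs above; when timing CUDA work directly, call `torch.cuda.synchronize()` inside the callable, since kernel launches are asynchronous.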

Environment

Verified on both V100 and T4 GPUs.

Anything else?

No response
