The issue was reproduced on two GPUs: V100 and T4.
Is there an existing issue for this?
Current Behavior
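First-token latency on GPU grows markedly with input length, roughly linearly. Going from 512 to 2048 input tokens, first-token latency increases nearly 4x, from about 500 ms to 1.8 s.
The input portion should be computed in parallel, so why does the latency increase this much?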
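One possible way to reason about this is a back-of-the-envelope FLOP count. Processing the prompt in parallel does not reduce the total amount of arithmetic; once the GPU is saturated, latency tends to follow the work done, which grows at least linearly with the number of input tokens. The sketch below is only an estimate under stated assumptions: the model dimensions (`n_params`, `n_layers`, `hidden`) are hypothetical placeholders, not values taken from this report.

```python
def prefill_flops(n_tokens, n_params=6e9, n_layers=28, hidden=4096):
    """Rough forward-pass FLOPs for processing n_tokens prompt tokens in one pass.

    All default dimensions are hypothetical; substitute the real model's values.
    """
    # Weight matmuls: about 2 FLOPs per parameter per token (linear in n_tokens).
    dense = 2.0 * n_params * n_tokens
    # Attention scores (QK^T) and weighted sum (AV): about 2 * n^2 * hidden each,
    # per layer (quadratic in n_tokens).
    attention = 4.0 * n_layers * hidden * n_tokens ** 2
    return dense + attention


short = prefill_flops(512)
long = prefill_flops(2048)
ratio = long / short
print(f"estimated work ratio 2048 vs 512 tokens: {ratio:.2f}x")
```

Under these assumptions the linear term dominates, so 4x the input tokens means at least 4x the work. A near-4x increase in first-token latency would then be consistent with a compute-bound prefill rather than with a serialization problem.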
Expected Behavior
No response
Steps To Reproduce
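import torch
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

# base_model_name_or_path, peft_model_id, and logits_processor are defined elsewhere by the reporter.
tokenizer = AutoTokenizer.from_pretrained(base_model_name_or_path, trust_remote_code=True)
# revision=True is kept as reported; `revision` normally takes a branch, tag, or commit string.
base_model = AutoModel.from_pretrained(base_model_name_or_path, trust_remote_code=True, revision=True)
model = PeftModel.from_pretrained(base_model, peft_model_id, torch_dtype=torch.float16)
model.cuda()

text = "测试文本"  # "test text"; renamed from `str` to avoid shadowing the builtin
pt_data = tokenizer(text, return_tensors="pt", padding=True).to("cuda")
# max_length is prompt length + 1, so generate() produces exactly one new token (the first token).
gen_kwargs = {"max_length": pt_data["input_ids"].shape[-1] + 1, "num_beams": 1, "do_sample": False,
              "top_p": 0.8,
              "temperature": 0, "logits_processor": logits_processor}
outputs = model.generate(**pt_data, **gen_kwargs)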
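When comparing latencies across input lengths, it may help to time the call with explicit synchronization, since CUDA kernels are launched asynchronously and an unsynchronized timer can misreport the cost. The helper below is a minimal sketch; `time_call` and its parameters are hypothetical names introduced here for illustration and are not part of the original report.

```python
import time


def time_call(fn, sync=None, warmup=1, repeats=3):
    """Return the best wall-clock time in seconds of fn() over `repeats` runs.

    `sync` is an optional zero-argument callable that blocks until queued
    device work has finished (for example torch.cuda.synchronize).
    """
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time costs such as lazy initialization
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain previously queued work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this run's work to finish before stopping the clock
        best = min(best, time.perf_counter() - start)
    return best
```

With the reproduction above, a call such as `time_call(lambda: model.generate(**pt_data, **gen_kwargs), sync=torch.cuda.synchronize)` could be repeated for prompts of 512 and 2048 tokens to compare first-token latency.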
Environment
Anything else?
No response