from vllm import LLM, SamplingParams

# Placeholder prompts (the original report does not show them)
prompts = ["Hello, my name is"]

sampling_params = SamplingParams(temperature=0, max_tokens=2048)
llm = LLM(model="Llama-3-8B", tensor_parallel_size=4, trust_remote_code=True)
outputs = llm.generate(prompts=prompts, sampling_params=sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(prompt + generated_text)
Your current environment
🐛 Describe the bug
I ran this code with vLLM 0.4.1 and 0.4.2 and printed the logits in sampler.py. The logits differ between 0.4.1 and 0.4.2 when tensor_parallel_size is 4, but they are identical when tensor_parallel_size is 2 or 1.
TP4 first token logits:
0.4.2
0.4.1
TP2 first token logits:
0.4.2
0.4.1
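Small logit differences across tensor-parallel degrees or versions often come from a changed floating-point reduction order in the all-reduce, which is harmless under greedy decoding unless the argmax flips. One way to check whether the TP4 difference is benign noise or an output-changing regression is to dump the first-token logits from both versions and compare them. A minimal sketch, assuming the logits were saved as NumPy arrays (the helper name and tolerance are illustrative, not part of vLLM):

```python
import numpy as np

def compare_logits(a: np.ndarray, b: np.ndarray, atol: float = 1e-3) -> dict:
    """Compare two logit vectors dumped from different vLLM versions."""
    diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
    return {
        "max_abs_diff": float(diff.max()),        # largest elementwise drift
        "argmax_changed": bool(a.argmax() != b.argmax()),  # greedy token flips?
        "allclose": bool(np.allclose(a, b, atol=atol)),
    }

# Tiny synthetic example: same argmax, small elementwise drift,
# the pattern one would expect from a different summation order.
logits_041 = np.array([1.00, 2.00, 0.50])
logits_042 = np.array([1.0002, 1.9998, 0.5001])
report = compare_logits(logits_041, logits_042)
print(report)
```

If `argmax_changed` is False and `max_abs_diff` stays near float16/bfloat16 rounding scale, the divergence is likely numerical rather than a correctness bug.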