
[Bug]: why the logits is different between 0.4.1 and 0.4.2 #4740

Open
sitabulaixizawaluduo opened this issue May 10, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@sitabulaixizawaluduo

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

from vllm import LLM, SamplingParams

# Any fixed prompt list reproduces the issue; this one is a placeholder.
prompts = ["Hello, my name is"]

sampling_params = SamplingParams(temperature=0, max_tokens=2048)
llm = LLM(model="Llama-3-8B", tensor_parallel_size=4, trust_remote_code=True)
outputs = llm.generate(prompts=prompts, sampling_params=sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(prompt + generated_text)

I ran this code with vllm 0.4.1 and 0.4.2 and printed the logits in sampler.py. The logits differ between 0.4.1 and 0.4.2 when tensor_parallel_size is 4, but they are identical when tensor_parallel_size is 2 or 1.
[Screenshots: first-token logits with TP4 (0.4.2 vs. 0.4.1) and TP2 (0.4.2 vs. 0.4.1); the TP4 logits differ between the two versions, while the TP2 logits match.]
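For anyone trying to reproduce the comparison without patching sampler.py, a minimal sketch is below. It relies on vLLM's `logprobs` sampling parameter, which returns the top-k log-probabilities for each generated token; the model path and prompt are placeholders.

from vllm import LLM, SamplingParams

# Request the top-10 log-probabilities for the first generated token so the
# raw values can be diffed across vLLM versions without modifying sampler.py.
sampling_params = SamplingParams(temperature=0, max_tokens=1, logprobs=10)
llm = LLM(model="Llama-3-8B", tensor_parallel_size=4, trust_remote_code=True)

outputs = llm.generate(prompts=["Hello, my name is"], sampling_params=sampling_params)
first_token_logprobs = outputs[0].outputs[0].logprobs[0]  # token_id -> logprob entry
for token_id, lp in first_token_logprobs.items():
    # Recent vLLM versions wrap each value in a Logprob object; older ones
    # returned a plain float, so handle both.
    print(token_id, getattr(lp, "logprob", lp))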

sitabulaixizawaluduo added the bug label on May 10, 2024
@sitabulaixizawaluduo (Author)

I use the FlashAttention backend with flash-attn==2.5.2.
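One way to rule out backend selection as the source of the difference is to pin the attention backend explicitly on both versions. This is a sketch, assuming vLLM's VLLM_ATTENTION_BACKEND environment variable (read at engine initialization); the model path is a placeholder.

import os

# Pin the attention backend before constructing the engine so both vLLM
# versions use the same implementation. Common values are "FLASH_ATTN"
# and "XFORMERS".
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM

llm = LLM(model="Llama-3-8B", tensor_parallel_size=4, trust_remote_code=True)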
