Issues: vllm-project/vllm
#4770 · [Feature]: CI: Test on NVLink-enabled machine · feature request · opened May 12, 2024 by youkaichao
#4766 · [Feature]: Could paged_attention_v1 support an 'attn_bias' parameter? · feature request · opened May 11, 2024 by cillinzhang
#4763 · [Feature]: Support W4A8KV4 quantization (QServe/QoQ) · feature request · opened May 11, 2024 by bratao
#4760 · [Performance]: Why is the average generation throughput low? · performance · opened May 11, 2024 by rvsh2
#4756 · [Bug]: CUDA error when running mistral-7b + LoRA with tensor_para=8 · bug · opened May 11, 2024 by sfc-gh-zhwang
#4755 · Regression in support of customized "role" in OpenAI-compatible API (v0.4.2) · good first issue · opened May 10, 2024 by simon-mo
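For context on #4755: the report is that chat messages with roles outside the standard set, which the server accepted before v0.4.2, are now rejected. A minimal reproduction sketch, assuming a vLLM OpenAI-compatible server on localhost:8000; the model ID and the custom "reviewer" role are placeholders, not taken from the issue itself.

```python
# Reproduction sketch for #4755 (assumed setup, not the reporter's exact one).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
        "messages": [
            {"role": "user", "content": "Hello!"},
            # A non-standard role: reportedly accepted before v0.4.2,
            # rejected by the request validation in v0.4.2.
            {"role": "reviewer", "content": "Check the answer for accuracy."},
        ],
    },
)
print(resp.status_code, resp.text)
```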
#4744 · [Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize the GPUs · usage · opened May 10, 2024 by danielstankw
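A sketch of the configuration #4744 describes, using vLLM's offline LLM entrypoint; the AWQ checkpoint name is a placeholder for whichever model the reporter used.

```python
# Setup sketch for #4744: an AWQ-quantized model sharded over 4 GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # placeholder AWQ checkpoint
    quantization="awq",
    tensor_parallel_size=4,  # shard across the 4 GPUs reported as idle
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```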
#4743 · [RFC]: Support specifying quant_config details in the LLM or Server entrypoints · feature request, RFC · opened May 10, 2024 by mgoin
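Today the entrypoints expose quantization only as a single string; the RFC asks for finer-grained control. A sketch of the contrast below; the quant_config keyword is hypothetical and does not exist in vLLM.

```python
# What exists today vs. the kind of interface #4743 proposes.
from vllm import LLM

# Current API: quantization is a bare method-name string.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")  # placeholder model

# Hypothetical per the RFC (NOT an existing parameter): pass method details.
# llm = LLM(model="...", quant_config={"method": "awq", "group_size": 128})
```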
#4742 · [Bug]: ValueError when using LoRA with CohereForCausalLM model · bug · opened May 10, 2024 by onlyfish79
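For reference, the usual way a LoRA adapter is attached in vLLM (a sketch; the adapter name and path are placeholders, and whether CohereForCausalLM supports this path is exactly what #4742 questions).

```python
# Typical vLLM LoRA usage; the reported ValueError occurs when the base
# model is CohereForCausalLM.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="CohereForAI/c4ai-command-r-v01", enable_lora=True)
outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=48),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora"),  # placeholders
)
```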
#4741 · [Bug]: SqueezeLLM with sparsity does not work · bug · opened May 10, 2024 by RyanWMHI
#4740 · [Bug]: Why are the logits different between 0.4.1 and 0.4.2? · bug · opened May 10, 2024 by sitabulaixizawaluduo
#4739 · [New Model]: BLIP-2 support required · new model · opened May 10, 2024 by anisingh1
#4736 · [New Model]: fastspeech2_conformer (just needs a new attention mechanism: RelPositionMultiHeadedAttention) · new model · opened May 10, 2024 by cillinzhang
#4731 · [Bug]: The paged_attention version used is inconsistent between enforce_eager=True and enforce_eager=False · bug · opened May 10, 2024 by liangxuegang
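The flag in question toggles CUDA graph capture; a minimal sketch of the two configurations being compared in #4731, with a small placeholder model.

```python
# enforce_eager=True disables CUDA graph capture; #4731 reports that the
# paged_attention kernel version selected then differs from the default
# (enforce_eager=False) path, which captures CUDA graphs for decode steps.
from vllm import LLM

llm = LLM(model="facebook/opt-125m", enforce_eager=True)  # placeholder model
```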
#4728 · [Feature]: Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference · feature request · opened May 10, 2024 by tchaton
#4725 · [Bug]: Unable to serve Llama3 using vLLM Docker container · bug · opened May 10, 2024 by vecorro
#4723 · [Performance]: Why does vLLM use so much memory even with an OPT model? · performance · opened May 9, 2024 by MitchellX
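The likely explanation for #4723 is vLLM's KV-cache preallocation: by default the engine claims roughly 90% of GPU memory regardless of model size. A sketch of the knob that controls this; gpu_memory_utilization is a real parameter, but the 0.3 value here is only illustrative.

```python
# vLLM preallocates KV-cache blocks up to gpu_memory_utilization (default
# 0.9), so even a small OPT model claims most of the GPU. Lowering the knob
# shrinks the footprint at the cost of fewer cached sequences.
from vllm import LLM

llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.3)
```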
#4721 · [Feature]: Enforce formatting standards for C++ and CUDA code · feature request · opened May 9, 2024 by mgoin
#4715 · [Bug]: Unable to run LoRA inference with Phi-3 · bug · opened May 9, 2024 by WeiXiaoSummer
#4714 · [Bug]: Export fails when FP8-quantizing the KV cache of Qwen1.5-72B-Chat-GPTQ-Int4 · bug · opened May 9, 2024 by frankxyy
#4708 · [Bug]: Running Punica LoRA on the Qwen1.5-32B model raises RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16 · bug · opened May 9, 2024 by victorzhz111
#4706 · [Bug]: KeyError: request_id after multiple threaded calls · bug · opened May 9, 2024 by xubzhlin
#4704 · [Feature]: Is it possible to dynamically adjust the LoRA tensor-parallel policy for different situations? · feature request · opened May 9, 2024 by yyccli