Issues: vllm-project/vllm
#4770 · [Feature]: CI: Test on NVLink-enabled machine · feature request · opened May 12, 2024 by youkaichao
#4766 · [Feature]: Could paged_attention_v1 support an 'attn_bias' parameter? · feature request · opened May 11, 2024 by cillinzhang
#4763 · [Feature]: Support W4A8KV4 quantization (QServe/QoQ) · feature request · opened May 11, 2024 by bratao
#4760 · [Performance]: Why is the average generation throughput low? · performance · opened May 11, 2024 by rvsh2
#4756 · [Bug]: CUDA error when running mistral-7b + LoRA with tensor_para=8 · bug · opened May 11, 2024 by sfc-gh-zhwang
#4755 · Regression in support of customized "role" in OpenAI-compatible API (v0.4.2) · good first issue · opened May 10, 2024 by simon-mo
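For context on #4755: the report is that chat messages with roles outside the standard set, which the server accepted before v0.4.2, are now rejected. A minimal reproduction sketch, assuming a vLLM OpenAI-compatible server on localhost:8000; the model ID and the custom "reviewer" role are placeholders, not taken from the issue itself.

```python
# Reproduction sketch for #4755 (assumed setup, not the reporter's exact one).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
        "messages": [
            {"role": "user", "content": "Hello!"},
            # A non-standard role: reportedly accepted before v0.4.2,
            # rejected by the request validation in v0.4.2.
            {"role": "reviewer", "content": "Check the answer for accuracy."},
        ],
    },
)
print(resp.status_code, resp.text)
```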
#4744 · [Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize the GPUs · usage · opened May 10, 2024 by danielstankw
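A sketch of the configuration #4744 describes, using vLLM's offline LLM entrypoint; the AWQ checkpoint name is a placeholder for whichever model the reporter used.

```python
# Setup sketch for #4744: an AWQ-quantized model sharded over 4 GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # placeholder AWQ checkpoint
    quantization="awq",
    tensor_parallel_size=4,  # shard across the 4 GPUs reported as idle
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```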
#4743 · [RFC]: Support specifying quant_config details in the LLM or Server entrypoints · feature request, RFC · opened May 10, 2024 by mgoin
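Today the entrypoints expose quantization only as a single string; the RFC asks for finer-grained control. A sketch of the contrast below; the quant_config keyword is hypothetical and does not exist in vLLM.

```python
# What exists today vs. the kind of interface #4743 proposes.
from vllm import LLM

# Current API: quantization is a bare method-name string.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")  # placeholder model

# Hypothetical per the RFC (NOT an existing parameter): pass method details.
# llm = LLM(model="...", quant_config={"method": "awq", "group_size": 128})
```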
#4742 · [Bug]: ValueError when using LoRA with CohereForCausalLM model · bug · opened May 10, 2024 by onlyfish79
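For reference, the usual way a LoRA adapter is attached in vLLM (a sketch; the adapter name and path are placeholders, and whether CohereForCausalLM supports this path is exactly what #4742 questions).

```python
# Typical vLLM LoRA usage; the reported ValueError occurs when the base
# model is CohereForCausalLM.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="CohereForAI/c4ai-command-r-v01", enable_lora=True)
outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=48),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora"),  # placeholders
)
```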
#4741 · [Bug]: SqueezeLLM with sparsity does not work · bug · opened May 10, 2024 by RyanWMHI
#4740 · [Bug]: Why are the logits different between 0.4.1 and 0.4.2? · bug · opened May 10, 2024 by sitabulaixizawaluduo
#4739 · [New Model]: BLIP-2 support required · new model · opened May 10, 2024 by anisingh1
#4736 · [New Model]: fastspeech2_conformer (just needs a new attention mechanism: RelPositionMultiHeadedAttention) · new model · opened May 10, 2024 by cillinzhang
#4731 · [Bug]: The paged_attention version used is inconsistent between enforce_eager=True and enforce_eager=False · bug · opened May 10, 2024 by liangxuegang
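The flag in question toggles CUDA graph capture; a minimal sketch of the two configurations being compared in #4731, with a small placeholder model.

```python
# enforce_eager=True disables CUDA graph capture; #4731 reports that the
# paged_attention kernel version selected then differs from the default
# (enforce_eager=False) path, which captures CUDA graphs for decode steps.
from vllm import LLM

llm = LLM(model="facebook/opt-125m", enforce_eager=True)  # placeholder model
```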
#4728 · [Feature]: Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference · feature request · opened May 10, 2024 by tchaton
#4725 · [Bug]: Unable to serve Llama3 using vLLM Docker container · bug · opened May 10, 2024 by vecorro
#4723 · [Performance]: Why does vLLM use so much memory even with an OPT model? · performance · opened May 9, 2024 by MitchellX
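The likely explanation for #4723 is vLLM's KV-cache preallocation: by default the engine claims roughly 90% of GPU memory regardless of model size. A sketch of the knob that controls this; gpu_memory_utilization is a real parameter, but the 0.3 value here is only illustrative.

```python
# vLLM preallocates KV-cache blocks up to gpu_memory_utilization (default
# 0.9), so even a small OPT model claims most of the GPU. Lowering the knob
# shrinks the footprint at the cost of fewer cached sequences.
from vllm import LLM

llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.3)
```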
#4721 · [Feature]: Enforce formatting standards for C++ and CUDA code · feature request · opened May 9, 2024 by mgoin
#4715 · [Bug]: Unable to run LoRA inference with Phi-3 · bug · opened May 9, 2024 by WeiXiaoSummer
#4714 · [Bug]: Export fails when FP8-quantizing the KV cache of Qwen1.5-72B-Chat-GPTQ-Int4 · bug · opened May 9, 2024 by frankxyy
#4708 · [Bug]: Running Punica LoRA on the Qwen1.5-32B model raises RuntimeError: No suitable kernel. h_in=64 h_out=3424 dtype=Float out_dtype=BFloat16 · bug · opened May 9, 2024 by victorzhz111
#4706 · [Bug]: KeyError: request_id after multiple threaded calls · bug · opened May 9, 2024 by xubzhlin
#4704 · [Feature]: Is it possible to dynamically adjust the LoRA tensor-parallel policy for different situations? · feature request · opened May 9, 2024 by yyccli