Issues: NVIDIA/TensorRT-LLM
#783 · [Issue Template] Short one-line summary of the issue #270 · opened Jan 1, 2024 by juney-nvidia · Open
#1810 · Is it "INT8 or FP8" with "--use_weight_only --weight_only_precision int8 --qformat fp8"? · labels: bug (Something isn't working), quantization (Issue about lower bit quantization, including int8, int4, fp8), question (Further information is requested) · opened Jun 19, 2024 by aiiAtelier · 2 of 4 tasks
#1809 · prompt_vocab_size is ignored in the executor API · labels: bug · opened Jun 19, 2024 by thefacetakt · 2 of 4 tasks
#1807 · Cluster key option not working? · labels: question, triaged (Issue has been triaged by maintainers) · opened Jun 19, 2024 by tonylek
#1805 · How to measure the time to a new token of a model in TensorRT-LLM · opened Jun 19, 2024 by Ourspolaire1
#1803 · Model 'tensorrt_llm' loading failed with error: key 'use_context_fmha_for_generation' not found · labels: question · opened Jun 19, 2024 by jasonngap1 · 2 of 4 tasks
#1798 · Medusa with Mixtral 8x7B · labels: question · opened Jun 18, 2024 by v-dicicco
#1797 · How can I customize position_ids for my own model? · labels: question, waiting for feedback · opened Jun 18, 2024 by littletomatodonkey
#1795 · There is a difference between the decoding results of Medusa and the source model · labels: bug, Investigating · opened Jun 18, 2024 by skyCreateXian · 4 tasks done
#1790 · CogVLM only supports one image as input, in a fixed place · labels: feature request (New feature or request), Investigating, question · opened Jun 17, 2024 by littletomatodonkey
#1788 · Repeated outputs for long-input tasks on Llama 3 70B compared to vLLM and HF Transformers · labels: bug, waiting for feedback · opened Jun 16, 2024 by DreamGenX · 2 of 4 tasks
#1786 · Problems running the convert_checkpoint.py file provided in the repository · labels: bug, waiting for feedback · opened Jun 14, 2024 by JungleMist · 2 of 4 tasks
#1785 · Qwen2 1.5B checkpoint conversion broken · labels: bug, triaged, waiting for feedback · opened Jun 14, 2024 by yaysummeriscoming · 2 of 4 tasks
#1784 · [shapeMachine.cpp::executeContinuation::905] Error Code 7: Internal Error (Dimensions with name batch_size_beam_width must be equal. Condition '==' violated: 1 != 3. Instruction: CHECK_EQUAL 1 3.) · labels: bug, Investigating · opened Jun 14, 2024 by Naphat-Khoprasertthaworn · 2 of 4 tasks
#1783 · Single A100 runs Llama-7B with batch_size >= 64, input_len(max)=1024, output_len(max)=512 · labels: triaged, waiting for feedback · opened Jun 14, 2024 by rhmaaa
#1781 · Model enters an unusable state when a specific LoRA is passed as input to a trtllm model with LoRA support · labels: bug, Investigating, waiting for feedback · opened Jun 13, 2024 by pankajroark · 2 of 4 tasks
#1776 · Unable to convert the LLaVA model to TensorRT · labels: triaged, waiting for feedback, wontfix (This will not be worked on) · opened Jun 13, 2024 by tanveer-sayyed · 2 of 4 tasks
#1775 · ChatGLM3 6B multi-batch fails with an error · labels: bug, Investigating · opened Jun 13, 2024 by RobinJYM · 2 of 4 tasks
#1771 · InferenceRequest::serialize does not handle the logits post-processor; log an error · labels: bug, triaged · opened Jun 12, 2024 by DreamGenX · 4 tasks
#1770 · Failure to build w4a8_awq on Llama 13B · labels: bug, triaged, waiting for feedback · opened Jun 12, 2024 by Hongbosherlock · 2 of 4 tasks
#1768 · Using TensorRT-LLM/examples/apps/fastapi_server.py as a server inside the TensorRT-LLM Docker image · labels: bug, Investigating · opened Jun 11, 2024 by snassimr · 3 of 4 tasks
#1759 · InternLM2 only runs normally on adjacent GPUs · labels: bug, triaged, waiting for feedback · opened Jun 10, 2024 by yuanphoenix · 1 of 4 tasks
#1757 · AWQ performance issue at higher batch sizes · labels: bug, quantization, triaged · opened Jun 8, 2024 by canamika27 · 2 of 4 tasks