Issues: NVIDIA/TensorRT-LLM
#783 · [Issue Template] Short one-line summary of the issue #270 · opened Jan 1, 2024 by juney-nvidia · Open
#1810 · Is it "INT8 or FP8" with "--use_weight_only --weight_only_precision int8 --qformat fp8"? · labels: bug (Something isn't working), quantization (Issue about lower bit quantization, including int8, int4, fp8), question (Further information is requested) · opened Jun 19, 2024 by aiiAtelier · 2 of 4 tasks
#1809 · prompt_vocab_size is ignored in the executor API · labels: bug · opened Jun 19, 2024 by thefacetakt · 2 of 4 tasks
#1807 · Cluster key option not working? · labels: question, triaged (Issue has been triaged by maintainers) · opened Jun 19, 2024 by tonylek
#1805 · How to measure the time to a new token of a model in TensorRT-LLM · opened Jun 19, 2024 by Ourspolaire1
#1803 · Model 'tensorrt_llm' loading failed with error: key 'use_context_fmha_for_generation' not found · labels: question · opened Jun 19, 2024 by jasonngap1 · 2 of 4 tasks
#1798 · Medusa with Mixtral 8x7B · labels: question · opened Jun 18, 2024 by v-dicicco
#1797 · How can I customize position_ids for my own model? · labels: question, waiting for feedback · opened Jun 18, 2024 by littletomatodonkey
#1795 · There is a difference between the decoding results of Medusa and the source model · labels: bug, Investigating · opened Jun 18, 2024 by skyCreateXian · 4 tasks done
#1790 · CogVLM only supports one image as input, in a fixed place · labels: feature request (New feature or request), Investigating, question · opened Jun 17, 2024 by littletomatodonkey
#1788 · Repeated outputs for long-input tasks on Llama 3 70B compared to vLLM and HF Transformers · labels: bug, waiting for feedback · opened Jun 16, 2024 by DreamGenX · 2 of 4 tasks
#1786 · Problems running the convert_checkpoint.py file provided in the repository · labels: bug, waiting for feedback · opened Jun 14, 2024 by JungleMist · 2 of 4 tasks
#1785 · Qwen2 1.5B checkpoint conversion broken · labels: bug, triaged, waiting for feedback · opened Jun 14, 2024 by yaysummeriscoming · 2 of 4 tasks
#1784 · [shapeMachine.cpp::executeContinuation::905] Error Code 7: Internal Error (Dimensions with name batch_size_beam_width must be equal. Condition '==' violated: 1 != 3. Instruction: CHECK_EQUAL 1 3.) · labels: bug, Investigating · opened Jun 14, 2024 by Naphat-Khoprasertthaworn · 2 of 4 tasks
#1783 · Single A100 runs Llama-7B with batch_size >= 64, input_len(max)=1024, output_len(max)=512 · labels: triaged, waiting for feedback · opened Jun 14, 2024 by rhmaaa
#1781 · Model enters an unusable state when a specific LoRA is passed as input to a trtllm model with LoRA support · labels: bug, Investigating, waiting for feedback · opened Jun 13, 2024 by pankajroark · 2 of 4 tasks
#1776 · Unable to convert the LLaVA model to TensorRT · labels: triaged, waiting for feedback, wontfix (This will not be worked on) · opened Jun 13, 2024 by tanveer-sayyed · 2 of 4 tasks
#1775 · ChatGLM3 6B multi-batch fails with an error · labels: bug, Investigating · opened Jun 13, 2024 by RobinJYM · 2 of 4 tasks
#1771 · InferenceRequest::serialize does not handle the logits post-processor; log an error · labels: bug, triaged · opened Jun 12, 2024 by DreamGenX · 4 tasks
#1770 · Failure to build w4a8_awq on Llama 13B · labels: bug, triaged, waiting for feedback · opened Jun 12, 2024 by Hongbosherlock · 2 of 4 tasks
#1768 · Using TensorRT-LLM/examples/apps/fastapi_server.py as a server inside the TensorRT-LLM Docker image · labels: bug, Investigating · opened Jun 11, 2024 by snassimr · 3 of 4 tasks
#1759 · InternLM2 only runs normally on adjacent GPUs · labels: bug, triaged, waiting for feedback · opened Jun 10, 2024 by yuanphoenix · 1 of 4 tasks
#1757 · AWQ performance issue at higher batch sizes · labels: bug, quantization, triaged · opened Jun 8, 2024 by canamika27 · 2 of 4 tasks