Commit
fix: Triton usage for GPT-Q (#140)
tgaddair committed Dec 18, 2023
1 parent 5080877 commit 9ae65b3
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions server/lorax_server/utils/gptq/custom_autotune.py
@@ -88,9 +88,9 @@ def kernel_call():
             # In testings using only 40 reps seems to be close enough and it appears to be what PyTorch uses
             # PyTorch also sets fast_flush to True, but I didn't see any speedup so I'll leave the default
             return triton.testing.do_bench(
-                kernel_call, percentiles=(0.5, 0.2, 0.8), rep=40
+                kernel_call, quantiles=(0.5, 0.2, 0.8), rep=40
             )
-        except triton.compiler.OutOfResources:
+        except triton.OutOfResources:
             return (float("inf"), float("inf"), float("inf"))

     def run(self, *args, **kwargs):
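
The change above swaps one keyword argument name and one exception location. For code that has to run against Triton builds on either side of that change, a rough sketch of a tolerant wrapper might look like the following. The helper names (`quantile_kwarg`, `resolve_out_of_resources`, `bench`) are hypothetical and not part of this commit or of Triton; the only Triton names assumed are the ones visible in the diff (`testing.do_bench`, `OutOfResources`, `compiler.OutOfResources`). The module is passed in as an argument so the sketch does not import Triton itself.

```python
import inspect


def quantile_kwarg(do_bench):
    """Return the keyword name that `do_bench` accepts for its quantile tuple."""
    params = inspect.signature(do_bench).parameters
    if "quantiles" in params:
        return "quantiles"
    if "percentiles" in params:
        return "percentiles"
    raise TypeError("do_bench accepts neither 'quantiles' nor 'percentiles'")


def resolve_out_of_resources(triton_module):
    """Look up OutOfResources at either location that appears in the diff."""
    exc = getattr(triton_module, "OutOfResources", None)
    if exc is None:
        # Fall back to the older location used before this commit.
        compiler = getattr(triton_module, "compiler", None)
        exc = getattr(compiler, "OutOfResources", None)
    if exc is None:
        raise AttributeError("OutOfResources not found on the given module")
    return exc


def bench(triton_module, kernel_call, rep=40):
    """Benchmark `kernel_call`; report infinite timings if it runs out of resources."""
    do_bench = triton_module.testing.do_bench
    out_of_resources = resolve_out_of_resources(triton_module)
    # Pick whichever keyword this do_bench signature actually declares.
    kwargs = {quantile_kwarg(do_bench): (0.5, 0.2, 0.8), "rep": rep}
    try:
        return do_bench(kernel_call, **kwargs)
    except out_of_resources:
        return (float("inf"), float("inf"), float("inf"))
```

With Triton installed, this would be called as `bench(triton, kernel_call)`. The commit itself takes the simpler route of targeting the newer names directly, which is reasonable when the Triton version is pinned.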
