"Efficient JSON generation following a JSON Schema" example running continuously, doesn't stop #826

Closed Answered by mpetruc
mpetruc asked this question in Q&A

I ran one more experiment, adding 'low_cpu_mem_usage': True to model_kwargs, and that seemed to do the trick. Generation finished in 4 seconds. Here's the final command, in case someone runs into the same issue:

model = outlines.models.transformers(model_name, device="cuda", model_kwargs={'torch_dtype': torch.float16, 'low_cpu_mem_usage': True, 'attn_implementation': "flash_attention_2"})

Thank you!
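
For anyone adapting the command above, here is a minimal sketch that separates the keyword arguments from the model load, so the flash-attention option can be switched off on machines where the flash-attn package is not installed. The helper names build_model_kwargs and load_model are hypothetical and only for illustration. The outlines.models.transformers call and its model_kwargs parameter are taken from the command above; other Outlines versions may expose a different interface.

```python
def build_model_kwargs(dtype, use_flash_attention=True):
    """Collect the keyword arguments forwarded to the underlying model loader.

    Hypothetical helper, not part of Outlines or Transformers.
    """
    kwargs = {
        "torch_dtype": dtype,
        # The setting that resolved the never-ending generation in this thread.
        "low_cpu_mem_usage": True,
    }
    if use_flash_attention:
        # Assumes the flash-attn package is installed and the GPU supports it.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_model(model_name, use_flash_attention=True):
    """Load the model on the GPU with the settings from the post above."""
    # Imports are deferred so build_model_kwargs can be used without a GPU setup.
    import torch
    import outlines

    return outlines.models.transformers(
        model_name,
        device="cuda",
        model_kwargs=build_model_kwargs(torch.float16, use_flash_attention),
    )
```

Calling load_model(model_name) should be equivalent to the one-line command above, and load_model(model_name, use_flash_attention=False) keeps the half-precision and low_cpu_mem_usage settings while leaving the attention implementation at its default.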

Answer selected by mpetruc