Hi,

I'm currently investigating what options we have to optimize SetFit inference, and I have a few questions:
Is the following the only way to use SetFit with torch.compile?
Info above was provided by Tom Aarsen.
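For concreteness, here is a minimal sketch of the pattern I mean (the checkpoint name is just a placeholder, and treating `model_body`, the model's internal SentenceTransformer, as the compile target is my assumption):

```python
import torch
from setfit import SetFitModel

# Placeholder checkpoint; any trained SetFit model should work the same way.
model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# Compile only the transformer body. The default classification head is a
# scikit-learn LogisticRegression, not a torch module, so torch.compile
# does not apply to it.
model.model_body = torch.compile(model.model_body)

preds = model.predict(["this movie was great", "terrible acting"])
```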
Does torch.compile also work on CPU? Edit: it looks like it should work on CPU too:
https://pytorch.org/docs/stable/generated/torch.compile.html
Does torch.compile change anything about the accuracy of model inference?
I see different modes here:

> Can be either “default”, “reduce-overhead”, “max-autotune” or “max-autotune-no-cudagraphs”

So far, “reduce-overhead” gives me the best results.
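For reference, the mode is just an argument to torch.compile; a minimal sketch of how I'm setting it (same placeholder checkpoint as above):

```python
import torch
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# "reduce-overhead" trades extra memory for lower per-call framework
# overhead (via CUDA graphs), which seems to suit the small batch sizes
# typical of inference.
model.model_body = torch.compile(model.model_body, mode="reduce-overhead")
```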
CPU: what are the options to optimize CPU inference?
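One option I've seen mentioned for CPU (not SetFit-specific, and whether it applies cleanly here is my assumption) is dynamic int8 quantization of the body's linear layers:

```python
import torch
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# Replace the nn.Linear layers of the transformer body with dynamically
# quantized int8 versions; this usually speeds up CPU inference at a
# small cost in accuracy.
model.model_body = torch.quantization.quantize_dynamic(
    model.model_body, {torch.nn.Linear}, dtype=torch.qint8
)

preds = model.predict(["quantized cpu inference"])
```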
Is BetterTransformer really not available for SetFit? I don't see SetFit in this list: https://huggingface.co/docs/optimum/bettertransformer/overview#supported-models
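Since the SetFit body wraps a plain transformers model, I wonder whether transforming the underlying model directly would work; a sketch of what I mean (accessing `model_body[0].auto_model` relies on my reading of sentence-transformers internals, and support would depend on the underlying architecture, not on SetFit itself):

```python
from optimum.bettertransformer import BetterTransformer
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# The first module of the SentenceTransformer body wraps the underlying
# Hugging Face model in `auto_model`; swap it for its BetterTransformer
# version if that architecture is supported.
model.model_body[0].auto_model = BetterTransformer.transform(
    model.model_body[0].auto_model
)
```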
Are there any other resources for speeding up SetFit model inference? And where can you run a SetFit model other than TorchServe?
Thanks,
Gerald