Hi,

I'm currently investigating what options we have to optimize SetFit inference, and I have a few questions:
Is the following the only way to use SetFit with torch.compile?
Info above was provided by Tom Aarsen.
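For concreteness, here is a minimal sketch of the pattern I mean (the checkpoint name is just a placeholder, and treating `model_body`, the model's internal SentenceTransformer, as the compile target is my assumption):

```python
import torch
from setfit import SetFitModel

# Placeholder checkpoint; any trained SetFit model should work the same way.
model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# Compile only the transformer body. The default classification head is a
# scikit-learn LogisticRegression, not a torch module, so torch.compile
# does not apply to it.
model.model_body = torch.compile(model.model_body)

preds = model.predict(["this movie was great", "terrible acting"])
```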
Does torch.compile also work on CPU? Edit: it looks like it should work on CPU too:
https://pytorch.org/docs/stable/generated/torch.compile.html
Does torch.compile change anything about the accuracy of model inference?
I see different modes here:

> Can be either “default”, “reduce-overhead”, “max-autotune” or “max-autotune-no-cudagraphs”

So far, “reduce-overhead” gives me the best results.
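For reference, the mode is just an argument to torch.compile; a minimal sketch of how I'm setting it (same placeholder checkpoint as above):

```python
import torch
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# "reduce-overhead" trades extra memory for lower per-call framework
# overhead (via CUDA graphs), which seems to suit the small batch sizes
# typical of inference.
model.model_body = torch.compile(model.model_body, mode="reduce-overhead")
```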
CPU: what are the options to optimize CPU inference?
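One option I've seen mentioned for CPU (not SetFit-specific, and whether it applies cleanly here is my assumption) is dynamic int8 quantization of the body's linear layers:

```python
import torch
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# Replace the nn.Linear layers of the transformer body with dynamically
# quantized int8 versions; this usually speeds up CPU inference at a
# small cost in accuracy.
model.model_body = torch.quantization.quantize_dynamic(
    model.model_body, {torch.nn.Linear}, dtype=torch.qint8
)

preds = model.predict(["quantized cpu inference"])
```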
Is BetterTransformer really not available for SetFit? I don't see SetFit in this list: https://huggingface.co/docs/optimum/bettertransformer/overview#supported-models
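Since the SetFit body wraps a plain transformers model, I wonder whether transforming the underlying model directly would work; a sketch of what I mean (accessing `model_body[0].auto_model` relies on my reading of sentence-transformers internals, and support would depend on the underlying architecture, not on SetFit itself):

```python
from optimum.bettertransformer import BetterTransformer
from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")

# The first module of the SentenceTransformer body wraps the underlying
# Hugging Face model in `auto_model`; swap it for its BetterTransformer
# version if that architecture is supported.
model.model_body[0].auto_model = BetterTransformer.transform(
    model.model_body[0].auto_model
)
```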
Are there any other resources for speeding up SetFit model inference? And where can you run a SetFit model other than TorchServe?
Thanks,
Gerald