TST: torch compile tests #1725
Conversation
Right now, we don't have specific tests for `torch.compile`. Instead, we have a "hack" that allows running _all_ tests with `torch.compile` if we set the environment variable `PEFT_DEBUG_WITH_TORCH_COMPILE=1`. This is not very practical because running all these tests with compilation enabled takes a lot of time. Also, hundreds of tests currently fail, which makes it impossible to understand in detail what does or does not work.

This PR removes the aforementioned "hack" and replaces it with a list of explicit `torch.compile` tests. Currently, these tests cover training/inference with a number of different tuner types, as well as more advanced features with LoRA (e.g. quantization, multiple adapters, etc.).

Some of these tests pass and some of them fail. This is now documented, so that users can quickly look up whether their use case would be compatible with `torch.compile`. This is very useful to have, because sometimes `torch.compile` may appear to work but actually returns the wrong result. For users, it's not immediately obvious when this happens.

The test suite is not exhaustive; there are many combinations of features that could be added. However, it should be a good starting point and can be extended later.

The test suite does _not_ cover whether `torch.compile` actually accelerates the code. This may not be the case even if it works correctly (e.g. because of graph breaks). Testing this would require bigger models and more data, which would be prohibitively slow.
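To make the idea of "explicit `torch.compile` tests" concrete, here is a minimal sketch of the kind of correctness check such a test performs, using only plain `torch` and a hand-rolled LoRA-style stand-in model (the names `TinyAdapterModel` and `test_compile_matches_eager` are illustrative, not the PR's actual test code):

```python
import torch
import torch.nn as nn

class TinyAdapterModel(nn.Module):
    # Stand-in for a PEFT-wrapped model: a base linear layer plus a
    # low-rank "adapter" path, structurally similar to LoRA.
    def __init__(self, dim=8, rank=2):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.lora_a = nn.Linear(dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, dim, bias=False)

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))

def test_compile_matches_eager():
    torch.manual_seed(0)
    model = TinyAdapterModel()
    x = torch.randn(4, 8)
    expected = model(x)  # eager reference output
    # backend="eager" skips code generation but still runs dynamo tracing,
    # which is where most compile incompatibilities surface; it keeps the
    # test fast and hardware-independent.
    compiled = torch.compile(model, backend="eager")
    torch.testing.assert_close(compiled(x), expected)

test_compile_matches_eager()
print("ok")
```

Comparing the compiled output against the eager output is exactly what catches the "appears to work but returns the wrong result" failure mode described above.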
Nice, thanks for adding! I think it'd be more compact and easier to compare if we presented what works and what doesn't side by side in a table.
- Training with `Trainer` from 🤗 transformers.
- Training with a custom PyTorch loop.
- Inference
- Generation

The following adapters were tested:

- AdaLoRA
- BOFT
- IA³
- Layer Norm Tuning
- LoHa
- LoRA
- LoRA + DoRA
- OFT
- VeRA
## Features that don't work with `torch.compile`

### Training and inference

The following adapters don't work correctly for training or inference when using `torch.compile`:

- LoKr
- LoRA targeting embedding layers

### Advanced PEFT features

These more advanced PEFT features don't work in conjunction with `torch.compile`. Tests were run with LoRA:

- Using PEFT adapters with quantization (bitsandbytes)
- Inference with multiple adapters
- Unloading (i.e. calling `model.merge_and_unload()`)
- Disabling adapters (i.e. using `with model.disable_adapter()`)
- Mixed adapter batches (i.e. calling `model(batch, adapter_names=["__base__", "default", "other", ...])`)
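Many of the failures above stem from graph breaks or untraceable Python control flow. One way to make such incompatibilities fail loudly in a test rather than silently fall back is `torch.compile(..., fullgraph=True)`, which turns any graph break into an error. A minimal illustration with plain `torch` (the function name is illustrative; a data-dependent branch on `.item()` is a classic graph-break trigger):

```python
import torch

def fn_with_graph_break(x):
    # Branching on .item() moves a tensor value into Python, which
    # dynamo generally cannot fold into a single traced graph.
    if x.sum().item() > 0:
        return x * 2
    return x - 1

x = torch.ones(3)
# fullgraph=True raises instead of silently splitting the graph,
# which is the behavior you want in a compatibility test suite.
compiled = torch.compile(fn_with_graph_break, fullgraph=True, backend="eager")
try:
    compiled(x)
    print("traced as a single graph")
except Exception as e:
    print(f"graph break surfaced as an error: {type(e).__name__}")
```

Whether a given construct graph-breaks can vary across torch versions and dynamo config, so treat this as a probe, not a guarantee.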
### Advanced PEFT features

Below are some of the more advanced PEFT features that work. They were all tested with LoRA.

- `modules_to_save` (i.e. `config = LoraConfig(..., modules_to_save=...)`)
- Merging adapters (one or multiple)
- Merging multiple adapters into one adapter (i.e. calling `model.add_weighted_adapter(...)`)

Generally, we can expect that if a feature works correctly with LoRA and is also supported by other adapter types, it should also work for that adapter type.

## Features that don't work with `torch.compile`

### Training and inference

The following adapters don't work correctly for training or inference when using `torch.compile`:

- LoKr
- LoRA targeting embedding layers

### Advanced PEFT features

These more advanced PEFT features don't work in conjunction with `torch.compile`. Tests were run with LoRA:

- Using PEFT adapters with quantization (bitsandbytes)
- Inference with multiple adapters
- Unloading (i.e. calling `model.merge_and_unload()`)
- Disabling adapters (i.e. using `with model.disable_adapter()`)
- Mixed adapter batches (i.e. calling `model(batch, adapter_names=["__base__", "default", "other", ...])`)
## Advanced PEFT features

PEFT supports a number of features for working with adapters, such as merging and saving, but not all of them work with `torch.compile`. Check the table below to find out which features are compatible with `torch.compile` and which ones aren't.

| torch.compile compatible | torch.compile incompatible |
|---|---|
| [modules_to_save](https://huggingface.co/docs/peft/package_reference/lora#peft.LoraConfig.modules_to_save) | usage with quantization (bitsandbytes) |
| merging one or more adapters | inference with multiple adapters |
| merging multiple adapters into one adapter | unloading weights with [merge_and_unload](https://huggingface.co/docs/peft/v0.10.0/en/package_reference/lora#peft.LoraModel.merge_and_unload) |
| | disabling adapters with [disable_adapter](https://huggingface.co/docs/peft/v0.10.0/en/package_reference/peft_model#peft.PeftModel.disable_adapter) |
| | mixing adapter batches (`model(batch, adapter_names=["__base__", "default", "other"])`) |
Thanks a lot for this and for explaining! LGTM with a few nits.
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
- remove commented code
- link test file
@stevhliu Thanks for the feedback, I added most of your suggestions. I'm not sure about the tables though. Using tables was also my first instinct, but what I dislike about them is that 1) there are many empty entries and 2) there is no correspondence between items in the same row, which is typically expected in a table. WDYT?
Argument name was changed in some pytest version.
The empty entries don't really bother me, but 2 is a good point. How about just reorganizing the sections but keeping the lists?

## Training and inference

- list adapters that work and don't work for training and inference

## Advanced PEFT features

- list features that work and don't work
@stevhliu I re-arranged the sections, is this what you had in mind? There are now subsections with identical names, not sure if this could be problematic.
Just a few more minor suggestions but LGTM!

If you do keep the smaller subsection titles, make sure you format them like `## Features that work with torch.compile[[training]]` to differentiate it from the identical title in the Advanced PEFT features section. Otherwise, clicking on the `## Features that work with torch.compile` in the Advanced PEFT features section will link to the first one.
## Training and inference

### Features that work with `torch.compile`
I think we can even remove these smaller subsections (`Features that work with torch.compile` and `Features that don't work with torch.compile`) to keep things cleaner and avoid duplicate subsection titles.

### Features that work with `torch.compile`
Done, could you please do a (hopefully) last check?
Looks good, thanks for iterating! ❤️
Co-authored-by: Steven Liu <[email protected]>