TST: torch compile tests #1725

Merged

Conversation

BenjaminBossan (Member) commented May 10, 2024

Right now, we don't have specific tests for torch.compile. Instead, we have a "hack" that allows running all tests with torch.compile if we set the environment variable PEFT_DEBUG_WITH_TORCH_COMPILE=1.

This is not very practical because running all of these tests with compilation enabled takes a lot of time. Also, hundreds of tests currently fail, which makes it hard to pin down what does or does not work.

This PR removes the aforementioned "hack" and replaces it with a list of explicit torch.compile tests. Currently, these tests cover training and inference with a range of different tuner types, as well as more advanced features with LoRA (e.g. quantization, multiple adapters, etc.).

Some of these tests pass and some of them fail. This is now documented, so that users can quickly look up whether their use case is compatible with torch.compile. This is very useful to have, because torch.compile can sometimes appear to work but actually return the wrong result, and for users it's not immediately obvious when this happens.

The test suite is not exhaustive; there are many combinations of features that could be added. However, it should be a good starting point and can be extended later.

The test suite does not cover whether torch.compile actually accelerates the code. Compilation may not provide a speedup even when it works correctly (e.g. because of graph breaks). Testing this would require bigger models and more data, which would be prohibitively slow.
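To illustrate the kind of check involved, here is a minimal sketch of a compiled-vs-eager comparison for a LoRA model. This is not the actual test code from this PR; the model id and tolerance are illustrative placeholders:

```python
# Hypothetical sketch: verify that compiling a LoRA model does not change its output.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # placeholder test model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = get_peft_model(AutoModelForCausalLM.from_pretrained(model_id), LoraConfig(task_type="CAUSAL_LM"))
model.eval()

inputs = tokenizer("torch.compile smoke test", return_tensors="pt")
with torch.no_grad():
    expected = model(**inputs).logits          # eager reference
    compiled_model = torch.compile(model)
    actual = compiled_model(**inputs).logits   # compiled output

# If torch.compile silently changes the numerics, this assertion catches it.
assert torch.allclose(expected, actual, atol=1e-5)
```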

BenjaminBossan marked this pull request as draft May 10, 2024 16:31
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan marked this pull request as ready for review May 13, 2024 15:59
stevhliu (Member) left a comment

Nice, thanks for adding! I think it'd be more compact and easier to compare if we presented what works and what doesn't side by side in a table.

Resolved review threads on docs/source/_toctree.yml and docs/source/developer_guides/torch_compile.md.
Comment on lines 31 to 46
- Training with `Trainer` from 🤗 transformers.
- Training with a custom PyTorch loop.
- Inference
- Generation

The following adapters were tested:

- AdaLoRA
- BOFT
- IA³
- Layer Norm Tuning
- LoHa
- LoRA
- LoRA + DoRA
- OFT
- VeRA
Suggested change
- Training with `Trainer` from 🤗 transformers.
- Training with a custom PyTorch loop.
- Inference
- Generation
The following adapters were tested:
- AdaLoRA
- BOFT
- IA³
- Layer Norm Tuning
- LoHa
- LoRA
- LoRA + DoRA
- OFT
- VeRA
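
For concreteness, here is a rough sketch of the "training with a custom PyTorch loop" case from the list above, using LoRA. The model id, prompt, and hyperparameters are placeholders and not taken from the PR:

```python
# Hypothetical sketch: a few steps of a custom PyTorch training loop on a compiled LoRA model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # placeholder test model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = get_peft_model(AutoModelForCausalLM.from_pretrained(model_id), LoraConfig(task_type="CAUSAL_LM"))
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batch = tokenizer("a tiny training example", return_tensors="pt")

model.train()
for _ in range(3):  # a handful of steps is enough for a smoke test
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```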

Comment on lines 58 to 75
## Features that don't work with `torch.compile`

### Training and inference

The following adapters don't work correctly for training or inference when using `torch.compile`:

- LoKr
- LoRA targeting embedding layers

### Advanced PEFT features

These more advanced PEFT features don't work in conjunction with `torch.compile`. Tests were run with LoRA:

- Using PEFT adapters with quantization (bitsandbytes)
- Inference with multiple adapters
- Unloading (i.e. calling `model.merge_and_unload()`)
- Disabling adapters (i.e. using `with model.disable_adapter()`)
- Mixed adapter batches (i.e. calling `model(batch, adapter_names=["__base__", "default", "other", ...])`)
Suggested change
## Features that don't work with `torch.compile`
### Training and inference
The following adapters don't work correctly for training or inference when using `torch.compile`:
- LoKr
- LoRA targeting embedding layers
### Advanced PEFT features
These more advanced PEFT features don't work in conjunction with `torch.compile`. Tests were run with LoRA:
- Using PEFT adapters with quantization (bitsandbytes)
- Inference with multiple adapters
- Unloading (i.e. calling `model.merge_and_unload()`)
- Disabling adapters (i.e. using `with model.disable_adapter()`)
- Mixed adapter batches (i.e. calling `model(batch, adapter_names=["__base__", "default", "other", ...])`)
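
To make the "inference with multiple adapters" entry above concrete, here is a rough sketch of the pattern that was reported as not working correctly under torch.compile. The adapter name and model id are placeholders:

```python
# Hypothetical sketch: switching between two LoRA adapters on an already compiled model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # placeholder test model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = get_peft_model(AutoModelForCausalLM.from_pretrained(model_id), LoraConfig(task_type="CAUSAL_LM"))
model.add_adapter("other", LoraConfig(task_type="CAUSAL_LM"))  # second adapter, name is arbitrary

compiled_model = torch.compile(model)
inputs = tokenizer("hello", return_tensors="pt")

with torch.no_grad():
    out_default = compiled_model(**inputs).logits
    model.set_adapter("other")                    # switch adapters after compilation
    out_other = compiled_model(**inputs).logits   # per the list above, this path was found not to work correctly
```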

Comment on lines 48 to 75
### Advanced PEFT features

Below are some of the more advanced PEFT features that work. They were all tested with LoRA.

- `modules_to_save` (i.e. `config = LoraConfig(..., modules_to_save=...)`)
- Merging adapters (one or multiple)
- Merging multiple adapters into one adapter (i.e. calling `model.add_weighted_adapter(...)`)

Generally, we can expect that if a feature works correctly with LoRA and is also supported by other adapter types, it should also work for that adapter type.

## Features that don't work with `torch.compile`

### Training and inference

The following adapters don't work correctly for training or inference when using `torch.compile`:

- LoKr
- LoRA targeting embedding layers

### Advanced PEFT features

These more advanced PEFT features don't work in conjunction with `torch.compile`. Tests were run with LoRA:

- Using PEFT adapters with quantization (bitsandbytes)
- Inference with multiple adapters
- Unloading (i.e. calling `model.merge_and_unload()`)
- Disabling adapters (i.e. using `with model.disable_adapter()`)
- Mixed adapter batches (i.e. calling `model(batch, adapter_names=["__base__", "default", "other", ...])`)
Suggested change
### Advanced PEFT features
Below are some of the more advanced PEFT features that work. They were all tested with LoRA.
- `modules_to_save` (i.e. `config = LoraConfig(..., modules_to_save=...)`)
- Merging adapters (one or multiple)
- Merging multiple adapters into one adapter (i.e. calling `model.add_weighted_adapter(...)`)
Generally, we can expect that if a feature works correctly with LoRA and is also supported by other adapter types, it should also work for that adapter type.
## Features that don't work with `torch.compile`
### Training and inference
The following adapters don't work correctly for training or inference when using `torch.compile`:
- LoKr
- LoRA targeting embedding layers
### Advanced PEFT features
These more advanced PEFT features don't work in conjunction with `torch.compile`. Tests were run with LoRA:
- Using PEFT adapters with quantization (bitsandbytes)
- Inference with multiple adapters
- Unloading (i.e. calling `model.merge_and_unload()`)
- Disabling adapters (i.e. using `with model.disable_adapter()`)
- Mixed adapter batches (i.e. calling `model(batch, adapter_names=["__base__", "default", "other", ...])`)
## Advanced PEFT features
PEFT supports a number of features for working with adapters, such as merging and saving, but not all of them work with torch.compile. Check the table below to find out which features are compatible with torch.compile and which ones aren't.
| torch.compile compatible | torch.compile incompatible |
|------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| [modules_to_save](https://huggingface.co/docs/peft/package_reference/lora#peft.LoraConfig.modules_to_save) | usage with quantization (bitsandbytes) |
| merging one or more adapters | inference with multiple adapters |
| merging multiple adapters into one adapter | unloading weights with [merge_and_unload](https://huggingface.co/docs/peft/v0.10.0/en/package_reference/lora#peft.LoraModel.merge_and_unload) |
| | disabling adapters with [disable_adapter](https://huggingface.co/docs/peft/v0.10.0/en/package_reference/peft_model#peft.PeftModel.disable_adapter) |
| | mixing adapter batches (`model(batch, adapter_names=["__base__", "default", "other"])`) |
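As an illustration of one entry in the compatible column, here is a rough sketch of merging two LoRA adapters into a new one with `add_weighted_adapter` before compiling. Adapter names, weights, and the model id are placeholders:

```python
# Hypothetical sketch: merge two LoRA adapters into one, then compile the result.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # placeholder test model
model = get_peft_model(AutoModelForCausalLM.from_pretrained(model_id), LoraConfig(task_type="CAUSAL_LM"))
model.add_adapter("other", LoraConfig(task_type="CAUSAL_LM"))  # second adapter, name is arbitrary

# Combine "default" and "other" into a new adapter and make it the active one.
model.add_weighted_adapter(
    adapters=["default", "other"],
    weights=[0.5, 0.5],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")

compiled_model = torch.compile(model)  # per the table above, this combination is expected to work
```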

Resolved review threads on docs/source/developer_guides/torch_compile.md.
younesbelkada (Collaborator) left a comment

Thanks a lot for this and for explaining! LGTM with a few nits.

Resolved review threads on Makefile and tests/test_torch_compile.py.
BenjaminBossan and others added 2 commits May 14, 2024 11:22
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
- remove commented code
- link test file
BenjaminBossan (Member, Author)

@stevhliu Thanks for the feedback, I added most of your suggestions. I'm not sure about the tables though. Using tables was also my first instinct, but what I dislike about them is 1) there are many empty entries and 2) there is no correspondence between items in the same row, which is typically expected in a table. WDYT?

Argument name was changed in some pytest version.
stevhliu (Member) commented May 14, 2024

The empty entries don't really bother me, but 2 is a good point. How about just reorganizing the sections but keeping the lists?

## Training and inference
- list adapters that work and don't work for training and inference
## Advanced PEFT features
- list features that work and don't work

BenjaminBossan (Member, Author)

@stevhliu I rearranged the sections; is this what you had in mind? There are now subsections with identical names; I'm not sure if this could be problematic.

stevhliu (Member) left a comment

Just a few more minor suggestions but LGTM!

If you do keep the smaller subsection titles, make sure you format them like `## Features that work with torch.compile[[training]]` to differentiate them from the identical title in the Advanced PEFT features section. Otherwise, clicking on `## Features that work with torch.compile` in the Advanced PEFT features section will link to the first one.

Resolved review thread on docs/source/developer_guides/torch_compile.md.

## Training and inference

### Features that work with `torch.compile`
Member

I think we can even remove these smaller subsections (Features that work with torch.compile and Features that don't work with torch.compile) to keep things cleaner and avoid duplicate subsection titles.

Suggested change
### Features that work with `torch.compile`

Member Author

Done, could you please do a (hopefully) last check?

Member

Looks good, thanks for iterating! ❤️

BenjaminBossan merged commit 4e32679 into huggingface:main May 17, 2024
14 checks passed
BenjaminBossan deleted the tst-torch-compile-tests branch May 17, 2024 16:03