
Support for freezing pretrained vision model layers with regex #3981

Open · wants to merge 35 commits into master
Conversation

@ethanreidel (Contributor) commented Apr 2, 2024

Allows the user to input a regular expression in the YAML config which freezes specific layers of a pretrained model. Adds a new CLI option, "pretrained_summary", that lets users print string representations of a model's layers so they can target them with a regex. Currently all pretrained torchvision models are accessible.

trainer:
  layers_to_freeze_regex: (regex here)

ludwig pretrained_summary -m (model name here)

(I am aware that the collect_summary CLI command is similar; however, it only accepts a preexisting directory, so I thought a separate command that strictly outputs layer names was appropriate for this feature.)

Closes #3733

Future plans: expand this capability to implement gradual unfreezing.

Test: pytest tests/ludwig/modules/test_regex_freezing.py
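As a rough sketch of the mechanism described above -- the per-parameter regex match is the feature's core idea, but the toy model and pattern below are illustrative stand-ins, not the PR's actual code:

```python
import re

import torch.nn as nn

# Toy stand-in for a pretrained torchvision model (avoids a weights download).
model = nn.Sequential()
model.add_module("conv1", nn.Conv2d(3, 8, 3))
model.add_module("layer1", nn.Linear(8, 8))
model.add_module("fc", nn.Linear(8, 2))

# Freeze every parameter whose name matches the regex, as a
# trainer-level layers_to_freeze_regex option would.
pattern = re.compile(r"conv1|layer1")
for name, param in model.named_parameters():
    if pattern.search(name):
        param.requires_grad = False

frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
print(frozen)  # ['conv1.weight', 'conv1.bias', 'layer1.weight', 'layer1.bias']
```

Only the head (fc) stays trainable here, which matches the small-dataset fine-tuning scenario the description mentions.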

github-actions bot commented Apr 2, 2024

Unit Test Results

6 files ±0 · 6 suites ±0 · 57m 8s ⏱️ (+42m 47s)
2 997 tests (+2 985): 2 967 ✔️ (+2 958) · 23 💤 (+20) · 7 failed (+7)
8 991 runs (+8 931): 8 909 ✔️ (+8 867) · 69 💤 (+51) · 13 failed (+13)

For more details on these failures, see this check.

Results for commit 1feb853. ± Comparison against base commit 4b07ce4.

This pull request removes 4 tests and adds 2 989 tests. Note that renamed tests count towards both.
tests.regression_tests.model.test_old_models ‑ test_model_loaded_from_old_config_prediction_works
tests.regression_tests.model.test_old_models ‑ test_predict_deprecated_model[respiratory]
tests.regression_tests.model.test_old_models ‑ test_predict_deprecated_model[titanic]
tests.regression_tests.model.test_old_models ‑ test_predict_deprecated_model[twitter_bots]
tests.ludwig.accounting.test_used_tokens ‑ test_get_used_tokens_for_ecd
tests.ludwig.accounting.test_used_tokens ‑ test_get_used_tokens_for_ecd_no_targets
tests.ludwig.accounting.test_used_tokens ‑ test_get_used_tokens_for_gbm
tests.ludwig.accounting.test_used_tokens ‑ test_get_used_tokens_for_llm
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_image_augmentation[augmentation_pipeline_ops0]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_image_augmentation[augmentation_pipeline_ops1]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_image_augmentation[augmentation_pipeline_ops2]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_invalid_augmentation_parameters[None]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_invalid_augmentation_parameters[augmentation_pipeline_ops1]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_invalid_augmentation_parameters[augmentation_pipeline_ops2]
…
This pull request skips 2 tests.
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[ames_housing.gbm.yaml]
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[mercedes_benz_greener.gbm.yaml]

♻️ This comment has been updated with latest results.

ludwig/cli.py (Outdated):

    from ludwig.utils import pretrained_summary

    pretrained_summary.cli_summarize_pretrained(sys.argv[2:])

Collaborator: It might be good to add some example runs and outputs to the docs.

Collaborator: @ethanreidel To second @skanjila -- would it be possible to show an example of running this command, how it differs from the existing one, and an example of its output? Thank you very much.

Contributor (author): Certainly.

Contributor (author): Quick question: when you say you'd like an example, do you mean an example in the Ludwig docs, or how would you prefer it?

Contributor: @ethanreidel One option is to create an example in the examples/ top-level directory in Ludwig.

ludwig/schema/metadata/configs/trainer.yaml (Outdated, resolved)

    from ludwig.contrib import add_contrib_callback_args
    from ludwig.globals import LUDWIG_VERSION
    from ludwig.utils.print_utils import print_ludwig

Collaborator: Wait, are we really supporting all of these models? I thought we were just going to go out the door with a couple of models to start.

Contributor (author): For this specific feature (simple regex freezing), as long as you have access to the string representation of the layers and the actual model architecture, you can freeze any layers you'd like. Adding support for all torchvision models took no extra work beyond adding to this list. I don't like the look of this long model array, though.

Collaborator: @ethanreidel While it looks like this will be supported for torchvision, what about text/LLMs? (This is related to my previous comment in the Trainers section.) Thanks!

Collaborator:

> I however don't like the look of this long model array though

@ethanreidel Sorry, could you please point me to this "long model array"? Which line in your code has it? Thanks!

Contributor (author): For your first question, Alex: as long as the model layers and their requires_grad parameters are accessible, this feature should in theory work on LLMs/text. I'm not too familiar with LLM architecture and will have to do some quick checks, but I'm 99% sure it's an easy addition. Second question: in a previous commit, I had a fairly hacky solution where users had another command-line option (under pretrained_summary) that listed all available model names. Those names were stored in a Python list, which had a few issues, namely having to expand it regularly and many lines of unnecessary code. Saad made a good point and said to remove it entirely (it was not needed), so it's no longer there.

Contributor (author): Just checked, and sure enough you can apply the same regex freezing technique to an LLM.

Contributor:

> Just checked, and sure enough you can apply the same regex freezing technique to an LLM.

@ethanreidel That's awesome. Maybe we can then use one of my earlier comments and only add this parameter to the ECDTrainerConfig and FineTuneTrainerConfig for now.

As part of the examples you have, it would be good to create 2 example Python files:

  1. To show how to use it with a computer vision model
  2. To show how to use it with an LLM base model

What do you think?
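A hedged sketch of what the LLM-side example might look like; the module names below (h, attn, mlp, lm_head) merely mimic GPT-style parameter paths and are assumptions, not any specific model's layout:

```python
import re

import torch.nn as nn


class TinyBlock(nn.Module):
    """Stand-in for one transformer block (attention + MLP)."""

    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(4, 4)
        self.mlp = nn.Linear(4, 4)


class TinyLM(nn.Module):
    """Stand-in for an LLM base model; real parameter names vary by model."""

    def __init__(self):
        super().__init__()
        self.h = nn.ModuleList([TinyBlock() for _ in range(3)])
        self.lm_head = nn.Linear(4, 10)


model = TinyLM()

# Freeze blocks 0 and 1, leaving the last block and the head trainable.
pattern = re.compile(r"^h\.[01]\.")
for name, param in model.named_parameters():
    if pattern.search(name):
        param.requires_grad = False

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

The mechanism is identical to the computer-vision case; only the regex and the parameter names change.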



def freeze_layers_regex(config: "BaseTrainerConfig", model: ECD) -> None:
    """Freezes layers based on provided regular expression."""
Collaborator: Let's also add comments covering the inputs and outputs.

Collaborator: @ethanreidel I think that if you put from __future__ import annotations as the very first line in the module, you would not need to quote the types. Would you like to give it a try and see if it works? Thanks!
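For context, the suggestion refers to PEP 563 (postponed evaluation of annotations): with the future import, annotations are stored as strings and never evaluated at definition time, so forward references need no manual quoting. A minimal illustration:

```python
from __future__ import annotations


# With postponed evaluation (PEP 563), these annotations are never
# evaluated when the function is defined, so BaseTrainerConfig and ECD
# need no quoting even though they are not imported in this sketch.
def freeze_layers_regex(config: BaseTrainerConfig, model: ECD) -> None:
    pass


# The annotations are preserved as plain strings:
print(freeze_layers_regex.__annotations__["config"])  # BaseTrainerConfig
```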

Contributor (author): I tested the annotations import and it worked, but the git pre-commit hook was forcing changes (e.g., converting all uppercase Dicts to lowercase dicts) that I didn't like.

Contributor: @ethanreidel I think that makes sense.

Are you able to expand the docstring for this function? Also, if it supports LLMs too, can we make model a union of ECD and LLM?

Contributor (author): @saad-palapa

ludwig/utils/pretrained_summary.py (Outdated, resolved)
ludwig/utils/trainer_utils.py (Outdated, resolved)
@@ -225,6 +227,10 @@ def prepare(self):
        base_learning_rate *= lr_scale_fn(self.distributed.size())
        self.base_learning_rate = base_learning_rate

        # If a regex is supplied, freeze the matching layers
Collaborator: @ethanreidel Question: by placing this capability in the top-level trainer.py module, we imply that all subclasses inherit it. For computer vision, based on yours and others' work, it will work and we believe it will be a useful capability. But what can we say about trainers such as FineTuneTrainer (for LLM fine-tuning), NoneTrainer (for LLM predictions), and the GBM Trainer (albeit not as popular nowadays)? In particular, I am curious whether it will work for LLM architectures and, if so, how you think we can take advantage of it. Thanks!

Contributor: @alexsherstinsky This wouldn't be particularly useful for the NoneTrainer, but it could be useful for the FineTuneTrainer. I left a comment above that might help simplify this by only enabling this parameter for the ECDTrainerConfig for now, which would make it a no-op for LLMs and GBMs entirely.

def freeze_layers_regex(config: "BaseTrainerConfig", model: ECD) -> None:
    """Freezes layers based on provided regular expression."""
    try:
        pattern = re.compile(config.layers_to_freeze_regex)
Collaborator: @ethanreidel Would you be interested if I gave you a reasonably well-featured regex utility that you could put into utils and use? It would save a lot of boilerplate like this. Please let me know any time. Thanks!

Contributor (author): Yeah, that sounds good. Thanks!

Collaborator review (@alexsherstinsky):

@ethanreidel I love this work! Of course others need to look at it as well; I made a few largely minor comments and suggestions (and raised a few questions) for your consideration. More examples (e.g., in the Ludwig docs) would be wonderful!

Thank you very much!

Comment on lines 89 to 99:

    layers_to_freeze_regex: str = schema_utils.String(
        default=None,
        allow_none=True,
        description=(
            "Freeze specific layers based on the provided regex. Freezing specific layers can improve a "
            "pretrained model's performance in a number of ways. At a basic level, freezing early layers can "
            "prevent overfitting by retaining more general features (beneficial for small datasets). It can also "
            "reduce computational resource use and lower overall training time due to fewer gradient calculations."
        ),
    )

Contributor: Instead of putting this in the base trainer config, what if we put it in the ECDTrainerConfig? For now, that will be good enough to ensure this only works for ECD-supported models and is not a valid argument/parameter for GBMs and LLMs. If it also works for LLMs, we can then duplicate it in FineTuneTrainerConfig, which is used by LLMs. It also means we don't have to modify any other trainers for now.

model = encoder_class()

for name, _ in model.named_parameters():
    print(name)
Contributor: We generally don't like to use prints in Ludwig code -- can we use logger.info() instead?

    import logging

    logger = logging.getLogger(__name__)
    logger.info("message")

        pattern = re.compile(config.layers_to_freeze_regex)
    except re.error:
        logger.error(f"Invalid regex input: {config.layers_to_freeze_regex}")
        exit()
Contributor: Instead of exit(), let's raise a RuntimeError() with the same message.

In fact, here's a thought: we could move this check earlier in the code path, to config-validation time. Specifically, you can create a __post_init__() hook for ECDTrainerConfig and FineTuneTrainerConfig that tries re.compile() and, if it fails, throws a ConfigValidationError with the error message. That way, we don't have to wait for all of preprocessing to finish before catching this error.

Here's an example of the same idea in a different part of the Ludwig code path: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/schema/llms/peft.py#L443
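A sketch of how that validation-time hook might look; the ConfigValidationError class and the pared-down config dataclass are stand-ins for Ludwig's actual schema machinery, not its real API:

```python
import re
from dataclasses import dataclass
from typing import Optional


class ConfigValidationError(ValueError):
    """Stand-in for Ludwig's config-validation error type."""


@dataclass
class ECDTrainerConfigSketch:
    """Pared-down stand-in for ECDTrainerConfig."""

    layers_to_freeze_regex: Optional[str] = None

    def __post_init__(self):
        # Fail fast at config-validation time, before preprocessing runs.
        if self.layers_to_freeze_regex is not None:
            try:
                re.compile(self.layers_to_freeze_regex)
            except re.error as e:
                raise ConfigValidationError(
                    f"Invalid regex input: {self.layers_to_freeze_regex}"
                ) from e
```

Constructing the config with an invalid pattern such as "(" now raises immediately instead of surfacing mid-training.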

Comment on lines 423 to 429:

    matched = False
    for name, p in model.named_parameters():
        if re.search(pattern, str(name)):
            p.requires_grad = False
            matched = True
    if not matched:
        logger.error(f"No regex match for {config.layers_to_freeze_regex}! Check layer names and regex syntax.")
Contributor: Two thoughts here:

  1. Instead of logger.error, perhaps we can use logger.warning()? I think it's okay if there are no matches; we just want to flag it as a warning so users notice it, as opposed to calling it an error (which it sort of is).
  2. One thing that could be very useful here is to also build the set of all layers where the regex search returns true and requires_grad gets set to False, and then log that full list. Observability can be super helpful!
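Both suggestions could be folded together roughly as follows; freeze_layers_regex_sketch is a hypothetical helper, not the PR's actual function:

```python
import logging
import re

import torch.nn as nn

logger = logging.getLogger(__name__)


def freeze_layers_regex_sketch(pattern_str: str, model: nn.Module) -> list:
    """Freeze matching parameters and return their names for observability."""
    pattern = re.compile(pattern_str)
    frozen = []
    for name, param in model.named_parameters():
        if pattern.search(name):
            param.requires_grad = False
            frozen.append(name)
    if not frozen:
        # Surface a non-match as a warning rather than an error.
        logger.warning(f"No regex match for {pattern_str}! Check layer names and regex syntax.")
    else:
        # Log the full list of frozen layers, per the observability suggestion.
        logger.info(f"Froze {len(frozen)} parameters: {frozen}")
    return frozen
```

Returning the frozen names also makes the behavior easy to assert on in tests.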

Contributor review (@arnavgarg1):

Overall, really nice work and a clean implementation, @ethanreidel! I left a few suggestions that might help simplify edge cases, as well as considerations for adding more observability into which layers are frozen.

Contributor: I would also recommend installing pre-commit via pip install pre-commit, then running pre-commit install within the Ludwig repo. That will help fix some of the pre-commit-related styling errors here: https://results.pre-commit.ci/run/github/163346054/1714080170.FEtyFVFyR8m3t6xqARcnDQ


Successfully merging this pull request may close these issues:

  - Option to unfreeze encoder layers when training a computer vision model

5 participants