
Support for freezing pretrained vision model layers with regex #3981

Open · wants to merge 35 commits into master
Conversation

@ethanreidel (Contributor) commented Apr 2, 2024

Allows the user to input a regular expression in the YAML config which freezes specific layers of a pretrained model. Adds a new CLI option, "pretrained_summary", that lets users print string representations of a model's layers so they can target them with a regex. Currently all pretrained torchvision models are accessible.

trainer:
  layers_to_freeze_regex: (regex here)

ludwig pretrained_summary -m (model name here)

(I am aware that the collect_summary CLI command is similar; however, it only accepts a preexisting directory, so I thought a separate command that strictly outputs layer names was appropriate for this feature.)

Closes #3733

Future plans: expand this capability to implement gradual unfreezing.

Test: pytest tests/ludwig/modules/test_regex_freezing.py
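As a rough sketch of the mechanism described above -- the per-parameter regex match is the feature's core idea, but the toy model and pattern below are illustrative stand-ins, not the PR's actual code:

```python
import re

import torch.nn as nn

# Toy stand-in for a pretrained torchvision model (avoids a weights download).
model = nn.Sequential()
model.add_module("conv1", nn.Conv2d(3, 8, 3))
model.add_module("layer1", nn.Linear(8, 8))
model.add_module("fc", nn.Linear(8, 2))

# Freeze every parameter whose name matches the regex, as a
# trainer-level layers_to_freeze_regex option would.
pattern = re.compile(r"conv1|layer1")
for name, param in model.named_parameters():
    if pattern.search(name):
        param.requires_grad = False

frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
print(frozen)  # ['conv1.weight', 'conv1.bias', 'layer1.weight', 'layer1.bias']
```

Only the head (fc) stays trainable here, which matches the small-dataset fine-tuning scenario the description mentions.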

github-actions bot commented Apr 2, 2024

Unit Test Results

6 files ±0 · 6 suites ±0 · 57m 8s ⏱️ (+42m 47s)
2 997 tests (+2 985): 2 967 ✔️ (+2 958) · 23 💤 (+20) · 7 failed (+7)
8 991 runs (+8 931): 8 909 ✔️ (+8 867) · 69 💤 (+51) · 13 failed (+13)

For more details on these failures, see this check.

Results for commit 1feb853. ± Comparison against base commit 4b07ce4.

This pull request removes 4 tests and adds 2 989 tests. Note that renamed tests count towards both.
tests.regression_tests.model.test_old_models ‑ test_model_loaded_from_old_config_prediction_works
tests.regression_tests.model.test_old_models ‑ test_predict_deprecated_model[respiratory]
tests.regression_tests.model.test_old_models ‑ test_predict_deprecated_model[titanic]
tests.regression_tests.model.test_old_models ‑ test_predict_deprecated_model[twitter_bots]
tests.ludwig.accounting.test_used_tokens ‑ test_get_used_tokens_for_ecd
tests.ludwig.accounting.test_used_tokens ‑ test_get_used_tokens_for_ecd_no_targets
tests.ludwig.accounting.test_used_tokens ‑ test_get_used_tokens_for_gbm
tests.ludwig.accounting.test_used_tokens ‑ test_get_used_tokens_for_llm
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_image_augmentation[augmentation_pipeline_ops0]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_image_augmentation[augmentation_pipeline_ops1]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_image_augmentation[augmentation_pipeline_ops2]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_invalid_augmentation_parameters[None]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_invalid_augmentation_parameters[augmentation_pipeline_ops1]
tests.ludwig.augmentation.test_augmentation_pipeline ‑ test_invalid_augmentation_parameters[augmentation_pipeline_ops2]
…
This pull request skips 2 tests.
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[ames_housing.gbm.yaml]
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[mercedes_benz_greener.gbm.yaml]

♻️ This comment has been updated with latest results.

ludwig/cli.py (Outdated):

    from ludwig.utils import pretrained_summary

    pretrained_summary.cli_summarize_pretrained(sys.argv[2:])

Collaborator: It might be good to add some example runs and outputs to the docs.

Collaborator: @ethanreidel To second @skanjila -- would it be possible to show an example of running this command, how it differs from the existing one, and an example of its output? Thank you very much.

Contributor (author): Certainly.

Contributor (author): Quick question: when you say you'd like an example, do you mean an example in the Ludwig docs, or how would you prefer it?

Contributor: @ethanreidel One option is to create an example in the examples/ top-level directory in Ludwig.

ludwig/schema/metadata/configs/trainer.yaml (Outdated, resolved)

    from ludwig.contrib import add_contrib_callback_args
    from ludwig.globals import LUDWIG_VERSION
    from ludwig.utils.print_utils import print_ludwig

Collaborator: Wait, are we really supporting all of these models? I thought we were just going to go out the door with a couple of models to start.

Contributor (author): For this specific feature (simple regex freezing), as long as you have access to the string representation of the layers and the actual model architecture, you can freeze any layers you'd like. Adding support for all torchvision models took no extra work beyond adding to this list. I don't like the look of this long model array, though.

Collaborator: @ethanreidel While it looks like this will be supported for torchvision, what about text/LLMs? (This is related to my previous comment in the Trainers section.) Thanks!

Collaborator:

> I however don't like the look of this long model array though

@ethanreidel Sorry, could you please point me to this "long model array"? Which line in your code has it? Thanks!

Contributor (author): For your first question, Alex: as long as the model layers and their requires_grad parameters are accessible, this feature should in theory work on LLMs/text. I'm not too familiar with LLM architecture and will have to do some quick checks, but I'm 99% sure it's an easy addition. Second question: in a previous commit, I had a fairly hacky solution where users had another command-line option (under pretrained_summary) that listed all available model names. Those names were stored in a Python list, which had a few issues, namely having to expand it regularly and many lines of unnecessary code. Saad made a good point and said to remove it entirely (it was not needed), so it's no longer there.

Contributor (author): Just checked, and sure enough you can apply the same regex freezing technique to an LLM.

Contributor:

> Just checked, and sure enough you can apply the same regex freezing technique to an LLM.

@ethanreidel That's awesome. Maybe we can then use one of my earlier comments and only add this parameter to the ECDTrainerConfig and FineTuneTrainerConfig for now.

As part of the examples you have, it would be good to create 2 example Python files:

  1. To show how to use it with a computer vision model
  2. To show how to use it with an LLM base model

What do you think?
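A hedged sketch of what the LLM-side example might look like; the module names below (h, attn, mlp, lm_head) merely mimic GPT-style parameter paths and are assumptions, not any specific model's layout:

```python
import re

import torch.nn as nn


class TinyBlock(nn.Module):
    """Stand-in for one transformer block (attention + MLP)."""

    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(4, 4)
        self.mlp = nn.Linear(4, 4)


class TinyLM(nn.Module):
    """Stand-in for an LLM base model; real parameter names vary by model."""

    def __init__(self):
        super().__init__()
        self.h = nn.ModuleList([TinyBlock() for _ in range(3)])
        self.lm_head = nn.Linear(4, 10)


model = TinyLM()

# Freeze blocks 0 and 1, leaving the last block and the head trainable.
pattern = re.compile(r"^h\.[01]\.")
for name, param in model.named_parameters():
    if pattern.search(name):
        param.requires_grad = False

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

The mechanism is identical to the computer-vision case; only the regex and the parameter names change.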



def freeze_layers_regex(config: "BaseTrainerConfig", model: ECD) -> None:
    """Freezes layers based on provided regular expression."""
Collaborator: Let's also add comments covering the inputs and outputs.

Collaborator: @ethanreidel I think that if you put from __future__ import annotations as the very first line in the module, you would not need to quote the types. Would you like to give it a try and see if it works? Thanks!
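For context, the suggestion refers to PEP 563 (postponed evaluation of annotations): with the future import, annotations are stored as strings and never evaluated at definition time, so forward references need no manual quoting. A minimal illustration:

```python
from __future__ import annotations


# With postponed evaluation (PEP 563), these annotations are never
# evaluated when the function is defined, so BaseTrainerConfig and ECD
# need no quoting even though they are not imported in this sketch.
def freeze_layers_regex(config: BaseTrainerConfig, model: ECD) -> None:
    pass


# The annotations are preserved as plain strings:
print(freeze_layers_regex.__annotations__["config"])  # BaseTrainerConfig
```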

Contributor (author): I tested the annotations import and it worked, but the git pre-commit hook was forcing changes (e.g., converting all uppercase Dicts to lowercase dicts) that I didn't like.

Contributor: @ethanreidel I think that makes sense.

Are you able to expand the docstring for this function? Also, if it supports LLMs too, can we make model a union of ECD and LLM?

Contributor (author): @saad-palapa

ludwig/utils/pretrained_summary.py (Outdated, resolved)
ludwig/utils/trainer_utils.py (Outdated, resolved)
@@ -225,6 +227,10 @@ def prepare(self):
        base_learning_rate *= lr_scale_fn(self.distributed.size())
        self.base_learning_rate = base_learning_rate

        # If a regex is supplied, freeze the matching layers
Collaborator: @ethanreidel Question: by placing this capability in the top-level trainer.py module, we imply that all subclasses inherit it. For computer vision, based on yours and others' work, it will work and we believe it will be a useful capability. But what can we say about trainers such as FineTuneTrainer (for LLM fine-tuning), NoneTrainer (for LLM predictions), and the GBM Trainer (albeit not as popular nowadays)? In particular, I am curious whether it will work for LLM architectures and, if so, how you think we can take advantage of it. Thanks!

Contributor: @alexsherstinsky This wouldn't be particularly useful for the NoneTrainer, but it could be useful for the FineTuneTrainer. I left a comment above that might help simplify this by only enabling this parameter for the ECDTrainerConfig for now, which would make it a no-op for LLMs and GBMs entirely.

def freeze_layers_regex(config: "BaseTrainerConfig", model: ECD) -> None:
    """Freezes layers based on provided regular expression."""
    try:
        pattern = re.compile(config.layers_to_freeze_regex)
Collaborator: @ethanreidel Would you be interested if I gave you a reasonably well-featured regex utility that you could put into utils and use? It would save a lot of boilerplate like this. Please let me know any time. Thanks!

Contributor (author): Yeah, that sounds good. Thanks!

Collaborator review (@alexsherstinsky):

@ethanreidel I love this work! Of course others need to look at it as well; I made a few largely minor comments and suggestions (and raised a few questions) for your consideration. More examples (e.g., in the Ludwig docs) would be wonderful!

Thank you very much!

Comment on lines 89 to 99:

    layers_to_freeze_regex: str = schema_utils.String(
        default=None,
        allow_none=True,
        description=(
            "Freeze specific layers based on the provided regex. Freezing specific layers can improve a "
            "pretrained model's performance in a number of ways. At a basic level, freezing early layers can "
            "prevent overfitting by retaining more general features (beneficial for small datasets). It can also "
            "reduce computational resource use and lower overall training time due to fewer gradient calculations."
        ),
    )

Contributor: Instead of putting this in the base trainer config, what if we put it in the ECDTrainerConfig? For now, that will be good enough to ensure this only works for ECD-supported models and is not a valid argument/parameter for GBMs and LLMs. If it also works for LLMs, we can then duplicate it in FineTuneTrainerConfig, which is used by LLMs. It also means we don't have to modify any other trainers for now.

model = encoder_class()

for name, _ in model.named_parameters():
    print(name)
Contributor: We generally don't like to use prints in Ludwig code -- can we use logger.info() instead?

    import logging

    logger = logging.getLogger(__name__)
    logger.info("message")

        pattern = re.compile(config.layers_to_freeze_regex)
    except re.error:
        logger.error(f"Invalid regex input: {config.layers_to_freeze_regex}")
        exit()
Contributor: Instead of exit(), let's raise a RuntimeError() with the same message.

In fact, here's a thought: we could move this check earlier in the code path, to config-validation time. Specifically, you can create a __post_init__() hook for ECDTrainerConfig and FineTuneTrainerConfig that tries re.compile() and, if it fails, throws a ConfigValidationError with the error message. That way, we don't have to wait for all of preprocessing to finish before catching this error.

Here's an example of the same idea in a different part of the Ludwig code path: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/schema/llms/peft.py#L443
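A sketch of how that validation-time hook might look; the ConfigValidationError class and the pared-down config dataclass are stand-ins for Ludwig's actual schema machinery, not its real API:

```python
import re
from dataclasses import dataclass
from typing import Optional


class ConfigValidationError(ValueError):
    """Stand-in for Ludwig's config-validation error type."""


@dataclass
class ECDTrainerConfigSketch:
    """Pared-down stand-in for ECDTrainerConfig."""

    layers_to_freeze_regex: Optional[str] = None

    def __post_init__(self):
        # Fail fast at config-validation time, before preprocessing runs.
        if self.layers_to_freeze_regex is not None:
            try:
                re.compile(self.layers_to_freeze_regex)
            except re.error as e:
                raise ConfigValidationError(
                    f"Invalid regex input: {self.layers_to_freeze_regex}"
                ) from e
```

Constructing the config with an invalid pattern such as "(" now raises immediately instead of surfacing mid-training.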

Comment on lines 423 to 429:

    matched = False
    for name, p in model.named_parameters():
        if re.search(pattern, str(name)):
            p.requires_grad = False
            matched = True
    if not matched:
        logger.error(f"No regex match for {config.layers_to_freeze_regex}! Check layer names and regex syntax.")
Contributor: Two thoughts here:

  1. Instead of logger.error, perhaps we can use logger.warning()? I think it's okay if there are no matches; we just want to flag it as a warning so users notice it, as opposed to calling it an error (which it sort of is).
  2. One thing that could be very useful here is to also build the set of all layers where the regex search returns true and requires_grad gets set to False, and then log that full list. Observability can be super helpful!
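Both suggestions could be folded together roughly as follows; freeze_layers_regex_sketch is a hypothetical helper, not the PR's actual function:

```python
import logging
import re

import torch.nn as nn

logger = logging.getLogger(__name__)


def freeze_layers_regex_sketch(pattern_str: str, model: nn.Module) -> list:
    """Freeze matching parameters and return their names for observability."""
    pattern = re.compile(pattern_str)
    frozen = []
    for name, param in model.named_parameters():
        if pattern.search(name):
            param.requires_grad = False
            frozen.append(name)
    if not frozen:
        # Surface a non-match as a warning rather than an error.
        logger.warning(f"No regex match for {pattern_str}! Check layer names and regex syntax.")
    else:
        # Log the full list of frozen layers, per the observability suggestion.
        logger.info(f"Froze {len(frozen)} parameters: {frozen}")
    return frozen
```

Returning the frozen names also makes the behavior easy to assert on in tests.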

Contributor review (@arnavgarg1):

Overall, really nice work and a clean implementation, @ethanreidel! I left a few suggestions that might help simplify edge cases, as well as considerations for adding more observability into which layers are frozen.

Contributor: I would also recommend installing pre-commit via pip install pre-commit, then running pre-commit install within the Ludwig repo. That will help fix some of the pre-commit-related styling errors here: https://results.pre-commit.ci/run/github/163346054/1714080170.FEtyFVFyR8m3t6xqARcnDQ


Successfully merging this pull request may close these issues:

  - Option to unfreeze encoder layers when training a computer vision model

5 participants