Integrate X-LoRA #1491
Conversation
Let me know once this is ready for review.
Hi @BenjaminBossan, I think this is ready for review.
Thanks so much for working on this PR. I did not do an in-depth review yet, but from what I saw it already looks very solid.
Before working on the details, I think it's best to get the overall design polished. For this, I still have a few questions, please check out my comments. Together, I'm sure we can figure out a way to simplify some of the code.
Also, it would really help to have examples and/or tests to see X-LoRA in action. Do you have something that you could add? Later, we should also work on documentation, but this can wait for now.
Edit: For other readers, here is the paper link: https://arxiv.org/abs/2402.07148
src/peft/tuners/xlora/classifier.py
Outdated
```python
npy = result.numpy()
numpy.save(file, npy)

def flush_log_scalings(self, path: str):
```
Could you explain why we need this? Is this just for debugging/monitoring purposes?
This API enables the user to get a log of the scalings. It is useful for generating visualizations such as this.
I see. In that case, there should be some kind of example provided to illustrate how to use this, I think otherwise users will not discover this feature.
Also, I'd suggest for this method to only return the `indices_map` and allow the caller to decide themselves if they want to save it on disk as JSON or do something else with it.
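To make the suggestion concrete, here is a hypothetical sketch (the class and method names are illustrative stand-ins, not PEFT's actual API) of a scalings log that returns the collected data and leaves persistence entirely to the caller:

```python
class ScalingsLogSketch:
    """Hypothetical stand-in for the classifier's scalings log."""

    def __init__(self):
        self._log = []

    def record(self, scalings):
        # Called once per forward pass with the per-adapter scalings.
        self._log.append(list(scalings))

    def get_scalings_log(self):
        # Return the raw data; the caller decides whether to dump it to
        # disk (e.g. as JSON or .npy) or to visualize it directly.
        return list(self._log)


log = ScalingsLogSketch()
log.record([0.25, 0.75])
log.record([0.40, 0.60])
data = log.get_scalings_log()
# A caller could now persist it, e.g.:
#   json.dump(data, open("scalings.json", "w"))
```

This keeps a single accessor on the classifier instead of one method per output format.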
Wouldn't returning the `indices_map` (or maybe the `seqlens_map` would be better, as it contains the tensors) make `flush_log_scalings` redundant to `get_scalings_log`? Of course, a dedicated method to calculate the `indices_map` may be helpful, and given the addition of such a method I think removing `flush_log_scalings` would be OK.
Additionally, there is actually a wrapper method on `XLoraModel` with the rest of the API which contains a docstring. This way, it should be easy to find. The method in the classifier is actually internal, so I have prefixed it with `_` now.
src/peft/peft_model.py
Outdated
```diff
@@ -149,7 +155,7 @@ def active_adapters(self) -> list[str]:
     adapters = self.active_adapter
     if isinstance(adapters, str):
         adapters = [adapters]
-    return adapters
+    return list(filter(lambda x: len(x) > 0, adapters))
```
Not sure why this is needed. Is this because of `self.active_adapter = ""`? Could you please explain the reason behind that?
During some of the testing I did, `self.active_adapters` would return a function instead of executing the `@property`. Through debugging, I discovered that it was somehow connected to `self.active_adapter` not being set, as no adapter is initially loaded.
However, suppose that I could allow the default adapter to be loaded, and then delete it later. That should make those changes obsolete. I have pushed a commit to hopefully resolve this.
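As a self-contained illustration of the diff under discussion, this sketch reproduces the filtering of empty adapter names in plain Python (no PEFT imports; the function name mirrors the property being changed):

```python
def active_adapters(active_adapter):
    # Mirrors the changed property: normalize to a list, then drop
    # empty-string adapter names such as the "" placeholder.
    adapters = active_adapter
    if isinstance(adapters, str):
        adapters = [adapters]
    return list(filter(lambda x: len(x) > 0, adapters))


print(active_adapters(""))              # → []
print(active_adapters("default"))       # → ['default']
print(active_adapters(["a", "", "b"]))  # → ['a', 'b']
```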
src/peft/peft_model.py
Outdated
```diff
@@ -123,7 +129,7 @@ def __init__(self, model: PreTrainedModel, peft_config: PeftConfig, adapter_name
     else:
         self._peft_config = None
     cls = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type]
-    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
+    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name, self)
```
I really want to avoid this, as this looks like a mixing of concerns. Surely, we can figure out a better way. Could you explain why this was needed?
This was needed because in `XLoraModel.__init__` we load adapters into the `PeftModel` via `load_adapter` and for use by the `XLoraClassifier`. This way, automatic loading is achieved. However, all that matters is that we are able to get some reference to the `PeftModel` into the constructor.
Perhaps we could add some code after we call `cls(model, ...)` and check if `self.base_model` is an `XLoraModel`. Then, we could call a `__post_init__` which would take the `PeftModel` self and do the adapter loading. Would this be more elegant?
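A minimal sketch of this alternative (all class names here are hypothetical stand-ins, not the real PEFT classes): the wrapper checks the tuner type after construction and hands itself over in a separate post-init step, so the tuner constructor signature stays untouched:

```python
class TunerSketch:
    """Stand-in for a generic tuner such as LoraModel."""

    def __init__(self, model):
        self.model = model


class XLoraTunerSketch(TunerSketch):
    """Stand-in for XLoraModel: the only tuner needing a back reference."""

    def __init__(self, model):
        super().__init__(model)
        self.wrapper = None

    def post_init(self, wrapper):
        # Late binding: only now does the tuner get the back reference,
        # which it could use e.g. to trigger adapter loading.
        self.wrapper = wrapper


class PeftModelSketch:
    """Stand-in for PeftModel."""

    def __init__(self, model, tuner_cls):
        self.base_model = tuner_cls(model)
        # Only X-LoRA-style tuners receive the back reference.
        if isinstance(self.base_model, XLoraTunerSketch):
            self.base_model.post_init(self)


pm = PeftModelSketch(model=object(), tuner_cls=XLoraTunerSketch)
print(pm.base_model.wrapper is pm)  # → True
```

The constructor of every other tuner is left unchanged, which avoids the mixing of concerns in the diff above.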
I just pushed a commit (c5cdfc3) which is the latest one currently for easy reversion. Does this resolve the concern?
src/peft/tuners/xlora/model.py
Outdated
```python
    model_peft.get_nb_trainable_parameters,
    model_peft.generate,
)
model_peft.save_pretrained = peft_model_wrapper.save_pretrained  # type: ignore
```
I don't think I fully understand this yet. Above, we call:

```python
self.base_model.__xlora_post_init__(model, peft_config, adapter_name)
```

but here it looks like this method requires 4 arguments. What is the `model_peft`, is this the `PeftModel`?

So what I suspect is happening here is that you want to modify the behavior of `generate` and `save_pretrained` of the `PeftModel` without actually modifying `PeftModel`, is this the goal?

When it comes to `generate`, the `PeftModel` calls `generate` on the `base_model` anyway, could we not add the modification at that point?

When it comes to `save_pretrained`, I think we could check if we can somehow make the two approaches work together; I'd need to understand what the custom `save_pretrained` method does differently.

One easy way that we could allow custom `save_pretrained` methods without a lot of changes in `PeftModel` would be something like this:

```python
class PeftModel:
    def save_pretrained(...):
        if hasattr(self.base_model, "_custom_save_pretrained"):
            return self.base_model._custom_save_pretrained(...)
        # same code as right now
```

This way, we only add 2 extra lines in `PeftModel` but would allow the underlying model to implement its own `save_pretrained` method.
src/peft/tuners/xlora/insertion.py
Outdated
```python
# TODO(EricLBuehler): Evaluate effectiveness and performance degradation
self.peft_model.base_model.eval()
if not self.config.use_trainable_adapters:
    for name, param in self.peft_model.base_model.named_parameters():
```
This looks strange to me, `generate` should normally not have anything to do with parameter updates. Could you explain why this is required?
Certainly! We discovered during training that the adapters were being set to trainable after each `generate` call. If you could provide any insight into why that may be the case, it would be great!
This should certainly not happen. Could you please open an issue and provide an example so that we can fix it?

Regarding the state of the PR, it's actually a bit hard for me to tell which comments have been addressed and which haven't (as GH removes them when the line has been changed, even if it may not have been addressed). Regardless of some designs which I think could be improved, I think the most efficient way forward would be if you could provide an example/tests so that I can see X-LoRA in action. This may help answer some questions I still have.
Thank you, I will work on some example code to reproduce the behavior and will raise an issue. Regarding examples of X-LoRA, we have put together some examples using the
I hope this will work as a demonstration of X-LoRA in action; would it be better for me to provide some examples using this PR?
I took a look at these examples, thanks. But let's get something to work based on this PR, even if very basic. This helps me understand what code paths are taken and how the different parts interact. Otherwise, the review is much harder for me. Also, we will need something like this eventually because we want to add tests for X-LoRA. As I said, it can be very basic for a start.
I was able to put together a small example which shows how one would use the API as documented here, which I have attached as a plain text file, as GH does not allow .py files to be attached. This is very basic and just shows creation, generation, and simple API usage. I hope this helps!
I tried to run your example but encountered some problems:
Thank you for trying it out. I ran the following code on a local machine with the latest installation from this branch to test the loading of a normal LoRA adapter, and it seemed to work, as after printing the model I can see the adapter modules:

```python
from transformers import AutoModelForCausalLM, OPTForCausalLM, AutoTokenizer
from peft import LoraConfig

model_id = "HuggingFaceH4/zephyr-7b-beta"
model = AutoModelForCausalLM.from_pretrained(model_id)
lora_config = LoraConfig(
    target_modules=[
        "q_proj",
        "gate_proj",
        "o_proj",
        "v_proj",
        "k_proj",
    ],
    init_lora_weights=False,
)
model.add_adapter(lora_config, adapter_name="adapter_1")
print(model)
```

Strangely, when I printed out the model in the X-LoRA test script, it showed proper injection of adapters as well as the classifier. To begin fixing this, could you please provide a minimum reproducible example for the LoRA adapter loading so that I can find the error?
Using your exact branch with the merge conflict removed, when I run your script with this slight modification, I get the issue of no LoRA weights being applied:

```diff
- model.add_adapter(lora_config, adapter_name="adapter_1")
+ from peft import get_peft_model
+ model = get_peft_model(model, lora_config)
```
I found the bug: it was because of a mistake in a
Thanks for fixing this, it should now be working. There is still an issue with a merge conflict being unresolved, as mentioned earlier:
I have now fixed the merge conflict.
Hi @BenjaminBossan, I'm not sure if you have had a chance to look at the updated PR. The merge conflict has been resolved, and I think it is ready for another review. Would you prefer a new PR to be opened as a cleaner slate for further reviews?
Thanks a lot for working on my previous comments. I was busy with some other stuff and only today could do another detailed review of your PR. As I have to go now, it is unfortunately not 100% complete, but I still wanted to give you the feedback so you don't have to wait.
In addition to the individual comments I made, I have a few general questions:
- How is the X-LoRA adapter trained? Could you please provide an example? Eventually, we'll want to move this to unit tests.
- Could you please add the copyright notice to all new modules?
- X-LoRA only really works with transformers language models, right? Can we document this more clearly? Also, do you think it would be possible to make this work with other types of models?
- I'm not a fan of type annotations in the style of `self.inner: nn.ModuleList = nn.ModuleList([])` or `model: nn.Module = self.model`, especially when followed by a `# type: ignore`. Same with the use of `typing.cast`. Is that because your IDE flags the code otherwise? Maybe you could deactivate the type checker for this project, as PEFT isn't really well annotated.
src/peft/peft_model.py
Outdated
```python
if not isinstance(config, XLoraConfig):
    raise TypeError(f"Expected 'XLoraConfig', got '{type(config)}' instead.")

device = infer_device()  # As in PeftModel.load_adapter, torch_device = infer_device(
```
comment is cut off?
src/peft/tuners/xlora/model.py
Outdated
```python
) -> None:
    super().__init__(model, config, adapter_name)

def _xlora_post_init(
```
I wonder if it wouldn't be better to convert this to a standalone function, something like `def post_init_lora(peft_model)`. Not sure if we need all the other arguments, can they not be derived from the `PeftModel` instance?
I'm not sure. Most of those are deeply nested by the time `post_init_lora` is called, so I thought it would increase readability to pass them this way. Would you prefer it if they were accessed through the `PeftModel`?
src/peft/peft_model.py
Outdated
```python
config.device = torch.device(device)

# If we are passed adapters in the kwargs, it is already in the config.
# If no adapters are passed, config.adapters is None
```
Do I understand correctly that this is for the case where we call `save_pretrained` on the X-LoRA model and then load this pretrained model again with `from_pretrained`? The only "new" thing added in that case would be the X-LoRA classifier, right?
Yes, that is correct. I simply load the weights for the classifier here.
src/peft/tuners/lora/layer.py
Outdated
```diff
@@ -434,7 +446,14 @@ def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor:
     x = x.to(lora_A.weight.dtype)

     if not self.use_dora[active_adapter]:
-        result = result + lora_B(lora_A(dropout(x))) * scaling
+        if _xlora_layer is not None:
```
I'm not so happy with this addition to the `LoraLayer`s. It makes reading and understanding them more complex and requires all LoRA layers to be updated (e.g. what about the bnb LoRA layers?).
I couldn't come up with an alternative yet, but I wonder if we could achieve something with wrapping and/or forward hooks. I'll continue to think about this tomorrow but wanted to let you know already in case you have some ideas.
I agree, it is not very easy to read. Perhaps we could implement some sort of hook in the `LoraLayer` so that techniques such as DoRA and X-LoRA could use that instead of modifying the layer source?
Thank you for your review! I have updated the code with your suggestions.
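To illustrate the hook idea discussed above with a toy example (none of these names exist in PEFT; this only sketches the shape of the mechanism): the layer computes its vanilla LoRA delta, then lets registered hooks transform it, so DoRA- or X-LoRA-style scaling can live outside the layer body:

```python
class LoraLayerSketch:
    """Toy layer: forward adds a 'delta' that registered hooks may rewrite."""

    def __init__(self):
        self._delta_hooks = []  # each hook has the signature: delta -> delta

    def register_delta_hook(self, hook):
        self._delta_hooks.append(hook)

    def forward(self, x, delta):
        # Vanilla path computes the delta; hooks get a chance to adjust it
        # before it is added to the residual stream.
        for hook in self._delta_hooks:
            delta = hook(delta)
        return x + delta


layer = LoraLayerSketch()
print(layer.forward(1.0, 2.0))  # vanilla path → 3.0

# An X-LoRA-style hook could scale the delta by a classifier output:
layer.register_delta_hook(lambda d: 0.5 * d)
print(layer.forward(1.0, 2.0))  # hooked path → 2.0
```

With such a mechanism, each technique registers its hook once at injection time instead of branching inside every `LoraLayer.forward` variant.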
Thanks for all the updates. I left a few comments, but let's focus more on the bigger picture and not each detail.
When I tried to run your code, I encountered an error:

```
AttributeError: 'XLoraConfig' object has no attribute 'target_modules'
```

I also think some other parts of the code don't quite work. To avoid this in the future, let's start adding unit tests. Let's start simple and add a new file `tests/test_xlora.py` with a functional test based on the example you posted earlier. It should also contain a test of training the X-LoRA classifier.
Regarding the issue of only working with transformers language models, I think it's fine for the start. We can think of generalizing this in a follow up PR.
Again, thanks a lot for this contribution and your patience.
src/peft/tuners/xlora/classifier.py
Outdated
```python
logits = self.last.forward(hidden_state)

### Repeat to make layerwise scalings if the classifier layer does not
```
Could you please add this explanation as a comment? Thanks.
How is the state of the PR? Let me know if you need any help with it.
Yes, so when you do
I just updated it to use the
I think this should be ready for another review: there have been a significant number of changes made, and the hacking is essentially gone.
Thanks for the updates. I didn't do an in-depth review yet, as the CI is failing. I'll do a proper review once we get it to be green :)

As to why it's failing, part of this seems to be that you use some newer Python type annotation features that are not supported by older Python versions. Your safest bet is to add this to all modules that you added:

```python
from __future__ import annotations
```

I'm not sure if this fully explains what's going on, but let's start with that and see if the CI still complains.
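As a small illustration of why the future import helps (a sketch, not code from this PR; the function is made up): with it, annotations are stored as plain strings rather than evaluated at runtime, so new-style syntax such as `str | list[str]` parses even on Python versions whose runtime does not support it:

```python
from __future__ import annotations


def ensure_list(adapter: str | list[str]) -> list[str]:
    # The annotations above are kept as strings, so the `str | list[str]`
    # union never has to be evaluated when this module is imported.
    if isinstance(adapter, str):
        return [adapter]
    return adapter


print(ensure_list("default"))                  # → ['default']
print(ensure_list(["a", "b"]))                 # → ['a', 'b']
print(ensure_list.__annotations__["adapter"])  # → str | list[str]
```

Without the future import, `str | list[str]` would raise a `TypeError` at import time on Python versions before 3.10.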
src/peft/tuners/lora/model.py
Outdated
```python
peft_config.target_modules = set(
    TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[model_config["model_type"]]
)
if model_config["model_type"] not in TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING:
```
Is this change also needed for X-LoRA?
No, I don't think so. The X-LoRA model doesn't handle target modules at all, as it delegates that to the LoRA layers.
I added
I think the changes in
Ah ok, I think that should be fixed now.
Thanks for the latest fix, but a bunch of tests are still failing. It seems that this is because of this line: Probably, replacing it with Generally, you could locally run
I'm now getting this result, do you think it could be a hardware issue:
However, I don't think any of the changes in this PR would affect that?
My guess is that you're using an older PyTorch version. This should not be related to this PR.
Ok, thanks! After updating,
I think the tests are failing because we are trying to test on the CPU while the tests cast to CUDA. It uses the
Tests are still failing. One fix is very easy, in the
I fixed those tests and added some fake defaults for
Nice @EricLBuehler now we're down to a single failing test:
Ok, I think that's fixed: it was a small bug. It's strange that the tests I ran locally didn't catch it!
Again an error:
This should be fixed by just removing the
How do you run the tests locally?
Paper link: https://arxiv.org/abs/2402.07148

This PR integrates X-LoRA by creating a new tuner model type on the level of `LoraModel`. Please see #1472.

Changes

Although the new model type is a subclass of `LoraModel`, this is only an implementation detail to remove the need for nested `PeftModel`s. In a similar vein of thought, I have updated the signatures of the tuner `__init__` functions to allow the method swapping and ensure the `xLoRAModel` is a tuner and not on the "level" of `PeftModel`.

- `__init__` functions to take a back reference (not used for all but `xLoRAModel`).
- `xLoRAModel` and `xLoRAConfig`.
- `xLoRAModel`.

Status

- `LoraModel` somewhere before layer swapping in `xLoRAModel.__init__`