Adding adapters to SpeechBrain (Code from Samsung AI Center Cambridge) #2534

Open
Wants to merge 37 commits into base: develop

Commits (37):
47e3097  shorter augmentations in yaml (Feb 8, 2024)
5ab888a  layout to 80 char (Feb 8, 2024)
a3bf472  listed label replication (Feb 8, 2024)
c86d687  listed label replication (Feb 8, 2024)
761bf93  listed label replication (Feb 8, 2024)
09cfde3  Refact CTC (Feb 8, 2024)
e60396f  Refact transducer (Feb 8, 2024)
d6a5524  Refact seq2seq (Feb 8, 2024)
9daba50  call replicate label instead of duplication (Feb 8, 2024)
6bf2361  refactor aishell (Feb 8, 2024)
7ec92c5  refactor aishell (Feb 8, 2024)
ebae569  CommonLanuage (Feb 8, 2024)
088a0eb  fix error + CV CTC (Feb 8, 2024)
bfb9bc2  Giga OOF (Feb 8, 2024)
21353d5  Giga OOF (Feb 8, 2024)
9971121  Giga OOF (Feb 8, 2024)
f879302  Giga OOF (Feb 8, 2024)
95c5ea4  Giga OOF (Feb 8, 2024)
1b24844  Giga OOF (Feb 8, 2024)
a5a97aa  Giga OOF (Feb 8, 2024)
55904dd  Giga OOF (Feb 8, 2024)
7f366bb  Giga OOF (Feb 8, 2024)
963bda4  Finishing OOF (Feb 8, 2024)
922024a  final touch LULZ (Feb 8, 2024)
819f8c8  fix tests (Feb 8, 2024)
8ade568  Tests??? (Feb 8, 2024)
9e73c10  fix augment in some recipes (mravanelli, Feb 10, 2024)
b2b8f56  merge (Feb 20, 2024)
f0e9f6d  Merge branch 'develop' of https://github.com/TParcollet/speechbrain-r… (Feb 20, 2024)
afd37a1  Merge branch 'develop' of https://github.com/speechbrain/speechbrain … (Feb 22, 2024)
331ff7d  Merge branch 'develop' of https://github.com/speechbrain/speechbrain … (Feb 26, 2024)
81db8cc  Merge branch 'develop' of https://github.com/speechbrain/speechbrain … (Feb 28, 2024)
9ba61e6  Merge branch 'develop' of https://github.com/speechbrain/speechbrain … (Mar 2, 2024)
56b5d3c  Merge branch 'develop' of https://github.com/speechbrain/speechbrain … (Mar 19, 2024)
e4c6f32  Merge branch 'develop' of https://github.com/speechbrain/speechbrain … (Apr 30, 2024)
13f889d  Initial adapter proposal (Apr 30, 2024)
5f3c311  Make sacrifice to the CI mighty spirit (May 1, 2024)
1 change: 1 addition & 0 deletions .dict-speechbrain.txt
@@ -4,6 +4,7 @@
### Compound Words With 1 or 2 letter Words ###
### Jargon ###
### Names ###
Houlsby
### British ###
### Non-English ###

238 changes: 238 additions & 0 deletions speechbrain/lobes/models/Adapters.py
Collaborator: Shouldn't this go in nnet rather than lobes?

Collaborator (Author): Good question. It's unclear, because adapters can be considered "entire models" coming from the literature, but I agree that they can also be seen as small components. I'd be happy if you could help with the get_model-like interface as in PEFT. From your previous PR, I liked that we can rely on the larger Adapter base from PEFT -- I am wondering if there isn't a way to combine both ...

Collaborator (Author): I personally like the fact that, with my function, you can specify exactly which part of the Brain.modules (or any other model) you want to put adapters on. But I'd be happy to see something else.

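For illustration only (not part of the diff): a minimal sketch of the targeted usage described above, assuming the module path proposed in this PR (speechbrain.lobes.models.Adapters); the ModuleDict layout and the rank value are arbitrary placeholders.

import torch

from speechbrain.lobes.models.Adapters import (
    LoRALinear,
    add_adapters_to_linear_in_model,
)

# Hypothetical stand-in for something like asr_brain.modules: adapters are
# added only to the encoder's Linear layers; the decoder is left untouched.
modules = torch.nn.ModuleDict(
    {
        "encoder": torch.nn.Sequential(
            torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64)
        ),
        "decoder": torch.nn.Linear(64, 32),
    }
)
add_adapters_to_linear_in_model(modules["encoder"], LoRALinear, rank=4)

In a typical adapter setup, the pretrained weights would then be frozen so that only the newly added adapter parameters are trained (not shown here).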
@@ -0,0 +1,238 @@
"""The SpeechBrain implementation of various pre-trained model adapters e.g.
LoRA, Houlsby

Authors
* Titouan Parcollet 2024
"""

import torch
import torch.nn as nn

from speechbrain.nnet.activations import Swish


def add_adapters_to_linear_in_model(
model: torch.nn.Module,
adapter_class: torch.nn.Module,
**kwargs,
):
"""Given any torch model, e.g. asr_brain.modules.Transformer, and an adapter
    class, e.g. HoulsbyAdapterLinear, this function replaces every linear
    layer with an instance of the adapter class wrapping the original layer
    (so the pretrained parameters are preserved).

Arguments
---------
model: torch.nn.Module
The base PyTorch model.
    adapter_class: torch.nn.Module
        The class (not an instance) of one of the adapters implemented in
        this library, e.g. HoulsbyAdapterLinear or LoRALinear.
    kwargs: dict
        Additional keyword arguments passed to the adapter constructor.
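
    Example
    -------
    An illustrative sketch (the rank value is arbitrary):

    >>> model = torch.nn.Sequential(torch.nn.Linear(64, 64))
    >>> add_adapters_to_linear_in_model(model, LoRALinear, rank=4)
    >>> isinstance(model[0], LoRALinear)
    True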
"""

for name, module in model.named_modules():
if isinstance(module, nn.Linear):
parent_module, target_name, target_module = get_submodules(
model, name
)
new_module = adapter_class(target_module, **kwargs)
replace_linear(
parent_module, target_name, target_module, new_module
)


class HoulsbyAdapterLinear(nn.Module):
"""This class implements the Houlsby Adapter as described in:
'Parameter-Efficient Transfer Learning for NLP'
https://arxiv.org/abs/1902.00751

    Arguments
    ---------
    target_linear: torch.nn.Module
        Module corresponding to the pretrained Linear that will be wrapped
        with this adapter. The adapter input and output sizes are derived
        from this layer.
    projection_size : int
        Size of the bottleneck projection layer (usually much smaller).
    activation : torch.nn.Module
        The activation function. Default is Swish.
    bias : bool
        Whether to use biases in the adapter's linear projections.

Example
-------
>>> import torch
>>> x = torch.rand((8, 60, 64))
>>> base_linear = torch.nn.Linear(64,64)
>>> adapt = HoulsbyAdapterLinear(base_linear, 8)
>>> output = adapt(x)
>>> output.shape
torch.Size([8, 60, 64])
"""

def __init__(
self,
target_linear,
projection_size,
activation=Swish,
bias=True,
):
super().__init__()

output_size = target_linear.weight.data.shape[0]

self.pretrained_linear = target_linear
self.adapter_down_proj = nn.Linear(
output_size, projection_size, bias=bias
)
self.adapter_up_proj = nn.Linear(
projection_size, output_size, bias=bias
)
self.activation = activation()

if bias:
self.adapter_down_proj.bias.data.fill_(0.0)
self.adapter_up_proj.bias.data.fill_(0.0)

def forward(
self,
x: torch.Tensor,
):
"""Applies the HoulsbyAdapter to an input tensor `x`.

Arguments
---------
x: torch.Tensor
Input tensor to the adapter module. Shape: [B, Time, X]
"""

x_pretrained = self.pretrained_linear(x)

return (
self.adapter_up_proj(
self.activation(self.adapter_down_proj(x_pretrained))
)
+ x_pretrained
)


class LoRALinear(nn.Module):
"""This class implements the LoRA Adapter as described in:
'LoRA: Low-Rank Adaptation of Large Language Models'
https://arxiv.org/abs/2106.09685

    Arguments
    ---------
    target_linear: torch.nn.Module
        Module corresponding to the pretrained Linear that will be wrapped
        with this adapter. The adapter input and output sizes are derived
        from this layer.
    rank : int
        Rank of the low-rank decomposition (usually much smaller than the
        layer size). Default is 16.
    alpha : float
        Value used to control the scaling in LoRA (the update is scaled by
        alpha / rank). Default is 1.0.

Example
-------
>>> import torch
>>> x = torch.rand((8, 60, 64))
>>> base_linear = torch.nn.Linear(64,64)
    >>> adapt = LoRALinear(base_linear, rank=4)
>>> output = adapt(x)
>>> output.shape
torch.Size([8, 60, 64])
"""

def __init__(
self,
target_linear,
rank=16,
alpha=1.0,
):
super().__init__()

input_size = target_linear.weight.data.shape[1]
output_size = target_linear.weight.data.shape[0]

self.pretrained_linear = target_linear

self.adapter_down_proj = nn.Linear(input_size, rank, bias=False)
self.adapter_up_proj = nn.Linear(rank, output_size, bias=False)

self.scaling = alpha / rank
self.adapter_up_proj.weight.data.fill_(0.0)

def forward(
self,
x: torch.Tensor,
):
"""Applies the LoRA Adapter.

Arguments
---------
x: torch.Tensor
Input tensor to the adapter module. Shape: [B, Time, X]
"""
x_pretrained = self.pretrained_linear(x)
x_lora = self.adapter_up_proj(self.adapter_down_proj(x)) * self.scaling

return x_pretrained + x_lora


def replace_linear(
parent_module: torch.nn.Module,
name: str,
old_linear: torch.nn.Module,
new_module: torch.nn.Module,
):
    """Replace a linear layer with a new module by assigning the new module
    to the parent module's attribute. This is used to replace Linear layers
    with an adapter wrapped around the original layer, so that the old
    parameters are preserved and new ones are added.

Arguments
---------
parent_module: torch.nn.Module
Parent module for the old module.
name: str
Name of the child module.
old_linear: torch.nn.Module
Module corresponding to the old linear layer.
new_module: torch.nn.Module
New module made of the old linear plus the new parameters.
"""
    """

device = old_linear.weight.device
setattr(parent_module, name, new_module)

new_module.weight = old_linear.weight
if hasattr(old_linear, "bias") and old_linear.bias is not None:
new_module.bias = old_linear.bias

new_module.to(device)


def get_submodules(model: torch.nn.Module, name: str):
"""Get the parent module, the target name as well as the target module
given a torch.nn.Module and a name (obtained from .named_modules()). We use
this function to get the parent node of a given module that we want to
replace with something else (e.g. an adapter).

Arguments
---------
model: torch.nn.Module
The base PyTorch model.
name: str
Name of the child module to look for in the model.
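
    Example
    -------
    An illustrative sketch with a nested module:

    >>> model = torch.nn.Sequential(torch.nn.Sequential(torch.nn.Linear(8, 8)))
    >>> parent, name, target = get_submodules(model, "0.0")
    >>> name
    '0'
    >>> isinstance(target, torch.nn.Linear)
    True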
"""
parent_module = model.get_submodule(".".join(name.split(".")[:-1]))
target_name = name.split(".")[-1]
target_module = model.get_submodule(name)
return parent_module, target_name, target_module