
implement bidirectional lstm #1035

Merged
merged 23 commits into tracel-ai:main from bidirectional-lstm on Apr 26, 2024
Conversation

wcshds
Contributor

@wcshds wcshds commented Dec 1, 2023

I need a bidirectional LSTM in a CRNN model.

Checklist

  • Confirmed that the `run-checks all` script has been executed.
  • Made sure the book is up to date with changes in this PR.

Comment on lines 53 to 63
if self.bidirectional {
    (input_gate_bw, forget_gate_bw, output_gate_bw, cell_gate_bw) = (
        Some(new_gate()),
        Some(new_gate()),
        Some(new_gate()),
        Some(new_gate()),
    );
} else {
    (input_gate_bw, forget_gate_bw, output_gate_bw, cell_gate_bw) =
        (None, None, None, None);
}
Member

Even if it's a boolean condition, I feel like a match might be easier to read, but using new_gate is a big win.
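
For reference, the match form being suggested could look like this (reusing self.bidirectional and new_gate from the snippet above; illustrative only, not the PR's code):

let (input_gate_bw, forget_gate_bw, output_gate_bw, cell_gate_bw) = match self.bidirectional {
    true => (
        Some(new_gate()),
        Some(new_gate()),
        Some(new_gate()),
        Some(new_gate()),
    ),
    false => (None, None, None, None),
};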

@@ -110,7 +124,7 @@ impl<B: Backend> Lstm<B> {
/// Parameters:
/// batched_input: The input tensor of shape [batch_size, sequence_length, input_size].
/// state: An optional tuple of tensors representing the initial cell state and hidden state.
-/// Each state tensor has shape [batch_size, hidden_size].
+/// Each state tensor has shape [num_directions, batch_size, hidden_size].
Member

I feel like it would be better to have a BiDirectionalLstm module instead of one module that does both. The bi-directional module might just have two Lstm modules instead of duplicating all the gates. We could also make it more Rusty by using Kind types:

pub trait LstmKind<B: Backend> {
    fn forward(...) -> ...;
}

#[derive(Module)]
pub struct UniDirectional<B: Backend> {
    input_gate: GateController<B>,
    forget_gate: GateController<B>,
    output_gate: GateController<B>,
    cell_gate: GateController<B>,
}

#[derive(Module)]
pub struct BiDirectional<B: Backend> {
    forward: Lstm<B, UniDirectional<B>>,
    backward: Lstm<B, UniDirectional<B>>,
}

pub struct Lstm<B: Backend, K: LstmKind<B> = UniDirectional<B>> {
    kind: K,
}

impl<B: Backend, K: LstmKind<B>> Lstm<B, K> {
    pub fn forward(...) -> ... {
        self.kind.forward(...) // static dispatch to the right forward pass depending on the kind
    }
}

impl<B: Backend> LstmKind<B> for UniDirectional<B> {
    fn forward(...) -> ... {
        // uni-directional forward pass
    }
}

impl<B: Backend> LstmKind<B> for BiDirectional<B> {
    fn forward(...) -> ... {
        // bi-directional forward pass
    }
}

This is equivalent to having two different modules for the uni-directional and bi-directional LSTM, but with syntactic sugar so users can use Lstm<B> for uni-directional or Lstm<B, BiDirectional> when they want both directions. This is the same pattern we use for the Tensor API.
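
As a self-contained illustration of the default-type-parameter pattern described above (plain Rust, not burn's actual types or names), the static dispatch works like this:

// Toy illustration: `Wrapper<K>` dispatches statically to its kind, and the
// default type parameter lets `Wrapper` alone mean "uni-directional".
trait Kind {
    fn forward(&self, input: f32) -> f32;
}

struct UniDirectional;
struct BiDirectional;

impl Kind for UniDirectional {
    fn forward(&self, input: f32) -> f32 {
        input // stand-in for the uni-directional pass
    }
}

impl Kind for BiDirectional {
    fn forward(&self, input: f32) -> f32 {
        input * 2.0 // stand-in for running both directions
    }
}

struct Wrapper<K: Kind = UniDirectional> {
    kind: K,
}

impl<K: Kind> Wrapper<K> {
    fn forward(&self, input: f32) -> f32 {
        self.kind.forward(input) // static dispatch, no runtime branching
    }
}

fn main() {
    // `Wrapper` in type position means `Wrapper<UniDirectional>` thanks to the default.
    let uni: Wrapper = Wrapper { kind: UniDirectional };
    let bi = Wrapper { kind: BiDirectional };
    println!("{} {}", uni.forward(1.0), bi.forward(1.0)); // prints "1 2"
}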

Member

This would also make it backward compatible in terms of code, but not in terms of state (Recorder).

Member

@agelas, what are your thoughts?

Contributor

@nathanielsimard I agree that this pattern fits better with the overall approach that is usually taken. I think composing LSTM modules like this is a bit more elegant than having various toggles in the forward pass, and we can reuse most of the prior implementation that way.

Contributor Author

@nathanielsimard The problem I have is that forward() for UniDirectional and BiDirectional would have different function signatures due to the inconsistency in the shape of the state. This doesn't seem easy to resolve, so it's probably best to keep the LSTM implementation unchanged and let users implement bidirectional LSTM or multi-layer LSTM on their own.

Member

Not sure how often bidirectional LSTM is used, but could we have an implementation totally separate from the unidirectional LSTM and just offer both? I do agree that messing with function signatures might get a bit bothersome.

Member

The pattern I'm proposing creates two different types, so there are no problems with the type signature. It's just a way for both types to use the LSTM nomenclature. It's probably easier to start with two different types, then we can add the pattern afterward if it makes sense.

Contributor Author

@nathanielsimard This pattern is a bit tricky for me, so I think it's best for me to create a new type for bidirectional LSTM. I sincerely hope you can improve it if possible. Thank you very much!

Some((cell_state, hidden_state)) => (cell_state, hidden_state),
let [batch_size, seq_length, _] = batched_input.shape().dims;
let mut batched_cell_state =
    Tensor::zeros_device([batch_size, seq_length, self.d_hidden * num_directions], &device);
Contributor

@nathanielsimard Is this necessary? The implementation is already tied to a device via the backend, right?

Contributor Author

@wcshds wcshds Dec 15, 2023

In my understanding, Module::fork only moves the tensors stored in the module's struct fields to the given device. So when creating a new Tensor in forward(), is it necessary to specify the device?

By the way, now I feel that bidirectional LSTM may not be a common requirement for everyone. If someone needs a bidirectional LSTM or a multi-layer LSTM, it's probably best for them to implement it themselves. Additionally, keeping the implementation of Lstm unchanged is essential to avoid breaking compatibility.

Member

The device is necessary when creating a new tensor; otherwise, the new tensor will be created on the default device, but not necessarily on the same device as the module.
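
As a minimal sketch (names and shapes are illustrative, not the PR's code), the device can be read from an input tensor inside forward() so that newly created tensors end up on the same device as the data rather than the default device:

use burn::tensor::{backend::Backend, Tensor};

// Sketch: create a buffer on the same device as the input tensor.
fn make_buffer<B: Backend>(batched_input: &Tensor<B, 3>) -> Tensor<B, 3> {
    let device = batched_input.device();
    let [batch_size, seq_length, d_hidden] = batched_input.dims();
    // Passing `&device` ties the new tensor to the right device; a
    // device-less default would fall back to the default device.
    Tensor::zeros([batch_size, seq_length, d_hidden], &device)
}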


codecov bot commented Dec 20, 2023

Codecov Report

Attention: Patch coverage is 99.45504% with 2 lines in your changes missing coverage. Please review.

Project coverage is 86.51%. Comparing base (2f294c5) to head (56f6d7c).
Report is 3 commits behind head on main.

Files Patch % Lines
crates/burn-core/src/nn/rnn/lstm.rs 99.45% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1035      +/-   ##
==========================================
+ Coverage   86.41%   86.51%   +0.09%     
==========================================
  Files         696      696              
  Lines       81131    81499     +368     
==========================================
+ Hits        70112    70508     +396     
+ Misses      11019    10991      -28     


Comment on lines 212 to 219
input_gate: GateController<B>,
forget_gate: GateController<B>,
output_gate: GateController<B>,
cell_gate: GateController<B>,
input_gate_reverse: GateController<B>,
forget_gate_reverse: GateController<B>,
output_gate_reverse: GateController<B>,
cell_gate_reverse: GateController<B>,
Member

I guess it wasn't possible to use two Lstm modules here?
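
For reference, a composition along those lines could look roughly like the sketch below. It reuses the existing Lstm module and its (cell states, hidden states) return tuple as documented above, and reverses the sequence dimension with arange + select. Module paths, the Module derive, and the reversal helper are assumptions for illustration, not the PR's actual implementation.

use burn::module::Module;
use burn::nn::Lstm;
use burn::tensor::{backend::Backend, Int, Tensor};

/// Illustrative only: a bidirectional LSTM composed of two `Lstm` modules.
#[derive(Module, Debug)]
pub struct BiLstmSketch<B: Backend> {
    fwd: Lstm<B>,
    bwd: Lstm<B>,
}

impl<B: Backend> BiLstmSketch<B> {
    /// Returns hidden states of shape `[batch_size, seq_length, 2 * hidden_size]`.
    pub fn forward(&self, input: Tensor<B, 3>) -> Tensor<B, 3> {
        let [_, seq_length, _] = input.dims();
        let reversed = reverse_seq(input.clone(), seq_length);

        // `Lstm::forward` returns (cell states, hidden states); keep the hidden states.
        let (_, hidden_fwd) = self.fwd.forward(input, None);
        let (_, hidden_bwd) = self.bwd.forward(reversed, None);

        // Flip the backward outputs back so both directions are aligned per
        // time step, then concatenate along the feature dimension.
        Tensor::cat(vec![hidden_fwd, reverse_seq(hidden_bwd, seq_length)], 2)
    }
}

/// Reverses dimension 1 (the sequence dimension) using `arange` + `select`.
fn reverse_seq<B: Backend>(x: Tensor<B, 3>, seq_length: usize) -> Tensor<B, 3> {
    let device = x.device();
    // Indices seq_length - 1, ..., 1, 0.
    let indices = Tensor::<B, 1, Int>::arange(0..seq_length as i64, &device)
        .mul_scalar(-1)
        .add_scalar(seq_length as i64 - 1);
    x.select(1, indices)
}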

@wcshds wcshds reopened this Dec 21, 2023
@wcshds wcshds marked this pull request as draft January 12, 2024 10:32
@wcshds
Contributor Author

wcshds commented Jan 12, 2024

I think the current implementation of bidirectional LSTM works during inference, but it cannot update the model's parameters during backpropagation due to #1098. Using Tensor::cat instead of Tensor::slice_assign would solve this problem, but it seems to have a negative performance impact, so I think it's best to wait for the Tensor::slice_assign bug to be resolved for now.
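
To make the trade-off concrete, here is a rough, illustrative sketch of the two ways of collecting per-time-step hidden states (names and shapes are mine, not the PR's code): writing into a preallocated buffer with slice_assign versus concatenating with Tensor::cat.

use burn::tensor::{backend::Backend, Tensor};

/// Collects per-step outputs by writing into a preallocated buffer
/// (the `slice_assign` approach affected by the autodiff issue above).
fn collect_with_slice_assign<B: Backend>(
    steps: Vec<Tensor<B, 2>>, // each step: [batch_size, hidden_size]
    batch_size: usize,
    hidden_size: usize,
) -> Tensor<B, 3> {
    let device = steps[0].device();
    let seq_length = steps.len();
    let mut buffer: Tensor<B, 3> =
        Tensor::zeros([batch_size, seq_length, hidden_size], &device);
    for (t, step) in steps.into_iter().enumerate() {
        buffer = buffer.slice_assign(
            [0..batch_size, t..t + 1, 0..hidden_size],
            step.unsqueeze_dim(1), // [batch_size, 1, hidden_size]
        );
    }
    buffer
}

/// Collects per-step outputs by concatenating along the sequence dimension.
fn collect_with_cat<B: Backend>(steps: Vec<Tensor<B, 2>>) -> Tensor<B, 3> {
    let unsqueezed: Vec<Tensor<B, 3>> = steps
        .into_iter()
        .map(|step| step.unsqueeze_dim(1))
        .collect();
    Tensor::cat(unsqueezed, 1)
}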

@antimora antimora added the enhancement Enhance existing features label Jan 31, 2024
@antimora antimora added the stale The issue or pr has been open for too long label Feb 24, 2024
@antimora
Collaborator

Closing this ticket and linking it to #1537 so someone else can pick it up.

@antimora antimora closed this Mar 26, 2024
@wcshds
Contributor Author

wcshds commented Apr 16, 2024

@antimora Could you reopen this pull request? The issue regarding Autodiff has been resolved in #1575, and I think it's time to proceed with implementing the Bidirectional LSTM.

@antimora antimora reopened this Apr 16, 2024
@wcshds wcshds marked this pull request as ready for review April 16, 2024 14:17
@wcshds
Contributor Author

wcshds commented Apr 16, 2024

The wgpu test failed, but I don't know why...

@antimora antimora requested review from louisfd and agelas April 16, 2024 16:32
@agelas
Contributor

agelas commented Apr 16, 2024

What was the error in wgpu?

@agelas
Contributor

agelas commented Apr 16, 2024

Hm, ok, well on ubuntu-22.04 everything seems fine when I run the tests locally. Given that Ubuntu and Windows work, maybe it hints at possible compatibility issues or differences in how Metal handles resource management compared to Vulkan (i.e., Linux) or DirectX (Windows)?

@github-actions github-actions bot removed the stale The issue or pr has been open for too long label Apr 17, 2024
Member

@nathanielsimard nathanielsimard left a comment

I like it! I have a few comments, but nothing major. Would also like @laggui to review this.

Comment on lines +44 to +52
let new_gate = || {
    GateController::new(
        self.d_input,
        d_output,
        self.bias,
        self.initializer.clone(),
        device,
    )
};
Member

Yesss!

/// ## Returns:
/// A tuple of tensors, where the first tensor represents the cell states and
/// the second tensor represents the hidden states for each sequence element.
/// Both output tensors have the shape `[batch_size, sequence_length, hidden_size]`.
pub fn forward(
    &self,
    batched_input: Tensor<B, 3>,
    state: Option<(Tensor<B, 2>, Tensor<B, 2>)>,
Member

Even though that's a breaking change, I think it might be beneficial to create a type for the state.

pub struct LstmState<B: Backend> {
    pub cell: Tensor<B, 2>,
    pub hidden: Tensor<B, 2>,
}

We can remove the optional and implement Default instead. It also gives us a space to document what each element of the state is used for.

We could also do the same for the return type, where it's easy to make a mistake (choosing the wrong returned tensor).

What do you think?

Member

I quite like that idea, especially for the return type!

Contributor Author

@wcshds wcshds Apr 17, 2024

If breaking changes are acceptable, I'd like to follow PyTorch's return values output, (h_n, c_n).

[Screenshot of the PyTorch LSTM documentation showing the return values output, (h_n, c_n).]

In that case, we can return (output, state). For BiLSTM, output is a concatenated tensor, and the cell state and hidden state can be provided in state to the user without modification.

Contributor Author

If you agree to the modifications according to the PyTorch outputs, batched_cell_state is unnecessary. For the state, we only need to return the hidden state and cell state of the last time step.
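
A hedged sketch of what that could look like (illustrative only, not necessarily the exact merged API):

use burn::tensor::{backend::Backend, Tensor};

/// Cell and hidden state of the last time step, as discussed above.
pub struct LstmState<B: Backend, const D: usize> {
    /// Cell state, e.g. [batch_size, hidden_size] (uni-directional)
    /// or [2, batch_size, hidden_size] (bidirectional).
    pub cell: Tensor<B, D>,
    /// Hidden state, with the same shape as `cell`.
    pub hidden: Tensor<B, D>,
}

// Illustrative forward signatures following PyTorch's `output, (h_n, c_n)` convention:
//
//   Lstm:   fn forward(&self, input: Tensor<B, 3>, state: Option<LstmState<B, 2>>)
//               -> (Tensor<B, 3>, LstmState<B, 2>);  // output: [batch, seq, hidden]
//   BiLstm: fn forward(&self, input: Tensor<B, 3>, state: Option<LstmState<B, 3>>)
//               -> (Tensor<B, 3>, LstmState<B, 3>);  // output: [batch, seq, 2 * hidden]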

Comment on lines 104 to 106
let mut batched_cell_state = Tensor::zeros([batch_size, seq_length, self.d_hidden], device);
let mut batched_hidden_state =
    Tensor::zeros([batch_size, seq_length, self.d_hidden], device);
Member

Not linked to the current change, but is zero necessary here, or would empty be enough?
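
For comparison, a small sketch of the two options (assuming a Tensor::empty constructor with the same shape-and-device signature as Tensor::zeros; helper name is hypothetical):

use burn::tensor::{backend::Backend, Tensor};

fn allocate_buffers<B: Backend>(batch: usize, seq: usize, hidden: usize, device: &B::Device) {
    // Zero-initialized buffer: what the code above does.
    let _zeroed = Tensor::<B, 3>::zeros([batch, seq, hidden], device);
    // Uninitialized buffer: potentially cheaper, and sufficient when every
    // position is overwritten (e.g. via slice_assign across the full
    // sequence) before it is ever read.
    let _uninit = Tensor::<B, 3>::empty([batch, seq, hidden], device);
}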

input_biases: [f32; D1],
hidden_weights: [[f32; D1]; D1],
hidden_biases: [f32; D1],
device: &<TestBackend as Backend>::Device,
Member

Just a detail, but we can use &burn_tensor::Device<TestBackend> instead of this notation; it's just a bit prettier, especially when importing burn_tensor::Device.
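
i.e., something like the following (hypothetical helper just to show the alias; Device<B> resolves to <B as Backend>::Device):

use burn_tensor::{backend::Backend, Device};

// `&Device<TestBackend>` reads better than `&<TestBackend as Backend>::Device`,
// but they are the same type.
fn lstm_test_params<B: Backend>(device: &Device<B>) {
    let _ = device;
}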

Member

@laggui laggui left a comment

Great rewrite to add the BiLstm! I don't see any issues 🙂 Just a minor comment regarding the state as pointed out by Nath.

/// ## Returns:
/// A tuple of tensors, where the first tensor represents the cell states and
/// the second tensor represents the hidden states for each sequence element.
/// Both output tensors have the shape `[batch_size, sequence_length, hidden_size]`.
pub fn forward(
    &self,
    batched_input: Tensor<B, 3>,
    state: Option<(Tensor<B, 2>, Tensor<B, 2>)>,
Member

I quite like that idea, especially for the return type!

@wcshds
Contributor Author

wcshds commented Apr 18, 2024

The outputs of both Lstm and BiLstm are now aligned with PyTorch.

The script I used to generate tests for Bidirectional LSTM can be found here.

The tests for the wgpu backend still failed, possibly due to some data race issues in the wgpu backend?

@wcshds
Contributor Author

wcshds commented Apr 18, 2024

@nathanielsimard @louisfd I changed the batch size to 2, and the test passed. This seems to be a bug with the Wgpu backend, so I've opened an issue #1656.

Member

@nathanielsimard nathanielsimard left a comment

No more comments from my side, great job! We will investigate the wgpu test problem, but it's quite unrelated.

@antimora antimora requested a review from laggui April 26, 2024 16:50
@antimora
Collaborator

We will merge it once we have approval from @laggui.

Member

@laggui laggui left a comment

Good job! 😄 I like the addition of LstmState and the returned output.

Only a few minor changes, and then we can merge 🎉

Comment on lines 81 to 82
/// returns hidden state for each element in a sequence (i.e., across `seq_length`) and a final state,
/// producing 3-dimensional tensors where the dimensions represent `[batch_size, sequence_length, hidden_size]`.
Member

@laggui laggui Apr 26, 2024

Since we document the return types, I think we can simplify the docstring here to:

Applies the forward pass on the input tensor. This LSTM implementation returns the state for each element in a sequence (i.e., across seq_length) and a final state.

(This also removes the ambiguity of "returns hidden state"; the state contains both the hidden and cell states.)

Comment on lines 215 to 217
/// Applies the forward pass on the input tensor. This Bidirectional LSTM implementation
/// returns hidden state for each element in a sequence (i.e., across `seq_length`) and a final state,
/// producing 3-dimensional tensors where the dimensions represent `[batch_size, sequence_length, hidden_size * 2]`.
Member

Same as with the Lstm forward docstring. We can simplify to:

Applies the forward pass on the input tensor. This Bidirectional LSTM implementation returns the state for each element in a sequence (i.e., across seq_length) and a final state.

.select(0, Tensor::arange(0..1, &device))
.squeeze(0);
cell_state
// let (cell_state_batch, hidden_state_batch) = lstm.forward(input, None);
Member

Dead comment, we can remove 🙂

Contributor Author

All done.

Member

@laggui laggui left a comment

🚀 🚀

@antimora antimora merged commit b387829 into tracel-ai:main Apr 26, 2024
14 checks passed
@wcshds wcshds deleted the bidirectional-lstm branch April 27, 2024 04:38