
Question about tensor.view operation in Bi-LSTM(Attention) #38

Open
iamxpy opened this issue Sep 29, 2019 · 1 comment

Comments

iamxpy commented Sep 29, 2019

hidden = final_state.view(-1, n_hidden * 2, 1) # hidden : [batch_size, n_hidden * num_directions(=2), 1(=n_layer)]

Hi, this repo is awesome, but there might be something wrong with the line above. According to its comment, the snippet intends to reshape a tensor from [num_layers(=1) * num_directions(=2), batch_size, n_hidden] to [batch_size, n_hidden * num_directions(=2), 1(=n_layer)], i.e. to concatenate the two hidden vectors from the two directions for every example in the batch (by "example" I mean one of the batch_size items in a batch). But I think the code above actually mixes up the examples in a batch and leads to an unexpected result.
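As background, here is a minimal sketch (the names and sizes are made up for illustration, not taken from the repo) confirming that PyTorch's nn.LSTM with bidirectional=True returns its final hidden state in exactly this [num_layers * num_directions, batch_size, n_hidden] layout:

import torch
import torch.nn as nn

# Toy sizes for illustration only; the repo's actual values may differ.
batch_size, seq_len, emb_dim, n_hidden = 3, 7, 4, 5

lstm = nn.LSTM(input_size=emb_dim, hidden_size=n_hidden, bidirectional=True)
x = torch.randn(seq_len, batch_size, emb_dim)  # default layout: [seq_len, batch, input_size]
output, (final_state, final_cell) = lstm(x)

# final_state[0] holds the forward direction, final_state[1] the backward
# direction, and dim 1 indexes the batch.
print(final_state.shape)  # torch.Size([2, 3, 5])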

For example, we can use IPython to check the effect of the snippet above.

# create a tensor with shape [num_layers(=1) * num_directions(=2), batch_size, n_hidden]                                                                                           
In [10]: a=torch.arange(2*3*5).reshape(2,3,5) 
                                                                       
In [11]: a                                                             
Out[11]:                                                               
tensor([[[ 0,  1,  2,  3,  4],                                         
         [ 5,  6,  7,  8,  9],                                         
         [10, 11, 12, 13, 14]],                                        
                                                                       
        [[15, 16, 17, 18, 19],                                         
         [20, 21, 22, 23, 24],                                         
         [25, 26, 27, 28, 29]]])                                       
                                                                       
In [12]: a.view(-1,10,1)                                               
Out[12]:                                                               
tensor([[[ 0],                                                         
         [ 1],                                                         
         [ 2],                                                         
         [ 3],                                                         
         [ 4],                                                         
         [ 5],                                                         
         [ 6],                                                         
         [ 7],                                                         
         [ 8],                                                         
         [ 9]],                                                        
                                                                       
        [[10],                                                         
         [11],                                                         
         [12],                                                         
         [13],                                                         
         [14],                                                         
         [15],                                                         
         [16],                                                         
         [17],                                                         
         [18],                                                         
         [19]],                                                        
                                                                       
        [[20],                                                         
         [21],                                                         
         [22],                                                         
         [23],                                                         
         [24],                                                         
         [25],                                                         
         [26],                                                         
         [27],                                                         
         [28],                                                         
         [29]]])                                                       
                                                                       
                         

As you can see, we create a tensor with batch_size=3 and n_hidden=5. Here [ 0, 1, 2, 3, 4] and [15, 16, 17, 18, 19] belong to the same example in the batch but come from different directions, so what we want is to concatenate these two in the resulting tensor. What the code actually does is concatenate [ 0, 1, 2, 3, 4] with [ 5, 6, 7, 8, 9], which belong to different examples in the batch.

I think it can be fixed by changing that line to hidden = torch.cat([final_state[0], final_state[1]], 1).view(-1, n_hidden * 2, 1).

The effect of the new code can be shown as follows:

In [13]: torch.cat([a[0],a[1]],1).view(-1,10,1)
Out[13]:
tensor([[[ 0],
         [ 1],
         [ 2],
         [ 3],
         [ 4],
         [15],
         [16],
         [17],
         [18],
         [19]],

        [[ 5],
         [ 6],
         [ 7],
         [ 8],
         [ 9],
         [20],
         [21],
         [22],
         [23],
         [24]],

        [[10],
         [11],
         [12],
         [13],
         [14],
         [25],
         [26],
         [27],
         [28],
         [29]]])
@liuxiaoqun
I think it needs to be changed from hidden = final_state.view(batch_size, -1, 1) to hidden = final_state.transpose(0,1).reshape(batch_size,-1,1).
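For what it's worth, on the toy tensor a from the session above, this transpose-then-reshape version agrees with the torch.cat fix. Note that reshape, not view, is needed here because the tensor is no longer contiguous after transpose:

In [14]: torch.equal(torch.cat([a[0], a[1]], 1).view(-1, 10, 1),
    ...:             a.transpose(0, 1).reshape(-1, 10, 1))
Out[14]: True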
