Output values of output_recurrence #59

Open
helga-lvl opened this issue Apr 1, 2020 · 0 comments
Hey, I’m trying to understand the outputs of the output_recurrence function in models.py, and right now I’m not seeing the updates I would expect. I’ve gone through the function step by step, and at some point it starts multiplying everything by 0 and loses all values. All the fusion parameters seem to be initialized at 0, and I don’t see where they’re updated.

My first question:

weights_const: it’s defined with np.ones, but every time it’s used the values are multiplied by 0, so it only ever produces zeros?
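For reference, this is how I read the initialization pattern (a minimal NumPy sketch of my understanding, not the exact code; I’ve dropped the Theano shared-variable wrapping and the name argument):

```python
import numpy as np

# My reading of weights_const in models.py: start from np.ones
# and scale by a constant. With const=0, every entry ends up 0,
# which is identical to np.zeros.
def weights_const(n_in, n_out, const):
    return np.ones((n_in, n_out), dtype=np.float32) * const

Wf_h = weights_const(256, 256, 0)
print(Wf_h.sum())  # 0.0
```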

My second question:

output_recurrence: I stepped through a few iterations of the theano.scan; below are the values at the third update.

I have:

minibatch_size = 128
n_hidden = 256
x_vocabulary_size = 15695
y_vocabulary_size = 8

Inputs in the theano.scan:

x_t:
context[3]: hidden layer forward and backward concatenated
shape: (50,128,512)

h_tm1:
starts at GRU.h0 = np.zeros((128,256)) in the first iteration, then updates via the GRU.step function
shape: (128,256)

Attention parameters

Wa_h:
d = sqrt(6/(256+512))
random between -d and d
shape: (256, 512)

Wa_y:
d = sqrt(6/(512+1))
random between -d and d
shape: (512, 1)
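In NumPy terms, I understand this as Glorot-style uniform initialization (my own sketch, assuming the standard formula with the square root):

```python
import numpy as np

rng = np.random.RandomState(1)

# Glorot/Xavier uniform init: d = sqrt(6 / (fan_in + fan_out)),
# then sample uniformly from [-d, d].
def weights_glorot(n_in, n_out, rng):
    d = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-d, d, size=(n_in, n_out)).astype(np.float32)

Wa_h = weights_glorot(256, 512, rng)  # shape (256, 512)
Wa_y = weights_glorot(512, 1, rng)    # shape (512, 1)
```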

Late fusion parameters

Wf_h:
np.zeros((256,256))
shape: (256, 256)

Wf_c:
np.zeros((512,256))
shape: (512,256)

Wf_f:
np.zeros((256,256))
shape: (256,256)

bf:
np.zeros((1,256))
shape: (1,256)

Output model

Wy:
np.zeros((256,8))
shape: (256,8)

by:
np.zeros((1,8))
shape: (1,8)

context:
forward and backward sequences
shape: (50,128,512)

projected_context:
T.dot(context, Wa_c) + ba = (context*random) + 0
shape: (50,128,512)

Inside the function:

h_a:
values between -1 and 1
shape: (50,128,512)

alphas (1):
T.exp(T.dot(h_a, Wa_y)): random values
shape: (50,128)

alphas (2):
alphas reshaped down to just the first two dimensions (side question: I only get two dimensions here; should it be three?)
shape: (50,128)

alphas (3):
Normalized alphas
shape: (50,128)

weighted_context:
(context * alphas[:,:,None]).sum(axis=0), random values
shape: (128,512)
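Putting the attention pieces together, this is my NumPy reading of the flow from projected_context through weighted_context (a sketch with random stand-in values, not the repo’s code):

```python
import numpy as np

rng = np.random.RandomState(0)
seq_len, batch, ctx_dim, hidden = 50, 128, 512, 256

def glorot(n_in, n_out):
    d = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-d, d, (n_in, n_out)).astype(np.float32)

context = rng.randn(seq_len, batch, ctx_dim).astype(np.float32)
h_tm1 = np.zeros((batch, hidden), dtype=np.float32)   # GRU.h0
Wa_c, Wa_h, Wa_y = glorot(ctx_dim, ctx_dim), glorot(hidden, ctx_dim), glorot(ctx_dim, 1)
ba = np.zeros((ctx_dim,), dtype=np.float32)

# Computed once, outside the scan:
projected_context = context @ Wa_c + ba               # (50, 128, 512)

# Inside the scan: broadcast the (128, 512) projection of h_tm1
# over the 50 time steps, squash with tanh.
h_a = np.tanh(projected_context + h_tm1 @ Wa_h)       # (50, 128, 512)

alphas = np.exp(h_a @ Wa_y)                           # (50, 128, 1)
alphas = alphas.reshape(alphas.shape[0], alphas.shape[1])  # (50, 128)
alphas = alphas / alphas.sum(axis=0, keepdims=True)   # normalize over time

weighted_context = (context * alphas[:, :, None]).sum(axis=0)
print(weighted_context.shape)                         # (128, 512)
```

In this reading the dot with Wa_y gives (50, 128, 1) before the reshape, which may answer my own side question about the third dimension.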

h_t:
GRU.step: the hidden state used in the recurrence; the initial value is h0, and updated values have gone through sigmoid and tanh activations.
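For completeness, my mental model of GRU.step is the standard GRU update (a generic sketch with hypothetical weight names, not the repo’s exact code; biases omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generic GRU step: x_t is the input at this time step (here the
# weighted context), h_tm1 the previous hidden state.
def gru_step(x_t, h_tm1, W_z, U_z, W_r, U_r, W_h, U_h):
    z = sigmoid(x_t @ W_z + h_tm1 @ U_z)              # update gate
    r = sigmoid(x_t @ W_r + h_tm1 @ U_r)              # reset gate
    h_tilde = np.tanh(x_t @ W_h + (r * h_tm1) @ U_h)  # candidate state
    return (1.0 - z) * h_tm1 + z * h_tilde            # new hidden state
```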

Late fusion (this is where I stop understanding)

lfc:
T.dot(weighted_context, Wf_c): Wf_c is a matrix of zeros, so this becomes np.zeros((128,256)).
shape: (128,256)

fw (fusion weights):
T.nnet.sigmoid(T.dot(lfc, Wf_f) + T.dot(h_t, Wf_h) + bf)
Wf_f, lfc, Wf_h and bf are all zeros, so the sigmoid gives a matrix of 0.5.
shape: (128,256)

hf_t (weighted fused context + hidden state):
lfc * fw + h_t: lfc is 0, fw is 0.5, and h_t is just the hidden state, so hf_t = h_t at every step.
Shape: (128,256)

z:
T.dot(hf_t, Wy) + by: Wy is a zero matrix and by a zero vector, so this becomes np.zeros((128,8)).
shape: (128,8)

y_t:
T.nnet.softmax(z)
Softmax of zeros at every step, so each of the 128 rows is [0.125] * y_vocabulary_size, a uniform distribution over the 8 classes.
Shape: (128,8)
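The whole late-fusion tail reproduces in a few lines, which is exactly what confuses me (NumPy sketch using the zero-initialized parameters listed above and random stand-ins for the non-zero inputs):

```python
import numpy as np

rng = np.random.RandomState(0)
batch, hidden, ctx_dim, n_classes = 128, 256, 512, 8

weighted_context = rng.randn(batch, ctx_dim).astype(np.float32)
h_t = np.tanh(rng.randn(batch, hidden)).astype(np.float32)

# All fusion/output parameters start as zeros (weights_const with const=0).
Wf_c = np.zeros((ctx_dim, hidden), dtype=np.float32)
Wf_f = np.zeros((hidden, hidden), dtype=np.float32)
Wf_h = np.zeros((hidden, hidden), dtype=np.float32)
bf   = np.zeros((1, hidden), dtype=np.float32)
Wy   = np.zeros((hidden, n_classes), dtype=np.float32)
by   = np.zeros((1, n_classes), dtype=np.float32)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lfc  = weighted_context @ Wf_c                 # all zeros
fw   = sigmoid(lfc @ Wf_f + h_t @ Wf_h + bf)   # sigmoid(0) = 0.5 everywhere
hf_t = lfc * fw + h_t                          # reduces to h_t
z    = hf_t @ Wy + by                          # all zeros
y_t  = softmax(z)                              # every row is [0.125] * 8

print(np.allclose(hf_t, h_t), y_t[0])          # True [0.125 ... 0.125]
```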

What I want to ask is: shouldn’t these values be getting updated? Are the fusion parameters all supposed to stay zero throughout the process? Thanks. :)

Disclaimer: where I write that values are “random”, I mean they were initialized randomly; those values do get carefully updated during training.
