
Why are reduce_sum and reduce_max added in HAN's attention? #139

Open
fengdoudou1895 opened this issue Apr 21, 2020 · 1 comment

Comments

@fengdoudou1895

In HAN's attention I see the following:

    attention_logits = tf.reduce_sum(hidden_state_context_similarity, axis=2)
    attention_logits_max = tf.reduce_max(attention_logits, axis=1, keep_dims=True)
    p_attention = tf.nn.softmax(attention_logits - attention_logits_max)

I don't see this operation in the original paper. Why is it done?

@acadTags
Contributor

acadTags commented May 20, 2020

This is because softmax is shift-invariant: adding the same constant offset to every input leaves its output unchanged.

> softmax is invariant under translation by the same value in each coordinate

See Wikipedia and a Stack Overflow answer.
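For illustration (this is not code from the repository, just a minimal NumPy sketch of the property with hypothetical values):

```python
# Minimal sketch: softmax(x) and softmax(x - c) give the same probabilities
# for any constant c, because the common factor exp(-c) cancels in the ratio.
import numpy as np

def softmax(x):
    e = np.exp(x)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(softmax(x), softmax(x - 100.0)))  # True
```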

Subtracting the maximum value of attention_logits before the softmax makes the computation numerically stable (the exponentials cannot overflow) while leaving the result unchanged.
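Here is a hedged TensorFlow sketch mirroring those three lines (the tensor shapes are assumed for illustration only; the keyword is keepdims in TF 2.x versus keep_dims in TF 1.x):

```python
import tensorflow as tf

# Assumed shape [batch, seq_len, hidden]; the repository's actual shapes may differ.
hidden_state_context_similarity = tf.random.normal([2, 5, 8])

# Sum over the hidden axis -> one logit per position: [batch, seq_len]
attention_logits = tf.reduce_sum(hidden_state_context_similarity, axis=2)
# Per-example max, kept as [batch, 1] so it broadcasts over seq_len
attention_logits_max = tf.reduce_max(attention_logits, axis=1, keepdims=True)
# Shifted softmax: the largest exponent is exp(0) = 1, so exp() cannot overflow
p_attention = tf.nn.softmax(attention_logits - attention_logits_max)

# Identical (up to float error) to softmax on the raw logits, by shift-invariance:
print(tf.reduce_max(tf.abs(p_attention - tf.nn.softmax(attention_logits))).numpy())
```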
