
Why are reduce_sum and reduce_max added in HAN's attention? #139

Open
fengdoudou1895 opened this issue Apr 21, 2020 · 1 comment

Comments

@fengdoudou1895

In HAN's attention I see the following:

    attention_logits = tf.reduce_sum(hidden_state_context_similarity, axis=2)
    attention_logits_max = tf.reduce_max(attention_logits, axis=1, keep_dims=True)
    p_attention = tf.nn.softmax(attention_logits - attention_logits_max)

I don't see this operation in the original paper. Why is it done?

@acadTags
Contributor

acadTags commented May 20, 2020

This is because softmax is shift-invariant: adding the same constant offset to every input leaves its output unchanged.

> softmax is invariant under translation by the same value in each coordinate

See Wikipedia and a Stack Overflow answer.
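For illustration (this is not code from the repository, just a minimal NumPy sketch of the property with hypothetical values):

```python
# Minimal sketch: softmax(x) and softmax(x - c) give the same probabilities
# for any constant c, because the common factor exp(-c) cancels in the ratio.
import numpy as np

def softmax(x):
    e = np.exp(x)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(softmax(x), softmax(x - 100.0)))  # True
```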

Subtracting the maximum value of attention_logits before the softmax makes the computation numerically stable (the exponentials cannot overflow) while leaving the result unchanged.
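Here is a hedged TensorFlow sketch mirroring those three lines (the tensor shapes are assumed for illustration only; the keyword is keepdims in TF 2.x versus keep_dims in TF 1.x):

```python
import tensorflow as tf

# Assumed shape [batch, seq_len, hidden]; the repository's actual shapes may differ.
hidden_state_context_similarity = tf.random.normal([2, 5, 8])

# Sum over the hidden axis -> one logit per position: [batch, seq_len]
attention_logits = tf.reduce_sum(hidden_state_context_similarity, axis=2)
# Per-example max, kept as [batch, 1] so it broadcasts over seq_len
attention_logits_max = tf.reduce_max(attention_logits, axis=1, keepdims=True)
# Shifted softmax: the largest exponent is exp(0) = 1, so exp() cannot overflow
p_attention = tf.nn.softmax(attention_logits - attention_logits_max)

# Identical (up to float error) to softmax on the raw logits, by shift-invariance:
print(tf.reduce_max(tf.abs(p_attention - tf.nn.softmax(attention_logits))).numpy())
```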
