Some questions about your code #16

Open
Huzhen757 opened this issue Nov 4, 2021 · 1 comment
Comments


Huzhen757 commented Nov 4, 2021

Hello, I am very interested in your design of implementing self-attention with convolutions and using it to replace the bottleneck in a CNN backbone, but I have a few questions about the Contextual Transformer block that I would like to ask you:

  1. When attending over the value map V with the contextual attention matrix w to obtain the attended feature map, why do you use LocalConvolution instead of a plain matrix multiplication? What is the reason behind this design? Also, the contextual attention matrix is reshaped into groups before the LocalConvolution with the value map; how exactly is this reshape → LocalConvolution step implemented? (I sketch my current understanding in code right after this list.)

  2. In the code, after the static key and the contextual dynamic key are fused, why is there another self-attention-like operation? What is the goal of this design? The paper does not seem to mention this detail:
    [screenshot of the code in question]

  3. Finally, I noticed that the model's forward pass contains no position encoding or position bias of any kind. Is that because the convolution operations that replace the usual self-attention mechanism already capture local-range information, so a position encoding is no longer needed to supply positional information, or is there some other reason?
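To make my first question more concrete, below is my own unfold-based guess at what the reshape + LocalConvolution step computes. The function name `local_aggregation`, the `groups` layout, and the attention-matrix shape are all my assumptions and may not match your implementation:

```python
import torch
import torch.nn.functional as F

def local_aggregation(value, attn, kernel_size=3, groups=4):
    """My guess at LocalConvolution: every spatial position carries its own
    k*k kernel (shared within a channel group), which is applied to the
    local patch of the value map instead of a full (H*W) x (H*W) matmul.

    value: (B, C, H, W)            value map V
    attn:  (B, groups, k*k, H, W)  contextual attention matrix after reshape
    """
    B, C, H, W = value.shape
    k, pad = kernel_size, kernel_size // 2

    # Gather the k*k neighbourhood of every position: (B, C*k*k, H*W)
    patches = F.unfold(value, kernel_size=k, padding=pad)
    # Split channels into groups: (B, groups, C//groups, k*k, H, W)
    patches = patches.view(B, groups, C // groups, k * k, H, W)

    # Apply the position-specific kernels and sum over the window.
    attn = attn.unsqueeze(2)            # (B, groups, 1, k*k, H, W)
    out = (patches * attn).sum(dim=3)   # (B, groups, C//groups, H, W)
    return out.reshape(B, C, H, W)


# Example: B=2, C=64, 16x16 feature map, 3x3 window, 4 head groups
v = torch.randn(2, 64, 16, 16)
w = torch.randn(2, 4, 9, 16, 16).softmax(dim=2)
print(local_aggregation(v, w).shape)  # torch.Size([2, 64, 16, 16])
```

Is this roughly equivalent to what the LocalConvolution kernel does (only less memory-efficient), or am I misunderstanding the grouping?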

I hope to hear back from you, thank you!

@beauty-snowman

I would like to know whether self-attention built with convolutions suffers a performance drop when trained on small-scale datasets, the way Transformers do.
