Attention scale value from pretrained model #62

TooTouch · 2021-08-30T07:19:00Z

Hello

Thank you for providing pretrained weights.

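In models/transformer_block.py, qk_scale is defined as embed_dim ** -0.5. As far as I know, though, the attention scale should be (embed_dim // num_heads) ** -0.5, i.e. computed from the per-head dimension rather than the full embedding dimension. For example, t2t_vit_7 sets qk_scale to 256 ** -0.5 when pretrained=True: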

@register_model
def t2t_vit_7(pretrained=False, **kwargs): # adopt performer for tokens to token
    if pretrained:
        kwargs.setdefault('qk_scale', 256 ** -0.5)
    model = T2T_ViT(tokens_type='performer', embed_dim=256, depth=7, num_heads=4, mlp_ratio=2., **kwargs)
    model.default_cfg = default_cfgs['T2t_vit_7']
    if pretrained:
        load_pretrained(
            model, num_classes=model.num_classes, in_chans=kwargs.get('in_chans', 3))
    return model
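
To make the difference concrete, here is a minimal sketch of the two values I am comparing. It uses only the numbers from t2t_vit_7 (embed_dim=256, num_heads=4); the variable names are my own for illustration and do not come from the repository.

```python
# Values taken from the t2t_vit_7 definition (embed_dim=256, num_heads=4).
embed_dim = 256
num_heads = 4

# Per-head dimension, as used in standard multi-head attention.
head_dim = embed_dim // num_heads

# Scale currently passed as qk_scale for the pretrained model: 1 / sqrt(embed_dim).
scale_from_embed_dim = embed_dim ** -0.5

# Scale I would have expected: 1 / sqrt(head_dim).
scale_from_head_dim = head_dim ** -0.5

# The two differ by a factor of sqrt(num_heads).
ratio = scale_from_head_dim / scale_from_embed_dim

print("head_dim:", head_dim)
print("embed_dim ** -0.5:", scale_from_embed_dim)
print("head_dim ** -0.5:", scale_from_head_dim)
print("ratio:", ratio)
```

With these settings the pretrained scale is 1/16 while the per-head scale would be 1/8, so the attention logits are multiplied by a value half as large as I expected.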

Could you check whether I'm right, or whether this scale value is intentional?

TooTouch closed this as completed Aug 30, 2021

TooTouch reopened this Aug 30, 2021

TooTouch closed this as completed Aug 30, 2021

TooTouch reopened this Aug 30, 2021
