
Different architecture of the provided checkpoints #127

Open
aamir-mustafa-yoti opened this issue Dec 9, 2023 · 3 comments
@aamir-mustafa-yoti

Hi,
Thanks for the great work. I have noticed that the provided EfficientNetV2S checkpoints do not have exactly the same last few layers as the model built by the code with output_layer == "F":

The last few layers of the provided checkpoint are:


 F_flatten (Flatten)            (None, 25088)        0           ['dropout[0][0]']                
                                                                                                  
 F_dense (Dense)                (None, 512)          12845056    ['F_flatten[0][0]']              
                                                                                                  
 pre_embedding (BatchNormalizat  (None, 512)         2048        ['F_dense[0][0]']                
 ion)                                                                                             
                                                                                                  
 embedding (Activation)         (None, 512)          0           ['pre_embedding[0][0]']    

Whereas building the model with output_layer == "F" gives the following:


 F_dense (Dense)                (None, 512)          12845056    ['F_flatten[0][0]']              
                                                                                                  
 reshape (Reshape)              (None, 1, 1, 512)    0           ['F_dense[0][0]']                
                                                                                                  
 pre_embedding (BatchNormalizat  (None, 1, 1, 512)   2048        ['reshape[0][0]']                
 ion)                                                                                             
                                                                                                  
 flatten (Flatten)              (None, 512)          0           ['pre_embedding[0][0]']          
                                                                                                  
 embedding (Activation)         (None, 512)          0           ['flatten[0][0]']   

I understand that this should not make any difference to the model, but is there a particular reason for doing this?

Thanks

@leondgarse
Owner

Yes, it makes no difference; the change was introduced to fix BatchNormalization for QAT, fixing #115.
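
For anyone comparing the two, here is a minimal Keras sketch of both head variants. The layer names and the 512-unit width follow the summaries above; the linear embedding activation and the exact way the repo's builder code wires these layers are assumptions.

```python
import tensorflow as tf
from tensorflow import keras

def head_as_in_checkpoint(x):
    # Head found in the provided checkpoint: BatchNormalization on the 2D embedding.
    x = keras.layers.Flatten(name="F_flatten")(x)
    x = keras.layers.Dense(512, name="F_dense")(x)
    x = keras.layers.BatchNormalization(name="pre_embedding")(x)
    return keras.layers.Activation("linear", name="embedding")(x)

def head_as_in_current_code(x):
    # Head built by the current code: Reshape to (1, 1, 512) so BatchNormalization
    # sees a 4D tensor, which is what the QAT tooling expects (see #115).
    x = keras.layers.Flatten(name="F_flatten")(x)
    x = keras.layers.Dense(512, name="F_dense")(x)
    x = keras.layers.Reshape((1, 1, 512), name="reshape")(x)
    x = keras.layers.BatchNormalization(name="pre_embedding")(x)
    x = keras.layers.Flatten(name="flatten")(x)
    return keras.layers.Activation("linear", name="embedding")(x)
```

In both cases BatchNormalization normalizes over the same 512 channels, so the computation is identical; only the tensor layout around it differs.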

@aamir-mustafa-yoti
Author

Thanks for the clarification.

Another question: for how many epochs were the pre-trained 'latest_models' for EfficientNetV2S trained?

@leondgarse
Owner

It's 67 epochs in total: 50 epochs from scratch, then continued for another 17 epochs. EfficientNetV2S with swish activation, drop_conn 0.2, dropout 0.2, trained with SGD + L2 regularizer + cosine lr decay + randaug on the MS1MV3 dataset.
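
Roughly, that recipe corresponds to something like the following Keras setup. The concrete numbers here (initial learning rate, momentum, L2 weight, steps per epoch) are placeholders assumed for illustration, not the exact values used in the run.

```python
import tensorflow as tf
from tensorflow import keras

# Assumed placeholder values; the actual run's hyper-parameters may differ.
total_epochs = 67          # 50 from scratch + 17 continued, per the comment above
steps_per_epoch = 10_000   # depends on the MS1MV3 size and the batch size used

# Cosine learning-rate decay over the whole run.
lr_schedule = keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.1,
    decay_steps=total_epochs * steps_per_epoch,
)

# SGD with momentum; the L2 regularizer is typically attached to the model's
# layers via kernel_regularizer rather than to the optimizer itself.
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
l2_reg = keras.regularizers.l2(5e-4)

# RandAugment would be applied in the tf.data input pipeline (not shown here),
# and dropout 0.2 / drop_connect 0.2 are set when building the EfficientNetV2S backbone.
```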
