
Different architecture of the provided checkpoints #127

Open
aamir-mustafa-yoti opened this issue Dec 9, 2023 · 3 comments
@aamir-mustafa-yoti

Hi,
Thanks for the great work. I have noticed that the provided EfficientNetV2S checkpoints do not have exactly the same last few layers as the model built by the code with output_layer == "F":

The last few layers of the provided checkpoint are:


 F_flatten (Flatten)            (None, 25088)        0           ['dropout[0][0]']                
                                                                                                  
 F_dense (Dense)                (None, 512)          12845056    ['F_flatten[0][0]']              
                                                                                                  
 pre_embedding (BatchNormalizat  (None, 512)         2048        ['F_dense[0][0]']                
 ion)                                                                                             
                                                                                                  
 embedding (Activation)         (None, 512)          0           ['pre_embedding[0][0]']    

Whereas building the model with output_layer == "F" gives the following:


 F_dense (Dense)                (None, 512)          12845056    ['F_flatten[0][0]']              
                                                                                                  
 reshape (Reshape)              (None, 1, 1, 512)    0           ['F_dense[0][0]']                
                                                                                                  
 pre_embedding (BatchNormalizat  (None, 1, 1, 512)   2048        ['reshape[0][0]']                
 ion)                                                                                             
                                                                                                  
 flatten (Flatten)              (None, 512)          0           ['pre_embedding[0][0]']          
                                                                                                  
 embedding (Activation)         (None, 512)          0           ['flatten[0][0]']   

I understand that this should not make any difference to the model, but is there a particular reason for doing this?

Thanks

@leondgarse
Owner

Yes, it makes no difference; the change was introduced to fix BatchNormalization for QAT, fixing #115.
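
For anyone comparing the two, here is a minimal Keras sketch of both head variants. The layer names and the 512-unit width follow the summaries above; the linear embedding activation and the exact way the repo's builder code wires these layers are assumptions.

```python
import tensorflow as tf
from tensorflow import keras

def head_as_in_checkpoint(x):
    # Head found in the provided checkpoint: BatchNormalization on the 2D embedding.
    x = keras.layers.Flatten(name="F_flatten")(x)
    x = keras.layers.Dense(512, name="F_dense")(x)
    x = keras.layers.BatchNormalization(name="pre_embedding")(x)
    return keras.layers.Activation("linear", name="embedding")(x)

def head_as_in_current_code(x):
    # Head built by the current code: Reshape to (1, 1, 512) so BatchNormalization
    # sees a 4D tensor, which is what the QAT tooling expects (see #115).
    x = keras.layers.Flatten(name="F_flatten")(x)
    x = keras.layers.Dense(512, name="F_dense")(x)
    x = keras.layers.Reshape((1, 1, 512), name="reshape")(x)
    x = keras.layers.BatchNormalization(name="pre_embedding")(x)
    x = keras.layers.Flatten(name="flatten")(x)
    return keras.layers.Activation("linear", name="embedding")(x)
```

In both cases BatchNormalization normalizes over the same 512 channels, so the computation is identical; only the tensor layout around it differs.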

@aamir-mustafa-yoti
Author

Thanks for the clarification.

Another question: for how many epochs were the pre-trained 'latest_models' for EfficientNetV2S trained?

@leondgarse
Owner

It's 67 epochs in total: 50 epochs from scratch, then continued for another 17 epochs. EfficientNetV2S with swish activation, drop_conn 0.2, dropout 0.2, trained with SGD + L2 regularizer + cosine lr decay + randaug on the MS1MV3 dataset.
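
Roughly, that recipe corresponds to something like the following Keras setup. The concrete numbers here (initial learning rate, momentum, L2 weight, steps per epoch) are placeholders assumed for illustration, not the exact values used in the run.

```python
import tensorflow as tf
from tensorflow import keras

# Assumed placeholder values; the actual run's hyper-parameters may differ.
total_epochs = 67          # 50 from scratch + 17 continued, per the comment above
steps_per_epoch = 10_000   # depends on the MS1MV3 size and the batch size used

# Cosine learning-rate decay over the whole run.
lr_schedule = keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.1,
    decay_steps=total_epochs * steps_per_epoch,
)

# SGD with momentum; the L2 regularizer is typically attached to the model's
# layers via kernel_regularizer rather than to the optimizer itself.
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
l2_reg = keras.regularizers.l2(5e-4)

# RandAugment would be applied in the tf.data input pipeline (not shown here),
# and dropout 0.2 / drop_connect 0.2 are set when building the EfficientNetV2S backbone.
```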
