
Performance of FAN_tiny on ImageNet1K #15

Open
bfshi opened this issue Oct 16, 2022 · 5 comments

@bfshi

bfshi commented Oct 16, 2022

Hi, congratulations on the cool work!

One question about the code: when I train fan_tiny_12_p16_224 on IN1K, I get a clean accuracy of 77.454, lower than the reported 79.2. I followed all the hyperparameter settings in the README, except that I train the model on 4 GPUs, each with a batch size of 200. Would that severely affect the performance? Or is there another possible reason? Thanks!

@zhoudaquan
Contributor

Hi,

Thanks for your interest in the work! Based on experience from previous experiments, the tiny model needs to be trained with a large batch size (e.g. 1024) for 300 epochs.

In your case, it is likely that the network has not converged yet. You can sanity-check this by observing the training loss and the validation loss; if they are still decreasing, that supports this explanation.

If that's the case, you can simply increase the number of epochs to compensate for the impact of the smaller batch size.

I hope this can help your experiments a little bit.

Regards,
DQ
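
As a rough illustration of the compensation DQ describes, one common heuristic (not taken from this repo) is to scale the base learning rate linearly with the global batch size; all values in the sketch below are illustrative assumptions, not the README's settings.

```python
# Back-of-the-envelope helper, not from this repo: when the global batch size
# differs from the reference recipe, a common heuristic is to scale the base
# learning rate linearly with the batch size (Goyal et al., 2017).
# All numbers below are illustrative assumptions, not the README values.
reference_batch = 1024      # batch size the 300-epoch recipe was tuned for
actual_batch = 4 * 200      # 4 GPUs x 200 images each = 800

base_lr = 2e-3              # assumed reference LR; check the README for the real value
scaled_lr = base_lr * actual_batch / reference_batch

print(f"global batch {actual_batch}: linearly scaled base LR ~ {scaled_lr:.2e}")
# The other knob DQ suggests is to leave the LR alone and simply train for
# more epochs, i.e. extend the schedule beyond 300 epochs.
```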

@bfshi
Author

bfshi commented Oct 21, 2022

Hi,

Thanks for the response! I used 4 GPUs with batch_size_per_gpu=200, i.e. a total batch size of 800, which is not far from the 1024 you used, so I don't think that is the problem. I also double-checked whether the model had converged: the loss barely changed over the last 30 epochs, so I believe it did. I haven't found the reason yet, but I will try training a larger model to see if the problem persists.

Thanks!

@bfshi
Author

bfshi commented Nov 2, 2022

Hi! I've tried training FAN-S and I can reproduce the results in the paper. However, when I trained FAN-L, I found that the validation accuracy reaches a peak of ~83.5 at around epoch 200, and then falls back to ~82.3 after all 300 epochs. Is this supposed to happen? I trained with batch_size_per_gpu=150 on 8 GPUs. All other configurations follow the ones in the repo. Thanks!

@zhoudaquan
Contributor

Hi, based on my previous experience, this typically indicates overfitting. You can try increasing the drop path rate.

Regards,
DQ
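
A minimal sketch of the drop path suggestion, assuming the FAN models are registered with timm's model registry (as in this repo) and accept the standard drop_path_rate keyword; the model name and value below are illustrative only.

```python
import timm

# Assumes the FAN repo's model definitions have already been imported so the
# model name is registered with timm; otherwise create_model will not find it.
# The name matches the one used earlier in this thread; for FAN-L, substitute
# the corresponding model name from the repo.
model = timm.create_model(
    "fan_tiny_12_p16_224",
    pretrained=False,
    drop_path_rate=0.3,   # illustrative value, raised from the default to curb overfitting
)
```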

@bfshi
Author

bfshi commented Nov 3, 2022

Thanks for the suggestion! I will try that.
