
Difficulty reproducing SBIR results #2

Open
jabader97 opened this issue Nov 22, 2022 · 4 comments

jabader97 commented Nov 22, 2022

Hello, thank you for your work. I’m having a bit of difficulty using your sbir_baseline to reproduce the Siam.-VGG16 results from your paper when training and evaluating on FS-COCO for the FG-SBIR task: my runs reach closer to 12 (35) for R@1 (R@10), as opposed to the reported 23.3 (52.6). Are there specific settings I should use, or additional changes I would need to make, to reproduce your results?

Thanks in advance,
Jessie

@pinakinathc (Owner) commented:

Hey Jessie,
before diving deeper, can you confirm the batch size you used? (You can use gradient accumulation to simulate a larger batch size.)

@jabader97 (Author) commented:

Thank you for your fast response!

The original results I quoted were with the default parameters in the codebase (batch size 16 and accumulate_grad_batches 8, for an effective batch size of 128). Your paper mentions that you used batch size 256 for CLIP, so I also tried doubling the gradient accumulation. That did slightly better, but still topped out around 13 (36).
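For concreteness, the setup I'm describing boils down to roughly this (a sketch, not the repo code verbatim; the commented-out fit call stands in for whatever sbir_baseline/main.py actually builds):

```python
import pytorch_lightning as pl

# Sketch of the settings I used: the dataloaders use batch_size=16, and
# gradients are accumulated over 8 batches before each optimizer step,
# i.e. an effective batch size of 16 * 8 = 128.
trainer = pl.Trainer(
    max_epochs=100000,
    accumulate_grad_batches=8,  # set to 16 when trying to approximate batch size 256
)
# trainer.fit(model, datamodule)  # model / datamodule as built in sbir_baseline/main.py
```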

I did need to put it into 'train mode' by removing load_from_checkpoint and commenting back in the line for trainer.fit; was there perhaps something else I needed to put back in?
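Roughly, the two edits looked like this (paraphrased from memory; SBIRModel is a placeholder name, not the actual class in main.py):

```python
# Paraphrased sketch of my edits to sbir_baseline/main.py.
# SBIRModel is a placeholder name for the LightningModule defined there.

# evaluation path I removed:
# model = SBIRModel.load_from_checkpoint('saved_model/last.ckpt')

# train path I restored:
model = SBIRModel()               # fresh model instead of loading a checkpoint
trainer.fit(model, datamodule)    # re-enabled; this line was commented out
```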

@jabader97 (Author) commented:

Hello, thank you again for your work.

I wanted to follow up on my earlier questions: specifically, (1) whether there are any other settings or changes I would need to reproduce your work, and (2) whether there is anything else needed to put sbir_baseline/main.py into 'train mode', other than putting back trainer.fit on line 87 and not loading from the checkpoint on line 50.

Additionally, I wanted to check how many epochs you used. In the codebase it is set to 100000, but I could not find it reported in the paper.

@pinakinathc (Owner) commented:

hi @jabader97 sorry for the late response, was sick for a few days :)

I am adding a checkpoint to help resolve your issue: https://drive.google.com/file/d/1ug_Yemql2PrCh_8YHBeR3r3SxX5wuVGE/view?usp=share_link
Please download the checkpoint, as I might delete it from Google Drive after a few weeks.

Also, the 100000 epochs is just to make sure training doesn't stop prematurely before converging.
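To evaluate with the checkpoint, something along these lines should work (a rough sketch; the class, datamodule, and path names are placeholders, adapt them to what is in sbir_baseline/main.py):

```python
import pytorch_lightning as pl

# Sketch only: SBIRModel and datamodule stand in for the actual classes in
# sbir_baseline/main.py, and the path is wherever you saved the download.
model = SBIRModel.load_from_checkpoint('checkpoints/sbir_baseline.ckpt')
trainer = pl.Trainer()
trainer.test(model, datamodule=datamodule)
```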
