
Slow loss convergence #29

Closed
arielverbin opened this issue Mar 5, 2024 · 3 comments

Comments

@arielverbin commented Mar 5, 2024

Hello,
I'm attempting to perform fine-tuning with your implementation (I'm using commit e8e2ad1 from April 24, as I don't need the feet keypoints).
Unfortunately, I think the loss might not be converging properly. I tried running the training without fine-tuning (from scratch): in the first 5 epochs it decreased from 0.0168 to 0.0063, but it remained stuck at 0.0063 for the next 25 epochs.

Do you have any suggestions for how to solve it?
I used the same hyperparameters as in your code, but changed the layer decay rate from 0.75 to 1 - 1e-4.
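
For context, this is roughly what I mean by the layer decay rate: it scales the learning rate per transformer block, so earlier layers train more slowly than the head. The sketch below is mine, not the actual code of this repo (the function name and the layer-id heuristic are just illustrative):

```python
import torch

def layerwise_lr_groups(model, base_lr=5e-4, layer_decay=0.75, num_layers=12):
    """Group parameters so earlier blocks get a smaller learning rate:
    lr = base_lr * layer_decay ** (num_layers - layer_id)."""
    groups = {}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Hypothetical layer-id heuristic: patch embedding -> 0,
        # "blocks.k.*" -> k + 1, everything else (neck/head) -> num_layers.
        if "patch_embed" in name:
            layer_id = 0
        elif "blocks." in name:
            layer_id = int(name.split("blocks.")[1].split(".")[0]) + 1
        else:
            layer_id = num_layers
        scale = layer_decay ** (num_layers - layer_id)
        groups.setdefault(layer_id, {"params": [], "lr": base_lr * scale})
        groups[layer_id]["params"].append(param)
    return list(groups.values())

# With layer_decay=0.75 the earliest layers train roughly 30x slower than the
# head (0.75**12 ≈ 0.03); with layer_decay = 1 - 1e-4 all layers train at
# essentially the same learning rate, which is what I wanted for fine-tuning.
# optimizer = torch.optim.AdamW(layerwise_lr_groups(model), weight_decay=0.1)
```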

Thank you for your time and assistance!

@JunkyByte (Owner)

I'm sorry to hear that. Can you try to repeat your experiments with the original implementation I started from and see if there's any difference? https://github.com/jaehyunnn/ViTPose_pytorch

@arielverbin (Author)

Same problem :( The loss doesn't seem to go below 0.006–0.007.

[screenshot: training loss curve]

I used the exact code from the repository, except:

  • In config.yaml, changed resume_from to False.
  • In COCO.py, changed np.float to float (it raised an error, probably due to a NumPy version difference).
  • In COCO.py, I also added a conversion to RGB when image.ndim == 2 (as you did in this repository); both COCO.py changes are sketched after this list.
  • In train.py, changed data_version from "train_custom" / "valid_custom" to "train2017" / "val2017" so it matches the directory names in COCO. Maybe this is the problem? I used the COCO dataset without any preprocessing.
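
For reference, the two COCO.py changes boil down to something like this (a sketch with toy data, not the repo's exact code):

```python
import numpy as np

# np.float was removed in NumPy 1.24+; the builtin float (or an explicit
# dtype such as np.float32) is the drop-in replacement.
keypoints = [[12.0, 34.0, 1.0], [56.0, 78.0, 2.0]]  # toy COCO-style keypoints
joints = np.array(keypoints, dtype=float)

# If a decoded image is grayscale (H, W), stack it into (H, W, 3) so the
# pipeline always receives a 3-channel image.
image = np.zeros((256, 192), dtype=np.uint8)        # toy grayscale image
if image.ndim == 2:
    image = np.stack([image, image, image], axis=-1)
print(image.shape)                                  # (256, 192, 3)
```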

I might just be impatient, but in the log files of the official repo, the loss reached 0.003 on the first epoch.

@JunkyByte (Owner)

I'm sorry, it seems to be a problem with the original project. If it is broken, I will remove the fine-tuning part completely from the current state of the repository. I would suggest you use the original ViTPose implementation or check whether any obvious bug is present in this one. Good luck!
