
About the loss #1

Open
hsx479 opened this issue Jun 14, 2019 · 1 comment

Comments


hsx479 commented Jun 14, 2019

Hello! Thanks for your code! Have you observed the loss? I downloaded the code and ran it, but the loss doesn't seem to converge. It descends rapidly at first, but as the epochs increase it starts to look like a cosine function, and its amplitude grows too.


ljjb commented Aug 23, 2019

> However, the loss doesn't seem to converge. It descends rapidly at first, but as the epochs increase it starts to look like a cosine function, and its amplitude grows too.

The fluctuation in the loss is probably caused by a corresponding variation in the learning rate. The code uses a cosine annealing learning-rate schedule, in which the learning rate is lowered from its starting value to zero over the course of each epoch and then reset at the start of the next one. This has been found to work well in some contexts: within an epoch the decaying rate lets the optimizer settle into a local minimum, and the periodic reset lets it escape that minimum again.
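For illustration, a per-epoch cosine annealing schedule with restarts might look like the sketch below in PyTorch. This is not the repository's actual code (which may implement the schedule by hand); the model, batch, and step counts are placeholders.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Toy stand-ins for the repository's real model and data loader.
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

steps_per_epoch = 100  # assumed; use len(train_loader) in practice

# Anneal the lr from 0.1 toward 0 over one epoch, then restart it.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=steps_per_epoch, eta_min=0.0)

for epoch in range(5):
    for step in range(steps_per_epoch):
        x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # one scheduler step per batch -> restart every epoch
```

If you plot the learning rate from this loop you should see the same sawtooth-cosine shape as the loss curve you describe, which is why the oscillation by itself isn't necessarily a sign of divergence.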

Personally, I've had more success with plain SGD and a learning rate that is simply lowered exponentially every epoch.
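That alternative, sketched under the same placeholder setup (gamma=0.95 is an assumed decay factor, not a value from the repo):

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ExponentialLR

model = nn.Linear(10, 1)  # placeholder model again
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = ExponentialLR(optimizer, gamma=0.95)  # assumed decay factor

for epoch in range(20):
    for step in range(100):  # assumed steps per epoch
        x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()  # lr <- lr * gamma, once per epoch
```

Because the learning rate only ever shrinks under this schedule, the loss curve tends to be much smoother, at the cost of losing the restart mechanism's ability to jump out of a poor local minimum.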
