About the loss #1
The fluctuation in the loss is probably caused by the corresponding variation in the learning rate. The code uses a cosine annealing learning rate schedule, in which the learning rate is lowered from its starting value to zero over the course of each epoch and then reset at the start of the next one. This has been found to work in certain contexts: the decaying rate allows convergence toward a local minimum within an epoch, while the periodic reset lets the optimizer escape local minima. Personally I've had more success with plain SGD and a learning rate that is exponentially lowered every epoch.
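A minimal sketch of the two schedules described above, assuming the learning rate resets at every epoch boundary; the function names and the decay factor are illustrative, not taken from the repository:

```python
import math

def cosine_annealing_lr(base_lr, epoch_progress):
    """Cosine annealing within one epoch.

    epoch_progress in [0, 1]: 0 at the start of the epoch (lr = base_lr),
    1 at the end (lr = 0). Restarting each epoch produces the periodic,
    cosine-shaped loss curve described in the issue.
    """
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch_progress))

def exponential_decay_lr(base_lr, epoch, gamma=0.95):
    """The plain-SGD alternative: multiply the lr by gamma once per epoch,
    so it decreases monotonically instead of resetting."""
    return base_lr * (gamma ** epoch)
```

With the cosine schedule, the learning rate (and hence the loss) oscillates with the epoch period, which matches the cosine-like fluctuation reported below; the exponential schedule decays smoothly and avoids that oscillation.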
Hello! Thanks for your code! Have you observed the loss? I downloaded the code and ran it, but the loss doesn't seem to converge. It descends rapidly at first, but as the epochs go on it looks like a cosine function, and its amplitude increases as well.