About the loss #1
The fluctuation in the loss is probably caused by the corresponding variation in the learning rate. The code uses a cosine annealing learning rate schedule, in which the learning rate is lowered from its starting value to zero over the course of each epoch and then reset at the start of the next one. This has been found to work in certain contexts: the decaying rate allows convergence toward a local minimum within an epoch, while the periodic reset lets the optimizer escape local minima. Personally I've had more success with plain SGD and a learning rate that is exponentially lowered every epoch.
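A minimal sketch of the two schedules described above, assuming the learning rate resets at every epoch boundary; the function names and the decay factor are illustrative, not taken from the repository:

```python
import math

def cosine_annealing_lr(base_lr, epoch_progress):
    """Cosine annealing within one epoch.

    epoch_progress in [0, 1]: 0 at the start of the epoch (lr = base_lr),
    1 at the end (lr = 0). Restarting each epoch produces the periodic,
    cosine-shaped loss curve described in the issue.
    """
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch_progress))

def exponential_decay_lr(base_lr, epoch, gamma=0.95):
    """The plain-SGD alternative: multiply the lr by gamma once per epoch,
    so it decreases monotonically instead of resetting."""
    return base_lr * (gamma ** epoch)
```

With the cosine schedule, the learning rate (and hence the loss) oscillates with the epoch period, which matches the cosine-like fluctuation reported below; the exponential schedule decays smoothly and avoids that oscillation.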
Hello! Thanks for your code! Have you observed the loss? I downloaded the code and ran it, but the loss doesn't seem to converge. It descends rapidly at first, but as the epochs go on it looks like a cosine function, and its amplitude increases as well.