The MSML loss stays at 0.6 #1
Comments
Same problem. In the TriHard loss, I used K.gradients to check every layer, and I found the step that causes the gradients to become NaN. You can try this:
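A common culprit in these Keras triplet-style losses, and presumably the fix posted here, is K.sqrt applied to a squared distance that can be exactly zero: the gradient of sqrt is infinite at zero, so K.gradients returns NaN. A minimal sketch of that fix:

```python
import keras.backend as K

def safe_sqrt(sq_dist):
    """sqrt with a small epsilon added, so the gradient stays finite
    when a squared pairwise distance is exactly 0 (the derivative of
    sqrt(x) tends to infinity as x tends to 0, turning gradients NaN)."""
    return K.sqrt(sq_dist + K.epsilon())
```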
In my case, the model converges normally.
It does work, thanks @kardoszc!
The network initialization has a great influence on MSML, which means an inappropriate initialization may result in NaN. I always train the model for several epochs with a softmax loss first to initialize it. @kardoszc gave us a good solution. Really thanks!
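For reference, a minimal sketch of that warm-up schedule, assuming two Keras models sharing one backbone; all names here are placeholders, not this repo's actual code:

```python
from keras.optimizers import Adam

def warmup_then_msml(clf_model, emb_model, softmax_gen, pk_gen,
                     msml_loss, steps, warmup_epochs=5, msml_epochs=100):
    """Hypothetical two-stage schedule: softmax warm-up, then MSML.

    clf_model: backbone + classification head (softmax outputs).
    emb_model: the same backbone with embedding outputs (shared weights).
    """
    # Stage 1: a few epochs of cross-entropy move the shared backbone
    # away from initializations where all embeddings collapse (which
    # pins the MSML loss at alpha and can produce NaN gradients).
    clf_model.compile(optimizer=Adam(1e-4), loss='categorical_crossentropy')
    clf_model.fit_generator(softmax_gen, steps_per_epoch=steps,
                            epochs=warmup_epochs)

    # Stage 2: switch to MSML on identity-balanced P*K batches.
    emb_model.compile(optimizer=Adam(3e-5), loss=msml_loss)
    emb_model.fit_generator(pk_gen, steps_per_epoch=steps,
                            epochs=msml_epochs)
```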
Hi @michuanhaohao, have you had any success training with MSML from scratch (instead of combining MSML with another loss)? If so, I would be curious to know the hyper-parameters that lead to convergence. |
@ergysr It may depend on the dataset. Without another/softmax loss, I successfully trained the model on Market1501 but failed on CUHK03. I think this is because CUHK03 has only two images per person ID while I set K=4 for MSML, so there were repeated images in a batch.
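To make that concrete, here is a sketch of the usual P x K identity sampler (a hypothetical helper, not this repo's code): when an identity has fewer than K images, as on CUHK03 with two per ID and K=4, sampling has to repeat images, and a duplicated positive pair has distance exactly 0, which feeds back into the sqrt/NaN problem above.

```python
import random

def pk_batch(images_by_id, P=16, K=4):
    """Sample P identities and K images each; `images_by_id` maps
    a person ID to its list of image paths (hypothetical helper)."""
    batch = []
    for pid in random.sample(list(images_by_id), P):
        imgs = images_by_id[pid]
        if len(imgs) >= K:
            batch.extend(random.sample(imgs, K))   # K distinct images
        else:
            # Fewer than K images (CUHK03: 2 per ID, K=4): drawing with
            # replacement repeats images, so some positive pairs have
            # distance exactly 0.
            batch.extend(random.choice(imgs) for _ in range(K))
    return batch
```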
Hi @michuanhaohao, may I ask what MSML performance (mAP score) you get on Market1501?
I have the same problem, and it is very confusing. I used the ImageNet pre-trained weights; sometimes 1-5 epochs are enough to reach a good result, and sometimes nothing...
Sorry to bother you, but I finished the loaddata function and used the MSML loss function; however, flatten_loss (the MSML loss) somehow stays at alpha, which is set to 0.6 in the original code.
I think this means the positive distance equals the negative distance, so the loss falls back to the larger value, alpha. Is that normal? The overall loss was not decreasing either.
Thanks again!
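For what it's worth, the clamp at alpha follows directly from the hinge form of the loss: with loss = max(0, max_pos - min_neg + alpha), a network that maps every image to (nearly) the same embedding gives max_pos ≈ min_neg, so the loss sits at alpha = 0.6 exactly as described. A small NumPy check (the hinge form is assumed from the MSML paper, not copied from this repo's code):

```python
import numpy as np

def msml(dist, labels, alpha=0.6):
    """Hinge form of MSML: hardest positive distance minus hardest
    negative distance over the whole batch (assumed from the paper)."""
    same = labels[:, None] == labels[None, :]
    diff = ~same
    np.fill_diagonal(same, False)            # ignore self-pairs
    return max(0.0, dist[same].max() - dist[diff].min() + alpha)

labels = np.array([0, 0, 1, 1])

# Collapsed embeddings: every pairwise distance is 0, so
# max_pos == min_neg and the loss is pinned at alpha = 0.6.
print(msml(np.zeros((4, 4)), labels))        # 0.6

# Healthy embeddings: positives closer than negatives, loss reaches 0.
dist = np.array([[0.0, 0.1, 1.2, 1.1],
                 [0.1, 0.0, 1.3, 1.0],
                 [1.2, 1.3, 0.0, 0.2],
                 [1.1, 1.0, 0.2, 0.0]])
print(msml(dist, labels))                    # max(0, 0.2 - 1.0 + 0.6) = 0.0
```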