The MSML loss stays at 0.6 #1

Open
Hydraz320 opened this issue Jan 10, 2018 · 7 comments

@Hydraz320

Sorry to bother you, but I finished the load-data function and used the MSML loss function. However, flatten_loss (the MSML loss) somehow stays at alpha, which is set to 0.6 in the original code.
I think this means the hardest positive distance equals the hardest negative distance, so the hinge settles at the larger value, which is alpha. Is that normal? The overall loss is not decreasing either.
Thanks again!
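
For reference, a minimal sketch of why the loss can sit exactly at alpha (the tensor names `dis_mat`, `pos_mask`, and `neg_mask` are hypothetical, not from the repo): MSML takes the hardest positive and the hardest negative distance over the whole batch, so if the embeddings collapse and the two distances are equal, the hinge evaluates to exactly alpha.

    import keras.backend as K

    def msml_loss_sketch(dis_mat, pos_mask, neg_mask, alpha=0.6):
        # hardest (largest) same-ID distance in the batch
        hard_pos = K.max(dis_mat * pos_mask)
        # hardest (smallest) different-ID distance; push same-ID pairs out of the min
        hard_neg = K.min(dis_mat + (1.0 - neg_mask) * 1e6)
        # if hard_pos == hard_neg (collapsed embeddings), this returns exactly alpha
        return K.maximum(hard_pos - hard_neg + alpha, 0.0)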

@kardoszc

Same problem here. In the TriHard loss, I used K.gradients to check every layer, and I found that

    dis_mat = K.sum(K.square(delta), axis=2)
    dis_mat = K.sqrt(dis_mat) + 1e-8   # epsilon added after the sqrt

causes the gradients to become NaN. You can try this instead:

    dis_mat = K.sum(K.square(delta), axis=2) + K.epsilon()   # epsilon inside the sqrt
    dis_mat = K.sqrt(dis_mat)

In my case, the model converges normally.
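
To see the failure mode directly, here is a tiny check (a sketch, assuming a Keras 2.x / TF1 backend where K.gradients is available): the derivative of sqrt(s) is 1/(2*sqrt(s)), which is infinite at s = 0, and adding 1e-8 after the sqrt does not change that. In the loss, this infinity is multiplied by the zero gradient coming from the squared term, giving 0 * inf = NaN.

    import numpy as np
    import keras.backend as K

    x = K.variable(np.zeros((1,)))        # a zero squared distance, as for identical samples

    bad = K.sqrt(x) + 1e-8                # epsilon after the sqrt
    good = K.sqrt(x + K.epsilon())        # epsilon inside the sqrt

    print(K.eval(K.gradients(K.sum(bad), [x])[0]))   # inf -> NaN once chained upstream
    print(K.eval(K.gradients(K.sum(good), [x])[0]))  # large but finite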

@shen-ee

shen-ee commented Apr 2, 2018

It does work, thanks @kardoszc!

@michuanhaohao
Owner

The network initialization has a great influence on MSML, which means an inappropriate initialization may result in NaN. I always train the model for several epochs with a softmax loss first to initialize it.

@kardoszc gave us a good solution. Thanks a lot!
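
A sketch of that warm-up, assuming a Keras `base_model` whose output is the embedding, that `train_gen` and `pk_batch_gen` are keras.utils.Sequence instances, and that `msml_loss` is a Keras-compatible wrapper of the loss above (all of these names are placeholders):

    from keras.layers import Dense
    from keras.models import Model

    # Stage 1: softmax warm-up; this updates base_model's weights in place.
    logits = Dense(num_ids, activation='softmax')(base_model.output)
    id_model = Model(base_model.input, logits)
    id_model.compile(optimizer='adam', loss='categorical_crossentropy')
    id_model.fit_generator(train_gen, epochs=5)

    # Stage 2: switch to MSML on the (now initialized) embedding network.
    base_model.compile(optimizer='adam', loss=msml_loss)
    base_model.fit_generator(pk_batch_gen, epochs=50)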

@ergysr

ergysr commented Apr 30, 2018

Hi @michuanhaohao, have you had any success training with MSML from scratch (instead of combining MSML with another loss)? If so, I would be curious to know the hyper-parameters that lead to convergence.

@michuanhaohao
Owner

@ergysr It may depend on the dataset. Without an additional softmax loss, I successfully trained the model on Market1501 but failed on CUHK03. I think the reason is that CUHK03 has two images for each person ID while I set K=4 for MSML, so there were repeated images in a batch.
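
If it helps, here is a sketch of a P x K batch sampler that makes the duplication explicit when an ID has fewer than K images (`images_by_id` is a hypothetical dict mapping a PID to its list of image paths):

    import random

    def pk_batch(images_by_id, P=18, K=4):
        batch = []
        for pid in random.sample(list(images_by_id), P):
            imgs = images_by_id[pid]
            if len(imgs) >= K:
                batch += random.sample(imgs, K)                    # K distinct images
            else:
                batch += [random.choice(imgs) for _ in range(K)]   # duplicates unavoidable
        return batch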

@ca-joe-yang

Hi @michuanhaohao, may I ask what MSML performance (mAP score) you get on Market1501?
I've tried, but I can't achieve any good result when training with MSML only.
I used ResNet v1 50 (pre-trained on ImageNet) as the backbone model.
Some of my hyper-parameters and implementation details (see the learning-rate sketch after this list):
batch_K = 4, batch_P = 18 (each mini-batch contains 18 PIDs and 4 images per PID)
lr = 1e-3 at first, then exponential decay with rate 0.1 every 10000 steps after step 10000
30000 total training steps
data augmentation with flipping and random cropping to size 256x128
model architecture: resnet_v1_50 -> fc1024 -> fc128 (with l2_norm) as the embedding layer
Thank you.
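
One reading of that schedule, as a plain-Python sketch (staircase decay: lr stays at 1e-3 until step 10000, then drops by 10x every further 10000 steps):

    def lr_at(step, base_lr=1e-3, decay=0.1, decay_steps=10000):
        # staircase: 1e-3 for steps [0, 10000), 1e-4 for [10000, 20000), ...
        return base_lr * decay ** (step // decay_steps)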

@NobleYd

NobleYd commented Aug 16, 2018

I have the same problem, and it is very confusing. I used the ImageNet pre-trained weights; sometimes 1-5 epochs reach a good result, and sometimes nothing...
