The MSML loss stays at 0.6 #1

Open
Hydraz320 opened this issue Jan 10, 2018 · 7 comments

@Hydraz320

Sorry to bother you, but I finished the load-data function and used the MSML loss function. However, flatten_loss (the MSML loss) somehow stays at alpha, which is set to 0.6 in the original code.
I think this means the hardest positive distance equals the hardest negative distance, so the hinge settles at the larger value, which is alpha. Is that normal? The overall loss is not decreasing either.
Thanks again!
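
For reference, a minimal sketch of why the loss can sit exactly at alpha (the tensor names `dis_mat`, `pos_mask`, and `neg_mask` are hypothetical, not from the repo): MSML takes the hardest positive and the hardest negative distance over the whole batch, so if the embeddings collapse and the two distances are equal, the hinge evaluates to exactly alpha.

    import keras.backend as K

    def msml_loss_sketch(dis_mat, pos_mask, neg_mask, alpha=0.6):
        # hardest (largest) same-ID distance in the batch
        hard_pos = K.max(dis_mat * pos_mask)
        # hardest (smallest) different-ID distance; push same-ID pairs out of the min
        hard_neg = K.min(dis_mat + (1.0 - neg_mask) * 1e6)
        # if hard_pos == hard_neg (collapsed embeddings), this returns exactly alpha
        return K.maximum(hard_pos - hard_neg + alpha, 0.0)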

@kardoszc

Same problem here. In the TriHard loss, I used K.gradients to check every layer, and I found that

    dis_mat = K.sum(K.square(delta), axis=2)
    dis_mat = K.sqrt(dis_mat) + 1e-8   # epsilon added after the sqrt

causes the gradients to become NaN. You can try this instead:

    dis_mat = K.sum(K.square(delta), axis=2) + K.epsilon()   # epsilon inside the sqrt
    dis_mat = K.sqrt(dis_mat)

In my case, the model converges normally.
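
To see the failure mode directly, here is a tiny check (a sketch, assuming a Keras 2.x / TF1 backend where K.gradients is available): the derivative of sqrt(s) is 1/(2*sqrt(s)), which is infinite at s = 0, and adding 1e-8 after the sqrt does not change that. In the loss, this infinity is multiplied by the zero gradient coming from the squared term, giving 0 * inf = NaN.

    import numpy as np
    import keras.backend as K

    x = K.variable(np.zeros((1,)))        # a zero squared distance, as for identical samples

    bad = K.sqrt(x) + 1e-8                # epsilon after the sqrt
    good = K.sqrt(x + K.epsilon())        # epsilon inside the sqrt

    print(K.eval(K.gradients(K.sum(bad), [x])[0]))   # inf -> NaN once chained upstream
    print(K.eval(K.gradients(K.sum(good), [x])[0]))  # large but finite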

@shen-ee

shen-ee commented Apr 2, 2018

It does work, thanks @kardoszc!

@michuanhaohao
Owner

The network initialization has a great influence on MSML, which means an inappropriate initialization may result in NaN. I always train the model for several epochs with a softmax loss first to initialize it.

@kardoszc gave us a good solution. Thanks a lot!
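
A sketch of that warm-up, assuming a Keras `base_model` whose output is the embedding, that `train_gen` and `pk_batch_gen` are keras.utils.Sequence instances, and that `msml_loss` is a Keras-compatible wrapper of the loss above (all of these names are placeholders):

    from keras.layers import Dense
    from keras.models import Model

    # Stage 1: softmax warm-up; this updates base_model's weights in place.
    logits = Dense(num_ids, activation='softmax')(base_model.output)
    id_model = Model(base_model.input, logits)
    id_model.compile(optimizer='adam', loss='categorical_crossentropy')
    id_model.fit_generator(train_gen, epochs=5)

    # Stage 2: switch to MSML on the (now initialized) embedding network.
    base_model.compile(optimizer='adam', loss=msml_loss)
    base_model.fit_generator(pk_batch_gen, epochs=50)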

@ergysr

ergysr commented Apr 30, 2018

Hi @michuanhaohao, have you had any success training with MSML from scratch (instead of combining MSML with another loss)? If so, I would be curious to know the hyper-parameters that lead to convergence.

@michuanhaohao
Owner

@ergysr It may depend on the dataset. Without an additional softmax loss, I successfully trained the model on Market1501 but failed on CUHK03. I think the reason is that CUHK03 has two images for each person ID while I set K=4 for MSML, so there were repeated images in a batch.
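
If it helps, here is a sketch of a P x K batch sampler that makes the duplication explicit when an ID has fewer than K images (`images_by_id` is a hypothetical dict mapping a PID to its list of image paths):

    import random

    def pk_batch(images_by_id, P=18, K=4):
        batch = []
        for pid in random.sample(list(images_by_id), P):
            imgs = images_by_id[pid]
            if len(imgs) >= K:
                batch += random.sample(imgs, K)                    # K distinct images
            else:
                batch += [random.choice(imgs) for _ in range(K)]   # duplicates unavoidable
        return batch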

@ca-joe-yang

Hi @michuanhaohao, may I ask what MSML performance (mAP score) you get on Market1501?
I've tried, but I can't achieve any good result when training with MSML only.
I used ResNet v1 50 (pre-trained on ImageNet) as the backbone model.
Some of my hyper-parameters and implementation details (see the learning-rate sketch after this list):
batch_K = 4, batch_P = 18 (each mini-batch contains 18 PIDs and 4 images per PID)
lr = 1e-3 at first, then exponential decay with rate 0.1 every 10000 steps after step 10000
30000 total training steps
data augmentation with flipping and random cropping to size 256x128
model architecture: resnet_v1_50 -> fc1024 -> fc128 (with l2_norm) as the embedding layer
Thank you.
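
One reading of that schedule, as a plain-Python sketch (staircase decay: lr stays at 1e-3 until step 10000, then drops by 10x every further 10000 steps):

    def lr_at(step, base_lr=1e-3, decay=0.1, decay_steps=10000):
        # staircase: 1e-3 for steps [0, 10000), 1e-4 for [10000, 20000), ...
        return base_lr * decay ** (step // decay_steps)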

@NobleYd

NobleYd commented Aug 16, 2018

I have the same problem, and it is very confusing. I used the ImageNet pre-trained weights; sometimes 1-5 epochs reach a good result, and sometimes nothing...
