
Triplet loss training #118

Open
SaadSallam7 opened this issue Jul 10, 2023 · 3 comments
@SaadSallam7

I was trying to train FaceNet on Kaggle using a TPU, but I ran into a problem. I noticed that you have trained with TPUs before and got good results, so could you help me, please? I used the batch-hard strategy with the code provided here (I compared it with your implementation and they gave the same results, so the implementation is not the problem). I'm training on the VGGFace2 dataset, taking 32 images per person with a batch size of 1024, so each batch contains 32 different persons with 32 images each. The problem is that there is no improvement on the test set: the accuracy and threshold stay constant at 0.5 and 0, even after 10 epochs.
269/269 [==============================] - ETA: 0s - loss: 1.0424

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.000000
Improved = 0.500000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_1_0.500000.h5
Epoch 2/50
269/269 [==============================] - ETA: 0s - loss: 1.0030

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_2_0.500000.h5
269/269 [==============================] - 191s 712ms/step - loss: 1.0030
Epoch 3/50
213/269 [======================>.......] - ETA: 5s - loss: 1.0015
lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_3_0.500000.h5
269/269 [==============================] - 190s 710ms/step - loss: 1.0015
Epoch 4/50
269/269 [==============================] - ETA: 0s - loss: 1.0012

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_4_0.500000.h5
269/269 [==============================] - 191s 712ms/step - loss: 1.0012
Epoch 5/50
269/269 [==============================] - ETA: 0s - loss: 1.0011

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_5_0.500000.h5
269/269 [==============================] - 192s 715ms/step - loss: 1.0011
Epoch 6/50
269/269 [==============================] - ETA: 0s - loss: 1.0011

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_6_0.500000.h5
269/269 [==============================] - 193s 718ms/step - loss: 1.0011
Epoch 7/50
269/269 [==============================] - ETA: 0s - loss: 1.0009

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_7_0.500000.h5
269/269 [==============================] - 193s 718ms/step - loss: 1.0009
Epoch 8/50
269/269 [==============================] - ETA: 0s - loss: 1.0008

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_8_0.500000.h5
269/269 [==============================] - 192s 717ms/step - loss: 1.0008
Epoch 9/50
269/269 [==============================] - ETA: 0s - loss: 1.0008

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_9_0.500000.h5
269/269 [==============================] - 193s 718ms/step - loss: 1.0008
Epoch 10/50
269/269 [==============================] - ETA: 0s - loss: 1.0008

lfw evaluation max accuracy: 0.500000, thresh: 0.000000, previous max accuracy: 0.500000
Improved = 0.000000
Saving model to: /kaggle/working/chekpoints_basic_lfw_epoch_10_0.500000.h5

This is the notebook if you can take a look. Thanks in advance.
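For reference, the batch-hard strategy described above can be sketched as follows. This is a minimal NumPy sketch, not the repository's TensorFlow implementation, and `margin=0.35` is an illustrative value:

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.35):
    """Batch-hard triplet loss: for each anchor, take the hardest
    (farthest) positive and the hardest (closest) negative in the batch."""
    # Pairwise squared Euclidean distances, shape (B, B)
    dots = embeddings @ embeddings.T
    sq_norms = np.diag(dots)
    dists = np.maximum(sq_norms[:, None] - 2 * dots + sq_norms[None, :], 0.0)

    same = labels[:, None] == labels[None, :]            # positive mask (incl. self)
    hardest_pos = np.where(same, dists, -np.inf).max(axis=1)
    hardest_neg = np.where(~same, dists, np.inf).min(axis=1)
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()
```

Note that if the model collapses (all embeddings identical), every distance is 0 and this loss equals exactly the margin, so a loss curve that flattens at a constant can indicate collapsed embeddings rather than learning.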

@leondgarse (Owner)

I cannot see your notebook; it says "No saved version". Generally, triplet loss is better used after some softmax or ArcFace training, since in the early stage of training the model cannot mine good positive / negative pairs. You may refer to related issues like MobileFacenet SE Train from scratch #9 or the result ResNet101V2 using nadam and finetuning with triplet.
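As a rough illustration of the suggested ArcFace-style pretraining, the core of an additive angular margin can be sketched in NumPy as below. This is not the repository's loss code, and `margin=0.5` / `scale=64.0` are common defaults, not values confirmed from this repo:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, margin=0.5, scale=64.0):
    """Add an angular margin to the target-class logit:
    cos(theta) becomes cos(theta + margin) for the true class only."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(emb @ w, -1.0, 1.0)          # cosine to each class center
    theta = np.arccos(cos)
    add = np.zeros_like(cos)
    add[np.arange(len(labels)), labels] = margin
    return scale * np.cos(theta + add)
```

These logits would then go through an ordinary softmax cross-entropy, which penalizes the true class harder than plain softmax and tends to produce well-separated embeddings that make later triplet mining meaningful.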

@SaadSallam7 (Author)

I'm sorry, but you can open it now. OK, I will train it with ArcFace and then triplet loss, but to be honest, I don't think that is the cause of the problem: an accuracy of exactly 50% indicates that the model isn't really learning, it always predicts true or always false! One last question, please: how do you initialize the dataset for online mining? In my case, when I read the dataset, I read it sorted, so the first 32 examples belong to one class, the next 32 to another class, and so on; the batches are therefore fixed while fitting the model, but I think in the original paper they sampled batches randomly.
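One way to get randomly re-sampled, class-balanced batches rather than a fixed sorted order could be sketched like this. This is a hypothetical helper for illustration, not the repository's data.py code:

```python
import numpy as np

def balanced_batches(labels, classes_per_batch=32, images_per_class=32, rng=None):
    """Yield index batches with a fixed number of images per class,
    re-sampling the class order and the images randomly on each call,
    instead of walking the sorted dataset in a fixed order."""
    rng = rng or np.random.default_rng()
    by_class = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    classes = rng.permutation(list(by_class))
    for start in range(0, len(classes) - classes_per_batch + 1, classes_per_batch):
        batch = []
        for c in classes[start:start + classes_per_batch]:
            batch.extend(rng.choice(by_class[c], images_per_class, replace=True))
        yield np.array(batch)
```

With fixed sorted batches, every epoch mines triplets from the same class combinations; re-sampling exposes each class to different negatives over time.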

@leondgarse (Owner)

  • I just ran some basic tests in Colab (Keras_insightface_CASIA.ipynb, the last Test part), using only 4 classes for training. Though the result is not good, at least the loss is dropping and the lfw accuracy is just better than 0.5.
  • For offline mining, the kernel function in the dataset is data.py#L445-L446, which takes image_per_class images from some randomly picked classes. It just makes sure each class has some positive samples. Technically, a regular dataset that randomly picks images without this strategy also works: if we picked classes [0, 1, 1, 2, 2, 2], class 0 would just use itself as its positive.
  • I think you are using the [0, 255] value range for model training and evaluating, which may not be good. Another tiny issue is in eval_callback.__eval_func__: you don't need to call normalize again, as the embeddings are already normalized.
  • At the very least, the loss should be dropping and the lfw threshold value should not be 0. You may check the trained model, e.g. run it manually on some images and compare their similarity.
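The last two points can be checked with a small sanity script along these lines, assuming the backbone was trained on inputs scaled to [-1, 1]; the function names here are hypothetical:

```python
import numpy as np

def preprocess(images):
    """Rescale uint8 images from [0, 255] to [-1, 1] before inference.
    Feeding raw [0, 255] pixels to a model trained on [-1, 1] inputs is a
    common cause of degenerate, near-identical embeddings."""
    return images.astype("float32") / 127.5 - 1.0

def cosine_similarity(a, b):
    """Cosine similarity of two embedding vectors (l2-normalize exactly once)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)
```

Running the model on a few same-person and different-person pairs and comparing `cosine_similarity` of their embeddings should show a clear gap; if every pair scores near the same value, the embeddings have collapsed, which would explain the constant 0.5 accuracy and 0 threshold.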
