Is the "epsilon" parameter of the MultiSimilarityMiner working as expected? #691

Open
NateThom opened this issue Mar 26, 2024 · 1 comment

Comments


NateThom commented Mar 26, 2024

I apologize in advance if this is a nonsensical issue. I fear that it might be as nobody else seems to have an issue with it. Regardless, I'll ask just in case.

I am working with the MultiSimilarityMiner to select hard negative pairs for contrastive learning. While tuning the model's hyperparameters, I noticed that even aggressive changes to epsilon never reduced the number of mined hard pairs to zero. The documentation states the following about the epsilon parameter: "Positive pairs are chosen if they have similarity less than the hardest negative pair, plus this margin (epsilon)." To me, this says that increasing epsilon will result in pairs that are an "epsilon" less similar than the hardest negative pair. The documentation also states that the default distance function for MultiSimilarityMiner is CosineSimilarity.

I traced the issue into the pytorch-metric-learning source and found the code block responsible for identifying the hard mined samples in "multi-similarity-miner.py", lines 43-56.

if self.distance.is_inverted:
    hard_pos_idx = torch.where(
        pos_sorted - self.epsilon < neg_sorted[:, -1].unsqueeze(1)
    )
    hard_neg_idx = torch.where(
        neg_sorted + self.epsilon > pos_sorted[:, 0].unsqueeze(1)
    )
else:
    hard_pos_idx = torch.where(
        pos_sorted + self.epsilon > neg_sorted[:, 0].unsqueeze(1)
    )
    hard_neg_idx = torch.where(
        neg_sorted - self.epsilon < pos_sorted[:, -1].unsqueeze(1)
    )

According to my interpretation of the docs (increases to epsilon result in harder mined pairs), the expected condition would be in the else portion of this if-else statement.

"self.distance" is set by default to CosineSimilarity, which inherits from the BaseDistance class. The CosineSimilarity object is required to have the is_inverted flag set to True (which makes sense based on the operation of class member functions like "smallest_dist"). When using CosineSimilarity as the distance metric, we will always drop into the first branch of this if-else statement. In English, the interpretation of the torch.where call is "select any positive sample which is less similar than the least similar negative sample, minus epsilon". This causes increases to the epsilon parameter to select easier samples rather than harder samples.

I don't have a dog in the fight about whether or not epsilon should be making the mined samples more or less hard. I will also happily admit defeat if I have done this analysis wrong. Just trying to be helpful! Massively grateful for all of the hard work that has been done to make this awesome library. Cheers!

Epsilon set to 10
Screenshot 2024-03-25 at 8 58 30 PM

The first value in this row of pos_sorted is a candidate value (not set to infinity)
Screenshot 2024-03-25 at 8 59 03 PM

After subtracting epsilon, the value becomes less similar than any negative sample could possibly be (cosine similarity produces values between -1 and 1).
Screenshot 2024-03-25 at 9 00 51 PM

The cosine similarity of the corresponding negative sample
Screenshot 2024-03-25 at 9 07 12 PM

The operation results in this sample being selected as a hard positive, despite an epsilon of 10. I believe the correct functionality would be for this hard positive to be filtered out.
Screenshot 2024-03-25 at 9 01 45 PM
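
The arithmetic behind the screenshots can be checked without the library. The sketch below assumes only that cosine similarity lies in [-1, 1] and that the inverted branch tests `pos - epsilon < neg`; `passes_inverted_test` is a hypothetical helper, not a function from pytorch-metric-learning.

```python
def passes_inverted_test(pos_sim, neg_sim, epsilon):
    """Return True if a positive would be kept by `pos - epsilon < neg`."""
    return pos_sim - epsilon < neg_sim


# Similarities sampled across the full cosine range [-1, 1] in steps of 0.1.
grid = [i / 10 for i in range(-10, 11)]

# Extreme case: the most similar positive against the least similar negative.
extreme_eps_10 = passes_inverted_test(1.0, -1.0, 10)   # -9 < -1
extreme_eps_2 = passes_inverted_test(1.0, -1.0, 2)     # -1 < -1 is False

# With epsilon = 10, every (positive, negative) combination on the grid passes.
all_pass_eps_10 = all(passes_inverted_test(p, n, 10) for p in grid for n in grid)

print(extreme_eps_10, extreme_eps_2, all_pass_eps_10)  # True False True
```

Under these assumptions, any epsilon greater than 2 makes the test pass for every positive, so an epsilon of 10 cannot filter anything out of the hard-positive set.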

Owner

KevinMusgrave commented Apr 1, 2024

Thanks for bringing this up. This may be an issue with the wording in the documentation.

Here are the equations from the original paper.

Negative pairs:
image

Positive pairs:
image

In the above equations, "S" is cosine similarity.

So increasing epsilon should return more positive pairs.
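
For reference, the two conditions can be written out in text form. This is a hedged reconstruction inferred from the inverted branch of the code quoted in the issue, not a transcription of the images; the index notation ($y_i$ for the label of sample $i$, $S_{ij}$ for the cosine similarity between samples $i$ and $j$) is an assumption made here.

```latex
% Negative pairs: keep a negative whose similarity exceeds the
% least similar positive of the same anchor, minus the margin.
S_{ij}^{-} > \min_{y_k = y_i} S_{ik} - \epsilon

% Positive pairs: keep a positive whose similarity is below the
% most similar negative of the same anchor, plus the margin.
S_{ij}^{+} < \max_{y_k \neq y_i} S_{ik} + \epsilon
```

Read this way, raising $\epsilon$ raises the threshold on the right-hand side of the positive-pair condition, so more positive pairs satisfy it.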
