
Misalignment between Sentence Embeddings and Classifier in multi-label classification? #500

Open
ycouble opened this issue Mar 11, 2024 · 0 comments


ycouble commented Mar 11, 2024

Hello,

(Cross-posting this between SetFit and sentence-transformers.)

We're investigating the possibility of using SetFit for customer service message classification.

Ours is a multi-label case, since customers often have more than one request in each message.
During SetFit's training phase, the texts and labels are passed to Sentence Transformers' SentenceLabelDataset.
Contrastive examples are created based on the exact combination of labels, not on the intersection of labels: e.g. labels [1, 1, 0] and [1, 0, 0] will be pushed apart by contrastive learning, and only pairs that are both [1, 1, 0] will be pulled together during the contrastive learning phase.
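To illustrate the behaviour described above, here is a minimal sketch (not SetFit's actual implementation) of pairing by exact label combination: the whole multi-hot vector is treated as a single class, so two examples form a positive pair only when their label vectors are identical.

```python
# Sketch of combination-based pairing: the full multi-hot label vector is
# treated as one atomic class label.

def is_positive_by_combination(labels_a, labels_b):
    """Positive pair only if the multi-hot vectors are identical."""
    return tuple(labels_a) == tuple(labels_b)

# [1, 1, 0] and [1, 0, 0] share label 0, yet are still treated as a
# negative pair and pushed apart:
print(is_positive_by_combination([1, 1, 0], [1, 0, 0]))  # False
print(is_positive_by_combination([1, 1, 0], [1, 1, 0]))  # True
```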

This can be somewhat counterproductive in SetFit: a one-vs-rest classifier, for example, would benefit from examples that share at least one label being close to each other in the embedding space.
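A hypothetical alternative pairing rule, sketched here purely for illustration, would treat two examples as a positive pair whenever their label sets intersect, which is closer to what a one-vs-rest head would benefit from:

```python
# Hypothetical intersection-based pairing rule (an illustration of the
# suggested workaround, not an existing SetFit option): examples sharing
# at least one active label count as a positive pair.

def is_positive_by_intersection(labels_a, labels_b):
    """Positive pair if any label is active in both multi-hot vectors."""
    return any(a and b for a, b in zip(labels_a, labels_b))

print(is_positive_by_intersection([1, 1, 0], [1, 0, 0]))  # True: share label 0
print(is_positive_by_intersection([0, 1, 0], [1, 0, 1]))  # False: disjoint labels
```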

We were wondering whether this behaviour was chosen deliberately, and if so, why. Do you have experience dealing with this type of data, and have you used a workaround? Would you be interested in a contribution to support this use case?

Cheers,
