
Misalignment between Sentence Embeddings and Classifier in multi-label classification? #500

Open
ycouble opened this issue Mar 11, 2024 · 0 comments


ycouble commented Mar 11, 2024

Hello,

(Cross-posting this between SetFit and sentence-transformers.)

We're investigating the possibility of using SetFit for customer service message classification.

Ours is a multi-label case, since customers often have more than one request in each message.
During SetFit's training phase, the texts and labels are passed to Sentence Transformers' SentenceLabelDataset.
Contrastive examples are created based on the exact combination of labels, not on the intersection of labels: e.g. labels [1, 1, 0] and [1, 0, 0] will be pushed apart by contrastive learning, and only pairs that are both [1, 1, 0] will be pulled together during the contrastive learning phase.
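To illustrate the behaviour described above, here is a minimal sketch (not SetFit's actual implementation) of pairing by exact label combination: the whole multi-hot vector is treated as a single class, so two examples form a positive pair only when their label vectors are identical.

```python
# Sketch of combination-based pairing: the full multi-hot label vector is
# treated as one atomic class label.

def is_positive_by_combination(labels_a, labels_b):
    """Positive pair only if the multi-hot vectors are identical."""
    return tuple(labels_a) == tuple(labels_b)

# [1, 1, 0] and [1, 0, 0] share label 0, yet are still treated as a
# negative pair and pushed apart:
print(is_positive_by_combination([1, 1, 0], [1, 0, 0]))  # False
print(is_positive_by_combination([1, 1, 0], [1, 1, 0]))  # True
```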

This can be somewhat counterproductive in SetFit: a one-vs-rest classifier, for example, would benefit from examples that share at least one label being close to each other in the embedding space.
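A hypothetical alternative pairing rule, sketched here purely for illustration, would treat two examples as a positive pair whenever their label sets intersect, which is closer to what a one-vs-rest head would benefit from:

```python
# Hypothetical intersection-based pairing rule (an illustration of the
# suggested workaround, not an existing SetFit option): examples sharing
# at least one active label count as a positive pair.

def is_positive_by_intersection(labels_a, labels_b):
    """Positive pair if any label is active in both multi-hot vectors."""
    return any(a and b for a, b in zip(labels_a, labels_b))

print(is_positive_by_intersection([1, 1, 0], [1, 0, 0]))  # True: share label 0
print(is_positive_by_intersection([0, 1, 0], [1, 0, 1]))  # False: disjoint labels
```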

We were wondering whether this behaviour was chosen deliberately, and if so, why. Do you have experience dealing with this type of data, and have you used a workaround? Would you be interested in a contribution to support this use case?

Cheers,
