
Question about the Spearman evaluation method #133

Open
hellopahe opened this issue Oct 18, 2023 · 2 comments
Labels
wontfix This will not be worked on

Comments

@hellopahe

hellopahe commented Oct 18, 2023

First of all, nice work!

Looking at the evaluation code in tests/model_spearman.py, spearmanr(x, y) is run on pred and labels over the whole dataset. The labels in the dataset are integers from 1 to 5, while the predicted cosine similarities are continuous floats in [-1, 1].
Since Spearman measures correlation by rank, wouldn't the many repeated labels in the dataset produce a large number of tied ranks and hurt the accuracy of the correlation estimate?
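A minimal sketch of the concern, using synthetic data (not the repo's actual test) and assuming numpy/scipy are available: ties in the gold labels are assigned their average rank by spearmanr, so the coefficient is still well defined, but the gold ranking is much coarser than the continuous predictions.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
labels = rng.integers(1, 6, size=1000)            # gold labels: integers 1-5, heavily tied
preds = labels + rng.normal(0.0, 1.0, size=1000)  # continuous scores standing in for cosine similarity

# spearmanr gives tied observations their average rank, so the statistic is defined;
# the question is whether the coarse, tied gold ranking weakens the comparison.
corr, p = spearmanr(labels, preds)
print(f"Spearman on tied labels: {corr:.4f}")
```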

One idea: could the Spearman computation be split into several small groups, each containing 6 examples with distinct labels, and the correlation with the predictions computed per group? That would better highlight the contrast between text similarities. A rough sketch of this grouping is below.
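A rough sketch of that grouping idea (the function name and group construction are illustrative, not anything from the repo): draw one example per distinct label, compute Spearman inside each group, and average over groups.

```python
import numpy as np
from scipy.stats import spearmanr

def grouped_spearman(labels, preds, seed=0):
    """Average Spearman over random groups containing one example per distinct label."""
    labels, preds = np.asarray(labels), np.asarray(preds)
    rng = np.random.default_rng(seed)
    # bucket example indices by label value, in random order
    buckets = {v: list(rng.permutation(np.flatnonzero(labels == v))) for v in np.unique(labels)}
    scores = []
    # keep drawing groups while every label still has an unused example
    while all(len(b) > 0 for b in buckets.values()):
        idx = [b.pop() for b in buckets.values()]
        corr, _ = spearmanr(labels[idx], preds[idx])
        scores.append(corr)
    return float(np.mean(scores)) if scores else float("nan")
```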

Thanks to the author.

@shibing624
Owner

The evaluation method is only a reference; any method is fine as long as it is fair and objective. It is best to run your own business data, inspect the cases, and pick the model that performs best. You can also use the C-MTEB/MTEB approach (https://arxiv.org/abs/2210.07316) and evaluate with zero-shot classification F1; a rough sketch follows.
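A rough sketch of the MTEB-style classification check mentioned above, assuming an embedding model that exposes an `encode()` method (hypothetical here) plus scikit-learn: the embedding model stays frozen, a light classifier is fit on top, and the test split is scored with macro F1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def classification_f1(model, train_texts, train_labels, test_texts, test_labels):
    """Fit logistic regression on frozen embeddings and return macro F1 on the test split."""
    X_train = np.asarray(model.encode(train_texts))  # hypothetical encode() API
    X_test = np.asarray(model.encode(test_texts))
    clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    return f1_score(test_labels, clf.predict(X_test), average="macro")
```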


stale bot commented Dec 27, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. (The bot marked this issue due to long inactivity; feel free to ask again if needed.)

stale bot added the wontfix label on Dec 27, 2023