How should the knowledge match score threshold be set? #3895

Closed
blackwhites opened this issue Apr 26, 2024 · 2 comments

@blackwhites

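I have built my knowledge base using bge-large-zh as the vector database, with Temperature set to 0.1. When I set the threshold to 0.7, no data is retrieved; when I choose 0.4, data is retrieved. How should this threshold be set?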

@blackwhites blackwhites added the enhancement New feature or request label Apr 26, 2024

dosubot bot commented Apr 26, 2024

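When using the bge-large-zh vector database with Temperature set to 0.1, the right knowledge match score threshold depends mainly on your specific needs. The threshold controls how strict retrieval is: a lower threshold (such as 0.4) retrieves more data but may include less relevant results, while a higher threshold (such as 0.7) retrieves less data but with higher relevance.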

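In the bge-large-zh vector database, knowledge match scores are computed by the FAISS library using an inner-product distance strategy with L2 normalization. In other words, the vector embeddings are L2-normalized before similarity scores are computed, and the score itself is the inner product between the vectors. These scores are then used to filter and select matching knowledge entries.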
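
To make the scoring described above concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the helper names `l2_normalize`, `inner_product`, and `filter_by_threshold` and the toy 3-dimensional vectors are made up for illustration, and a real setup would compute this through FAISS on much larger embeddings. It assumes that a higher score means a closer match, as the comment above describes. After L2 normalization, the inner product of two vectors equals their cosine similarity.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def filter_by_threshold(query, docs, threshold):
    """Return (doc_id, score) pairs with score >= threshold, best first.

    Assumption: higher score = more similar.
    """
    q = l2_normalize(query)
    hits = []
    for doc_id, vec in docs.items():
        score = inner_product(q, l2_normalize(vec))
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Toy data, purely illustrative.
query = [1.0, 0.0, 0.0]
docs = {
    "a": [2.0, 0.0, 0.0],  # same direction as the query
    "b": [1.0, 1.0, 0.0],  # 45 degrees away from the query
    "c": [1.0, 2.0, 0.0],  # further away
    "d": [0.0, 1.0, 0.0],  # orthogonal to the query
}

strict = filter_by_threshold(query, docs, 0.7)
loose = filter_by_threshold(query, docs, 0.4)
print([doc_id for doc_id, _ in strict])
print([doc_id for doc_id, _ in loose])
```

With this toy data, the stricter threshold keeps fewer entries than the looser one, which is the trade-off discussed in this thread.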

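There is no fixed rule for choosing the threshold. I suggest starting your experiments with a moderate threshold and adjusting it step by step based on the quality of the retrieved data. For example, if a threshold of 0.7 retrieves too little data or none at all, try lowering it, for instance to 0.4, and check whether more relevant data comes back. The goal is to find a balance point that retrieves enough data while keeping it relevant.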
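
One way to run the experiment suggested above is to sweep a few candidate thresholds over the scores returned for a representative query and see how many entries survive each one. The sketch below is only illustrative: the function names `recalled_at` and `sweep_thresholds` are hypothetical, the score list is made up rather than measured, and it again assumes that a higher score means a closer match.

```python
def recalled_at(scores, threshold):
    """Scores that pass the threshold (assumes higher = more similar)."""
    return [s for s in scores if s >= threshold]

def sweep_thresholds(scores, thresholds):
    """Map each candidate threshold to the number of entries it would keep."""
    return {t: len(recalled_at(scores, t)) for t in thresholds}

# Made-up similarity scores for one query, for illustration only.
scores = [0.62, 0.55, 0.48, 0.41, 0.33]

counts = sweep_thresholds(scores, [0.7, 0.6, 0.5, 0.4])
for threshold, kept in counts.items():
    print(f"threshold={threshold}: {kept} entries kept")
```

Repeating such a sweep over several real queries, and reading the entries that survive at each level, gives a rough picture of where relevance starts to drop off for a given knowledge base.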

I hope this helps! If you have any other questions, feel free to ask.

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR self-assigned this May 2, 2024
@zRzRzRzRzRzRzR
Collaborator

Temperature has nothing to do with the database, does it?
