-
Semantra splits text into fixed-size chunks rather than sentences so that every chunk has the same length. In my personal experience, the results are better when all the chunks are exactly the same size, because embeddings can be affected by length (e.g. embeddings of short sentences match more closely with other short sentences, even if a longer sentence is more relevant). It doesn't matter much if a chunk spans several sentences or starts in the middle of a sentence; on average it will still find the relevant parts.
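To make the idea concrete, here is a minimal sketch of fixed-size chunking. It is not Semantra's actual code: `chunk_words` is a hypothetical helper, and it counts whitespace-separated words as a stand-in for model tokens, which is an assumption made only to keep the example self-contained.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of `size` words each.

    Assumption: words are whitespace-separated; a real pipeline would
    likely count model tokens instead. The final chunk may be shorter
    when the word count is not a multiple of `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    words = text.split()
    # Step through the word list in strides of `size`, ignoring sentence boundaries.
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


text = "The cat sat. It purred loudly. Then it slept on the warm windowsill all day."
chunks = chunk_words(text, 5)
for chunk in chunks:
    print(repr(chunk))
```

Note that the first chunk ends partway through the second sentence. That is the trade-off described above: chunks cut across sentences, but every chunk (except possibly the last) has the same length.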
-
I was under the impression that embedding models were trained on sentences. Does it matter?
https://www.sbert.net/docs/quickstart.html#comparing-sentence-similarities