I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
Please do not modify this template :) and fill in all the required fields.
1. Is this request related to a challenge you're experiencing?
Could our embedding model have a maxLength property, similar to context_size, and split text into chunks by maxLength?
Bedrock Cohere embedding: its "context_size" is 512.
The assumption is that 1 token is about 4 characters, so the limit would be 512 tokens.
But that is not accurate: in practice it can handle 1024 tokens, and its hard limit is 2048 characters.
We hit this scenario: a text of 2500 characters that contains only 300 tokens.
Error: expected maxLength: 2048, actual: 2459
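The scenario above could be handled by splitting on characters instead of tokens. A minimal sketch (the function name and default are illustrative, not an existing API):

```python
def split_by_max_length(text: str, max_length: int = 2048) -> list[str]:
    """Split text into chunks of at most max_length characters."""
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]

# A 2500-character input would yield two chunks: 2048 + 452 characters,
# each within Cohere's hard character limit.
chunks = split_by_max_length("a" * 2500)
```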
2. Describe the feature you'd like to see
Model configuration: let the user choose one limit property ({maxLength|context_size}), or add a unit property ({tokens|characters}).
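As a sketch of what that configuration could look like (the `max_length` and `length_unit` keys are proposed here, not an existing schema):

```python
# Hypothetical model configuration illustrating the proposed properties.
model_config = {
    "model": "cohere.embed-english-v3",
    "context_size": 512,          # existing property: token budget
    "max_length": 2048,           # proposed: hard character limit
    "length_unit": "characters",  # proposed: "tokens" or "characters"
}

# The chunker would then pick the effective limit from the unit:
if model_config["length_unit"] == "characters":
    limit = model_config["max_length"]
else:
    limit = model_config["context_size"]
```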
3. How will this feature improve your workflow or experience?
To fix the Cohere error: expected maxLength: 2048, actual: 2459.
4. Additional context or comments
No response
5. Can you help us with this feature?
I am interested in contributing to this feature.
I discovered that the overflow is caused by runs of meaningless characters, such as consecutive periods or dashes. Is there a way to remove these characters with Python? Or could a pre-configured LLM be invoked within the embedding module to strip them, so the text length stays under 2048 characters?
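For the filler characters described above, a plain regex pass is likely cheaper than invoking an LLM. A sketch (the function name and the choice of collapsing runs of three or more are assumptions):

```python
import re

def strip_filler(text: str) -> str:
    """Collapse runs of 3+ periods or dashes into a single character."""
    text = re.sub(r"\.{3,}", ".", text)
    text = re.sub(r"-{3,}", "-", text)
    return text
```

Running this before chunking could bring many such texts back under the 2048-character limit without any model call.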