I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
Please do not modify this template :) and fill in all the required fields.
1. Is this request related to a challenge you're experiencing?
Could our embedding model have a maxLength property, similar to context_size, and split text into chunks by maxLength?
Bedrock Cohere embedding: its "context_size" is 512.
The assumption is that 1 token is about 4 characters, so the limit would be 512 tokens.
But that is not accurate: in practice it can handle 1024 tokens, and its hard limit is 2048 characters.
We hit this scenario: a text of 2500 characters that contains only 300 tokens.
Error: expected maxLength: 2048, actual: 2459
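The scenario above could be handled by splitting on characters instead of tokens. A minimal sketch (the function name and default are illustrative, not an existing API):

```python
def split_by_max_length(text: str, max_length: int = 2048) -> list[str]:
    """Split text into chunks of at most max_length characters."""
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]

# A 2500-character input would yield two chunks: 2048 + 452 characters,
# each within Cohere's hard character limit.
chunks = split_by_max_length("a" * 2500)
```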
2. Describe the feature you'd like to see
Model configuration: let the user choose one limit property ({maxLength|context_size}), or add a unit property ({tokens|characters}).
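As a sketch of what that configuration could look like (the `max_length` and `length_unit` keys are proposed here, not an existing schema):

```python
# Hypothetical model configuration illustrating the proposed properties.
model_config = {
    "model": "cohere.embed-english-v3",
    "context_size": 512,          # existing property: token budget
    "max_length": 2048,           # proposed: hard character limit
    "length_unit": "characters",  # proposed: "tokens" or "characters"
}

# The chunker would then pick the effective limit from the unit:
if model_config["length_unit"] == "characters":
    limit = model_config["max_length"]
else:
    limit = model_config["context_size"]
```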
3. How will this feature improve your workflow or experience?
To fix the Cohere error: expected maxLength: 2048, actual: 2459.
4. Additional context or comments
No response
5. Can you help us with this feature?
I am interested in contributing to this feature.
I discovered that the overflow is caused by runs of meaningless characters, such as consecutive periods or dashes. Is there a way to remove these characters with Python? Or could a pre-configured LLM be invoked within the embedding module to strip them, so the text length stays under 2048 characters?
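For the filler characters described above, a plain regex pass is likely cheaper than invoking an LLM. A sketch (the function name and the choice of collapsing runs of three or more are assumptions):

```python
import re

def strip_filler(text: str) -> str:
    """Collapse runs of 3+ periods or dashes into a single character."""
    text = re.sub(r"\.{3,}", ".", text)
    text = re.sub(r"-{3,}", "-", text)
    return text
```

Running this before chunking could bring many such texts back under the 2048-character limit without any model call.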