Error when generating summary for long documents: 'ValueError: A single document was longer than the context length, we cannot handle this.' #21284
Labels
🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
🔌: huggingface (Primarily related to HuggingFace integrations)
Ɑ: text splitters (Related to text splitters package)
Checked other resources
Example Code
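No code was attached to the original report. Below is a minimal sketch of the kind of setup that appears to trigger the error, assuming a HuggingFace-hosted Llama-3 model and the standard map_reduce summarization chain; the file path, model id, and chunk sizes are illustrative.

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.llms import HuggingFacePipeline
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain

# Load the long document (path is illustrative)
docs = TextLoader("long_document.txt").load()

# Split by character count -- note this does NOT guarantee a token count
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Llama-3 served through a HuggingFace pipeline (model id is illustrative)
llm = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
)

chain = load_summarize_chain(llm, chain_type="map_reduce")

# Raises the ValueError when a single chunk exceeds the chain's token limit
summary = chain.invoke({"input_documents": chunks})
```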
Error Message and Stack Trace (if applicable)
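```
ValueError: A single document was longer than the context length, we cannot handle this.
```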
Description
I am attempting to generate summaries for long documents using the LangChain library with the Llama-3 model, but I encounter a ValueError stating that "a single document was longer than the context length, we cannot handle this." The error occurs even after splitting the document into smaller chunks with RecursiveCharacterTextSplitter.
Expected Behavior
I expect the summarization chain to produce a concise summary for each document chunk without exceeding the model's token limit.
Actual Behavior
The process raises the ValueError above, which suggests that some document chunks still exceed the token limit configured in the summarization chain.
Possible Solution
I suspect this is related to how RecursiveCharacterTextSplitter measures chunk length: by default it counts characters, not model tokens, so a character-based chunk_size does not guarantee that each chunk fits within the model's token limit. I am not sure how to adjust it so that all chunks stay within the acceptable token limit; one option is sketched below.
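One way to make the splitter count tokens rather than characters is to build it from the model's own tokenizer. A minimal sketch, reusing `docs` from the example above and assuming the Llama-3 tokenizer matches the LLM being used (the model id and sizes are illustrative):

```python
from transformers import AutoTokenizer
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Measure chunk length with the model's own tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer,
    chunk_size=1000,    # now counted in tokens, not characters
    chunk_overlap=100,
)
chunks = splitter.split_documents(docs)
```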
Additional Context
I tried reducing chunk_size and adjusting chunk_overlap, but neither resolved the issue. Any guidance on how to ensure that the document chunks conform to the model's token limit would be greatly appreciated.
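Another knob that may be relevant is the map_reduce chain's token_max, which appears to be the limit the error message refers to: the chain raises the ValueError when a single chunk alone exceeds it. A sketch, assuming load_summarize_chain forwards token_max to the underlying map_reduce chain:

```python
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    token_max=4000,  # must exceed the token length of the largest chunk
)
```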
System Info
Environment