Using the map_reduce mode of load_summarize_chain in version 0.1.16 of langchain, I occasionally ran into situations where output_text was empty. #21068

Closed
5 tasks done
D-siheng opened this issue Apr 30, 2024 · 0 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

D-siheng commented Apr 30, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import glob
import hashlib
import os

from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.llms import VLLMOpenAI

llm = VLLMOpenAI(
    max_tokens=10000,
    temperature=0.7,
    openai_api_key="EMPTY",
    openai_api_base="http://xx.xx.xx.xx:8080/v1",  # xx stands for my IP address, which I cannot disclose for privacy reasons
    model_name="/data/models/Qwen1.5-72B-Chat/"
)

# Point tiktoken at a local cache so the cl100k_base encoding is not downloaded at runtime
blobpath = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
cache_key = hashlib.sha1(blobpath.encode()).hexdigest()
tiktoken_cache_dir = "/app/api"
os.environ["TIKTOKEN_CACHE_DIR"] = tiktoken_cache_dir
assert os.path.exists(os.path.join(tiktoken_cache_dir, cache_key))

def summarize_pdfs_from_folder(pdfs_folder):
    # In my tool this check lives in the calling method and does:
    #     return self.create_text_message('Please input pdfs_folder')
    if not pdfs_folder:
        raise ValueError('Please input pdfs_folder')
    summaries = []
    for pdf_file in glob.glob(pdfs_folder + "/*.pdf"):
        loader = PyPDFLoader(pdf_file)
        docs = loader.load_and_split()  # one or more chunks per PDF page
        prompt_template = """Write a concise summary of the following:

        {text}

        CONCISE SUMMARY IN CHINESE:"""
        PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
        # The same prompt is used for the map step and the combine (reduce) step
        chain = load_summarize_chain(llm, chain_type="map_reduce", return_intermediate_steps=False, map_prompt=PROMPT, combine_prompt=PROMPT)
        summary = chain.run(docs)
        summaries.append(summary)
    return summaries

summaries = summarize_pdfs_from_folder("/home/user/mytest")
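
For anyone trying to reproduce this: the per-chunk summaries are only returned when the chain is created with return_intermediate_steps=True, and in that case chain.run can no longer be used because the chain then has more than one output key. A minimal sketch of calling the chain with a dict input instead, assuming the result contains the keys "output_text" and "intermediate_steps". The helper name run_with_steps is hypothetical, and the chain is passed in as an argument so the snippet does not depend on a particular LLM:

```python
from typing import Any, List, Tuple


def run_with_steps(chain: Any, docs: List[Any]) -> Tuple[str, List[str]]:
    """Run a summarize chain and return (final summary, per-chunk summaries).

    Assumption: `chain` was built with return_intermediate_steps=True, so the
    result dict carries both "output_text" and "intermediate_steps".
    """
    result = chain.invoke({"input_documents": docs})
    # Normalise missing or None values to empty strings so they are easy to spot
    output_text = (result.get("output_text") or "").strip()
    steps = [(s or "").strip() for s in result.get("intermediate_steps", [])]
    return output_text, steps
```

Used as `output_text, steps = run_with_steps(chain, docs)`, an empty `output_text` or an empty entry in `steps` then shows up as `""` instead of being hidden inside the chain.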

Error Message and Stack Trace (if applicable)

No response

Description

When I use the map_reduce mode of load_summarize_chain in version 0.1.16 of langchain, output_text is occasionally empty.
① I am summarizing PDF documents, and each PDF document is split by page.
② So I printed intermediate_steps and found that it contains summaries of the split parts, but not of all of them. For example, for a PDF with ten pages, intermediate_steps sometimes contains a summary of only one page, sometimes six, and sometimes all ten.
③ My prompt explicitly asks for a CONCISE SUMMARY IN CHINESE, but every summary in intermediate_steps has the form: Chinese abstract + "CONCISE IN ENGLISH:" + English abstract.
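
The two symptoms above (fewer intermediate summaries than pages, and an extra English section after the Chinese one) can be checked mechanically. A rough sketch in plain Python, not a fix for the underlying cause: the function names find_missing_steps and cut_at_marker are hypothetical, and the default marker string is assumed to match the text the model appends:

```python
from typing import Dict, List


def find_missing_steps(num_docs: int, steps: List[str]) -> Dict[str, object]:
    """Compare the number of input chunks with the map-step summaries."""
    # Indexes of summaries that came back empty or whitespace-only
    empty = [i for i, s in enumerate(steps) if not (s or "").strip()]
    return {
        "expected": num_docs,
        "received": len(steps),
        "missing": max(num_docs - len(steps), 0),
        "empty_indexes": empty,
    }


def cut_at_marker(summary: str, marker: str = "CONCISE IN ENGLISH:") -> str:
    """Keep only the text before the first occurrence of `marker`.

    If the marker does not occur, the whole summary is returned (stripped).
    """
    head, _, _ = summary.partition(marker)
    return head.strip()
```

For a ten-page PDF, `find_missing_steps(len(docs), steps)` reports how many page summaries are absent or empty, and `cut_at_marker` can be applied to each intermediate summary to drop the unwanted English part before the combine step is inspected.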

System Info

System Information

OS: Ubuntu 22.04
Python Version: 3.12.3

Package Information

langchain_core: 0.1.46
langchain: 0.1.16
langchain_community: 0.0.34
langsmith: 0.1.52
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Apr 30, 2024
@D-siheng D-siheng changed the title Use the map_reduce mode of load_summarize_chain in version 0.1.16 of langchain, I encounter output_text being occasionally empty. Using the map_reduce mode with prompt load_summarize_chain in version 0.1.16 of langchain, I occasionally ran into situations where output_text was empty. May 8, 2024
@D-siheng D-siheng closed this as completed May 8, 2024
@D-siheng D-siheng changed the title Using the map_reduce mode with prompt load_summarize_chain in version 0.1.16 of langchain, I occasionally ran into situations where output_text was empty. Using the map_reduce mode load_summarize_chain in version 0.1.16 of langchain, I occasionally ran into situations where output_text was empty. May 8, 2024