
Unable to supply PDF files to the VertexAI object despite the feature being available in the native Vertex AI library #215

aaronepinto-bell opened this issue May 8, 2024 · 7 comments


@aaronepinto-bell

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Taken from the following discussion

Description
I am trying to pass a PDF document to gemini-1.5-pro in multimodal mode, following a process similar to the one explained [here](https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm/#multimodality). The documentation illustrates how to pass an image and query Gemini Pro Vision, but I want to pass a PDF directly instead.

Here is my attempt:

from langchain_core.messages import HumanMessage
from langchain_google_vertexai import ChatVertexAI
import base64

file_location = "/path/to/my/document.pdf"

# Initialize the LangChain LLM
llm = ChatVertexAI(model_name="gemini-1.5-pro-preview-0409")

# Open and read the PDF file
with open(file_location, "rb") as pdf_file:
    pdf_bytes = pdf_file.read()

# Create a message containing the Base64-encoded PDF
pdf_message = {
    "type": "image_url",  # Assuming the LLM accepts PDF under this key, you might need to verify this
    "image_url": {
        "url": f"data:application/pdf;base64,{base64.b64encode(pdf_bytes).decode('utf-8')}"
    },
}

# Create a text message asking what the PDF contains
text_message = {
    "type": "text",
    "text": "What does this PDF contain?",
}

# Combine the messages into a HumanMessage object
message = HumanMessage(content=[text_message, pdf_message])

# Send the message to the LLM and print the response
output = llm.invoke([message])
print(output.content)

Unfortunately, this code fails.
However, if I use the official Vertex AI library, I am able to do it. Here is part of my code:

from vertexai.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-1.5-pro-preview-0409")

prompt = "please summarise the provided pdf document"

pdf_file_uri = "gs://my_bucket/my_document.pdf"
pdf_file = Part.from_uri(pdf_file_uri, mime_type="application/pdf")
contents = [pdf_file, prompt]

response = model.generate_content(contents)
print(response.text)

This approach works, but I was hoping to make the LangChain method function similarly.
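Whichever route ends up supported, the local-file case boils down to reading the PDF bytes and base64-encoding them. A minimal stdlib-only sketch (the helper name is made up for illustration, and a throwaway temp file stands in for a real document path):

```python
import base64
import tempfile

def load_pdf_b64(path: str) -> tuple[str, str]:
    """Read a local PDF and return (mime_type, base64-encoded contents)."""
    with open(path, "rb") as f:
        data = f.read()
    return "application/pdf", base64.b64encode(data).decode("utf-8")

# Demo with a throwaway file, since a real document path is environment-specific.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 demo")
    tmp_path = tmp.name

mime, b64 = load_pdf_b64(tmp_path)
```

The resulting base64 string is what would go into a data URL (as in the failing attempt above) or a raw payload field.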

System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 23.4.0: Fri Mar 15 00:11:05 PDT 2024; root:xnu-10063.101.17~1/RELEASE_X86_64
Python Version: 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ]

Package Information

langchain_core: 0.1.40
langchain: 0.1.14
langchain_community: 0.0.31
langsmith: 0.1.40
langchain_google_genai: 0.0.5
langchain_google_vertexai: 0.1.2
langchain_openai: 0.1.1
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)
The following packages were not found:

langgraph
langserve

### Error Message and Stack Trace (if applicable)

_No response_

This native Vertex AI library feature is not implemented within LangChain. I am happy to contribute and implement it; I would appreciate some pointers on where to start as a new contributor.
@efriis transferred this issue from langchain-ai/langchain on May 9, 2024
@lkuligin
Collaborator

@aaronepinto-bell You can just pass a GCS URI with LangChain; it works well for me.

just construct it as:

pdf_message = {
    "type": "image_url",
    "image_url": {
        "url": "gs://my_bucket/my_document.pdf"
    },
}
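For what it's worth, the message part described here can be built with a tiny framework-free helper; the function name and bucket path below are placeholders for illustration, not part of LangChain's API:

```python
def gcs_pdf_part(gcs_uri: str) -> dict:
    """Build the content part described above: a gs:// URI under the image_url key."""
    return {"type": "image_url", "image_url": {"url": gcs_uri}}

part = gcs_pdf_part("gs://my_bucket/my_document.pdf")
```

This dict would then be combined with a text part inside a HumanMessage, exactly as in the earlier snippets.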

@lkuligin
Collaborator

@Adi8885 please add it to our LC documentation

@jamesev15

@lkuligin I've tried that with the recent langchain-core version (0.1.52), but it doesn't work:

from langchain_google_vertexai import VertexAI
from langchain_google_vertexai import HarmBlockThreshold, HarmCategory
from langchain_core.messages import HumanMessage

safety_settings = {
    HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}

model = VertexAI(model_name="gemini-1.5-pro-preview-0514", project="project_id")


pdf_message = {
    "type": "image_url",
    "image_url": {
        "url": "gs://....pdf"
    },
}
text_message = {
    "type": "text",
    "text": "Summarize the provided document.",
}
message = HumanMessage(content=[text_message, pdf_message])

output = model.invoke([message])

Answer

Please provide me with the content of the PDF file located at the URL you provided: 'gs:...pdf'. I need the text content of the document to summarize it for you. 

Once I have the content, I can analyze it and provide you with a concise and informative summary. 

@aaronepinto-bell
Author

@lkuligin @Adi8885 Hey folks, confirming the same as above: it is unable to register the document.

@Kashi-Datum

Any updates on this?

@lkuligin
Collaborator

Please update the version of langchain-google-vertexai.

@jamesev15

@lkuligin the version on PyPI is still 1.0.4. I've seen the fix (it adds the "media" content type), but that change is not available yet. I've seen the commit from @wafle in the langchain-google-genai release. Please correct me if I'm wrong.

The new way to use it:

import base64

from langchain_core.messages import HumanMessage

with open("file.pdf", "rb") as f:
    pdf = base64.b64encode(f.read()).decode("utf-8")

content = [
    {
        "type": "text",
        "text": "prompt here",
    },
    {
        "type": "media",
        "mime_type": "application/pdf",
        "data": pdf,
    },
]
message = [HumanMessage(content=content)]
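A quick offline sanity check of that "media" part's shape (the PDF bytes are faked here, since the real file path is environment-specific):

```python
import base64

# Stand-in for open("file.pdf", "rb").read(); any bytes work for checking the shape.
pdf_bytes = b"%PDF-1.4 minimal"
pdf_b64 = base64.b64encode(pdf_bytes).decode("utf-8")

media_part = {
    "type": "media",
    "mime_type": "application/pdf",
    "data": pdf_b64,
}

# The base64 payload must round-trip back to the original bytes.
assert base64.b64decode(media_part["data"]) == pdf_bytes
```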
