
Unable to supply PDF files to the VertexAI object despite the feature being available in the native Vertex AI library #215

aaronepinto-bell opened this issue May 8, 2024 · 7 comments


@aaronepinto-bell

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Taken from the following discussion

Description
I am trying to pass a PDF document to gemini-1.5-pro in multimodal mode, following a process similar to the one explained [here](https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm/#multimodality). The documentation illustrates how to pass an image and query Gemini Pro Vision, but I want to pass a PDF directly instead.

Here is my attempt:

from langchain_core.messages import HumanMessage
from langchain_google_vertexai import ChatVertexAI
import base64

file_location = "/path/to/my/document.pdf"

# Initialize the LangChain LLM
llm = ChatVertexAI(model_name="gemini-1.5-pro-preview-0409")

# Open and read the PDF file
with open(file_location, "rb") as pdf_file:
    pdf_bytes = pdf_file.read()

# Create a message containing the Base64-encoded PDF
pdf_message = {
    "type": "image_url",  # Assuming the LLM accepts PDF under this key, you might need to verify this
    "image_url": {
        "url": f"data:application/pdf;base64,{base64.b64encode(pdf_bytes).decode('utf-8')}"
    },
}

# Create a text message asking what the PDF contains
text_message = {
    "type": "text",
    "text": "What does this PDF contain?",
}

# Combine the messages into a HumanMessage object
message = HumanMessage(content=[text_message, pdf_message])

# Send the message to the LLM and print the response
output = llm.invoke([message])
print(output.content)

Unfortunately, this code fails.
However, if I use the official Vertex AI library, I am able to do it. Here is part of my code:

from vertexai.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-1.5-pro-preview-0409")

prompt = "please summarise the provided pdf document"

pdf_file_uri = "gs://my_bucket/my_document.pdf"
pdf_file = Part.from_uri(pdf_file_uri, mime_type="application/pdf")
contents = [pdf_file, prompt]

response = model.generate_content(contents)
print(response.text)

This approach works, but I was hoping to make the LangChain method function similarly.
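Whichever route ends up supported, the local-file case boils down to reading the PDF bytes and base64-encoding them. A minimal stdlib-only sketch (the helper name is made up for illustration, and a throwaway temp file stands in for a real document path):

```python
import base64
import tempfile

def load_pdf_b64(path: str) -> tuple[str, str]:
    """Read a local PDF and return (mime_type, base64-encoded contents)."""
    with open(path, "rb") as f:
        data = f.read()
    return "application/pdf", base64.b64encode(data).decode("utf-8")

# Demo with a throwaway file, since a real document path is environment-specific.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 demo")
    tmp_path = tmp.name

mime, b64 = load_pdf_b64(tmp_path)
```

The resulting base64 string is what would go into a data URL (as in the failing attempt above) or a raw payload field.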

System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 23.4.0: Fri Mar 15 00:11:05 PDT 2024; root:xnu-10063.101.17~1/RELEASE_X86_64
Python Version: 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ]

Package Information

langchain_core: 0.1.40
langchain: 0.1.14
langchain_community: 0.0.31
langsmith: 0.1.40
langchain_google_genai: 0.0.5
langchain_google_vertexai: 0.1.2
langchain_openai: 0.1.1
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)
The following packages were not found:

langgraph
langserve

### Error Message and Stack Trace (if applicable)

_No response_

This native Vertex AI library feature is not implemented within LangChain. I am happy to contribute and implement it; I would appreciate some pointers on where to start as a new contributor.
@efriis transferred this issue from langchain-ai/langchain on May 9, 2024
@lkuligin
Collaborator

@aaronepinto-bell You can just pass a GCS URI with LangChain; it works well for me.

just construct it as:

pdf_message = {
    "type": "image_url",
    "image_url": {
        "url": "gs://my_bucket/my_document.pdf"
    },
}
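For what it's worth, the message part described here can be built with a tiny framework-free helper; the function name and bucket path below are placeholders for illustration, not part of LangChain's API:

```python
def gcs_pdf_part(gcs_uri: str) -> dict:
    """Build the content part described above: a gs:// URI under the image_url key."""
    return {"type": "image_url", "image_url": {"url": gcs_uri}}

part = gcs_pdf_part("gs://my_bucket/my_document.pdf")
```

This dict would then be combined with a text part inside a HumanMessage, exactly as in the earlier snippets.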

@lkuligin
Collaborator

@Adi8885 please add it to our LC documentation

@jamesev15

@lkuligin I've tried that with the recent langchain-core version (0.1.52), but it doesn't work:

from langchain_google_vertexai import VertexAI
from langchain_google_vertexai import HarmBlockThreshold, HarmCategory
from langchain_core.messages import HumanMessage

safety_settings = {
    HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}

model = VertexAI(model_name="gemini-1.5-pro-preview-0514", project="project_id")


pdf_message = {
    "type": "image_url",
    "image_url": {
        "url": "gs://....pdf"
    },
}
text_message = {
    "type": "text",
    "text": "Summarize the provided document.",
}
message = HumanMessage(content=[text_message, pdf_message])

output = model.invoke([message])

Answer

Please provide me with the content of the PDF file located at the URL you provided: 'gs:...pdf'. I need the text content of the document to summarize it for you. 

Once I have the content, I can analyze it and provide you with a concise and informative summary. 

@aaronepinto-bell
Author

@lkuligin @Adi8885 Hey folks, confirming the same as above: it is unable to register the document.

@Kashi-Datum

Any updates on this?

@lkuligin
Collaborator

Please update the version of langchain-google-vertexai.

@jamesev15

@lkuligin the version on PyPI is still 1.0.4. I've seen the fix (it adds the "media" content type), but that change is not available yet. I've seen the commit from @wafle in the langchain-google-genai release. Please correct me if I'm wrong.

The new way to use it:

import base64

from langchain_core.messages import HumanMessage

with open("file.pdf", "rb") as f:
    pdf = base64.b64encode(f.read()).decode("utf-8")

content = [
    {
        "type": "text",
        "text": "prompt here",
    },
    {
        "type": "media",
        "mime_type": "application/pdf",
        "data": pdf,
    },
]
message = [HumanMessage(content=content)]
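A quick offline sanity check of that "media" part's shape (the PDF bytes are faked here, since the real file path is environment-specific):

```python
import base64

# Stand-in for open("file.pdf", "rb").read(); any bytes work for checking the shape.
pdf_bytes = b"%PDF-1.4 minimal"
pdf_b64 = base64.b64encode(pdf_bytes).decode("utf-8")

media_part = {
    "type": "media",
    "mime_type": "application/pdf",
    "data": pdf_b64,
}

# The base64 payload must round-trip back to the original bytes.
assert base64.b64decode(media_part["data"]) == pdf_bytes
```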
