ConversationChain does not support GPT-4 Turbo with Vision model (gpt-4-turbo-2024-04-09) with image inputs #5326

Open
WangBiaoxuan opened this issue May 9, 2024 · 3 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@WangBiaoxuan

WangBiaoxuan commented May 9, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { ConversationChain } from "langchain/chains";
import { HumanMessage } from "@langchain/core/messages";

const chain = new ConversationChain({
    // this.model is a GPT-4 Turbo with Vision model, so I want to pass an image to gpt-4-turbo
    llm: this.model,
    memory: this.memory,
});

const input = new HumanMessage({
    content: [
        {
            type: "text",
            text: "Describe this image",
        },
        {
            type: "image_url",
            image_url: {
                url: "https://www.freecodecamp.org/news/content/images/2023/05/Screenshot-2023-05-29-at-5.40.38-PM.png",
                detail: "low",
            },
        },
    ],
});

await chain.call({ input });

Error Message and Stack Trace (if applicable)

No response

Description


When the example code above is executed, an error is reported.

Looking at the langchainjs source, I found that the prompt's input only supports a string, but the input above is an array:

export const DEFAULT_TEMPLATE = `The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:`;
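
For context, here is a minimal sketch (mine, not from the report) of why the array fails: the prompt template interpolates {input} as plain text, so a structured content array is flattened into a string before it ever reaches the model.

import { PromptTemplate } from "@langchain/core/prompts";

// Hypothetical reproduction; the template mirrors the tail of DEFAULT_TEMPLATE.
const prompt = PromptTemplate.fromTemplate("Human: {input}\nAI:");

const formatted = await prompt.format({
    input: [
        { type: "text", text: "Describe this image" },
        { type: "image_url", image_url: { url: "https://example.com/img.png" } },
    ],
});

// The array is coerced to text (String() or JSON.stringify, depending on the
// version), so the model sees flattened text rather than a multimodal message.
console.log(formatted);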

System Info

platform: mac
node: v20.11.1


dosubot bot commented May 9, 2024

To address the issue of passing an image along with text to the GPT-4 Turbo with Vision model using LangChain.js, you'll need to ensure your input is correctly formatted. The model expects a MessageContentComplex array that includes both text and image data. Here's a streamlined approach to achieve this:

  1. Prepare the Image and Text Data: If you're using a base64-encoded image, ensure it's correctly prefixed (e.g., data:image/png;base64, for PNG images). For online images, directly use the URL.

  2. Construct the Input Array: Combine your text and image data into a single array. For a base64-encoded image, the structure would look like this:

    const messageContent = [
      {
        type: "text",
        text: "What is in this image?",
      },
      {
        type: "image_url",
        image_url: `data:image/png;base64,${your_base64_encoded_image_data}`,
      },
    ];
  3. Invoke the Model: Use the prepared input to call the model. Ensure your model instance is set up for vision capabilities.

    import { ChatOpenAI } from "@langchain/openai";
    import { HumanMessage } from "@langchain/core/messages";

    const model = new ChatOpenAI({
      modelName: "gpt-4-turbo-2024-04-09",
    });

    const response = await model.invoke([
      new HumanMessage({ content: messageContent }),
    ]);
    console.log("Response:", response);

For converting images to base64 in Node.js, you can use the fs module:

import fs from "node:fs/promises";

const image = await fs.readFile("path_to_your_image.jpg");
const base64Image = image.toString("base64");
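
Putting those pieces together, a brief illustrative sketch (variable names are assumptions, not from the thread) of building the data URL and the content array:

// `base64Image` comes from the fs snippet above; adjust the MIME type to match the file.
const dataUrl = `data:image/jpeg;base64,${base64Image}`;

const messageContent = [
  { type: "text", text: "What is in this image?" },
  // String form of image_url, as in step 2 above; the { url, detail } object form also works.
  { type: "image_url", image_url: dataUrl },
];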

This method allows you to leverage the multimodal capabilities of the GPT-4 Turbo with Vision model within LangChain.js effectively. For further details, refer to the LangChain.js documentation and examples.


@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label May 9, 2024
@WangBiaoxuan WangBiaoxuan changed the title ConversationChain does not support GPT-4 Turbo with Vision model (gpt-4-turbo-2024-04-09) ConversationChain does not support GPT-4 Turbo with Vision model (gpt-4-turbo-2024-04-09), with image inputs May 9, 2024
@WangBiaoxuan
Author

@dosu Your example does not use ConversationChain; I want to use ConversationChain.


dosubot bot commented May 10, 2024

To use ConversationChain with the GPT-4 Turbo with Vision model in LangChain.js for both text and image inputs, it's important to understand that ConversationChain is designed for text interactions: it does not natively support complex content objects or arrays, including those containing image data.

Since ConversationChain expects string inputs, a workaround is to process the image and text inputs separately and then integrate the responses into your conversation flow. Here's a conceptual approach to achieve this:

  1. Process Image and Text Separately: Use a model instance that supports vision capabilities to process the image. This step involves sending the image to the model and receiving a text description or relevant output.

  2. Integrate Model Responses into ConversationChain: Once you have the text output from the vision model, you can incorporate this text into your ConversationChain as part of the conversation. This means you'll manually handle the image processing part outside of ConversationChain and then use the resulting text for the conversation.

  3. Example Workflow:

    • First, use a vision-capable model to process the image and obtain a text description.
    • Next, incorporate this text description into your conversation flow within ConversationChain by treating it as a regular text input.

This approach lets you leverage GPT-4 Turbo with Vision by manually bridging the gap between vision and text processing: handle the image with a vision-capable model or method, then feed the resulting text into ConversationChain, whose design is optimized for text-based interactions. A minimal sketch of this two-step flow follows.
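
Here is a minimal sketch of that two-step workaround (illustrative, not from the thread; names such as visionModel and the BufferMemory setup are assumptions):

import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";
import { ConversationChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";

const visionModel = new ChatOpenAI({ modelName: "gpt-4-turbo-2024-04-09" });

// Step 1: describe the image with a direct model call, outside ConversationChain.
const description = await visionModel.invoke([
  new HumanMessage({
    content: [
      { type: "text", text: "Describe this image" },
      {
        type: "image_url",
        image_url: {
          url: "https://www.freecodecamp.org/news/content/images/2023/05/Screenshot-2023-05-29-at-5.40.38-PM.png",
          detail: "low",
        },
      },
    ],
  }),
]);

// Step 2: feed the resulting text into ConversationChain as a plain string input.
const chain = new ConversationChain({
  llm: visionModel,
  memory: new BufferMemory(),
});

// description.content is typically a plain string for a text-only reply.
const result = await chain.call({
  input: `The user shared an image. Description: ${description.content}`,
});
console.log(result.response);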

