add MultiModalPrompt class and an example
Description
This PR introduces a new class, MultiModalPrompt, aimed at facilitating the transfer of information between multimodal agents. The class encapsulates both a text prompt and additional multimodal data, so the two can be formatted and passed around together. The updated source files are camel/prompts/multimodal.py and camel/prompts/__init__.py. An example is added in examples/multimodal/formating_example.py.
Key Features:
- Combines a text prompt (the TextPrompt class) and multimodal information.
- Defines a list of supported modalities (MODALITIES), and it can validate the provided modalities against this list.
- The format method allows the formatting of both text prompts and multimodal information in tandem. It can also distinguish between keyword arguments meant for the text prompt and those intended for multimodal information.
- With the to_model_format method, the prompt can be converted into a model-understandable format. By default, it uses the default_to_model_format method, but custom methods can also be provided.
Code Changes:
- Added the MultiModalPrompt class with methods for initializing, formatting, and converting to a model-understandable format.
- Added default_to_model_format, which serves as the default method to format multimodal prompts for models.
Example Description for Pull Request
MultiModalPrompt Example Demonstrations
The attached example, examples/multimodal/formating_example.py, demonstrates the capabilities and practical use cases of the newly added MultiModalPrompt class for various multimodal scenarios.
Single Image VQA (Visual Question-Answering) Prompt:
Multi-Image Question with Custom Model Input Format:
- A custom format function, multi_image_input_format, is implemented, which labels images in the prompt with numbers. This indexing format is inspired by MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning.
- Tags of the form <Image{i}> are introduced in the textual prompt to indicate image positions.
- [Image{i}] acts as the visual placeholder for the i-th image in the prompt.
This example serves as a practical guide to:
- How MultiModalPrompt can be seamlessly integrated with existing prompts.
The example shows how the MultiModalPrompt class is used and that it applies to both single-image and multi-image scenarios, which should make it useful to developers and researchers in the multimodal domain.
Future Work
MultiModalPromptDict.
Please review the changes and provide feedback.
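For reviewers who want a quick mental model, the behavior listed under Key Features can be sketched roughly as follows. This is a simplified, self-contained stand-in and not the code in camel/prompts/multimodal.py: the class name MultiModalPromptSketch, the contents of MODALITIES, the function signatures, the returned dictionary layout, and the 0-based image numbering are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed modality names; the actual MODALITIES list in the PR may differ.
MODALITIES = ["image", "audio", "video"]

# Hypothetical signature for a "to model format" function.
FormatFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def default_to_model_format(text: str, info: Dict[str, Any]) -> Dict[str, Any]:
    """Bundle the text and the multimodal data into a single dictionary."""
    return {"prompt": text, **info}


def multi_image_input_format(text: str, info: Dict[str, Any]) -> Dict[str, Any]:
    """Label each image with a number so <Image{i}> tags in the text can refer to it."""
    images = info.get("image", [])
    return {
        "prompt": text,
        "images": {f"Image{i}": img for i, img in enumerate(images)},
    }


class MultiModalPromptSketch:
    """Illustrative stand-in for MultiModalPrompt (not the PR's implementation)."""

    def __init__(self, text_prompt: str, modalities: List[str]) -> None:
        # Validate the requested modalities against the supported list.
        unknown = [m for m in modalities if m not in MODALITIES]
        if unknown:
            raise ValueError(f"Unsupported modalities: {unknown}")
        self.text_prompt = text_prompt
        self.modalities = list(modalities)
        self.multimodal_info: Dict[str, Any] = {}

    def format(self, **kwargs: Any) -> "MultiModalPromptSketch":
        # Keyword arguments named after a declared modality carry multimodal
        # data; all remaining ones fill placeholders in the text prompt.
        info = {k: v for k, v in kwargs.items() if k in self.modalities}
        text_kwargs = {k: v for k, v in kwargs.items() if k not in self.modalities}
        result = MultiModalPromptSketch(
            self.text_prompt.format(**text_kwargs), self.modalities
        )
        result.multimodal_info = {**self.multimodal_info, **info}
        return result

    def to_model_format(self, method: Optional[FormatFn] = None) -> Dict[str, Any]:
        # Fall back to the default formatter when no custom one is supplied.
        return (method or default_to_model_format)(
            self.text_prompt, self.multimodal_info
        )


# Single-image VQA prompt using the default format.
vqa = MultiModalPromptSketch("Question: {question} Answer:", ["image"])
print(vqa.format(question="What is shown?", image="cat.png").to_model_format())

# Multi-image question using the custom, numbered image format.
multi = MultiModalPromptSketch("Compare <Image0> and <Image1>: {question}", ["image"])
filled = multi.format(question="Which is larger?", image=["a.png", "b.png"])
print(filled.to_model_format(multi_image_input_format))
```

Passing the formatter as an argument to to_model_format keeps the prompt class independent of any particular model's input layout, which matches the description's point that custom methods can replace the default.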
Motivation and Context
Why is this change required? What problem does it solve?
close #317
Feature Request
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Implemented Tasks
Checklist
Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!