
Integration of VLM embedding model #446

Open · wants to merge 47 commits into master

Conversation

@FUYICC (Contributor) commented Feb 1, 2024

Description

Issue #445

Summary by CodeRabbit

  • New Features
    • Introduced CLIPEmbedding for image and text embedding functionalities.
  • Bug Fixes
    • Improved file encoding handling in license updates.
  • Tests
    • Added tests for the new CLIPEmbedding functionality, covering initialization, embedding processes, and output dimension retrieval.

@FUYICC self-assigned this Feb 1, 2024
@FUYICC linked an issue Feb 1, 2024 that may be closed by this pull request
@FUYICC closed this Feb 4, 2024
@FUYICC reopened this Feb 5, 2024
@FUYICC marked this pull request as ready for review February 9, 2024 03:11
@Wendong-Fan (Member) commented

Hey @Appointat, thanks for your detailed polish on the docstring! If the other parts also look good to you, could you approve this PR? Also, one quick tip: it may be better to use review mode to list suggestions rather than pushing commits directly, so it's easier for the PR owner to track the issues and learn from your review.

@dandansamax (Collaborator) left a comment

Thanks a lot. I left some suggestions below.

camel/embeddings/clip_embedding.py (outdated; resolved)

def embed_list(
    self,
    objs: List[Union[Image.Image, str]],  # to do

Collaborator:

What does "# to do" mean here?

Contributor Author:

Sorry, I forgot to move it.

camel/embeddings/clip_embedding.py (outdated; resolved)
licenses/update_license.py (outdated; resolved)
@Appointat (Member) commented Mar 9, 2024

> Hey @Appointat, thanks for your detailed polish on the docstring! If the other parts also look good to you, could you approve this PR? Also, one quick tip: it may be better to use review mode to list suggestions rather than pushing commits directly, so it's easier for the PR owner to track the issues and learn from your review.

Hey, thanks for the tip! I appreciate the guidance on code review; it does make sense to maintain clarity and help the PR owner track changes. I was previously concerned that my comments were so numerous that reviewing them would take too much time for both parties, so I committed the changes directly. By the way, my recent commits focused on formatting and adding comments to improve code readability and maintainability. I've gone through the rest of the changes in the PR, and everything looks good to me. Good job!

@dandansamax (Collaborator) commented

> By the way, my recent commits focused on formatting and adding comments to improve code readability and maintainability.

Yeah. It makes sense not to leave every single detail in comments; the contributor can learn directly from the reviewer's commits.

from camel.embeddings import BaseEmbedding


class CLIPEmbedding(BaseEmbedding[Union[str, Image.Image]]):
Member:

Can we change this to something like VisionLanguageEmbedding? There are other models similar to CLIP, such as BLIP, etc.

(default: :obj:`openai/clip-vit-base-patch32`)
"""

from transformers import CLIPModel, CLIPProcessor
Member:

As mentioned above, to make this class more general, you can use AutoModel and AutoProcessor here instead of the CLIP-specific ones.

Suggested change
- from transformers import CLIPModel, CLIPProcessor
+ from transformers import AutoModel, AutoProcessor
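For reference, a minimal sketch of a constructor along these lines. The class name VisionLanguageEmbedding follows the rename suggested above and the default checkpoint comes from the current docstring; everything else is an assumption rather than the PR's final implementation:

```python
class VisionLanguageEmbedding:
    r"""Sketch of a model-agnostic constructor. The real class would subclass
    BaseEmbedding[Union[str, Image.Image]] and implement embed_list, etc.
    """

    def __init__(self, model_name: str = "openai/clip-vit-base-patch32") -> None:
        # AutoModel / AutoProcessor resolve the concrete classes (CLIP, BLIP, ...)
        # from the checkpoint's config, so nothing here is CLIP-specific.
        from transformers import AutoModel, AutoProcessor

        self.model = AutoModel.from_pretrained(model_name)
        self.processor = AutoProcessor.from_pretrained(model_name)
```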

from transformers import CLIPModel, CLIPProcessor
self.model = CLIPModel.from_pretrained(model_name)
self.processor = CLIPProcessor.from_pretrained(model_name)
text = 'dimension'
Member:

IMO we can make this attribute lazily initialized. For example, during __init__ you can set self.dim = None.
If self.dim is None and someone calls get_output_dim, you can assign and return self.dim in get_output_dim.
If self.dim is None and someone calls embed_list, you can assign self.dim within embed_list.
Or you can explore using AutoConfig, which may have the embedding size, though I am not sure.
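To make the suggestion concrete, here is a minimal standalone sketch of the lazy initialization, assuming the text-only path and the CLIPModel/CLIPProcessor constructor from the hunk above; it is an illustration, not the PR's actual implementation:

```python
from typing import Any, List, Optional, Union

from PIL import Image


class CLIPEmbedding:
    r"""Simplified stand-in showing only the lazily initialized dimension."""

    def __init__(self, model_name: str = "openai/clip-vit-base-patch32") -> None:
        from transformers import CLIPModel, CLIPProcessor

        self.model = CLIPModel.from_pretrained(model_name)
        self.processor = CLIPProcessor.from_pretrained(model_name)
        # Lazily initialized: not computed until someone actually needs it.
        self.dim: Optional[int] = None

    def embed_list(
        self, objs: List[Union[Image.Image, str]], **kwargs: Any
    ) -> List[List[float]]:
        # Text-only path shown here; kwargs are forwarded to the processor.
        inputs = self.processor(text=objs, return_tensors="pt", padding=True, **kwargs)
        features = self.model.get_text_features(**inputs)
        if self.dim is None:
            # Record the output dimension as a side effect of the first call.
            self.dim = features.shape[1]
        return features.tolist()

    def get_output_dim(self) -> int:
        if self.dim is None:
            # Embed a throwaway string once to discover the dimension.
            self.dim = len(self.embed_list(["dimension"])[0])
        return self.dim
```

With this, get_output_dim still runs one dummy forward pass the first time it is called, but repeated calls are free; if the model's config exposes the projection size, AutoConfig could avoid even that first pass.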



def test_CLIPEmbedding_initialization():
    embedding = CLIPEmbedding()
Member:

Can we create some mock tests for the embedding instead of downloading the model every time (which is quite expensive)?

Contributor Author:

Yes, you are right.
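For what it's worth, a rough sketch of such a mock test, assuming the constructor shown in the hunks above (CLIPModel.from_pretrained / CLIPProcessor.from_pretrained stored on self.model / self.processor); the import path of CLIPEmbedding is also an assumption:

```python
from unittest.mock import MagicMock, patch

from camel.embeddings import CLIPEmbedding  # import path assumed


@patch("transformers.CLIPProcessor.from_pretrained")
@patch("transformers.CLIPModel.from_pretrained")
def test_clip_embedding_init_without_download(mock_model_fp, mock_processor_fp):
    # The patched from_pretrained calls return lightweight mocks, so the test
    # never downloads the real checkpoint.
    mock_model_fp.return_value = MagicMock()
    mock_processor_fp.return_value = MagicMock()

    embedding = CLIPEmbedding()

    # The constructor should have asked for the checkpoint exactly once and
    # stored whatever from_pretrained returned.
    mock_model_fp.assert_called_once()
    mock_processor_fp.assert_called_once()
    assert embedding.model is mock_model_fp.return_value
    assert embedding.processor is mock_processor_fp.return_value
```

If the class imports CLIPModel/CLIPProcessor from transformers inside __init__ (as the hunks suggest), patching the names on the transformers module is the right target; if they are imported at the top of the embedding module instead, the patch target should be that module.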

@Appointat (Member) left a comment

I have updated my code review. Please let me know if there are any questions.


def embed_list(
    self,
    objs: List[Union[Image.Image, str]],  # to do
Member:

What do you mean by the comment "# to do"? Is the PR unfinished, with something remaining to be done? Remove it or complete the comment so it is understandable.

def embed_list(
    self,
    objs: List[Union[Image.Image, str]],  # to do
    **kwargs: Any,

Member:

The **kwargs parameter doesn't appear to be used within the function's implementation, and it's also not covered by the existing unit tests. Could we enhance the test suite with tests that verify the handling of **kwargs? This would ensure that all aspects of the function's behavior are thoroughly tested. Thanks.
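As an illustration only, a sketch of such a test. It assumes an embed_list that forwards **kwargs to the processor call (as in the lazy-initialization sketch earlier in this thread) and an assumed import path; bypassing __init__ keeps the test free of any model download:

```python
from unittest.mock import MagicMock

import torch

from camel.embeddings import CLIPEmbedding  # import path assumed


def test_embed_list_forwards_kwargs():
    # Skip __init__ and wire in mocks directly, so no checkpoint is loaded.
    embedding = CLIPEmbedding.__new__(CLIPEmbedding)
    embedding.dim = None
    embedding.processor = MagicMock(
        return_value={"input_ids": torch.ones(1, 3, dtype=torch.long)}
    )
    embedding.model = MagicMock()
    embedding.model.get_text_features.return_value = torch.zeros(1, 512)

    embedding.embed_list(["hello"], truncation=True)

    # The extra keyword argument should have reached the processor call.
    assert embedding.processor.call_args.kwargs.get("truncation") is True
```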

assert isinstance(embeddings, list)
assert len(embeddings) == 2
for e in embeddings:
    assert len(e) == embedding.get_output_dim()

Member:

Regarding the two functions test_image_embed_list_with_valid_input and test_text_embed_list_with_valid_input: to ensure robustness, could we introduce a combined test case, such as test_image_and_text_embed_list_with_valid_input, whose input could be test_image_text = [image, "Hello world"]? If not, I suggest adding a small check to validate that the function correctly raises an error, "The type of the input is inconsistent.", when encountering mixed input types. Thank you for addressing this.

Contributor Author:

Thank you for bringing this up. At the moment the design only accepts inputs of the same type; I will add an error message for mixed input types.
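A small sketch of the kind of guard and test being discussed. The helper name check_input_consistency is hypothetical; the error messages reuse the wording from this review thread:

```python
from typing import List, Union

import pytest
from PIL import Image


def check_input_consistency(objs: List[Union[Image.Image, str]]) -> None:
    """Raise if the list is empty or mixes PIL images and strings."""
    if not objs:
        raise ValueError("Input objs list is empty.")
    has_text = any(isinstance(obj, str) for obj in objs)
    has_image = any(isinstance(obj, Image.Image) for obj in objs)
    if has_text and has_image:
        raise ValueError("The type of the input is inconsistent.")


def test_image_and_text_embed_list_raises_on_mixed_input():
    image = Image.new("RGB", (224, 224))
    with pytest.raises(ValueError, match="inconsistent"):
        check_input_consistency([image, "Hello world"])
```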

as a list of floating-point numbers.
"""
if not objs:
    raise ValueError("Input text list is empty.")

Member:

The error message should be "Input objs list is empty."

@Appointat (Member) commented

@FUYICC Hi, is the PR still in progress? Let me know if you have any difficulties.

@FUYICC (Contributor, Author) commented Mar 27, 2024

> @FUYICC Hi, is the PR still in progress? Let me know if you have any difficulties.

Thank you for your kind help! Sorry, I've been mostly working on my dissertation for the past three weeks, so I haven't had time to move this forward. I'll be back up and running starting next week, and we can discuss any questions anytime!

@FUYICC changed the title from "Integration of CLIP embedding model" to "Integration of VLM embedding model" Apr 15, 2024
@FUYICC requested a review from Wendong-Fan May 5, 2024 17:14
Labels
Embeddings · lgtm (This PR has been approved by a maintainer) · size:L (This PR changes 100-499 lines, ignoring generated files)
Projects
Status: Reviewed
Development

Successfully merging this pull request may close these issues.

[Feature Request] Multi-modal RAG (Retrieval-Augmented Generation)
6 participants