New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration of VLM embedding model #446
base: master
Are you sure you want to change the base?
Conversation
Hey @Appointat, thanks for your detailed polish on the docstring! If other parts is also good for you could you approve this PR? Also one quick tip, maybe it's better to use review mode list suggestions rather than push commit directly, it would be better for PR owner to track the issue and learn from your review~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot. leave some suggestions.
camel/embeddings/clip_embedding.py
Outdated
|
||
def embed_list( | ||
self, | ||
objs: List[Union[Image.Image, str]], # to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does to do
mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to move it.
Hey, thanks for the tip! I appreciate the tip for the code review. It does make sense to maintain clarity and help the PR owner to track changes. I was previously concerned that my comments were too numerous, to the point where code review would waste too much time for both parties, so I committed the changes directly. By the way, my recent commits were focused on formatting and adding comments to improve code readability and maintainability. I've gone through the rest of the changes in the PR, and everything looks good to me. Good job! |
Co-authored-by: Tianqi Xu <[email protected]>
Yeah. It makes sense not to leave every single detail in comments. The contributor can directly learn from reviewer's commits. |
Co-authored-by: Tianqi Xu <[email protected]>
camel/embeddings/clip_embedding.py
Outdated
from camel.embeddings import BaseEmbedding | ||
|
||
|
||
class CLIPEmbedding(BaseEmbedding[Union[str, Image.Image]]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to something like VisionLanguageEmbedding
? There are some other models also similar to CLIP
such as BLIP
etc.
camel/embeddings/clip_embedding.py
Outdated
(default: :obj:`openai/clip-vit-base-patch32`) | ||
""" | ||
|
||
from transformers import CLIPModel, CLIPProcessor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, to make this class more general you can use AutoModel
and AutoProcessor
here instead of specific ones.
from transformers import CLIPModel, CLIPProcessor | |
from transformers import AutoModel, AutoProcessor |
camel/embeddings/clip_embedding.py
Outdated
from transformers import CLIPModel, CLIPProcessor | ||
self.model = CLIPModel.from_pretrained(model_name) | ||
self.processor = CLIPProcessor.from_pretrained(model_name) | ||
text = 'dimension' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we can make this attribute lazily initialized. For example, during init
you can have self.dim = None
.
If self.dim is None
and someone is calling get_output_dim
, you can assign and return self.dim
in get_output_dim
.
If self.dim is None
and someone is calling embed_list
, you can assign self.dim
within embed_list
Or you can explore using AutoConfig
which may have the embedding size, though I am not sure.
|
||
|
||
def test_CLIPEmbedding_initialization(): | ||
embedding = CLIPEmbedding() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create some mock tests for the embedding instead of download the model every time (which is quite expensive)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated my code reviewing. Please let me know if there are any questions.
camel/embeddings/clip_embedding.py
Outdated
|
||
def embed_list( | ||
self, | ||
objs: List[Union[Image.Image, str]], # to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commentation # to do
, what do you mean? Is the PR not finished, remaining something to be done? Remove it or complete the commentation to make it understandable.
camel/embeddings/clip_embedding.py
Outdated
def embed_list( | ||
self, | ||
objs: List[Union[Image.Image, str]], # to do | ||
**kwargs: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is noticed that the **kwargs
parameter doesn't appear to be utilized within the function's implementation, and it's also not covered by the existing unit tests. Could we enhance our test suite by including tests that verify the handling of **kwargs
? This would ensure that all aspects of the function's behavior are thoroughly tested." Thanks.
assert isinstance(embeddings, list) | ||
assert len(embeddings) == 2 | ||
for e in embeddings: | ||
assert len(e) == embedding.get_output_dim() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the tow funcs test_image_embed_list_with_valid_input
and test_text_embed_list_with_valid_input
, to ensure robustness, could we consider introducing a combined test case, such as test_image_and_text_embed_list_with_valid_input
, whose input can be test_image_text = [image, "Hello world"]? If not, I suggest adding a small check to validate that the function correctly raises an error message, The type of the input is inconsistent.
, when encountering mixed input types? Thank you for addressing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bringing this up, at the moment we do have a design that can only accept the same type of input, I will add an error reporting tip.
camel/embeddings/clip_embedding.py
Outdated
as a list of floating-point numbers. | ||
""" | ||
if not objs: | ||
raise ValueError("Input text list is empty.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error msg should be "Input objs list is empty."
@FUYICC Hi, is the pr still in progress? Let me know if you have any difficulties. |
Thank you for your kind help! Sorry I've been mostly working on my dissertation for the past 3 weeks so I haven't had time to move forward, I'll be up and running starting next week, we'll discuss any questions anytime! |
Description
Issue #445
Summary by CodeRabbit
CLIPEmbedding
for image and text embedding functionalities.CLIPEmbedding
functionality, covering initialization, embedding processes, and output dimension retrieval.