In the current implementation of VLMs, the "_supports_sdpa" attribute is checked, and SDPA attention is activated, only for the language model. This is the case in Llava, for example.
The check should also cover the vision tower and use SDPA attention there when it is available. The current implementation of the most common vision tower, CLIP, does not support SDPA (this PR adds SDPA for CLIP).
For composite models, we can raise a warning when one part supports SDPA and the other does not, so that the user knows what is happening in the background.
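The proposal above could look roughly like the following. This is only a minimal sketch, not the library's actual code: the helper `resolve_attn_implementations`, the component names, and the two toy classes are hypothetical, and the only thing taken from the issue is the class-level `_supports_sdpa` flag. It assumes that a part without SDPA support falls back to eager attention.

```python
import warnings


def resolve_attn_implementations(components):
    """Choose an attention implementation for each part of a composite model.

    `components` maps a (hypothetical) sub-model name to its class. A part
    gets "sdpa" if its class sets `_supports_sdpa = True`, otherwise "eager".
    """
    resolved = {
        name: "sdpa" if getattr(cls, "_supports_sdpa", False) else "eager"
        for name, cls in components.items()
    }
    without_sdpa = [name for name, impl in resolved.items() if impl != "sdpa"]
    # Warn only in the mixed case: some parts use SDPA and some do not.
    if without_sdpa and len(without_sdpa) < len(resolved):
        warnings.warn(
            "SDPA attention is not supported by: "
            + ", ".join(without_sdpa)
            + ". These parts fall back to eager attention."
        )
    return resolved


# Toy stand-ins for a language model and a vision tower (not real classes).
class ToyLanguageModel:
    _supports_sdpa = True


class ToyVisionTower:
    _supports_sdpa = False


resolved = resolve_attn_implementations(
    {"language_model": ToyLanguageModel, "vision_tower": ToyVisionTower}
)
print(resolved)
```

Resolving the implementation per part, rather than once for the whole model, lets the language model keep SDPA even when the vision tower cannot use it, and the warning makes the mixed configuration visible to the user.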
Verified models
BLIP-2
InstructBLIP
KOSMOS-2
LLaVa
LLaVa-NeXT
Idefics
Idefics2