Correct check for SDPA in Vision Language Models #30565

zucchini-nlp · 2024-04-30T07:53:55Z

System Info

In the current implementation of VLMs, the "_supports_sdpa" check enables SDPA attention only for the language model, for example in Llava.

It should also check the vision tower and use SDPA attention there if available. Current implementations of the most common vision tower, CLIP, do not support SDPA (this PR adds SDPA for CLIP).

We can raise a warning for composite models if only one part supports SDPA and the other does not, so that the user knows what is happening in the background.
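To make the proposal concrete, here is a minimal sketch of what a per-component check with a warning could look like. It is not the transformers implementation: `Component`, `supports_sdpa`, and `resolve_attention` are hypothetical names used only for illustration, and the fallback to "eager" is an assumption about the desired behaviour.

```python
import warnings
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical stand-in for one sub-model of a composite VLM."""
    name: str
    supports_sdpa: bool  # plays the role of a per-model "_supports_sdpa" flag


def resolve_attention(components, requested="sdpa"):
    """Choose an attention implementation for each component.

    Returns a dict mapping component name to the chosen implementation.
    Warns when SDPA was requested but only some components support it.
    """
    if requested != "sdpa":
        return {c.name: requested for c in components}

    chosen = {c.name: "sdpa" if c.supports_sdpa else "eager" for c in components}
    unsupported = [name for name, impl in chosen.items() if impl == "eager"]

    # Warn only on mixed support: some parts use SDPA, others fall back.
    if unsupported and len(unsupported) < len(components):
        warnings.warn(
            f"SDPA is not supported by: {', '.join(unsupported)}. "
            "These components fall back to eager attention.",
            UserWarning,
        )
    return chosen


# Assumed setup: a Llava-like model whose vision tower lacks SDPA support.
llava_like = [
    Component("language_model", supports_sdpa=True),
    Component("vision_tower", supports_sdpa=False),
]
print(resolve_attention(llava_like))
```

With this shape, the language model still gets SDPA while the vision tower silently-but-visibly falls back, and the user sees which part is responsible.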

Verified models

NielsRogge · 2024-04-30T08:58:18Z

Edited your issue to include a list of models to check ;) feel free to expand

zucchini-nlp added the Should Fix (This has been identified as a bug and should be fixed.) and Vision labels on Apr 30, 2024
