
Correct check for SDPA in Vision Language Models #30565

Open
7 tasks
zucchini-nlp opened this issue Apr 30, 2024 · 1 comment
Labels
Should Fix This has been identified as a bug and should be fixed. Vision

Comments

@zucchini-nlp
Member

zucchini-nlp commented Apr 30, 2024

System Info

In the current implementation of VLMs, the "_supports_sdpa" attribute checks and activates SDPA attention only for the language model, for example in Llava.

It should also check for, and when available use, SDPA attention in the vision tower. Current implementations of the most common vision tower, CLIP, do not support SDPA (this PR adds SDPA for CLIP).
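
A minimal sketch of the idea, not the actual transformers code: it assumes a hypothetical composite VLM whose `language_model` and `vision_tower` sub-modules each carry the `_supports_sdpa` class attribute the issue refers to, and resolves the attention implementation per sub-model instead of only for the language model.

```python
import torch.nn as nn


def supports_sdpa(module: nn.Module) -> bool:
    # Default to False when a component does not declare `_supports_sdpa`.
    return getattr(module, "_supports_sdpa", False)


def pick_attn_implementations(language_model: nn.Module, vision_tower: nn.Module) -> dict:
    # Decide separately for each part of the composite model rather than
    # activating SDPA globally based on the language model alone.
    return {
        "language_model": "sdpa" if supports_sdpa(language_model) else "eager",
        "vision_tower": "sdpa" if supports_sdpa(vision_tower) else "eager",
    }
```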

We can raise a warning for composite models when one part supports SDPA but the other does not, so that the user knows what is happening in the background.
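
A possible shape for that warning, building on the hypothetical per-component mapping from the sketch above (again an illustration, not the library's implementation):

```python
import logging

logger = logging.getLogger(__name__)


def warn_on_partial_sdpa(attn_implementations: dict) -> None:
    # Warn when only some components of a composite model can use SDPA, so the
    # mixed SDPA/eager behaviour is visible to the user.
    impls = set(attn_implementations.values())
    if "sdpa" in impls and "eager" in impls:
        eager_parts = [name for name, impl in attn_implementations.items() if impl != "sdpa"]
        logger.warning(
            "SDPA is only partially available for this composite model; "
            "these components fall back to eager attention: %s",
            ", ".join(eager_parts),
        )
```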

Verified models

  • BLIP-2
  • InstructBLIP
  • KOSMOS-2
  • LLaVa
  • LLaVa-NeXT
  • Idefics
  • Idefics2
@zucchini-nlp zucchini-nlp added Should Fix This has been identified as a bug and should be fixed. Vision labels Apr 30, 2024
@NielsRogge
Contributor

Edited your issue to include a list of models to check ;) Feel free to expand it.
