
LLaVa 1.6 anyres support #8

Open
ai-and-i opened this issue May 18, 2024 · 3 comments

Comments

@ai-and-i

Hi @dusty-nv, thanks for this amazing library! We're using it in a cool art project for Burning Man :-)

I tested the new llava 1.6 (specifically https://huggingface.co/lmms-lab/llama3-llava-next-8b), and it seems to work. However, reading the code, it looks like the anyres feature (dividing the image into multiple high-resolution patches) is not implemented in nanollm yet. For now, the image is simply downscaled (and cropped or padded) to a 336x336 square. Is that accurate, or am I missing something?
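For reference, a minimal sketch of what LLaVA-1.6-style anyres preprocessing does, as opposed to the single 336x336 resize described above. The grid resolutions below are an assumption for illustration; the real list comes from the model config's `image_grid_pinpoints`, and the actual implementation also normalizes and tensorizes the patches:

```python
from PIL import Image

PATCH = 336
# Candidate grid resolutions (assumed here; real values come from the
# model's image_grid_pinpoints config entry).
GRID_PINPOINTS = [(672, 336), (336, 672), (672, 672), (1008, 336), (336, 1008)]

def select_best_resolution(w, h, pinpoints=GRID_PINPOINTS):
    """Pick the grid that preserves the most scaled image area
    while wasting the least padding."""
    best, best_fit, best_waste = None, -1, float("inf")
    for gw, gh in pinpoints:
        scale = min(gw / w, gh / h)
        eff = min(int(w * scale) * int(h * scale), w * h)
        waste = gw * gh - eff
        if eff > best_fit or (eff == best_fit and waste < best_waste):
            best, best_fit, best_waste = (gw, gh), eff, waste
    return best

def anyres_views(img):
    """Return the downscaled global view plus the high-res grid patches."""
    w, h = img.size
    gw, gh = select_best_resolution(w, h)
    scale = min(gw / w, gh / h)
    resized = img.resize((int(w * scale), int(h * scale)))
    canvas = Image.new("RGB", (gw, gh))  # pad up to the chosen grid size
    canvas.paste(resized, ((gw - resized.width) // 2,
                           (gh - resized.height) // 2))
    patches = [canvas.crop((x, y, x + PATCH, y + PATCH))
               for y in range(0, gh, PATCH)
               for x in range(0, gw, PATCH)]
    global_view = img.resize((PATCH, PATCH))  # the "downscaled square" case
    return [global_view] + patches
```

So a 1024x768 input would map to the 672x672 grid and produce one global view plus four 336x336 patches, rather than the single downscaled square that the current code path produces.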

@dusty-nv
Owner

dusty-nv commented May 18, 2024 via email

@ai-and-i
Author

Great, thanks for confirming! We're still exploring, and will likely end up preferring low latency too. Unfortunately Llama-3-VILA1.5-8B didn't produce great results with our prompts, but llava-1.6 seems promising so far — even without anyres.

In case you're interested, we recently posted some photos of last year's version here: https://www.instagram.com/ai_and_i.art/ . It was based on GPT-4, but unreliable internet and model latency were quite annoying. So this year it's all built around a Jetson AGX Orin and local LLMs — with a lot more fun costumes, animations, and interactions already in the works!

@rsun-bdti

I observed the same thing: Llama-3-VILA1.5-8B didn't produce great results no matter how we tweaked our prompts.

3 participants