LLaVa 1.6 anyres support #8
Hi @ai-and-i, that's correct: I only use a single tile to keep latency to a minimum, although I'd like to support anyres now that I have CLIP/SigLIP running faster through TensorRT. Anyres tiling also evolves pretty rapidly, which is a challenge to keep up with, and it adds more image tokens. The latest VILA models have increased the input resolution to 384/448 and use 196 image tokens.
Would love to hear more about your Burning Man project; it sounds awesome! It would also be good to understand your latency/accuracy tradeoffs. I optimize for max FPS, but I understand others might have different priorities.
Great, thanks for confirming! We're still exploring, and will likely end up preferring low latency too. Unfortunately, Llama-3-VILA1.5-8B didn't produce great results with our prompts, but LLaVA 1.6 seems promising so far, even without anyres. In case you're interested, we recently put some photos of last year's version here: https://www.instagram.com/ai_and_i.art/. It was based on GPT-4, but unreliable internet and model latency were quite annoying. So this year everything is built around a Jetson AGX Orin and local LLMs, with a lot more fun costumes, animations, and interactions already in the works!
"Llama-3-VILA1.5-8B didn't produce great results no matter how we tweaked our prompts." I also observed that.
Hi @dusty-nv, thanks for this amazing library! We're using it in a cool art project for Burning Man :-)
I tested the new LLaVA 1.6 (specifically https://huggingface.co/lmms-lab/llama3-llava-next-8b), and it seems to work. However, reading the code, it seems that the anyres feature (dividing images into multiple patches) is not implemented in NanoLLM yet. For now, the image is simply downscaled (and cropped or padded) to a 336x336 square. Is that accurate, or am I missing anything?
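For anyone curious about the cost tradeoff being discussed: a rough sketch of what anyres-style tiling implies, compared to the single-tile path. This is illustrative only, not NanoLLM or LLaVA code; the function name `anyres_tile_boxes` is made up, and real LLaVA 1.6 picks a best-fit grid from a predefined set of target resolutions rather than naively tiling at ceil(width/tile).

```python
import math

def anyres_tile_boxes(width: int, height: int, tile: int = 336):
    """Return crop boxes: one overview box plus one box per high-res grid cell.

    Hypothetical sketch of anyres-style tiling; the first box covers the whole
    image and would be downscaled into a single tile (roughly the current
    single-tile behavior), the rest are full-resolution crops.
    """
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    boxes = [(0, 0, width, height)]  # overview tile
    for r in range(rows):
        for c in range(cols):
            boxes.append((c * tile, r * tile, (c + 1) * tile, (r + 1) * tile))
    return boxes

# A 1280x720 frame yields 1 overview + 4x3 grid = 13 tiles, so roughly 13x the
# image tokens and vision-encoder passes of the single-tile path.
print(len(anyres_tile_boxes(1280, 720)))  # 13
```

This is why the single-tile path keeps latency low: each extra tile is another vision-encoder forward pass plus more image tokens for the LLM to prefill.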