LLaVa 1.6 anyres support #8
Hi @ai-and-i, that's correct: I only use a single tile to keep latency to a minimum, although I'd like to support anyres now that I have CLIP/SigLIP running faster through TensorRT. Anyres tiling also evolves pretty rapidly, which is a challenge to keep up with, and it adds more image tokens. The latest VILA models have increased the input resolution to 384/448 and use 196 image tokens.
Would love to hear more about your Burning Man project; it sounds awesome! It would also be good to understand your latency/accuracy tradeoffs. I optimize for max FPS, but I understand others might have different priorities.
Great, thanks for confirming! We're still exploring, and will likely end up preferring low latency too. Unfortunately, Llama-3-VILA1.5-8B didn't produce great results with our prompts, but LLaVA 1.6 seems promising so far, even without anyres. In case you're interested, we recently put some photos of last year's version here: https://www.instagram.com/ai_and_i.art/. It was based on GPT-4, but unreliable internet and model latency were quite annoying. So this year everything is built around a Jetson AGX Orin and local LLMs, with a lot more fun costumes, animations, and interactions already in the works!
"Llama-3-VILA1.5-8B didn't produce great results no matter how we tweaked our prompts." I also observed that.
Hi @dusty-nv, thanks for this amazing library! We're using it in a cool art project for Burning Man :-)
I tested the new LLaVA 1.6 (specifically https://huggingface.co/lmms-lab/llama3-llava-next-8b), and it seems to work. However, reading the code, it seems that the anyres feature (dividing images into multiple patches) is not implemented in NanoLLM yet. For now, the image is simply downscaled (and cropped or padded) to a 336x336 square. Is that accurate, or am I missing anything?
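For anyone curious about the cost tradeoff being discussed: a rough sketch of what anyres-style tiling implies, compared to the single-tile path. This is illustrative only, not NanoLLM or LLaVA code; the function name `anyres_tile_boxes` is made up, and real LLaVA 1.6 picks a best-fit grid from a predefined set of target resolutions rather than naively tiling at ceil(width/tile).

```python
import math

def anyres_tile_boxes(width: int, height: int, tile: int = 336):
    """Return crop boxes: one overview box plus one box per high-res grid cell.

    Hypothetical sketch of anyres-style tiling; the first box covers the whole
    image and would be downscaled into a single tile (roughly the current
    single-tile behavior), the rest are full-resolution crops.
    """
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    boxes = [(0, 0, width, height)]  # overview tile
    for r in range(rows):
        for c in range(cols):
            boxes.append((c * tile, r * tile, (c + 1) * tile, (r + 1) * tile))
    return boxes

# A 1280x720 frame yields 1 overview + 4x3 grid = 13 tiles, so roughly 13x the
# image tokens and vision-encoder passes of the single-tile path.
print(len(anyres_tile_boxes(1280, 720)))  # 13
```

This is why the single-tile path keeps latency low: each extra tile is another vision-encoder forward pass plus more image tokens for the LLM to prefill.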