Error using Xenova/nanoLLaVA in pipeline #758
I appreciate your enthusiasm with testing the model out, since I only added it a few hours ago... but I'm still adding support for it to the library! I will let you know when it is supported.
Brilliant, thank you very much! I'm closely watching this feature, and if you link a PR for this I can glean from the work and help maintain the code!
You can follow along in the v3 branch: #545

Here's some example code which should work:

```js
import { AutoTokenizer, AutoProcessor, RawImage, LlavaForConditionalGeneration } from '@xenova/transformers';

// Load tokenizer, processor and model
const model_id = 'Xenova/nanoLLaVA';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await LlavaForConditionalGeneration.from_pretrained(model_id, {
  dtype: {
    embed_tokens: 'fp16',
    vision_encoder: 'q8', // or 'fp16'
    decoder_model_merged: 'q4', // or 'q8'
  },
});

// Prepare text inputs
const prompt = 'Describe this image in detail';
const messages = [
  { role: 'user', content: `<image>\n${prompt}` },
];
const text = tokenizer.apply_chat_template(messages, { tokenize: false, add_generation_prompt: true });
const text_inputs = tokenizer(text, { padding: true });

// Prepare vision inputs
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Generate response
const inputs = { ...text_inputs, ...vision_inputs };
const output = await model.generate({
  ...inputs,
  do_sample: false,
  max_new_tokens: 64,
});

// Decode output
const decoded = tokenizer.batch_decode(output, { skip_special_tokens: false });
console.log('decoded', decoded);
```

Note that this may change in future, and I'll update the model card when I've done some more testing.
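Since the example decodes with `skip_special_tokens: false`, the decoded string still contains the full chat template, including the prompt and special tokens. As a minimal sketch of post-processing, here is a hypothetical helper (not part of transformers.js) that extracts only the assistant's reply. The `<|im_start|>assistant` and `<|im_end|>` markers are an assumption based on the ChatML-style template that Qwen-based models such as nanoLLaVA typically use; adjust them to match your model's actual template.

```javascript
// Hypothetical helper: pull the assistant's reply out of a decoded
// ChatML-style transcript. Assumes '<|im_start|>assistant' begins the
// reply and '<|im_end|>' terminates turns.
function extractAssistantReply(decoded) {
  const marker = '<|im_start|>assistant';
  const start = decoded.lastIndexOf(marker);
  if (start === -1) return decoded.trim(); // marker absent: return as-is
  return decoded
    .slice(start + marker.length)   // drop everything up to the reply
    .replace(/<\|im_end\|>/g, '')   // strip the end-of-turn token
    .trim();
}

// Example transcript shaped like the template above (illustrative only)
const sample =
  '<|im_start|>user\n<image>\nDescribe this image in detail<|im_end|>\n' +
  '<|im_start|>assistant\nTwo cats are lying on a pink blanket.<|im_end|>';
console.log(extractAssistantReply(sample));
```

Alternatively, passing `skip_special_tokens: true` to `batch_decode` removes the special tokens for you, though the prompt text itself will still be present in the output.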
The model card has been updated with example code 👍 https://huggingface.co/Xenova/nanoLLaVA

We also put an online demo out for you to try: https://huggingface.co/spaces/Xenova/experimental-nanollava-webgpu

Example videos: nanollava-webgpu.mp4, nanollava-webgpu-2.mp4
System Info
Using:
Environment/Platform
Description
https://huggingface.co/Xenova/nanoLLaVA
New model nanoLLaVA threw this error:
Reproduction
Use Xenova/nanoLLaVA like this: package.json