
Clarification and supplement to the online docs example #1904

Open
2 of 4 tasks
paulcx opened this issue May 16, 2024 · 1 comment
Comments

paulcx commented May 16, 2024

System Info

docs[main]: https://huggingface.co/docs/text-generation-inference/basic_tutorials/visual_language_models
vlm: https://huggingface.co/llava-hf/llava-v1.6-34b-hf

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

In the current docs, there are a few examples of how to query a VLM model, for example:

from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:3000")
image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
# The image is embedded directly in the prompt using markdown image syntax.
prompt = f"![]({image})What is this a picture of?\n\n"
for token in client.text_generation(prompt, max_new_tokens=16, stream=True):
    print(token)

Expected behavior

However, there is no example of how to deal with the default chat template. For example, the chat template of llava-hf/llava-v1.6-34b-hf is the following:

"<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\n<your_text_prompt_here><|im_end|><|im_start|>assistant\n"

Should we ignore it and use the TGI format as shown above? And how should multi-turn queries be handled? Any examples would be appreciated.

drbh (Collaborator) commented May 30, 2024

Hi @paulcx, thanks for pointing this out; we should be clearer about generation and templates in the docs.

In TGI, the chat_template is applied when the chat endpoint is used. In the example above, the generate endpoint is used, so no template is applied.
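For context, using the chat endpoint is roughly equivalent to rendering the model's chat_template locally and sending the result to generate. Here is a minimal sketch of that rendering, assuming the tokenizer for llava-hf/llava-v1.6-34b-hf ships the template quoted above (this uses transformers directly and is only for inspection; TGI does this server-side):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-v1.6-34b-hf")

messages = [
    {"role": "system", "content": "Answer the questions."},
    {"role": "user", "content": "<image>\nWhat is this a picture of?"},
]

# tokenize=False returns the rendered prompt string;
# add_generation_prompt=True appends the assistant turn header.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Expect something resembling the template string quoted in the issue above.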

Chat can be used with the chat_completion method, as shown below.

from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:3000")

chat = client.chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Whats in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
                    },
                },
            ],
        },
    ],
    seed=42,
    max_tokens=100,
)

print(chat)
# ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='length', index=0, message=ChatCompletionOutputMessage(role='assistant', content=" The image you've provided features an anthropomorphic rabbit in spacesuit attire. This rabbit is depicted with human-like posture and movement, standing on a rocky terrain with a vast, reddish-brown landscape in the background. The spacesuit is detailed with mission patches, circuitry, and a helmet that covers the rabbit's face and ear, with an illuminated red light on the chest area.\n\nThe artwork style is that of a", name=None, tool_calls=None), logprobs=None)], created=1714589614, id='', model='llava-hf/llava-v1.6-mistral-7b-hf', object='text_completion', system_fingerprint='2.0.2-native', usage=ChatCompletionOutputUsage(completion_tokens=100, prompt_tokens=2943, total_tokens=3043))

Note that when using the chat endpoint, images are sent as typed messages rather than in markdown format.
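On the multi-turn question: with the chat endpoint you simply keep appending turns to the messages list. A minimal sketch under the same setup (the follow-up question is illustrative):

from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:3000")

# First turn: ask about the image using typed messages, as above.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
                },
            },
        ],
    },
]
first = client.chat_completion(messages=messages, max_tokens=100)

# Append the assistant's reply, then the follow-up question;
# the server applies the chat_template to the whole history.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "What colors stand out the most?"})

second = client.chat_completion(messages=messages, max_tokens=100)
print(second.choices[0].message.content)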

I hope this helps clarify! Please let me know if you have any questions.
