In the Llama-2-7b-chat-hf-q4f32_1-1k model, the number of tokens in the prefill is 36 when inputting 'hello'. #396
The extra tokens are due to the default system prompt. If you'd like not to use a system prompt, try overriding it with an empty string:

```ts
const request: webllm.ChatCompletionRequest = {
  messages: [
    { "role": "system", "content": "" },
    { "role": "user", "content": "Hello" },
  ],
};
```
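For context, here is a minimal sketch of sending such a request and inspecting the reported prompt token count. It assumes the current `@mlc-ai/web-llm` `MLCEngine` API (`CreateMLCEngine`, `engine.chat.completions.create`, and the `usage` field); older releases exposed a different `ChatModule` interface, so adjust names to your version:

```ts
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  // Model id taken from this issue; CreateMLCEngine and
  // chat.completions.create follow the current web-llm API.
  const engine = await webllm.CreateMLCEngine("Llama-2-7b-chat-hf-q4f32_1-1k");

  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "" }, // empty system prompt: no extra prefill tokens
      { role: "user", content: "Hello" },
    ],
  });

  // With the system prompt blanked out, the reported prompt token count
  // should sit much closer to the raw tokenizer count for "Hello".
  console.log(reply.usage?.prompt_tokens);
}

main();
```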
Got it! Thank you very much!
Why is the prompt token count reported in the output different from the number of tokens the tokenizer actually produces?
I used the LLaMA 2 tokenizer, and the prompt 'hello' is split into only 2 tokens. However, the prefill count reported by the project is 36 tokens, and my experiments confirm that every prefill count is 34 tokens larger than the raw prompt's token count. Please explain why.
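For background on where the gap comes from: Llama-2 chat models wrap every turn in a chat template before tokenization. The sketch below illustrates the string that actually gets prefilled; the `defaultSystemPrompt` text is a placeholder (web-llm ships its own default, not shown here), so the exact token counts depend on that value:

```ts
// Sketch of the Llama-2 chat template applied before prefill.
// `defaultSystemPrompt` is a placeholder; web-llm's actual default may differ.
const defaultSystemPrompt =
  "You are a helpful, respectful and honest assistant.";
const userMessage = "hello";

// The model is prefilled with this templated string, not the raw user text,
// so its token count includes the template markers and system prompt tokens.
const prefilled =
  `[INST] <<SYS>>\n${defaultSystemPrompt}\n<</SYS>>\n\n${userMessage} [/INST]`;

console.log(prefilled);
```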