[BUG] <Qwen-14B-Chat: no output when given long input text> #1232

Closed
2 tasks done

TianWuYuJiangHenShou opened this issue Apr 29, 2024 · 5 comments
Comments

TianWuYuJiangHenShou commented Apr 29, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

First, I load the model with transformers (with flash-attention installed per the README). Our company's internal AI platform cannot upgrade its GPU driver for now (it is on CUDA 11.7), so I can only deploy the service with flask + transformers + Flash-Attention. Recently I have been using the Batch Inference code from the README for batch processing, and on an A800 it reaches about 80-90 tokens/s.
The problem: during Batch Inference, the output is sometimes empty (b''); it happens sporadically.
I looked at #50, but in the 14B model both use_dynamic_ntk and use_logn_attn are already True, so it does not solve my problem.
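
For reference, a quick way to confirm those two flags on a local checkpoint (a minimal sketch; the attribute names follow the original Qwen config.json, and './' stands for the checkpoint directory used in the repro script below):

# Minimal sketch: confirm the long-context flags on a local Qwen checkpoint.
# Attribute names follow the original Qwen config.json; './' is the checkpoint dir.
from transformers import AutoConfig
cfg = AutoConfig.from_pretrained('./', trust_remote_code=True)
print(cfg.use_dynamic_ntk, cfg.use_logn_attn)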

Expected Behavior

No response

Steps To Reproduce

# Copied the README code verbatim; only the input data was changed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig
from qwen_generation_utils import make_context, decode_tokens, get_stop_words_ids

# To generate attention masks automatically, it is necessary to assign distinct
# token_ids to pad_token and eos_token, and set pad_token_id in the generation_config.
tokenizer = AutoTokenizer.from_pretrained(
    './',
    pad_token='<|extra_0|>',
    eos_token='<|endoftext|>',
    padding_side='left',
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    './',
    pad_token_id=tokenizer.pad_token_id,
    device_map="auto",
    trust_remote_code=True
).eval()
model.generation_config = GenerationConfig.from_pretrained('./', pad_token_id=tokenizer.pad_token_id)

all_raw_text = ["我想听你说爱我。", "今天我想吃点啥,甜甜的,推荐下", "我马上迟到了,怎么做才能不迟到"]
batch_raw_text = []
for q in all_raw_text:
    raw_text, _ = make_context(
        tokenizer,
        q,
        system="You are a helpful assistant.",
        max_window_size=model.generation_config.max_window_size,
        chat_format=model.generation_config.chat_format,
    )
    batch_raw_text.append(raw_text)

batch_input_ids = tokenizer(batch_raw_text, padding='longest')
batch_input_ids = torch.LongTensor(batch_input_ids['input_ids']).to(model.device)
batch_out_ids = model.generate(
    batch_input_ids,
    return_dict_in_generate=False,
    generation_config=model.generation_config
)
padding_lens = [batch_input_ids[i].eq(tokenizer.pad_token_id).sum().item() for i in range(batch_input_ids.size(0))]

batch_response = [
    decode_tokens(
        batch_out_ids[i][padding_lens[i]:],
        tokenizer,
        raw_text_len=len(batch_raw_text[i]),
        context_length=(batch_input_ids[i].size(0)-padding_lens[i]),
        chat_format="chatml",
        verbose=False,
        errors='replace'
    ) for i in range(len(all_raw_text))
]
print(batch_response)

response, _ = model.chat(tokenizer, "我想听你说爱我。", history=None)
print(response)

response, _ = model.chat(tokenizer, "今天我想吃点啥,甜甜的,推荐下", history=None)
print(response)

response, _ = model.chat(tokenizer, "我马上迟到了,怎么做才能不迟到", history=None)
print(response)
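
A small check appended after the batch decode can surface the sporadic empty results and keep the raw token ids for inspection (a hypothetical debugging addition, not part of the README code; it reuses the variables defined above):

# Hypothetical debugging aid: flag empty decodes and dump the raw newly
# generated token ids so a failing sample can be inspected directly.
for i, resp in enumerate(batch_response):
    if not resp.strip():
        new_tokens = batch_out_ids[i][batch_input_ids.size(1):]
        print(f"sample {i} decoded to an empty string; raw new ids: {new_tokens.tolist()[:20]}")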

Environment

- OS: Ubuntu 20.04
- Python: 3.10.9
- Transformers: 4.37.0
- PyTorch: 2.0.1
- CUDA: 11.7

Anything else?

No response

TianWuYuJiangHenShou changed the title from [BUG] <Qwen-14B-Chat: no output when generating long text> to [BUG] <Qwen-14B-Chat: no output when given long input text> Apr 29, 2024
@TianWuYuJiangHenShou (Author)

When Batch Inference misbehaves, the input text should not be particularly long. With the same input, model.generate(text) sometimes produces a result, but most of the time it does not.
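
One variable worth eliminating here is sampling: the released Qwen chat generation config enables sampling by default, so the same input can legitimately produce different outputs across runs. A sketch (assuming the setup from the repro script above) that forces greedy decoding to make the behavior reproducible:

# Sketch: disable sampling so repeated runs on the same input are deterministic,
# which separates a real empty-output bug from ordinary sampling variance.
model.generation_config.do_sample = False
batch_out_ids = model.generate(
    batch_input_ids,
    return_dict_in_generate=False,
    generation_config=model.generation_config,
)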

jklj077 (Contributor) commented Apr 29, 2024

How long is the input? I wonder if you could share with us an example input.

P.S.: The 14B model has comparatively shorter sequence-length support. If it is okay, could you check whether the 7B model or the Qwen1.5-14B model has the same issue?
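
To answer the length question concretely, each prompt can be measured in tokens and compared against the window the repro script already reads from the generation config (a minimal sketch reusing those variables):

# Sketch: report each prompt's token count next to the model's window size.
for text in batch_raw_text:
    n_tokens = len(tokenizer(text)['input_ids'])
    print(n_tokens, 'tokens vs max_window_size =', model.generation_config.max_window_size)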

TianWuYuJiangHenShou (Author) commented Apr 30, 2024

> How long is the input? I wonder if you could share with us an example input.
>
> P.S.: The 14B model has comparatively shorter sequence-length support. If it is okay, could you check whether the 7B model or the Qwen1.5-14B model has the same issue?

I ran some experiments; the problem was caused by abnormal input on my side, and normal input works fine.
Also, the same batch-inference code cannot be applied to Qwen1.5-14B as-is; many of the tokenizer parameters differ between the models. If there is batch-inference code suitable for Qwen1.5-14B, please share it.

TianWuYuJiangHenShou (Author) commented Apr 30, 2024

I ran into an abnormal case. With the inputs below, the output is still occasionally empty:

['你是一个医学专家,请对以下指南内容做归纳,根据以下要求总结出三个符合以下要求的问答对:\n    1、问题以Q:开头,答案以A:开头。\n    2、问答对之间用两个换行符间隔;问题和答案之间用一个换行间隔\n    3、答案需尽可能的详细。\n    4、问题需要跟氟马替尼强相关。\n    ```氟马替尼在血浆中主要以原形药物形式存在,此外存在的主要代谢物形式为 N-去甲基化代谢物 M1 和酰胺键水解代谢物 M3,代谢物 M1 的稳态血浆暴露量约为原形药物的 20%,代谢物 M3 的稳态血浆暴露量约为原形药物的 10%。氧化、水解后乙酰化和氧化、以及葡萄糖醛酸结合物等其它代谢物浓度均低于原形药物的 10%。单次给予甲磺酸氟马替尼片 400mg 和 600mg 剂量后,氟马替尼在慢性粒细胞白血病慢性期患者体内平均血浆消除半衰期(t1/2)为 16.01~17.21h;N-去甲基代谢物 M1 的平均血浆消除半衰期(t1/2)为 18.92~19.21h;水解代谢产物 M3 的平均血浆消除半衰期(t1/2)为7.63~8.66h。CYP3A4 是氟马替尼的主要代谢酶,同时本品对 CYP3A4 酶的抑制具有时间依赖性。```',
'你是一个医学专家,请对以下指南内容做归纳,根据以下要求总结出三个符合以下要求的问答对:\n    1、问题以Q:开头,答案以A:开头。\n    2、问答对之间用两个换行符间隔;问题和答案之间用一个换行间隔\n    3、答案需尽可能的详细。\n    4、问题需要跟氟马替尼强相关。\n    ```药理作用甲磺酸氟马替尼在分子水平对 Bcr-Abl 酪氨酸激酶磷酸化抑制作用的 IC50 为 11 nM;对P210 Bcr-Abl 表达阳性的白血病细胞(K562、KU812)增殖的抑制作用 IC50 为 6-8 nM,对P210 Bcr-Abl 表达阴性的肿瘤细胞(人早幼粒白血病细胞 HL-60、人组织细胞淋巴瘤细胞U-937)、表达生长因子受体(PDGFR)的人非小细胞肺癌 A549 细胞、人结肠癌 Ls174t 细胞、人表皮样癌 A431 细胞均未见明显抑制作用。体外药效试验结果显示,甲磺酸氟马替尼代谢产物 N-去甲基代谢物(M1)对 Bcr-Abl抑制作用的 IC50 为 16.6 nM,对 K562 细胞增殖抑制作用 IC50 为 3 nM (甲磺酸氟马替尼为 1nM);甲磺酸氟马替尼酰胺键水解羧酸代谢物(M3)未见明显酪氨酸激酶抑制活性。``` ', 
'你是一个医学专家,请对以下指南内容做归纳,根据以下要求总结出三个符合以下要求的问答对:\n    1、问题以Q:开头,答案以A:开头。\n    2、问答对之间用两个换行符间隔;问题和答案之间用一个换行间隔\n    3、答案需尽可能的详细。\n    4、问题需要跟氟马替尼强相关。\n    ``` 液体潴留同类其他产品存在严重液体潴留的不良反应。本品的临床试验中,受试者发生的液体潴留主要表现为浅表水肿,如面部水肿、眼睑浮肿、眼睑水肿等,均为 1 级(见【不良反应】)。在服用本品过程中,建议监测体重,出现非预期的快速体重增加,需要警惕液体潴留的可能,建议及时就医明确诊断。液体潴留可以加重或导致心衰,目前尚无严重心衰患者(按纽约心脏学会分类法的Ⅲ~Ⅳ级)临床应用甲磺酸氟马替尼的经验。有心脏病、心力衰竭风险因素或肾衰竭病史的患者慎用本品。青光眼患者建议慎用。```']

jklj077 (Contributor) commented Apr 30, 2024

For batch inference using Qwen1.5, see this comment: QwenLM/Qwen1.5#282 (comment).
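
For readers who cannot open that comment, batch inference with Qwen1.5 goes through the standard transformers chat template rather than make_context. A minimal sketch, not the exact code from the linked comment (the model id and max_new_tokens value are illustrative):

# Minimal Qwen1.5 batch-inference sketch using the tokenizer's chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'Qwen/Qwen1.5-14B-Chat'  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side='left')
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype='auto', device_map='auto').eval()

prompts = ['你好', '讲个笑话']
texts = [
    tokenizer.apply_chat_template(
        [{'role': 'user', 'content': p}], tokenize=False, add_generation_prompt=True
    )
    for p in prompts
]
inputs = tokenizer(texts, return_tensors='pt', padding=True).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens from each row before decoding.
replies = tokenizer.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(replies)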

jklj077 closed this as completed May 22, 2024