[BUG] <Qwen-14B-Chat: no output when given long input text> #1232

Closed
2 tasks done

TianWuYuJiangHenShou opened this issue Apr 29, 2024 · 5 comments
Comments

TianWuYuJiangHenShou commented Apr 29, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

First, I load the model with transformers (with flash-attention installed per the README). Our company's internal AI platform cannot upgrade its GPU driver for now (it is on CUDA 11.7), so I can only deploy the service with flask + transformers + Flash-Attention. Recently I have been using the Batch Inference code from the README for batch processing, and on an A800 it reaches about 80-90 tokens/s.
The problem: during Batch Inference, the output is sometimes empty (b''); it happens sporadically.
I looked at #50, but in the 14B model both use_dynamic_ntk and use_logn_attn are already True, so it does not solve my problem.
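
For reference, a quick way to confirm those two flags on a local checkpoint (a minimal sketch; the attribute names follow the original Qwen config.json, and './' stands for the checkpoint directory used in the repro script below):

# Minimal sketch: confirm the long-context flags on a local Qwen checkpoint.
# Attribute names follow the original Qwen config.json; './' is the checkpoint dir.
from transformers import AutoConfig
cfg = AutoConfig.from_pretrained('./', trust_remote_code=True)
print(cfg.use_dynamic_ntk, cfg.use_logn_attn)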

Expected Behavior

No response

Steps To Reproduce

# Copied the README code verbatim; only the input data was changed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig
from qwen_generation_utils import make_context, decode_tokens, get_stop_words_ids

# To generate attention masks automatically, it is necessary to assign distinct
# token_ids to pad_token and eos_token, and set pad_token_id in the generation_config.
tokenizer = AutoTokenizer.from_pretrained(
    './',
    pad_token='<|extra_0|>',
    eos_token='<|endoftext|>',
    padding_side='left',
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    './',
    pad_token_id=tokenizer.pad_token_id,
    device_map="auto",
    trust_remote_code=True
).eval()
model.generation_config = GenerationConfig.from_pretrained('./', pad_token_id=tokenizer.pad_token_id)

all_raw_text = ["我想听你说爱我。", "今天我想吃点啥,甜甜的,推荐下", "我马上迟到了,怎么做才能不迟到"]
batch_raw_text = []
for q in all_raw_text:
    raw_text, _ = make_context(
        tokenizer,
        q,
        system="You are a helpful assistant.",
        max_window_size=model.generation_config.max_window_size,
        chat_format=model.generation_config.chat_format,
    )
    batch_raw_text.append(raw_text)

batch_input_ids = tokenizer(batch_raw_text, padding='longest')
batch_input_ids = torch.LongTensor(batch_input_ids['input_ids']).to(model.device)
batch_out_ids = model.generate(
    batch_input_ids,
    return_dict_in_generate=False,
    generation_config=model.generation_config
)
padding_lens = [batch_input_ids[i].eq(tokenizer.pad_token_id).sum().item() for i in range(batch_input_ids.size(0))]

batch_response = [
    decode_tokens(
        batch_out_ids[i][padding_lens[i]:],
        tokenizer,
        raw_text_len=len(batch_raw_text[i]),
        context_length=(batch_input_ids[i].size(0)-padding_lens[i]),
        chat_format="chatml",
        verbose=False,
        errors='replace'
    ) for i in range(len(all_raw_text))
]
print(batch_response)

response, _ = model.chat(tokenizer, "我想听你说爱我。", history=None)
print(response)

response, _ = model.chat(tokenizer, "今天我想吃点啥,甜甜的,推荐下", history=None)
print(response)

response, _ = model.chat(tokenizer, "我马上迟到了,怎么做才能不迟到", history=None)
print(response)
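
A small check appended after the batch decode can surface the sporadic empty results and keep the raw token ids for inspection (a hypothetical debugging addition, not part of the README code; it reuses the variables defined above):

# Hypothetical debugging aid: flag empty decodes and dump the raw newly
# generated token ids so a failing sample can be inspected directly.
for i, resp in enumerate(batch_response):
    if not resp.strip():
        new_tokens = batch_out_ids[i][batch_input_ids.size(1):]
        print(f"sample {i} decoded to an empty string; raw new ids: {new_tokens.tolist()[:20]}")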

Environment

- OS: Ubuntu 20.04
- Python: 3.10.9
- Transformers: 4.37.0
- PyTorch: 2.0.1
- CUDA: 11.7

Anything else?

No response

TianWuYuJiangHenShou changed the title from [BUG] <Qwen-14B-Chat: no output when generating long text> to [BUG] <Qwen-14B-Chat: no output when given long input text> Apr 29, 2024
@TianWuYuJiangHenShou (Author)

When Batch Inference misbehaves, the input text should not be particularly long. With the same input, model.generate(text) sometimes produces a result, but most of the time it does not.
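
One variable worth eliminating here is sampling: the released Qwen chat generation config enables sampling by default, so the same input can legitimately produce different outputs across runs. A sketch (assuming the setup from the repro script above) that forces greedy decoding to make the behavior reproducible:

# Sketch: disable sampling so repeated runs on the same input are deterministic,
# which separates a real empty-output bug from ordinary sampling variance.
model.generation_config.do_sample = False
batch_out_ids = model.generate(
    batch_input_ids,
    return_dict_in_generate=False,
    generation_config=model.generation_config,
)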

jklj077 (Contributor) commented Apr 29, 2024

How long is the input? I wonder if you could share with us an example input.

P.S.: The 14B model has comparatively shorter sequence-length support. If it is okay, could you check whether the 7B model or the Qwen1.5-14B model has the same issue?
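
To answer the length question concretely, each prompt can be measured in tokens and compared against the window the repro script already reads from the generation config (a minimal sketch reusing those variables):

# Sketch: report each prompt's token count next to the model's window size.
for text in batch_raw_text:
    n_tokens = len(tokenizer(text)['input_ids'])
    print(n_tokens, 'tokens vs max_window_size =', model.generation_config.max_window_size)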

TianWuYuJiangHenShou (Author) commented Apr 30, 2024

> How long is the input? I wonder if you could share with us an example input.
>
> P.S.: The 14B model has comparatively shorter sequence-length support. If it is okay, could you check whether the 7B model or the Qwen1.5-14B model has the same issue?

I ran some experiments; the problem was caused by abnormal input on my side, and normal input works fine.
Also, the same batch-inference code cannot be applied to Qwen1.5-14B as-is; many of the tokenizer parameters differ between the models. If there is batch-inference code suitable for Qwen1.5-14B, please share it.

TianWuYuJiangHenShou (Author) commented Apr 30, 2024

I ran into an abnormal case. With the inputs below, the output is still occasionally empty:

['你是一个医学专家,请对以下指南内容做归纳,根据以下要求总结出三个符合以下要求的问答对:\n    1、问题以Q:开头,答案以A:开头。\n    2、问答对之间用两个换行符间隔;问题和答案之间用一个换行间隔\n    3、答案需尽可能的详细。\n    4、问题需要跟氟马替尼强相关。\n    ```氟马替尼在血浆中主要以原形药物形式存在,此外存在的主要代谢物形式为 N-去甲基化代谢物 M1 和酰胺键水解代谢物 M3,代谢物 M1 的稳态血浆暴露量约为原形药物的 20%,代谢物 M3 的稳态血浆暴露量约为原形药物的 10%。氧化、水解后乙酰化和氧化、以及葡萄糖醛酸结合物等其它代谢物浓度均低于原形药物的 10%。单次给予甲磺酸氟马替尼片 400mg 和 600mg 剂量后,氟马替尼在慢性粒细胞白血病慢性期患者体内平均血浆消除半衰期(t1/2)为 16.01~17.21h;N-去甲基代谢物 M1 的平均血浆消除半衰期(t1/2)为 18.92~19.21h;水解代谢产物 M3 的平均血浆消除半衰期(t1/2)为7.63~8.66h。CYP3A4 是氟马替尼的主要代谢酶,同时本品对 CYP3A4 酶的抑制具有时间依赖性。```',
'你是一个医学专家,请对以下指南内容做归纳,根据以下要求总结出三个符合以下要求的问答对:\n    1、问题以Q:开头,答案以A:开头。\n    2、问答对之间用两个换行符间隔;问题和答案之间用一个换行间隔\n    3、答案需尽可能的详细。\n    4、问题需要跟氟马替尼强相关。\n    ```药理作用甲磺酸氟马替尼在分子水平对 Bcr-Abl 酪氨酸激酶磷酸化抑制作用的 IC50 为 11 nM;对P210 Bcr-Abl 表达阳性的白血病细胞(K562、KU812)增殖的抑制作用 IC50 为 6-8 nM,对P210 Bcr-Abl 表达阴性的肿瘤细胞(人早幼粒白血病细胞 HL-60、人组织细胞淋巴瘤细胞U-937)、表达生长因子受体(PDGFR)的人非小细胞肺癌 A549 细胞、人结肠癌 Ls174t 细胞、人表皮样癌 A431 细胞均未见明显抑制作用。体外药效试验结果显示,甲磺酸氟马替尼代谢产物 N-去甲基代谢物(M1)对 Bcr-Abl抑制作用的 IC50 为 16.6 nM,对 K562 细胞增殖抑制作用 IC50 为 3 nM (甲磺酸氟马替尼为 1nM);甲磺酸氟马替尼酰胺键水解羧酸代谢物(M3)未见明显酪氨酸激酶抑制活性。``` ', 
'你是一个医学专家,请对以下指南内容做归纳,根据以下要求总结出三个符合以下要求的问答对:\n    1、问题以Q:开头,答案以A:开头。\n    2、问答对之间用两个换行符间隔;问题和答案之间用一个换行间隔\n    3、答案需尽可能的详细。\n    4、问题需要跟氟马替尼强相关。\n    ``` 液体潴留同类其他产品存在严重液体潴留的不良反应。本品的临床试验中,受试者发生的液体潴留主要表现为浅表水肿,如面部水肿、眼睑浮肿、眼睑水肿等,均为 1 级(见【不良反应】)。在服用本品过程中,建议监测体重,出现非预期的快速体重增加,需要警惕液体潴留的可能,建议及时就医明确诊断。液体潴留可以加重或导致心衰,目前尚无严重心衰患者(按纽约心脏学会分类法的Ⅲ~Ⅳ级)临床应用甲磺酸氟马替尼的经验。有心脏病、心力衰竭风险因素或肾衰竭病史的患者慎用本品。青光眼患者建议慎用。```']

jklj077 (Contributor) commented Apr 30, 2024

For batch inference using Qwen1.5, see this comment: QwenLM/Qwen1.5#282 (comment).
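
For readers who cannot open that comment, batch inference with Qwen1.5 goes through the standard transformers chat template rather than make_context. A minimal sketch, not the exact code from the linked comment (the model id and max_new_tokens value are illustrative):

# Minimal Qwen1.5 batch-inference sketch using the tokenizer's chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'Qwen/Qwen1.5-14B-Chat'  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side='left')
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype='auto', device_map='auto').eval()

prompts = ['你好', '讲个笑话']
texts = [
    tokenizer.apply_chat_template(
        [{'role': 'user', 'content': p}], tokenize=False, add_generation_prompt=True
    )
    for p in prompts
]
inputs = tokenizer(texts, return_tensors='pt', padding=True).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens from each row before decoding.
replies = tokenizer.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(replies)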

jklj077 closed this as completed May 22, 2024