Inference bug of the MoE GPTQ models #30515
Hi @bozheng-hit, thanks for reporting! I can indeed reproduce the error, and it also happens with the Mixtral models. I'm not sure what the best fix would be for now, since adding back the …
@SunMarc Would having a conditional check e.g. …
Thanks for the tip @amyeroberts, but it doesn't work ;). However, I tested the exllamav2 kernel and it works. The exllamav1 kernel must have some issue. A potential fix would be to change the quantization_config inside config.json so that users use the exllamav2 kernel by default. WDYT @bozheng-hit? You would have to set version to 2 in the exllama_config field here.
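For reference, the edit suggested above would look roughly like this in the model's config.json. This is a sketch: only the exllama_config version change is what the comment calls for, and the surrounding fields (quant_method, bits) are illustrative placeholders that would already exist in the real file.

```json
{
  "quantization_config": {
    "quant_method": "gptq",
    "bits": 4,
    "exllama_config": {
      "version": 2
    }
  }
}
```

With this in place, Transformers should pick the exllamav2 kernel by default instead of exllamav1 when loading the quantized checkpoint.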
System Info
Generation with GPTQ models fails with the following errors after this PR was merged: #30209 @younesbelkada @SunMarc
The error information is here, and the model generates successfully after I revert the change to modeling_qwen2_moe.py.
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
The code to reproduce the error is here:
The error information is here:
Expected behavior
Output the following text:
A large language model is a type of artificial intelligence that is trained to understand and generate human language. These models are designed to process and comprehend natural language input, and can be used for a variety of tasks such as language translation, sentiment analysis, and chatbot development. They are typically very large neural networks that have been pre-trained on vast amounts of text data, allowing them to learn the nuances of language and make intelligent predictions about how to respond to different inputs. Large language models have become increasingly popular in recent years due to their ability to handle complex language tasks and their potential applications in fields such as customer service, content creation, and education.