
llama_7b model OOM issue #2051

Open
jinsong-mao opened this issue Nov 21, 2023 · 2 comments

jinsong-mao commented Nov 21, 2023

Hi

I duplicated the llama model, renamed it llama_7b, and changed the model parameters to match the LLaMA-7B specification, which looks like this:
(screenshot: modified llama_7b model parameters)
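
For reference, the kind of edit involved is roughly the following. This is a sketch assuming the copied model exposes a ModelArgs-style dataclass like the upstream llama code; field names in torchbench's copy may differ:

```python
from dataclasses import dataclass

# Hypothetical ModelArgs mirroring the upstream llama code; the LLaMA-7B
# values (hidden size 4096, 32 layers, 32 heads) are from the LLaMA paper.
@dataclass
class ModelArgs:
    dim: int = 4096
    n_layers: int = 32
    n_heads: int = 32
    vocab_size: int = 32000
    multiple_of: int = 256
    norm_eps: float = 1e-5
    max_seq_len: int = 2048
```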

I skipped CPU eager mode and only ran the CUDA model.

It reports the following error when running this command:
python userbenchmark/dynamo/dynamobench/torchbench.py -dcuda --float16 -n1 --inductor --performance --inference --filter "llama" --batch_size 1 --in_slen 32 --out_slen 3 --output-dir=torchbench_llama_test_logs
(screenshot: CUDA out-of-memory error)

If I want to run this model, how should I fix this? My hardware is an A100-40G.

Thanks

xuzhao9 commented Nov 21, 2023

We only guarantee the runnability of models in PyTorch eager mode on A100 40GB in our CI. It is possible that Inductor uses more GPU memory than eager mode, causing the OOM. Optimizing GPU memory usage with Inductor is an open question.
cc @msaroufim
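
One way to confirm that compilation is what pushes memory past 40GB is to compare peak usage directly. A minimal sketch using the standard torch.cuda memory statistics; the TransformerEncoderLayer here is only a stand-in for the real llama_7b model:

```python
import torch
import torch.nn as nn

def peak_gib(fn, *args):
    # Run fn once under no_grad and report peak GPU memory in GiB.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        fn(*args)
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**30

# Stand-in module; substitute the actual llama_7b model instance.
model = nn.TransformerEncoderLayer(d_model=4096, nhead=32).half().cuda()
x = torch.randn(32, 1, 4096, dtype=torch.float16, device="cuda")

print(f"eager peak:    {peak_gib(model, x):.2f} GiB")
compiled = torch.compile(model, backend="inductor")
print(f"inductor peak: {peak_gib(compiled, x):.2f} GiB")
```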

jinsong-mao commented

@xuzhao9 I tried using 4x A100-40G to avoid the OOM issue, but it looks like torchbench.py only uses one GPU's memory. I tried options like --device-index and --multiprocess, and both failed. Do you have any advice on multi-GPU support?

Thanks
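
Outside the torchbench harness, one common way to fit a 7B model that OOMs on a single GPU is to shard it across all visible devices. A minimal sketch using Hugging Face transformers with accelerate's device_map="auto"; the checkpoint name is illustrative, not what torchbench loads:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" splits the layers across all visible GPUs
# (requires the accelerate package). Checkpoint name is illustrative.
name = "huggyllama/llama-7b"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

ids = tok("Hello", return_tensors="pt").input_ids.to(model.device)
out = model.generate(ids, max_new_tokens=3)  # matches --out_slen 3
print(tok.decode(out[0]))
```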
