OOM when running chatglm2-6b-lora on four 3080 Ti GPUs #151

Open

imjking opened this issue Aug 18, 2023 · 5 comments

imjking commented Aug 18, 2023

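Hi, I'm trying to fine-tune chatglm2 on four 12 GB 3080 Ti cards, but I get out-of-memory errors. The parameters you listed as needing only 14 GB of GPU memory don't work either, and after loading the model in int8 I still hit OOM during training. All of the above was with the model-parallel option enabled.

With the model-parallel option disabled, training also hits OOM, and in that case only one card is used.

Is this expected, and how can I fix it?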

Owner

yuanzhoulvpi2017 commented Aug 18, 2023

Generally speaking, LoRA training of chatglm2 really does need only about 14 GB (with a batch size of 1 and fairly short text, for example a length of 512).
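As a rough sanity check on that figure, the memory taken by the weights alone can be estimated from the parameter count. The sketch below assumes ChatGLM2-6B has roughly 6.2 billion parameters (an assumption, not a number from this thread) and ignores LoRA adapters, optimizer state, and activations, which account for the rest of the budget.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / GIB


# Assumption: ChatGLM2-6B has roughly 6.2 billion parameters.
NUM_PARAMS = 6.2e9

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # half precision: 2 bytes per weight
int8_gib = weight_memory_gib(NUM_PARAMS, 1)  # int8: 1 byte per weight

print(f"fp16 weights: {fp16_gib:.1f} GiB")
print(f"int8 weights: {int8_gib:.1f} GiB")
print(f"fp16 weights split over 4 GPUs: {fp16_gib / 4:.1f} GiB per GPU")
```

Under that assumption the fp16 weights come to about 11.5 GiB, which would explain why a single 12 GB card runs out of memory once anything else is added, while int8 weights or weights split across four cards leave headroom.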

There are exceptions that can leave you short of GPU memory, though. I suggest checking the following:

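What is your batch size? Set it to 1; if that runs, increase it step by step.
How long is your text? Start at 512, then 1024, and so on, increasing gradually.
Check the code: is it being run the wrong way? Is it exactly the same as my code? (Mine uses model parallelism, which already saves a lot of GPU memory.)
Check the versions of the transformers and peft packages, and try updating to the latest versions.
Try turning on gradient_checkpoint; it also saves a fair amount of GPU memory.
🚨 One more possibility: because my code uses model parallelism, computing the loss takes up a lot of memory on the last GPU. This one is probably hard for you to fix; the only way around it is a GPU with more memory.
There are also plenty of basic mistakes to rule out, such as someone else occupying one of the GPUs. Make sure every card has less than 10 MB of memory in use before you start. This usually happens on shared lab machines.
In general, int8 combined with LoRA runs on a single 3080 Ti, but there is a pitfall: I don't recommend chatglm's own quantization code. Use the quantization support in transformers instead.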
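The int8 route through transformers might look roughly like the sketch below. It is not this repository's training script: the model name, the LoRA hyperparameters in `LORA_KWARGS`, and the helper `load_int8_lora_model` are illustrative assumptions, and it needs `bitsandbytes` and `accelerate` installed alongside `transformers` and `peft`.

```python
# Hypothetical defaults for illustration; tune them for your own data.
MODEL_NAME = "THUDM/chatglm2-6b"
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["query_key_value"],  # fused attention projection in ChatGLM2
    "task_type": "CAUSAL_LM",
}


def load_int8_lora_model(model_name: str = MODEL_NAME):
    """Load the base model in int8 through transformers, then attach LoRA adapters."""
    # Imported here so the settings above can be inspected without the GPU stack.
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModel, BitsAndBytesConfig

    model = AutoModel.from_pretrained(
        model_name,
        trust_remote_code=True,  # ChatGLM2 ships its own modeling code
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",  # let accelerate place layers on the available GPUs
    )
    # Prepares the quantized model for training and turns on gradient checkpointing.
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
    model = get_peft_model(model, LoraConfig(**LORA_KWARGS))
    model.print_trainable_parameters()
    return model
```

Calling `load_int8_lora_model()` downloads the checkpoint and requires a CUDA GPU, so it is left as a function rather than run at import time.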

That's basically it. Hope this helps.
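Two of the checks above, package versions and idle GPUs, can be scripted before launching training. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the 10 MiB threshold simply mirrors the rule of thumb given in the checklist.

```python
import subprocess
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def query_nvidia_smi() -> str:
    """Ask nvidia-smi for per-GPU memory as CSV lines: index, used MiB, total MiB."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int, int]]:
    """Parse the CSV text into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def busy_gpus(rows, max_used_mib: int = 10) -> list[int]:
    """Indices of GPUs that already have more than max_used_mib in use."""
    return [index for index, used, _ in rows if used > max_used_mib]
```

On a machine with NVIDIA drivers, `busy_gpus(parse_gpu_memory(query_nvidia_smi()))` lists the cards that something else is already using, and `installed_version("transformers")` and `installed_version("peft")` report what is installed.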

yuanzhoulvpi2017 added the chatglm2 label

Author

imjking commented Aug 18, 2023

OK, I'll give it a try. Thanks.

Author

imjking commented Aug 21, 2023

Solved: I updated transformers to the latest version.

fengzehui0422 commented Nov 15, 2023

Can chatglm2-6b-lora training be set to run for multiple epochs? I couldn't find where to configure it.

mimosa1987 commented Mar 4, 2024

> Solved: I updated transformers to the latest version.

Which version of transformers are you using?
