Is it possible to use QLoRA together with model parallelism? #349
Comments
zin-Fu changed the title from "Because" to "Is it possible to use QLoRA together with model parallelism?" on Apr 9, 2024
bitsandbytes implements model quantization by overloading the `.cuda()` method. In other words, quantization happens (changing tensor dimensions) at the moment the model is moved onto the GPU. During fine-tuning, the pretrained weights being loaded are fp16, so you need to set `args.device='cpu'`, load the weights first, and only then call `.cuda()`. Since this is how bitsandbytes is implemented, we cannot control it; we can only adapt to it.
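To illustrate the pattern described above, here is a toy sketch of quantize-on-`.cuda()`. This is not the real bitsandbytes code; the class `Int8Param` and its rounding "quantization" are invented for illustration. The point is that the weight keeps its loaded representation on CPU and only changes when `.cuda()` is called, which is why weights must be loaded with `args.device='cpu'` first.

```python
class Int8Param:
    """Toy stand-in for a bitsandbytes-style quantized parameter.

    Hypothetical class for illustration only: quantization is deferred
    until .cuda() is called, mimicking how bitsandbytes overloads .cuda().
    """

    def __init__(self, fp16_weight):
        # Loaded on CPU: the weight values are kept exactly as given ("fp16").
        self.data = list(fp16_weight)
        self.quantized = False

    def cuda(self):
        # Overloaded .cuda(): quantize on the way to the GPU.
        # Here we fake int8 quantization by rounding each value.
        self.data = [round(x) for x in self.data]
        self.quantized = True
        return self


p = Int8Param([0.1, 1.7, -2.3])
assert not p.quantized   # on CPU: representation unchanged
p.cuda()
assert p.quantized       # after .cuda(): "quantized" representation
```

This deferred-quantization design is why calling `.cuda()` at the wrong time (or on mismatched GPU configurations) can produce dimension mismatches: the tensor's layout after `.cuda()` differs from what was loaded.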
My GPU resources are limited (2× RTX 4070), and LoRA + model parallel still reports OOM (the author also mentioned this in #209 (comment)). Following issue #209, I modified the files finetune_qlora.sh and finetune_visualglm.py. But with QLoRA the model must first be loaded on the CPU, so in that case, how can QLoRA be combined with model parallelism?
So the dimension mismatch is a GPU-configuration problem: the `.cuda()` call failed. Originally posted by @1049451037 in #125 (comment)