
Is it possible to use QLoRA + model parallel? #349

Open
zin-Fu opened this issue Apr 9, 2024 · 1 comment

Comments


zin-Fu commented Apr 9, 2024

Because `bitsandbytes` implements model quantization by overriding the `.cuda()` function, quantization happens (and the tensor dimensions change) at the moment the model is moved onto the GPU. During fine-tuning, the pretrained weights are loaded in fp16, so you need to set `args.device='cpu'`, load the weights first, and only then call `.cuda()`. Since this is how `bitsandbytes` is implemented, we cannot control it and can only adapt to it.

So the dimension mismatch is a GPU configuration problem: the `.cuda()` call failed.

Originally posted by @1049451037 in #125 (comment)
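
For reference, a minimal sketch of the behaviour described in that quote, assuming a standalone `bitsandbytes` `Linear4bit` layer rather than VisualGLM's actual fine-tuning code: the weights stay in full precision while the module is on the CPU, and only get quantized (with the weight tensor's shape and dtype changing) once `.cuda()` is called.

```python
# Minimal sketch of the bitsandbytes behaviour described above (assumption:
# a standalone Linear4bit layer, not the repo's actual model code).
import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear4bit(4096, 4096, bias=False,
                          compute_dtype=torch.float16,
                          quant_type="nf4")
print(layer.weight.shape, layer.weight.dtype)  # full-precision weights, original shape, still on CPU

layer = layer.cuda()  # quantization happens inside the overridden .cuda()
print(layer.weight.shape, layer.weight.dtype)  # packed 4-bit storage: shape and dtype have changed
```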

zin-Fu changed the title from the quoted bitsandbytes explanation above to "Is it possible to use QLoRA + model parallel?" Apr 9, 2024

zin-Fu commented Apr 9, 2024

My GPU resources are limited (2× 4070), and LoRA + model parallel still fails with OOM (and in that issue the author also mentioned this: #209 (comment)).

Following issue #209, I modified the two files finetune_qlora.sh and finetune_visualglm.py.

But with QLoRA the model has to be loaded on the CPU first, and then this command
model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args, overwrite_args={'model_parallel_size':2})
can no longer be executed (I only have one CPU).
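
Roughly, the sequence I am attempting looks like the sketch below (assuming `args` and `model_type` come from the argument parsing in finetune_visualglm.py; everything else is omitted):

```python
# Rough sketch of the conflicting sequence (assumption: args and model_type
# come from the argument parsing in finetune_visualglm.py).
args.device = 'cpu'  # QLoRA: the fp16 weights have to be loaded on the CPU first

# This is where it fails for me: initializing model parallelism with 2 ranks
# while the model is still being loaded on a single CPU.
model, args = FineTuneVisualGLMModel.from_pretrained(
    model_type, args, overwrite_args={'model_parallel_size': 2})

model = model.cuda()  # quantization would only happen at this point
```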

So in that case, how can QLoRA + model parallel be implemented?
