
[arcface_torch] Question about multi-GPU parallelism #2559

Open

supertx opened this issue Apr 12, 2024 · 0 comments

DDP exists to provide multi-node, multi-GPU parallelism, but in the author's code the model duplication, the scattering of data across GPUs, and the reduce/gather of data in the forward and backward passes all seem to be implemented by hand. What, then, is the purpose of wrapping the backbone in DDP? Moreover, when the backbone is wrapped in DDP, device_ids is set to a single GPU, which seems to leave the DDP wrapper around the backbone with nothing to do.
Code:

backbone = torch.nn.parallel.DistributedDataParallel(
        module=backbone, broadcast_buffers=False, device_ids=[local_rank], bucket_cap_mb=16,
        find_unused_parameters=True)

Commenting out these lines does not seem to affect whether the code runs. Could you explain the purpose of keeping DDP here, and whether it has any effect on training speed?
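For reference, my understanding of the usual DDP pattern is roughly the sketch below (a toy standalone example with a dummy backbone, not code from this repository): with one process per GPU, device_ids=[local_rank] only names the single GPU owned by that process, while the gradient all-reduce during backward() still spans every process in the group.

# Minimal sketch of the standard one-process-per-GPU DDP setup (illustrative
# only; the Linear "backbone" and hyperparameters are placeholders).
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def main():
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for each process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Toy backbone standing in for the real model.
    backbone = nn.Linear(512, 512).cuda(local_rank)

    # device_ids=[local_rank] only tells DDP which GPU this process owns;
    # gradient synchronization still covers all processes in the group.
    backbone = nn.parallel.DistributedDataParallel(
        backbone, device_ids=[local_rank], broadcast_buffers=False)

    optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1)

    x = torch.randn(8, 512, device=f"cuda:{local_rank}")
    loss = backbone(x).sum()
    loss.backward()   # gradients are all-reduced across processes here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

(Launched with e.g. torchrun --nproc_per_node=<num_gpus> script.py.) If that is also how it works in arcface_torch, then removing the wrapper would silently skip the cross-process gradient sync on the backbone even though the script still runs, which is what I would like to confirm.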
