DDP is meant to provide multi-node, multi-GPU parallelism, but the author's code already seems to handle model duplication, scattering the data across GPUs, and the reduce/gather operations for the forward and backward passes manually. So what is the point of wrapping the backbone in DDP? Also, when wrapping the backbone, `device_ids` is set to a single GPU, which seems to prevent the DDP wrapper around the backbone from doing anything useful. Code:
backbone = torch.nn.parallel.DistributedDataParallel(
    module=backbone,
    broadcast_buffers=False,
    device_ids=[local_rank],
    bucket_cap_mb=16,
    find_unused_parameters=True,
)
Commenting these lines out does not seem to affect whether the code runs. Could you explain why DDP is still used here, and whether removing it affects training speed?
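For context on the question: DDP's standard mode is one process per GPU, so passing a single-element `device_ids=[local_rank]` is the normal pattern, not a misconfiguration; the wrapper's job in that mode is to register autograd hooks that all-reduce gradients across processes during `backward()`. A minimal runnable sketch of this per-process wrapping (a single process with the `gloo` backend on CPU, so it runs without any GPU; the model and shapes are hypothetical):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group so the sketch is self-contained.
# In a real launch, rank/world_size come from the launcher and each
# process owns one GPU, passing device_ids=[local_rank] to DDP.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

backbone = torch.nn.Linear(8, 4)  # stand-in for the real backbone
# device_ids is omitted on CPU; with CUDA it would be [local_rank].
ddp_backbone = DDP(backbone, broadcast_buffers=False)

x = torch.randn(2, 8)
loss = ddp_backbone(x).sum()
loss.backward()  # DDP's hooks all-reduce gradients across ranks here

assert backbone.weight.grad is not None
dist.destroy_process_group()
```

With `world_size=1` the all-reduce is a no-op, which is consistent with the observation that commenting the wrapper out does not change single-machine behavior; across multiple processes, removing it would leave each rank's backbone gradients unsynchronized.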