Hello, is there any way to run two DDP-based experiments on the same server?
For example,
I have 4 GPUs and first launch the run_downstream.py task with DDP on 2 GPUs like: CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node 2 run_downstream.py ~ ~
That leaves 2 GPUs free.
So I want to use these 2 GPUs for another DDP experiment, but I got the following error:
Maybe this is caused by both jobs using the same port.
However, there seems to be no way to change the port in run_downstream.py.
How can I solve it?
Best,
Hi, you can specify the argument --master_port after --nproc_per_node. Since you have already launched one DDP training, the default port 29500 is occupied by it, which causes the error. Specifying --master_port 29501 (or any other port that is not in use) solves this problem.
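For example, assuming the downstream arguments stay the same as in your first command, the second experiment could be launched on the remaining GPUs roughly like this (the port 29501 is just an example of any free port):

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node 2 run_downstream.py ~ ~
CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node 2 --master_port 29501 run_downstream.py ~ ~

The first job keeps the default rendezvous port 29500, and the second one rendezvouses on 29501, so the two process groups no longer collide.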