Resolved torchrun Bug: Fixed issue #2163
Updated torch.distributed.launch to torchrun.
anxiangsir committed Feb 8, 2023
1 parent bf32ec2 commit e0fdff8
Showing 4 changed files with 8 additions and 8 deletions.
4 changes: 2 additions & 2 deletions recognition/arcface_torch/README.md
@@ -9,9 +9,9 @@ The "arcface_torch" repository is the official implementation of the ArcFace alg

## Requirements

-To avail the latest features of PyTorch, we have upgraded to version 1.9.0.
+To avail the latest features of PyTorch, we have upgraded to version 1.12.0.

-- Install [PyTorch](http://pytorch.org) (torch>=1.9.0), our doc for [install.md](docs/install.md).
+- Install [PyTorch](https://pytorch.org/get-started/previous-versions/) (torch>=1.12.0).
- (Optional) Install [DALI](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/), our doc for [install_dali.md](docs/install_dali.md).
- `pip install -r requirement.txt`.

4 changes: 2 additions & 2 deletions recognition/arcface_torch/dist.sh
@@ -4,9 +4,9 @@ config=wf42m_pfc03_32gpu_r100

for((node_rank=0;node_rank<${#ip_list[*]};node_rank++));
do
-ssh face@${ip_list[node_rank]} "cd `pwd`;PATH=$PATH \
+ssh ubuntu@${ip_list[node_rank]} "cd `pwd`;PATH=$PATH \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python -m torch.distributed.launch \
+torchrun \
--nproc_per_node=8 \
--nnodes=${#ip_list[*]} \
--node_rank=$node_rank \
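For context on how the launcher swap reaches the training script: `torchrun` hands each worker its rank information through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables, and does not pass the `--local_rank` command-line argument that `torch.distributed.launch` adds unless `--use_env` is given. Below is a minimal Python sketch of reading those variables. The names `DistEnv` and `read_dist_env` and the single-process fallback are illustrative assumptions, not code from this commit.

```python
import os
from typing import Mapping, NamedTuple


class DistEnv(NamedTuple):
    """Rank information for one worker process (hypothetical container)."""
    rank: int
    local_rank: int
    world_size: int


def read_dist_env(environ: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the rank variables exported by torchrun."""
    try:
        return DistEnv(
            rank=int(environ["RANK"]),
            local_rank=int(environ["LOCAL_RANK"]),
            world_size=int(environ["WORLD_SIZE"]),
        )
    except KeyError:
        # Assumed fallback: not started by a launcher, so treat this
        # as a single local process.
        return DistEnv(rank=0, local_rank=0, world_size=1)


if __name__ == "__main__":
    print(read_dist_env())
```

Running the script directly with `python` exercises the fallback branch; running it under `torchrun` should print the values assigned to each worker.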
4 changes: 2 additions & 2 deletions recognition/arcface_torch/train.py
@@ -18,8 +18,8 @@
from utils.utils_distributed_sampler import setup_seed
from utils.utils_logging import AverageMeter, init_logging

-assert torch.__version__ >= "1.9.0", "In order to enjoy the features of the new torch, \
-we have upgraded the torch to 1.9.0. torch before than 1.9.0 may not work in the future."
+assert torch.__version__ >= "1.12.0", "In order to enjoy the features of the new torch, \
+we have upgraded the torch to 1.12.0. torch before than 1.12.0 may not work in the future."

try:
    rank = int(os.environ["RANK"])
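One caveat about the version guard in this hunk: it compares version strings lexicographically, so `"1.9.0" >= "1.12.0"` evaluates to `True` and an older install would pass the check. A minimal sketch of a numeric comparison follows. The helper name `version_tuple` is hypothetical, and it reads only the leading `major.minor.patch` digits, ignoring suffixes such as `+cu113`.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading 'major.minor[.patch]' digits into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


if __name__ == "__main__":
    # String comparison goes character by character: '9' sorts after '1'.
    print("1.9.0" >= "1.12.0")                                # True
    # Tuple comparison goes component by component: 9 < 12.
    print(version_tuple("1.9.0") >= version_tuple("1.12.0"))  # False
```

With such a helper the guard could be written as `assert version_tuple(torch.__version__) >= (1, 12, 0)`. This is a suggestion only; the commit itself keeps the string comparison.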
4 changes: 2 additions & 2 deletions recognition/arcface_torch/train_v2.py
@@ -18,8 +18,8 @@
from utils.utils_distributed_sampler import setup_seed
from utils.utils_logging import AverageMeter, init_logging

-assert torch.__version__ >= "1.9.0", "In order to enjoy the features of the new torch, \
-we have upgraded the torch to 1.9.0. torch before than 1.9.0 may not work in the future."
+assert torch.__version__ >= "1.12.0", "In order to enjoy the features of the new torch, \
+we have upgraded the torch to 1.12.0. torch before than 1.12.0 may not work in the future."

try:
    rank = int(os.environ["RANK"])