Use multiple devices #186

Open
eimann opened this issue Nov 15, 2018 · 2 comments

eimann commented Nov 15, 2018

I'm trying to run HyperGAN on multiple GPUs and tried the following:

-d '/gpu:0 /gpu:1 /gpu:2 /gpu:3 /gpu:4 /gpu:5 /gpu:6 /gpu:7 /gpu:8 /gpu:9 /gpu:10 /gpu:11 /gpu:12 /gpu:13 /gpu:14 /gpu:15'

-d '/gpu:0,/gpu:1,/gpu:2,/gpu:3,/gpu:4,/gpu:5,/gpu:6,/gpu:7,/gpu:8,/gpu:9,/gpu:10,/gpu:11,/gpu:12,/gpu:13,/gpu:14,/gpu:15'

-d '/gpu:0' -d '/gpu:1' -d '/gpu:2' -d '/gpu:3' -d '/gpu:4' -d '/gpu:5' -d '/gpu:6' -d '/gpu:7' -d '/gpu:8' -d '/gpu:9' -d '/gpu:10' -d '/gpu:11' -d '/gpu:12' -d '/gpu:13' -d '/gpu:14' -d '/gpu:15'
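(As an aside, here is how a repeatable device flag is commonly parsed with argparse — illustrative only, not HyperGAN's actual CLI code. With `action='append'`, the repeated `-d` form arrives as a list, whereas the single space- or comma-separated strings arrive as one value that the program would have to split itself:)

```python
import argparse

# Illustrative sketch only: NOT HyperGAN's argument parser, just the common
# argparse pattern for a repeatable -d/--device flag.
parser = argparse.ArgumentParser()
parser.add_argument('-d', '--device', action='append', default=None,
                    help='device to use; may be given multiple times')

args = parser.parse_args(['-d', '/gpu:0', '-d', '/gpu:1'])
print(args.device)  # ['/gpu:0', '/gpu:1']
```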

None of these variants uses more than one GPU according to nvidia-smi:

$ nvidia-smi 
Thu Nov 15 15:06:04 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:0F.0 Off |                    0 |
| N/A   72C    P0   140W / 149W |   8418MiB / 11441MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 00000000:00:10.0 Off |                    0 |
| N/A   44C    P0    70W / 149W |    127MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 00000000:00:11.0 Off |                    0 |
| N/A   62C    P0    61W / 149W |    127MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 00000000:00:12.0 Off |                    0 |
| N/A   51C    P0    72W / 149W |    127MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 00000000:00:13.0 Off |                    0 |
| N/A   50C    P0    58W / 149W |    127MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 00000000:00:14.0 Off |                    0 |
| N/A   41C    P0    67W / 149W |    127MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 00000000:00:15.0 Off |                    0 |
| N/A   55C    P0    57W / 149W |    127MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 00000000:00:16.0 Off |                    0 |
| N/A   46C    P0    70W / 149W |    127MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   8  Tesla K80           On   | 00000000:00:17.0 Off |                    0 |
| N/A   56C    P0    60W / 149W |    127MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   9  Tesla K80           On   | 00000000:00:18.0 Off |                    0 |
| N/A   46C    P0    69W / 149W |    125MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  10  Tesla K80           On   | 00000000:00:19.0 Off |                    0 |
| N/A   60C    P0    58W / 149W |    125MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  11  Tesla K80           On   | 00000000:00:1A.0 Off |                    0 |
| N/A   48C    P0    72W / 149W |    125MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  12  Tesla K80           On   | 00000000:00:1B.0 Off |                    0 |
| N/A   61C    P0    59W / 149W |    125MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  13  Tesla K80           On   | 00000000:00:1C.0 Off |                    0 |
| N/A   47C    P0    70W / 149W |    125MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  14  Tesla K80           On   | 00000000:00:1D.0 Off |                    0 |
| N/A   60C    P0    59W / 149W |    125MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  15  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   48C    P0    71W / 149W |    125MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

eimann commented Nov 15, 2018

Okay, apparently this is not implemented, judging by the way tf.device is used in HyperGAN:
https://github.com/HyperGAN/HyperGAN/search?q=tf.device&unscoped_q=tf.device

whereas the TensorFlow documentation shows how it should be used with multiple GPUs:
https://www.tensorflow.org/guide/using_gpu#using_multiple_gpus
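Roughly, the pattern from that guide is to build one "tower" per GPU under `tf.device` and combine the results on the CPU. This is only an illustration of the documented TF 1.x usage, not HyperGAN's code:

```python
import tensorflow as tf

# One tower per GPU, results combined on the CPU (multi-tower pattern from
# the TF "Using multiple GPUs" guide; illustration only).
c = []
for d in ['/gpu:0', '/gpu:1']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))

with tf.device('/cpu:0'):
    total = tf.add_n(c)

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(total))
```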

martyn (Contributor) commented Jul 30, 2020

Status update: now that training is stable in the pytorch branch, this is at the top of our to-do list. We might release 1.0 first.
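For anyone experimenting in the meantime, a minimal sketch of the standard single-process multi-GPU approach in PyTorch (nn.DataParallel) is below; whether the HyperGAN pytorch branch ends up using this or DistributedDataParallel is up to the maintainers:

```python
import torch
import torch.nn as nn

# Placeholder model; replicate it across all visible GPUs when more than one
# is available. Batches are split across GPUs and outputs gathered on device 0.
model = nn.Linear(128, 64)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(32, 128).cuda()
y = model(x)
print(y.shape)  # torch.Size([32, 64])
```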
