
Enable selection of GPU on inference page #1511

Open · wants to merge 2 commits into master

Conversation

rodrigoberriel (Contributor)

This PR attempts to fix #1418, see #1418 (comment). I tried to minimize changes to avoid side-effects.

If the user has at most one GPU (or DIGITS can only see one GPU through CUDA_VISIBLE_DEVICES), nothing changes:

[screenshot: the inference page as it looks today, with no GPU selector]

But if the user has multiple GPUs (2 or more), it is going to look like this:

[screenshot: the inference page with the new GPU selection form]

where Next Available is the default choice and has the same behavior as the current implementation. If the user selects a GPU (only one can be selected), then this GPU will be used.

PS: I modified images/generic/show.html the same way I did images/classification/show.html, but I didn't test the former as thoroughly as the latter.

else get_device(index).totalGlobalMem)
),
) for index in config_value('gpu_list').split(',') if index],
default='next',
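The diff fragment above builds the choices for a GPU selection field. As a hedged, self-contained sketch of how that list might be assembled (the DIGITS helpers config_value and get_device are stubbed here as assumptions, since only their call sites appear in the fragment):

```python
# Hypothetical stand-ins for the DIGITS helpers used in the fragment
# above; the real ones read the configuration and query CUDA devices.
def config_value(key):
    return {'gpu_list': '0,1'}[key]  # assume two configured GPUs


class _Device(object):
    def __init__(self, name, total_mem):
        self.name = name
        self.totalGlobalMem = total_mem


def get_device(index):
    # Fake device table; real code would ask the CUDA runtime.
    return {
        '0': _Device('GeForce GTX 1080', 8 * 2**30),
        '1': _Device('GeForce GTX 1070', 8 * 2**30),
    }[index]


# 'next' ("Next Available") comes first and is the default, matching the
# current behavior; each visible GPU becomes one selectable choice.
choices = [('next', 'Next Available')] + [
    (index, '#%s - %s (%.1f GB)' % (
        index,
        get_device(index).name,
        get_device(index).totalGlobalMem / float(2**30)))
    for index in config_value('gpu_list').split(',') if index
]
```

With a WTForms SelectField, these tuples would be passed as choices=... together with default='next', as in the diff.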
lukeyeager (Member)

What happens here if there are no GPUs on the system?

rodrigoberriel (Contributor, Author)

@lukeyeager the same behavior you'd get if you started a training job without a GPU:

  1. If everything was set up CPU-only, you get the same page you currently get (without the multi-GPU form).
  2. If Caffe was compiled with CUDA but there is no GPU on the system (suppose you removed it afterwards -- I tested this by masking all my GPUs with an invalid id in CUDA_VISIBLE_DEVICES), you get the page without the multi-GPU form and the job fails exactly as a training job currently fails:
Check failed: error == cudaSuccess (38 vs. 0)  no CUDA-capable device is detected
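The masking trick described above can be reproduced by setting CUDA_VISIBLE_DEVICES before launching the process. A minimal sketch, assuming '-1' as one example of an id that matches no device:

```python
import os

# Hide every GPU from CUDA-enabled child processes by setting
# CUDA_VISIBLE_DEVICES to an id no device has ('-1' is one common choice).
env = dict(os.environ, CUDA_VISIBLE_DEVICES='-1')

# A CUDA-built Caffe job launched with this environment, e.g.
#   subprocess.Popen(train_cmd, env=env)
# would then detect no CUDA devices and fail with cudaError 38.
```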

lukeyeager (Member)

Sounds reasonable to me!

gheinrich (Contributor) left a comment

Sorry for the late response, and thanks for another great PR! Do you think you could add some unit tests to verify the new functionality?

return flask.render_template(
'models/images/generic/show.html',
form=generic_form,
gheinrich (Contributor)

I am a bit uneasy about passing the form here when all you need is the list of GPUs. Wouldn't it be more explicit and self-explanatory to pass the list instead?

rodrigoberriel (Contributor, Author)

I agree. Do you think this way is okay?
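One way to make that handoff explicit is to compute only the variables the template needs. A sketch under the assumption that the form's GPU choices are at hand (template_context and the variable names are hypothetical, not DIGITS code):

```python
# Hypothetical helper: derive the minimal template variables from the
# form's GPU choices instead of passing the whole WTForms form object.
def template_context(gpu_choices):
    # 'next' is the pseudo-choice for "Next Available", not a real GPU.
    gpus = [c for c in gpu_choices if c[0] != 'next']
    return {
        'gpus': gpus,
        # Only show the selector when there is actually a choice to make.
        'show_multi_gpu_form': len(gpus) > 1,
    }

# Usage (names assumed): flask.render_template(
#     'models/images/generic/show.html', **template_context(choices))
```

The template then reads plain values rather than reaching into the form object.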

{% if show_multi_gpu_form %}
<div class="col-sm-6">
<div class="form-group{{mark_errors([form.select_one_of_gpus])}}">
{{form.select_one_of_gpus.label}}
gheinrich (Contributor)

It's indeed better to use the WTForms fields here, but it's inconsistent with the rest of the file... so I'm not sure about that.

rodrigoberriel (Contributor, Author)

I've changed that too. What do you think?

rodrigoberriel (Contributor, Author)

@gheinrich I'm on a deadline (March 27), but I could work on the unit tests after that. I have to say I'm not familiar with unit testing in Flask, but I can give it a try.

RSly commented Jul 27, 2017

Hi, any update on the merge for this PR?
tnx

Development: successfully merging this pull request may close the issue "Define which GPU for inference?"
4 participants