
Model performance degrades when moved to Multi-GPU #29

Open
ereday opened this issue Nov 8, 2019 · 5 comments
ereday commented Nov 8, 2019

Hi,

When I run your code on multiple GPUs, performance degrades severely compared to the single-GPU version. To make the code multi-GPU compatible, I only added two lines of code (see the sketch after this list):

  • model = torch.nn.DataParallel(model) between your model = model_class.from_pretrained(args['model_name']) and model.to(device) calls

  • loss = loss.mean() after the loss = outputs[0] line in the train function

Do you have any idea how I can get the same (or similar) performance in the multi-GPU setting?
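For clarity, here is a minimal sketch of those two additions. The surrounding names (model_class, args, device, inputs) are taken from the original training script as I understand it and may differ slightly in your copy:

```python
import torch.nn as nn

# Model setup (model_class, args, and device come from the original script):
model = model_class.from_pretrained(args['model_name'])
model = nn.DataParallel(model)  # added: replicate the model across the available GPUs
model.to(device)

# Inside the train function, where `inputs` is the current batch:
outputs = model(**inputs)
loss = outputs[0]
loss = loss.mean()  # added: DataParallel returns one loss per GPU, so average them
```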

These are the results I got with these two settings:

| Metric        | Multi-GPU          | Single-GPU          |
|---------------|--------------------|---------------------|
| evaluate_loss | 0.3928874781464829 | 0.39542119007776766 |
| fn            | 116                | 82                  |
| fp            | 81                 | 126                 |
| mcc           | 0.5114751200090137 | 0.5465463104769824  |
| tn            | 1291               | 1246                |
| tp            | 136                | 170                 |

Although the average loss values are similar, there are large differences in the other metrics.

@ThilinaRajapakse
Owner

Those changes should be sufficient to enable multi-GPU training, in my experience. Is there any other difference (e.g. batch size) between the two runs?

ereday commented Nov 8, 2019

Nope, I did not change any of the variables in the args dictionary.

@ThilinaRajapakse
Owner

This is probably a silly question, but did you try this multiple times and receive the same results?

ereday commented Nov 8, 2019

Yes, I ran the code with the same configuration multiple times. There is no difference across runs.

@ThilinaRajapakse
Owner

Sorry, I am not sure why this is happening. I recommend trying the Simple Transformers library, as it supports multi-GPU training by default and I have used multi-GPU training with it without any performance degradation.
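For reference, a minimal sketch of a binary classification run with Simple Transformers. The data is placeholder, and the exact args keys (such as n_gpu) may vary across library versions, so please double-check against the documentation:

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Placeholder training data: a DataFrame with "text" and "labels" columns.
train_df = pd.DataFrame(
    [["this is a positive example", 1], ["this is a negative example", 0]],
    columns=["text", "labels"],
)

# n_gpu tells the library how many GPUs to use internally (assumed arg name).
model = ClassificationModel(
    "roberta",
    "roberta-base",
    args={"n_gpu": 2, "num_train_epochs": 1, "overwrite_output_dir": True},
)

model.train_model(train_df)
result, model_outputs, wrong_predictions = model.eval_model(train_df)
print(result)  # for binary classification this typically includes mcc, tp, tn, fp, fn
```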
