Training with GPU #119
Replies: 4 comments · 22 replies
-
@chiku-parida First, update the code, and then set num_workers=0; this is very important. Also note that you should not use more than one GPU card. I think the cause of this issue is that the code mixes NumPy arrays and tensor operations. When the default device is set to the GPU, the NumPy arrays stay on the CPU while the tensors are on the GPU, which causes this error. If you want to use multiple cards, you need to convert all NumPy arrays into tensors and make sure they are assigned to the correct GPU. If this doesn't work, perhaps you need to set generator="cuda" in MGLDataLoader. I have already modified my code, so I forget exactly which parameters to change. I hope this is helpful to you.
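A minimal sketch of the CPU/GPU mismatch described above (the array contents and variable names are illustrative, and the snippet falls back to CPU when no GPU is present): a NumPy array always lives on the CPU, so mixing it into GPU tensor arithmetic fails; converting it with `torch.as_tensor(..., device=...)` puts both operands on the same device.

```python
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

weights_np = np.array([0.2, 0.3, 0.5])  # NumPy arrays always live on the CPU
features = torch.ones(3, device=device)  # this tensor may live on the GPU

# Mixing them directly (e.g. features * torch.from_numpy(weights_np)) fails on
# a GPU, because torch.from_numpy produces a CPU tensor. Move the data first:
weights = torch.as_tensor(weights_np, dtype=features.dtype, device=features.device)

result = features * weights  # both operands are now on the same device
print(result.device)
```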
-
Thanks @SmallBearC. I followed all your instructions, but the error persists. I am adding the whole error output below. Please look into it.
-
I think it might be more helpful if you paste your script here.
-
Please read my response below.
-
I am sorry for the previous wrong file. Please look at the attached file. I have also set cuda as the default device. @shyuep
-
The settings you set here are incorrect:
-
Please refer to the PyTorch documentation on setting the default device: https://pytorch.org/docs/stable/notes/cuda.html In general, if you wrap your entire code with a `torch.device` context manager, everything will be created on that device.
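For concreteness, here is a small sketch of the context-manager form from the linked docs (with a CPU fallback added so it also runs without a GPU): every tensor and module created inside the `with` block defaults to the chosen device, so factory calls no longer need explicit `device=` arguments.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Everything created inside this block defaults to `device`.
with torch.device(device):
    x = torch.zeros(4)               # no device= argument needed
    layer = torch.nn.Linear(4, 2)    # parameters also land on `device`

print(x.device, layer.weight.device)
```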
-
Hi! Which Python version do you recommend for working with MatGL and CUDA?
-
I don't think that matters.
-
I tried with 3.9 and got the same DGLError. Should I try cloning the repo and then `pip install -e .`?
-
I tried with `pip install` and `python setup.py install` for the GPU support, and I still get the same DGLError.
-
We do not have a Docker image. Also, a lot depends on the specific GPU and OS you are using. But we have yet to encounter issues with `pip install` or `conda install`.
-
Sounds great. Are there any recommendations for how the environment should be set up?
-
A question about "Training a MEGNet Formation Energy Model with PyTorch Lightning" (maybe a stupid question!).
-
You need to save the model. Just use model.save(). The files you see during training are intermediate cache files containing the generated graphs, attributes, and labels. They are not used when actually running the model for predictions.
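The same idea in plain PyTorch terms, as a sketch with a toy `nn.Linear` standing in for the trained model (`model.save()` above is MatGL's own convenience method; the filename here is illustrative): persist the learned weights explicitly, then rebuild the architecture and reload them for predictions, rather than relying on any training-time cache files.

```python
import torch

# Toy stand-in for a trained model (illustrative only).
model = torch.nn.Linear(8, 1)

# Persist the learned parameters; training-time cache files are not needed.
torch.save(model.state_dict(), "model_weights.pt")

# Later, for predictions: rebuild the same architecture and load the weights.
model2 = torch.nn.Linear(8, 1)
model2.load_state_dict(torch.load("model_weights.pt"))
model2.eval()
```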
-
@shyuep I much appreciate your reply. I also want to thank you for uploading your talks to YouTube; I learned a lot!
-
I have tried initializing my GPU at the beginning like below.

```python
import torch

if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
print(f'The available device is {device}')
```

The model is detecting the GPU correctly, but I still don't understand which tensors should be assigned to the GPU. I am getting the error below. Please help!

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
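A common cause of this exact error is a model (or one of its inputs) left on the CPU while the rest was moved to the GPU. A minimal sketch of the fix, with a toy model and random input standing in for the real training setup (and a CPU fallback so it runs anywhere): move both the model and every input tensor to the same device before the forward pass.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)  # move the model's parameters
x = torch.randn(5, 4).to(device)          # move every input tensor too

out = model(x)  # no device mismatch: parameters and inputs agree
print(out.device)
```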