
Rework forward pass to remove old gradients #46

Open · wants to merge 13 commits into main

Conversation

@Arkay92 commented Dec 27, 2022

Uses the torch.cuda.device_of() function to determine whether the input tensors are on the GPU or CPU, then chooses the appropriate layer implementations for better performance. Also uses the torch.no_grad() context manager to prevent the model from tracking gradients in the forward pass.
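
For context, a minimal sketch of the idea described above, using a simplified stand-in block (InferenceBlockSketch is illustrative, not the actual point-e code): the input tensor's placement is checked via x.is_cuda, and inference runs inside torch.no_grad() so no gradient history is retained.

```python
import torch
import torch.nn as nn


class InferenceBlockSketch(nn.Module):
    # Hypothetical block for illustration only; not the point-e implementation.
    def __init__(self, width: int):
        super().__init__()
        self.norm = nn.LayerNorm(width)
        self.proj = nn.Linear(width, width)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x.is_cuda (equivalently x.device.type == "cuda") reveals whether the
        # input already lives on the GPU, so the layers can be moved to match it
        # instead of silently falling back to a CPU path.
        if x.is_cuda and not self.proj.weight.is_cuda:
            self.to(x.device)
        # torch.no_grad() keeps autograd from recording this forward pass,
        # so no gradient history ("old gradients") accumulates while sampling.
        with torch.no_grad():
            return x + self.proj(self.norm(x))
```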

@Arkay92 (Author) commented Dec 27, 2022

This may be linked to issue #27

@dancergraham commented:

This is awesome - without this change I cannot run any of the examples on my GeForce GTX 1650 with 4 GB of dedicated GPU memory. With this change I can run the 40M-textvec model. This takes sampling time from nearly one hour (CPU) to a couple of minutes (GPU) on my laptop. Thank you so much! I hope it is accepted into the repo.

@dancergraham commented:

This also relates to issue #36

@dancergraham commented Dec 29, 2022

Hello,
Using the pointcloud2mesh.ipynb notebook I get an error:

AttributeError: module 'torch.nn' has no attribute 'CUDALayerNorm'

I am using PyTorch version '1.13.1+cu117'.

@Arkay92 (Author) commented Dec 29, 2022

Good spot yet again @dancergraham. I have switched over to NVIDIA Apex's FusedLayerNorm on GPU; can you try this now?
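
For reference, the usual pattern for this kind of optional swap is a guarded import with a plain-PyTorch fallback. This is a sketch assuming Apex exposes FusedLayerNorm under apex.normalization; it is not the exact change in this PR.

```python
import torch.nn as nn

try:
    # Apex's fused CUDA kernel; only importable when the optional apex package is installed.
    from apex.normalization import FusedLayerNorm as _LayerNorm
except ImportError:
    # Fall back to the stock implementation so the code still runs without apex.
    _LayerNorm = nn.LayerNorm


def make_layer_norm(width: int) -> nn.Module:
    # Both classes accept the normalized shape as their first argument.
    return _LayerNorm(width)
```

A guarded import like this keeps apex optional, which is relevant to the Windows install trouble reported below.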

@Arkay92 (Author) commented Dec 29, 2022

NB: this requires the external apex lib to work, but it should speed up rendering once it fires up. Any issues, let me know and I'll rework, @dancergraham (I have added it to setup.py install_requires).

@dancergraham commented:

I was not able to install apex with pip on my Windows machine - I got a lot of errors about "filename too long"

I tried python -m pip install "apex @ git+https://github.com/NVIDIA/apex.git"

@Arkay92 (Author) commented Dec 29, 2022

Looking into other alternatives to LayerNorm, and it seems instance or group normalisation may help speed things up here! Will ping another refactored PR soon.
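
For illustration of what that swap would involve: nn.GroupNorm normalizes over the channel dimension and expects channels-first input, so a transformer-style (batch, tokens, width) activation would need a transpose on the way in and out.

```python
import torch
import torch.nn as nn

width = 512
# 32 groups over 512 channels; num_groups must divide num_channels.
group_norm = nn.GroupNorm(num_groups=32, num_channels=width)

x = torch.randn(2, 1024, width)        # (batch, tokens, width)
y = group_norm(x.transpose(1, 2))      # GroupNorm wants (batch, channels, *)
y = y.transpose(1, 2)                  # back to (batch, tokens, width)
```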

@dancergraham commented:

Hmm, this looks rather complex - I will try it out on my machine, but if I were running the point-e repo I don't think I would want to adopt a complicated dependency, especially one marked as "experimental" on Windows...

It might be good to add it as an optional dependency, in the same way that the code currently works with or without CUDA; that adds complexity to the library, so it is the maintainers' call whether or not to accept that approach.

@Arkay92 (Author) commented Dec 30, 2022

Shall remove the apex lib but keep the forward pass change; this should still preserve performance without the lib dependency.

@Arkay92 (Author) commented Dec 30, 2022

@dancergraham try this now; textvec rendering should still be significantly faster whilst I find a native way of speeding up layer norm on GPU / CUDA.

@dancergraham commented:

I now get an error when I try to run pointcloud2mesh: TypeError: ResidualCrossAttentionBlock.forward() missing 1 required positional argument: 'device'

perceiver.py:154, in SimplePerceiver.forward(self, x, data)
    152 with torch.no_grad():
    153     for block in self.resblocks:
--> 154         x = block(x, data)
    155 return x

@Arkay92 (Author) commented Jan 2, 2023

My bad @dancergraham, forgot I added it as a param. Changed back so .to() uses torch.device directly rather than by reference from the param list.

@dancergraham commented:

still not working for me - I get errors with pointcloud2mesh:

File ...\point_e\models\perceiver.py:154, in SimplePerceiver.forward(self, x, data)
    152 with torch.no_grad():
    153     for block in self.resblocks:
--> 154         x = block(x, data)
    155 return x

File ...\lib\site-packages\torch\nn\modules\module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File ...\point-e\point_e\models\perceiver.py:106, in ResidualCrossAttentionBlock.forward(self, x, data)
    103 def forward(self, x: torch.Tensor, data: torch.Tensor):
    104     with torch.no_grad():
    105         # Use the to() method to move the input tensors to the specified device
--> 106         x = x.to(torch.device)
    107         data = data.to(torch.device)
    109         # Normalize input tensors and pass them through the attention and MLP layers

TypeError: to() received an invalid combination of arguments - got (type), but expected one of:
 * (torch.device device, torch.dtype dtype, bool non_blocking, bool copy, *, torch.memory_format memory_format)
 * (torch.dtype dtype, bool non_blocking, bool copy, *, torch.memory_format memory_format)
 * (Tensor tensor, bool non_blocking, bool copy, *, torch.memory_format memory_format)
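
The failure above comes from handing the torch.device class itself to .to(), which needs a concrete device instance. A minimal sketch of the kind of fix that resolves this (using an illustrative stand-in block, not the actual patch) is to take the device from the module's own parameters:

```python
import torch
import torch.nn as nn


class BlockSketch(nn.Module):
    # Illustrative stand-in for ResidualCrossAttentionBlock; not the real code.
    def __init__(self, width: int):
        super().__init__()
        self.ln = nn.LayerNorm(width)
        self.mlp = nn.Linear(width, width)

    def forward(self, x: torch.Tensor, data: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            # Resolve a concrete torch.device from this module's weights instead
            # of passing the torch.device *type* to .to().
            device = next(self.parameters()).device
            x = x.to(device)
            data = data.to(device)
            return x + self.mlp(self.ln(data))
```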

@dancergraham commented:

We have liftoff 🚀 I can now run at grid_size=128 in 45 seconds per model on my GPU - many thanks again!

@Arkay92 (Author) commented Jan 3, 2023

Thank you so much for the testing support @dancergraham! LFG!

@Arkay92 changed the title from "Rework perceiver tensor logic" to "Rework forward pass to remove old gradients" on Jan 3, 2023