
[Bug]: sparse element-wise multiplication returns wrong indptr on CUDA #1273

Open
ClaudiaComito opened this issue Nov 23, 2023 · 5 comments
Labels
bug Something isn't working sparse stale

Comments

@ClaudiaComito
Contributor

ClaudiaComito commented Nov 23, 2023

What happened?

While running our unit tests with PyTorch 2.1, the sparse module tests failed on GPU (see error message below). Tests passed on CPU.

The failure occurs with any number of processes, on a single GPU as well as multi-GPU.

Tested with CUDA only, not yet with ROCm.

Tagging @Mystic-Slice in case he wants to explore.

Python was actually 3.11 and PyTorch 2.1.0; I will update the issue template.

Code snippet triggering the error

heat.sparse.tests.test_arithmetics.TestArithmetics.test_mul
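For reference, the failing test can be run in isolation with the standard unittest loader; this is just a sketch, the exact launcher/MPI setup is assumed and may differ from our CI:

import unittest

# Hypothetical one-off runner for the failing test; wrap with mpirun/srun as needed.
suite = unittest.defaultTestLoader.loadTestsFromName(
    "heat.sparse.tests.test_arithmetics.TestArithmetics.test_mul"
)
unittest.TextTestRunner(verbosity=2).run(suite)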

Error message or erroneous outcome

======================================================================
FAIL: test_mul (heat.sparse.tests.test_arithmetics.TestArithmetics.test_mul)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/p/scratch/haf/comito1/devel/heat/heat/sparse/tests/test_arithmetics.py", line 750, in test_mul
    self.assertTrue(
AssertionError: tensor(False, device='cuda:0') is not true

Version

1.3.x

Python version

None

PyTorch version

None

MPI version

No response

@ClaudiaComito ClaudiaComito added bug Something isn't working sparse labels Nov 23, 2023
@ClaudiaComito ClaudiaComito changed the title [Bug]: sparse element-wise multiplication returns wrong indptr [Bug]: sparse element-wise multiplication returns wrong indptr on CUDA Nov 23, 2023
@ClaudiaComito ClaudiaComito mentioned this issue Nov 23, 2023
@Mystic-Slice
Collaborator

I tried reproducing it on my local machine.
There seems to be a change in the behavior of sparse torch tensors in version 2.1.1.

Code:

import torch

A = [[0, 0],
     [1, 0],
     [0, 2]]

B = [[1, 0],
     [0, 0],
     [2, 3]]

# Element-wise, the only mathematically nonzero entry of A * B is at (2, 1).
a = torch.tensor(A, device='cuda:0').float().to_sparse_csr()
b = torch.tensor(B, device='cuda:0').float().to_sparse_csr()

print(a * b)

Output Torch 2.0.0:

(torch2.0.0) mystic-slice@MysticSlice:/mnt/e/Opensource/heat$ python3 dummy.py
tensor(crow_indices=tensor([0, 0, 0, 1]),
       col_indices=tensor([1]),
       values=tensor([6.]), device='cuda:0', size=(3, 2), nnz=1, layout=torch.sparse_csr)

Output Torch 2.1.1:

(torch2.1.1) mystic-slice@MysticSlice:/mnt/e/Opensource/heat$ python3 dummy.py
tensor(crow_indices=tensor([0, 0, 1, 2]),
       col_indices=tensor([0, 1]),
       values=tensor([0., 6.]), device='cuda:0', size=(3, 2), nnz=2, layout=torch.sparse_csr)

The zero value produced by the multiplication (a stored nonzero in a meeting a zero in b) is kept as an explicitly stored value in the new version, and this happens only when run on the GPU.
I couldn't find any reference to this change in the release notes.
I think this is something PyTorch has to sort out, because a tensor's behaviour should not depend on the device it lives on.
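If it does turn out to be intended behaviour, a possible workaround on our side would be to prune the explicitly stored zeros before comparing results. A minimal sketch (prune_explicit_zeros_csr is a hypothetical helper, not part of Heat):

def prune_explicit_zeros_csr(t):
    # Round-trip through COO, drop stored zeros, and rebuild the CSR tensor so
    # that CPU and CUDA results have identical crow_indices/col_indices/values.
    coo = t.to_sparse_coo().coalesce()
    mask = coo.values() != 0
    pruned = torch.sparse_coo_tensor(
        coo.indices()[:, mask], coo.values()[mask], size=t.shape, device=t.device
    )
    return pruned.to_sparse_csr()

# e.g. prune_explicit_zeros_csr(a * b) should match the torch 2.0.0 output above.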

What do you think we should do, @ClaudiaComito?

@ClaudiaComito
Contributor Author

Brilliant, @Mystic-Slice, thanks for looking into this!

I think you should go ahead and report it to PyTorch. When we merge support for PyTorch 2.1, we'll skip that test until a fix is out. Does that sound reasonable?
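A minimal sketch of what that skip could look like (placement and version check are assumptions, not the actual Heat test code):

import unittest
import torch

class TestArithmetics(unittest.TestCase):
    # Crude lexicographic version check; a proper version parse would be safer.
    @unittest.skipIf(
        torch.cuda.is_available() and torch.__version__.split("+")[0] >= "2.1",
        "torch >= 2.1 keeps explicit zeros in sparse CSR element-wise mul on CUDA",
    )
    def test_mul(self):
        ...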

@Mystic-Slice
Collaborator

Yeah. Sounds good.
I will raise an issue in the PyTorch repo.

@ClaudiaComito
Contributor Author

Reported here. Thanks again, @Mystic-Slice!

Contributor

This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Jan 29, 2024