Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: sparse: reformat str and repr for sparse arrays, correct 1D coords, improve dtype looks #20649

Merged
merged 6 commits into from May 15, 2024

Conversation

dschult
Copy link
Contributor

@dschult dschult commented May 5, 2024

Fixes #20378
Fixes #20377

Currently, str(A) for sparse A with no stored elements returns the empty string.
In repr(A), for 1D arrays, the shape does not appear nicely, e.g. "958 sparse array of type ...". 2D is "4x3 sparse array of type".

This PR puts in repr: format and dtype in first line, nnz and shape in 2nd line.
For str, the text from repr is the start, followed by coord/value column titles followed by coord/value info as before, but now it works for 1D. Also, if no stored values exist, the repr text is returned.

@github-actions github-actions bot added scipy.sparse maintenance Items related to regular maintenance tasks labels May 5, 2024
@dschult
Copy link
Contributor Author

dschult commented May 7, 2024

When changing the str/repr format, I didn't change the refguide doctests to match. The CI reports 47 doctests that need to be updated. I've updated/corrected them.

I also changed the term "type" to "dtype" because that is what we are showing... not idx_type or the object's type.
The diff is bigger now, but hopefully still reasonable for review.

Copy link
Member

@perimosocordiae perimosocordiae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks really nice, thanks!

scipy/sparse/_base.py Outdated Show resolved Hide resolved
scipy/sparse/_base.py Outdated Show resolved Hide resolved
if self.nnz == 0:
return val

val += '\n Coords\tValue\n'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base class uses Coords and Values, so we should use Values here too.

Though I wonder: since this doesn't support maxprint, should we even have this override?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point -- This str override could print something very long. And the base class will handle it just fine.

I've removed that method from _lilbase here.

I've also rebased to include the csr-1d stuff in case that changes anything.

@dschult
Copy link
Contributor Author

dschult commented May 8, 2024

The CI failure seems not to be related to this PR. I believe this str/repr PR is ready for review again.

Copy link
Member

@perimosocordiae perimosocordiae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait a bit to see if anyone else has comments, otherwise this is good to merge.

@tylerjereddy
Copy link
Contributor

Let me restart the failing CI job for sparse/linalg/tests/test_expm_multiply.py::test_expm_multiply_dtype. I think that one is a known flake (#18880), but sparse namespace makes me want to re-run it.

@dschult
Copy link
Contributor Author

dschult commented May 9, 2024

Good thanks! Rerunning the test seems to have let it pass. :)

return '\n'.join([(' {}\t{}'.format(*t)) for t in triples])
def tostr(coords, data):
pairs = zip(zip(*coords), data)
return '\n'.join(f' {idx}\t{val}' for idx, val in pairs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: in Numpy 2.0+, this will format the indices with np.int32() around each one. For example:

  (np.int32(3), np.int32(1))    0.2435

@dschult
Copy link
Contributor Author

dschult commented May 14, 2024

The "add unittests" commit adds tests that expose the change in str for numpy 2.0. (somehow doctests are not being tested using numpy 2.0 yet)

The "fix numpy 2 printing" commit converts values to python numeric before printing (using ndarray.tolist() which uses i.item()). Only converting the maxprint number of values. Maintenance on this is helped by the removal of the override method of __str__ in _lil.py. We now have a single definition of __str__ for sparse arrays. Yay...

If these tests pass I think we're good to start a re-review by @perimosocordiae due to the numpy changes.

Copy link
Member

@perimosocordiae perimosocordiae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI errors are unrelated to this change.

@perimosocordiae perimosocordiae merged commit dddb3f2 into scipy:main May 15, 2024
28 of 31 checks passed
@lucascolley lucascolley added this to the 1.14.0 milestone May 15, 2024
@dschult dschult deleted the sparse_repr branch May 15, 2024 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Items related to regular maintenance tasks scipy.sparse
Projects
None yet
4 participants