MAINT: sparse: reformat str and repr for sparse arrays, correct 1D coords, improve dtype looks #20649

Merged 6 commits on May 15, 2024
34 changes: 17 additions & 17 deletions doc/source/tutorial/sparse.rst
@@ -26,10 +26,10 @@ Sparse arrays are a special kind of array where only a few locations in the arra
[0, 4, 1, 0],
[0, 0, 5, 0]])
>>> sparse
<3x4 sparse array of type '<class 'numpy.int64'>'
with 5 stored elements in COOrdinate format>
<COOrdinate sparse array of dtype 'int64'
with 5 stored elements and shape (3, 4)>

Note that in our dense array, we have five nonzero values. For example, ``2`` is at location ``0,3``, and ``4`` is at location ``1,1``. All of the other values are zero. The sparse array records these five values *explicitly* (see the ``5 stored elements in COOrdinate format``), and then represents all of the remaining zeros as *implicit* values.
Note that in our dense array, we have five nonzero values. For example, ``2`` is at location ``0,3``, and ``4`` is at location ``1,1``. All of the other values are zero. The sparse array records these five values *explicitly* (see the ``5 stored elements and shape (3, 4)``), and then represents all of the remaining zeros as *implicit* values.

Most sparse array methods work in a similar fashion to dense array methods:

@@ -78,8 +78,8 @@ But, other formats, such as the Compressed Sparse Row (CSR) :func:`csr_array()`
Sometimes, `scipy.sparse` will return a different sparse matrix format than the input sparse matrix format. For example, the dot product of two sparse arrays in COO format will be a CSR format array:

>>> sparse @ sparse.T
<3x3 sparse array of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>
<Compressed Sparse Row sparse array of dtype 'int64'
with 5 stored elements and shape (3, 3)>

This change occurs because `scipy.sparse` will change the format of input sparse arrays in order to use the most efficient computational method.
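
Note on this hunk: the format change can be checked directly. The following is a minimal sketch, assuming the tutorial's dense array has first row `[1, 0, 0, 2]` (that row lies outside the hunk shown here). It inspects the `.format` attribute instead of the repr text, because the repr layout depends on the installed SciPy version.

```python
import numpy as np
from scipy.sparse import coo_array

# Dense array assumed from the tutorial; the first row is not visible in this hunk.
dense = np.array([[1, 0, 0, 2],
                  [0, 4, 1, 0],
                  [0, 0, 5, 0]])
sparse = coo_array(dense)

product = sparse @ sparse.T

# The input is COO; the tutorial states that the product comes back as CSR.
print(sparse.format, "->", product.format)

# Whatever format is returned, the values agree with the dense computation.
assert np.array_equal(product.toarray(), dense @ dense.T)
```

Checking `.format` and `.nnz` is a version-independent way to confirm what the doctest output above reports.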

@@ -112,8 +112,8 @@ Using these, we can now define a sparse array without building a dense array fir

>>> csr = sp.sparse.csr_array((data, (row, col)))
>>> csr
<3x4 sparse array of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>
<Compressed Sparse Row sparse array of dtype 'int64'
with 5 stored elements and shape (3, 4)>

Different classes have different constructors, but the :func:`scipy.sparse.csr_array`, :func:`scipy.sparse.csc_array`, and :func:`scipy.sparse.coo_array` allow for this style of construction.

@@ -132,8 +132,8 @@ Then, our sparse array will have *six* stored elements, not five:

>>> csr = sp.sparse.csr_array((data, (row, col)))
>>> csr
<3x4 sparse array of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
<Compressed Sparse Row sparse array of dtype 'int64'
with 6 stored elements and shape (3, 4)>

The "extra" element is our *explicit zero*. The two are still identical when converted back into a dense array, because dense arrays represent *everything* explicitly:

@@ -149,12 +149,12 @@ The "extra" element is our *explicit zero*. The two are still identical when con
But, for sparse arithmetic, linear algebra, and graph methods, the value at ``2,3`` will be considered an *explicit zero*. To remove this explicit zero, we can use the ``csr.eliminate_zeros()`` method. This operates on the sparse array *in place*, and removes any zero-value stored elements:

>>> csr
<3x4 sparse array of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
<Compressed Sparse Row sparse array of dtype 'int64'
with 6 stored elements and shape (3, 4)>
>>> csr.eliminate_zeros()
>>> csr
<3x4 sparse array of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>
<Compressed Sparse Row sparse array of dtype 'int64'
with 5 stored elements and shape (3, 4)>

Before ``csr.eliminate_zeros()``, there were six stored elements. After, there are only five stored elements.
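
Note on this hunk: the explicit-zero behaviour can be reproduced with a short sketch. The coordinate and value arrays below are illustrative assumptions (the tutorial's own definitions are outside the hunks shown here); the last entry is a stored zero at location ``2,3``.

```python
import numpy as np
from scipy.sparse import csr_array

# Illustrative data (an assumption): the final entry is an explicit zero at (2, 3).
row = np.array([0, 0, 1, 1, 2, 2])
col = np.array([0, 3, 1, 2, 2, 3])
data = np.array([1, 2, 4, 1, 5, 0])

csr = csr_array((data, (row, col)), shape=(3, 4))
stored_before = csr.nnz                # stored entries, including the explicit zero
nonzero_before = csr.count_nonzero()   # entries whose value is actually nonzero

csr.eliminate_zeros()                  # operates in place and returns None
stored_after = csr.nnz

print(stored_before, nonzero_before, stored_after)  # prints: 6 5 5
```

The gap between `nnz` and `count_nonzero()` is a quick way to detect explicit zeros before deciding whether to eliminate them.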

@@ -168,8 +168,8 @@ In this case, we can see that there are *two* ``data`` values that correspond to

>>> dupes = sp.sparse.coo_array((data, (row, col)))
>>> dupes
<3x4 sparse array of type '<class 'numpy.int64'>'
with 6 stored elements in COOrdinate format>
<COOrdinate sparse array of dtype 'int64'
with 6 stored elements and shape (3, 4)>

Note that there are six stored elements in this sparse array, despite only having five unique locations where data occurs. When these arrays are converted back to dense arrays, the duplicate values are summed. So, at location ``1,1``, the dense array will contain the sum of duplicate stored entries, ``1 + 3``:

@@ -182,8 +182,8 @@ To remove duplicate values within the sparse array itself and thus reduce the nu

>>> dupes.sum_duplicates()
>>> dupes
<3x4 sparse array of type '<class 'numpy.int64'>'
with 5 stored elements in COOrdinate format>
<COOrdinate sparse array of dtype 'int64'
with 5 stored elements and shape (3, 4)>

Now there are only five stored elements in our sparse array, and it is identical to the array we have been working with throughout this guide:
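
Note on the duplicate-entry hunks above: a minimal sketch of the same behaviour, using made-up arrays in which two entries share location ``1,1`` with values ``1`` and ``3``. The array contents are assumptions for illustration, since the tutorial's definitions are not part of the diff shown here.

```python
import numpy as np
from scipy.sparse import coo_array

# Made-up data: two entries share location (1, 1), with values 1 and 3.
row = np.array([0, 0, 1, 1, 1, 2])
col = np.array([0, 3, 1, 1, 2, 2])
data = np.array([1, 2, 1, 3, 1, 5])

dupes = coo_array((data, (row, col)), shape=(3, 4))
stored_before = dupes.nnz              # COO keeps duplicates as separate entries
dense_value = dupes.toarray()[1, 1]    # duplicates are summed when densifying

dupes.sum_duplicates()                 # in place; merges entries at equal coordinates
stored_after = dupes.nnz

print(stored_before, dense_value, stored_after)
```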

4 changes: 2 additions & 2 deletions scipy/io/_fast_matrix_market/__init__.py
@@ -332,8 +332,8 @@ def mmread(source):

>>> m = mmread(StringIO(text))
>>> m
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 7 stored elements in COOrdinate format>
<COOrdinate sparse matrix of dtype 'float64'
with 7 stored elements and shape (5, 5)>
>>> m.toarray()
array([[0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0.],
6 changes: 6 additions & 0 deletions scipy/io/_harwell_boeing/hb.py
@@ -499,6 +499,9 @@ def hb_read(path_or_open_file):
>>> data = csr_matrix(eye(3)) # create a sparse matrix
>>> hb_write("data.hb", data) # write a hb file
>>> print(hb_read("data.hb")) # read a hb file
<Compressed Sparse Column sparse matrix of dtype 'float64'
with 3 stored elements and shape (3, 3)>
Coords Values
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
@@ -550,6 +553,9 @@ def hb_write(path_or_open_file, m, hb_info=None):
>>> data = csr_matrix(eye(3)) # create a sparse matrix
>>> hb_write("data.hb", data) # write a hb file
>>> print(hb_read("data.hb")) # read a hb file
<Compressed Sparse Column sparse matrix of dtype 'float64'
with 3 stored elements and shape (3, 3)>
Coords Values
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
24 changes: 14 additions & 10 deletions scipy/sparse/_base.py
@@ -328,10 +328,9 @@ def imag(self):
     def __repr__(self):
         _, format_name = _formats[self.format]
         sparse_cls = 'array' if isinstance(self, sparray) else 'matrix'
-        shape_str = 'x'.join(str(x) for x in self.shape)
         return (
-            f"<{shape_str} sparse {sparse_cls} of type '{self.dtype.type}'\n"
-            f"\twith {self.nnz} stored elements in {format_name} format>"
+            f"<{format_name} sparse {sparse_cls} of dtype '{self.dtype}'\n"
+            f"\twith {self.nnz} stored elements and shape {self.shape}>"
         )

     def __str__(self):
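
Note on this hunk: the new string layout can be exercised without any particular SciPy version installed. The helper below, `sketch_repr`, is a hypothetical stand-in and not a SciPy function; it takes plain values where the real method reads `self.format`, `self.dtype`, `self.nnz`, and `self.shape`.

```python
def sketch_repr(format_name, sparse_cls, dtype, nnz, shape):
    """Mimic the layout of the new __repr__ using plain Python values."""
    return (
        f"<{format_name} sparse {sparse_cls} of dtype '{dtype}'\n"
        f"\twith {nnz} stored elements and shape {shape}>"
    )

# 2-D case, matching the tutorial output in this PR.
text_2d = sketch_repr("COOrdinate", "array", "int64", 5, (3, 4))
print(text_2d)

# 1-D case: the shape tuple is interpolated as-is, so it renders as "(4,)".
text_1d = sketch_repr("COOrdinate", "array", "int64", 2, (4,))
print(text_1d)
```

Because the shape is formatted as a tuple, 1-D shapes stay unambiguous, whereas joining dimensions with `x` would render a 1-D shape as a bare number.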
@@ -340,18 +339,23 @@ def __str__(self):
         A = self.tocoo()

         # helper function, outputs "(i,j) v"
-        def tostr(row, col, data):
-            triples = zip(list(zip(row, col)), data)
-            return '\n'.join([(' {}\t{}'.format(*t)) for t in triples])
+        def tostr(coords, data):
+            pairs = zip(zip(*(c.tolist() for c in coords)), data)
+            return '\n'.join(f' {idx}\t{val}' for idx, val in pairs)
Review comment (Member):

Note: in NumPy 2.0+, this will format the indices with np.int32() around each one. For example:

  (np.int32(3), np.int32(1))    0.2435


+        out = repr(self)
+        if self.nnz == 0:
+            return out
+
+        out += '\n Coords\tValues\n'
         if self.nnz > maxprint:
             half = maxprint // 2
-            out = tostr(A.row[:half], A.col[:half], A.data[:half])
+            out += tostr(tuple(c[:half] for c in A.coords), A.data[:half])
             out += "\n :\t:\n"
-            half = maxprint - maxprint//2
-            out += tostr(A.row[-half:], A.col[-half:], A.data[-half:])
+            half = maxprint - half
+            out += tostr(tuple(c[-half:] for c in A.coords), A.data[-half:])
         else:
-            out = tostr(A.row, A.col, A.data)
+            out += tostr(A.coords, A.data)

         return out
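
Note on this hunk: the rewritten `tostr` helper can be tried in isolation. This sketch assumes `coords` is a tuple of NumPy index arrays, one per dimension, which is how `A.coords` appears to be used here; the sample arrays are made up. Since `.tolist()` yields Python ints, the index tuples print as plain integers.

```python
import numpy as np

def tostr(coords, data):
    # Same body as the helper in this hunk: pair each index tuple with its value.
    pairs = zip(zip(*(c.tolist() for c in coords)), data)
    return '\n'.join(f' {idx}\t{val}' for idx, val in pairs)

# Made-up 2-D coordinates (row indices, column indices) and values.
coords_2d = (np.array([0, 3], dtype=np.int32), np.array([2, 1], dtype=np.int32))
text_2d = tostr(coords_2d, [1.5, 0.25])

# A 1-D array has a single coordinate array, so each index is a 1-tuple.
text_1d = tostr((np.array([4], dtype=np.int32),), [7])

print(text_2d)
print(text_1d)
```

Taking a tuple of coordinate arrays instead of separate `row` and `col` arguments is what lets the same helper serve 1-D and 2-D inputs.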

7 changes: 3 additions & 4 deletions scipy/sparse/_bsr.py
@@ -220,11 +220,10 @@ def _getnnz(self, axis=None):
     def __repr__(self):
         _, fmt = _formats[self.format]
         sparse_cls = 'array' if isinstance(self, sparray) else 'matrix'
-        shape_str = 'x'.join(str(x) for x in self.shape)
-        blksz = 'x'.join(str(x) for x in self.blocksize)
+        b = 'x'.join(str(x) for x in self.blocksize)
         return (
-            f"<{shape_str} sparse {sparse_cls} of type '{self.dtype.type}'\n"
-            f"\twith {self.nnz} stored elements (blocksize = {blksz}) in {fmt} format>"
+            f"<{fmt} sparse {sparse_cls} of dtype '{self.dtype}'\n"
+            f"\twith {self.nnz} stored elements (blocksize={b}) and shape {self.shape}>"
         )

     def diagonal(self, k=0):
20 changes: 10 additions & 10 deletions scipy/sparse/_construct.py
@@ -312,11 +312,11 @@ def identity(n, dtype='d', format=None):
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> sp.sparse.identity(3, dtype='int8', format='dia')
<3x3 sparse matrix of type '<class 'numpy.int8'>'
with 3 stored elements (1 diagonals) in DIAgonal format>
<DIAgonal sparse matrix of dtype 'int8'
with 3 stored elements (1 diagonals) and shape (3, 3)>
>>> sp.sparse.eye_array(3, dtype='int8', format='dia')
<3x3 sparse array of type '<class 'numpy.int8'>'
with 3 stored elements (1 diagonals) in DIAgonal format>
<DIAgonal sparse array of dtype 'int8'
with 3 stored elements (1 diagonals) and shape (3, 3)>

"""
return eye(n, n, dtype=dtype, format=format)
@@ -351,8 +351,8 @@ def eye_array(m, n=None, *, k=0, dtype=float, format=None):
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> sp.sparse.eye_array(3, dtype=np.int8)
<3x3 sparse array of type '<class 'numpy.int8'>'
with 3 stored elements (1 diagonals) in DIAgonal format>
<DIAgonal sparse array of dtype 'int8'
with 3 stored elements (1 diagonals) and shape (3, 3)>

"""
# TODO: delete next 15 lines [combine with _eye()] once spmatrix removed
@@ -430,8 +430,8 @@ def eye(m, n=None, k=0, dtype=float, format=None):
[ 0., 1., 0.],
[ 0., 0., 1.]])
>>> sp.sparse.eye(3, dtype=np.int8)
<3x3 sparse matrix of type '<class 'numpy.int8'>'
with 3 stored elements (1 diagonals) in DIAgonal format>
<DIAgonal sparse matrix of dtype 'int8'
with 3 stored elements (1 diagonals) and shape (3, 3)>

"""
return _eye(m, n, k, dtype, format, False)
@@ -1390,8 +1390,8 @@ def rand(m, n, density=0.01, format="coo", dtype=None, random_state=None):
>>> from scipy.sparse import rand
>>> matrix = rand(3, 4, density=0.25, format="csr", random_state=42)
>>> matrix
<3x4 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
<Compressed Sparse Row sparse matrix of dtype 'float64'
with 3 stored elements and shape (3, 4)>
>>> matrix.toarray()
array([[0.05641158, 0. , 0. , 0.65088847], # random
[0. , 0. , 0. , 0.14286682],
7 changes: 3 additions & 4 deletions scipy/sparse/_dia.py
@@ -96,11 +96,10 @@ def __init__(self, arg1, shape=None, dtype=None, copy=False):
     def __repr__(self):
         _, fmt = _formats[self.format]
         sparse_cls = 'array' if isinstance(self, sparray) else 'matrix'
-        shape_str = 'x'.join(str(x) for x in self.shape)
-        ndiag = self.data.shape[0]
+        d = self.data.shape[0]
         return (
-            f"<{shape_str} sparse {sparse_cls} of type '{self.dtype.type}'\n"
-            f"\twith {self.nnz} stored elements ({ndiag} diagonals) in {fmt} format>"
+            f"<{fmt} sparse {sparse_cls} of dtype '{self.dtype}'\n"
+            f"\twith {self.nnz} stored elements ({d} diagonals) and shape {self.shape}>"
         )

     def _data_mask(self):
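
Note on this hunk: the diagonal count in the DIA repr is read from `self.data.shape[0]`, that is, the number of rows in the stored diagonals array. A small sketch with made-up values shows where that number comes from; the array contents and offsets are assumptions for illustration only.

```python
import numpy as np
from scipy.sparse import dia_array

# Made-up example: the main diagonal and the first superdiagonal of a 3x3 array.
diagonals = np.array([[1, 2, 3],
                      [4, 5, 6]])
offsets = np.array([0, 1])
A = dia_array((diagonals, offsets), shape=(3, 3))

# One row of A.data per stored diagonal; this is the count the repr reports.
n_diagonals = A.data.shape[0]
print(n_diagonals, A.shape)
```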
8 changes: 4 additions & 4 deletions scipy/sparse/_extract.py
@@ -93,8 +93,8 @@ def tril(A, k=0, format=None):
[4, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
>>> tril(A, format='csc')
<3x5 sparse array of type '<class 'numpy.int32'>'
with 4 stored elements in Compressed Sparse Column format>
<Compressed Sparse Column sparse array of dtype 'int32'
with 4 stored elements and shape (3, 5)>

"""
coo_sparse = coo_array if isinstance(A, sparray) else coo_matrix
@@ -161,8 +161,8 @@ def triu(A, k=0, format=None):
[4, 5, 0, 6, 7],
[0, 0, 8, 9, 0]])
>>> triu(A, format='csc')
<3x5 sparse array of type '<class 'numpy.int32'>'
with 8 stored elements in Compressed Sparse Column format>
<Compressed Sparse Column sparse array of dtype 'int32'
with 8 stored elements and shape (3, 5)>

"""
coo_sparse = coo_array if isinstance(A, sparray) else coo_matrix
7 changes: 0 additions & 7 deletions scipy/sparse/_lil.py
@@ -111,13 +111,6 @@ def count_nonzero(self):
     _getnnz.__doc__ = _spbase._getnnz.__doc__
     count_nonzero.__doc__ = _spbase.count_nonzero.__doc__

-    def __str__(self):
-        val = ''
-        for i, row in enumerate(self.rows):
-            for pos, j in enumerate(row):
-                val += f" {str((i, j))}\t{str(self.data[i][pos])}\n"
-        return val[:-1]
-
     def getrowview(self, i):
         """Returns a view of the 'i'th row (without copying).
         """
16 changes: 8 additions & 8 deletions scipy/sparse/_matrix_io.py
@@ -38,8 +38,8 @@ def save_npz(file, matrix, compressed=True):
>>> import scipy as sp
>>> sparse_matrix = sp.sparse.csc_matrix([[0, 0, 3], [4, 0, 0]])
>>> sparse_matrix
<2x3 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Column format>
<Compressed Sparse Column sparse matrix of dtype 'int64'
with 2 stored elements and shape (2, 3)>
>>> sparse_matrix.toarray()
array([[0, 0, 3],
[4, 0, 0]], dtype=int64)
@@ -48,8 +48,8 @@ def save_npz(file, matrix, compressed=True):
>>> sparse_matrix = sp.sparse.load_npz('/tmp/sparse_matrix.npz')

>>> sparse_matrix
<2x3 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Column format>
<Compressed Sparse Column sparse matrix of dtype 'int64'
with 2 stored elements and shape (2, 3)>
>>> sparse_matrix.toarray()
array([[0, 0, 3],
[4, 0, 0]], dtype=int64)
@@ -109,8 +109,8 @@ def load_npz(file):
>>> import scipy as sp
>>> sparse_array = sp.sparse.csc_array([[0, 0, 3], [4, 0, 0]])
>>> sparse_array
<2x3 sparse array of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Column format>
<Compressed Sparse Column sparse array of dtype 'int64'
with 2 stored elements and shape (2, 3)>
>>> sparse_array.toarray()
array([[0, 0, 3],
[4, 0, 0]], dtype=int64)
@@ -119,8 +119,8 @@ def load_npz(file):
>>> sparse_array = sp.sparse.load_npz('/tmp/sparse_array.npz')

>>> sparse_array
<2x3 sparse array of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Column format>
<Compressed Sparse Column sparse array of dtype 'int64'
with 2 stored elements and shape (2, 3)>
>>> sparse_array.toarray()
array([[0, 0, 3],
[4, 0, 0]], dtype=int64)
6 changes: 6 additions & 0 deletions scipy/sparse/csgraph/_reordering.pyx
@@ -59,6 +59,9 @@ def reverse_cuthill_mckee(graph, symmetric_mode=False):
... ]
>>> graph = csr_matrix(graph)
>>> print(graph)
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 5 stored elements and shape (4, 4)>
Coords Values
(0, 1) 1
(0, 2) 2
(1, 3) 1
@@ -215,6 +218,9 @@ def structural_rank(graph):
... ]
>>> graph = csr_matrix(graph)
>>> print(graph)
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 8 stored elements and shape (4, 4)>
Coords Values
(0, 1) 1
(0, 2) 2
(1, 0) 1
18 changes: 18 additions & 0 deletions scipy/sparse/csgraph/_shortest_path.pyx
@@ -148,6 +148,9 @@ def shortest_path(csgraph, method='auto',
... ]
>>> graph = csr_matrix(graph)
>>> print(graph)
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 5 stored elements and shape (4, 4)>
Coords Values
(0, 1) 1
(0, 2) 2
(1, 3) 1
@@ -294,6 +297,9 @@ def floyd_warshall(csgraph, directed=True,
... ]
>>> graph = csr_matrix(graph)
>>> print(graph)
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 5 stored elements and shape (4, 4)>
Coords Values
(0, 1) 1
(0, 2) 2
(1, 3) 1
@@ -519,6 +525,9 @@ def dijkstra(csgraph, directed=True, indices=None,
... ]
>>> graph = csr_matrix(graph)
>>> print(graph)
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 4 stored elements and shape (4, 4)>
Coords Values
(0, 1) 1
(0, 2) 2
(1, 3) 1
@@ -1004,6 +1013,9 @@ def bellman_ford(csgraph, directed=True, indices=None,
... ]
>>> graph = csr_matrix(graph)
>>> print(graph)
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 5 stored elements and shape (4, 4)>
Coords Values
(0, 1) 1
(0, 2) 2
(1, 3) 1
@@ -1241,6 +1253,9 @@ def johnson(csgraph, directed=True, indices=None,
... ]
>>> graph = csr_matrix(graph)
>>> print(graph)
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 5 stored elements and shape (4, 4)>
Coords Values
(0, 1) 1
(0, 2) 2
(1, 3) 1
@@ -1771,6 +1786,9 @@ def yen(
... ]
>>> graph = csr_matrix(graph)
>>> print(graph)
<Compressed Sparse Row sparse matrix of dtype 'int64'
with 5 stored elements and shape (4, 4)>
Coords Values
(0, 1) 1
(0, 2) 2
(1, 3) 1