The future, array API, and so on #1743

bmcfee · 2023-08-30T15:02:30Z

This question came up at the SpeechBrain 2023 summit, and I figured it warrants some open discussion and thought. In short: what will it take for librosa to be compatible with the Array API? The motivation is that doing so would make librosa backend-agnostic and ease interoperability with other packages and platforms (e.g., GPU).

The way I see it, the answer is both "a lot" and "not much".

A lot

We make pretty extensive use of several upstream libraries. Specifically:

Numpy
Scipy
numba
soundfile
soxr

We also make not-so-extensive use of some other upstream libraries:

sklearn
audioread
matplotlib

I don't think we can plausibly attempt Array API support until the upstream libraries have implemented it. There could be some exceptions here: e.g., if the interaction from librosa is one-directional (reading from soundfile or writing to matplotlib, for example), we can probably wedge in some appropriate conversions and not have to worry.
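As a rough illustration of the one-directional case, a boundary conversion might look like the sketch below. The helper name `to_numpy` is hypothetical (it is not part of librosa), and the DLPack path assumes the data lives in CPU memory, since `np.from_dlpack` cannot import device arrays.

```python
import numpy as np


def to_numpy(x):
    """Hypothetical helper: coerce an array-like to np.ndarray at a library boundary."""
    if isinstance(x, np.ndarray):
        return x  # already the type the downstream library expects
    if hasattr(x, "__dlpack__"):
        # Import via DLPack where the producer supports it (CPU memory only).
        return np.from_dlpack(x)
    # Last resort: let NumPy attempt a generic conversion (may copy).
    return np.asarray(x)
```

A plotting function could call something like this once on entry and hand the result to matplotlib, without the rest of the code path needing to know which backend produced the data.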

However, the core functionality will require at least numpy, scipy, and numba to be on board. Some relevant discussion threads:

So if I understand the state of things (at present) correctly, we're talking about at least numpy 2.0, and probably a ways after that while midstream libraries (scipy, numba) catch up.

Not much

On the other hand, the work we would have to do once our upstream dependencies are in place ought to be rather superficial. In most cases, I expect it to involve swapping object methods (x.sum()) for package functions (np.sum(x)) or similar trivial changes.
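To make the kind of change concrete, here is a minimal sketch of the same computation written both ways. The function names are hypothetical, and the fallback to `np` when an array has no `__array_namespace__` attribute is an assumption to keep the example working on NumPy versions that predate it.

```python
import numpy as np


def get_namespace(x):
    """Return the Array API namespace for x, falling back to NumPy (assumption)."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def rms_methods(y):
    # ndarray-specific style: relies on the .mean() object method.
    return np.sqrt((y * y).mean())


def rms_namespace(y):
    # Backend-agnostic style: every operation is a namespace function.
    xp = get_namespace(y)
    return xp.sqrt(xp.mean(y * y))
```

For an ndarray input both functions should agree; the second one would also accept any other array type whose namespace provides `sqrt` and `mean`.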

There will also have to be some modifications to our input validators and type checkers, obviously.

Some caveats

Things that could cause difficulty include:

Sparse arrays. For the sake of speed, we have a few spots where we bypass the scipy API and operate directly on the underlying data structures for things like index manipulation (eg util.__shear_sparse). These might cause trouble for us.
Type hints: could obviously get ugly. Uglier.
String data: string dtypes are technically out of scope for the array api. We use these extensively in unit conversion and music notation as this was the cleanest way to get dimension- and type-preserving implementations via numpy.vectorize. It's not clear yet (to me) how this will shake out with an abstracted array API backend.
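For readers unfamiliar with the string-data pattern, the sketch below shows the general idea of a shape-preserving, string-valued conversion built on `numpy.vectorize`. It is a simplified, hypothetical stand-in (the names `_midi_to_name` and `midi_to_name` are made up here) and not the actual librosa implementation.

```python
import numpy as np

# Pitch-class names, sharps only (simplified for illustration).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def _midi_to_name(midi):
    """Scalar conversion: MIDI number -> note name such as 'A4'."""
    midi = int(round(midi))
    return f"{PITCH_CLASSES[midi % 12]}{midi // 12 - 1}"


# np.vectorize applies the scalar function elementwise and keeps the input shape;
# the output is an ndarray with a unicode string dtype.
midi_to_name = np.vectorize(_midi_to_name)

notes = midi_to_name(np.array([[60, 61], [69, 72]]))
```

The result has the same shape as the input but a string dtype, which is exactly the part that has no counterpart in the Array API standard.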

There are probably more issues at hand than those noted above, and I'd like to hear from folks if there are major concerns that I'm missing.

From my perspective, I think it makes sense to let this cook a while, and perhaps make it the milestone feature for a 2.0 release of librosa. (Where 1.0 will be the stabilized release of the 0.x series once we wrap up all major outstanding features, which at this point are vanishingly few!)


lostanlen · 2023-09-02T16:41:20Z

Great summary! Thank you for writing it. A few additional remarks/questions:

Quoting:

String data: string dtypes are technically out of scope for the array api. We use these extensively in unit conversion and music notation as this was the cleanest way to get dimension- and type-preserving implementations via numpy.vectorize. It's not clear yet (to me) how this will shake out with an abstracted array API backend.

Likewise for arrays of datetime64 and timedelta64 values in the context of #1364.

One of my favorite librosa functions is librosa.util.valid_audio:
https://librosa.org/doc/0.10.1/generated/librosa.util.valid_audio.html#librosa.util.valid_audio

I'm not sure how to update this function so as to check efficiently that the type of y complies with the Array API yet relax the requirement of isinstance(y, np.ndarray).

By default, librosa.util.stack (The big Multi-Channel PR #1351) is guaranteed to return Fortran-contiguous arrays. I don't think that the Array API has notions like C- and F-contiguity, so I wonder how we might test that. Unless we plan on relaxing that guarantee?

https://librosa.org/doc/0.10.1/generated/librosa.util.stack.html#librosa.util.stack

bmcfee · 2023-09-04T14:53:46Z

Likewise for arrays of datetime64 and timedelta64 values in the context of #1364.

This actually seems okay to me. I expect that matplotlib will be slower to support array api, so we'd likely have to force a conversion to ndarray at this point anyway. Since we'll be fundamentally limited by upstream requirements, it seems okay to restrict functionality at this point.

I'm not sure how to update this function so as to check efficiently that the type of y complies with the Array API yet relax the requirement of isinstance(y, np.ndarray).

This one is actually easy: https://data-apis.org/array-api/latest/purpose_and_scope.html#checking-an-array-object-for-compliance
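The check described on that page amounts to testing for an `__array_namespace__` attribute. A minimal sketch of how a relaxed validator could use it follows. `valid_audio_sketch` is a hypothetical name, not the librosa function; it omits the floating-point dtype check, and the explicit `np.ndarray` fallback is an assumption to cover NumPy versions that predate the attribute.

```python
import numpy as np


def is_array_api_obj(x):
    # Compliance check recommended by the Array API standard.
    return hasattr(x, "__array_namespace__")


def valid_audio_sketch(y):
    """Hypothetical validator: accept any Array API object, not just ndarray."""
    if is_array_api_obj(y):
        xp = y.__array_namespace__()
    elif isinstance(y, np.ndarray):
        xp = np  # fallback for older NumPy (assumption)
    else:
        raise TypeError("Audio data must be an Array API compatible array")

    if y.ndim == 0:
        raise ValueError("Audio data must be at least one-dimensional")

    # Finite-ness is tested with namespace functions only.
    if not bool(xp.all(xp.isfinite(y))):
        raise ValueError("Audio buffer is not finite everywhere")

    return True
```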

Unless we plan on relaxing that guarantee?

We'd likely have to relax that guarantee if supporting different device backends (as required by the Array API). We might be able to get away with preserving array order if the input is specifically ndarray, but in the year 202x it really shouldn't be our business to interfere with memory layout any more than we absolutely have to.
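One way this could play out is sketched below: keep the contiguity behavior only when every input is an ndarray, and otherwise defer to the backend's own `stack`. The function `stack_sketch` is hypothetical and is not how `librosa.util.stack` is implemented; it only illustrates the relaxed guarantee.

```python
import numpy as np


def stack_sketch(arrays, axis=0):
    """Hypothetical stack: layout guarantee for ndarray inputs only."""
    if all(isinstance(a, np.ndarray) for a in arrays):
        out = np.stack(arrays, axis=axis)
        # Preserve the existing convention: F-contiguous when stacking on axis 0.
        if axis == 0:
            return np.asfortranarray(out)
        return np.ascontiguousarray(out)

    # Other backends: no layout promise, just use the namespace's stack.
    xp = arrays[0].__array_namespace__()
    return xp.stack(arrays, axis=axis)
```

Tests of the layout guarantee would then apply only to the ndarray branch, since the standard exposes no contiguity flags to inspect.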

bmcfee · 2023-10-05T17:43:28Z

https://labs.quansight.org/blog/scipy-array-api has some updates on the scipy side of things

bmcfee · 2023-10-23T13:58:53Z

Maybe also worth considering https://github.com/data-apis/array-api-compat (as used in sklearn currently for experimental array api support)

bmcfee added the discussion Open-ended discussion for developers and users label Aug 30, 2023

bmcfee mentioned this issue Feb 29, 2024

Resolve the performance issue of autocorrelate #1813

Merged
