The future, array API, and so on #1743

Open
bmcfee opened this issue Aug 30, 2023 · 4 comments
Labels
discussion Open-ended discussion for developers and users

Comments

@bmcfee
Member

bmcfee commented Aug 30, 2023

This question came up at the SpeechBrain 2023 summit, and I figured it warrants some open discussion and thought. In short: what will it take for librosa to be compatible with the Array API? The motivation is that doing so would make librosa backend-agnostic and ease interoperability with other packages and platforms (e.g., GPU).

The way I see it, the answer is both "a lot" and "not much".

A lot

We make pretty extensive use of several upstream libraries. Specifically:

  • Numpy
  • Scipy
  • numba
  • soundfile
  • soxr

We also make not-so-extensive use of some other upstream libraries:

  • sklearn
  • audioread
  • matplotlib

I don't think we can plausibly make an attempt at the Array API until the upstream libraries have implemented it. There could be some exceptions here: e.g., if the interaction from librosa is one-directional (reading from soundfile or writing to matplotlib, for example), we can probably wedge in some appropriate conversions and not have to worry.

However, the core functionality will require at least numpy, scipy, and numba to be on board. Some relevant discussion threads:

So if I understand the state of things (at present) correctly, we're talking about at least numpy 2.0, and probably a ways after that while midstream libraries (scipy, numba) catch up.

Not much

On the other hand, the work we would have to do once our upstream dependencies are in place ought to be rather superficial. In most cases, I expect it to involve swapping object methods (x.sum()) for package functions (np.sum(x)) or similar trivial changes.
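To make the "swap methods for functions" idea concrete, here is a minimal sketch. The helper names `get_namespace` and `rms` are hypothetical and not part of librosa; the fallback to NumPy for objects lacking `__array_namespace__` is an assumption made so the example also runs on older NumPy versions.

```python
import numpy as np


def get_namespace(x):
    """Return the array namespace for x, falling back to NumPy.

    Hypothetical helper: Array API objects expose __array_namespace__,
    and anything else is assumed here to be NumPy-compatible.
    """
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def rms(x):
    """Root-mean-square of x using namespace functions, not methods."""
    xp = get_namespace(x)
    # Instead of (x * x).mean() and np.sqrt(...), go through xp.
    return xp.sqrt(xp.mean(x * x))
```

The body of `rms` never mentions `np` directly, so the same code could in principle run on any backend whose arrays provide the namespace.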

There will also have to be some modifications to our input validators and type checkers, obviously.

Some caveats

Things that could cause difficulty include:

  • Sparse arrays. For the sake of speed, we have a few spots where we bypass the scipy API and operate directly on the underlying data structures for things like index manipulation (eg util.__shear_sparse). These might cause trouble for us.
  • Type hints: could obviously get ugly. Uglier.
  • String data: string dtypes are technically out of scope for the array api. We use these extensively in unit conversion and music notation as this was the cleanest way to get dimension- and type-preserving implementations via numpy.vectorize. It's not clear yet (to me) how this will shake out with an abstracted array API backend.
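For readers unfamiliar with the string-dtype pattern mentioned in the last bullet, the following is an illustrative sketch only. `toy_midi_to_name` is a hypothetical stand-in, not librosa's actual conversion code; it shows how `numpy.vectorize` lifts a scalar function returning strings into one that preserves the input's shape.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]


def _midi_to_name_scalar(midi):
    """Map one MIDI number to a note name (toy version, sharps only)."""
    midi = int(midi)
    octave = midi // 12 - 1
    return f"{PITCH_CLASSES[midi % 12]}{octave}"


# Lift the scalar function so it accepts arrays of any shape and
# returns a unicode string array of the same shape.
toy_midi_to_name = np.vectorize(_midi_to_name_scalar, otypes=[str])

notes = toy_midi_to_name(np.array([[60, 61], [69, 72]]))
```

The result here is a NumPy string array, which is exactly the kind of output an Array API backend is not required to support.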

There are probably more issues at hand than those noted above, and I'd like to hear from folks if there are major concerns that I'm missing.

From my perspective, I think it makes sense to let this cook a while, and perhaps make it the milestone feature for a 2.0 release of librosa. (Where 1.0 will be the stabilized release of the 0.x series once we wrap up all major outstanding features, which at this point are vanishingly few!)

bmcfee added the "discussion" label Aug 30, 2023
@lostanlen
Contributor

lostanlen commented Sep 2, 2023

Great summary! Thank you for writing it. A few additional remarks/questions:

  1. Quoting:

String data: string dtypes are technically out of scope for the array api. We use these extensively in unit conversion and music notation as this was the cleanest way to get dimension- and type-preserving implementations via numpy.vectorize. It's not clear yet (to me) how this will shake out with an abstracted array API backend.

Likewise for arrays of datetime64 and timedelta64 values in the context of #1364.

  2. One of my favorite librosa functions is librosa.util.valid_audio:
    https://librosa.org/doc/0.10.1/generated/librosa.util.valid_audio.html#librosa.util.valid_audio

I'm not sure how to update this function so as to check efficiently that the type of y complies with the Array API yet relax the requirement of isinstance(y, np.ndarray).

  3. By default, librosa.util.stack (from the big multi-channel PR #1351) guarantees to return Fortran-contiguous arrays. I don't think that the Array API has notions like C- and F-contiguity, so I wonder how we might test that. Unless we plan on relaxing that guarantee?

https://librosa.org/doc/0.10.1/generated/librosa.util.stack.html#librosa.util.stack

@bmcfee
Member Author

bmcfee commented Sep 4, 2023

Likewise for arrays of datetime64 and timedelta64 values in the context of #1364.

This actually seems okay to me. I expect that matplotlib will be slower to support array api, so we'd likely have to force a conversion to ndarray at this point anyway. Since we'll be fundamentally limited by upstream requirements, it seems okay to restrict functionality at this point.

I'm not sure how to update this function so as to check efficiently that the type of y complies with the Array API yet relax the requirement of isinstance(y, np.ndarray).

This one is actually easy: https://data-apis.org/array-api/latest/purpose_and_scope.html#checking-an-array-object-for-compliance
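As a rough sketch of what that could look like: the linked page recommends checking for the `__array_namespace__` attribute. The function names below are hypothetical and this is not the current `valid_audio` implementation; it only replaces the `isinstance` test, under the assumption that the remaining checks would be rewritten separately.

```python
def is_array_api_obj(y):
    """True if y advertises Array API compliance via its namespace hook."""
    return hasattr(y, "__array_namespace__")


def check_audio_type(y):
    """Hypothetical relaxed type check for audio buffers.

    Accepts any object exposing __array_namespace__ rather than
    requiring isinstance(y, np.ndarray).
    """
    if not is_array_api_obj(y):
        raise TypeError(
            "Audio data must be an Array API compatible array, "
            f"got {type(y).__name__}"
        )
    return True
```

Note that on NumPy versions whose ndarray lacks `__array_namespace__`, plain ndarrays would fail this check, so some fallback or compatibility layer would still be needed.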

3. Unless we plan on relaxing that guarantee?

We'd likely have to relax that guarantee if supporting different device backends (as required by array api). We might be able to get away with preserving array order if the input is specifically ndarray, but in the year 202x it really shouldn't be our business to interfere with byte ordering any more than we absolutely have to.
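One way the "preserve order only for ndarray" option might look, as a sketch: `maybe_fortran` is a hypothetical helper name, and passing other array types through untouched is an assumption about how the relaxed guarantee would behave.

```python
import numpy as np


def maybe_fortran(x):
    """Enforce Fortran-contiguity only when x is a NumPy ndarray.

    Other array types are returned unchanged, since memory layout is
    not something we can portably control for them.
    """
    if isinstance(x, np.ndarray):
        return np.asfortranarray(x)
    return x
```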

@bmcfee
Member Author

bmcfee commented Oct 5, 2023

https://labs.quansight.org/blog/scipy-array-api has some updates on the scipy side of things

@bmcfee
Member Author

bmcfee commented Oct 23, 2023

Maybe also worth considering https://github.com/data-apis/array-api-compat (as used in sklearn currently for experimental array api support)
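For illustration, a minimal sketch of how that package could be used: `array_namespace` is the entry point provided by array-api-compat, while `peak_normalize` is a hypothetical example function. The `ImportError` fallback to plain NumPy is an assumption added so the snippet runs without the package installed.

```python
import numpy as np

try:
    from array_api_compat import array_namespace
except ImportError:
    # Assumed fallback so the sketch runs without array-api-compat.
    def array_namespace(*xs):
        return np


def peak_normalize(y):
    """Scale y so that its largest absolute value is 1."""
    xp = array_namespace(y)
    peak = xp.max(xp.abs(y))
    return y / peak
```

The appeal of the compat layer is that it supplies a compliant namespace even for libraries that have not yet adopted the standard natively.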


2 participants