Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: NEP 50 inconsistency in copyto #26381

Open
seberg opened this issue May 3, 2024 · 8 comments
Open

BUG: NEP 50 inconsistency in copyto #26381

seberg opened this issue May 3, 2024 · 8 comments
Milestone

Comments

@seberg
Copy link
Member

seberg commented May 3, 2024

As @ev-br noticed: Copyto is currently inconsistent in NEP 50, which will be noticed relatively rarely, because the default is same-kind casting, where it just doesn't matter.

It turns out, copyto is the only (meaningful) user of PyArray_AssignRawScalar when the input is 0-D, and that function is the only direct user of the old scalar logic.

Now, copyto runs into the same difficulty as ufuncs (even more niche), about how you define "safe" casting for Python int/float/complex. In principle, we could:

  1. Just designate that the src is an array, and ignore any subtleties (right now it uses old logic, which also kicks in for 0-d arrays).
    • Means that np.copyto(uint8_arr, 1000) wraps the integer (which is what happens right now anyway).
  2. Make a decision on how to define "safe" for Python scalars:
    • For integers, safe could be "fits the result" (based on the value).
    • For floats, safe is tricky, they always lose precision.

I don't care too much which way we go here, but I think we should punt this to NumPy 2.1 and then need to make a decision though. We were bound to have such issues, that it is actually using old logic is annoying, but not much worse than similar ones we must still have in Python.

The advantage of defining things as 2. is that it can make copyto behave like a ufunc: np.positive(src, out=dst, casting=...) (positive is just effectively an identity ufunc).
That part has the subtlety about mixing casting= with Python scalars (but not the issue about 0-D arrays).


I would be much obliged for input, there is a reason this is still there: Deciding what to do with scalars and cast-safety is not trivial and probably more of an "expectation" decision than trying to reason what is "right".

@mhvk
Copy link
Contributor

mhvk commented May 4, 2024

The advantage of defining things as 2. is that it can make copyto behave like a ufunc: np.positive(src, out=dst, casting=...)

Agreed with this - they should behave the same as much as possible, so I think we should go for 2.

But your question about casting="safe" for the python case seems relevant. In principle, I guess one could have it mean the same as np.copyto(a, a.dtype.type(python-int)). Though perhaps it is also not unreasonable to just document that option, since if we were to go that route, it would seem not inappropriate to do that everywhere, i.e., pick the minimum size dtype that actually is able to hold the value and then get the final loop from there (though typically the other dtypes may then lead to a preference for unsigned, if possible).

@ev-br
Copy link
Contributor

ev-br commented May 6, 2024

+1 for np.copyto(dst, src, casting) being `np.positive(src, out=dst, casting=casting).

If this means that short ints may raise, OK, this is self-consistent. An alternative is to basically set casting='unsafe' whenever a scalar is involved. Then all operations succeed but short/unsigned ints may wrap. This is also consistent. (and this is what pytorch does I believe).
But then it's indeed best to make the change across the board indeed. Currently,

In [16]: np.asarray(2001, dtype=np.uint8)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[16], line 1
----> 1 np.asarray(2001, dtype=np.uint8)

OverflowError: Python integer 2001 out of bounds for uint8

@seberg
Copy link
Member Author

seberg commented May 6, 2024

Right, but two question:

  • do you guys care about the upcoming release? Because I still lean towards waiting because it is rather niche. I.e. I doubt it matters to cupy (we either need to ignore it or need to work around one way or another). But changing it might break already released downstream.
  • I agree with np.positive being the right behavior, but unfortunately that doesn't lead to a definition for casting= behavior for Python scalars there. (I have some tendencies for what to do, but they are not strong.)
    • This is an opinionated decisions, IMO. I have not been able to reason clearly one way or another for Python scalars. Beyond that the default behavior should be what it is now (raise an error). (Additionally, we probably won't yet have the API to get what we land on.)

@ev-br
Copy link
Contributor

ev-br commented May 6, 2024

IMO, copyto(python scalar) can indeed wait until numpy 2.1 or something. It is annoying yes, but can be worked around.

Why does np.positive not lead to a definition of casting for scalars? This seems consistent:

In [9]: In [8]: for src in [101, 1001]:
   ...:    ...:   for casting in ['no', 'equiv', 'safe', 'same_kind', 'unsafe']:
   ...:    ...:     dst = np.ones(1, dtype=np.uint8)
   ...:    ...:     try:
   ...:    ...:         np.positive(src, out=dst, casting=casting)
   ...:    ...:         result = dst.copy()
   ...:    ...:     except Exception as e:
   ...:    ...:         result = e
   ...:    ...:     print(src, casting, result)
   ...: 
101 no Cannot cast ufunc 'positive' output from dtype('int64') to dtype('uint8') with casting rule 'no'
101 equiv Cannot cast ufunc 'positive' output from dtype('int64') to dtype('uint8') with casting rule 'equiv'
101 safe Cannot cast ufunc 'positive' output from dtype('int64') to dtype('uint8') with casting rule 'safe'
101 same_kind Cannot cast ufunc 'positive' output from dtype('int64') to dtype('uint8') with casting rule 'same_kind'
101 unsafe [101]
1001 no Cannot cast ufunc 'positive' output from dtype('int64') to dtype('uint8') with casting rule 'no'
1001 equiv Cannot cast ufunc 'positive' output from dtype('int64') to dtype('uint8') with casting rule 'equiv'
1001 safe Cannot cast ufunc 'positive' output from dtype('int64') to dtype('uint8') with casting rule 'safe'
1001 same_kind Cannot cast ufunc 'positive' output from dtype('int64') to dtype('uint8') with casting rule 'same_kind'
1001 unsafe [233]

What looks like a bug to me is value-based logic in 0D arrays. As discussed offline:

In [7]: In [8]: for src in [np.asarray(101), np.asarray(1001)]:
   ...:    ...:   for casting in ['no', 'equiv', 'safe', 'same_kind', 'unsafe']:
   ...:    ...:     dst = np.ones(1, dtype=np.uint8)
   ...:    ...:     try:
   ...:    ...:         np.copyto(dst, src, casting)
   ...:    ...:         result = dst.copy()
   ...:    ...:     except Exception as e:
   ...:    ...:         result = e
   ...:    ...:     print(src, casting, result)
   ...: 
101 no [101]
101 equiv [101]
101 safe [101]
101 same_kind [101]
101 unsafe [101]
1001 no Cannot cast scalar from dtype('int64') to dtype('uint8') according to the rule 'no'
1001 equiv Cannot cast scalar from dtype('int64') to dtype('uint8') according to the rule 'equiv'
1001 safe Cannot cast scalar from dtype('int64') to dtype('uint8') according to the rule 'safe'
1001 same_kind [233]
1001 unsafe [233]

OTOH, if it's more convenient to fix this together with python scalars, downstream can adapt.

@seberg
Copy link
Member Author

seberg commented May 6, 2024

Argg. Partially, because np.positive is more strict there than we may want it to be.

We allow: np.add(101, 101, dtype=np.uint8) because not doing so would also not allow np.add(101, np.uint8(0)) also note try using int8 instead of uint8 since int8 and int64 are same-kind. Making that distinction for Python scalars doesn't seem right.

Now, the closest to just saying "all fine", to me seems like saying: OK, if casting is passed explicitly, we just use the default dtype (even for uint that isn't same-kind to Python int). But if casting is not passed the cast-safety is actually "same-kind or try convert py-scalar".

@ev-br
Copy link
Contributor

ev-br commented May 6, 2024

TBH and FWIW, what you write sounds too tricky. I've no idea how to predict what's going on or how to explain what's going on (the "how to teach this" PEP rubric). For np.copyto, if np.positive(src, out=dst, casting=casting) does not cut it, I think there are two reasonable options:

  • convert the scalar to the default dtype --- equivalent to np.asarray(scalar)? --- and proceed from there with casting as specified, explicitly or via a default kwarg
  • convert the src scalar to dst.dtype --- equivalent to np.asarray(scalar, dtype=dst.dtype)? --- and proceed from there with the casting as specified, explicitly or via a default kwarg

@seberg
Copy link
Member Author

seberg commented May 6, 2024

Well, it's what we have: I have struggled with this and brought up the fact that it is tricky many times and over a very long time...

  • You probably want 101 + uint8(1) to work and return a uint8
  • And most people seem to want np.add(101, uint8(1)) to be identical to the first.

So you have no choice but to do the promotion by default by basically ignoring the value and saying a Python integer could be any integer really. And that works fine, except that there is a line when code calls np.asarray(python_int), which is a problem because after that you cannot also behave like a ufunc anymore.

But back to the problem: This is mostly fine for promotion, but if you look at casting things get tricky because while promotion used to be implemented via can_cast it isn't the same thing and we don't have a definition for it for Python scalars that makes much sense.

Unless you are suggesting to not do any NEP 50 style weak promotion at all, which of course is the only truly consistent thing without pitfalls, but...

@seberg seberg added this to the 2.1.0 release milestone May 13, 2024
@seberg
Copy link
Member Author

seberg commented May 13, 2024

FWIW, my understanding is that in the last meeting it seemed like there was a preference for just considering these casts safe (in particular pyfloat -> float32).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants