-
Notifications
You must be signed in to change notification settings - Fork 522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Return true minimum dtype when np.inf/nan present & handle empty arrays #2724
base: main
Are you sure you want to change the base?
Conversation
Return the smallest dtype.
Thanks for these changes. I think in the scenario when you don't have any data to go off of, it is safer to pick the largest dtype. I think that is why numpy uses >>> import numpy
>>> arr = numpy.array([])
>>> arr
array([], dtype=float64) For that reason, I suggest we follow the logic numpy is using and instead default to |
After thinking about it further, I think what you have makes sense for the "minimum" dtype. So, ignore my previous comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @groutr 👍
@snowman2 After looking over this PR some more, I realized there is some other subtle issues with
I restructured After thinking it over more, None might be a more consistent return (or maybe even raise a ValueError?) on an empty container. There are two ways of thought, and I'm not sure which is the preferred way: 1) No values mean an unknown dtype (GDT_Unknown) and therefore return None/raise an error or 2) No values mean any dtype is possible and therefore return the smallest/minimum dtype. # returning None
dtype = get_minimum_dtype([])
if dtype:
np.can_cast(dtype, 'float32')
else:
# no values, or some unrecognized dtype (not int/float/complex)
# raise an error
try:
dtype = get_minimum_dtype([])
np.can_cast(dtype, 'float32')
except ValueError:
print("No values???")
else:
print("Unknown dtype (not int/float/complex) I could see the constant checks after calling |
See #2657 Boolean is not a GDAL type in the mapping: Lines 32 to 45 in 6bb3e52
In rasterize, it doesn't have the boolean dtype: Lines 255 to 257 in 6bb3e52
Boolean appears to only be used for masks. From what I can tell, |
@groutr, I think you have raised some valid points. I think the inf/nan changes are good to go. However, the empty array logic appears to need more thought/discussion. Mind moving the empty array changes/discussion to a separate PR? |
Mind moving these changes into a separate PR as well? I thin this topic is "complex" enough to be on its own 😄 |
Let's get #2734 merged first. Having several different open PRs against the same function is really confusing for me. With that PR merged, it would remove the non-array path and this PR could be simplified. Regarding the bool return type, I was going off the docstring for
|
Your logic makes sense. However, since bool doesn't map to a GDAL dtype, this will cause troubles in the To help make the function clearer, I am thinking that it may be a good idea to update the docstring. |
@sgillies There are several questions in this PR that need guidance/clarification.
|
@groutr @snowman2 how about this?
Removing implicit use of this function from |
When get_minimum_dtype was called with np.inf or np.nan, it would return float64. This PR fixes it to only do the range check on the finite values.
Additionally, I return the smallest dtype when the containers are empty, though, it could be argued to let the exceptions bubble up. Another approach would be to return None (ie GDT_Unknown). What would be preferred approach? My opinion is that either the smallest dtype (bool) or None should be returned.
ping @sgillies
Split off of #2722