
MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty #20694

Open
wants to merge 30 commits into main
Conversation

mdhaber (Contributor) commented May 10, 2024

Reference issue

Addresses the stats part of gh-19805

What does this implement/fix?

Eliminates inconsistent errors and excessive warnings emitted by most stats reducing functions when 1D input is too small. Documents required sample sizes. Emits a single warning when NaN outputs are generated despite there being no NaNs in the input, or when nan_policy='omit' leaves too few observations.

Return values (usually NaNs) are already correct for the most part; this just deals with warnings and errors.

Examples below are for skew, but the behavior of most reducing functions is adjusted.

Before:

import numpy as np
from scipy import stats

stats.skew([])  # too noisy
# /usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py:1193: RuntimeWarning: Mean of empty slice.
#   mean = a.mean(axis, keepdims=True)
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:121: RuntimeWarning: invalid value encountered in divide
#   ret = um.true_divide(
# /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
#   return _methods._mean(a, axis=axis, dtype=dtype,
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
#   ret = ret.dtype.type(ret / rcount)
# nan

stats.skew([np.nan], nan_policy='omit')  # too noisy
# /usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py:1193: RuntimeWarning: Mean of empty slice.
#   mean = a.mean(axis, keepdims=True)
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:121: RuntimeWarning: invalid value encountered in divide
#   ret = um.true_divide(
# /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
#   return _methods._mean(a, axis=axis, dtype=dtype,
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
#   ret = ret.dtype.type(ret / rcount)
# nan

stats.skew([[]], axis=1)  # silent
# array([nan])

stats.skew([[np.nan]], nan_policy='omit', axis=1)  # silent
# array([nan])

After: (maybe I should change "is too small" → "contains too few observations")

import numpy as np
from scipy import stats

stats.skew([])
# UserWarning: One or more sample arguments is too small; all returned values will be NaN. See documentation for sample size requirements.
#   stats.skew([])
# nan

stats.skew([np.nan], nan_policy='omit')
# UserWarning: After omitting NaNs, one or more sample arguments is too small; all returned values will be NaN. See documentation for sample size requirements.
#  stats.skew([np.nan], nan_policy='omit')
# nan

stats.skew([[]], axis=1)
# UserWarning: All axis-slices of one or more sample arguments are too small; all elements of returned arrays will be NaN. See documentation for sample size requirements.
#  stats.skew([[]], axis=1)
# array([nan])

stats.skew([[np.nan]], nan_policy='omit', axis=1)
# UserWarning: After omitting NaNs, one or more axis-slices of one or more sample arguments is too small; corresponding elements of returned arrays will be NaN. See documentation for sample size requirements.
#  stats.skew([[np.nan]], nan_policy='omit', axis=1)
# array([nan])

Additional information

Consider ignoring whitespace changes and reviewing the commits separately; each has a distinct purpose summarized by its commit message.

@rgommers @h-vetinari all I'd ask for now (if you're still interested in gh-19805) is to 1) do some before/after testing to see whether you prefer the new behavior (almost all reducing functions are fair game) and 2) help me answer these two questions.

Questions before I continue:

  • Initially, I changed all the tests that looked for a function-specific error message or warning; now they look for the standard behavior instead. It took a lot of time, but I think it was important to investigate each case. That said, should I just eliminate these function-specific tests? I would want to add a generic test anyway, and that would make these scattered tests redundant.
  • Should the generic test (which looks for the correct warning and return value) check every function, or should I just check, say, a few representative functions (e.g. wilcoxon with one argument, mannwhitneyu with two arguments, and kruskal with three arguments)?

To do:

  • fix special cases (e.g. mode, f_oneway)
  • fix _axis_nan_policy test - it's very broken because it actually looked for the warnings/errors produced by each function
  • add test that correct warning is emitted in each case
  • add function-specific warning information

For a follow-up - fix for array-API.

We still need to adjust a lot of things about the documentation. Some documentation assumes the samples are 1D, whereas in others it is careful to refer to axis-slices of ND arrays. I would prefer that we write the API documentation using 1D language but consistently link to a tutorial that describes how axis/nan_policy/keepdims work with N-D arrays. That's for a separate PR.

Also, most hypothesis tests could easily get a method argument that computes a more accurate p-value instead of returning NaN for small samples (as pearsonr and anderson_ksamp now have). That can be a series of follow-up PRs.
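
For reference, a usage sketch of the existing `method` pattern mentioned above (this relies on the public `method` argument of `pearsonr` and `scipy.stats.PermutationMethod`):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2184358274)
x, y = rng.random(7), rng.random(7)

# default: asymptotic p-value, which is inaccurate for small samples
res_asymptotic = stats.pearsonr(x, y)
# permutation p-value: more accurate for a sample this small
res_permutation = stats.pearsonr(x, y, method=stats.PermutationMethod())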

@mdhaber mdhaber added scipy.stats maintenance Items related to regular maintenance tasks labels May 10, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent behavior when sample is too small or empty WIP/MAINT: stats: enforce consistent reducing function warning when sample is too small or empty May 10, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent reducing function warning when sample is too small or empty WIP/MAINT: stats: enforce consistent warning from reducing functions when sample is too small or empty May 11, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent warning from reducing functions when sample is too small or empty WIP/MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty May 11, 2024
mdhaber (Contributor, Author) commented May 20, 2024

@h-vetinari @rgommers @ilayn As participants in gh-19805, if you could suggest answers to these two questions and compare the before/after behavior, I'd appreciate it.

  • Many stats functions had tests that looked for a function-specific error message or warning when there were too few observations. I changed each of these tests to look for the new, standard behavior instead. It took a lot of time, but I think it was important to investigate each case. That said, should I just eliminate these function-specific tests? I need to add a generic test that looks for the new behavior anyway, and that would make these scattered tests redundant.
  • The generic test will look for the correct warning and return value. Should it check every function, or should I just check, say, a few representative functions (e.g. wilcoxon with one argument, mannwhitneyu with two arguments, and kruskal with three arguments)?

This PR is going to accumulate merge conflicts, and it would be nice to get it into the release candidate to get wider feedback on the elimination of the ad hoc warnings/errors and new standardized warnings.

h-vetinari (Member) left a comment

Generally this looks really nice! :)

I changed each of these tests to look for the new, standard behavior instead.

Love this!

That said, should I just eliminate these function-specific tests? I need to add a generic test that looks for the new behavior anyway, and that would make these scattered tests redundant.

Fine by me to just do a generic test (modulo caveats below)

Should it check every function, or should I just check, say, a few representative functions

Parametrization is easy, and if the tests run fast, I don't see a problem running this for (almost) every function.
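
For concreteness, a minimal sketch of what such a parametrized generic test could look like (the warning text and NaN-returning behavior are assumptions taken from the post-PR examples in the top post; the test name is hypothetical):

import numpy as np
import pytest
from scipy import stats

@pytest.mark.parametrize("func, n_samples", [
    (stats.wilcoxon, 1),
    (stats.mannwhitneyu, 2),
    (stats.kruskal, 3),
])
def test_too_small_warns_and_returns_nan(func, n_samples):
    # assumes the post-PR behavior: warn once, return NaN(s)
    samples = [[] for _ in range(n_samples)]
    with pytest.warns(Warning, match="too small"):
        res = func(*samples)
    assert np.all(np.isnan(np.asarray(res)))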

Comment on lines 100 to 102
if (nx < 5) or (ny < 5):
    raise ValueError('x and y should have at least 5 elements, but len(x) '
                     f'= {nx} and len(y) = {ny}.')
h-vetinari (Member) commented

I don't think we should remove more specific error messages that are only valid for certain functions. It's certainly beneficial to add that constraint to the docstring, but only adding it there and not enforcing it anymore is IMO a step back.

I understand that this makes testing harder or less consistent, but I'd still prefer to skip/special-case these rather than throw away that extra information.

mdhaber (Contributor, Author) commented

We can probably keep them for 1D input. What I'll do is try-except the function and append the error message to the warning or something. For ND input I would prefer not to do this for now: currently no warning is emitted at all, so emitting some sort of warning is already a step in the right direction.
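
Roughly, that idea in sketch form (the validator and message names here are hypothetical, not the PR's actual code):

import warnings

class SmallSampleWarning(UserWarning):
    pass  # stand-in for the warning class added in this PR

too_small_msg = ("One or more sample arguments is too small; "
                 "all returned values will be NaN.")

def warn_too_small(validate_1d, sample):
    # run the function's own 1-D check just to harvest its specific
    # error message, then append it to the standard warning
    try:
        validate_1d(sample)
    except ValueError as e:
        warnings.warn(f"{too_small_msg} {e}", SmallSampleWarning,
                      stacklevel=2)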

mdhaber (Contributor, Author) commented May 21, 2024

but only adding it there and not enforcing it anymore is IMO a step back.

Wait - this makes me think that there might be a misunderstanding. These limitations are still enforced, but rather than raising a ValueError, or returning NaN and emitting multiple warnings (which may or may not be informative), the function returns NaN and emits a standard "small sample" warning that refers to the documentation for the specific sample size limitations.

My first response #20694 (comment) still holds - for 1D input, we could append this function-specific information to the warning message without too much trouble. LMK if that's what you'd like to see.

Update: never mind. I think we're on the same page. #20694 (comment)

Comment on lines 2005 to 2009
if N < 3:
    raise ValueError("Data must be at least length 3.")
h-vetinari (Member) commented

In this case, we could probably append "Data must be at least length 3." to the more generic message, which would also make the testing still work (because extra characters afterwards don't break the matching of the error messages).
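
A quick illustration of why appending works for the tests: pytest.warns applies match with re.search, so a pattern matching the generic prefix still passes when a function-specific suffix is appended (messages here are stand-ins):

import warnings
import pytest

def emit():
    generic = "Data is too small; returned values will be NaN."
    specific = "Data must be at least length 3."
    warnings.warn(f"{generic} {specific}", UserWarning, stacklevel=2)

# re.search semantics: matching the generic part succeeds even with
# extra characters after it
with pytest.warns(UserWarning, match="too small"):
    emit()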

mdhaber (Contributor, Author) commented

OK, and this makes me think that there is no misunderstanding : ) Yes, exactly - I wouldn't have to revamp the testing.

Comment on lines 3841 to 3857
if N < 3:
    raise ValueError("Not enough observations.")
h-vetinari (Member) commented

Same here

Comment on lines 1617 to 1616
if n < 8:
    raise ValueError(
        f"skewtest is not valid with less than 8 samples; {n} samples were given.")
h-vetinari (Member) commented

Same here

Comment on lines 1806 to 1809
if n < 5:
    raise ValueError(
        "kurtosistest requires at least 5 observations; %i observations"
        " were given." % int(n))
h-vetinari (Member) commented

Same here

rgommers (Member) commented
Agree with @h-vetinari's answers to the two questions - choices LGTM.

mdhaber (Contributor, Author) left a comment

Some comments to aid the reviewer.

if any(contains_nan) and nan_policy == 'omit':
    # consider passing in contains_nan
    samples = _remove_nans(samples, paired)
    too_small_msg = too_small_1d_omit
mdhaber (Contributor, Author) commented

Provide a specific warning message depending on whether the 1D sample is too small only after omitting NaNs or whether there were not enough observations to begin with.

# if is_too_small(samples):
#     return tuple_to_result(NaN, NaN)
# but some existing functions raise exceptions, and changing
# behavior of those would break backward compatibility.
mdhaber (Contributor, Author) commented

We decided this was OK.

# behavior of those would break backward compatibility.
if override['empty'] and is_too_small(samples, kwds):
    warnings.warn(too_small_msg, SmallSampleWarning, stacklevel=2)
    return tuple_to_result(*np.full(n_out, NaN))
mdhaber (Contributor, Author) commented

The comment was very old, and it was from a time when the decorator only applied to hypothesis tests that returned a statistic and p-value. Now, the decorator is applied to more general reducing functions, so this is updated accordingly.

Comment on lines -308 to -312
# raise ValueError on empty input
if N == 0:
    raise ValueError("Data input must not be empty")

mdhaber (Contributor, Author) commented

I'm removing function-specific errors about empty input since they add no function-specific information.

@@ -376,10 +372,10 @@ def kstatvar(data, n=2, *, axis=None):
     N = data.shape[axis]
 
     if n == 1:
-        return kstat(data, n=2, axis=axis) * 1.0/N
+        return kstat(data, n=2, axis=axis, _no_deco=True) * 1.0/N
mdhaber (Contributor, Author) commented

_no_deco calls the function itself, without the decorator. I don't remember exactly what prompted this change here, but it's useful for reducing overhead anyway. There's no reason to use the decorated version here because any NaNs have already been omitted if necessary, etc.
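
For readers unfamiliar with the pattern, a rough sketch of how such a `_no_deco` escape hatch can work (illustrative only, not the decorator's actual code):

import functools

def axis_nan_policy_deco(func):
    @functools.wraps(func)
    def wrapper(*samples, _no_deco=False, **kwds):
        if _no_deco:
            # caller has already handled axis/NaN concerns;
            # skip the decorator machinery and its overhead
            return func(*samples, **kwds)
        # ... axis handling, nan_policy handling, small-sample
        # checks would happen here ...
        return func(*samples, **kwds)
    return wrapper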

Comment on lines +775 to +777
sup.filter(RuntimeWarning, "invalid value encountered")
sup.filter(UserWarning, r"var\(\): degrees of freedom is <= 0.")
sup.filter(RuntimeWarning, "Degrees of freedom <= 0 for slice")
mdhaber (Contributor, Author) commented

Again, we'll deal with other backends later.

@@ -117,14 +123,11 @@ def test_mean_zero(self, xp):
         y2 = variation(x2, axis=1)
         xp_assert_equal(y2, xp.asarray([xp.inf, xp.inf]))
 
-    @pytest.mark.parametrize('x', [[0.]*5, [], [1, 2, np.inf, 9]])
+    @pytest.mark.parametrize('x', [[0.]*5, [1, 2, np.inf, 9]])
mdhaber (Contributor, Author) commented

The empty input case has a conceptually different reason for returning NaN, and it's tested by generic tests, so I've omitted it.

@@ -173,7 +187,7 @@ def test_more_nan_policy_omit_tests(self, ddof, expected, xp):
     # The slightly strange formatting in the follow array is my attempt to
     # maintain a clean tabular arrangement of the data while satisfying
     # the demands of pycodestyle. Currently, E201 and E241 are not
-    # disabled by the `# noqa` annotation.
+    # disabled by the `noqa` annotation.
mdhaber (Contributor, Author) commented

Started getting a lint complaint.

@@ -4959,7 +4993,10 @@ def test_ttest_rel_axis_size_zero(b, expected_shape):
     # The results should be arrays containing nan with shape
     # given by the broadcast nonaxis dimensions.
     a = np.empty((3, 1, 0))
-    result = stats.ttest_rel(a, b, axis=-1)
+    with np.testing.suppress_warnings() as sup:
+        # first case should warn, second shouldn't?
mdhaber (Contributor, Author) commented

Here's what this comment is about:

import numpy as np
from scipy.stats import skew
skew(np.ones((0, 1)))  # array([nan]) and warning
# SmallSampleWarning: All axis-slices of one or more sample arguments are too small; all elements of returned arrays will be NaN. See documentation for sample size requirements.
skew(np.ones((0, 0)))  # array([], dtype=float64)

Both of these calls compute the statistic along a slice of length 0. However, the result of the second is an empty array, so no NaNs are actually produced, and I didn't think the output was surprising enough to deserve a warning. LMK if you disagree.

    result = stats.f_oneway([10], [11], [12], [13])
    assert_equal(result, (np.nan, np.nan))

-@pytest.mark.parametrize('args', [(), ([1, 2, 3],)])
+@pytest.mark.parametrize('args', [([1, 2, 3],)])
mdhaber (Contributor, Author) commented

This is technically undesirable:

from scipy import stats
stats.f_oneway([])  # F_onewayResult(statistic=nan, pvalue=nan) and "too small" warning
stats.f_oneway([1, 2, 3])  # at least two inputs are required; got 1.

This probably affects all decorated functions that accept *args. It happens because the decorator produces the warning and result before checking the number of arguments.
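
Schematically, the ordering problem looks like this (simplified stand-in for the decorator):

import warnings

def deco(func):
    def wrapper(*samples, **kwds):
        # the decorator's small-sample check runs first...
        if any(len(s) == 0 for s in samples):
            warnings.warn("too small", UserWarning, stacklevel=2)
            return float("nan")
        # ...so the wrapped function's own "at least two inputs are
        # required" validation is never reached for f_oneway([])
        return func(*samples, **kwds)
    return wrapper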

h-vetinari (Member) commented

I guess we leave this for a follow-up? If so, could you open an issue about it?

mdhaber (Contributor, Author) commented May 23, 2024

Fixed for f_oneway and added the test back.

ilayn (Member) commented May 22, 2024

This one went a bit over my head to be honest. So I must leave this to you folks unfortunately. Apologies about that.

mdhaber (Contributor, Author) commented May 22, 2024

@ilayn If it helps, don't worry about the diff. If you're interested enough, just take a look at the example behavior in the top post. I don't think we need everyone to review line by line, but if you're willing to just play with a few stats functions (among those listed here) with empty arguments before and after this PR to confirm that we like the new behavior better, that would be appreciated.

h-vetinari (Member) left a comment

I didn't go over it with a fine-toothed comb, but basically this LGTM. Just some minor comments.


Comment on lines 395 to 398
elif (nan_policy == 'omit' and data_generator in {"all_nans", "mixed"}
        and hypotest not in too_small_special_case_funcs):
    with pytest.warns(SmallSampleWarning, match=too_small_nd_omit):
        hypotest(*data, axis=axis, nan_policy=nan_policy, *args, **kwds)
h-vetinari (Member) commented

This is still an open TODO, or am I misunderstanding your comment?

@@ -579,13 +607,15 @@ def hypotest_fun_out(*samples, **kwds):
        return tuple_to_result(*res)

    # Addresses nan_policy == "omit"
    too_small_msg = too_small_nd_omit
h-vetinari (Member) commented

I think this assignment is redundant? We only have one user in the rest of the function, which could just use too_small_nd_omit directly.

mdhaber (Contributor, Author) commented May 23, 2024

Yes, those things are still open items. I should rework the test before we merge, and I might be able to fix the issue with too few arguments not emitting an error easily. I also said I would try to include the more detailed information in the error messages, but I'd like to save that for a follow-up to prevent this from continuing to accumulate merge conflicts. I'll try to finish by tomorrow.

@mdhaber mdhaber marked this pull request as ready for review May 24, 2024 00:52
@mdhaber mdhaber requested a review from tupui as a code owner May 24, 2024 00:52
@mdhaber mdhaber requested a review from h-vetinari May 24, 2024 00:52
mdhaber (Contributor, Author) commented May 24, 2024

OK, I think this is now doing what I intended to do here. A few limitations:

  • It only emits the generic warning for now rather than appending a function-specific part of the message. I can try to append the function-specific part as described above in a follow-up.
  • The issue with calling f_oneway with too few arguments is resolved, but the problem existed before this PR, and it applies to other functions that accept *args, so I think there is a more general fix to be applied. I can fix that in the same follow-up.
  • It doesn't improve the behavior for other array API backends; those still emit warnings or raise errors as before. We'll want to rework the decorator significantly for array API support (and maybe actually modify most functions to handle nan_policy with n-D arrays/axis natively using something like WIP: stats.masked_array: array API compatible masked arrays #20363), and we'll fix this then.

One remaining question: by design, this is going to produce some warnings where there were none before. Should we make SmallSampleWarning public so these warnings are easier to filter?
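
For example, if the class were public, downstream code could silence exactly these warnings (the import path assumes SmallSampleWarning is exported from scipy.stats):

import warnings
from scipy import stats
from scipy.stats import SmallSampleWarning  # assumes this becomes public

with warnings.catch_warnings():
    warnings.simplefilter("ignore", SmallSampleWarning)
    stats.skew([])  # nan, with no warning emitted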

@mdhaber mdhaber changed the title WIP/MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty May 24, 2024
@mdhaber mdhaber added this to the 1.14.0 milestone May 25, 2024
Labels: maintenance, scipy.stats
4 participants