MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty #20694
Conversation
@h-vetinari @rgommers @ilayn As participants in gh-19805, if you could suggest answers to these two questions and compare the before/after behavior, I'd appreciate it.
This PR is going to accumulate merge conflicts, and it would be nice to get it into the release candidate to get wider feedback on the elimination of the ad hoc warnings/errors and new standardized warnings.
Generally this looks really nice! :)
I changed each of these tests to look for the new, standard behavior instead.
Love this!
That said, should I just eliminate these function-specific tests? I need to add a generic test that looks for the new behavior anyway, and that would make these scattered tests redundant.
Fine by me to just do a generic test (modulo caveats below)
Should it check every function, or should I just check, say, a few representative functions?
Parametrization is easy and if the tests run fast, I don't see a problem to run this for (almost) every function.
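For instance, a hedged sketch of what such a parametrized generic test could look like (the function list is illustrative, and SmallSampleWarning is assumed to be importable from scipy.stats once this PR lands):

import numpy as np
import pytest
from scipy import stats

@pytest.mark.parametrize("func", [stats.skew, stats.kurtosis, stats.sem])
def test_too_small_warns(func):
    # a single observation is too small for each of these reducers
    with pytest.warns(stats.SmallSampleWarning):
        func(np.asarray([1.0]))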
scipy/stats/_hypotests.py
Outdated
if (nx < 5) or (ny < 5):
    raise ValueError('x and y should have at least 5 elements, but len(x) '
                     f'= {nx} and len(y) = {ny}.')
I don't think we should remove more specific error messages that are only valid for certain functions. It's certainly beneficial to add that constraint to the docstring, but only adding it there and not enforcing it anymore is IMO a step back.
I understand that this makes testing harder or less consistent, but I'd still prefer to skip/special-case these rather than throw away that extra information.
We can probably keep them for 1D input. What I'll do is try-except the function and append the error message to the warning or something. For ND I would prefer not to do this for now: no warning is emitted at all currently, so emitting some sort of warning is at least a step in the right direction.
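As a rough sketch of that try-except idea (an illustrative helper only, not the PR's actual code):

import warnings

def _nan_with_combined_warning(func, *samples, generic_msg="Sample(s) too small"):
    # run the function's own validation; if it raises its specific
    # ValueError, fold that message into the standardized warning
    try:
        return func(*samples)
    except ValueError as e:
        warnings.warn(f"{generic_msg}. {e}", UserWarning, stacklevel=2)
        return float("nan")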
but only adding it there and not enforcing it anymore is IMO a step back.
Wait - this makes me think that there might be a misunderstanding. These limitations are still enforced, but rather than raising a ValueError or returning NaN and emitting multiple warnings (which may or may not be informative), the function returns NaN and emits a standard "small sample" warning that refers to the documentation for specific sample size limitations.
My first response #20694 (comment) still holds - for 1D input, we could append this function-specific information to the warning message without too much trouble. LMK if that's what you'd like to see.
Update: never mind. I think we're on the same page. #20694 (comment)
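For concreteness, here's roughly what the enforced-but-warning behavior looks like (a sketch assuming a build that includes this PR; exact message wording may differ):

import warnings
import numpy as np
from scipy import stats

# Before this PR: ValueError("skewtest is not valid with less than 8 samples; ...")
# After: a NaN result plus one standardized warning
with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    res = stats.skewtest(np.arange(7.0))

print(res)                     # SkewtestResult(statistic=nan, pvalue=nan)
print(w[0].category.__name__)  # SmallSampleWarning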
scipy/stats/_morestats.py
Outdated
if N < 3:
    raise ValueError("Data must be at least length 3.")
In this case, we could probably append "Data must be at least length 3." to the more generic message, which would also make the testing still work (because extra characters afterwards don't break the matching of the error messages).
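A quick sketch of why appending keeps the matching working: pytest.warns(match=...) uses re.search, so a generic prefix still matches when function-specific text is appended (the message text here is made up):

import warnings
import pytest

generic = "One or more sample arguments is too small"
with pytest.warns(UserWarning, match=generic):
    # appended, function-specific detail doesn't break the match
    warnings.warn(generic + "; data must be at least length 3.", UserWarning)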
OK, and this makes me think that there is no misunderstanding : ) Yes, exactly - I wouldn't have to revamp the testing.
scipy/stats/_morestats.py
Outdated
if N < 3:
    raise ValueError("Not enough observations.")
Same here
scipy/stats/_stats_py.py
Outdated
if n < 8:
    raise ValueError(
        f"skewtest is not valid with less than 8 samples; {n} samples were given.")
Same here
scipy/stats/_stats_py.py
Outdated
if n < 5:
    raise ValueError(
        "kurtosistest requires at least 5 observations; %i observations"
        " were given." % int(n))
Same here
Agree with @h-vetinari's answers to the two questions - choices LGTM.
Some comments to aid the reviewer.
if any(contains_nan) and nan_policy == 'omit':
    # consider passing in contains_nan
    samples = _remove_nans(samples, paired)
    too_small_msg = too_small_1d_omit
Provide a specific warning message depending on whether the 1D sample is too small only after omitting NaNs or there were not enough observations to begin with.
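An illustrative sketch of the two 1D cases (assuming a build with this PR; the exact messages may differ):

import numpy as np
from scipy import stats

stats.skew(np.asarray([]))  # too small to begin with -> NaN + generic warning
stats.skew(np.asarray([1.0, np.nan]), nan_policy='omit')
# too small only after omitting NaNs -> NaN + an omit-specific warning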
# if is_too_small(samples):
#     return tuple_to_result(NaN, NaN)
# but some existing functions raise exceptions, and changing
# behavior of those would break backward compatibility.
We decided this was OK.
scipy/stats/_axis_nan_policy.py
Outdated
# behavior of those would break backward compatibility.
if override['empty'] and is_too_small(samples, kwds):
    warnings.warn(too_small_msg, SmallSampleWarning, stacklevel=2)
    return tuple_to_result(*np.full(n_out, NaN))
The comment was very old; it dates from a time when the decorator applied only to hypothesis tests that returned a statistic and p-value. Now the decorator is applied to more general reducing functions, so the comment is updated accordingly.
# raise ValueError on empty input
if N == 0:
    raise ValueError("Data input must not be empty")
I'm removing function-specific errors about empty input since they add no function-specific information.
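For example (kstat is my guess at one affected function based on the error text above, and this assumes a build with this PR):

import numpy as np
from scipy import stats

# Before: ValueError("Data input must not be empty")
# After: NaN, plus the standard SmallSampleWarning
stats.kstat(np.asarray([]))  # nan, with a warning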
@@ -376,10 +372,10 @@ def kstatvar(data, n=2, *, axis=None):
     N = data.shape[axis]

     if n == 1:
-        return kstat(data, n=2, axis=axis) * 1.0/N
+        return kstat(data, n=2, axis=axis, _no_deco=True) * 1.0/N
_no_deco=True calls the function itself, without the decorator. I don't remember exactly why I was prompted to make this change here, but it's useful for reducing overhead anyway. There's no reason to use the decorated version here because any NaNs have already been omitted if necessary, etc.
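The pattern is roughly this (a hypothetical sketch with assumed names, not the actual _axis_nan_policy implementation):

import functools

def axis_nan_policy_wrapper(func):
    @functools.wraps(func)
    def wrapper(*samples, _no_deco=False, **kwds):
        if _no_deco:
            # escape hatch: call the raw function, skipping all
            # axis/nan_policy handling and small-sample warnings
            return func(*samples, **kwds)
        # ... decorator machinery would go here ...
        return func(*samples, **kwds)
    return wrapper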
sup.filter(RuntimeWarning, "invalid value encountered")
sup.filter(UserWarning, r"var\(\): degrees of freedom is <= 0.")
sup.filter(RuntimeWarning, "Degrees of freedom <= 0 for slice")
Again, we'll deal with other backends later.
@@ -117,14 +123,11 @@ def test_mean_zero(self, xp):
         y2 = variation(x2, axis=1)
         xp_assert_equal(y2, xp.asarray([xp.inf, xp.inf]))

-    @pytest.mark.parametrize('x', [[0.]*5, [], [1, 2, np.inf, 9]])
+    @pytest.mark.parametrize('x', [[0.]*5, [1, 2, np.inf, 9]])
The empty input case has a conceptually different reason for returning NaN, and it's tested by generic tests, so I've omitted it.
@@ -173,7 +187,7 @@ def test_more_nan_policy_omit_tests(self, ddof, expected, xp):
         # The slightly strange formatting in the follow array is my attempt to
         # maintain a clean tabular arrangement of the data while satisfying
         # the demands of pycodestyle. Currently, E201 and E241 are not
-        # disabled by the `# noqa` annotation.
+        # disabled by the `noqa` annotation.
Started getting a lint complaint.
@@ -4959,7 +4993,10 @@ def test_ttest_rel_axis_size_zero(b, expected_shape):
     # The results should be arrays containing nan with shape
     # given by the broadcast nonaxis dimensions.
     a = np.empty((3, 1, 0))
-    result = stats.ttest_rel(a, b, axis=-1)
+    with np.testing.suppress_warnings() as sup:
+        # first case should warn, second shouldn't?
Here's what this comment is about:
import numpy as np
from scipy.stats import skew

skew(np.ones((0, 1)))  # array([nan]) and warning:
# SmallSampleWarning: All axis-slices of one or more sample arguments are too small; all elements of returned arrays will be NaN. See documentation for sample size requirements.
skew(np.ones((0, 0)))  # array([], dtype=float64), no warning

Both of these calls take the statistic along a slice of length 0. However, the second returns an empty array, so no NaNs are actually produced; the output didn't seem surprising enough to deserve a warning. LMK if you disagree.
scipy/stats/tests/test_stats.py
Outdated
     result = stats.f_oneway([10], [11], [12], [13])
     assert_equal(result, (np.nan, np.nan))

-    @pytest.mark.parametrize('args', [(), ([1, 2, 3],)])
+    @pytest.mark.parametrize('args', [([1, 2, 3],)])
This is technically undesirable:

from scipy import stats
stats.f_oneway([])         # F_onewayResult(statistic=nan, pvalue=nan) and "too small" warning
stats.f_oneway([1, 2, 3])  # at least two inputs are required; got 1.

This probably affects all decorated functions that accept *args. It happens because the decorator produces the warning and result before checking the number of arguments.
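For illustration, a toy reproduction of that ordering problem (all names here are hypothetical, not SciPy internals):

import warnings

def too_small_decorator(func):
    def wrapper(*samples):
        # the size check runs before the wrapped function's own
        # argument-count validation ever gets a chance
        if not samples or any(len(s) == 0 for s in samples):
            warnings.warn("too small", UserWarning, stacklevel=2)
            return (float("nan"), float("nan"))
        return func(*samples)
    return wrapper

@too_small_decorator
def toy_oneway(*samples):
    if len(samples) < 2:
        raise TypeError(f"at least two inputs are required; got {len(samples)}.")
    return (1.0, 0.5)

toy_oneway([])  # warns and returns (nan, nan); the TypeError is never raised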
I guess we leave this for a follow-up? If so, could you open an issue about it?
Fixed for f_oneway and added the test back.
This one went a bit over my head, to be honest, so I must leave it to you folks. Apologies about that.
@ilayn If it helps, don't worry about the diff. If you're interested enough, just take a look at the example behavior in the top post. I don't think we need everyone to review line by line, but if you're willing to just play with a few stats functions (among those listed here) with empty arguments before and after this PR to confirm that we like the new behavior better, that would be appreciated.
I didn't go over it with a fine-toothed comb, but basically this LGTM. Just some minor comments.
elif (nan_policy == 'omit' and data_generator in {"all_nans", "mixed"}
        and hypotest not in too_small_special_case_funcs):
    with pytest.warns(SmallSampleWarning, match=too_small_nd_omit):
        hypotest(*data, axis=axis, nan_policy=nan_policy, *args, **kwds)
This is still an open TODO, or am I misunderstanding your comment?
scipy/stats/_axis_nan_policy.py
Outdated
@@ -579,13 +607,15 @@ def hypotest_fun_out(*samples, **kwds):
             return tuple_to_result(*res)

+        # Addresses nan_policy == "omit"
+        too_small_msg = too_small_nd_omit
I think this assignment is redundant? We only have one user in the rest of the function, which could just use too_small_nd_omit directly.
Yes, those things are still open items. I should rework the test before we merge, and I might be able to fix the issue with too few arguments not emitting an error easily. I also said I would try to include the more detailed information in the error messages, but I'd like to save that for a follow-up to keep this PR from accumulating more merge conflicts. I'll try to finish by tomorrow.
OK, I think this is now doing what I intended to do here. A few limitations:
One remaining question: by design, this is going to produce some warnings where there were none before. Should we make the
Reference issue
Addresses the stats part of gh-19805.
What does this implement/fix?
Eliminates inconsistent errors and excessive warnings emitted by most stats reducing functions when 1D input is too small. Documents required sample sizes. Emits a single warning when NaN outputs are generated without NaNs in the input or when nan_policy='omit'.

Return values (usually NaNs) are already correct for the most part; this just deals with warnings and errors.
Examples below are for skew, but the behavior of most reducing functions is adjusted.

Before:

After: (maybe I should change "is too small" → "contains too few observations")
Additional information
Consider ignoring whitespace changes and reviewing the commits separately. Each has a distinct purpose summarized by its commit message.
@rgommers @h-vetinari all I'd ask for now (if you're still interested in gh-19805) is to 1) do some before/after testing to see whether you prefer the new behavior (almost all reducing functions are fair game) and 2) help me answer these two questions.
Questions before I continue:
- …(e.g. wilcoxon with one argument, mannwhitneyu with two arguments, and kruskal with three arguments)?

To do:
- …(mode, f_oneway)
- Rework the _axis_nan_policy test - it's very broken because it actually looked for the warnings/errors produced by each function.
- For a follow-up: fix for array-API.
We still need to adjust a lot of things about the documentation. Some documentation assumes the samples are 1D, whereas in others it is careful to refer to axis-slices of ND arrays. I would prefer that we write the API documentation using 1D language but consistently link to a tutorial that describes how axis/nan_policy/keepdims work with N-D arrays. That's for a separate PR.

Also, most hypothesis tests can easily get a method argument that computes more accurate p-values instead of returning NaN for small samples (e.g. like pearsonr and anderson_ksamp now have). That can be a series of follow-up PRs.