Refactor plot_ecdf arguments #2316

sethaxen · 2024-02-23T11:27:11Z

Description

This PR introduces new keyword arguments to plot_ecdf and deprecates a few existing ones, following suggestions in #2309.

New keywords and features:

confidence_bands now accepts string arguments as well as booleans.
ci_prob specifies the band probability; stats.ci_prob is a new rcParam.
eval_points lets the user specify the evaluation points.
rvs and random_state can be provided for simulated confidence bands.

Deprecated arguments:

values2 is deprecated; users should instead pass an empirical CDF via cdf.

Deprecated keywords:

pointwise is now confidence_bands="pointwise"
fpr is replaced by ci_prob (ci_prob = 1 - fpr) for consistency with other plotting functions.
pit is deprecated. We only need this for LOO-PIT, users who need it for something else will probably know how to make the plot, and if we really want to include it, it should be its own plotting function. There's now a documented example of how to plot PIT.
npoints is deprecated.
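
The keyword mapping above can be sketched as follows. This is a minimal illustration only: the helper name `translate_deprecated_kwargs` is hypothetical and not part of ArviZ, and the assumption that `pointwise` only mattered when bands were requested is mine. Only the keyword names and the relation `ci_prob = 1 - fpr` come from this description.

```python
import warnings


def translate_deprecated_kwargs(confidence_bands=False, ci_prob=None, pointwise=None, fpr=None):
    """Hypothetical helper: map deprecated plot_ecdf keywords onto the new ones."""
    if pointwise is not None:
        warnings.warn(
            "`pointwise` is deprecated; use confidence_bands='pointwise'.",
            FutureWarning,
            stacklevel=2,
        )
        # Assumption: `pointwise` only had an effect when bands were requested.
        if pointwise and confidence_bands:
            confidence_bands = "pointwise"
    if fpr is not None:
        warnings.warn("`fpr` is deprecated; use ci_prob = 1 - fpr.", FutureWarning, stacklevel=2)
        ci_prob = 1 - fpr
    return confidence_bands, ci_prob
```

Under these assumptions, an old-style call with `confidence_bands=True, pointwise=True, fpr=0.05` translates to `confidence_bands="pointwise"` and a `ci_prob` of about 0.95, with one FutureWarning per deprecated keyword.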

Deprecated rcparams:

stats.hdi_prob has been deprecated and replaced with stats.ci_prob
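
One way such an rcParam rename could be handled is sketched below. The class `RcParamsSketch` and the `DEPRECATED_RCPARAMS` table are hypothetical stand-ins, not the ArviZ implementation; only the rename itself and the 0.94 default (from the commit list) come from this PR.

```python
import warnings

# Hypothetical table; this single rename is the only one stated in the PR.
DEPRECATED_RCPARAMS = {"stats.hdi_prob": "stats.ci_prob"}


class RcParamsSketch(dict):
    """Toy stand-in for an rcParams mapping that redirects deprecated keys."""

    def _resolve(self, key):
        new_key = DEPRECATED_RCPARAMS.get(key)
        if new_key is None:
            return key
        warnings.warn(f"{key} is deprecated; use {new_key} instead.", FutureWarning, stacklevel=3)
        return new_key

    def __getitem__(self, key):
        # Reads of the old key are served from the new key.
        return super().__getitem__(self._resolve(key))

    def __setitem__(self, key, value):
        # Writes to the old key are stored under the new key.
        super().__setitem__(self._resolve(key), value)


rc = RcParamsSketch({"stats.ci_prob": 0.94})
```

With this sketch, reading or writing `rc["stats.hdi_prob"]` keeps working but emits a FutureWarning and operates on `stats.ci_prob`.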

Additional changes:

If eval_points is not provided, a warning states that in a future release eval_points will default to the unique values of the sample. That change would be breaking, so it is deferred to a future release.
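
To illustrate why the choice of evaluation points matters, here is a small stdlib-only sketch. The evenly spaced grid is an assumption about what a current-style default could look like; the unique-values grid is the future default announced above. The helper `ecdf_at` is illustrative and not an ArviZ function.

```python
from bisect import bisect_right


def ecdf_at(sample, eval_points):
    """Evaluate the empirical CDF of `sample` at each point in `eval_points`."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in eval_points]


sample = [0.25, 0.5, 0.5, 1.0, 1.75]

# Assumed current-style default: an evenly spaced grid over the sample range.
npoints = 3
lo, hi = min(sample), max(sample)
grid = [lo + i * (hi - lo) / (npoints - 1) for i in range(npoints)]

# Announced future default: the unique values of the sample.
unique_points = sorted(set(sample))

ecdf_on_grid = ecdf_at(sample, grid)
ecdf_on_unique = ecdf_at(sample, unique_points)
```

The coarse grid skips the jump at 0.5, while the unique values capture every step of the ECDF, so the same call can produce a visibly different plot once the default changes.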

None of the changes are breaking.

Checklist

Follows official PR format
Includes a sample plot to visually illustrate the changes (only for plot-related functions)
New features are properly documented (with an example if appropriate)?
Includes new or updated tests to cover the new feature
Code style correct (follows pylint and black guidelines)
Changes are listed in changelog

📚 Documentation preview 📚: https://arviz--2316.org.readthedocs.build/en/2316/

sethaxen · 2024-02-23T14:13:46Z

arviz/plots/ecdfplot.py

+ band.
+ For simultaneous confidence bands to be correctly calibrated, provide `eval_points` that
+ are not dependent on the `values`.
+ band_prob : float, default 0.95


I'd argue that this should default to 0.94 for consistency with hdi_prob, but that would be breaking (since fpr was 0.05).

OriolAbril · 2024-03-07

This will ultimately depend on the hdi_prob, band_prob, ci_prob discussion, but in my opinion changing the value of hdi_prob (or any other rcParam-defined probability) is not a breaking change. It is documented in multiple places that these values are completely arbitrary and might change.

The only guarantee should be that if someone was using fpr=0.05, it still works for a while; changing the probability of the band they get when not providing fpr is OK.

sethaxen · 2024-02-23T14:15:08Z

arviz/rcparams.py

@@ -262,6 +262,7 @@ def validate_iterable(value):
 "mean",
 _make_validate_choice({"mean", "median", "mode"}, allow_none=True),
 ),
+ "plot.band_prob": (0.95, _validate_probability),


Potentially this should be stats.band_prob, especially if we add the ECDF confidence band computation functions to the API.

sethaxen · 2024-02-23T14:15:33Z

arviz/stats/ecdf_utils.py

@@ -90,7 +90,7 @@ def ecdf_confidence_band(
 A function that takes an integer `ndraws` and optionally the object passed to
 `random_state` and returns an array of `ndraws` samples from the same distribution
 as the original dataset. Required if `method` is "simulated" and variable is discrete.
- num_trials : int, default 1000
+ num_trials : int, default 500


Just changed for consistency with original behavior.
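
The `rvs` contract in the docstring context above (a callable taking `ndraws` and, optionally, the object passed to `random_state`) can be sketched like this. The dispatcher `call_rvs` is hypothetical, passing `random_state` positionally is an assumption, and `random.Random` is used here only as a stdlib stand-in for whatever random state object ArviZ accepts.

```python
import random


def call_rvs(rvs, ndraws, random_state=None):
    """Hypothetical dispatcher: call `rvs` with or without a random state."""
    if random_state is None:
        return rvs(ndraws)
    # Assumption: the random state is forwarded as the second positional argument.
    return rvs(ndraws, random_state)


def normal_rvs(ndraws, random_state=None):
    """Example `rvs`: draw `ndraws` standard normal samples."""
    rng = random_state if random_state is not None else random.Random()
    return [rng.gauss(0.0, 1.0) for _ in range(ndraws)]
```

Passing the same seeded state twice yields identical draws, which is what makes simulated bands reproducible.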

sethaxen · 2024-02-23T14:17:58Z

arviz/plots/ecdfplot.py

@@ -46,26 +52,41 @@ def plot_ecdf(
 values : array-like
 Values to plot from an unknown continuous or discrete distribution.
 values2 : array-like, optional
- Values to compare to the original sample.
+ deprecated: values to compare to the original sample. Instead use
+ `cdf=scipy.stats.ecdf(values2).cdf.evaluate`.


I hope in a future PR to add an ECDF-(difference)plot option to plot_ranks and then recommend that here.

sethaxen · 2024-02-23T14:18:23Z

arviz/plots/ecdfplot.py

+ confidence_bands : str or bool, optional
+ - False: No confidence bands are plotted.
+ - "pointwise": Compute the pointwise (i.e. marginal) confidence band.
+ - True or "simulated": Use Monte Carlo simulation to estimate a simultaneous confidence


In the next PR I will add deterministic bands, which will become the default here.

OriolAbril · 2024-03-07

I would then put "True" and "simulated" on different lines. True would say that bands are computed with the default algorithm (subject to change), and "simulated" can keep the current description.
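
For readers unfamiliar with the "pointwise" option discussed in this thread, the following is a minimal stdlib sketch of an equal-tailed pointwise band. It assumes that, at each evaluation point, the number of draws below that point is Binomial(ndraws, F(x)). The function names are illustrative and this is not the ArviZ implementation.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binom_quantile(q, n, p):
    """Smallest k such that P(X <= k) >= q."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= q:
            return k
    return n


def pointwise_band(cdf_at_eval_points, ndraws, ci_prob=0.94):
    """Equal-tailed pointwise band for an ECDF of `ndraws` draws."""
    tail = (1 - ci_prob) / 2
    lower = [binom_quantile(tail, ndraws, p) / ndraws for p in cdf_at_eval_points]
    upper = [binom_quantile(1 - tail, ndraws, p) / ndraws for p in cdf_at_eval_points]
    return lower, upper


lower, upper = pointwise_band([0.0, 0.5, 1.0], ndraws=10, ci_prob=0.94)
```

Each point gets its own interval, so the band covers the ECDF with probability `ci_prob` marginally at every point but not jointly over all points; that joint coverage is what the simultaneous bands address.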

sethaxen · 2024-04-04T13:07:40Z

@OriolAbril I've implemented all suggestions and updated the above description and changelog. This should be ready for final review.

OriolAbril

I missed some DeprecationWarnings that are user-facing. I'll batch commit the suggestions and merge; all the changes are extremely minor.

CHANGELOG.md

arviz/plots/ecdfplot.py

arviz/tests/base_tests/test_plots_matplotlib.py

arviz/plots/ecdfplot.py

changing user-facing DeprecationWarnings to FutureWarnings

OriolAbril

After seeing the warnings and trying it out I am a bit on the fence on the behaviour of eval_points. It is basically a required argument right now, otherwise you get a FutureWarning.

It would be nice to continue allowing plot_ecdf(samples) to work without warnings.

sethaxen · 2024-04-04T18:13:43Z

> After seeing the warnings and trying it out I am a bit on the fence on the behaviour of eval_points. It is basically a required argument right now, otherwise you get a FutureWarning.
>
> It would be nice to continue allowing plot_ecdf(samples) to work without warnings.

It raises a FutureWarning because in the future plot_ecdf(samples) will do something entirely different than it does now. The alternative is to raise no warning now and change the behavior in the future without notice. Personally, I think the way we do it here could be less jarring.

OriolAbril · 2024-04-04T18:38:09Z

Maybe we could create a specific warning class for this? Something like BehaviourChangeWarning. I think it might signal better what is happening and also make it easier to silence (I am quite sure many users don't really care about the default as long as it works), and plot_ecdf(samples) will continue to work.

We could also silence it in the examples of the docstring. Now all examples use eval_points, but to illustrate how to generate confidence bands or how to make it a difference plot it doesn't matter which is the default behaviour of eval_points (and tests don't use it). So we could use the first example to describe the behaviour change, show how to maintain old behaviour and then silence the warning so following examples focus on what they want to illustrate without worrying about the warning.

What do you think?
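
A minimal sketch of the idea proposed in this comment follows. The class name `BehaviourChangeWarning` is taken from the suggestion above and is hypothetical; subclassing `FutureWarning` is an assumption, and `warn_if_default_eval_points` is a toy stand-in that models only the warning logic, not `plot_ecdf` itself.

```python
import warnings


class BehaviourChangeWarning(FutureWarning):
    """Hypothetical category for defaults that will change in a future release."""


def warn_if_default_eval_points(eval_points=None):
    """Toy stand-in for the eval_points check; warns only when the default is used."""
    if eval_points is None:
        warnings.warn(
            "In the future, eval_points will default to the unique values of the sample.",
            BehaviourChangeWarning,
            stacklevel=2,
        )


# Users who do not care about the default can silence just this category.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", BehaviourChangeWarning)
    warn_if_default_eval_points()
```

Because the category is specific, filtering it leaves other FutureWarnings visible, and the same filter could be applied once at the top of the docstring examples.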

sethaxen added 21 commits February 22, 2024 15:51

Add rcparam for band probability

1c73e88

Unify plot_ecdf keywords and deprecate old ones

7e53ce6

Change default num_trials to match plot_ecdf

22054f9

Support keywords rvs and random_state

fcd5773

Deprecate values2

551af15

Allow eval_points to be specified

bc1de44

Restore original name confidence_bands

6df1daf

Correctly specify ecdf usage

c64009a

Include recommended specification of eval_points

af508b4

Update examples in docstrings

7a23c93

Deprecate pit keyword

5d96bc2

Ravel values2

8007213

Change band_prob default to 0.95

460346d

Run black

ec86075

Add band_prob to to rctemplate

7020c1d

Use deprecation warnings

d881c3d

Test deprecations

4a5aa8e

test eval_points accepted

c8be43e

Check confidence_bands args

fcb7691

Test ecdfplot errors

5e8a3ce

Update CHANGELOG.md

605a2fb

sethaxen marked this pull request as ready for review February 23, 2024 14:11

Update CHANGELOG.md

19881ae

sethaxen added 3 commits February 23, 2024 15:44

Move scipy.stats.ecdf import to top of file

9b279e1

Remove duplicate warnings import

f436591

Fix linting errors

29c4beb

sethaxen requested a review from OriolAbril February 23, 2024 14:52

sethaxen added 19 commits March 31, 2024 11:04

Mark valus2 as deprecated

ef42e8d

Fix bulleted list of confidence_bands options

08f6743

Clean up acceptable types for random_state

57447ab

Use deprecated directive for other keywords

9f907d1

Merge branch 'main' into ecdf_dep_kwargs

92b61f7

Make deprecation note more specific

1832cb8

Deprecate npoints

cfb00c3

Move deprecated keywords to end of keywords list

9c11ba0

Update CHANGELOG.md

b1ce3e6

Add missing newline

3e3c885

Rename band_prob to ci_prob

bb65bc9

Change ci_prob default to 0.94

28d9787

Deprecate hdi_prob and replace with stats.ci_prob

f6cff76

Support rvs with no keywords

63db748

Unify docstrings

e8684a8

Change deprecation warning to future warning

e0d5ee1

Test deprecated rcparams

903bedc

Use stats.ci_prob in plots

1ce8bcf

Use stats.ci_prob in docs

a9b61dc

Don't access protected class variable

74c6db8

sethaxen requested a review from OriolAbril April 4, 2024 13:07

OriolAbril approved these changes Apr 4, 2024

OriolAbril reviewed Apr 4, 2024

Apply suggestions from code review

ef0ceae

changing user-facing DeprecationWarnings to FutureWarnings

OriolAbril reviewed Apr 4, 2024

Merge branch 'main' into ecdf_dep_kwargs

635a35a
