Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor plot_ecdf arguments #2316

Open
wants to merge 48 commits into
base: main
Choose a base branch
from
Open

Refactor plot_ecdf arguments #2316

wants to merge 48 commits into from

Conversation

sethaxen
Copy link
Member

@sethaxen sethaxen commented Feb 23, 2024

Description

This PR introduces new keyword arguments to plot_ecdf and deprecates a few existing ones, following suggestions in #2309.

New keywords and features:

  • confidence_bands now may take string arguments as well as boolean
  • ci_prob specifies band probability. stats.ci_prob is a new rcParam.
  • eval_points allows the user to specify the evaluation points
  • rvs, random_state can be provided for simulation confidence bands

Deprecated arguments:

  • values2 is now deprecated, in favor of users passing an empirical CDF

Deprecated keywords:

  • pointwise is now confidence_bands="pointwise"
  • fpr is replaced with 1-ci_prob for consistency with other plotting functions.
  • pit is deprecated. We only need this for LOO-PIT, users who need it for something else will probably know how to make the plot, and if we really want to include it, it should be its own plotting function. There's now a documented example of how to plot PIT.
  • npoints

Deprecated rcparams:

  • stats.hdi_prob has been deprecated and replaced with stats.ci_prob

Additional changes:

  • If eval_points not provided, there's a warning that in the future eval_points will be the unique values of the sample. This would be breaking and is saved for a future release.

None of the changes are breaking.

Checklist

  • Follows official PR format
  • Includes a sample plot to visually illustrate the changes (only for plot-related functions)
  • New features are properly documented (with an example if appropriate)?
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

📚 Documentation preview 📚: https://arviz--2316.org.readthedocs.build/en/2316/

@sethaxen sethaxen marked this pull request as ready for review February 23, 2024 14:11
arviz/plots/ecdfplot.py Show resolved Hide resolved
band.
For simultaneous confidence bands to be correctly calibrated, provide `eval_points` that
are not dependent on the `values`.
band_prob : float, default 0.95
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that this should default to 0.94 for consistency with hdi_prob, but that would be breaking (since fpr was 0.05).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will ultimately depend on the hdi_prob, band_prob, ci_prob discussion, but in my opinion changing the value of hdi_prob (or any other rcParam defined probability) is not a breaking change. It is documented in multiple places that these are completely arbitrary values and that might also change.

The only guarantee should be that is someone was using fpr=0.05 it still works for a while but changing the probability of the band they get when not providing fpr is ok.

@@ -262,6 +262,7 @@ def validate_iterable(value):
"mean",
_make_validate_choice({"mean", "median", "mode"}, allow_none=True),
),
"plot.band_prob": (0.95, _validate_probability),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially this should be stats.band_prob, especially if we add the ECDF confidence band computation functions to the API.

@@ -90,7 +90,7 @@ def ecdf_confidence_band(
A function that takes an integer `ndraws` and optionally the object passed to
`random_state` and returns an array of `ndraws` samples from the same distribution
as the original dataset. Required if `method` is "simulated" and variable is discrete.
num_trials : int, default 1000
num_trials : int, default 500
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just changed for consistency with original behavior.

@@ -46,26 +52,41 @@ def plot_ecdf(
values : array-like
Values to plot from an unknown continuous or discrete distribution.
values2 : array-like, optional
Values to compare to the original sample.
deprecated: values to compare to the original sample. Instead use
`cdf=scipy.stats.ecdf(values2).cdf.evaluate`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope in a future PR to add an ECDF-(difference)plot option to plot_ranks and then recommend that here.

confidence_bands : str or bool, optional
- False: No confidence bands are plotted.
- "pointwise": Compute the pointwise (i.e. marginal) confidence band.
- True or "simulated": Use Monte Carlo simulation to estimate a simultaneous confidence
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next PR I will add deterministic bands, which will become the default here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would then put "True" and "simulated" on different lines. True saying bands are computed with the default algorithm (subject to change), and simulated can keep the current description.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@sethaxen
Copy link
Member Author

sethaxen commented Apr 4, 2024

@OriolAbril I've implemented all suggestions and updated the above description and changelog. This should be ready for final review.

@sethaxen sethaxen requested a review from OriolAbril April 4, 2024 13:07
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed some DeprecationWarnings that are user-facing. I'll batch commit the suggestions and merge, all the changes are extremely minor

CHANGELOG.md Outdated Show resolved Hide resolved
arviz/plots/ecdfplot.py Outdated Show resolved Hide resolved
arviz/plots/ecdfplot.py Outdated Show resolved Hide resolved
arviz/plots/ecdfplot.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_plots_matplotlib.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_plots_matplotlib.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_plots_matplotlib.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_plots_matplotlib.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_plots_matplotlib.py Outdated Show resolved Hide resolved
arviz/tests/base_tests/test_plots_matplotlib.py Outdated Show resolved Hide resolved
arviz/plots/ecdfplot.py Outdated Show resolved Hide resolved
changing user-facing DeprecationWarnings to FutureWarnings
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After seeing the warnings and trying it out I am a bit on the fence on the behaviour of eval_points. It is basically a required argument right now, otherwise you get a FutureWarning.

It would be nice to continue allowing plot_ecdf(samples) to work without warnings.

@sethaxen
Copy link
Member Author

sethaxen commented Apr 4, 2024

After seeing the warnings and trying it out I am a bit on the fence on the behaviour of eval_points. It is basically a required argument right now, otherwise you get a FutureWarning.

It would be nice to continue allowing plot_ecdf(samples) to work without warnings.

The reason it raises a FutureWarning is because in the future plot_ecdf(samples) will do something entirely different than it does now. The alternative is that we currently don't raise a warning and instead change the behavior in the future without warning. Personally, I think the way we do it here could be less jarring.

@OriolAbril
Copy link
Member

Maybe we could create a specific warning class for this? Something like BehaviourChangeWarning, I think it might signal better what is happening and also make it easier to silence (I am quite sure many users don't really care about the default as long as it works) and plot_ecdf(samples) will continue to work.

We could also silence it in the examples of the docstring. Now all examples use eval_points, but to illustrate how to generate confidence bands or how to make it a difference plot it doesn't matter which is the default behaviour of eval_points (and tests don't use it). So we could use the first example to describe the behaviour change, show how to maintain old behaviour and then silence the warning so following examples focus on what they want to illustrate without worrying about the warning.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants