[Bug]: constant valued line plot has strange/unexpected y-axis #28229
See https://matplotlib.org/stable/gallery/ticks/scalarformatter.html#sphx-glr-gallery-ticks-scalarformatter-py, https://matplotlib.org/stable/api/ticker_api.html#matplotlib.ticker.ScalarFormatter.set_useOffset, and https://matplotlib.org/stable/api/ticker_api.html#matplotlib.ticker.ScalarFormatter.set_powerlimits. This is expected behavior, intended to keep the numbers on the axis readable. In this case you can turn off both behaviors via:

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(layout='constrained')
x = np.linspace(-0.49, 0.49, 10)
y = [
    0.9999999999999997,
    1.0,
    1.0000000000000002,
    1.0000000000000002,
    1.0,
    0.9999999999999999,
    1.0,
    0.9999999999999996,
    0.999999999999999,
    1.0000000000000036,
]
ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)
ax.plot(x, y)
plt.show()
```

but it is unreadable in its own way (and if you do not use constrained or tight layout, the y tick labels will be clipped). I think this should be closed with no action because this is a particularly pathological case. Both behaviors make sense in one context or the other, but there is no clear winner, and the current behavior has been incumbent for most of the lifetime of the library (I suspect https://matplotlib.org/stable/users/prev_whats_new/whats_new_2.0.0.html#improved-offset-text-choice is the change that changed the behavior of #5657), so it is unlikely that we are going to change it. @fkiraly If you disagree please comment and we can re-open.
I see, thanks for the explanation, @tacaswell. Given your explanation, I disagree for now that there is no bug or issue, because of the inconsistency in behaviour between the two code examples in "actual" and "expected". Why:

Given your explanation, I would expect the example in "expected" also to behave like "actual" - i.e., the bug lies in the inconsistency, and the plots in "expected" and "actual" should be switched. In particular, given your explanation, it seems the code snippet in "expected" is the buggy one.
The behavior is dependent on the range of the data.

Explicitly setting the ylim to

The "actual" data is all down near the limits of float precision (it looks like all but the last two values are within ±2 representable float values of 1). Maybe in some cases we would want to treat a range of < 100 eps as "all the same", but that would not really solve this problem, just move it to a different threshold. It also does not make sense to auto-limit singular values to ± the smallest range we can represent. Hence I think we are stuck in a bind where there is this abrupt change, but it is the least bad option (as "all being the same" and "all not being the same" is a discontinuous change).

Plotting

I'm still skeptical that we can do anything differently here - am I missing something, @fkiraly?
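The abrupt change described above can be observed directly by comparing the autoscaled y-limits for exactly-constant versus near-constant data. A minimal sketch (the exact expansion and margin factors are matplotlib internals and may vary slightly between versions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

# Exactly constant data: the singular y-range is expanded to a readable window.
fig1, ax1 = plt.subplots()
ax1.plot([0, 1, 2], [1.0, 1.0, 1.0])
fig1.canvas.draw()  # make sure autoscaling has run
const_span = ax1.get_ylim()[1] - ax1.get_ylim()[0]

# Near-constant data (endpoints taken from the issue's data): autoscaling
# hugs the eps-level spread instead of expanding it.
fig2, ax2 = plt.subplots()
ax2.plot([0, 1, 2], [0.999999999999999, 1.0, 1.0000000000000036])
fig2.canvas.draw()
noisy_span = ax2.get_ylim()[1] - ax2.get_ylim()[0]

# The two spans differ by many orders of magnitude.
print(f"constant span: {const_span:.3g}, near-constant span: {noisy_span:.3g}")
```

This is the discontinuity under discussion: an arbitrarily small perturbation of constant data flips the axis from a ~5% window to a float-precision-sized window.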
We cannot start making assumptions about user data, so I am not clear what heuristic would give us the "expected" version without wiping out the plot for someone who was looking at a very small signal centered on 1.0. I think we are erring on the side of giving a usable plot, and if the user thinks it is wrong they can either normalize their data or set the limits manually.
In my opinion, the "discontinuous case distinction" is the root of the evil here and should be considered a bug. That is, it is possible that there are only minor changes to the data, yet the display "jumps" between display conventions in the "small" and "zero" cases. This is impossible for the user to control, because it is not properly documented when one or the other default case applies, and it is impossible to "trust" the visual interface in some corner cases, as it may jump back and forth between the case conventions.

The "direct" solution is making the case treatment continuous, or even constant as a sub-case. What I mean by this: small changes in the data should never lead to big changes in the display setup, only gradual/small changes. Examples of continuous case treatment:
If I may also ask a small question, more "helpdesk" style - this may be a workaround in my use case, but would not resolve the more general issue. What is the most idiomatic way to plot a non-negative graph so that 0 is always the bottom of the y-axis, while the top is left free to scale with the standard axis range heuristics?
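The reply given at this point was not preserved in this extraction, but the usual answer (a sketch, not necessarily the exact reply) is to pin only the bottom limit after plotting via `ax.set_ylim`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0.2, 1.5, 0.7, 2.4])  # illustrative non-negative data
fig.canvas.draw()  # ensure autoscaling has run before freezing the bottom

# Pin the bottom at 0; passing None for the top keeps the autoscaled value.
ax.set_ylim(0, None)  # equivalently: ax.set_ylim(bottom=0)

print(ax.get_ylim())
```

Note that setting either limit explicitly disables further y autoscaling, so this should be done after all the data has been plotted.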
Well, that's very convenient. Did not know you could pass
I don't think people run into the two cases together in practice: either your data is enforced to be constant, or your data has numerical noise and/or a small signal. Feel free to suggest an algorithm, but I strongly suspect it is not possible to make something that is always pleasing and also doesn't hide some types of signals. Matplotlib strives to err on the side of showing all the data and the signal over aesthetics, and we assume users can adjust the x and y limits to tell their own story. For instance, I just came back from an intensive field course where I made many plots; I think all of them had manual adjustment to the limits.
Well, with a package as widely used as

Specifically, I have a use case where you do not have "either or": I am producing panel plots which may contain cases of either type, or sequences of plots where you transition gradually from the "constant" over the "near-constant with minimal noise" to the "varies a lot" case. This is a common occurrence, as I am dealing with probability distributions, where constant means uniform, and that is just as common a case as non-constant or almost-constant.
My suggestion is as above, and it is based on the opinion that the plot I see for the near-constant signal is hard to read and borders on uninformative. There is no signal that is "revealed"; it's just funny axis behaviour. My preferred behaviour would be to always have

```python
ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)
```

as @tacaswell suggested. I can live with this for my use case - but why should this not be the case always?
While you could do this, it's not feasible in practice: it would mean that any data with <5% variation would be plotted with a 5% margin, which means that as soon as your data has roughly <0.05% variation, it appears as a constant line on the chosen scale. For any reasonably sized data (*), we want to show it scaled to the maximal available space. Given that, there is no way to make the transition continuous to a 5% scale in the all-zero case. Changing the 5% scale is not an option either, for backward compatibility (and it would also not be too helpful to show constant data on an eps-scale).

(*) The limit is somewhere around the scale of ~1e-12. For smaller values, you get discretization artefacts, which we've chosen not to draw by default: you divide the scale into ~1000px visually, so the data distance of a pixel at 1e-12 scale is ~1e-15, which is slightly above the double-precision ulp. But irrespective of whether we limit scaling here or not, you don't get continuity from any small scale to the 5% scale.
I agree; that's why I think this is "weaker" than my preferred solution, as stated above:

```python
ax.yaxis.get_major_formatter().set_scientific(False)
ax.yaxis.get_major_formatter().set_useOffset(False)
```

Assuming I understand the effect correctly (see the plot in @tacaswell's post above), this will lead to a margin that is proportional to the variation for small variation, without centering the y axis at zero by subtracting the average.
That only affects the formatting of the tick labels, not what the limits are.
I am going to close this again, as the consensus is that we cannot do anything to avoid this discrete change (either the data is singular or it is not (or we treat it as singular or not)), and the default behavior of ensuring relatively short labels is both defensible and long-standing. This is also only an issue with autoscaling, where we try to infer behavior from a very limited set of information; if the user explicitly gives us any bounds, we always respect those. To some level, autoscaling is always best effort, and it will fail in some cases where there is reasonable user ambiguity about what "right" is, but we have to do something and cannot avoid guessing.

[edited because I hit post too soon]
Bug summary
When plotting a line plot with near-constant values, the y axis has a very small range when compared with the magnitude of the actual values.
Very similar issue in #5657, which was closed as resolved - but it might not be resolved, given the below?
Code for reproduction
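The original reproduction snippet was not preserved in this extraction. Based on the data quoted in the discussion above, it was presumably along these lines (a reconstruction, not the reporter's verbatim code):

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(-0.49, 0.49, 10)
# Near-constant values, differing from 1.0 only at the level of float precision
# (taken from the data quoted in the discussion above)
y = [
    0.9999999999999997, 1.0, 1.0000000000000002, 1.0000000000000002, 1.0,
    0.9999999999999999, 1.0, 0.9999999999999996, 0.999999999999999,
    1.0000000000000036,
]
ax.plot(x, y)
plt.show()
```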
Actual outcome
Expected outcome
similar to the following, y range from 0.94 to 1.06:
Additional information
No response
Operating system
No response
Matplotlib Version
3.8.0
Matplotlib Backend
No response
Python version
3.11
Jupyter version
No response
Installation
pip