Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX: Use correct terminology to avoid confusion #6832

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

hyiltiz
Copy link

@hyiltiz hyiltiz commented Jan 3, 2024

Uses correct terminology to avoid confusion around whiskers vs. fences. Fences are defined as the bounds around the quartiles, whose length always equals k IQR (plotly adopts k = 1.5). Whiskers finds the data point closest to this bound instead. Plotly implements the latter, hence should use correct names.

https://stats.stackexchange.com/questions/149161/confused-by-location-of-fences-in-box-whisker-plots

@alexcjohnson
Copy link
Collaborator

Thanks @hyiltiz! Clearly we weren't aware of this convention when we created our implementation of the box plot, but it does look to me as though the distinction you're describing is used fairly consistently in the stats community. Unfortunately, this naming is part of the API, for example:

lowerfence: {
valType: 'data_array',
editType: 'calc',
description: [
'Sets the lower fence values.',
'There should be as many items as the number of boxes desired.',
'This attribute has effect only under the q1/median/q3 signature.',
'If `lowerfence` is not provided but a sample (in `y` or `x`) is set,',
'we compute the lower as the last sample point below 1.5 times the IQR.'
].join(' ')
},
upperfence: {
valType: 'data_array',
editType: 'calc',
description: [
'Sets the upper fence values.',
'There should be as many items as the number of boxes desired.',
'This attribute has effect only under the q1/median/q3 signature.',
'If `upperfence` is not provided but a sample (in `y` or `x`) is set,',
'we compute the lower as the last sample point above 1.5 times the IQR.'
].join(' ')
},

(oops part of the upperfence description still talks about the lower fence, cc @archmoj)

So it would be a breaking change to alter that now, and I don't think this is severe enough to warrant that big a step. I suggest we leave the API as is and address the confusion in the docs / descriptions.

The hover text labels we could certainly change, though I'll note that this is not so simple either since it's included in all the translations, so we would need to update those too, or add some custom code to fall back on the translation of upper fence if the dictionary hasn't been updated to include upper whisker.

@hyiltiz
Copy link
Author

hyiltiz commented Jan 4, 2024

Thank you so much for the prompt feedback! I was confused about the hover text that I saw from ggplotly today, and it seemed the issue came from the js lib. I understand that introducing a breaking change to the API is undesirable, and would certainly be happy to accept that we change, as you suggested, all of the following:

  • user facing UI elements such as hover text
  • documentation for English and various other languages
  • internal code that are not part of the external API (to avoid breaking changes)

As this warrants no urgent work, it would probably make sense to change all translations at once rather than going through the hassle of falling back to the incorrect terminology. This terminology can become especially confusing as the data becomes less Gaussian-like (e.g. a trimodal dataset, depending on the distance between the modes, drawing whiskers vs. fences would really make a large difference).

@archmoj
Copy link
Contributor

archmoj commented Jan 4, 2024

(oops part of the upperfence description still talks about the lower fence, cc @archmoj)

Now fixed by #6834.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants