Violin plots should not exist #10

Open
wants to merge 3 commits into
base: main
14 changes: 7 additions & 7 deletions README.md
@@ -22,7 +22,7 @@ It requires R, RStudio, and the rmarkdown package.
# Table of contents

1. [Friends Don't Let Friends Make Bar Plots For Mean Separation](https://github.com/cxli233/FriendsDontLetFriends#1-friends-dont-let-friends-make-bar-plots-for-means-separation)
2. [Friends Don't Let Friends Make Violin Plots for Small Sample Sizes](https://github.com/cxli233/FriendsDontLetFriends#2-friends-dont-let-friends-make-violin-plots-for-small-sample-sizes)
2. [Friends Don't Let Friends Make Violin Plots](https://github.com/cxli233/FriendsDontLetFriends#2-friends-dont-let-friends-make-violin-plots)
3. [Friends Don't Let Friends Use Bidirectional Color Scales for Unidirectional Data](https://github.com/cxli233/FriendsDontLetFriends#3-friends-dont-let-friends-use-bidirectional-color-scales-for-unidirectional-data)
4. [Friends Don't Let Friends Make Bar Plot Meadow](https://github.com/cxli233/FriendsDontLetFriends#4-friends-dont-let-friends-make-bar-plot-meadow)
5. [Friends Don't Let Friends Make Heatmap without Reordering Rows & Columns](https://github.com/cxli233/FriendsDontLetFriends#5-friends-dont-let-friends-make-heatmap-without-considering-reordering-rows--columns)
@@ -50,16 +50,16 @@ Just don't use bar plot for means separation, or at least check a couple things
It's worth mentioning that I was inspired by many researchers who have tweeted about the limitations of bar graphs.
Here is a publication: [Weissgerber et al., 2015, PLOS Biology](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128).

# 2. Friends Don't Let Friends Make Violin Plots for Small Sample Sizes
# 2. Friends Don't Let Friends Make Violin Plots

This is quite common in the literature as well, but unfortunately, violin plots (or any sort of smoothed distribution curves) make no sense for small n.
Violin plots don't help your reader understand the data. The whole justification is that regular box plots may misrepresent multimodal data distributions, so you want to show the data. But violin plots don't have units and tick marks that allow people to actually read the data and compare the distributions. Also, the selection of the probability density function for the smoothing is almost never explained, and may even be completely misleading with small datasets.
@egonelbre (Jan 18, 2024):

Violin plots do have quartiles and medians etc. on them, all the same features as a box plot. See the paper https://www.stat.cmu.edu/~rnugent/PCMI2016/papers/ViolinPlots.pdf

People do remove them, which should be avoided.

Author:

Yes, they can have quartiles and medians. I was referring to tick marks for the height of the distributions, such as you would find on a histogram. Without them you can't actually compare the different distributions the violin plot depicts: you'd have to manually cut them out and overlay them, and you would still be missing the units of one axis.

Reply:

Box plots also do not have that information, so we should avoid them as well?
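
To make the smoothing concern in the proposed paragraph concrete, here is a minimal sketch that is not part of the PR or the repository. It is written in Python with only the standard library, rather than the repository's R, and it hand-rolls a Gaussian kernel density estimate, which is an assumption about the kind of smoothing a violin plot typically uses. The `sample` values and the helper names `gaussian_kde` and `count_modes` are made up for illustration.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

def count_modes(density, lo, hi, steps=400):
    """Count strict local maxima of the density on an evenly spaced grid."""
    ys = [density(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

# Hypothetical small sample (n = 8), invented purely for illustration.
sample = [1.0, 1.2, 1.9, 2.1, 3.0, 4.8, 5.0, 5.3]

# Same data, three bandwidths: the apparent shape depends on the smoothing choice.
for bw in (0.15, 0.5, 3.0):
    modes = count_modes(gaussian_kde(sample, bw), lo=-2.0, hi=9.0)
    print(f"bandwidth={bw}: {modes} mode(s)")
```

With the narrowest bandwidth the estimate shows several bumps, while the widest one collapses the same eight points into a single smooth hump. A violin drawn from either would suggest a different story unless the bandwidth is reported.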


![Beware of Violin Plots for Small Sample Sizes](https://github.com/cxli233/FriendsDontLetFriends/blob/main/Results/Beware_of_small_n_box_violin_plot.png)

Distributions and quartiles can vary widely with small n, even if the underlying observations are similar.
Distribution and quartiles are only meaningful with large n.
I did an experiment before, where I sampled the *same* normal distribution several times and computed the quartiles for each sample.
The quartiles only stabilize when n gets larger than 50.
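
The sampling experiment described above can be approximated with a short script. This is a sketch under stated assumptions, not the author's original code: it is in Python with the standard library only (the repository itself uses R), it draws from a standard normal distribution, and the function name `quartile_spread`, the 200 repeats, and the fixed seed are all choices made for illustration.

```python
import random
import statistics

def quartile_spread(n, repeats=200, seed=0):
    """Standard deviation of Q1, median and Q3 across repeated samples of size n."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    cuts = []
    for _ in range(repeats):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # statistics.quantiles with n=4 returns the three cut points [Q1, median, Q3]
        cuts.append(statistics.quantiles(sample, n=4))
    return [statistics.stdev([c[i] for c in cuts]) for i in range(3)]

# Every sample comes from the *same* distribution; only the sample size changes.
for n in (5, 10, 50, 500):
    q1, med, q3 = quartile_spread(n)
    print(f"n={n:>3}  sd of Q1={q1:.2f}  sd of median={med:.2f}  sd of Q3={q3:.2f}")
```

The spread of each quartile shrinks as n grows. Where exactly it counts as "stable" depends on the tolerance you pick, so the n > 50 figure should be read as the author's observation rather than something this sketch proves.
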
- If your data is multimodal or has an interesting distribution, use a histogram or multiple stacked histograms.
- If your data is not multimodal and you want to show medians and quartiles, just use a box plot.

[Violin Plots should not exist](https://youtu.be/_0QMKFzW9fw?si=o18kDWN5qSeaNfXV)

# 3. Friends Don't Let Friends Use Bidirectional Color Scales for Unidirectional Data
