
stat_bin() calculates wrong numbers when a geom_col() with custom data is present #5829

Closed
fkohrt opened this issue Apr 6, 2024 · 4 comments


@fkohrt
Contributor

fkohrt commented Apr 6, 2024

In an attempt to create a bar plot while binning data, I have stumbled across a surprising interaction between geom_col() and geom_text().

The following code is supposed to do two things:

  1. Create bars with geom_col() by calculating data on my own
  2. Annotate the bars with geom_text() by using stat_bin() to calculate the data
faithful |>
  ggplot2::ggplot(ggplot2::aes(x = eruptions)) +
  ggplot2::geom_col(
    data = function(x) {
      bins <- ggplot2:::bin_breaks_bins(
        x_range = range(faithful$eruptions),
        bins = 30,
        center = NULL,
        boundary = 4,
        closed = "right"
      )
      h <- hist(
        faithful$eruptions,
        plot = FALSE,
        breaks = bins$breaks
      )
      data.frame(mids = h$mids, count = h$counts)
      # Alternatively, return ggplot2:::bin_vector(x = faithful$eruptions, bins = bins)
      # and map x = xmin + width/2
    },
    mapping = ggplot2::aes(
      x = mids,
      y = count
    ),
    inherit.aes = FALSE
  ) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(
      y = ggplot2::after_stat(count),
      label = ggplot2::after_stat(count)
    ),
    stat = "bin",
    bins = 30,
    binwidth = NULL,
    center = NULL,
    boundary = 4,
    closed = "right"
  )

However, this leads to surprising behavior in two respects:

  1. The numbers calculated by stat_bin() are off.
  2. They become correct if the whole geom_col() is commented out.

Is this somehow correct behavior or a bug?

Using ggplot2 3.5.0 with R 4.3.3

(Why create the graph above like that in the first place? I deliberately want to avoid geom_histogram() because the count bars are not supposed to touch each other, hence geom_col().)

@teunbrand
Collaborator

Thanks for the report! It seems that the bin computations do not match between the two layers. I'd just use geom_histogram() and override the width computed by the stat. This throws a warning, but it does the correct thing. We plan to make width a proper aesthetic at some point, and then the warning will be gone.

library(ggplot2)

faithful |>
  ggplot(aes(x = eruptions)) +
  geom_histogram(
    aes(width = after_stat(0.9 * width)),
    bins = 30,
    binwidth = NULL,
    center = NULL,
    boundary = 4,
    closed = "right"
  ) +
  geom_text(
    mapping = aes(
      y = after_stat(count),
      label = after_stat(count)
    ),
    stat = "bin",
    bins = 30,
    binwidth = NULL,
    center = NULL,
    boundary = 4,
    closed = "right"
  )
#> Warning in geom_histogram(aes(width = after_stat(0.9 * width)), bins = 30, :
#> Ignoring unknown aesthetics: width

Created on 2024-04-06 with reprex v2.1.0
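Why scaling `width` by 0.9 separates the bars can be checked with a little arithmetic. The base-R sketch below assumes, as ggplot2's bar geoms do, that each bar spans `x - width/2` to `x + width/2`; the bin width and breaks are made-up illustrative values, not the ones computed for `faithful`.

```r
# Hypothetical bins: width 0.5, breaks from 1.5 to 4 (illustrative values only)
binwidth <- 0.5
breaks <- seq(1.5, 4, by = binwidth)
mids <- head(breaks, -1) + binwidth / 2  # bin centres, i.e. the bars' x

# Shrink each bar to 90% of the bin width, as after_stat(0.9 * width) does
bar_width <- 0.9 * binwidth
xmin <- mids - bar_width / 2
xmax <- mids + bar_width / 2

# Gap between the right edge of each bar and the left edge of the next one
gaps <- xmin[-1] - xmax[-length(xmax)]
gaps  # each gap is 0.1 * binwidth
```

With the stat's unscaled width (a factor of 1 instead of 0.9) every gap would be zero, which is why plain geom_histogram() bars touch.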

@teunbrand
Collaborator

Yeah, I can confirm that the warning would be fixed by #5807, so you won't need to circumvent geom_histogram() with custom data, which should make the discrepancy disappear.

@fkohrt
Contributor Author

fkohrt commented Apr 11, 2024

I agree that setting width is an elegant way to achieve my goal, but isn't the interaction between geom_col() and stat_bin() still worth tracking?

@teunbrand
Collaborator

teunbrand commented Apr 11, 2024

I think the interaction arises because the bin breaks in the text layer are calculated from the x-axis range of the plot as a whole. The range of h$mids is slightly larger than range(faithful$eruptions), which results in different bins in the text layer. Computing bins over the full plot range is intended, so I don't think this is a bug.
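The mechanism can be reproduced without ggplot2. `approx_bin_breaks()` below is a hypothetical, simplified re-implementation of how ggplot2 3.5.0's internal `bin_breaks_bins()` appears to place breaks for a given `bins` and `boundary`; the real function handles more cases and may change between versions. The sketch computes breaks twice: once from the range of the data, as the custom `geom_col()` data function does, and once from a range that also includes the bar midpoints, as the text layer sees it.

```r
# Hypothetical simplification of ggplot2's internal bin_breaks_bins()
# for bins > 1 with an explicit boundary; NOT the actual ggplot2 code.
approx_bin_breaks <- function(x_range, bins, boundary) {
  width <- diff(x_range) / (bins - 1)
  # Left edge of the left-most bin, aligned to `boundary`
  origin <- boundary + floor((x_range[1] - boundary) / width) * width
  # Upper limit for seq(): one bin width past the maximum, minus a tiny tolerance
  seq(origin, x_range[2] + (1 - 1e-8) * width, by = width)
}

x <- faithful$eruptions

# Breaks from the data range alone (what the geom_col() data function uses)
b_data <- approx_bin_breaks(range(x), bins = 30, boundary = 4)
mids <- head(b_data, -1) + diff(b_data) / 2

# Breaks from the plot's x range, which also covers the bar midpoints
b_plot <- approx_bin_breaks(range(x, mids), bins = 30, boundary = 4)

counts_data <- hist(x, breaks = b_data, plot = FALSE)$counts
counts_plot <- hist(x, breaks = b_plot, plot = FALSE)$counts

range(x)     # data range
range(mids)  # the last midpoint lies above max(x), widening the plot range
```

Both break vectors have the same length but different positions and widths, so a given eruption can land in a different bin in each layer.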
