geom_ribbon() > geom_area() speed difference #5788

Closed
japhir opened this issue Mar 21, 2024 · 3 comments · Fixed by #5789

japhir commented Mar 21, 2024

I was recently making some figures in which I wanted a simple geom_area() call on something with quite a few points. However, I noticed that my computer was waiting for a long time when calling the function. I had previously plotted something with geom_ribbon() instead, and it was almost instant there. After playing around, it seems that by default on my machine geom_area() is way slower than geom_ribbon() for some reason.

See below for an attempt at a reprex. Benchmark results are a bit iffy though.

library(ggplot2)
library(tibble)

dat <- tibble(
  x = 1:1e4,
  y = rnorm(seq_along(x)) + 5
)

dat |>
  ggplot() +
  geom_area(aes(x = x, y = y))

dat |>
  ggplot() +
  geom_ribbon(aes(x = x, ymin = 0, ymax = y))

# they seem identical
res <- bench::mark(
  area = dat |>
    ggplot() +
    geom_area(aes(x = x, y = y)),
  ribbon = dat |>
    ggplot() +
    geom_ribbon(aes(x = x, ymin = 0, ymax = y)),
  check = FALSE)

# until you actually print them!
res <- bench::mark(
  area = dat |>
    ggplot() +
    geom_area(aes(x = x, y = y)) |> print(), # sloooww
  ribbon = dat |>
    ggplot() +
    geom_ribbon(aes(x = x, ymin = 0, ymax = y)) |> print(), # fast
  check = FALSE)
#> mapping: x = ~x, y = ~y 
#> geom_area: na.rm = FALSE, orientation = NA, outline.type = upper
#> stat_align: na.rm = FALSE, orientation = NA
#> position_stack 
#> mapping: x = ~x, ymin = 0, ymax = ~y 
#> geom_ribbon: na.rm = FALSE, orientation = NA, outline.type = both
#> stat_identity: na.rm = FALSE
#> position_identity 
#> mapping: x = ~x, y = ~y 
#> geom_area: na.rm = FALSE, orientation = NA, outline.type = upper
#> stat_align: na.rm = FALSE, orientation = NA
#> position_stack 
#> mapping: x = ~x, y = ~y 
#> geom_area: na.rm = FALSE, orientation = NA, outline.type = upper
#> stat_align: na.rm = FALSE, orientation = NA
#> position_stack 
#> mapping: x = ~x, y = ~y 
#> geom_area: na.rm = FALSE, orientation = NA, outline.type = upper
#> stat_align: na.rm = FALSE, orientation = NA
#> position_stack 

# < omitted output >

#> geom_ribbon: na.rm = FALSE, orientation = NA, outline.type = both
#> stat_identity: na.rm = FALSE
#> position_identity

# geom_area seems to get many more gc/sec?
res
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 area         2.03ms   2.13ms      406.    44.3KB     6.19
#> 2 ribbon       2.03ms   2.23ms      375.    22.4KB     6.35

Created on 2024-03-20 with reprex v2.1.0

Author

japhir commented Mar 21, 2024

I guess the reason for this is the position argument, which defaults to "identity" for geom_ribbon() and to "stack" for geom_area(); the stacking may be what makes it slow?

If I switch to:

dat |>
  ggplot() +
  geom_area(aes(x = x, y = y), position = "identity")

I seem to get the same speed.

Perhaps the function could check if there are any group or fill/colour aesthetics and if not, change the position argument to "identity"?

Collaborator

teunbrand commented Mar 21, 2024

I can reproduce the issue, though the benchmarks are measuring how long it takes to print(geom_area()) rather than how long it takes to render the plot :)
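As a small illustration of why the original benchmark ended up printing the layer rather than the plot: the native pipe `|>` binds more tightly than `+`, so a trailing `|> print()` applies only to the last term of the sum. The sketch below uses base R only (R 4.1 or later for the native pipe); `a` and `b` are placeholder symbols standing in for `ggplot()` and `geom_area(...)`, not objects from this thread.

```r
# `a` and `b` are hypothetical placeholders; quote() captures the parsed
# expression without evaluating it, so neither needs to exist.
parsed <- quote(a + b |> print())

# The pipe is rewritten at parse time and binds tighter than `+`,
# so print() wraps only the right-hand operand.
deparse(parsed)
#> [1] "a + print(b)"

# Parenthesising the sum makes the pipe apply to the whole expression:
# the outermost call is now print(), not `+`.
fixed <- quote((a + b) |> print())
identical(fixed[[1]], as.name("print"))
```

In the reprex, this would mean writing `(ggplot(dat) + geom_area(aes(x = x, y = y))) |> print()`, or assigning the plot to a variable first and printing that, as in the benchmarks that follow.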

Here are more relevant benchmarks; with default options, geom_area() takes about 2 seconds whereas geom_ribbon() takes about 70 milliseconds.

library(ggplot2)

dat <- data.frame(
  x = 1:1e4,
  y = rnorm(1e4) + 5
)

area   <- ggplot(dat) + geom_area(aes(x, y))
ribbon <- ggplot(dat) + geom_ribbon(aes(x, ymin = 0, ymax = y))

ragg::agg_png(tempfile(fileext = ".png"))

res <- bench::mark(
  area   = print(area),
  ribbon = print(ribbon),
  check  = FALSE,
  min_iterations = 5
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
print(res)
#> # A tibble: 2 × 13
#>   expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 area          2.02s   2.1s     0.469     129MB    16.7      5   178      10.7s
#> 2 ribbon      72.24ms 72.6ms     9.04     24.2MB     3.62     5     2    552.8ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Both are about 70 milliseconds when stat and position are identity.

area <- ggplot(dat) + 
  geom_area(aes(x, y), stat = "identity", position = "identity")
ribbon <- ggplot(dat) +
  geom_ribbon(aes(x, ymin = 0, ymax = y), stat = "identity", position = "identity")

res <- bench::mark(
  area   = print(area),
  ribbon = print(ribbon),
  check  = FALSE,
  min_iterations = 5
)
print(res)
#> # A tibble: 2 × 13
#>   expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 area         70.6ms 72.1ms      13.8    20.1MB     2.30     6     1      435ms
#> 2 ribbon       70.2ms 71.4ms      14.0    19.5MB     5.61     5     2      356ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Created on 2024-03-21 with reprex v2.1.0

Author

japhir commented Mar 21, 2024

Nice work dude! Yeah I knew something was wrong in my benchmark, should have realized I could've just forced it to write to file ;-).
