Column_Spec time consuming for large data frame #817

Open
FrankYang1995 opened this issue Feb 9, 2024 · 4 comments

Comments

@FrankYang1995

Hi, loving this package. I found that the column_spec function takes a very long time to execute when the kable is built from a large data frame. I have a data.frame with 10k rows that I want to convert to PDF. When I use column_spec to change the column width, it seems to iterate over all rows. With 100 rows it runs very fast; with 10k rows it takes hours. I need to change the column width because I have 10+ columns, and without it one page is not wide enough to hold all the information.

Did I do something wrong? Or is kableExtra not meant for large data sets?

@dmurdoch
Collaborator

dmurdoch commented Feb 9, 2024

Could you show us some code that generates a table like the one you're working with, and then runs your formatting instructions on it, i.e. a reproducible example? Certainly 10k rows in a table is unusual (your PDF will have hundreds of pages), but perhaps if we have a working example we can spot where things could be improved.

@haozhu233
Owner

Yeah, some of the code there is probably not efficient enough, and sometimes that is determined by the mechanism of this package. Here is an example. I will take a deeper look at it later.

library(kableExtra)
library(dplyr)

big_mtcars = list()
for (i in 1:100) {
  big_mtcars[[i]] = mtcars
}
big_mtcars = bind_rows(big_mtcars)

aaa = kbl(big_mtcars, 'html') %>%
  column_spec(1, width='2in')
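
To see how the cost grows with table size, the example above can be timed at a few sizes. This is only a rough sketch: `time_column_spec` is a hypothetical helper written for illustration (it is not part of kableExtra), the sizes are arbitrary, and the timings will vary by machine and package version.

```r
library(kableExtra)

# Hypothetical helper (not part of kableExtra): stack `copies` copies of
# mtcars, build an HTML kable, and time a single column_spec() call on it.
time_column_spec <- function(copies) {
  big <- do.call(rbind, replicate(copies, mtcars, simplify = FALSE))
  tab <- kbl(big, "html")
  system.time(column_spec(tab, 1, width = "2in"))[["elapsed"]]
}

# Arbitrary small sizes so the sketch finishes quickly; increase with care.
sizes <- c(1, 5, 25)
timings <- vapply(sizes, time_column_spec, numeric(1))

# Elapsed seconds per table size
data.frame(rows = sizes * nrow(mtcars), seconds = timings)
```

If the elapsed time grows faster than the row count, that would be consistent with the slowdown reported in this issue.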

Anyway, my honest opinion is that when you have that many rows, you should think about a different way to present the data (e.g. plotting, or providing some kind of summary or reduction). A table, as one of the final forms of presentation, should only contain distilled information. Going through a 100-page table is just not much fun.
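
As one possible illustration of the "summary or reduction" idea, a large table can be collapsed to a few rows before it is passed to kbl. The grouping variable and the statistics below are assumptions chosen for the built-in mtcars data, not something prescribed in this thread.

```r
library(dplyr)
library(kableExtra)

# Reduce the data to one row per cylinder count (illustrative choice).
summary_tbl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    mean_mpg = mean(mpg),
    mean_hp = mean(hp),
    .groups = "drop"
  )

# The resulting table is small enough that formatting it is cheap.
kbl(summary_tbl, format = "latex", booktabs = TRUE, digits = 1)
```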

@FrankYang1995
Author

Hi, sorry for the delay. Here is the code I used:

library(kableExtra)

kable <- kable(
  x = data,
  format = "latex",
  align = "c",
  caption = "",
  escape = FALSE,
  booktabs = TRUE,
  longtable = TRUE,
  linesep = "")

# This line costs a lot of time
kable <- column_spec(kable, 1, width = "10")

I know there are too many rows, but for some reason I really need to list all the data this way. Do you have any other suggestions for doing this? I need to list thousands of lines in a PDF/Word document.

Thanks for helping

@haozhu233
Owner

haozhu233 commented Mar 11, 2024

out <- kable(
  x = data,
  format = "latex",
  align = c('>{\\raggedright\\arraybackslash}p{10cm}', 'c', 'c'), # depending on how many columns you have and whether you have row names
  caption = "",
  escape = FALSE,
  booktabs = TRUE,
  longtable = TRUE,
  linesep = "")
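
The snippet above appears to set the column width directly in the LaTeX column specification through `align`, so no column_spec call is needed afterwards. A minimal sketch of building that `align` vector for any number of columns follows. The stand-in `data`, the 10cm width, and `row.names = FALSE` are assumptions made for illustration, and the `>{...}` syntax relies on the LaTeX `array` package being loaded in the document.

```r
library(knitr)

# Stand-in for the real data (assumption): 10,000 rows, 3 columns.
data <- data.frame(
  description = rep("some long text", 10000),
  a = seq_len(10000),
  b = seq_len(10000)
)

# One column specification per column: a fixed-width, left-aligned first
# column, and centered columns for everything else.
col_align <- c(
  ">{\\raggedright\\arraybackslash}p{10cm}",
  rep("c", ncol(data) - 1)
)

out <- kable(
  x = data,
  format = "latex",
  align = col_align,
  row.names = FALSE,  # keeps the number of columns equal to ncol(data)
  booktabs = TRUE,
  longtable = TRUE,
  linesep = ""
)
```

Because the width is written once into the table header instead of being applied to the finished table, this route should avoid the slow step reported earlier in the thread.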


3 participants