Do not coerce to factor in tbl_svysummary() #1602

Open
ddsjoberg opened this issue Feb 9, 2024 · 8 comments

Comments

@ddsjoberg
Owner

@larmarange see this SO post: https://stackoverflow.com/questions/77957551

Is this something we should address? It seems that the survey method for subset() doesn't remove rows but sets their weights to 0, so users can't remove unobserved levels from by variables in tbl_svysummary().

@larmarange
Collaborator

Because the default (t.test) is not implemented for tbl_svysummary(). You should use "smd", cf. https://www.danieldsjoberg.com/gtsummary/reference/tests.html#tbl-svysummary-add-difference-

Currently, add_difference() does not change the default tests when applied to a tbl_svysummary()

@ddsjoberg
Owner Author

I was thinking more about the tbl_svysummary() table itself. The unobserved columns appear in the table, even if we make the underlying column character.

library(gtsummary)
library(PNSIBGE)

pns <- get_pns(year = 2019, labels = TRUE)
pns.2 <- subset(pns, C009 %in% c("Branca", "Preta"))
pns.2$variables$C009 <- as.character(pns.2$variables$C009)

pns.2 |> 
  gtsummary::tbl_svysummary(by = C009, include = c(C006)) |> 
  gtsummary::as_kable()
| Characteristic | Amarela, N = 0 | Branca, N = 91,037,722 | Ignorado, N = 0 | Indígena, N = 0 | Parda, N = 0 | Preta, N = 21,786,515 |
|---|---|---|---|---|---|---|
| C006 | | | | | | |
| Homem | 0 (NA%) | 42,682,905 (47%) | 0 (NA%) | 0 (NA%) | 0 (NA%) | 10,691,164 (49%) |
| Mulher | 0 (NA%) | 48,354,817 (53%) | 0 (NA%) | 0 (NA%) | 0 (NA%) | 11,095,351 (51%) |

But I just tried to tabulate directly with the survey package, and it still shows all levels, even when the column has previously been converted to a character.


So what they are dealing with is a non-standard situation, and they'd just need to write their own method in add_stat() for this, and hide the unobserved columns themselves.

@larmarange
Collaborator

Probably because somewhere the levels are still declared. pns.2$variables$C009 <- as.character(pns.2$variables$C009) did not change metadata stored within the survey object.

It is much safer to use fct_drop() through srvyr::mutate().

@larmarange
Collaborator

But a question remains open: if this is a tbl_svysummary table, should we apply, by default, a relevant test?

@ddsjoberg
Owner Author

Even after dropping the levels with srvyr, the unobserved levels still appear in the output of the survey function.

pns.2 <- 
  srvyr::as_survey_design(pns) |> 
  srvyr::filter(C009 %in% c("Branca", "Preta")) |> 
  srvyr::mutate(C009 = as.character(C009))

survey::svytable(~C009, pns.2)
#> C009
#>  Amarela   Branca Ignorado Indígena    Parda    Preta 
#>        0 91037722        0        0        0 21786515 
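
One possible explanation, consistent with the zero-weight behaviour of subset() mentioned at the top of this thread: if the filtered-out rows are still present in the data and only their weights are 0, a weighted tabulation will keep a cell for every value, whether the column is a factor or a character. The base R sketch below only mimics that situation with a toy data frame; the names d, g, w and tab are made up for illustration, and it does not use the survey package or claim to reproduce its internals.

```r
# Toy data: a character grouping column and a weight column (hypothetical names).
d <- data.frame(
  g = c("a", "a", "b", "c"),
  w = c(2, 3, 5, 4),
  stringsAsFactors = FALSE
)

# Mimic a subset that keeps every row but zeroes the weights outside the subset.
d$w[!d$g %in% c("a", "b")] <- 0

# Weighted tabulation: "c" still gets a cell, with a total of 0,
# because its row is still present in the data.
tab <- xtabs(w ~ g, data = d)
print(tab)
```

If that is what is happening here, neither as.character() nor fct_drop() would be expected to help, since the values are still observed in the rows that were kept with zero weight.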

@ddsjoberg
Owner Author

But, yes, a better default is warranted!

@ddsjoberg ddsjoberg reopened this Feb 9, 2024
@larmarange
Collaborator

If I remember correctly, as.character() keeps the levels attribute, while forcats::fct_drop() removes unobserved levels.

@ddsjoberg
Owner Author

Same issue with forcats::fct_drop(), unfortunately.
