Generative vignette #24
base: main
Hi @Accio, one thing I was wondering: in addition to the benefit of randomization, should we also show the benefit of blocking?
Thank you for the kind words! Yes, your proposal makes perfect sense. Please do it. I did indeed plan to expand on the simple example. If you take over the blocking part, I could contribute the next level, namely adding an operator/plate-batch effect on top. Does that make sense? I can then build on what you have built. Best regards, and thanks again for the encouraging words.
Thanks David, yes, I'll add that part!
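For reference, a minimal base-R sketch of what blocking could look like here. It assumes, purely for illustration, that plate rows serve as blocks: compounds are randomized to columns independently within each row, so every compound appears exactly once per row. The block choice and the object names `blocked` and `complete` are hypothetical, not part of the vignette.

```r
# Hypothetical illustration: plate rows as blocks (an assumption, not the vignette's design)
conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
set.seed(42)

# Blocked randomization: an independent random permutation of all 12 compounds in each row
blocked <- do.call(rbind, lapply(1:8, function(r) {
  data.frame(row = r, col = 1:12,
             Compound = factor(sample(conditions), levels = conditions))
}))

# Complete randomization for comparison: all 96 assignments shuffled at once
complete <- data.frame(row = rep(1:8, each = 12), col = rep(1:12, 8),
                       Compound = factor(sample(rep(conditions, 8)), levels = conditions))

# Blocking guarantees exactly one well per compound and row;
# complete randomization typically places some compounds twice or more in a row
table(blocked$row, blocked$Compound)[1:3, 1:4]
max(table(complete$row, complete$Compound))
```

For blocking to pay off, the analysis would also need to include the block term, e.g. `aov(measurement ~ Compound + factor(row))`, so that row-to-row differences are removed from the residual variance.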
Merge branch 'main' into generative-vignette

# Conflicts:
# vignettes/basic_examples.Rmd
Hi David, sorry for the delay; I didn't notice that I hadn't submitted my suggestions!
What is the difference between the two variants? Option 2 apparently involves more planning and labor than option 1. If pipetting is manual rather than robotic, option 2 is likely error-prone. So why bother considering the latter option?
Randomization pays off when unwanted variance is large enough that it may distort our estimate of the quantity we are interested in. In our example, the unwanted variance may come from the *plate effect*: due to variations in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to assess in practice because they are not known before the experiment, unless a calibration study is performed in which all cells in a microtiter plate are treated with the same condition and measured in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model* and test the effect of randomization.
Suggested change:
Randomization pays off when unwanted variance is large enough that it may distort our estimate of the quantity we are interested in. In our example, the unwanted variance may come from a *plate effect*: due to variations in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to assess in practice because they are not known before the experiment, unless a calibration study is performed in which all cells in a microtiter plate are treated with the same condition and measured in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model* and test the effect of randomization.
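A minimal base-R sketch of such a generative model, assuming the same radial plate effect as in the vignette code (0.5 times the distance of a well from the plate centre at row 4.5, column 6.5) and no treatment effect at all, so every well receives the identical treatment and any spatial pattern in `measurement` comes from the plate alone. The names `plate_effect`, `wells`, and `pe` are illustrative.

```r
# Radial plate effect (assumption carried over from the vignette code):
# 0.5 x Euclidean distance of a well from the plate centre
plate_effect <- function(row, col) 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2)

set.seed(1)
# 96 wells of an 8 x 12 plate; expand.grid varies `col` fastest
wells <- expand.grid(col = 1:12, row = 1:8)
# Identical treatment in every well: only biological noise, N(10, 1)
wells$trueEffect <- rnorm(nrow(wells), mean = 10, sd = 1)
wells$plateEffect <- plate_effect(wells$row, wells$col)
wells$measurement <- wells$trueEffect + wells$plateEffect

# Plate effect as an 8 x 12 matrix: smallest in the centre, largest in the corners
pe <- matrix(wells$plateEffect, nrow = 8, byrow = TRUE)
round(pe[, c(1, 6, 12)], 2)
```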
@@ -0,0 +1,207 @@
---
title: "On the benefits of experiment design: a generative modelling approach"
Suggested change:
title: "On the benefits of experiment design: a simulation approach"
---

In this document, we demonstrate the necessity of a proper experiment design
with a generative model. We show that a proper experiment design helps
Suggested change:
with a generative model which we use to simulate data with "batch" effects. We show that a proper experiment design helps
conds <- c("DMSO", sprintf("Compound%02d", 1:11))
dat <- data.frame(SampleIndex=1:96,
                  Compound=factor(rep(conds, 8), levels=conds),
                  rawRow=rep(1:8, each=12),
                  rawCol=rep(1:12, 8)) %>%
  mutate(trueEffect=rnorm(96, mean=10, sd=1),
         plateEffect=0.5 * sqrt((rawRow-4.5)^2 + (rawCol-6.5)^2),
         measurement=trueEffect + plateEffect)
bc <- BatchContainer$new(
  dimensions = list("plate" = 1,
                    row = list(values=LETTERS[1:8]),
                    col = list(values=sprintf("%02d", 1:12)))
)

bc <- assign_in_order(bc, dat)

head(bc$get_samples()) %>% gt::gt()
Suggested change:
set.seed(2307111)
conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
# set up samples with conditions and true effects
dat <- data.frame(SampleIndex = 1:96,
                  Compound = factor(rep(conditions, 8), levels = conditions),
                  trueEffect = rnorm(96, mean = 10, sd = 1))
# add the layout plus plate effect
dat <- dat %>%
  mutate(
    row = rep(1:8, each = 12), col = rep(1:12, 8),
    plateEffect = 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2),
    measurement = trueEffect + plateEffect)
head(dat) %>% gt::gt()
Hi @Accio, how about writing it like this?
I know that you're setting up an example here that has no treatment effect, but for me it's more intuitive to first generate the samples with the compounds and the true effects, and then generate the layout and add the plate effect.
Creating the batch container does not add much here, since it is not actually used, and it confused me because I expected some optimization or randomization to be done with it.
We can plot the plate layout directly from `dat`.
All fine with me; thanks for going through my code and improving it!
cowplot::plot_grid(
  plotlist = list(plot_plate(bc,
      plate=plate,
      row=row, column=col, .color=Compound,
      title="Layout by treatment"),
    plot_plate(bc,
      plate = plate, row = row, column = col, .color = trueEffect,
      title = "True effect"
    ),
    plot_plate(bc,
      plate = plate, row = row, column = col, .color = plateEffect,
      title = "Plate effect"
    ),
    plot_plate(bc,
      plate = plate, row = row, column = col, .color = measurement,
      title = "Measurement"
    )
  ), ncol = 2, nrow=2
)
Suggested change:
cowplot::plot_grid(
  plotlist = list(plot_plate(dat,
      plate=plate,
      row=row, column=col, .color=Compound,
      title="Layout by treatment"),
    plot_plate(dat,
      plate = plate, row = row, column = col, .color = trueEffect,
      title = "True effect"
    ),
    plot_plate(dat,
      plate = plate, row = row, column = col, .color = plateEffect,
      title = "Plate effect"
    ),
    plot_plate(dat,
      plate = plate, row = row, column = col, .color = measurement,
      title = "Measurement"
    )
  ), ncol = 2, nrow=2
)
rand_dat <- data.frame(SampleIndex=1:96,
                       Compound=sample(factor(rep(conds, 8), levels=conds)),
                       rawRow=rep(1:8, each=12),
                       rawCol=rep(1:12, 8)) %>%
  mutate(trueEffect=rnorm(96, mean=10, sd=1),
         plateEffect=0.5 * sqrt((rawRow-4.5)^2 + (rawCol-6.5)^2),
         measurement=trueEffect + plateEffect)

bc2 <- BatchContainer$new(
  dimensions = list("plate" = 1,
                    row = list(values=LETTERS[1:8]),
                    col = list(values=sprintf("%02d", 1:12)))
)

bc2 <- assign_in_order(bc2, rand_dat)
head(bc2$get_samples()) %>% gt::gt()
Suggested change:
# add the layout plus plate effect
randomized_dat <- dat %>%
  slice(sample(1:n())) %>% # shuffle the order of samples in the dataset
  mutate(
    row = rep(1:8, each = 12), col = rep(1:12, 8),
    plateEffect = 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2),
    measurement = trueEffect + plateEffect)
head(randomized_dat) %>% gt::gt()
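A base-R sketch of the same randomization step, for readers not using dplyr. It rebuilds a minimal stand-in for `dat` so that it runs on its own and assumes the 8 x 12 layout and radial plate effect from the suggestion above. Shuffling the samples and then filling wells in a fixed order amounts to drawing a random permutation of samples onto wells; each compound keeps its 8 replicates.

```r
conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
set.seed(2307111)
# Minimal stand-in for `dat`: samples with compounds and true effects only
dat <- data.frame(SampleIndex = 1:96,
                  Compound = factor(rep(conditions, 8), levels = conditions),
                  trueEffect = rnorm(96, mean = 10, sd = 1))

# Shuffle the sample order, then fill wells row by row in a fixed order
randomized_dat <- dat[sample(nrow(dat)), ]
randomized_dat$row <- rep(1:8, each = 12)
randomized_dat$col <- rep(1:12, 8)
randomized_dat$plateEffect <- 0.5 * sqrt((randomized_dat$row - 4.5)^2 +
                                         (randomized_dat$col - 6.5)^2)
randomized_dat$measurement <- randomized_dat$trueEffect + randomized_dat$plateEffect

# Every compound keeps its 8 replicates, now scattered over the columns
table(randomized_dat$Compound)
table(randomized_dat$Compound, randomized_dat$col)[1:3, ]
```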
cowplot::plot_grid(
  plotlist = list(plot_plate(bc2,
      plate=plate,
      row=row, column=col, .color=Compound,
      title="Layout by treatment"),
    plot_plate(bc2,
      plate = plate, row = row, column = col, .color = trueEffect,
      title = "True effect"
    ),
    plot_plate(bc2,
      plate = plate, row = row, column = col, .color = plateEffect,
      title = "Plate effect"
    ),
    plot_plate(bc2,
      plate = plate, row = row, column = col, .color = measurement,
      title = "Measurement"
    )
  ), ncol = 2, nrow=2
)
Suggested change:
cowplot::plot_grid(
  plotlist = list(plot_plate(randomized_dat,
      plate=plate,
      row=row, column=col, .color=Compound,
      title="Layout by treatment"),
    plot_plate(randomized_dat,
      plate = plate, row = row, column = col, .color = trueEffect,
      title = "True effect"
    ),
    plot_plate(randomized_dat,
      plate = plate, row = row, column = col, .color = plateEffect,
      title = "Plate effect"
    ),
    plot_plate(randomized_dat,
      plate = plate, row = row, column = col, .color = measurement,
      title = "Measurement"
    )
  ), ncol = 2, nrow=2
)
```{r}
randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound,
data=bc2$get_samples()))$Compound
Suggested change:
data=randomized_dat))$Compound
We can also use a boxplot as a visual aid to inspect the differences between treatments and to confirm that randomization prevents the plate effect from biasing the statistical inference.
```{r randBoxplot, fig.height=5, fig.width=5}
ggplot(bc2$get_samples(),
Suggested change:
ggplot(randomized_dat,
## Discussion and conclusions
The simple case study discussed in this vignette is an application of generative models: assuming that we know the mechanism by which the data are generated, we can simulate the data-generation process and use it for various purposes. In our case, we simulated a linear additive model of the true effects of compounds and control on cell viability and of the plate effect induced by positions in a microtitre plate. Using the model, we demonstrate that (1) the plate effect can impact statistical inference by introducing false positive (and, in other cases, false negative) findings, and (2) full randomization can guard statistical inference by reducing the effect of the plate effect.
Suggested change:
The simple case study discussed in this vignette is an application of generative models: assuming that we know the mechanism by which the data are generated, we can simulate the data-generation process and use it for various purposes. In our case, we simulated a linear additive model of the true effects of compounds and control on cell viability and of the plate effect induced by positions in a microtitre plate. Using the model, we demonstrate that (1) the plate effect can impact statistical inference by introducing false positive (and, in other cases, false negative) findings, and (2) full randomization can guard statistical inference by reducing the bias caused by the plate effect.
thanks!
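To check conclusions (1) and (2) beyond a single simulated plate, one could repeat the simulation many times and count how often Tukey's HSD reports at least one significant pairwise difference although no compound has a true effect. A base-R sketch, assuming the vignette's generative model (true effects drawn from N(10, 1) for every compound, radial plate effect scaled by 0.5); the helpers `simulate_plate()` and `any_false_positive()` and the number of simulations are hypothetical choices for illustration.

```r
# Radial plate effect as in the vignette: 0.5 x distance from the plate centre
plate_effect <- function(row, col) 0.5 * sqrt((row - 4.5)^2 + (col - 6.5)^2)

# Simulate one plate under the null: no compound has a true effect
simulate_plate <- function(conditions, randomize) {
  compound <- factor(rep(conditions, 8), levels = conditions)  # in order: compound k in column k
  if (randomize) compound <- compound[sample(length(compound))]  # random assignment to wells
  row <- rep(1:8, each = 12)
  col <- rep(1:12, 8)
  true_effect <- rnorm(96, mean = 10, sd = 1)
  data.frame(Compound = compound,
             measurement = true_effect + plate_effect(row, col))
}

# TRUE if Tukey's HSD flags at least one pair at the given level
any_false_positive <- function(d, alpha = 0.05) {
  tk <- TukeyHSD(aov(measurement ~ Compound, data = d))$Compound
  any(tk[, "p adj"] < alpha)
}

set.seed(2307111)
conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
n_sim <- 100
fp_in_order <- mean(replicate(n_sim, any_false_positive(simulate_plate(conditions, FALSE))))
fp_random <- mean(replicate(n_sim, any_false_positive(simulate_plate(conditions, TRUE))))
cat(sprintf("False-positive rate, in order: %.2f; randomized: %.2f\n",
            fp_in_order, fp_random))
```

With the in-order layout each compound occupies a single column, so compounds near the plate edges inherit a systematically larger plate effect than those in the centre columns, and the family-wise false-positive rate is expected to lie well above the nominal 5%. After randomization it should stay close to 5%. The exact numbers depend on the seed and on `n_sim`.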
Dear colleagues,
I have added a vignette using a simple generative model to show the benefit of randomization.
If you think the approach makes sense, I plan to add another one or two case studies that use more of our package's functionality to make the case. Before I do that, please check out the simple example and let me know what you think (example, coding style, wording, etc.). Criticism and suggestions are highly welcome.
Best regards, David