
Generative vignette #24

Open
wants to merge 10 commits into
base: main

@Accio Accio commented Aug 22, 2023

Dear colleagues,

I have added a vignette using a simple generative model to show the benefit of randomization.

If you think the approach makes sense, I plan to add another 1-2 case studies that use more of the functionality in our package to make the case. Before I do that, please check out the simple example and let me know what you think (example, coding style, wording, etc.). Criticism and suggestions are very welcome.

Best regards, David

@idavydov idavydov self-requested a review August 22, 2023 15:33
vignettes/generative_necessity.Rmd (two review threads, outdated and resolved)
@julianesiebourg julianesiebourg added the documentation Improvements or additions to documentation label Sep 4, 2023
@julianesiebourg
Collaborator

Hi @Accio,
thank you for the nicely written example. A simulation example was clearly missing so far.
I like how you plot the different effects on the readouts separately and then on top of each other.

One thing I was wondering is whether, in addition to the benefit of randomization, we could also show the benefit of blocking.
We can use the optimize_design function to ensure that most treatments cover most rows and columns (i.e. the randomization is really efficient), and thus get even less bias from the plate effect.
Maybe in your example it does not make much of a difference, but if the randomization is 'unlucky', you could still get many samples of Compound1 in the same column and, because of the plate gradient, still draw the wrong conclusions.
What do you think?
I'm happy to add that part!
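
The 'unlucky randomization' concern can be explored with a minimal base-R sketch. It assumes the plate-effect formula from the vignette under review and does not call `optimize_design`; the helper `spread_of_random_layout` and the seed are hypothetical choices made only for illustration. For each random layout it computes the mean plate effect per compound and records the gap between the most and the least affected compound.

```r
# Hypothetical sketch (base R only): how uneven can a purely random layout be?
set.seed(1)  # arbitrary seed, chosen only for reproducibility
conditions <- c("DMSO", sprintf("Compound%02d", 1:11))

# 8 x 12 plate; col varies fastest, matching rep(1:12, 8) in the vignette
layout <- expand.grid(col = 1:12, row = 1:8)
layout$plateEffect <- 0.5 * sqrt((layout$row - 4.5)^2 + (layout$col - 6.5)^2)

# Gap between the largest and smallest per-compound mean plate effect
# for one random assignment of compounds to wells
spread_of_random_layout <- function() {
  compound <- sample(rep(conditions, 8))
  means <- tapply(layout$plateEffect, compound, mean)
  max(means) - min(means)
}

spreads <- replicate(1000, spread_of_random_layout())
summary(spreads)

# The same gap for the ordered layout, where each compound fills one column
ordered_means <- tapply(layout$plateEffect, rep(conditions, 8), mean)
ordered_spread <- max(ordered_means) - min(ordered_means)
ordered_spread
```

The upper tail of `spreads` corresponds to the unlucky draws; a blocked or optimized layout would aim to keep this gap small by construction rather than by chance.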

@Accio
Author

Accio commented Sep 4, 2023

Thank you for the kind words! Yes, your proposal makes perfect sense. Please do it.

I did indeed plan to expand on the simple example. If you take over the blocking part, I can contribute the next level, namely adding an operator/plate-batch effect on top. Does that make sense? I can then build on what you have built.

Best regards and thanks again for the encouraging words.

@julianesiebourg
Collaborator

Thanks David, yes, I'll add that part!

julianesiebourg and others added 3 commits September 19, 2023 09:44
Merge branch 'main' into generative-vignette

# Conflicts:
#	vignettes/basic_examples.Rmd
Collaborator

@julianesiebourg julianesiebourg left a comment


Hi David, sorry for the delay; I didn't notice that I hadn't submitted my suggestions!


What is the difference between the two variants? Option 2 apparently involves more planning and labor than option 1. If manual instead of robotic pipetting is involved, option 2 is likely to be error-prone. So why bother considering the latter option?

Randomization pays off when unwanted variance is large enough that it may distort our estimate of the quantity in which we are interested. In our example, the unwanted variance may come from the *plate effect*: due to variation in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to judge in practice because they are not known prior to the experiment, unless a calibration study is performed in which all cells in a microtiter plate are treated with the same condition and measured in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model*, and test the effect of randomization.
Collaborator

Suggested change
Randomization pays off when unwanted variance is large enough that it may distort our estimate of the quantity in which we are interested. In our example, the unwanted variance may come from a *plate effect*: due to variation in temperature, humidity, and evaporation between wells in the plate, cells may respond differently to *even the same treatment*. Such *plate effects* are difficult to judge in practice because they are not known prior to the experiment, unless a calibration study is performed in which all cells in a microtiter plate are treated with the same condition and measured in order to quantify the plate effect. However, it is simple to *simulate* such plate effects *in silico* with *a generative model*, and test the effect of randomization.

@@ -0,0 +1,207 @@
---
title: "On the benefits of experiment design: a generative modelling approach"
Collaborator

Suggested change
title: "On the benefits of experiment design: a simulation approach"

---

In this document, we demonstrate the necessity of a proper experiment design
with a generative model. We show that a proper experiment design helps
Collaborator

Suggested change
with a generative model which we use to simulate data with "batch" effects. We show that a proper experiment design helps

Comment on lines +55 to +71
conds <- c("DMSO", sprintf("Compound%02d", 1:11))
dat <- data.frame(SampleIndex=1:96,
                  Compound=factor(rep(conds, 8), levels=conds),
                  rawRow=rep(1:8, each=12),
                  rawCol=rep(1:12, 8)) %>%
  mutate(trueEffect=rnorm(96, mean=10, sd=1),
         plateEffect=0.5 * sqrt((rawRow-4.5)^2 + (rawCol-6.5)^2),
         measurement=trueEffect + plateEffect)
bc <- BatchContainer$new(
  dimensions = list("plate" = 1,
                    row = list(values=LETTERS[1:8]),
                    col = list(values=sprintf("%02d", 1:12)))
)

bc <- assign_in_order(bc, dat)

head(bc$get_samples()) %>% gt::gt()
Collaborator

Suggested change
set.seed(2307111)
conditions <- c("DMSO", sprintf("Compound%02d", 1:11))
# set up samples with conditions and true effects
dat <- data.frame(SampleIndex = 1:96,
                  Compound = factor(rep(conditions, 8), levels = conditions),
                  trueEffect = rnorm(96, mean = 10, sd = 1))
# add the layout plus plate effect
dat <- dat %>%
  mutate(
    row=rep(1:8, each=12), col=rep(1:12, 8),
    plateEffect=0.5 * sqrt((row-4.5)^2 + (col-6.5)^2),
    measurement=trueEffect + plateEffect)
head(dat) %>% gt::gt()

Collaborator

Hi @Accio, how about writing it like this?
I know that you're setting up an example here that has no treatment effect, but to me it is more intuitive to first generate the samples with the compounds and the true effects, and then generate the layout and add the plate effect.

Creating the batch container does not add much here since it is not actually used, and it confused me because I thought some optimization or randomization would be done with it.
We can plot the plate layout directly from dat.

Author

All fine with me. Thanks for going through my code and improving it!

Comment on lines +79 to +97
cowplot::plot_grid(
  plotlist = list(plot_plate(bc,
                             plate=plate,
                             row=row, column=col, .color=Compound,
                             title="Layout by treatment"),
                  plot_plate(bc,
                    plate = plate, row = row, column = col, .color = trueEffect,
                    title = "True effect"
                  ),
                  plot_plate(bc,
                    plate = plate, row = row, column = col, .color = plateEffect,
                    title = "Plate effect"
                  ),
                  plot_plate(bc,
                    plate = plate, row = row, column = col, .color = measurement,
                    title = "Measurement"
                  )
  ), ncol = 2, nrow=2
)
Collaborator

Suggested change
cowplot::plot_grid(
  plotlist = list(plot_plate(dat,
                             plate=plate,
                             row=row, column=col, .color=Compound,
                             title="Layout by treatment"),
                  plot_plate(dat,
                    plate = plate, row = row, column = col, .color = trueEffect,
                    title = "True effect"
                  ),
                  plot_plate(dat,
                    plate = plate, row = row, column = col, .color = plateEffect,
                    title = "Plate effect"
                  ),
                  plot_plate(dat,
                    plate = plate, row = row, column = col, .color = measurement,
                    title = "Measurement"
                  )
  ), ncol = 2, nrow=2
)

Comment on lines +146 to +161
rand_dat <- data.frame(SampleIndex=1:96,
                       Compound=sample(factor(rep(conds, 8), levels=conds)),
                       rawRow=rep(1:8, each=12),
                       rawCol=rep(1:12, 8)) %>%
  mutate(trueEffect=rnorm(96, mean=10, sd=1),
         plateEffect=0.5 * sqrt((rawRow-4.5)^2 + (rawCol-6.5)^2),
         measurement=trueEffect + plateEffect)

bc2 <- BatchContainer$new(
  dimensions = list("plate" = 1,
                    row = list(values=LETTERS[1:8]),
                    col = list(values=sprintf("%02d", 1:12)))
)

bc2 <- assign_in_order(bc2, rand_dat)
head(bc2$get_samples()) %>% gt::gt()
Collaborator

Suggested change
# add the layout plus plate effect
randomized_dat <- dat %>%
  slice(sample(1:n())) %>% # shuffle the order of samples in the dataset
  mutate(
    row=rep(1:8, each=12), col=rep(1:12, 8),
    plateEffect=0.5 * sqrt((row-4.5)^2 + (col-6.5)^2),
    measurement=trueEffect + plateEffect)
head(randomized_dat) %>% gt::gt()
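
To make the ordered-versus-shuffled comparison concrete, here is a minimal sketch in base R only. It assumes the same simulation constants as the suggestion above; the helpers `add_plate_effect` and `count_significant_pairs` are hypothetical names introduced for illustration and are not part of the package, and `dat[sample(nrow(dat)), ]` stands in for the dplyr `slice(sample(1:n()))` call.

```r
# Hypothetical sketch: same samples, two layouts, one Tukey HSD test each
set.seed(2307111)
conditions <- c("DMSO", sprintf("Compound%02d", 1:11))

# Samples first: compounds and true effects (no compound differs from another)
dat <- data.frame(
  Compound = factor(rep(conditions, 8), levels = conditions),
  trueEffect = rnorm(96, mean = 10, sd = 1)
)

# Layout second: wells are filled row by row in the order of the samples
add_plate_effect <- function(samples) {
  samples$row <- rep(1:8, each = 12)
  samples$col <- rep(1:12, 8)
  samples$plateEffect <-
    0.5 * sqrt((samples$row - 4.5)^2 + (samples$col - 6.5)^2)
  samples$measurement <- samples$trueEffect + samples$plateEffect
  samples
}

# Number of compound pairs with an adjusted p-value below 0.05
count_significant_pairs <- function(samples) {
  tukey <- TukeyHSD(aov(measurement ~ Compound, data = samples))$Compound
  sum(tukey[, "p adj"] < 0.05)
}

ordered_dat <- add_plate_effect(dat)
randomized_dat <- add_plate_effect(dat[sample(nrow(dat)), ])

n_ordered <- count_significant_pairs(ordered_dat)
n_randomized <- count_significant_pairs(randomized_dat)
c(ordered = n_ordered, randomized = n_randomized)
```

Because the true effects are drawn from the same distribution for every compound, any pair flagged here is a false positive. The exact counts depend on the seed, so they are best read as an illustration rather than a benchmark.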

Comment on lines +165 to +183
cowplot::plot_grid(
  plotlist = list(plot_plate(bc2,
                             plate=plate,
                             row=row, column=col, .color=Compound,
                             title="Layout by treatment"),
                  plot_plate(bc2,
                    plate = plate, row = row, column = col, .color = trueEffect,
                    title = "True effect"
                  ),
                  plot_plate(bc2,
                    plate = plate, row = row, column = col, .color = plateEffect,
                    title = "Plate effect"
                  ),
                  plot_plate(bc2,
                    plate = plate, row = row, column = col, .color = measurement,
                    title = "Measurement"
                  )
  ), ncol = 2, nrow=2
)
Collaborator

Suggested change
cowplot::plot_grid(
  plotlist = list(plot_plate(randomized_dat,
                             plate=plate,
                             row=row, column=col, .color=Compound,
                             title="Layout by treatment"),
                  plot_plate(randomized_dat,
                    plate = plate, row = row, column = col, .color = trueEffect,
                    title = "True effect"
                  ),
                  plot_plate(randomized_dat,
                    plate = plate, row = row, column = col, .color = plateEffect,
                    title = "Plate effect"
                  ),
                  plot_plate(randomized_dat,
                    plate = plate, row = row, column = col, .color = measurement,
                    title = "Measurement"
                  )
  ), ncol = 2, nrow=2
)


```{r}
randMeasureDiff <- TukeyHSD(aov(measurement ~ Compound,
data=bc2$get_samples()))$Compound
Collaborator

Suggested change
data=randomized_dat))$Compound

We can also use a boxplot as a visual aid to inspect the differences between the treatments and to confirm that randomization prevents the plate effect from affecting the statistical inference.

```{r randBoxplot, fig.height=5, fig.width=5}
ggplot(bc2$get_samples(),
Collaborator

Suggested change
ggplot(randomized_dat,


## Discussions and conclusions

The simple case study discussed in this vignette is an application of generative models: assuming that we know the mechanism by which the data is generated, we can simulate the data generation process and use it for various purposes. In our case, we simulated a linear additive model of the true effects of compounds and control on cell viability and the plate effect induced by positions in a microtiter plate. Using the model, we demonstrate that (1) the plate effect can impact statistical inference by introducing false positive (and in other cases, false negative) findings, and (2) a full randomization can guard statistical inference by reducing the effect of the plate effect.
Collaborator

Suggested change
The simple case study discussed in this vignette is an application of generative models: assuming that we know the mechanism by which the data is generated, we can simulate the data generation process and use it for various purposes. In our case, we simulated a linear additive model of the true effects of compounds and control on cell viability and the plate effect induced by positions in a microtiter plate. Using the model, we demonstrate that (1) the plate effect can impact statistical inference by introducing false positive (and in other cases, false negative) findings, and (2) a full randomization can guard statistical inference by reducing the bias of the plate effect.
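
For reference, the linear additive model described in this paragraph can be written out as follows. The constants are taken from the simulation code in this pull request; the symbols $y_{rc}$, $t_{rc}$ and $p_{rc}$ are notation introduced here for illustration and do not appear in the vignette.

```latex
% Measurement in the well at row r (1..8) and column c (1..12)
y_{rc} = t_{rc} + p_{rc}, \qquad
t_{rc} \sim \mathcal{N}(10,\, 1^2), \qquad
p_{rc} = 0.5 \sqrt{(r - 4.5)^2 + (c - 6.5)^2}
```

Here $t_{rc}$ is the true effect, drawn independently of the compound, and $p_{rc}$ is the plate effect, which grows with the distance of the well from the plate center at $(4.5, 6.5)$.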

Author

@Accio Accio left a comment

thanks!

Labels
documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

Expand generative model vignette with simple optimize design call
2 participants