Commit 78e6ea1

website

DominiqueMakowski committed May 26, 2023
1 parent 9aef1f1
Showing 50 changed files with 14,887 additions and 266 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -77,4 +77,4 @@ $RECYCLE.BIN/
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
.apdisk
285 changes: 20 additions & 265 deletions README.Rmd
@@ -4,14 +4,15 @@ editor_options:
chunk_output_type: console
---

# driftR
# easyRT

*Tools and examples for fitting (Hierarchical) Drift Diffusion Models in R*
*Tools and examples for modelling Reaction Times in R*

**[WORK IN PROGRESS]**

```{r setup, echo = FALSE, warning=FALSE, message=FALSE}
options(digits = 3)
library(easyRT)
knitr::opts_chunk$set(
collapse = TRUE,
@@ -22,289 +23,43 @@ knitr::opts_chunk$set(
set.seed(111)
```

## Installation

```{r eval=FALSE}
remotes::install_github("DominiqueMakowski/driftR")
library(driftR)
```


## Motivation

This repo started as my attempt at understanding and implementing sequential models, starting with (Hierarchical) Drift Diffusion Models (DDMs) for reaction times in R. **Please don't hesitate** to open an issue to discuss and suggest things that could be improved or clarified.

I tried to use HDDM and PyDDM in Python, and had a look at R alternatives, but I didn't find any solution that was easy to use out-of-the-box.

In particular, a massive thanks to @singmann, who wrote [this blogpost](http://singmann.org/wiener-model-analysis-with-brms-part-i/) and the [rtdists](https://github.com/rtdists/rtdists/) and [fddm](https://github.com/rtdists/fddm) packages.

## Theory

```{r echo=FALSE, fig.cap="Illustration of the Wiener diffusion model for two-choice reaction times", message=FALSE, warning=FALSE}
knitr::include_graphics("man/figures/wiener_example.webp")
```

*A graphical illustration of the Wiener diffusion model for two-choice reaction times. An evidence counter starts at value $\alpha \cdot \beta$ and evolves with random increments. The mean increment is $\delta$. The process terminates as soon as the accrued evidence exceeds $\alpha$ or deceeds 0. The decision process starts at time $\tau$ from the stimulus presentation and terminates at the reaction time. [This figure and caption are taken from Wabersich and Vandekerckhove (2014, The R Journal, CC-BY license).]*

DDMs are based on the Wiener distribution, which is defined by 4 parameters (illustrated in the simulation sketch below):

- **drift**: The **drift rate** (delta $\delta$) is the average slope of the accumulation process towards the boundaries. The larger the (absolute value of the) drift rate, the more effective the evidence accumulation for the corresponding response option. A drift rate close to 0 suggests an ambiguous stimulus. Typical range: [-5, 5].
- **bs**: The **boundary separation** threshold (alpha $\alpha$) is the distance between the two decision bounds and is interpreted as a measure of response caution (i.e., of the speed-accuracy trade-off, with high *bs* being related to high accuracy). It represents the amount of evidence that is needed to make a response. Typical range: [0.5, 2].
- **bias**: The initial **bias** (beta $\beta$) towards one of the responses, i.e., the starting point of the accumulation process. Typical range: [0.3, 0.7].
- **ndt**: The **non-decision time** (tau $\tau$) captures all non-decisional processes, such as stimulus encoding, motor processes, etc. Typical range: [0.1, 0.5] s.

Additional variability parameters can include:

- **ndt_var**: The inter-trial variability of the non-decision time, also called `st0`. Typical range: [0, 0.2] s.
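
To make these parameters more concrete, here is a minimal simulation sketch (base R only, simple Euler discretization; `simulate_trajectory` is a hypothetical helper written for illustration, not a function of this package) showing how a single accumulation trajectory arises from *drift*, *bs*, *bias* and *ndt*:

```{r eval=FALSE}
# Hypothetical illustration (not part of the package): simulate one noisy
# evidence-accumulation trajectory between the boundaries 0 and bs.
simulate_trajectory <- function(drift = 1, bs = 1, bias = 0.5, ndt = 0.2, dt = 0.001) {
  evidence <- bs * bias  # starting point: alpha * beta
  path <- evidence
  decision_time <- 0
  while (evidence > 0 && evidence < bs) {
    evidence <- evidence + drift * dt + rnorm(1, sd = sqrt(dt))  # noisy increment
    path <- c(path, evidence)
    decision_time <- decision_time + dt
  }
  list(rt = ndt + decision_time,  # observed RT = non-decision time + decision time
       response = ifelse(evidence >= bs, "upper", "lower"),
       path = path)
}

traj <- simulate_trajectory(drift = 1, bs = 1, bias = 0.5, ndt = 0.2)
plot((seq_along(traj$path) - 1) * 0.001, traj$path, type = "l",
     xlab = "Decision time (s)", ylab = "Accumulated evidence")
```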

## Simple Example

In this example, we are going to simulate data for 4 conditions with *known* parameters, and we will investigate how we can then model and recover these parameters.

### Data

```{r warning=FALSE, message=FALSE}
library(tidyverse)
library(easystats)
library(patchwork)
library(brms)
library(driftR)
sim <- ddm_data(n = c(200, 200, 200, 200),
drift = c(-1, 0, 1, 2),
bs = 1,
bias = c(0.4, 0.5, 0.6, 0.7),
ndt = 0.2)
ddm_plot(sim)
```

- Condition 1: $drift~\delta = -1,~bias~\beta = 0.4$
- Condition 2: $drift~\delta = 0,~bias~\beta = 0.5$
- Condition 3: $drift~\delta = 1,~bias~\beta = 0.6$
- Condition 4: $drift~\delta = 2,~bias~\beta = 0.7$

The boundary separation (*bs*) and the non-decision time (*ndt*) have been fixed to 1 and 0.2 (respectively) for all conditions. Let's visualize the raw data and the theoretical distribution it comes from.


```{r warning=FALSE, message=FALSE}
df <- sim$data
head(df)
```

### FDDM


```{r warning=FALSE, message=FALSE, eval=FALSE}
model <- fddm::ddm(rt + response ~ condition, data = df)
summary(model)
```

Unfortunately, `fddm` doesn't seem to work and throws the following error:

```
Error in nlminb(start = init, objective = objective, gradient = gradient, :
NA/NaN gradient evaluation
```
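
As a possible (untested) debugging step, one could check whether a particular condition is responsible for the failed gradient evaluation by fitting an intercept-only model to each condition separately:

```{r eval=FALSE}
# Hypothetical debugging sketch: fit each condition separately and see
# which fits fail (try() keeps the loop going if one of them errors)
models <- lapply(split(df, df$condition), function(d) {
  try(fddm::ddm(rt + response ~ 1, data = d))
})
models
```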

### Formula

Let's start by modelling only the drift rate (the `0 + condition` formula estimates one drift rate per condition, without an intercept).

```{r warning=FALSE, message=FALSE}
formula <- bf(rt | dec(response) ~ 0 + condition)
family <- wiener(link_bs = "identity",
link_ndt = "identity",
link_bias = "identity")
```

@singmann [gives this](http://singmann.org/wiener-model-analysis-with-brms-part-i/) rationale for setting an `identity` link:

> Because the drift rate can take on any value (i.e., from -Inf to Inf), the default link function is "identity" (i.e., no transformation) which we retain. The other three parameters all have a restricted range. The boundary needs to be larger than 0, the non-decision time needs to be larger than 0 and smaller than the smallest RT, and the starting point needs to be between 0 and 1. The default link-functions respect these constraints and use "log" for the first two parameters and "logit" for the bias. This certainly is a possibility, but has a number of drawbacks leading me to use the "identity" link function for all parameters. First, when parameters are transformed, the priors need to be specified on the untransformed scale. Second, the individual-levels deviations (i.e., the random-effects estimates) are assumed to come from a multivariate normal distribution. Parameters transformations would entail that these individual-deviations are only normally distributed on the untransformed scale. Likewise, the correlations of parameter deviations across parameters would also be on the untransformed scale. Both make the interpretation of the random-effects difficult.

> When specifying the parameters without transformation (i.e., link = "identity") care must be taken that the priors places most mass on values inside the allowed range. Likewise, starting values need to be inside the allowed range. Using the identity link function also comes with drawbacks discussed at the end. However, as long as parameter outside the allowed range only occur rarely, such a model can converge successfully and it makes the interpretation easier.
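
For contrast, the defaults mentioned in the quote (a `log` link for `bs` and `ndt`, a `logit` link for `bias`) would correspond to something like this (shown only for comparison; `family_default` is just an illustrative name):

```{r eval=FALSE}
# Default link functions, for comparison with the identity links used above
family_default <- wiener(link_bs = "log",
                         link_ndt = "log",
                         link_bias = "logit")
```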

### Priors

- **b**: Our priors on the effect of conditions on the drift rate are centred around 0 and give enough probability mass to plausible values (the typical range for the drift rate being [-5, 5]). A Student's *t* distribution has fatter tails than a Gaussian one, dealing better with outliers (a Cauchy distribution would also be a good alternative).
- **ndt**: Our prior on the non-decision time is a *gamma* distribution, which naturally excludes 0. We set it slightly skewed to the left, so that most of the mass covers the 0-0.5 s range (we don't expect non-decision processes to take more than 0.5 s; see the quick check below). Note the upper bound (`ub`) set to `min_Y`, the smallest observed RT, since the non-decision time cannot exceed any observed response time.
- **bs**: Our prior on the boundary separation is a wider *gamma* distribution.
- **bias**: Our prior on the bias is a *beta* distribution (naturally bounded at ]0, 1[) centred around 0.5.
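
As a quick sanity check (plain base R, not part of the modelling pipeline), we can look at how much mass the `gamma(1.5, 3)` prior actually places below 0.5 s:

```{r eval=FALSE}
# Density of the gamma(1.5, 3) prior used for the non-decision time
curve(dgamma(x, shape = 1.5, rate = 3), from = 0, to = 1.5,
      xlab = "Non-decision time (s)", ylab = "Prior density")
abline(v = 0.5, lty = 2)  # reference line at 0.5 s
pgamma(0.5, shape = 1.5, rate = 3)  # prior probability that ndt < 0.5 s
```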


```{r warning=FALSE, message=FALSE}
# get_prior(formula, data = df, family = family)
prior <- c(
prior("student_t(3, 0, 1.5)", class = "b"),
set_prior("gamma(1.5, 3)", class = "ndt", ub = "min_Y"),
set_prior("gamma(3, 2)", class = "bs"),
set_prior("beta(1.3, 1.3)", class = "bias")
) |> brms::validate_prior(formula,
family = family,
data = df)
prior_data <- brms:::prepare_print_prior(prior) |>
mutate(ub = ifelse(ub == "min_Y", min(df$rt), ub),
Parameter = ifelse(coef != "", paste0(class, "_", coef), class)
) |>
filter(Parameter != "b") |>
ggdist::parse_dist()


prior_data |>
ggplot(aes(y = Parameter, xdist = .dist_obj)) +
ggdist::stat_halfeye(geom = "slab", n = 10001, normalize="xy") +
# geom_vline(xintercept = 0.3) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
coord_cartesian(xlim = c(-5, 5)) +
labs(title = "Priors") +
theme_minimal()
```

### Sampling

```{r warning=FALSE, message=FALSE}
init_func <- function(chain_id=1) {
  list(bias = 0.5, bs = 1, ndt = 0.2, b = rep(0, 4))
}

# str(m$fit@inits)

m <- brm(formula,
data = df,
family = family,
prior = prior,
algorithm = "fullrank",
init = init_func,
# Options
iter = 10000, # N of convergence iterations, defaults to 2000.
output_samples = 4000, # N of posterior samples to draw and save, defaults to 1000.
elbo_samples = 100, # N of samples for Monte Carlo estimate of ELBO (objective function), defaults to 100.
tol_rel_obj = 0.01, # convergence tolerance on the relative norm of the objective, defaults to 0.01.
importance_resampling = TRUE) # adjust the draws at the optimum to be more like draws from the posterior distribution
```
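
For reference, the same model could also be estimated with full MCMC sampling instead of variational inference (`algorithm = "fullrank"`), which is slower but generally more reliable. A sketch (the number of chains, cores and iterations is illustrative):

```{r eval=FALSE}
m_mcmc <- brm(formula,
              data = df,
              family = family,
              prior = prior,
              init = init_func,
              chains = 4, cores = 4, iter = 2000)
```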


```{r warning=FALSE, message=FALSE}
m
as.data.frame(m) |>
select(starts_with("b_"), bs, ndt, bias) |>
datawizard::data_to_long(names_to="Parameter", rows_to = "iter") |>
ggplot(aes(y = Parameter)) +
ggdist::stat_halfeye(data = prior_data, aes(xdist = .dist_obj), geom = "slab", n = 10001, normalize="xy") +
# ggdist::stat_dotsinterval(aes(x = value, fill = Parameter), color = "black", slab_linewidth = NA, scale = 1, dotsize = 2, normalize = "xy") +
ggdist::stat_slabinterval(aes(x = value, fill = Parameter), slab_linewidth = NA, scale = 1, normalize = "xy") +
geom_vline(xintercept = 0, linetype = "dashed") +
coord_cartesian(xlim = c(-2.5, 5))
```

### Posterior Predictive Check

```{r warning=FALSE, message=FALSE}
brms::pp_check(m)
df$Predicted <- as.numeric(get_predicted(m, predict = "prediction", iterations = 100))
ddm_plot(mutate(df, rt = Predicted), density = sim$density)
```
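
Since the data were simulated with known parameters (drift rates of -1, 0, 1 and 2), we can also compare the recovered condition-specific drift rates against the true values, for instance (a sketch):

```{r eval=FALSE}
# True drift rates used in ddm_data(): -1, 0, 1, 2
fixef(m)  # posterior summaries of the condition-specific drift rates
```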
## Content

Reaction times (RTs) have traditionally been modelled using linear models (e.g., ANOVAs). However, this is problematic because RTs are **not** normally distributed. A popular mitigation method is to **transform** the data (e.g., using a log-transform), but it is **not a good idea** ([Schramm & Rouder, 2019](https://doi.org/10.31234/osf.io/9ksa6)). Instead, one should use statistical models that **describe** or **generate** RT-like data.

You should start by reading:

- [**Lindelov's overview of RT models**](https://lindeloev.github.io/shiny-rt/): An absolute must-read.
- [**De Boeck & Jeon (2019)**](https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00102/full): A paper providing an overview of RT models.

This repository contains the following vignettes:

- DDMs
- Ex-Gaussian

## Installation

```{r eval=FALSE}
remotes::install_github("DominiqueMakowski/easyRT")

library(easyRT)
```

## What does this package do?

Not much. It is mostly about its vignettes, but it also implements some convenience functions to generate and plot drift diffusion models.

```{r warning=FALSE, message=FALSE}
sim <- ddm_data(drift = 1, bs = 1, bias = 0.5, ndt = 0.2)

ddm_plot(sim)
```

## Better Model


### Formula

```{r warning=FALSE, message=FALSE}
df$condition <- as.numeric(df$condition)
formula <- bf(rt | dec(response) ~ condition,
bias ~ condition)
family <- wiener(link_bs = "identity",
link_ndt = "identity",
link_bias = "identity")
```

### Priors

```{r warning=FALSE, message=FALSE}
# get_prior(formula, data = df, family = family)
prior <- c(
prior("student_t(3, 0, 1.5)", class = "b", dpar = ""),
prior("normal(0, 0.1)", class = "b", dpar = "bias", lb = -1, ub = 1),
prior("student_t(2, 0, 2)", class = "Intercept", dpar = ""),
set_prior("beta(2, 2)", class = "Intercept", dpar = "bias", lb = 0, ub = 1),
set_prior("gamma(1.5, 3)", class = "ndt"),
set_prior("gamma(2, 1.5)", class = "bs")
) |> brms::validate_prior(formula,
family = family,
data = df)
prior_data <- brms:::prepare_print_prior(prior) |>
filter(!(class == "b" & coef == "")) |>
mutate(ub = ifelse(ub == "min_Y", min(df$rt), ub),
Parameter = c("b_condition", "b_bias_condition", "bs", "b_Intercept", "b_bias_Intercept", "ndt")) |>
ggdist::parse_dist()
prior_data |>
ggplot(aes(y = Parameter, xdist = .dist_obj)) +
ggdist::stat_halfeye(geom = "slab", n = 10001, normalize="xy") +
# geom_vline(xintercept = 0.3) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
coord_cartesian(xlim = c(-5, 5)) +
labs(title = "Priors") +
theme_minimal()
```

### Sampling

```{r warning=FALSE, message=FALSE}
# stancode(m)
init_func <- function(chain_id=1) {
list(bs = 1, ndt = 0.1,
b = 0, Intercept = 0,
Intercept_bias = 0.5, b_bias_Intercept = 0, b_bias = 0)
}
m <- brm(formula,
data = df,
family = family,
prior = prior,
algorithm = "fullrank",
init = init_func,
sample_prior = "only",
# Options
iter = 10000, # N of convergence iterations, defaults to 2000.
output_samples = 4000, # N of posterior samples to draw and save, defaults to 1000.
elbo_samples = 100, # N of samples for Monte Carlo estimate of ELBO (objective function), defaults to 100.
tol_rel_obj = 0.0095, # convergence tolerance on the relative norm of the objective, defaults to 0.01.
importance_resampling = TRUE) # adjust the draws at the optimum to be more like draws from the posterior distribution
str(m$fit@inits)
```
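
Note that because `sample_prior = "only"` was used, the model above is estimated from the priors alone; its predictions can therefore serve as a prior predictive check before fitting the model to the data (a sketch):

```{r eval=FALSE}
# Prior predictive check: RT distributions implied by the priors alone
brms::pp_check(m)
```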

```{r warning=FALSE, message=FALSE}
m
names(as.data.frame(m))
as.data.frame(m) |>
select(starts_with("b_"), bs, ndt) |>
datawizard::data_to_long(names_to="Parameter", rows_to = "iter") |>
ggplot(aes(y = Parameter)) +
ggdist::stat_halfeye(data = prior_data, aes(xdist = .dist_obj), geom = "slab", n = 10001, normalize="xy") +
ggdist::stat_slabinterval(aes(x = value, fill = Parameter), slab_linewidth = NA, scale = 1, normalize = "xy") +
geom_vline(xintercept = 0, linetype = "dashed") +
coord_cartesian(xlim = c(-2.5, 5))
```
12 changes: 12 additions & 0 deletions _pkgdown.yml
@@ -0,0 +1,12 @@
url: ~
template:
  bootstrap: 5

navbar:
  left:
    - text: Functions
      href: reference/index.html
    - text: "DDMs"
      href: articles/ddm.html
    - text: "Ex-Gaussian"
      href: articles/exgaussian.html