You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I tried to make a summary of all the code in a structured and unified manner for the exam. Comment if you think something is missing or should be added!
Analyse the posterior distribution and summarize the results
General
The following libraries must be activated for all the
methods we use. Set a seed to make your results reproducible.
## General ##
library(rstanarm)
library(rstan)
library(loo)
library(bayesplot)
library(ggplot2)
set.seed(123)
Distributions
Some useful distributions and how to specify them in R. Mainly for visualization purposes.
## Distributions ##
a <- 2
b <- 2
curve(expr = dbeta(x, shape1 = a, shape2 = b), from = 0, to = 1, add = F) # Show the density of a beta-distribution
df <- 3 # degrees of freedom
curve(expr = dt(x, df= df), from = -3, to = 3, add = F) # Show the density of a Students-t-distribution
Parameter sampling
## Parameter sampling ##
################
#### Modify ####
################
sample <- # Our sample parameters
mean(sample); sd(sample) # Main statistics
quantile(sample, probs = c(0.025, 0.25, 0.5, 0.75, 0.975)) # Distribution quantiles
hist(sample, breaks = 90) # Histogram for sample distribution
plot(sample) # Show the parameter for each iteration, indicates convergence
sd(sample)/sqrt(length(sample)) # Monte Carlo standard error, indicates accuracy of chains (e.g. how big the estimation noise is)
Data overview
## Data overview ##
################
#### Modify ####
################
data <- # Our data
str(data)
pairs(data)
Fit models
Here are several examples for different model specifications. The exact specification depends on the exercise.
This includes especially the modification of the keywords family, prior, prior_intercept and prior_aux.
Fixed-effects models
Note: Different model equations are possible, depending on the question
y ~ 1 (Fixed effects random intercept)
y ~ x1 + x2 (Fixed effects random internships and covariates)
## Fit Models ##
################
#### Modify ####
################
mod_equation <- # Adjust the model equation for what is asked in the exercises
model_default <- stan_glm(mod_equation, data = data, family = "gaussian")
model_flat <- stan_glm(mod_equation,
data = data,
family = "gaussian",
prior = NULL)
model_adjusted <- stan_glm(mod_equation,
data = data,
family = "Gamma",
prior = normal(0, 100),
prior_intercept = normal(0, 100),
prior_aux = cauchy(0, 1))
Random-effects model
Note: Different model equations are possible, depending on the question
y ~ (1|group) (Random intercept, thus random effects model)
y ~ x1 + x2 + (1 + x1|group) (Random intercept + random slope for x1, covariates x1 and x2)
## Fit Models ##
################
#### Modify ####
################
mod_equation_er <- # Adjust the model equation for what is asked in the exercises
model_default_er <- stan_lmer(mod_equation_er, data = data)
model_flat_er <- stan_glm(mod_equation_er,
data = data,
family = "poisson",
prior = NULL)
model_adjusted_er <- stan_glm(mod_equation_er,
data = data,
family = "binomial",
prior = normal(0, 100),
prior_intercept = normal(0, 100),
prior_aux = cauchy(0, 1))
Model information
## Model information ##
################
#### Modify ####
################
model <- ; # Adjust for the model you want to investigate
#### Retrieve model information ####
prior_summary(model) # Get information about the specified and adjusted priors
summary(model) # Get information about coefficient estimations and algorithm statistics like R_hat and N_eff
Model diagnostics
Convergence check
The stan_trace plot show the time series over the iterations for each chain.
Every parameter should converge after some iterations. If not, try to include
more iterations to the model fit through the argument "iter"
The stan_ac plots the autocorrelation between the elements of each parameter for each chain.
If a high autocorrelation is present somewhere, change argument "keep_everywhere" integer
higher than 1.
R_hat
n_eff Effective number of iterations -> Formula: number of iterations / sum(autocorrelation)
## Model convergence ##
################
#### Modify ####
################
pars <- ; # Add the parameters like c("par1", "par2") that are of interest
#### Check convergence ####
stan_trace(model, pars = pars, nrow = length(pars), ncol = 1) # 1. Trace plot
stan_ac(model, pars = pars) # 2. Autocorrelation plot
#### Alternative: plot(model, plotfun = "ac") ####
summary(model) # 3. R_hat and 4. n_eff
plot(model, plotfun = "rhat") # 3. R_hat
plot(model, plotfun = "neff") # 4. n_eff
#### Update model ####
# In case the model did not converge or you are not sure
iterations = 5000 # Modify and increase if it does not converge or decrease if it take too long
model <- update(model, iter=iterations)
Posterior Predictive Distribution
Check the appropriateness of the model. This is done by
comparing generated samples to the original data. We have
as many sample datasets as iterations in the model.
## Posterior predictive distribution ##
################
#### Modify ####
################
model <- # The model to investigate
y <- data$y # Enter the response variable here (the original data)
#### Sample generation ####
y_tilde <- posterior_predict(model) # Generate new sample from the model
#### Checks ####
# Density comparison of the sample datasets and original data
ppc_dens_overlay(y = y, yrep = y_tilde[100:200,]) # Different ranges possible
# Posterior predictive checks for main statistics in the samples compared to original data
ppc_stat(y = y, yrep = y_tilde, stat = "mean")
ppc_stat(y = y, yrep = y_tilde, stat = "sd")
ppc_stat(y = y, yrep = y_tilde, stat = "max")
ppc_stat(y = y, yrep = y_tilde, stat = "min")
Model Comparison
## Model comparison ##
################
#### Modify ####
################
alternative_model <- ;
#### Compare via WAIC, the lower the better ####
waic(model)
waic(alternative_model)
Posterior Inference
Explore the posterior distribution
and draw conclusions about the estimated parameters.
## Posterior Inference ##
#### Parameter posterior distributions ####
summary(model, digits = 3) # Displays the main statistics of the estimates (like mean)
plot(model, pars = pars) # All the parameter boxplots specified in pars, suppress pars to get all parameters
stan_hist(model, pars = pars) # Displays the whole posterior distribution of the parameters
#### Credibility intervals for the parameters ####
prob = 0.8 # The probability that the parameter is included in the interval
posterior_interval(model, prob = prob, pars = pars) # Gives back the credibility interval of the parameters in pars
#### Linear predictor posterior distribution ####
mu <- posterior_linpred(model, transform = T) # Get draws from linear predictor (mod_equation) for all samples, transformed by link function
#### Explore observation i
# Here we analyse the posterior distribution of observation i of the linear prediction mu (not the parameters anymore)
################
#### Modify ####
################
i <- ; # Index of observation of interest
hist(mu[,i], breaks = 90)
mean(mu[,i]); sd(mu[,i])
quantile(mu[,i], probs = c(0.05, 0.5, 0.95))
#### R^2 posterior distribution ####
# Calculate R^2
sigma_post <- as.matrix(model, pars = "sigma")
n <- nrow(data)
var_y <- var(y) * (n - 1) / n
R2bayes <- 1 - sigma_post^2 / var_y
# Posterior inference on R^2 (R2bayes)
mean(R2bayes); sd(R2bayes)
quantile(R2bayes, probs = c(0.025, 0.5, 0.975))
hist(R2bayes, breaks = 90)
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
I tried to make a summary of all the code in a structured and unified manner for the exam. Comment if you think something is missing or should be added!
Code cheat sheet
Procedure
Not in R
Fit Models
Model diagnostics
Model assumptions and Model comparison
Posterior Inference
General
The following libraries must be activated for all the
methods we use. Set a seed to make your results reproducible.
Distributions
Some useful distributions and how to specify them in R. Mainly for visualization purposes.
Parameter sampling
Data overview
Fit models
Here are several examples for different model specifications. The exact specification depends on the exercise.
This includes especially the modification of the keywords family, prior, prior_intercept and prior_aux.
Fixed-effects models
Note: Different model equations are possible, depending on the question
Random-effects model
Note: Different model equations are possible, depending on the question
Model information
Model diagnostics
Convergence check
Every parameter should converge after some iterations. If not, try to include
more iterations to the model fit through the argument "iter"
If a high autocorrelation is present somewhere, change argument "keep_everywhere" integer
higher than 1.
Posterior Predictive Distribution
Check the appropriateness of the model. This is done by
comparing generated samples to the original data. We have
as many sample datasets as iterations in the model.
Model Comparison
Posterior Inference
Explore the posterior distribution
and draw conclusions about the estimated parameters.
Beta Was this translation helpful? Give feedback.
All reactions