Code for exam #33

SebastianVeuskens · 2023-12-12T11:52:19Z

SebastianVeuskens
Dec 12, 2023
Maintainer

I tried to make a summary of all the code in a structured and unified manner for the exam. Comment if you think something is missing or should be added!

Code cheat sheet

Procedure

Not in R

Understand the data
Create a theoretical form of Model

Fit Models

Specify the priors
Specify the initial values for the simulation
Specify the model equation

Model diagnostics

Check convergence of algorithm

Model assumptions and Model comparison

Check that the assumptions hold via posterior predictive distribution and choose a model via WAIC

Posterior Inference

Analyse the posterior distribution and summarize the results

General

The following libraries must be activated for all the
methods we use. Set a seed to make your results reproducible.

## General ##
library(rstanarm)
library(rstan)
library(loo)
library(bayesplot)
library(ggplot2)
set.seed(123)

Distributions

Some useful distributions and how to specify them in R. Mainly for visualization purposes.

## Distributions ##
a <- 2
b <- 2
curve(expr = dbeta(x, shape1 = a, shape2 = b), from = 0, to = 1, add = F)  # Show the density of a beta-distribution 
df <- 3                                                                    # degrees of freedom
curve(expr = dt(x, df= df), from = -3, to = 3, add = F)                    # Show the density of a Students-t-distribution

Parameter sampling

## Parameter sampling ## 
################
#### Modify ####
################
sample <-                                                   # Our sample parameters
mean(sample); sd(sample)                                    # Main statistics
quantile(sample, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))  # Distribution quantiles 
hist(sample, breaks = 90)                                   # Histogram for sample distribution
plot(sample)                                                # Show the parameter for each iteration, indicates convergence 
sd(sample)/sqrt(length(sample))                             # Monte Carlo standard error, indicates accuracy of chains (e.g. how big the estimation noise is)

Data overview

## Data overview ##
################
#### Modify ####
################
data <- # Our data 
str(data)
pairs(data)

Fit models

Here are several examples for different model specifications. The exact specification depends on the exercise.
This includes especially the modification of the keywords family, prior, prior_intercept and prior_aux.

Fixed-effects models

Note: Different model equations are possible, depending on the question

y ~ 1 (Fixed effects random intercept)
y ~ x1 + x2 (Fixed effects random internships and covariates)

## Fit Models ##
################
#### Modify ####
################
mod_equation <- # Adjust the model equation for what is asked in the exercises
model_default <- stan_glm(mod_equation, data = data, family = "gaussian")
model_flat <- stan_glm(mod_equation, 
                       data = data, 
                       family = "gaussian", 
                       prior = NULL) 
model_adjusted <- stan_glm(mod_equation, 
                           data = data, 
                           family = "Gamma", 
                           prior = normal(0, 100), 
                           prior_intercept = normal(0, 100), 
                           prior_aux = cauchy(0, 1))

Random-effects model

Note: Different model equations are possible, depending on the question

y ~ (1|group) (Random intercept, thus random effects model)
y ~ x1 + x2 + (1 + x1|group) (Random intercept + random slope for x1, covariates x1 and x2)

## Fit Models ## 
################
#### Modify ####
################
mod_equation_er <- # Adjust the model equation for what is asked in the exercises
model_default_er <- stan_lmer(mod_equation_er, data = data) 
model_flat_er <- stan_glm(mod_equation_er, 
                          data = data, 
                          family = "poisson", 
                          prior = NULL) 
model_adjusted_er <- stan_glm(mod_equation_er, 
                           data = data, 
                           family = "binomial", 
                           prior = normal(0, 100), 
                           prior_intercept = normal(0, 100), 
                           prior_aux = cauchy(0, 1))

Model information

## Model information ## 
################
#### Modify ####
################
model <- ;              # Adjust for the model you want to investigate 

#### Retrieve model information ####
prior_summary(model)    # Get information about the specified and adjusted priors 
summary(model)          # Get information about coefficient estimations and algorithm statistics like R_hat and N_eff

Model diagnostics

Convergence check

The stan_trace plot show the time series over the iterations for each chain.
Every parameter should converge after some iterations. If not, try to include
more iterations to the model fit through the argument "iter"
The stan_ac plots the autocorrelation between the elements of each parameter for each chain.
If a high autocorrelation is present somewhere, change argument "keep_everywhere" integer
higher than 1.
R_hat
n_eff Effective number of iterations -> Formula: number of iterations / sum(autocorrelation)

## Model convergence ##
################
#### Modify ####
################
pars <- ;                                                     # Add the parameters like c("par1", "par2") that are of interest 

#### Check convergence ####
stan_trace(model, pars = pars, nrow = length(pars), ncol = 1) # 1. Trace plot 
stan_ac(model, pars = pars)                                   # 2. Autocorrelation plot 

####  Alternative: plot(model, plotfun = "ac") ####
summary(model)                                                # 3. R_hat and 4. n_eff 
plot(model, plotfun = "rhat")                                 # 3. R_hat
plot(model, plotfun = "neff")                                 # 4. n_eff 

#### Update model ####
# In case the model did not converge or you are not sure
iterations = 5000                                             # Modify and increase if it does not converge or decrease if it take too long 
model <- update(model, iter=iterations)

Posterior Predictive Distribution

Check the appropriateness of the model. This is done by
comparing generated samples to the original data. We have
as many sample datasets as iterations in the model.

## Posterior predictive distribution ## 
################
#### Modify ####
################
model <-                                            # The model to investigate 
y <- data$y                                         # Enter the response variable here (the original data)

#### Sample generation ####
y_tilde <- posterior_predict(model)                 # Generate new sample from the model 

#### Checks ####

# Density comparison of the sample datasets and original data 
ppc_dens_overlay(y = y, yrep = y_tilde[100:200,])   # Different ranges possible 

# Posterior predictive checks for main statistics in the samples compared to original data 
ppc_stat(y = y, yrep = y_tilde, stat = "mean")
ppc_stat(y = y, yrep = y_tilde, stat = "sd")
ppc_stat(y = y, yrep = y_tilde, stat = "max")
ppc_stat(y = y, yrep = y_tilde, stat = "min")

Model Comparison

## Model comparison ## 
################
#### Modify ####
################
alternative_model <- ;

#### Compare via WAIC, the lower the better #### 
waic(model)
waic(alternative_model)

Posterior Inference

Explore the posterior distribution
and draw conclusions about the estimated parameters.

## Posterior Inference ## 
#### Parameter posterior distributions ####
summary(model, digits = 3)                          # Displays the main statistics of the estimates (like mean) 
plot(model, pars = pars)                            # All the parameter boxplots specified in pars, suppress pars to get all parameters
stan_hist(model, pars = pars)                       # Displays the whole posterior distribution of the parameters 

#### Credibility intervals for the parameters ####
prob = 0.8                                          # The probability that the parameter is included in the interval
posterior_interval(model, prob = prob, pars = pars) # Gives back the credibility interval of the parameters in pars

#### Linear predictor posterior distribution ####
mu <- posterior_linpred(model, transform = T)       # Get draws from linear predictor (mod_equation) for all samples, transformed by link function 

#### Explore observation i 
# Here we analyse the posterior distribution of observation i of the linear prediction mu (not the parameters anymore)

################
#### Modify ####
################
i <- ;                                               # Index of observation of interest 
hist(mu[,i], breaks = 90)
mean(mu[,i]); sd(mu[,i])
quantile(mu[,i], probs = c(0.05, 0.5, 0.95)) 

#### R^2 posterior distribution ####

# Calculate R^2 
sigma_post <- as.matrix(model, pars = "sigma")
n <- nrow(data)
var_y <- var(y) * (n - 1) / n 
R2bayes <- 1 - sigma_post^2 / var_y 

# Posterior inference on R^2 (R2bayes) 
mean(R2bayes); sd(R2bayes) 
quantile(R2bayes, probs = c(0.025, 0.5, 0.975)) 
hist(R2bayes, breaks = 90)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unibo-StatStudents

Code for exam #33

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

Unibo-StatStudents

Code for exam #33

SebastianVeuskens Dec 12, 2023 Maintainer

Code cheat sheet

Procedure

Not in R

Fit Models

Model diagnostics

Model assumptions and Model comparison

Posterior Inference

General

Distributions

Parameter sampling

Data overview

Fit models

Fixed-effects models

Random-effects model

Model information

Model diagnostics

Convergence check

Posterior Predictive Distribution

Model Comparison

Posterior Inference

Replies: 0 comments

SebastianVeuskens
Dec 12, 2023
Maintainer