I was recently faced with the challenge of testing the importance of non-linear effects in a linear mixed model with fixed effect interaction terms. This was an extension of previous work in which I fitted a linear mixed model and tested the importance of interacion terms using Bayesian Variable Selection.
Bayesian Variable Selection
Here, I present a short and non-technical summary of Bayesian Variable Selection (BVS) as I am using it, and direct the interested reader to the excellent review by Hooten & Hobbs (2015), featuring the method attributed to Kuo & Mallick (1998) and extended by Ntzoufras (2002). In essence, Kuo & Mallick propose multiplying the coefficent for the \(k\)th explanatory variable \(\beta_k\) by a stochastic Bernoulli indicator variable \(I_k\) that evaluates to \(I_k \in {0, 1}\) so that \(I_{k=1}\beta_k \sim \beta_k\) and \(I_{k=0}\beta_k = 0\). Ntzoufras (2002) extends this to interaction terms by multiplying the coefficient for the \(k\)th explanatory variable measured in the \(j\)th group \(\beta_{k,j}\) by an independent stochastic Bernoulli indicator variable \(I_{k,j}\), the mean of which is used as the probability of including an indicator variable representing the importance of the whole interaction term, \(I_k^{int}\). If the interaction term is judged to be “important”, then the main effect for the \(k\)th explanatory variable should also be retained. This is acheived by specifying the inclusion probability of the main effect \(p_k\) as conditional on the probability of inclusion of the interaction term according to \(p_k=I_k^{int} + (1 - I_k^{int})p_{k,r}\), where \(p_{k,r}\) is the mean probability of including the whole interaction term.
What is “importance”?
In this procedure, the easiest way to describe “importance” is when a coefficient is estimated to be substantially more or less than zero, i.e., the 95% credible intervals do not intercept 0. This is particularly elegant when the data set is large such that the chance of coefficient representing a small effect size is “statistically significant” due simply to the large number of cases.
What is meant by a quadratic effect?
I am using a quadratic function to represent a non-linear term. To estimate a quadratic function, one usually specifies it as function with two terms, \(\beta_1x\) and \(\beta_2x^2\). In what follows, the model I have designed treats a quadratic effect as a single term. In other words, if \(\beta_2\) is “important”, then \(\beta_1\) should also be included. The rationale behind this “dependency” is that \(\beta_2\) is a coefficient that “modifies” \(\beta_1\), i.e., \(\beta_2\) should not be estimated unless \(\beta_1\) is also estimated.
In the model below, I specify this dependency by specifying the probability of inclusion of \(\beta_1\) as a function of the probability of inclusion of \(\beta_2\), as described above. I then use this and similar dependencies to test the importance of a main quadratic effect (\(\beta_1x\) and \(\beta_2x^2\)) and a quadratic effect in interaction with group (\(\beta_{1,j}x_j\) and \(\beta_{2,j}x_j^2\)).
The model
This is the model written in JAGS. The annotations (e.g., # note 1) are explained below.
# model
<- function(){
md
~ dnorm(0, 0.001)
a 1] <- 0
ag[for(g in 2:ngroups){
~ dnorm(0, 0.001)
ag[g]
}
~ dnorm(0, 0.001)
b1MT <- iLI + ((1 - iLI) * 0.5) # note 1
iLM_tmp ~ dbern(iLM_tmp)
iLM <- iLM * b1MT
b1M
~ dnorm(0, 0.001)
b2MT <- iLM * 0.5 # note 2
iQM_tmp ~ dbern(iQM_tmp)
iQM <- iQM * b2MT
b2M
~ dbeta(1, 1) # uniform probability between 0 and 1
pLI for(g in 1:ngroups){
~ dbern(pLI)
igLI[g]
}~ dbern(sum(igLI) / ngroups) # note 3
iLI 1] <- 0 # contrast
b1QT[for(g in 2:ngroups){
~ dnorm(0, taub1)
b1QT[g]
}~ dgamma(1, 0.001) # hierarchical variance for shrinkage
taub1 for(g in 1:ngroups){
<- iLI * b1QT[g]
b1Q[g]
}
<- iLI * 0.5 # note 4
pQI for(g in 1:ngroups){
~ dbern(pQI)
igQI[g]
}~ dbern(sum(igQI) / ngroups)
iQI 1] <- 0 # contrast
b2QT[for(g in 2:ngroups){
~ dnorm(0, taub2)
b2QT[g]
}~ dgamma(1, 0.001) # hierarchical variance for shrinkage
taub2 for(g in 1:ngroups){
<- iQI * b2QT[g]
b2Q[g]
}
for(i in 1:n){
~ dnorm(mu[i], tau)
y[i] <- a + ag[groups[i]] + (b1M * x[i]) + (b2M * x2[i]) + (b1Q[groups[i]] * x[i]) + (b2Q[groups[i]] * x2[i])
mu[i]
}
~ dgamma(0.001, 0.001)
tau <- 1 / sqrt(tau)
sigma
}
Notes
- this is dependent on iLI: “if not interaction, then try linear” (not “if linear, then try interaction”)
- this is dependent on iLM: “if linear, then try quadratic” (not “if quadratic, then keep linear”)
- this is independent of all other terms: “try linear interaction”
- this is dependent on iLM: “if linear interaction, then try quadratic interaction” (not “if quadratic interaction, then keep linear interaction”)
An example in R with JAGS
I have written a small R function to generate data for with optional non-zero coefficients for a linear main effect, quadratic main effect, linear interaction effect and quadratic interaction effect. The response variable y
is normally distributed with \(\bar{x} = 0\) and \(\sigma = 0.5\).
# simulate quadratic data function ----------------------------------------
<- function(ngroups, nsamples, a, b1, b2, sigma,
quad.f std.x = TRUE, incl.Xmat = FALSE){
## coefs
if(length(a) == 1) stop('length(a) != ngroups')
if(length(b1) == 1) stop('length(b1) != ngroups')
if(length(b2) == 1) stop('length(b2) != ngroups')
## sample sizes
<- ngroups * nsamples
n
## simulation params
<- gl(ngroups, nsamples)
groups <- lapply(1:ngroups, function(v) seq(1, 100, length.out = nsamples))
x if(std.x) x <- lapply(x, function(v) (v - mean(v)) / sd(v))
<- unlist(x)
x <- x^2
x2 <- rnorm(n, 0, sigma)
epsilon
## simulate data
<- model.matrix(~ groups + x + x2 + groups : x + groups : x2)
Xmat <- c(a, b1[1], b2[1], b1[-1], b2[-1])
ab1b2 <- Xmat %*% ab1b2
lin.pred <- as.numeric(lin.pred + epsilon)
y
# result list
<- list('y' = y, 'groups' = groups, 'x' = x, 'x2' = x2,
res 'n' = n, 'ngroups' = ngroups)
if(incl.Xmat) res$Xmat <- Xmat
## return
return(res)
}
# setup jags fit ----------------------------------------------------------
## monitors
<- c('a', 'ag', 'b1M', 'iLM', 'b2M', 'iQM', 'b1Q', 'b2Q', 'iLI', 'iQI', 'sigma', 'mu')
ps
## iters
<- list('a' = 1000, 'b' = 1000, 'i' = 10000, 't' = 20)
it
# significant b1M, b1Q, b2M and b2Q fit -----------------------------------
if(sig.b1Mb1Qb2Mb2Q){
## simulate data
<- 3
ng <- rnorm(ng, 0.5, 0.01)
a <- rnorm(ng, 1.5, 0.01)
b1 <- rnorm(ng, -1.5, 0.01)
b2 <- 0.5
sigma <- quad.f(ngroups = ng, nsamples = 100,
foo a = a, b1 = b1, b2 = b2, sigma = sigma)
}
# fitting -----------------------------------------------------------------
## fit
<- makePSOCKcluster(3)
cl parLoadModule(cl, 'lecuyer')
parLoadModule(cl, 'dic')
parLoadModule(cl, 'glm')
parJagsModel(cl = cl, name = 'out',
file = md, data = foo, n.chains = 3, n.adapt = it$a)
parUpdate(cl = cl, object = 'out', n.iter = it$b)
<- parCodaSamples(cl = cl, model = 'out',
s variable.names = ps,
n.iter = it$i,
thin = it$t)
parUnloadModule(cl, 'glm')
parUnloadModule(cl, 'dic')
parUnloadModule(cl, 'lecuyer')
stopCluster(cl)
After extensive testing, it seems that the BVS model with backward (for the linear terms) and forward (for the quadratic terms) dependencies returns estimates similar to R
’s lm()
function, often with better results.
:
jags estimates2] ag[3] b1M b2M b1Q[2] b1Q[3] b2Q[2]
a ag[0.4388473 0.4937112 0.6182427 1.6335965 -1.5419742 1.3599954 1.3835574 -1.3888898
3] sigma
b2Q[-1.5621374 0.4846993
:
lm estimates2] ag[3] b1M b2M b1Q[2] b1Q[3] b2Q[2]
a ag[0.4324800 0.5042679 0.6289886 1.6268507 -1.5351445 1.3692690 1.3927054 -1.4003605
3] sigma
b2Q[-1.5749945 0.4826420
:
simulation parameters2] ag[3] b1M b2M b1Q[2] b1Q[3] b2Q[2]
a ag[0.4903288 0.5032463 0.4908905 1.5120682 -1.4975522 1.5014423 1.5124471 -1.5010414
3] sigma
b2Q[-1.4994300 0.5000000
And the statistical significance
of the main effect and interaction terms are:
"1 - statistical significance":
jags
iLM iLI iQM iQI1 1 1 1
A few more examples
Here are a few typical example fits:
I hope this is helpful to someone.
Please contact me (stephendavidgregory@gmail.com) for more details.
Reuse
Citation
@online{d. gregory2016,
author = {D. Gregory, Stephen},
title = {Quadratic Interaction Terms Fitted by {Bayesian} {Variable}
{Selection}},
date = {2016-12-21},
url = {https://stephendavidgregory.github.io/posts/quadratic-interaction-jags},
langid = {en}
}