Here is a quick post showing how to fit a fixed effect logistic regression in jags, extend it to a mixed effects logistic regression, and fit that mixed model using two contrasting specifications, each giving the same result (within stochastic error).
Perhaps these models will work in *BUGS variants, but I tend to prefer jags because it is platform-independent, more similar to R, actively developed with excellent support (thanks, Martin!), and works nicely with a number of R packages, including dclone. dclone facilitates running jags, particularly in parallel, and is also under continued development (thanks, Peter!).
I will soon write a blog post presenting a template I use to run analyses in jags using dclone, in case it is useful.
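In the meantime, here is a rough sketch (not run as part of this post) of how the fixed effect model below could be fitted with one chain per core using dclone's jags.parfit; dat, params and m are the objects defined in the next section, and the sampler settings simply mirror those used later on:
require(parallel)
# hedged sketch: one worker per chain; dat, params and m are defined below
cl <- makePSOCKcluster(3)
fit <- jags.parfit(cl, data = dat, params = params, model = m,
                   n.chains = 3, n.adapt = 1e3, n.update = 1e3,
                   n.iter = 1e4, thin = 1e2)
stopCluster(cl)
summary(fit)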
A simple fixed effect logistic regression
Before delving into mixed effects logistic regression, I thought it would be a good idea (and reassuring) to fit a simple fixed effect logistic regression in R and jags.
Here is the script:
require(dclone)
require(lme4)
# model
m <- function(){
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    logit(p[i]) <- a + b * x[i]
  }
  a ~ dnorm(0, 0.01)
  b ~ dnorm(0, 0.01)
}
# data
N <- 1000
x <- rnorm(N)
a <- 1
b <- 2
z <- a + b * x
pr <- 1 / (1 + exp(-z))
y <- rbinom(N, 1, pr)
df <- data.frame(y = y, x = x)
# glm fit
m0 <- glm(y ~ x, data = df, family = 'binomial')
print(m0)
##
## Call: glm(formula = y ~ x, family = "binomial", data = df)
##
## Coefficients:
## (Intercept) x
## 1.056 2.093
##
## Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
## Null Deviance: 1278
## Residual Deviance: 838.9 AIC: 842.9
# jags data
dat <- as.list(df)
dat$N <- N
# monitors
params <- c('a', 'b')
# run it
out <- jagsModel(file = m, data = dat, n.chains = 3, n.adapt = 1e3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1000
## Unobserved stochastic nodes: 2
## Total graph size: 5007
##
## Initializing model
update(out, n.iter = 1e3)
s <- codaSamples(out, variable.names = params, n.iter = 1e4, n.thin = 1e2)
s_sum <- summary(s)
# print posterior summaries
print(s_sum$statistics)
## Mean SD Naive SE Time-series SE
## a 1.058522 0.0940543 0.0005430228 0.0008407071
## b 2.101140 0.1404204 0.0008107177 0.0012805193
print(s_sum$quantiles)
## 2.5% 25% 50% 75% 97.5%
## a 0.8778516 0.9944358 1.057302 1.121390 1.245571
## b 1.8307969 2.0055994 2.098017 2.193111 2.388057
Comparing the glm and jags results shows that the estimates are very similar, suggesting that the jags model is well-specified.
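For a quick side-by-side check (not part of the output above), the glm point estimates can be lined up against the posterior medians:
# compare glm estimates with jags posterior medians (both should be close to a = 1, b = 2)
cbind(glm = coef(m0), jags = s_sum$quantiles[, '50%'])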
So, let’s move on to the mixed effects logistic regression…
A mixed effect logistic regression
I fit a mixed effects logistic regression with the random effect of group specified either as an error term or as an intercept term.
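The two specifications are just different ways of writing the same model: in the 'epsilon' form the linear predictor is logit(p[i]) = a + b * x[i] + epsilon[g[i]] with epsilon[j] ~ dnorm(0, tau), while in the 'intercept' form it is logit(p[i]) = a_[g[i]] + b * x[i] with a_[j] ~ dnorm(a, tau). Both should therefore recover the same fixed effects and the same between-group variance (within stochastic error).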
require(dclone)
require(lme4)
# models
m_epsilon <- function(){
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    logit(p[i]) <- a + (b * x[i]) + epsilon[g[i]]
  }
  for(j in 1:G){
    epsilon[j] ~ dnorm(0, tau)
  }
  tau ~ dgamma(0.1, 0.1)
  std_dev <- tau^-0.5
  variance <- std_dev^2
  a ~ dnorm(0, 0.01)
  mean_phi <- 1 / (1 + exp(-a))
  b ~ dnorm(0, 0.01)
}
m_intercept <- function(){
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    logit(p[i]) <- a_[g[i]] + b * x[i]
  }
  for(j in 1:G){
    a_[j] ~ dnorm(a, tau)
  }
  tau ~ dgamma(0.1, 0.1)
  std_dev <- tau^-0.5
  variance <- std_dev^2
  a ~ dnorm(0, 0.01)
  mean_phi <- 1 / (1 + exp(-a))
  b ~ dnorm(0, 0.01)
}
# data
n <- 100
G <- 10
g <- gl(G, n)
N <- n * G
x <- rnorm(N)
Xmat <- model.matrix(~ g + x - 1)
mu <- 1
mean_phi <- 1 / (1 + exp(-mu))
sd <- 0.5
a <- rnorm(G, mu, sd)
b <- 2
ab <- c(a, b)
z <- Xmat %*% ab
pr <- 1 / (1 + exp(-z))
y <- rbinom(N, 1, pr)
df <- data.frame(y = y, g = g, x = x)
# glmer fit
m0 <- glmer(y ~ x + (1|g), data = df, family = 'binomial')
print(m0)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: y ~ x + (1 | g)
## Data: df
## AIC BIC logLik deviance df.resid
## 834.0712 848.7945 -414.0356 828.0712 997
## Random effects:
## Groups Name Std.Dev.
## g (Intercept) 0.586
## Number of obs: 1000, groups: g, 10
## Fixed Effects:
## (Intercept) x
## 1.095 2.118
print(exp(round(ranef(m0)$g, 3)))
## (Intercept)
## 1 1.3391030
## 2 0.5347264
## 3 1.2788996
## 4 2.6090861
## 5 0.6250023
## 6 0.4073836
## 7 1.2226248
## 8 0.9617507
## 9 0.8411376
## 10 1.6015950
# jags data
dat <- as.list(df)
dat$N <- N
dat$G <- G
# monitors
params <- c('a', 'b', 'mean_phi')
# run it
out <- jagsModel(file = m_epsilon, data = dat, n.chains = 3, n.adapt = 1e3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1000
## Unobserved stochastic nodes: 13
## Total graph size: 6047
##
## Initializing model
update(out, n.iter = 1e3)
s_epsilon <- codaSamples(out, variable.names = params, n.iter = 1e4, n.thin = 1e2)
s_epsilon_sum <- summary(s_epsilon)
# print posterior summaries
print(s_epsilon_sum$statistics)
## Mean SD Naive SE Time-series SE
## a 1.091157 0.24343847 0.0014054927 0.006733634
## b 2.130777 0.14122313 0.0008153521 0.001298608
## mean_phi 0.745888 0.04591976 0.0002651179 0.001266695
print(s_epsilon_sum$quantiles)
## 2.5% 25% 50% 75% 97.5%
## a 0.6043638 0.9365635 1.0935534 1.2476852 1.5835063
## b 1.8630997 2.0346300 2.1280164 2.2235364 2.4152579
## mean_phi 0.6466540 0.7184050 0.7490503 0.7768989 0.8297005
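Before comparing the two parameterisations it is worth confirming that the chains have mixed; a minimal check (not part of the run above) using coda, which dclone loads, might look like this:
# potential scale reduction factors; values near 1 indicate convergence
gelman.diag(s_epsilon)
# trace and density plots for a visual check
plot(s_epsilon)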
# run it
out <- jagsModel(file = m_intercept, data = dat, n.chains = 3, n.adapt = 1e3)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1000
## Unobserved stochastic nodes: 13
## Total graph size: 6037
##
## Initializing model
update(out, n.iter = 1e3)
s_intercept <- codaSamples(out, variable.names = params, n.iter = 1e4, n.thin = 1e2)
s_intercept_sum <- summary(s_intercept)
# print posterior summaries
print(s_intercept_sum$statistics)
## Mean SD Naive SE Time-series SE
## a 1.0996923 0.24700895 0.0014261068 0.001824264
## b 2.1323963 0.13937092 0.0008046584 0.001362175
## mean_phi 0.7474055 0.04612672 0.0002663127 0.000340099
print(s_intercept_sum$quantiles)
## 2.5% 25% 50% 75% 97.5%
## a 0.6111628 0.9446477 1.0958199 1.2533071 1.5996372
## b 1.8698045 2.0363684 2.1291373 2.2254063 2.4140774
## mean_phi 0.6482060 0.7200375 0.7494761 0.7778718 0.8319677
Based on the statistics and quantiles of the MCMC samples, I hope you can see that the lme4 and Bayesian fits are similar, and that specifying the group random effect as an error term or as an intercept term gives essentially the same Bayesian results.
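A compact way to see this (not shown above) is to put the glmer fixed effects next to the posterior medians from the two jags fits:
# glmer fixed effects alongside posterior medians from both parameterisations
cbind(glmer = fixef(m0),
      jags_epsilon = s_epsilon_sum$quantiles[c('a', 'b'), '50%'],
      jags_intercept = s_intercept_sum$quantiles[c('a', 'b'), '50%'])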
I hope this is useful to someone.