# libraries
library(tweedie)
library(data.table)
library(ggplot2)
# xis (also known as powers)
xis <- seq(1, 2, length.out = 12)
# make wide: one column of 1000 simulated values per xi
d <- as.data.table(lapply(xis,
  function(v) rtweedie(n = 1000, mu = 1, phi = 1, xi = v)))
# set names (two decimals: with one decimal, 1.4545 and 1.5455 would both be labelled "xi = 1.5")
setnames(d, colnames(d), paste0("xi = ", sprintf("%.2f", xis)))
# make long
dl <- stack(d)
# make plot
p <- ggplot(dl, aes(x = values)) +
  geom_histogram(binwidth = 0.1) +
  facet_wrap(~ ind) +
  labs(x = "value", y = "frequency") +
  theme_minimal()
# save plot
jpeg("tweedie-histograms.jpg", res = 300, width = (480 * 5), height = (480 * 5))
print(p)
dev.off()
I’m working with some spatio-temporally patchy fisheries count data. To allow for their highly variable nature, I was advised to look at the Tweedie family of distributions. I thought I’d write a post to introduce Tweedie and share what I learnt.
Tweedie definition
The Tweedie distributions are a subfamily of the exponential dispersion models, characterised by a special mean-variance relationship with:
- a mean of \(E(Y) = \mu\);
- a positive dispersion parameter \(\sigma^2\); and
- a variance of \(Var(Y) = \sigma^2 \mu^p\).
The \(p\) in the variance function is often called the “Tweedie power” parameter and acts as an additional shape parameter for the distribution. It is sometimes written in terms of the shape parameter \(\alpha\):
\[ p = \frac{(\alpha - 2)}{(\alpha - 1)} \]
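As a small illustration of the formula above, the conversion between \(p\) and \(\alpha\) can be sketched in base R. The function name `tweedie_index` is made up for this post and is not part of any package. A convenient property of this particular map is that it is its own inverse, so the same function converts in both directions.

```r
# Convert between the Tweedie power p and the index alpha.
# x -> (x - 2) / (x - 1) is its own inverse, so one function does both jobs.
# tweedie_index() is a hypothetical helper, not a package function.
tweedie_index <- function(x) {
  stopifnot(all(x != 1))  # undefined at 1 (division by zero)
  (x - 2) / (x - 1)
}

alpha  <- tweedie_index(c(0, 1.5, 3))  # alpha for p = 0, 1.5 and 3
p_back <- tweedie_index(alpha)         # applying it again recovers p
alpha
p_back
```

Note that for \(1 < p < 2\) the index \(\alpha\) is negative.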
Some familiar distributions are special cases of the Tweedie distribution:
- \(p = 0\): Normal distribution;
- \(p = 1\): Poisson distribution;
- \(1 < p < 2\): Compound Poisson/gamma distribution;
- \(p = 2\): gamma distribution;
- \(2 < p < 3\): Positive stable distributions;
- \(p = 3\): Inverse Gaussian distribution / Wald distribution;
- \(p > 3\): Positive stable distributions; and
- \(p = \infty\): Extreme stable distributions.
(Note that the distribution is not defined for \(p\) values \(0 < p < 1\).)
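To make the \(1 < p < 2\) case more concrete, here is a rough base-R sketch that simulates a compound Poisson/gamma variable (a Poisson number of events, each with a gamma-distributed size) and checks the mean-variance relationship from the definition. The helper `r_cpg` and the parameter values are my own choices for illustration, and `phi` plays the role of the dispersion \(\sigma^2\).

```r
# Simulate a compound Poisson-gamma variable and compare its sample mean and
# variance with mu and phi * mu^p. r_cpg() is a hypothetical helper.
r_cpg <- function(n, mu, phi, p) {
  stopifnot(p > 1, p < 2, mu > 0, phi > 0)
  lambda <- mu^(2 - p) / (phi * (2 - p))  # Poisson rate for the number of events
  shape  <- (2 - p) / (p - 1)             # gamma shape of each event size
  scale  <- phi * (p - 1) * mu^(p - 1)    # gamma scale of each event size
  n_events <- rpois(n, lambda)
  # The sum of k gamma(shape, scale) draws is gamma(k * shape, scale);
  # rgamma() returns 0 when the shape is 0, i.e. when there were no events.
  rgamma(n, shape = n_events * shape, scale = scale)
}

set.seed(1)
mu <- 2; phi <- 1.5; p <- 1.5
y <- r_cpg(1e5, mu, phi, p)
c(mean = mean(y), expected = mu)
c(var = var(y), expected = phi * mu^p)
mean(y == 0)  # share of exact zeros
```

The sample moments should land close to, but not exactly on, the expected values, since this is a simulation.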
Poisson-like Tweedie cases
Since I am working with counts, the Tweedie distributions related to the Poisson distribution were of particular interest to me. Specifically, I wanted to explore the Tweedie distributions with power parameter values from 1 to 2.
With the help of the R package tweedie, we can plot histograms of those distributions, using the code at the top of this post.
It is interesting to note that, in these simulations (mu = 1, phi = 1), the Tweedie distribution puts more of its probability mass at exactly 0 (zero) as \(p \to 1\). This is a particularly useful feature of the Tweedie when dealing with highly variable, often low counts that are frequently at or around 0.
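The mass at zero can also be computed directly. Under the compound Poisson/gamma representation, an exact zero occurs when the underlying Poisson count is zero, which has probability \(\exp(-\lambda)\) with \(\lambda = \mu^{2-p} / (\phi (2 - p))\). The sketch below assumes that representation; `p_zero` is a name I made up, and `phi` stands for the dispersion \(\sigma^2\).

```r
# Probability of an exact zero for a Tweedie with 1 <= p < 2, assuming the
# compound Poisson-gamma representation. p_zero() is a hypothetical helper.
p_zero <- function(mu, phi, p) {
  stopifnot(all(p >= 1 & p < 2), mu > 0, phi > 0)
  exp(-mu^(2 - p) / (phi * (2 - p)))
}

powers <- seq(1.1, 1.9, by = 0.2)
zero_mass <- p_zero(mu = 1, phi = 1, p = powers)
round(setNames(zero_mass, powers), 3)
```

With mu = 1 and phi = 1 the zero mass shrinks steadily as the power moves from 1 towards 2. For other values of mu the pattern need not be monotone, so it is worth checking for the mean of your own data.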
Using the Tweedie
I’m going to adapt the example from the help page of the tweedie() family function in the statmod package:
# library
library(statmod)
# response and explanatory variables
y <- rgamma(n = 200, scale = 1, shape = 1)
x <- rpois(200, lambda = 10)
# tweedie profile (tweedie.profile() comes from the tweedie package loaded earlier)
out <- tweedie.profile(y ~ 1, p.vec = seq(1.5, 2.5, by = 0.2))
Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts argument
ignored
1.5 1.7 1.9 2.1 2.3 2.5
......Done.
print(out$p.max)
[1] 2.05102
print(out$ci)
[1] 1.885009 2.230231
# fit a Poisson generalized linear model with identity link
m1 <- glm(y ~ x, family = tweedie(var.power = 1, link.power = 1))
print(summary(m1))
Call:
glm(formula = y ~ x, family = tweedie(var.power = 1, link.power = 1))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4165 -0.9003 -0.3048 0.4581 2.8102
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.86376 0.25531 3.383 0.000864 ***
x 0.01391 0.02423 0.574 0.566455
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Tweedie family taken to be 0.989408)
Null deviance: 178.02 on 199 degrees of freedom
Residual deviance: 177.67 on 198 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
# fit an inverse-Gaussian glm with log-link
m0 <- glm(y ~ x, family = tweedie(var.power = 3, link.power = 0))
print(summary(m0))
Call:
glm(formula = y ~ x, family = tweedie(var.power = 3, link.power = 0))
Deviance Residuals:
Min 1Q Median 3Q Max
-12.9320 -1.4851 -0.3246 0.3920 1.8081
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.12647 0.25390 -0.498 0.619
x 0.01287 0.02408 0.534 0.594
(Dispersion parameter for Tweedie family taken to be 0.9764311)
Null deviance: 1051.8 on 199 degrees of freedom
Residual deviance: 1051.5 on 198 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 25
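Neither fit above uses a power between 1 and 2, which is the range I am most interested in. As a rough sketch only, here is how such a fit might look on simulated data with exact zeros. The coefficients (0.5 and 0.3), the power (1.5) and the dispersion (1) are values I picked for illustration, and the response is simulated in base R as a compound Poisson/gamma variable rather than taken from real data.

```r
library(statmod)  # provides the tweedie() family used in this post

set.seed(42)
n <- 2000
p <- 1.5; phi <- 1        # assumed "true" power and dispersion
x <- runif(n, 0, 3)
mu <- exp(0.5 + 0.3 * x)  # assumed true mean, linear on the log scale

# Simulate compound Poisson-gamma responses in base R
n_events <- rpois(n, mu^(2 - p) / (phi * (2 - p)))
y <- rgamma(n, shape = n_events * (2 - p) / (p - 1),
            scale = phi * (p - 1) * mu^(p - 1))

# Log link (link.power = 0), variance power fixed at the simulated value
m <- glm(y ~ x, family = tweedie(var.power = p, link.power = 0))
coef(m)       # should be in the vicinity of 0.5 and 0.3
mean(y == 0)  # share of exact zeros in the simulated response
```

In practice the power is not known in advance, so it would have to be chosen first, for example with tweedie.profile() as above.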
I’m going to revisit Tweedie soon, so watch this space…
Citation
@online{d. gregory2023,
author = {D. Gregory, Stephen},
title = {Meet {Tweedie}},
date = {2023-03-07},
url = {https://stephendavidgregory.github.io/posts/meet-tweedie},
langid = {en}
}