Meet Tweedie

An exploration of the Tweedie family of distributions

Published: March 7, 2023

I’m working with some spatio-temporally patchy fisheries count data. To allow for their highly variable nature, I was advised to look at the Tweedie family of distributions. I thought I’d write a post to introduce Tweedie and share what I learnt.

Tweedie definition

The Tweedie distributions are a subfamily of the exponential dispersion models, distinguished by a special mean-variance relationship. A Tweedie random variable \(Y\) has:

  • a mean of \(E(Y) = \mu\);
  • a positive dispersion parameter \(\sigma^2\); and
  • a variance of \(Var(Y) = \sigma^2 \mu^p\).
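As a quick sanity check of this mean-variance relationship, the sketch below simulates two familiar members of the family in base R and compares the sample variance with \(\sigma^2 \mu^p\). The helper `tweedie_var()`, the seed, and the values of `mu` and `k` are illustrative choices of mine and do not come from any package.

```r
set.seed(1)
n <- 1e5
mu <- 3

# Tweedie variance function: Var(Y) = sigma^2 * mu^p (illustrative helper)
tweedie_var <- function(mu, sigma2, p) sigma2 * mu^p

# Poisson: p = 1 and sigma^2 = 1, so Var(Y) = mu
y_pois <- rpois(n, lambda = mu)

# gamma with shape k and mean mu: p = 2 and sigma^2 = 1 / k, so Var(Y) = mu^2 / k
k <- 4
y_gamma <- rgamma(n, shape = k, scale = mu / k)

check <- data.frame(
  distribution = c("Poisson", "gamma"),
  empirical    = c(var(y_pois), var(y_gamma)),
  theoretical  = c(tweedie_var(mu, 1, 1), tweedie_var(mu, 1 / k, 2))
)
print(check)
```

The empirical and theoretical columns should agree to within sampling error.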

The \(p\) in the variance function is often called the “Tweedie power” parameter and acts as an additional shape parameter for the distribution. It is sometimes written in terms of the shape parameter \(\alpha\):

\[ p = \frac{(\alpha - 2)}{(\alpha - 1)} \]
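The mapping between \(p\) and \(\alpha\) is easy to get wrong by hand, so here is a small sketch of it in R. The function names `alpha_to_p()` and `p_to_alpha()` are hypothetical helpers. Solving the formula above for \(\alpha\) gives the same functional form, \(\alpha = (p - 2)/(p - 1)\), which is undefined at \(p = 1\).

```r
# convert between the Tweedie power p and the index alpha (illustrative helpers)
alpha_to_p <- function(alpha) (alpha - 2) / (alpha - 1)
p_to_alpha <- function(p) (p - 2) / (p - 1)

# a few example powers (avoiding p = 1, where the formula divides by zero)
p <- c(0, 1.5, 2, 3)
alpha <- p_to_alpha(p)
print(data.frame(p = p, alpha = alpha))

# the round trip recovers the original powers
stopifnot(isTRUE(all.equal(alpha_to_p(alpha), p)))
```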

Some familiar distributions are special cases of the Tweedie distribution:

  • \(p = 0\): Normal distribution;
  • \(p = 1\): Poisson distribution;
  • \(1 < p < 2\): Compound Poisson/gamma distribution;
  • \(p = 2\): gamma distribution;
  • \(2 < p < 3\): Positive stable distributions;
  • \(p = 3\): Inverse Gaussian distribution / Wald distribution;
  • \(p > 3\): Positive stable distributions; and
  • \(p = \infty\): Extreme stable distributions.

(Note that the distribution is not defined for \(p\) values \(0 < p < 1\).)
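For \(1 < p < 2\), the name "compound Poisson/gamma" describes how a draw can be built: a Poisson number of gamma-distributed amounts, summed together. The base R sketch below uses the standard parameterisation of that construction and checks that the simulated mean, variance and proportion of zeros are close to their theoretical values. The function `r_cp_gamma()`, its argument names, the seed and the parameter values are all hypothetical choices for illustration; in practice a dedicated package function is the better tool.

```r
# simulate a compound Poisson/gamma Tweedie for 1 < p < 2 (illustrative helper)
r_cp_gamma <- function(n, mu, phi, p) {
  stopifnot(p > 1, p < 2)
  lambda <- mu^(2 - p) / (phi * (2 - p))  # Poisson rate for the number of events
  shape  <- (2 - p) / (p - 1)             # gamma shape of each event's amount
  scale  <- phi * (p - 1) * mu^(p - 1)    # gamma scale of each event's amount
  n_events <- rpois(n, lambda)
  y <- numeric(n)                         # draws with no events stay exactly 0
  pos <- n_events > 0
  # the sum of k iid gamma(shape, scale) amounts is gamma(k * shape, scale)
  y[pos] <- rgamma(sum(pos), shape = n_events[pos] * shape, scale = scale)
  y
}

set.seed(42)
mu <- 2; phi <- 1.5; p <- 1.4
y <- r_cp_gamma(1e5, mu = mu, phi = phi, p = p)

print(c(mean = mean(y), expected = mu))
print(c(var = var(y), expected = phi * mu^p))
print(c(prop_zero = mean(y == 0),
        expected = exp(-mu^(2 - p) / (phi * (2 - p)))))
```

Here `phi` plays the role of the dispersion parameter \(\sigma^2\) from the definition above.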

Poisson-like Tweedie cases

Since I am working with counts, the Tweedie distributions related to the Poisson distribution were of particular interest to me. Specifically, I was interested to explore the Tweedie distributions with power parameter values from 1 to 2.

With the help of the R package tweedie, we can plot histograms of those distributions as follows:

# libraries
library(tweedie)
library(data.table)
library(ggplot2)

# xis (also known as powers)
xis <- seq(1, 2, length.out = 12)

# make wide
d <- as.data.table(lapply(xis, 
                          function(v) 
                            rtweedie(n = 1000, mu = 1, phi = 1, xi = v)))

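# set names (two decimals so that every power gets a unique label;
# with one decimal, 1.45 and 1.55 would both be labelled "1.5")
setnames(d, colnames(d), paste0("xi = ", sprintf("%.02f", xis)))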

# make long
dl <- stack(d)

# make plot
p <- ggplot(dl, aes(x = values)) +
  geom_histogram(binwidth = 0.1) +
  facet_wrap(~ ind) +
  labs(x = "value", y = "frequency") +
  theme_minimal()

# save plot
jpeg("tweedie-histograms.jpg", res = 300, width = (480 * 5), height = (480 * 5))
print(p)
dev.off()

It is interesting to note that, as \(p \to 1\), the Tweedie distribution places more of its probability mass at exactly 0 (zero). This is a particularly useful feature of the Tweedie when dealing with highly variable and sometimes low counts that are frequently at or around 0.
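The size of that point mass can be written down for the compound Poisson/gamma case: \(P(Y = 0) = \exp\{-\mu^{2-p} / (\phi (2 - p))\}\), which is the probability that the underlying Poisson count is zero. Here \(\phi\) is the dispersion (the `phi` argument used above). A minimal sketch, assuming the same `mu = 1` and `phi = 1` as in the histograms; the helper `p_zero()` is hypothetical:

```r
# probability of an exact zero for a Tweedie with 1 <= p <= 2 (illustrative helper)
p_zero <- function(mu, phi, p) exp(-mu^(2 - p) / (phi * (2 - p)))

xis <- seq(1, 2, length.out = 12)
zero_mass <- p_zero(mu = 1, phi = 1, p = xis)
print(data.frame(xi = round(xis, 2), zero_mass = round(zero_mass, 3)))
```

With these settings the zero mass falls steadily from \(e^{-1} \approx 0.368\) at \(p = 1\) (the Poisson case with mean 1) to 0 at \(p = 2\) (the gamma case). The pattern depends on `mu` and `phi`, so it is worth recomputing for other values rather than assuming the same trend.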

Using the Tweedie

I’m going to adapt the example from the help page for the tweedie() family function, which is provided by the statmod package:

# library
library(statmod)

# response and explanatory variables
y <- rgamma(n = 200, scale = 1, shape = 1)
x <- rpois(200, lambda = 10)

# tweedie profile
out <- tweedie.profile(y ~ 1, p.vec = seq(1.5, 2.5, by=0.2))
Warning in model.matrix.default(mt, mf, contrasts): non-list contrasts argument
ignored
1.5 1.7 1.9 2.1 2.3 2.5 
......Done.
print(out$p.max)
[1] 2.05102
print(out$ci)
[1] 1.885009 2.230231
# fit a Poisson generalized linear model with identity link
m1 <- glm(y ~ x, family = tweedie(var.power = 1, link.power = 1))
print(summary(m1))

Call:
glm(formula = y ~ x, family = tweedie(var.power = 1, link.power = 1))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4165  -0.9003  -0.3048   0.4581   2.8102  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.86376    0.25531   3.383 0.000864 ***
x            0.01391    0.02423   0.574 0.566455    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Tweedie family taken to be 0.989408)

    Null deviance: 178.02  on 199  degrees of freedom
Residual deviance: 177.67  on 198  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
# fit an inverse-Gaussian glm with log-link
m0 <- glm(y ~ x, family = tweedie(var.power = 3, link.power = 0))
print(summary(m0))

Call:
glm(formula = y ~ x, family = tweedie(var.power = 3, link.power = 0))

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-12.9320   -1.4851   -0.3246    0.3920    1.8081  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.12647    0.25390  -0.498    0.619
x            0.01287    0.02408   0.534    0.594

(Dispersion parameter for Tweedie family taken to be 0.9764311)

    Null deviance: 1051.8  on 199  degrees of freedom
Residual deviance: 1051.5  on 198  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 25
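In `tweedie(var.power, link.power)`, `var.power` is the Tweedie power \(p\) and `link.power` selects the link function (1 for identity, 0 for log). As a rough cross-check of the first model that does not need statmod, the sketch below fits what should be the equivalent base R model, a quasi-Poisson GLM with identity link, to data simulated in the same way. The seed and the values passed to `start` are my own assumptions, added so that the identity link begins from a valid (positive) mean. Because the data are random, the numbers will not match the output above.

```r
set.seed(2023)

# response and explanatory variables, simulated as in the example above
y <- rgamma(n = 200, scale = 1, shape = 1)
x <- rpois(200, lambda = 10)

# variance proportional to mu (var.power = 1) with identity link (link.power = 1);
# start at intercept = mean(y), slope = 0 so every initial fitted mean is positive
m_qp <- glm(y ~ x,
            family = quasipoisson(link = "identity"),
            start = c(mean(y), 0))
print(summary(m_qp))
```

In the same way, the inverse-Gaussian model with log link corresponds to the base R family `inverse.gaussian(link = "log")`.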

I’m going to revisit Tweedie soon, so watch this space…

Citation

BibTeX citation:
@online{d. gregory2023,
  author = {D. Gregory, Stephen},
  title = {Meet {Tweedie}},
  date = {2023-03-07},
  url = {https://stephendavidgregory.github.io/posts/meet-tweedie},
  langid = {en}
}
For attribution, please cite this work as:
D. Gregory, Stephen. 2023. “Meet Tweedie.” March 7, 2023. https://stephendavidgregory.github.io/posts/meet-tweedie.