say hi to the Bell distribution

4 minute read

I believe that everyone who has had contact with statistical tools has heard of the famous Poisson distribution. Especially when we talk about generalized linear models (GLM). This famous probability distribution is always used when you want to model count data, but it has its weaknesses. The main one is the bad fit to overdispersed data, that is when the variance of the data is greater than the average.

Just to remember, when \(X\) has Poisson distribution with parameter \(\theta\), so its mean and variance have the same value, that is, \(\mathbb{E}[X] = \mathbb{VAR}[X] = \theta\). But if \(\mathbb{E}[X] <\mathbb{VAR}[X]\) the Poisson distribution would not be suitable for this data, do you agree? And this is what we call overdispersion. Well, normally for these cases, the Negative Binomial distribution provides good results adding to the model one more parameter (responsible for adjusting the overdispersion).

However, at the end of 2017, Castellares published a paper presenting a new distribution for counting data that has only one parameter and is capable of modeling data with super dispersion. What I’m going to do now is introduce you to this new distribution and some interesting properties, in the simplest possible way.

The Bell distribution emerged considering the following expansion provided by Bell

\[\exp(e^x-1) = \sum_{n=0}^\infty\frac{B_n}{n!}x^n; \hspace{5mm} x \in \mathbb{R},\]

where the coefficients \(B_n\) are the Bell numbers defined by

\[B_n = \frac{1}{e}\sum_{k=1}^\infty \frac{k^n}{k!},\]

starting with \(B_0 =B_1 =1\).

The Bell number (\(B_n\)) in the above equation is the \(n\)th moment of the Poisson distribution with parameter equal to 1, and considering these equation it’s possible define the Bell distribution.

So, a discrete random variable \(Y\) has a Bell distribution with parameter \(\theta\) if its probability mass function is given by

\[\mathbb{P}(Y = y) = \frac{\theta^y e^{-(e^\theta-1)}B_y}{y!} \hspace{5mm} y=0,1,2,\dots,\]

with \(\theta>0\), and \(B_y\) are the Bell numbers.

The main paper brings a lot of demonstration about proprieties on the new distribution, like the fact of the Bell distribution be a particular solution of a multiple Poisson process. But my main point is below:

The mean and variance of \(Y\sim Bell(\theta)\) are

\[\mathbb{E}[Y] = \theta e^\theta \hspace{.5cm} \text{and} \hspace{.5cm} \mathbb{VAR}[Y] = \theta (1+\theta) e^\theta.\]

This makes it clear that this distribution is capable of adapting to overdispersed data without the addition of a parameter. If overdispersion is not yet clear, we can find a new parameterization for the distribution taking \(\mu = \theta e^\theta\) and hence \(\theta = W_0(\mu)\), where \(W_0(\mu)\) is the Lambert function. Using this new parameterization, we can write the Bell probability mass function as

\[\mathbb{P}(Y = y) = \frac{W_0(\mu)^y e^{-(W_0(\mu)-1)}B_y}{y!} \hspace{5mm} y=0,1,2,\dots,\]

with the mean and variance being

\[\mathbb{E}[Y] = \mu \hspace{.5cm} \text{and} \hspace{.5cm} \mathbb{VAR}[Y] = \mu[1+W_0(\mu)].\]

An advantage of the Bell distribution in relation to the Negative Binomial distribution is that no additional (dispersion) parameter is necessary to accommodate overdispersion. Now taking a look at the probability mass function above, we can see it belongs to the one-parameter exponential family, it means that we can easily use it as a generalized linear model.

can we fit a mortality model using the Bell distribution?

I was looking forward to commenting on that. YEEEEES, we can adjust mortality models using the Bell distribution, and that makes a lot of sense. Because according to Macdonald, Richards and Currie (2018), overdispersion is extremely common in mortality data (and this fact is often ignored).

The estimation method widely used for mortality models is the maximum likelihood estimation considering the Poisson distribution, which was proposed in Brillinger (1986) and provides good results. Then, with the use of the Bell distribution, it is expected to obtain at least similar results to those obtained with the Poisson distribution. However, the main advantage of the Bell distribution should be noted in the confidence intervals for the estimates.

but how do we fit the model to the mortality data?

Well, the answer to that question makes me very excited. So I will make a special post to answer it.

I see ya in the next post, or on twitter.

References

BELL, Eric T. Exponential numbers. The American Mathematical Monthly, v. 41, n. 7, p. 411-419, 1934.

BRILLINGER, David R. A biometrics invited paper with discussion: the natural variability of vital rates and associated statistics. Biometrics, p. 693-734, 1986.

Castellares, Fredy, Silvia LP Ferrari, and Artur J. Lemonte. “On the Bell distribution and its associated regression model for count data.” Applied Mathematical Modelling 56 (2018): 172-185.

CORLESS, Robert M. et al. On the LambertW function. Advances in Computational mathematics, v. 5, n. 1, p. 329-359, 1996.

MACDONALD, Angus S.; RICHARDS, Stephen J.; CURRIE, Iain D. Modelling mortality with actuarial applications. Cambridge University Press, 2018.

Leave a comment