# Gamma distribution and Poisson distribution

The gamma distribution is important in many statistical applications. This post discusses the connections of the gamma distribution with Poisson distribution. The following is the probability density function of the gamma distribution.

$\displaystyle (1) \ \ \ \ \ f(x)=\frac{1}{\Gamma(\beta)} \ \alpha^\beta \ x^{\beta-1} \ e^{-\alpha x} \ \ \ \ \ \ x>0$

The numbers $\alpha$ and $\beta$, both positive, are fixed constants and are the parameters of the distribution. The symbol $\Gamma(\cdot)$ is the gamma function. This density function describes how the potential gamma observations distribute across the positive x-axis. Baked into this gamma probability density function are two pieces of information about the Poisson distribution. We can simply read off the information from the parameters $\alpha$ and $\beta$ in the density function.

Poisson-Gamma Mixture

For a reason that will be given shortly, the parameters $\alpha$ and $\beta$ in (1) gives a negative binomial distribution. The following is the probability function of this negative binomial distribution.

$\displaystyle (2) \ \ \ \ \ P(N=n)=\binom{\beta+n-1}{n} \ \biggl(\frac{\alpha}{1+\alpha} \biggr)^\beta \ \biggl(\frac{1}{1+\alpha} \biggr)^n \ \ \ \ \ \ n=0,1,2,3,\cdots$

If the parameter $\beta$ is a positive integer, then (2) has a nice interpretation in the context of a series of independent Bernoulli trials. Consider performing a series of independent trials where each trial has one of two distinct outcomes (called success or failure). Assume that the probability of a success in each trial is $p=\alpha/(1+\alpha)$. In this case, the quantity in (2) is the probability of having $n$ failures before the occurrence of the $\beta$th success.

Note that when $\beta$ is a positive integer, the binomial coefficient $\binom{\beta+n-1}{n}$ has the usual calculation $\frac{(\beta+n-1)!}{n! (\beta-1)!}$. When $\beta$ is not an integer but is only a positive number, the binomial coefficient $\binom{\beta+n-1}{n}$ is calculated as follows:

$\displaystyle \binom{\beta+n-1}{n}=\frac{(\beta+n-1) (\beta+n-2) \cdots (\beta+1) \beta}{n!}$

For this new calculation to work, $\beta$ does not have to be a positive integer. It only needs to be a positive real number. Then the distribution in (2) does not have a natural interpretation in terms of performing a series of independent Bernoulli trials. It is simply a counting distribution, i.e. a random discrete variable modeling the number of occurrences of a type of random events.

What does this have to do with gamma distribution and Poisson distribution? When a conditional random variable $X \lvert \lambda$ has a Poisson distribution such that its mean $\lambda$ is an unknown random quantity but follows a gamma distribution with parameters $\beta$ and $\alpha$ as described in (1), the unconditional distribution for $X$ has a negative binomial distribution as described in (2). In other words, the mixture of Poisson distributions with gamma mixing weights is a negative binomial distribution.

There is an insurance interpretation of the Poisson-gamma mixture. Suppose that in a large pool of insureds, the annual claim frequency of an insured is a Poisson distribution with mean $\lambda$. The quantity $\lambda$ varies from insured to insured but is supposed to follow a gamma distribution. Here we have a family of conditional Poisson distributions where each one is conditional on the characteristics of the particular insured in question (a low risk insured has a low $\lambda$ value and a high risk insured has a high $\lambda$ value). The “average” of the conditional Poisson distributions will be a negative binomial distribution (using the gamma distribution to weight the parameter $\lambda$). Thus the claim frequency for an “average” insured in the pool should be modeled by a negative binomial distribution. For a randomly selected insured from the pool, if we do not know anything about this insured, we can use the unconditional negative binomial distribution to model the claim frequency.

A detailed discussion of the negative binomial distribution is found here and here. The notion of mixture distributions and Poisson-gamma mixture in particular are discussed here. Many distributions applicable in actuarial applications are mixture distributions (see here for examples).

Waiting Times in a Poisson Process

The parameter $\beta$ in (1) is called the shape parameter while the parameter $\alpha$ is called the rate parameter. The name of rate parameter will be made clear shortly. For this interpretation of gamma distribution to work, the shape parameter $\beta$ must be a positive integer.

The gamma distribution as described in (1) is intimately connected to a Poisson process through the rate parameter $\alpha$. In a Poisson process, event of a particular interest occurs at random at the rate of $\alpha$ per unit time. Two questions are of interest. How many times will this event occur in a given time interval? How long do we have to wait to observe the first occurrence, the second occurrence and so on? The gamma distribution as described in (1) gives an answer to the second question when $\beta$ is a positive integer.

A counting process is a random experiment that observes the random occurrences of a certain type of events of interest in a time period. A Poisson process is a counting process that satisfies three simplifying assumptions that deal with independence and uniformity in time.

A good example of a Poisson process is the well known experiment in radioactivity conducted by Rutherford and Geiger in 1910. In this experiment, $\alpha$-particles were emitted from a polonium source and the number of $\alpha$-particles were counted during an interval of 7.5 seconds (2,608 many such time intervals were observed). In these 2,608 intervals, a total of 10,097 particles were observed. Thus the mean count per period of 7.5 seconds is 10097 / 2608 = 3.87. In this experiment, $\alpha$-particles are observed at a rate of 3.87 per unit time (7.5 seconds).

In general, consider a Poisson process with a rate of $\lambda$ per unit time, i.e. the event that is of interest occurs at the rate of $\lambda$ per unit time. There are two random variables of interest. One is $N_t$ which is the number of occurrences of the event of interest from time 0 to time $t$. The other is $X_n$ which is the waiting time until the $n$th occurrence of the event of interest where $n$ is a positive integer.

What can we say about the random variables $N_t$? First, the random variable $N_1$, the number of occurrences of the event of interest in a unit time interval, has a Poisson distribution with mean $\lambda$. The following is the probability function.

$\displaystyle (3) \ \ \ \ \ P(N_1=k)=\frac{1}{k!} \ \lambda^k \ e^{-\lambda} \ \ \ \ \ \ \ \ \ k=0,1,2,3,\cdots$

For any positive real number $t$, the random variable $N_t$, which is a discrete random variable, follows a Poisson distribution with mean $\lambda t$. The following is the probability function.

$\displaystyle (4) \ \ \ \ \ P(N_t=k)=\frac{1}{k!} \ (\lambda t)^k \ e^{-\lambda t} \ \ \ \ \ k=0,1,2,3,\cdots$

For a more detailed discussion of why $P(N_1=k)$ and $P(N_t=k)$ are Poisson, see here.

We now discuss the distributions for $X_1$ and $X_n$. The random variable $X_1$ would be the mean time to the first occurrence. It has an exponential distribution with mean $1/\lambda$. The following is the probability density function.

$\displaystyle (5) \ \ \ \ \ f_{X_1}(t)=\lambda \ e^{-\lambda t} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ t>0$

The random variable $X_n$ has a gamma distribution with shape parameter $\beta=n$ and rate parameter $\alpha=\lambda$. The following is the probability density function.

$\displaystyle (6) \ \ \ \ \ f_{X_n}(t)=\frac{1}{(n-1)!} \ \lambda^n \ t^{n-1} \ e^{-\lambda t} \ \ \ \ \ \ t>0$

The density functions in (5) and (6) are derived from the Poisson distributions in (3) and (4). For example, for $X_1>t$ (the first occurrence taking place after time $t$) to happen, there is no occurrence in the time interval $(0,t)$. Thus $P(X_1>t)=P(N_t=0)$. Note that $N_t$ is a Poisson random variable with mean $\lambda t$. The following derives the density function in (5).

$\displaystyle P(X_1>t)=P(N_t=0)=e^{-\lambda t}$

$\displaystyle P(X_1 \le t)=1-e^{-\lambda t}$

Note that the cumulative distribution function $P(X_1 \le t)$ is that for an exponential distribution. Taking the derivative gives the density function in (5). Extending the same reasoning, for $X_n>t$ (the $n$th occurrence taking place after time $t$) to happen, there must be at most $n-1$ occurrences in the time interval $(0,t)$. Once again, we are relating $X_n$ to the Poisson random variable $N_t$. More specifically $P(X_n>t)=P(N_t \le n-1)$.

$\displaystyle (7) \ \ \ \ \ P(X_n>t)=P(N_t \le n-1)=\sum \limits_{k=0}^{n-1} \biggl[ \frac{(\lambda t)^k}{k!} \ e^{-\lambda t} \biggr]$

$\displaystyle (8) \ \ \ \ \ P(X_n \le t)=1-\sum \limits_{k=0}^{n-1} \biggl[ \frac{(\lambda t)^k}{k!} \ e^{-\lambda t} \biggr]=\sum \limits_{k=n}^{\infty} \biggl[ \frac{(\lambda t)^k}{k!} \ e^{-\lambda t} \biggr]$

Taking the derivative of (8) gives the density function in (6). See here for a more detailed discussion of the relation between gamma distribution and Poisson distribution in a Poisson process.

Evaluating Gamma Survival Function

The relation (7) shows that the gamma survival function is the cumulative distribution function (CDF) of the corresponding Poisson distribution. Consider the following integral.

$\displaystyle (9) \ \ \ \ \ \int_t^\infty \frac{1}{(n-1)!} \ \lambda^n \ x^{n-1} \ e^{-\lambda x} \ dx$

The integrand in the above integral is the density function of a gamma distribution (with the shape parameter being a positive integer). The limits of the integral are from $t$ to $\infty$. Conceptually the integral is identical to (7) since the integral gives $P(X_n>t)$, which is the survival function of the gamma distribution in question. According to (7), the integral in (9) has a closed form as follows.

$\displaystyle (10) \ \ \ \ \ \int_t^\infty \frac{1}{(n-1)!} \ \lambda^n \ x^{n-1} \ e^{-\lambda x} \ dx=\sum \limits_{k=0}^{n-1} \biggl[ \frac{(\lambda t)^k}{k!} \ e^{-\lambda t} \biggr]$

The above is a combination of (7) and (9) and is a way to evaluate the right tail of the gamma distribution when the shape parameter is a positive integer $n$. Instead of memorizing it, we can focus on the thought process.

Given the integral in (9), note the shape parameter $n$ and the rate parameter $\lambda$. Consider a Poisson process with rater parameter $\lambda$. In this Poisson process, the random variable $N_1$, the number of occurrences of the event of interest in a unit time interval, has a Poisson distribution with mean $\lambda$. The random variable $N_t$, the number of occurrences of the event of interest in a time interval of length $t$, has a Poisson distribution with mean $\lambda t$. Then the integral in (9) is equivalent to $P(N_t \le n-1)$, the probability that there are at most $n-1$ occurrences in the time interval $(0,t)$.

The above thought process works even if the gamma distribution has no relation to any Poisson process, e.g. it is just the model for size of insurance losses. We can still pretend there is a Poisson relationship and obtain a closed form evaluation of the survival function. It must be emphasized that this evaluation of the gamma survival function in closed form is possible only when the shape parameter is a positive integer. If it is not, we then need to evaluate the gamma survival function using software.

Remarks

The discussion here presents two relationships between the gamma distribution and the Poisson distribution. In the first one, mixing conditional Poisson distributions with gamma mixing weights produces a negative binomial distribution. Taking the shape parameter $\beta$ and rate parameter $\alpha$ in (1) produces the negative binomial distribution as in (2). This is one interpretation of the two gamma parameters.

In the second interpretation, the rate parameter of the gamma model is intimately tied to the rate parameter of a Poisson process. The interplay between the number of occurrences of the event of interest and the waiting time until the $n$th occurrence connects the gamma distribution with the Poisson distribution. For a fuller discussion of this interplay, see here.

The gamma distribution is mathematically derived from the gamma function. See here for an introduction to the gamma distribution.

The three assumptions for a Poisson process that are alluded to above allow us to obtain a binomial distribution when dividing a time interval into $n$ subintervals (for a sufficiently large $n$). As the subintervals get more and more granular, the binomial distributions have Poisson distribution as the limiting distribution. The derivation from binomial to Poisson is discussed here, here and here.

Practice problems can be found here in a companion blog for practice problems.

Dan Ma gamma distribution
Dan Ma Poisson distribution
Dan Ma math

Daniel Ma gamma distribution
Daniel Ma mathematics
Daniel Ma Poisson distribution

$\copyright$ 2018 – Dan Ma

# Getting from Binomial to Poisson

In many binomial problems, the number of Bernoulli trials $n$ is large, relatively speaking, and the probability of success $p$ is small such that $n p$ is of moderate magnitude. For example, consider problems that deal with rare events where the probability of occurrence is small (as a concrete example, counting the number of people with July 1 as birthday out of a random sample of 1000 people). It is often convenient to approximate such binomial problems using the Poisson distribution. The justification for using the Poisson approximation is that the Poisson distribution is a limiting case of the binomial distribution. Now that cheap computing power is widely available, it is quite easy to use computer or other computing devices to obtain exact binomial probabiities for experiments up to 1000 trials or more. Though the Poisson approximation may no longer be necessary for such problems, knowing how to get from binomial to Poisson is important for understanding the Poisson distribution itself.

Consider a counting process that describes the occurrences of a certain type of events of interest in a unit time interval subject to three simplifying assumptions (discussed below). We are interested in counting the number of occurrences of the event of interest in a unit time interval. As a concrete example, consider the number of cars arriving at an observation point in a certain highway in a period of time, say one hour. We wish to model the probability distribution of how many cars that will arrive at the observation point in this particular highway in one hour. Let $X$ be the random variable described by this probability distribution. We wish to konw the probability that there are $k$ cars arriving in one hour. We start with using a binomial distribution as an approximation to the probability $P(X=k)$. We will see that upon letting $n \rightarrow \infty$, the $P(X=k)$ is a Poisson probability.

Suppose that we know $E(X)=\alpha$, perhaps an average obtained after observing cars at the observation points for many hours. The simplifying assumptions alluded to earlier are the following:

1. The numbers of cars arriving in nonoverlapping time intervals are independent.
2. The probability of one car arriving in a very short time interval of length $h$ is $\alpha h$.
3. The probability of having more than one car arriving in a very short time interval is esstentially zero.

Assumption 1 means that a large number of cars arriving in one period does not imply fewer cars will arrival in the next period and vice versa. In other words, the number of cars that arrive in any one given moment does affect the number of cars that will arrive subsequently. Knowing how many cars arriving in one minute will not help predict the number of cars arriving at the next minute. Assumption 2 means that the rate of cars arriving is dependent only on the length of the time interval and not on when the time interval occurs (e.g. not on whether it is at the beginning of the hour or toward the end of the hour). The assumptions 2 and 3 allow us to think of a very short period of time as a Bernoulli trial. Thinking of the arrival of a car as a success, each short time interval will result in only one success or one failure.

To start, we can break up the hour into 60 minutes (into 60 Bernoulli trials). We then consider the binomial distribution with $n=60$ and $p=\frac{\alpha}{60}$. So the following is an approximation to our desired probability distribution.

$\displaystyle (1) \ \ \ \ \ P(X=k) \approx \binom{60}{k} \biggl(\frac{\alpha}{60}\biggr)^k \biggr(1-\frac{\alpha}{60}\biggr)^{60-k} \ \ \ \ \ k=0,1,2,\cdots, 60$

Conceivably, there can be more than 1 car arriving in a minute and observing cars in a one-minute interval may not be a Bernoulli trial. For a one-minute interval to qualify as a Bernoulli trial, there is either no car arriving or 1 car arriving in that one minute. So we can break up an hour into 3,600 seconds (into 3,600 Bernoulli trials). We now consider the binomial distribution with $n=3600$ and $p=\frac{\alpha}{3600}$.

$\displaystyle (2) \ \ \ \ \ P(X=k) \approx \binom{3600}{k} \biggl(\frac{\alpha}{3600}\biggr)^k \biggr(1-\frac{\alpha}{3600}\biggr)^{3600-k} \ \ \ \ \ k=0,1,2,\cdots, 3600$

It is also conceivable that more than 1 car can arrive in one second and observing cars in one-second interval may still not qualify as a Bernoulli trial. So we need to get more granular. We can divide up the hour into $n$ equal subintervals, each of length $\frac{1}{n}$. The assumptions 2 and 3 ensure that each subinterval is a Bernoulli trial (either it is a success or a failure; one car arriving or no car arriving). Assumption 1 tells us that all the $n$ subintervals are independent. So breaking up the hour into $n$ moments and counting the number of moments that are successes will result in a binomial distribution with parameters $n$ and $p=\frac{\alpha}{n}$. So we are ready to proceed with the following approximation to our probability distribution $P(X=k)$.

$\displaystyle (3) \ \ \ \ \ P(X=k) \approx \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \ \ \ \ \ k=0,1,2,\cdots, n$

As we get more granular, $n \rightarrow \infty$. We show that the limit of the binomial probability in $(3)$ is the Poisson distribution with parameter $\alpha$. We show the following.

$\displaystyle (4) \ \ \ \ \ P(X=k) = \lim \limits_{n \rightarrow \infty} \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k}=\frac{e^{-\alpha} \alpha^k}{k!} \ \ \ \ \ \ k=0,1,2,\cdots$

In the derivation of $(4)$, we need the following two mathematical tools. The statement $(5)$ is one of the definitions of the mathematical constant e. In the statement $(6)$, the integer $n$ in the numerator is greater than the integer $k$ in the denominator. It says that whenever we work with such a ratio of two factorials, the result is the product of $n$ with the smaller integers down to $(n-(k-1))$. There are exactly $k$ terms in the product.

$\displaystyle (5) \ \ \ \ \ \lim \limits_{n \rightarrow \infty} \biggl(1+\frac{x}{n}\biggr)^n=e^x$

$\displaystyle (6) \ \ \ \ \ \frac{n!}{(n-k)!}=n(n-1)(n-2) \cdots (n-k+1) \ \ \ \ \ \ \ \ k

The following is the derivation of $(4)$.

\displaystyle \begin{aligned}(7) \ \ \ \ \ P(X=k)&=\lim \limits_{n \rightarrow \infty} \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \\&=\lim \limits_{n \rightarrow \infty} \ \frac{n!}{k! (n-k)!} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \\&=\lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \biggl(\frac{\alpha^k}{k!}\biggr) \biggr(1-\frac{\alpha}{n}\biggr)^{n} \biggr(1-\frac{\alpha}{n}\biggr)^{-k} \\&=\biggl(\frac{\alpha^k}{k!}\biggr) \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \\&\times \ \ \ \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{n} \ \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{-k} \\&=\frac{e^{-\alpha} \alpha^k}{k!} \end{aligned}

In $(7)$, we have $\displaystyle \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k}=1$. The reason being that the numerator is a polynomial where the leading term is $n^k$. Upon dividing by $n^k$ and taking the limit, we get 1. Based on $(5)$, we have $\displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{n}=e^{-\alpha}$. For the last limit in the derivation we have $\displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{-k}=1$.

We conclude with some comments. As the above derivation shows, the Poisson distribution is at heart a binomial distribution. When we divide the unit time interval into more and more subintervals (as the subintervals get more and more granular), the resulting binomial distribution behaves more and more like the Poisson distribution.

The three assumtions used in the derivation are called the Poisson postulates, which are the underlying assumptions that govern a Poisson process. Such a random process describes the occurrences of some type of events that are of interest (e.g. the arrivals of cars in our example) in a fixed period of time. The positive constant $\alpha$ indicated in Assumption 2 is the parameter of the Poisson process, which can be interpreted as the rate of occurrences of the event of interest (or rate of changes, or rate of arrivals) in a unit time interval, meaning that the positive constant $\alpha$ is the mean number of occurrences in the unit time interval. The derivation in $(7)$ shows that whenever a certain type of events occurs according to a Poisson process with parameter $\alpha$, the counting variable of the number of occurrences in the unit time interval is distributed according to the Poisson distribution as indicated in $(4)$.

If we observe the occurrences of events over intervals of length other than unit length, say, in an interval of length $t$, the counting process is governed by the same three postulates, with the modification to Assumption 2 that the rate of changes of the process is now $\alpha t$. The mean number of occurrences in the time interval of length $t$ is now $\alpha t$. The Assumption 2 now states that for any very short time interval of length $h$ (and that is also a subinterval of the interval of length $t$ under observation), the probability of having one occurrence of event in this short interval is $(\alpha t)h$. Applyng the same derivation, it can be shown that the number of occurrences ($X_t$) in a time interval of length $t$ has the Poisson distribution with the following probability mass function.

$\displaystyle (8) \ \ \ \ \ P(X_t=k)=\frac{e^{-\alpha t} \ (\alpha t)^k}{k!} \ \ \ \ \ \ \ \ k=0,1,2,\cdots$

# Relating Binomial and Negative Binomial

The negative binomial distribution has a natural intepretation as a waiting time until the arrival of the rth success (when the parameter r is a positive integer). The waiting time refers to the number of independent Bernoulli trials needed to reach the rth success. This interpretation of the negative binomial distribution gives us a good way of relating it to the binomial distribution. For example, if the rth success takes place after $k$ failed Bernoulli trials (for a total of $k+r$ trials), then there can be at most $r-1$ successes in the first $k+r$ trials. This tells us that the survival function of the negative binomial distribution is the cumulative distribution function (cdf) of a binomial distribution. In this post, we gives the details behind this observation. A previous post on the negative binomial distribution is found here.

A random experiment resulting in two distinct outcomes (success or failure) is called a Bernoulli trial (e.g. head or tail in a coin toss, whether or not the birthday of a customer is the first of January, whether an insurance claim is above or below a given threshold etc). Suppose a series of independent Bernoulli trials are performed until reaching the rth success where the probability of success in each trial is $p$. Let $X_r$ be the number of failures before the occurrence of the rth success. The following is the probablity mass function of $X_r$.

$\displaystyle (1) \ \ \ \ P(X_r)=\binom{k+r-1}{k} p^r (1-p)^k \ \ \ \ \ \ k=0,1,2,3,\cdots$

Be definition, the survival function and cdf of $X_r$ are:

$\displaystyle (2) \ \ \ \ P(X_r > k)=\sum \limits_{j=k+1}^\infty \binom{j+r-1}{j} p^r (1-p)^j \ \ \ \ \ \ k=0,1,2,3,\cdots$

$\displaystyle (3) \ \ \ \ P(X_r \le k)=\sum \limits_{j=0}^k \binom{j+r-1}{j} p^r (1-p)^j \ \ \ \ \ \ k=0,1,2,3,\cdots$

For each positive integer $k$, let $Y_{r+k}$ be the number of successes in performing a sequence of $r+k$ independent Bernoulli trials where $p$ is the probability of success. In other words, $Y_{r+k}$ has a binomial distribution with parameters $r+k$ and $p$.

If the random experiment requires more than $k$ failures to reach the rth success, there are at most $r-1$ successes in the first $k+r$ trails. Thus the survival function of $X_r$ is the same as the cdf of a binomial distribution. Equivalently, the cdf of $X_r$ is the same as the survival function of a binomial distribution. We have the following:

\displaystyle \begin{aligned}(4) \ \ \ \ P(X_r > k)&=P(Y_{k+r} \le r-1) \\&=\sum \limits_{j=0}^{r-1} \binom{k+r}{j} p^j (1-p)^{k+r-j} \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}

\displaystyle \begin{aligned}(5) \ \ \ \ P(X_r \le k)&=P(Y_{k+r} > r-1) \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}

Remark
The relation $(4)$ is analogous to the relationship between the Gamma distribution and the Poisson distribution. Recall that a Gamma distribution with shape parameter $\alpha$ and scale parameter $n$, where $n$ is a positive integer, can be interpreted as the waiting time until the nth change in a Poisson process. Thus, if the nth change takes place after time $t$, there can be at most $n-1$ arrivals in the time interval $[0,t]$. Thus the survival function of this Gamma distribution is the same as the cdf of a Poisson distribution. The relation $(4)$ is analogous to the following relation.

$\displaystyle (5) \ \ \ \ \int_t^\infty \frac{\alpha^n}{(n-1)!} \ x^{n-1} \ e^{-\alpha x} \ dx=\sum \limits_{j=0}^{n-1} \frac{e^{-\alpha t} \ (\alpha t)^j}{j!}$

A previous post on the negative binomial distribution is found here.

# The Poisson Distribution

Let $\alpha$ be a positive constant. Consider the following probability distribution:

$\displaystyle (1) \ \ \ \ \ P(X=j)=\frac{e^{-\alpha} \alpha^j}{j!} \ \ \ \ \ j=0,1,2,\cdots$

The above distribution is said to be a Poisson distribution with parameter $\alpha$. The Poisson distribution is usually used to model the random number of events occurring in a fixed time interval. As will be shown below, $E(X)=\alpha$. Thus the parameter $\alpha$ is the rate of occurrence of the random events; it indicates on average how many events occur per unit of time. Examples of random events that may be modeled by the Poisson distribution include the number of alpha particles emitted by a radioactive substance counted in a prescribed area during a fixed period of time, the number of auto accidents in a fixed period of time or the number of losses arising from a group of insureds during a policy period.

Each of the above examples can be thought of as a process that generates a number of arrivals or changes in a fixed period of time. If such a counting process leads to a Poisson distribution, then the process is said to be a Poisson process.

We now discuss some basic properties of the Poisson distribution. Using the Taylor series expansion of $e^{\alpha}$, the following shows that $(1)$ is indeed a probability distribution.

$\displaystyle . \ \ \ \ \ \ \ \sum \limits_{j=0}^\infty \frac{e^{-\alpha} \alpha^j}{j!}=e^{-\alpha} \sum \limits_{j=0}^\infty \frac{\alpha^j}{j!}=e^{-\alpha} e^{\alpha}=1$

The generating function of the Poisson distribution is $g(z)=e^{\alpha (z-1)}$ (see The generating function). The mean and variance can be calculated using the generating function.

\displaystyle \begin{aligned}(2) \ \ \ \ \ &E(X)=g'(1)=\alpha \\&\text{ } \\&E[X(X-1)]=g^{(2)}(1)=\alpha^2 \\&\text{ } \\&Var(X)=E[X(X-1)]+E(X)-E(X)^2=\alpha^2+\alpha-\alpha^2=\alpha \end{aligned}

The Poisson distribution can also be interpreted as an approximation to the binomial distribution. It is well known that the Poisson distribution is the limiting case of binomial distributions (see [1] or this post).

$\displaystyle (3) \ \ \ \ \ \lim \limits_{n \rightarrow \infty} \binom{n}{j} \biggl(\frac{\alpha}{n}\biggr)^j \biggl(1-\frac{\alpha}{n}\biggr)^{n-j}=\frac{e^{-\alpha} \alpha^j}{j!}$

One application of $(3)$ is that we can use Poisson probabilities to approximate Binomial probabilities. The approximation is reasonably good when the number of trials $n$ in a binomial distribution is large and the probability of success $p$ is small. The binomial mean is $n p$ and the variance is $n p (1-p)$. When $p$ is small, $1-p$ is close to 1 and the binomial variance is approximately $np \approx n p (1-p)$. Whenever the mean of a discrete distribution is approximately equaled to the mean, the Poisson approximation is quite good. As a rule of thumb, we can use Poisson to approximate binomial if $n \le 100$ and $p \le 0.01$.

As an example, we use the Poisson distribution to estimate the probability that at most 1 person out of 1000 will have a birthday on the New Year Day. Let $n=1000$ and $p=365^{-1}$. So we use the Poisson distribution with $\alpha=1000 \times 365^{-1}$. The following is an estimate using the Poisson distribution.

$\displaystyle . \ \ \ \ \ \ \ P(X \le 1)=e^{-\alpha}+\alpha e^{-\alpha}=(1+\alpha) e^{-\alpha}=0.2415$

Another useful property is that the independent sum of Poisson distributions also has a Poisson distribution. Specifically, if each $X_i$ has a Poisson distribution with parameter $\alpha_i$, then the independent sum $X=X_1+\cdots+X_n$ has a Poisson distribution with parameter $\alpha=\alpha_1+\cdots+\alpha_n$. One way to see this is that the product of Poisson generating functions has the same general form as $g(z)=e^{\alpha (z-1)}$ (see The generating function). One interpretation of this property is that when merging several arrival processes, each of which follow a Poisson distribution, the result is still a Poisson distribution.

For example, suppose that in an airline ticket counter, the arrival of first class customers follows a Poisson process with a mean arrival rate of 8 per 15 minutes and the arrival of customers flying coach follows a Poisson distribution with a mean rate of 12 per 15 minutes. Then the arrival of customers of either types has a Poisson distribution with a mean rate of 20 per 15 minutes or 80 per hour.

A Poisson distribution with a large mean can be thought of as an independent sum of Poisson distributions. For example, a Poisson distribution with a mean of 50 is the independent sum of 50 Poisson distributions each with mean 1. Because of the central limit theorem, when the mean is large, we can approximate the Poisson using the normal distribution.

In addition to merging several Poisson distributions into one combined Poisson distribution, we can also split a Poisson into several Poisson distributions. For example, suppose that a stream of customers arrives according to a Poisson distribution with parameter $\alpha$ and each customer can be classified into one of two types (e.g. no purchase vs. purchase) with probabilities $p_1$ and $p_2$, respectively. Then the number of “no purchase” customers and the number of “purchase” customers are independent Poisson random variables with parameters $\alpha p_1$ and $\alpha p_2$, respectively. For more details on the splitting of Poisson, see Splitting a Poisson Distribution.

Reference

1. Feller W. An Introduction to Probability Theory and Its Applications, Third Edition, John Wiley & Sons, New York, 1968

# The hazard rate function

In this post, we introduce the hazard rate function using the notions of non-homogeneous Poisson process.

In a Poisson process, changes occur at a constant rate $\lambda$ per unit time. Suppose that we interpret the changes in a Poisson process from a mortality point of view, i.e. a change in the Poisson process mean a termination of a system, be it biological or manufactured, and this Poisson process counts the number of of terminations as they occur. Then the rate of change $\lambda$ is interpreted as a hazard rate (or failure rate or force of mortality). With a constant force of mortality, the time until the next change is exponentially distributed. In this post, we discuss the hazard rate function in a more general setting. The process that counts of the number of terminations will no longer have a constant hazard rate, and instead will have a hazard rate function $\lambda(t)$, a function of time $t$. Such a counting process is called a non-homogeneous Poisson process. We discuss the survival probability models (the time to the next termination) associated with a non-homogeneous Poisson process. We then discuss several important examples of survival probability models, including the Weibull distribution, the Gompertz distribution and the model based on the Makeham’s law. We aso comment briefly the connection between the hazard rate function and the tail weight of a distribution.

$\text{ }$

The Poisson Process
We start with the three postulates of a Poisson process. Consider an experiment in which the occurrences of a certain type of events are counted during a given time interval. We call the occurrence of the type of events in question a change. We assume the following three conditions:

1. The numbers of changes occurring in nonoverlapping intervals are independent.
2. The probability of two or more changes taking place in a sufficiently small interval is essentially zero.
3. The probability of exactly one change in the short interval $(t,t+\delta)$ is approximately $\lambda \delta$ where $\delta$ is sufficiently small and $\lambda$ is a positive constant.

$\text{ }$

When we interpret the Poisson process in a mortality point of view, the constant $\lambda$ is a hazard rate (or force of mortality), which can be interpreted as the rate of failure at the next instant given that the life has survived to time $t$. With a constant force of mortality, the survival model (the time until the next termination) has an exponential distribution with mean $\frac{1}{\lambda}$. We wish to relax the constant force of mortality assumption by making $\lambda$ a function of $t$ instead. The remainder of this post is based on the non-homogeneous Poisson process defined below.

$\text{ }$

The Non-Homogeneous Poisson Process
We modifiy condition 3 above by making $\lambda(t)$ a function of $t$. We have the following modified counting process.

1. The numbers of changes occurring in nonoverlapping intervals are independent.
2. The probability of two or more changes taking place in a sufficiently small interval is essentially zero.
3. The probability of exactly one change in the short interval $(t,t+\delta)$ is approximately $\lambda(t) \delta$ where $\delta$ is sufficiently small and $\lambda(t)$ is a nonnegative function of $t$.

$\text{ }$

We focus on the survival model aspect of such counting processes. Such process can be interpreted as models for the number of changes occurred in a time interval where a change means “termination” or ‘failure” of a system under consideration. The rate of change function $\lambda(t)$ indicated in condition 3 is called the hazard rate function. It is also called the failure rate function in reliability engineering and the force of mortality in life contingency theory.

Based on condition 3 in the non-homogeneous Poisson process, the hazard rate function $\lambda(t)$ can be interpreted as the rate of failure at the next instant given that the life has survived to time $t$.

Two random variables naturally arise from a non-homogeneous Poisson process are described here. One is the discrete variable $N_t$, defined as the number of changes in the time interval $(0,t)$. The other is the continuous random variable $T$, defined as the time until the occurrence of the first change. The probability distribution of $T$ is called a survival model. The following is the link between $N_t$ and $T$.

$\text{ }$

\displaystyle \begin{aligned}(1) \ \ \ \ \ \ \ \ \ &P[T > t]=P[N_t=0] \end{aligned}

$\text{ }$

Note that $P[T > t]$ is the probability that the next change occurs after time $t$. This means that there is no change within the interval $(0,t)$. We have the following theorems.

$\text{ }$

Theorem 1.
Let $\displaystyle \Lambda(t)=\int_{0}^{t} \lambda(y) dy$. Then $e^{-\Lambda(t)}$ is the probability that there is no change in the interval $(0,t)$. That is, $\displaystyle P[N_t=0]=e^{-\Lambda(t)}$.

Proof. We are interested in finding the probability of zero changes in the interval $(0,y+\delta)$. By condition 1, the numbers of changes in the nonoverlapping intervals $(0,y)$ and $(y,y+\delta)$ are independent. Thus we have:

$\text{ }$

$\displaystyle (2) \ \ \ \ \ \ \ \ P[N_{y+\delta}=0] \approx P[N_y=0] \times [1-\lambda(y) \delta]$

$\text{ }$

Note that by condition 3, the probability of exactly one change in the small interval $(y,y+\delta)$ is $\lambda(y) \delta$. Thus $[1-\lambda(y) \delta]$ is the probability of no change in the interval $(y,y+\delta)$. Continuing with equation $(2)$, we have the following derivation:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\frac{P[N_{y+\delta}=0] - P[N_y=0]}{\delta} \approx -\lambda(y) P[N_y=0] \\&\text{ } \\&\frac{d}{dy} P[N_y=0]=-\lambda(y) P[N_y=0] \\&\text{ } \\&\frac{\frac{d}{dy} P[N_y=0]}{P[N_y=0]}=-\lambda(y) \\&\text{ } \\&\int_0^{t} \frac{\frac{d}{dy} P[N_y=0]}{P[N_y=0]} dy=-\int_0^{t} \lambda(y)dy \end{aligned}

$\text{ }$

Evaluating the integral on the left hand side with the boundary condition of $P[N_0=0]=1$ produces the following results:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &ln P[N_t=0]=-\int_0^{t} \lambda(y)dy \\&\text{ } \\&P[N_t=0]=\displaystyle e^{\displaystyle -\int_0^{t} \lambda(y)dy} \end{aligned}

$\text{ }$

Theorem 2
As discussed above, let $T$ be the length of the interval that is required to observe the first change. Then the following are the distribution function, survival function and pdf of $T$:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &F_T(t)=\displaystyle 1-e^{-\int_0^t \lambda(y) dy} \\&\text{ } \\&S_T(t)=\displaystyle e^{-\int_0^t \lambda(y) dy} \\&\text{ } \\&f_T(t)=\displaystyle \lambda(t) e^{-\int_0^t \lambda(y) dy} \end{aligned}

Proof. In Theorem 1, we derive the probability $P[N_y=0]$ for the discrete variable $N_y$ derived from the non-homogeneous Poisson process. We now consider the continuous random variable $T$, the time until the first change, which is related to $N_t$ by $(1)$. Thus $S_T(t)=P[T > t]=P[N_t=0]=e^{-\int_0^t \lambda(y) dy}$. The distribution function and density function can be derived accordingly.

$\text{ }$

Theorem 3
The hazard rate function $\lambda(t)$ is equivalent to each of the following:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\frac{f_T(t)}{1-F_T(t)} \\&\text{ } \\&\lambda(t)=\frac{-S_T^{'}(t)}{S_T(t)} \end{aligned}

$\text{ }$

Remark
Theorem 1 and Theorem 2 show that in a non-homogeneous Poisson process as described above, the hazard rate function $\lambda(t)$ completely specifies the probability distribution of the survival model $T$ (the time until the first change) . Once the rate of change function $\lambda(t)$ is known in the non-homogeneous Poisson process, we can use it to generate the survival function $S_T(t)$. All of the examples of survival models given below are derived by assuming the functional form of the hazard rate function. The result in Theorem 2 holds even outside the context of a non-homogeneous Poisson process, that is, given the hazard rate function $\lambda(t)$, we can derive the three distributional items $S_T(t)$, $F_T(t)$, $f_T(t)$.

The ratio in Theorem 3 indicates that the probability distribution determines the hazard rate function. In fact, the ratio in Theorem 3 is the usual definition of the hazard rate function. That is, the hazard rate function can be defined as the ratio of the density and the survival function (one minus the cdf). With this definition, we can also recover the survival function. Whenever $\displaystyle \lambda(x)=\frac{f_X(x)}{1-F_X(x)}$, we can derive:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &S_X(x)=\displaystyle e^{-\int_0^t \lambda(y) dy} \end{aligned}

$\text{ }$

As indicated above, the hazard rate function can be interpreted as the failure rate at time $t$ given that the life in question has survived to time $t$. It is the rate of failure at the next instant given that the life or system being studied has survived up to time $t$.

It is interesting to note that the function $\Lambda(t)=\int_0^t \lambda(y) dy$ defined in Theorem 1 is called the cumulative hazard rate function. Thus the cumulative hazard rate function is an alternative way of representing the hazard rate function (see the discussion on Weibull distribution below).

——————————————————————————————————————
Examples of Survival Models

–Exponential Distribution–
In many applications, especially those for biological organisms and mechanical systems that wear out over time, the hazard rate $\lambda(t)$ is an increasing function of $t$. In other words, the older the life in question (the larger the $t$), the higher chance of failure at the next instant. For humans, the probability of a 85 years old dying in the next year is clearly higher than for a 20 years old. In a Poisson process, the rate of change $\lambda(t)=\lambda$ indicated in condition 3 is a constant. As a result, the time $T$ until the first change derived in Theorem 2 has an exponential distribution with parameter $\lambda$. In terms of mortality study or reliability study of machines that wear out over time, this is not a realistic model. However, if the mortality or failure is caused by random external events, this could be an appropriate model.

–Weibull Distribution–
This distribution is an excellent model choice for describing the life of manufactured objects. It is defined by the following cumulative hazard rate function:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\biggl(\frac{t}{\beta}\biggr)^{\alpha} \end{aligned} where $\alpha > 0$ and $\beta>0$

$\text{ }$

As a result, the hazard rate function, the density function and the survival function for the lifetime distribution are:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\frac{\alpha}{\beta} \biggl(\frac{t}{\beta}\biggr)^{\alpha-1} \\&\text{ } \\&f_T(t)=\frac{\alpha}{\beta} \biggl(\frac{t}{\beta}\biggr)^{\alpha-1} \displaystyle e^{\displaystyle -\biggl[\frac{t}{\beta}\biggr]^{\alpha}} \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl[\frac{t}{\beta}\biggr]^{\alpha}} \end{aligned}

$\text{ }$

The parameter $\alpha$ is the shape parameter and $\beta$ is the scale parameter. When $\alpha=1$, the hazard rate becomes a constant and the Weibull distribution becomes an exponential distribution.

When the parameter $\alpha<1$, the failure rate decreases over time. One interpretation is that most of the defective items fail early on in the life cycle. Once they they are removed from the population, failure rate decreases over time.

When the parameter $1<\alpha$, the failure rate increases with time. This is a good candidate for a model to describe the lifetime of machines or systems that wear out over time.

–The Gompertz Distribution–
The Gompertz law states that the force of mortality or failure rate increases exponentially over time. It describe human mortality quite accurately. The following is the hazard rate function:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\alpha e^{\beta t} \end{aligned} where $\alpha>0$ and $\beta>0$.

$\text{ }$

The following are the cumulative hazard rate function as well as the survival function, distribution function and the pdf of the lifetime distribution $T$.

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\int_0^t \alpha e^{\beta y} dy=\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta} \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \\&\text{ } \\&F_T(t)=\displaystyle 1-e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \\&\text{ } \\&f_T(t)=\displaystyle \alpha \ e^{\beta t} \ e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \end{aligned}

$\text{ }$

–Makeham’s Law–
The Makeham’s Law states that the force of mortality is the Gompertz failure rate plus an age-indpendent component that accounts for external causes of mortality. The following is the hazard rate function:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\alpha e^{\beta t}+\mu \end{aligned} where $\alpha>0$, $\beta>0$ and $\mu>0$.

$\text{ }$

The following are the cumulative hazard rate function as well as the survival function, distribution function and the pdf of the lifetime distribution $T$.

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\int_0^t (\alpha e^{\beta y}+\mu) dy=\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \\&\text{ } \\&F_T(t)=\displaystyle 1-e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \\&\text{ } \\&f_T(t)=\biggl( \alpha e^{\beta t}+\mu t \biggr) \ e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \end{aligned}

$\text{ }$

The Tail Weight of a Distribution
The hazard rate function can provide information about the tail of a distribution. If the hazard rate function is decreasing, it is an indication that the distribution has a heavy tail, i.e., the distribution significantly puts more probability on larger values. Conversely, if the hazard rate function is increasing, it is an indication of a lighter tail. In an insurance context, heavy tailed distributions (e.g. the Pareto distribution) are suitable candidates for modeling large insurance losses (see this previous post or [1]).

$\text{ }$

Reference

1. Klugman S.A., Panjer H. H., Wilmot G. E. Loss Models, From Data to Decisions, Second Edition., Wiley-Interscience, a John Wiley & Sons, Inc., New York, 2004