# Gamma distribution and Poisson distribution

The gamma distribution is important in many statistical applications. This post discusses the connections between the gamma distribution and the Poisson distribution. The following is the probability density function of the gamma distribution.

$\displaystyle (1) \ \ \ \ \ f(x)=\frac{1}{\Gamma(\beta)} \ \alpha^\beta \ x^{\beta-1} \ e^{-\alpha x} \ \ \ \ \ \ x>0$

The numbers $\alpha$ and $\beta$, both positive, are fixed constants and are the parameters of the distribution. The symbol $\Gamma(\cdot)$ is the gamma function. This density function describes how the potential gamma observations distribute across the positive x-axis. Baked into this gamma probability density function are two pieces of information about the Poisson distribution. We can simply read off the information from the parameters $\alpha$ and $\beta$ in the density function.

Poisson-Gamma Mixture

For a reason that will be given shortly, the parameters $\alpha$ and $\beta$ in (1) give rise to a negative binomial distribution. The following is the probability function of this negative binomial distribution.

$\displaystyle (2) \ \ \ \ \ P(N=n)=\binom{\beta+n-1}{n} \ \biggl(\frac{\alpha}{1+\alpha} \biggr)^\beta \ \biggl(\frac{1}{1+\alpha} \biggr)^n \ \ \ \ \ \ n=0,1,2,3,\cdots$

If the parameter $\beta$ is a positive integer, then (2) has a nice interpretation in the context of a series of independent Bernoulli trials. Consider performing a series of independent trials where each trial has one of two distinct outcomes (called success or failure). Assume that the probability of a success in each trial is $p=\alpha/(1+\alpha)$. In this case, the quantity in (2) is the probability of having $n$ failures before the occurrence of the $\beta$th success.

Note that when $\beta$ is a positive integer, the binomial coefficient $\binom{\beta+n-1}{n}$ has the usual calculation $\frac{(\beta+n-1)!}{n! (\beta-1)!}$. When $\beta$ is not an integer but is only a positive number, the binomial coefficient $\binom{\beta+n-1}{n}$ is calculated as follows:

$\displaystyle \binom{\beta+n-1}{n}=\frac{(\beta+n-1) (\beta+n-2) \cdots (\beta+1) \beta}{n!}$

For this new calculation to work, $\beta$ does not have to be a positive integer; it only needs to be a positive real number. In that case the distribution in (2) does not have a natural interpretation in terms of performing a series of independent Bernoulli trials. It is simply a counting distribution, i.e. a discrete random variable modeling the number of occurrences of a type of random events.
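As a quick numerical sketch (the function name `gen_binom` is ours), the product formula above can be coded directly; it agrees with the factorial formula whenever $\beta$ is a positive integer and extends to any positive real $\beta$:

```python
import math

def gen_binom(beta, n):
    # Generalized binomial coefficient C(beta + n - 1, n):
    # the product (beta + n - 1)(beta + n - 2) ... (beta + 1)(beta) over n!
    prod = 1.0
    for k in range(n):
        prod *= beta + k
    return prod / math.factorial(n)

# Matches the usual factorial calculation for integer beta ...
print(gen_binom(3, 4), math.comb(3 + 4 - 1, 4))  # 15.0 and 15
# ... and still works when beta is not an integer
print(gen_binom(2.5, 3))  # 2.5 * 3.5 * 4.5 / 3! = 6.5625
```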

What does this have to do with gamma distribution and Poisson distribution? When a conditional random variable $X \lvert \lambda$ has a Poisson distribution such that its mean $\lambda$ is an unknown random quantity but follows a gamma distribution with parameters $\beta$ and $\alpha$ as described in (1), the unconditional distribution for $X$ has a negative binomial distribution as described in (2). In other words, the mixture of Poisson distributions with gamma mixing weights is a negative binomial distribution.

There is an insurance interpretation of the Poisson-gamma mixture. Suppose that in a large pool of insureds, the annual claim frequency of an insured is a Poisson distribution with mean $\lambda$. The quantity $\lambda$ varies from insured to insured but is supposed to follow a gamma distribution. Here we have a family of conditional Poisson distributions where each one is conditional on the characteristics of the particular insured in question (a low risk insured has a low $\lambda$ value and a high risk insured has a high $\lambda$ value). The “average” of the conditional Poisson distributions will be a negative binomial distribution (using the gamma distribution to weight the parameter $\lambda$). Thus the claim frequency for an “average” insured in the pool should be modeled by a negative binomial distribution. For a randomly selected insured from the pool, if we do not know anything about this insured, we can use the unconditional negative binomial distribution to model the claim frequency.

A detailed discussion of the negative binomial distribution is found here and here. The notion of mixture distributions and Poisson-gamma mixture in particular are discussed here. Many distributions applicable in actuarial applications are mixture distributions (see here for examples).
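The Poisson-gamma mixture result above can be verified by simulation. The sketch below (standard library only, with a simple Knuth-style Poisson sampler; $\beta$ is taken to be a positive integer so the ordinary binomial coefficient applies, and all function names are ours) draws a gamma value for $\lambda$, then a Poisson count with that mean, and compares the empirical frequencies with formula (2):

```python
import math
import random

def poisson_sample(mean, rng):
    # Knuth's algorithm: multiply uniforms until the product drops below e^{-mean}
    L = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def neg_binomial_pmf(n, beta, alpha):
    # Formula (2): C(beta+n-1, n) (alpha/(1+alpha))^beta (1/(1+alpha))^n
    coef = math.comb(beta + n - 1, n)  # beta assumed a positive integer here
    return coef * (alpha / (1 + alpha)) ** beta * (1 / (1 + alpha)) ** n

rng = random.Random(0)
alpha, beta, trials = 1.0, 2, 200_000
counts = {}
for _ in range(trials):
    lam = rng.gammavariate(beta, 1 / alpha)  # shape beta, scale 1/alpha (rate alpha)
    x = poisson_sample(lam, rng)
    counts[x] = counts.get(x, 0) + 1

for n in range(5):
    print(n, counts.get(n, 0) / trials, neg_binomial_pmf(n, beta, alpha))
```

With $\alpha=1$ and $\beta=2$, formula (2) gives $P(N=0)=(1/2)^2=0.25$, and the simulated frequency lands close to it.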

Waiting Times in a Poisson Process

The parameter $\beta$ in (1) is called the shape parameter while the parameter $\alpha$ is called the rate parameter. The reason for the name rate parameter will become clear shortly. For this interpretation of the gamma distribution to work, the shape parameter $\beta$ must be a positive integer.

The gamma distribution as described in (1) is intimately connected to a Poisson process through the rate parameter $\alpha$. In a Poisson process, events of a particular type of interest occur at random at the rate of $\alpha$ per unit time. Two questions are of interest. How many times will this event occur in a given time interval? How long do we have to wait to observe the first occurrence, the second occurrence and so on? The gamma distribution as described in (1) gives an answer to the second question when $\beta$ is a positive integer.

A counting process is a random experiment that observes the random occurrences of a certain type of events of interest in a time period. A Poisson process is a counting process that satisfies three simplifying assumptions that deal with independence and uniformity in time.

A good example of a Poisson process is the well known experiment in radioactivity conducted by Rutherford and Geiger in 1910. In this experiment, $\alpha$-particles were emitted from a polonium source and the number of $\alpha$-particles was counted in intervals of 7.5 seconds (2,608 such time intervals were observed). In these 2,608 intervals, a total of 10,097 particles were observed. Thus the mean count per period of 7.5 seconds is 10097 / 2608 = 3.87. In this experiment, $\alpha$-particles are observed at a rate of 3.87 per unit time (7.5 seconds).

In general, consider a Poisson process with a rate of $\lambda$ per unit time, i.e. the event that is of interest occurs at the rate of $\lambda$ per unit time. There are two random variables of interest. One is $N_t$ which is the number of occurrences of the event of interest from time 0 to time $t$. The other is $X_n$ which is the waiting time until the $n$th occurrence of the event of interest where $n$ is a positive integer.

What can we say about the random variables $N_t$? First, the random variable $N_1$, the number of occurrences of the event of interest in a unit time interval, has a Poisson distribution with mean $\lambda$. The following is the probability function.

$\displaystyle (3) \ \ \ \ \ P(N_1=k)=\frac{1}{k!} \ \lambda^k \ e^{-\lambda} \ \ \ \ \ \ \ \ \ k=0,1,2,3,\cdots$

For any positive real number $t$, the random variable $N_t$, which is a discrete random variable, follows a Poisson distribution with mean $\lambda t$. The following is the probability function.

$\displaystyle (4) \ \ \ \ \ P(N_t=k)=\frac{1}{k!} \ (\lambda t)^k \ e^{-\lambda t} \ \ \ \ \ k=0,1,2,3,\cdots$

For a more detailed discussion of why $P(N_1=k)$ and $P(N_t=k)$ are Poisson, see here.

We now discuss the distributions for $X_1$ and $X_n$. The random variable $X_1$ is the waiting time until the first occurrence. It has an exponential distribution with mean $1/\lambda$. The following is the probability density function.

$\displaystyle (5) \ \ \ \ \ f_{X_1}(t)=\lambda \ e^{-\lambda t} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ t>0$

The random variable $X_n$ has a gamma distribution with shape parameter $\beta=n$ and rate parameter $\alpha=\lambda$. The following is the probability density function.

$\displaystyle (6) \ \ \ \ \ f_{X_n}(t)=\frac{1}{(n-1)!} \ \lambda^n \ t^{n-1} \ e^{-\lambda t} \ \ \ \ \ \ t>0$

The density functions in (5) and (6) are derived from the Poisson distributions in (3) and (4). For example, for $X_1>t$ (the first occurrence taking place after time $t$) to happen, there must be no occurrence in the time interval $(0,t)$. Thus $P(X_1>t)=P(N_t=0)$. Note that $N_t$ is a Poisson random variable with mean $\lambda t$. The following derives the density function in (5).

$\displaystyle P(X_1>t)=P(N_t=0)=e^{-\lambda t}$

$\displaystyle P(X_1 \le t)=1-e^{-\lambda t}$

Note that the cumulative distribution function $P(X_1 \le t)$ is that for an exponential distribution. Taking the derivative gives the density function in (5). Extending the same reasoning, for $X_n>t$ (the $n$th occurrence taking place after time $t$) to happen, there must be at most $n-1$ occurrences in the time interval $(0,t)$. Once again, we are relating $X_n$ to the Poisson random variable $N_t$. More specifically $P(X_n>t)=P(N_t \le n-1)$.

$\displaystyle (7) \ \ \ \ \ P(X_n>t)=P(N_t \le n-1)=\sum \limits_{k=0}^{n-1} \biggl[ \frac{(\lambda t)^k}{k!} \ e^{-\lambda t} \biggr]$

$\displaystyle (8) \ \ \ \ \ P(X_n \le t)=1-\sum \limits_{k=0}^{n-1} \biggl[ \frac{(\lambda t)^k}{k!} \ e^{-\lambda t} \biggr]=\sum \limits_{k=n}^{\infty} \biggl[ \frac{(\lambda t)^k}{k!} \ e^{-\lambda t} \biggr]$

Taking the derivative of (8) gives the density function in (6). See here for a more detailed discussion of the relation between gamma distribution and Poisson distribution in a Poisson process.
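The relation (8) can be checked by simulation, using the standard fact that the interarrival times in a Poisson process are independent exponential variables with mean $1/\lambda$, so that $X_n$ is a sum of $n$ such exponentials. A sketch (parameter values chosen for illustration):

```python
import math
import random

rng = random.Random(1)
lam, n, t, trials = 2.0, 3, 1.0, 100_000

# X_n = sum of n independent exponential interarrival times, each with rate lam
hits = sum(
    sum(rng.expovariate(lam) for _ in range(n)) <= t for _ in range(trials)
)

# Formula (8): P(X_n <= t) = 1 - sum_{k=0}^{n-1} (lam t)^k / k! * e^{-lam t}
exact = 1 - sum((lam * t) ** k / math.factorial(k) * math.exp(-lam * t)
                for k in range(n))
print(hits / trials, exact)
```

With $\lambda=2$, $n=3$ and $t=1$, the exact value is $1-5e^{-2} \approx 0.3233$, and the simulated proportion is close to it.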

Evaluating Gamma Survival Function

The relation (7) shows that the gamma survival function is the cumulative distribution function (CDF) of the corresponding Poisson distribution. Consider the following integral.

$\displaystyle (9) \ \ \ \ \ \int_t^\infty \frac{1}{(n-1)!} \ \lambda^n \ x^{n-1} \ e^{-\lambda x} \ dx$

The integrand in the above integral is the density function of a gamma distribution (with the shape parameter being a positive integer). The limits of the integral are from $t$ to $\infty$. Conceptually the integral is identical to (7) since the integral gives $P(X_n>t)$, which is the survival function of the gamma distribution in question. According to (7), the integral in (9) has a closed form as follows.

$\displaystyle (10) \ \ \ \ \ \int_t^\infty \frac{1}{(n-1)!} \ \lambda^n \ x^{n-1} \ e^{-\lambda x} \ dx=\sum \limits_{k=0}^{n-1} \biggl[ \frac{(\lambda t)^k}{k!} \ e^{-\lambda t} \biggr]$

The above is a combination of (7) and (9) and is a way to evaluate the right tail of the gamma distribution when the shape parameter is a positive integer $n$. Instead of memorizing it, we can focus on the thought process.

Given the integral in (9), note the shape parameter $n$ and the rate parameter $\lambda$. Consider a Poisson process with rate parameter $\lambda$. In this Poisson process, the random variable $N_1$, the number of occurrences of the event of interest in a unit time interval, has a Poisson distribution with mean $\lambda$. The random variable $N_t$, the number of occurrences of the event of interest in a time interval of length $t$, has a Poisson distribution with mean $\lambda t$. Then the integral in (9) is equivalent to $P(N_t \le n-1)$, the probability that there are at most $n-1$ occurrences in the time interval $(0,t)$.

The above thought process works even if the gamma distribution has no relation to any Poisson process, e.g. it is just the model for size of insurance losses. We can still pretend there is a Poisson relationship and obtain a closed form evaluation of the survival function. It must be emphasized that this evaluation of the gamma survival function in closed form is possible only when the shape parameter is a positive integer. If it is not, we then need to evaluate the gamma survival function using software.
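The closed form in (10) is easy to confirm numerically. In the sketch below (function names ours), the left side of (10) is computed by a crude midpoint-rule integration and the right side by the Poisson sum, for $\lambda=1.5$, $n=3$ and $t=2$:

```python
import math

def gamma_survival_via_poisson(t, n, lam):
    # Right side of (10): P(N_t <= n-1) where N_t is Poisson with mean lam*t
    return sum((lam * t) ** k / math.factorial(k) * math.exp(-lam * t)
               for k in range(n))

def gamma_survival_numeric(t, n, lam, steps=200_000, upper=60.0):
    # Left side of (10): integrate the gamma density from t to a large upper
    # limit (midpoint rule; 'upper' chosen so the remaining tail is negligible)
    h = (upper - t) / steps
    total = 0.0
    for i in range(steps):
        x = t + (i + 0.5) * h
        total += lam ** n * x ** (n - 1) * math.exp(-lam * x) / math.factorial(n - 1)
    return total * h

t, n, lam = 2.0, 3, 1.5
print(gamma_survival_via_poisson(t, n, lam))
print(gamma_survival_numeric(t, n, lam))
```

Both sides come out to about $0.4232$, i.e. $e^{-3}(1+3+4.5)$.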

Remarks

The discussion here presents two relationships between the gamma distribution and the Poisson distribution. In the first one, mixing conditional Poisson distributions with gamma mixing weights produces a negative binomial distribution. Taking the shape parameter $\beta$ and rate parameter $\alpha$ in (1) produces the negative binomial distribution as in (2). This is one interpretation of the two gamma parameters.

In the second interpretation, the rate parameter of the gamma model is intimately tied to the rate parameter of a Poisson process. The interplay between the number of occurrences of the event of interest and the waiting time until the $n$th occurrence connects the gamma distribution with the Poisson distribution. For a fuller discussion of this interplay, see here.

The gamma distribution is mathematically derived from the gamma function. See here for an introduction to the gamma distribution.

The three assumptions for a Poisson process that are alluded to above allow us to obtain a binomial distribution when dividing a time interval into $n$ subintervals (for a sufficiently large $n$). As the subintervals get more and more granular, the binomial distributions have Poisson distribution as the limiting distribution. The derivation from binomial to Poisson is discussed here, here and here.

Practice problems can be found here in a companion blog.


$\copyright$ 2018 – Dan Ma

# Parametric severity models

Modeling the severity of losses is an important part of actuarial modeling (severity is the dollar value per claim). One approach is to employ parametric models in the modeling process. For example, the process may involve using claims data to estimate the parameters of the fitted model and then using the fitted model for estimation of future claim costs. A companion blog called Topics in Actuarial Modeling introduces a catalog of parametric severity models (the catalog is found here).

The catalog lists the models according to their mathematical properties (e.g. how they are derived and how they are related to the gamma distribution). The models have been discussed quite extensively in the blog Topics in Actuarial Modeling. The catalog puts all the models in one place so to speak. It has the links to the blog posts that describe the models. Click on a link to learn more about a particular parametric model. The parametric models in the catalog include the following:

• Gamma distribution
• Erlang distribution
• Exponential distribution
• Chi-squared distribution
• Hypo-Exponential distribution
• Hyper-Exponential distribution
• Lognormal distribution
• Weibull distribution
• Transformed exponential distribution
• Inverse exponential distribution
• Inverse transformed exponential distribution
• Transformed gamma distribution
• Inverse gamma distribution
• Inverse transformed gamma distribution
• Transformed Pareto distribution = Burr distribution
• Inverse Pareto distribution
• Inverse transformed Pareto distribution = Inverse Burr distribution
• Paralogistic distribution
• Inverse paralogistic distribution
• Pareto distribution
• Generalized Pareto distribution
• Loglogistic distribution
• Student t distribution
• Beta distribution
• Generalized beta distribution

This list is by no means exhaustive. It should nonetheless be a good resource to begin the modeling process (or the study of the actuarial modeling process). In fact, the modeling process is part of the exam syllabus of the Society of Actuaries. The catalog of models is found here.


$\copyright$ 2017 – Dan Ma

# More on Pareto distribution

There are several types of Pareto distributions. The Pareto distribution described in this previous post is called the Pareto Type II distribution (also called the Lomax distribution). There is also a Pareto Type I distribution. The difference between the two Pareto types is that the support of Pareto Type I is the interval $(\theta, \infty)$. In other words, the random variable takes on positive real numbers larger than some fixed positive parameter $\theta$. On the other hand, a Pareto Type II random variable can take on any positive real number. However, the two types are mathematically related. Their density functions have the same shape (if they have the same parameters).

Pareto Type I Density Curve (Red) versus Pareto Type II Density Curve (Green)

In the above diagram, the red curve is the density function for the Pareto Type I distribution while the green curve is the density function for the Pareto Type II distribution. For both distributions, the shape parameter is $\alpha=2$ and the scale parameter is $\theta=5$.

Note that the green density curve is the result of shifting the red curve horizontally 5 units to the left. The following are the density functions.

Red Curve
$\displaystyle f(x)=\frac{2 (5)^2}{x^3}=\frac{50}{x^3} \ \ \ \ \ x>5$

Green Curve
$\displaystyle g(x)=\frac{2 (5)^2}{(x+5)^3}=\frac{50}{(x+5)^3} \ \ \ \ \ x>0$

From the diagram and the density functions, we see that the green density curve is the result of shifting the red curve to the left by 5. Otherwise, both curves have the same distributional shape.

Since Pareto Type II is a shifting of Pareto Type I, they share similar mathematical properties. For example, they are both heavy tailed distributions. Thus they are appropriate for modeling extreme losses (in insurance applications) or financial fiasco or financial ruin (in financial applications). Which one to use depends on where the starting point of the random variable is.

However, the mathematical calculations are different to a large degree since the Pareto Type II density function is the result of a horizontal shift.
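A minimal sketch of the two density functions (with the $\alpha=2$ and $\theta=5$ values used above; function names ours) makes the horizontal shift explicit:

```python
def pareto1_pdf(x, alpha=2.0, theta=5.0):
    # Pareto Type I density: alpha * theta^alpha / x^(alpha+1) on x > theta
    return alpha * theta ** alpha / x ** (alpha + 1)

def pareto2_pdf(x, alpha=2.0, theta=5.0):
    # Pareto Type II (Lomax) density: alpha * theta^alpha / (x+theta)^(alpha+1) on x > 0
    return alpha * theta ** alpha / (x + theta) ** (alpha + 1)

# The Type II curve is the Type I curve shifted left by theta:
for x in (1.0, 2.0, 10.0):
    print(pareto2_pdf(x), pareto1_pdf(x + 5.0))  # equal pairs
```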

We would like to introduce blog posts in two companion blogs that discuss the Pareto distribution in greater details. This blog post is from a companion blog called Topics in Actuarial Modeling. It describes Pareto Type I and Type II in greater details. This blog post from another companion blog called Actuarial Modeling Practice has a side-by-side comparison of Pareto Type I and Pareto Type II. It also discusses the two Pareto types from a calculations stand point. This blog post has practice problems on the two Pareto types.

Another important mathematical property of Pareto Type II is that it is the mixture of exponential distributions with gamma mixing weights (exponential-gamma mixture). This is also discussed in the highlighted blog posts.


$\copyright$ 2017 – Dan Ma

# Derive some facts of the negative binomial distribution

The previous post called The Negative Binomial Distribution gives a fairly comprehensive discussion of the negative binomial distribution. In this post, we fill in some of the details that are glossed over in that previous post. We cover the following points:

• The several versions of the negative binomial distribution.
• The negative binomial probabilities sum to one, i.e., the negative binomial probability function is a valid one.
• The moment generating function of the negative binomial distribution.
• The first and second moments and the variance of the negative binomial distribution.
• An observation about the independent sum of negative binomial distributions.

________________________________________________________________________

Three versions

The negative binomial distribution has two parameters $r$ and $p$, where $r$ is a positive real number and $0<p<1$. The first two versions arise from the case that $r$ is a positive integer, which can be interpreted as the random experiment of performing a sequence of independent Bernoulli trials until the $r$th success (the trials have the same probability of success $p$). In this interpretation, there are two ways of recording the random experiment:

$X =$ the number of Bernoulli trials required to get the $r$th success.
$Y =$ the number of Bernoulli trials that end in failure before getting the $r$th success.

The other parameter $p$ is the probability of success in each Bernoulli trial. The notation $\binom{m}{n}$ is the binomial coefficient, where $m$ and $n$ are non-negative integers with $m \ge n$, defined as:

$\displaystyle \binom{m}{n}=\frac{m!}{n! \ (m-n)!}=\frac{m(m-1) \cdots (m-(n-1))}{n!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (0)$

With this in mind, the following are the probability functions of the random variables $X$ and $Y$.

$\displaystyle P(X=x)= \binom{x-1}{r-1} p^r (1-p)^{x-r} \ \ \ \ \ \ \ x=r,r+1,r+2,\cdots \ \ \ \ \ \ \ (1)$

$\displaystyle P(Y=y)=\binom{y+r-1}{y} p^r (1-p)^y \ \ \ \ \ \ \ y=0,1,2,\cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

The thought process for (1) is that for the event $X=x$ to happen, there can only be $r-1$ successes in the first $x-1$ trials and one additional success occurring in the last trial (the $x$th trial). The thought process for (2) is that for the event $Y=y$ to happen, there are $y+r$ trials ($y$ failures and $r$ successes). In the first $y+r-1$ trials, there can be only $y$ failures (or equivalently $r-1$ successes). Note that $X=Y+r$. Thus the mean of $Y$ immediately gives the mean of $X$, a fact we will use below.

Instead of memorizing the probability functions (1) and (2), it is better to understand and remember the thought processes involved. Because of the natural interpretation of performing Bernoulli trials until the $r$th success, it is a good idea to introduce the negative binomial distribution via the distributions described by (1) and (2), i.e., the case where the parameter $r$ is a positive integer. When $r=1$, the random experiment is a sequence of independent Bernoulli trials until the first success (this is called the geometric distribution).

Of course, (1) and (2) can also simply be used as counting distributions without any connection with a series of Bernoulli trials (e.g. used in an insurance context as the number of losses or claims arising from a group of insurance policies).

The binomial coefficient in (0) is defined when both numbers are non-negative integers and the top one is greater than or equal to the bottom one. However, the rightmost term in (0) can be calculated even when the top number $m$ is not a non-negative integer. Thus when $m$ is any real number, the rightmost term in (0) can be calculated provided that the bottom number $n$ is a positive integer. For convenience we define $\binom{m}{0}=1$. With this in mind, the binomial coefficient $\binom{m}{n}$ is defined for any real number $m$ and any non-negative integer $n$.

The third version of the negative binomial distribution arises from the relaxation of the binomial coefficient $\binom{m}{n}$ just discussed. With this in mind, the probability function in (2) can be defined for any positive real number $r$:

$\displaystyle P(Y=y)=\binom{y+r-1}{y} p^r (1-p)^y \ \ \ \ \ \ \ y=0,1,2,\cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

where $\displaystyle \binom{y+r-1}{y}=\frac{(y+r-1)(y+r-2) \cdots (r+1)r}{y!}$.

Of course when $r$ is a positive integer, versions (2) and (3) are identical. When $r$ is a positive real number but is not an integer, the distribution cannot be interpreted as the number of failures until the occurrence of $r$th success. Instead, it is used as a counting distribution.

________________________________________________________________________

The probabilities sum to one

Do the probabilities in (1), (2) or (3) sum to one? For the interpretations of (1) and (2), is it possible to repeatedly perform Bernoulli trials and never get the $r$th success? For $r=1$, is it possible to never even get a success? In tossing a fair coin repeatedly, soon enough you will get a head, and even if $r$ is a large number, you will eventually get $r$ heads. Here we wish to prove this fact mathematically.

To show that (1), (2) and (3) are indeed probability functions, we use a fact concerning Maclaurin’s series expansion of the function $(1-x)^{-r}$, a fact that is covered in a calculus course. In the following two results, $r$ is a fixed positive real number and $y$ is any non-negative integer:

$\displaystyle \binom{y+r-1}{y}=(-1)^y \ \binom{-r}{y} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

$\displaystyle \sum \limits_{y=0}^\infty (-1)^y \ \binom{-r}{y} \ x^y=(1-x)^{-r} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$

The result (4) is to rearrange the binomial coefficient in probability function (3) to another binomial coefficient with a negative number. This is why there is the word “negative” in negative binomial distribution. The result (5) is the Maclaurin’s series expansion for the function $(1-x)^{-r}$. We first derive these two facts and then use them to show that the negative binomial probabilities in (3) sum to one. The following derives (4).

\displaystyle \begin{aligned} \binom{y+r-1}{y}&=\frac{(y+r-1)(y+r-2) \cdots (r+1)r}{y!} \\&=(-1)^y \ \frac{(-r)(-r-1) \cdots (-r-(y-1))}{y!} \\&=(-1)^y \ \binom{-r}{y} \end{aligned}

To derive (5), let $f(x)=(1-x)^{-r}$. Based on a theorem that can be found in most calculus text, the function $f(x)$ has the following Maclaurin’s series expansion (Maclaurin’s series is simply Taylor’s series with center = 0).

$\displaystyle (1-x)^{-r}=f(0)+f^{'}(0)x+\frac{f^{(2)}(0)}{2!}x^2+\frac{f^{(3)}(0)}{3!}x^3+\cdots + \frac{f^{(n)}(0)}{n!}x^n+\cdots$

where $-1<x<1$. Now, filling in the derivatives $f^{(n)}(0)$, we have the following derivation.

\displaystyle \begin{aligned} (1-x)^{-r}&=1+rx+\frac{(r+1)r}{2!}x^2+\frac{(r+2)(r+1)r}{3!}x^3 \\& \ \ \ \ \ \ \ \ +\cdots+\frac{(r+y-1)(r+y-2) \cdots (r+1)r}{y!}x^y +\cdots \\&=1+(-1)^1 (-r)x+(-1)^2\frac{(-r)(-r-1)}{2!}x^2 \\& \ \ \ \ \ \ +(-1)^3 \frac{(-r)(-r-1)(-r-2)}{3!}x^3 +\cdots \\& \ \ \ \ \ \ +(-1)^y \frac{(-r)(-r-1) \cdots (-r-y+2)(-r-y+1)}{y!}x^y +\cdots \\&=(-1)^0 \binom{-r}{0}x^0 +(-1)^1 \binom{-r}{1}x^1+(-1)^2 \binom{-r}{2}x^2 \\& \ \ \ \ \ \ +(-1)^3 \binom{-r}{3}x^3+\cdots +(-1)^y \binom{-r}{y}x^y+\cdots \\&=\sum \limits_{y=0}^\infty (-1)^y \ \binom{-r}{y} \ x^y \end{aligned}

We can now show that the negative binomial probabilities in (3) sum to one. Let $q=1-p$.

\displaystyle \begin{aligned} \sum \limits_{y=0}^\infty \binom{y+r-1}{y} \ p^r \ q^y &=p^r \ \sum \limits_{y=0}^\infty (-1)^y \ \binom{-r}{y} \ q^y \ \ \ \ \ \ \ \ \ \ \ \text{using } (4) \\&=p^r \ (1-q)^{-r} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{using } (5)\\&=p^r p^{-r} \\&=1 \end{aligned}

________________________________________________________________________

The moment generating function

We now derive the moment generating function of the negative binomial distribution according to (3). The moment generating function is $M(t)=E(e^{tY})$, defined over all real numbers $t$ for which $M(t)$ exists. The following derivation does the job.

\displaystyle \begin{aligned} M(t)&=E(e^{tY}) \\&=\sum \limits_{y=0}^\infty \ e^{t y} \ \binom{y+r-1}{y} \ p^r \ (1-p)^y \\&=p^r \ \sum \limits_{y=0}^\infty \ \binom{y+r-1}{y} \ [(1-p) e^t]^y \\&=p^r \ \sum \limits_{y=0}^\infty \ (-1)^y \binom{-r}{y} \ [(1-p) e^t]^y \ \ \ \ \ \ \ \ \ \ \ \text{using } (4) \\&=p^r \ [1-(1-p) \ e^t]^{-r} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{using } (5) \\&=\frac{p^r}{[1-(1-p) \ e^t]^{r}}\end{aligned}

The above moment generating function works for the negative binomial distribution with respect to (3) and thus to (2). For the distribution in (1), note that $X=Y+r$. Thus $E(e^{tX})=E(e^{t(Y+r)})=e^{tr} \ E(e^{tY})$. The moment generating function of (1) is simply the above moment generating function multiplied by the factor $e^{tr}$. To summarize, the moment generating functions for the three versions are:

$\displaystyle M_X(t)=E[e^{tX}]=\frac{p^r \ e^{tr}}{[1-(1-p) \ e^t]^{r}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{for } (1)$

$\displaystyle M_Y(t)=E[e^{tY}]=\frac{p^r}{[1-(1-p) \ e^t]^{r}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{for } (2) \text{ and } (3)$

The domain of the moment generating function is the set of all $t$ for which $M_X(t)$ or $M_Y(t)$ is defined. Based on the form that it takes, we need to make sure that $1-(1-p) \ e^t>0$. This leads to the domain $t<-\text{ln}(1-p)$.
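As a sanity check on the formula for $M_Y(t)$, a central-difference approximation of $M_Y^{'}(0)$ should reproduce the mean $r(1-p)/p$ derived in the next section. A sketch (parameter values chosen for illustration):

```python
import math

def M(t, r, p):
    # Moment generating function of version (3), defined for t < -ln(1-p)
    return p ** r / (1 - (1 - p) * math.exp(t)) ** r

r, p, h = 2.5, 0.4, 1e-6
# Central difference M'(0) ~ (M(h) - M(-h)) / (2h) should be near r(1-p)/p = 3.75
print((M(h, r, p) - M(-h, r, p)) / (2 * h), r * (1 - p) / p)
```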

________________________________________________________________________

The mean and the variance

With the moment generating function derived in the above section, we can now focus on finding the moments of the negative binomial distribution. To find the moments, simply take the derivatives of the moment generating function and evaluate at $t=0$. For the distribution represented by the probability function in (3), we calculate the following:

$E(Y)=M_Y^{'}(0)$

$E(Y^2)=M_Y^{(2)}(0)$

$Var(Y)=E(Y^2)-E(Y)^2$

After taking the first and second derivatives and evaluating at $t=0$, the first and the second moments are:

$\displaystyle E(Y)=r \ \frac{1-p}{p}$

$\displaystyle E(Y^2)=\frac{r(1-p)[1+r(1-p)]}{p^2}$

The following derives the variance.

\displaystyle \begin{aligned} Var(Y)&=E(Y^2)-E(Y)^2 \\&=\frac{r(1-p)[1+r(1-p)]}{p^2}-\frac{r^2(1-p)^2}{p^2} \\&=\frac{r(1-p)[1+r(1-p)-r(1-p)]}{p^2} \\&=\frac{r(1-p)}{p^2} \end{aligned}

The above formula is the variance for the three versions (1), (2) and (3). Note that $Var(Y)>E(Y)$. In contrast, the variance of the Poisson distribution is identical to its mean. Thus in the situation where the variance of observed data is greater than the sample mean, the negative binomial distribution should be a better fit than the Poisson distribution.
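The mean and variance formulas can be confirmed by simulating version (2) directly (counting failures before the $r$th success). A sketch with $r=4$ and $p=0.3$, so that $E(Y) \approx 9.33$ and $Var(Y) \approx 31.1$:

```python
import random

def neg_binom_sample(r, p, rng):
    # Version (2): count failures before the r-th success in Bernoulli trials
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(7)
r, p, trials = 4, 0.3, 200_000
samples = [neg_binom_sample(r, p, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials

print(mean, r * (1 - p) / p)      # compare with E(Y) = r(1-p)/p
print(var, r * (1 - p) / p ** 2)  # compare with Var(Y) = r(1-p)/p^2
```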

________________________________________________________________________

The independent sum

There is an easy consequence that follows from the moment generating function derived above. The sum of several independent negative binomial distributions is also a negative binomial distribution. For example, suppose $T_1,T_2, \cdots,T_n$ are independent negative binomial random variables (version (3)). Suppose each $T_j$ has parameters $r_j$ and $p$ (the second parameter is identical). The moment generating function of the independent sum is the product of the individual moment generating functions. Thus the following is the moment generating function of $T=T_1+\cdots+T_n$.

$\displaystyle M_T(t)=E[e^{tT}]=\frac{p^g}{[1-(1-p) \ e^t]^{g}}$

where $g=r_1+\cdots+r_n$. The moment generating function uniquely identifies the distribution. The above $M_T(t)$ is that of a negative binomial distribution with parameters $g$ and $p$ according to (3).

A special case is that the sum of $n$ independent geometric distributions is a negative binomial distribution with the $r$ parameter being $r=n$. The following is the moment generating function of the sum $W$ of $n$ independent geometric distributions.

$\displaystyle M_W(t)=E[e^{tW}]=\frac{p^n}{[1-(1-p) \ e^t]^{n}}$
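This special case is easy to check by simulation: summing $n$ independent geometric counts (failures before the first success) gives a sample from the negative binomial distribution with parameters $r=n$ and $p$, whose mean is $n(1-p)/p$. A sketch (function name ours):

```python
import random

def geometric_failures(p, rng):
    # Number of failures before the first success in Bernoulli trials
    count = 0
    while rng.random() >= p:
        count += 1
    return count

rng = random.Random(3)
p, n, trials = 0.5, 3, 100_000
# Sum of n independent geometric counts; the mean should be near n(1-p)/p = 3
sums = [sum(geometric_failures(p, rng) for _ in range(n)) for _ in range(trials)]
print(sum(sums) / trials)
```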

________________________________________________________________________
$\copyright \ \text{2015 by Dan Ma}$

# Getting from Binomial to Poisson

In many binomial problems, the number of Bernoulli trials $n$ is large, relatively speaking, and the probability of success $p$ is small such that $n p$ is of moderate magnitude. For example, consider problems that deal with rare events where the probability of occurrence is small (as a concrete example, counting the number of people with July 1 as birthday out of a random sample of 1000 people). It is often convenient to approximate such binomial problems using the Poisson distribution. The justification for using the Poisson approximation is that the Poisson distribution is a limiting case of the binomial distribution. Now that cheap computing power is widely available, it is quite easy to use a computer or other computing device to obtain exact binomial probabilities for experiments up to 1000 trials or more. Though the Poisson approximation may no longer be necessary for such problems, knowing how to get from binomial to Poisson is important for understanding the Poisson distribution itself.

Consider a counting process that describes the occurrences of a certain type of events of interest in a unit time interval, subject to three simplifying assumptions (discussed below). We are interested in counting the number of occurrences of the event of interest in a unit time interval. As a concrete example, consider the number of cars arriving at an observation point in a certain highway in a period of time, say one hour. We wish to model the probability distribution of the number of cars that will arrive at the observation point in this particular highway in one hour. Let $X$ be the random variable described by this probability distribution. We wish to know the probability that there are $k$ cars arriving in one hour. We start with using a binomial distribution as an approximation to the probability $P(X=k)$. We will see that upon letting $n \rightarrow \infty$, $P(X=k)$ becomes a Poisson probability.

Suppose that we know $E(X)=\alpha$, perhaps an average obtained after observing cars at the observation point for many hours. The simplifying assumptions alluded to earlier are the following:

1. The numbers of cars arriving in nonoverlapping time intervals are independent.
2. The probability of one car arriving in a very short time interval of length $h$ is approximately $\alpha h$.
3. The probability of having more than one car arriving in a very short time interval is essentially zero.

Assumption 1 means that a large number of cars arriving in one period does not imply that fewer cars will arrive in the next period, and vice versa. In other words, the number of cars that arrive in any one given moment does not affect the number of cars that will arrive subsequently. Knowing how many cars arrive in one minute will not help predict the number of cars arriving in the next minute. Assumption 2 means that the rate of cars arriving depends only on the length of the time interval and not on when the time interval occurs (e.g. not on whether it is at the beginning of the hour or toward the end of the hour). Assumptions 2 and 3 allow us to think of a very short period of time as a Bernoulli trial. Thinking of the arrival of a car as a success, each short time interval will result in either one success or one failure.

To start, we can break up the hour into 60 minutes (into 60 Bernoulli trials). We then consider the binomial distribution with $n=60$ and $p=\frac{\alpha}{60}$. So the following is an approximation to our desired probability distribution.

$\displaystyle (1) \ \ \ \ \ P(X=k) \approx \binom{60}{k} \biggl(\frac{\alpha}{60}\biggr)^k \biggl(1-\frac{\alpha}{60}\biggr)^{60-k} \ \ \ \ \ k=0,1,2,\cdots, 60$

Conceivably, there can be more than 1 car arriving in a minute and observing cars in a one-minute interval may not be a Bernoulli trial. For a one-minute interval to qualify as a Bernoulli trial, there is either no car arriving or 1 car arriving in that one minute. So we can break up an hour into 3,600 seconds (into 3,600 Bernoulli trials). We now consider the binomial distribution with $n=3600$ and $p=\frac{\alpha}{3600}$.

$\displaystyle (2) \ \ \ \ \ P(X=k) \approx \binom{3600}{k} \biggl(\frac{\alpha}{3600}\biggr)^k \biggl(1-\frac{\alpha}{3600}\biggr)^{3600-k} \ \ \ \ \ k=0,1,2,\cdots, 3600$

It is also conceivable that more than 1 car can arrive in one second and observing cars in one-second interval may still not qualify as a Bernoulli trial. So we need to get more granular. We can divide up the hour into $n$ equal subintervals, each of length $\frac{1}{n}$. The assumptions 2 and 3 ensure that each subinterval is a Bernoulli trial (either it is a success or a failure; one car arriving or no car arriving). Assumption 1 tells us that all the $n$ subintervals are independent. So breaking up the hour into $n$ moments and counting the number of moments that are successes will result in a binomial distribution with parameters $n$ and $p=\frac{\alpha}{n}$. So we are ready to proceed with the following approximation to our probability distribution $P(X=k)$.

$\displaystyle (3) \ \ \ \ \ P(X=k) \approx \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggl(1-\frac{\alpha}{n}\biggr)^{n-k} \ \ \ \ \ k=0,1,2,\cdots, n$

As we get more granular, $n \rightarrow \infty$. We show that the limit of the binomial probability in $(3)$ is the Poisson distribution with parameter $\alpha$. We show the following.

$\displaystyle (4) \ \ \ \ \ P(X=k) = \lim \limits_{n \rightarrow \infty} \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggl(1-\frac{\alpha}{n}\biggr)^{n-k}=\frac{e^{-\alpha} \alpha^k}{k!} \ \ \ \ \ \ k=0,1,2,\cdots$

In the derivation of $(4)$, we need the following two mathematical tools. The statement $(5)$ is one of the definitions of the mathematical constant $e$. In the statement $(6)$, the integer $n$ is greater than the integer $k$. It says that such a ratio of two factorials is the product of $n$ and the successively smaller integers down to $n-(k-1)=n-k+1$. There are exactly $k$ terms in the product.

$\displaystyle (5) \ \ \ \ \ \lim \limits_{n \rightarrow \infty} \biggl(1+\frac{x}{n}\biggr)^n=e^x$

$\displaystyle (6) \ \ \ \ \ \frac{n!}{(n-k)!}=n(n-1)(n-2) \cdots (n-k+1) \ \ \ \ \ \ \ \ k<n$

The following is the derivation of $(4)$.

\displaystyle \begin{aligned}(7) \ \ \ \ \ P(X=k)&=\lim \limits_{n \rightarrow \infty} \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggl(1-\frac{\alpha}{n}\biggr)^{n-k} \\&=\lim \limits_{n \rightarrow \infty} \ \frac{n!}{k! (n-k)!} \biggl(\frac{\alpha}{n}\biggr)^k \biggl(1-\frac{\alpha}{n}\biggr)^{n-k} \\&=\lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \biggl(\frac{\alpha^k}{k!}\biggr) \biggl(1-\frac{\alpha}{n}\biggr)^{n} \biggl(1-\frac{\alpha}{n}\biggr)^{-k} \\&=\biggl(\frac{\alpha^k}{k!}\biggr) \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \\&\times \ \ \ \lim \limits_{n \rightarrow \infty} \biggl(1-\frac{\alpha}{n}\biggr)^{n} \ \lim \limits_{n \rightarrow \infty} \biggl(1-\frac{\alpha}{n}\biggr)^{-k} \\&=\frac{e^{-\alpha} \alpha^k}{k!} \end{aligned}

In $(7)$, we have $\displaystyle \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k}=1$. The reason is that the numerator is a polynomial in $n$ whose leading term is $n^k$. Upon dividing by $n^k$ and taking the limit, we get 1. Based on $(5)$, we have $\displaystyle \lim \limits_{n \rightarrow \infty} \biggl(1-\frac{\alpha}{n}\biggr)^{n}=e^{-\alpha}$. For the last limit in the derivation we have $\displaystyle \lim \limits_{n \rightarrow \infty} \biggl(1-\frac{\alpha}{n}\biggr)^{-k}=1$.

We conclude with some comments. As the above derivation shows, the Poisson distribution is at heart a binomial distribution. When we divide the unit time interval into more and more subintervals (as the subintervals get more and more granular), the resulting binomial distribution behaves more and more like the Poisson distribution.
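The convergence described above can also be seen numerically. The following Python sketch (with a hypothetical rate $\alpha=5$ cars per hour and $k=3$) evaluates the binomial approximation for increasingly granular subdivisions of the hour:

```python
from math import comb, exp, factorial

alpha, k = 5.0, 3  # hypothetical rate per hour and target count

def binom_approx(n):
    # binomial approximation (3): n subintervals, success probability alpha/n
    p = alpha / n
    return comb(n, k) * p**k * (1 - p) ** (n - k)

poisson = exp(-alpha) * alpha**k / factorial(k)  # the Poisson limit in (4)

for n in (60, 3600, 1_000_000):
    print(n, binom_approx(n))
print("Poisson limit:", poisson)
```

Already at $n=3600$ the binomial probability matches the Poisson limit to several decimal places.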

The three assumptions used in the derivation are called the Poisson postulates, which are the underlying assumptions that govern a Poisson process. Such a random process describes the occurrences of some type of events of interest (e.g. the arrivals of cars in our example) in a fixed period of time. The positive constant $\alpha$ indicated in Assumption 2 is the parameter of the Poisson process, which can be interpreted as the rate of occurrences of the event of interest (or rate of changes, or rate of arrivals) in a unit time interval; that is, the positive constant $\alpha$ is the mean number of occurrences in the unit time interval. The derivation in $(7)$ shows that whenever a certain type of events occurs according to a Poisson process with parameter $\alpha$, the number of occurrences in the unit time interval is distributed according to the Poisson distribution as indicated in $(4)$.

If we observe the occurrences of events over intervals of length other than unit length, say, an interval of length $t$, the counting process is governed by the same three postulates, with the modification to Assumption 2 that the mean number of occurrences in the time interval of length $t$ is now $\alpha t$. Applying the same derivation with $\alpha t$ in place of $\alpha$, it can be shown that the number of occurrences $X_t$ in a time interval of length $t$ has the Poisson distribution with the following probability mass function.

$\displaystyle (8) \ \ \ \ \ P(X_t=k)=\frac{e^{-\alpha t} \ (\alpha t)^k}{k!} \ \ \ \ \ \ \ \ k=0,1,2,\cdots$
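As a small illustration of $(8)$ (a Python sketch with a hypothetical rate $\alpha=5$ per hour), the probability of $k$ arrivals in half an hour uses the mean $\alpha t = 2.5$:

```python
from math import exp, factorial

alpha, t = 5.0, 0.5  # hypothetical rate per hour; half-hour interval

def poisson_pmf(k, mean):
    # Poisson probability (8) with mean alpha * t
    return exp(-mean) * mean**k / factorial(k)

p2 = poisson_pmf(2, alpha * t)  # P(X_t = 2) with mean 2.5
print(f"P(X_t = 2) = {p2:.4f}")
```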

# Relating Binomial and Negative Binomial

The negative binomial distribution has a natural interpretation as a waiting time until the arrival of the rth success (when the parameter r is a positive integer). The waiting time refers to the number of independent Bernoulli trials needed to reach the rth success. This interpretation of the negative binomial distribution gives us a good way of relating it to the binomial distribution. For example, if the rth success takes place after more than $k$ failed Bernoulli trials, then there can be at most $r-1$ successes in the first $k+r$ trials. This tells us that the survival function of the negative binomial distribution is the cumulative distribution function (cdf) of a binomial distribution. In this post, we give the details behind this observation. A previous post on the negative binomial distribution is found here.

A random experiment resulting in two distinct outcomes (success or failure) is called a Bernoulli trial (e.g. head or tail in a coin toss, whether or not the birthday of a customer is the first of January, whether an insurance claim is above or below a given threshold etc). Suppose a series of independent Bernoulli trials are performed until reaching the rth success where the probability of success in each trial is $p$. Let $X_r$ be the number of failures before the occurrence of the rth success. The following is the probability mass function of $X_r$.

$\displaystyle (1) \ \ \ \ P(X_r=k)=\binom{k+r-1}{k} p^r (1-p)^k \ \ \ \ \ \ k=0,1,2,3,\cdots$

By definition, the survival function and cdf of $X_r$ are:

$\displaystyle (2) \ \ \ \ P(X_r > k)=\sum \limits_{j=k+1}^\infty \binom{j+r-1}{j} p^r (1-p)^j \ \ \ \ \ \ k=0,1,2,3,\cdots$

$\displaystyle (3) \ \ \ \ P(X_r \le k)=\sum \limits_{j=0}^k \binom{j+r-1}{j} p^r (1-p)^j \ \ \ \ \ \ k=0,1,2,3,\cdots$

For each positive integer $k$, let $Y_{r+k}$ be the number of successes in performing a sequence of $r+k$ independent Bernoulli trials where $p$ is the probability of success. In other words, $Y_{r+k}$ has a binomial distribution with parameters $r+k$ and $p$.

If the random experiment requires more than $k$ failures to reach the rth success, then there are at most $r-1$ successes in the first $k+r$ trials. Thus the survival function of $X_r$ is the same as the cdf of a binomial distribution. Equivalently, the cdf of $X_r$ is the same as the survival function of a binomial distribution. We have the following:

\displaystyle \begin{aligned}(4) \ \ \ \ P(X_r > k)&=P(Y_{k+r} \le r-1) \\&=\sum \limits_{j=0}^{r-1} \binom{k+r}{j} p^j (1-p)^{k+r-j} \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}

\displaystyle \begin{aligned}(5) \ \ \ \ P(X_r \le k)&=P(Y_{k+r} > r-1) \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}
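The identities $(4)$ and $(5)$ are easy to verify numerically. A minimal Python sketch (the values $r=4$, $p=0.3$, $k=6$ are hypothetical):

```python
from math import comb

r, p, k = 4, 0.3, 6  # hypothetical parameters

def nb_pmf(j, r, p):
    # P(X_r = j): probability of j failures before the r-th success, as in (1)
    return comb(j + r - 1, j) * p**r * (1 - p) ** j

def binom_cdf(m, n, p):
    # P(Y_n <= m) for Y_n binomial with parameters n and p
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m + 1))

surv = 1 - sum(nb_pmf(j, r, p) for j in range(k + 1))  # P(X_r > k)
print(surv, binom_cdf(r - 1, k + r, p))  # the two sides of (4) agree
```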

Remark
The relation $(4)$ is analogous to the relationship between the Gamma distribution and the Poisson distribution. Recall that a Gamma distribution with shape parameter $n$ and rate parameter $\alpha$, where $n$ is a positive integer, can be interpreted as the waiting time until the nth change in a Poisson process. Thus, if the nth change takes place after time $t$, there can be at most $n-1$ arrivals in the time interval $[0,t]$. Thus the survival function of this Gamma distribution is the same as the cdf of a Poisson distribution. The relation $(4)$ is analogous to the following relation.

$\displaystyle (6) \ \ \ \ \int_t^\infty \frac{\alpha^n}{(n-1)!} \ x^{n-1} \ e^{-\alpha x} \ dx=\sum \limits_{j=0}^{n-1} \frac{e^{-\alpha t} \ (\alpha t)^j}{j!}$
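This Gamma–Poisson relation can likewise be checked numerically. The sketch below (hypothetical values $\alpha=2$, $n=3$, $t=1.5$) approximates the left-hand integral with a midpoint rule and compares it with the Poisson sum on the right:

```python
from math import exp, factorial

alpha, n, t = 2.0, 3, 1.5  # hypothetical rate, integer shape, time threshold

def gamma_pdf(x):
    # Gamma density with shape n and rate alpha (the integrand)
    return alpha**n / factorial(n - 1) * x ** (n - 1) * exp(-alpha * x)

# midpoint-rule quadrature of the survival integral over [t, t + 40];
# the tail beyond that is negligible
steps, width = 200_000, 40.0
h = width / steps
lhs = sum(gamma_pdf(t + (i + 0.5) * h) for i in range(steps)) * h

rhs = sum(exp(-alpha * t) * (alpha * t) ** j / factorial(j) for j in range(n))
print(lhs, rhs)  # the two sides agree
```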


# The Negative Binomial Distribution

A counting distribution is a discrete distribution with probabilities only on the nonnegative integers. Such distributions are important in insurance applications since they can be used to model the number of events such as losses to the insured or claims to the insurer. Though playing a prominent role in statistical theory, the Poisson distribution is not appropriate in all situations, since it requires that the mean and the variance be equal. Thus the negative binomial distribution is an excellent alternative to the Poisson distribution, especially in the cases where the observed variance is greater than the observed mean.

The negative binomial distribution arises naturally from a probability experiment of performing a series of independent Bernoulli trials until the occurrence of the rth success where r is a positive integer. From this starting point, we discuss three ways to define the distribution. We then discuss several basic properties of the negative binomial distribution. Emphasis is placed on the close connection between the Poisson distribution and the negative binomial distribution.

________________________________________________________________________

Definitions
We define three versions of the negative binomial distribution. The first two versions arise from the view point of performing a series of independent Bernoulli trials until the rth success where r is a positive integer. A Bernoulli trial is a probability experiment whose outcome is random such that there are two possible outcomes (success or failure).

Let $X_1$ be the number of Bernoulli trials required for the rth success to occur, where r is a positive integer. Let $p$ be the probability of success in each trial. The following is the probability function of $X_1$:

$\displaystyle (1) \ \ \ \ \ P(X_1=x)= \binom{x-1}{r-1} p^r (1-p)^{x-r} \ \ \ \ \ \ \ x=r,r+1,r+2,\cdots$

The idea for $(1)$ is that for $X_1=x$ to happen, there must be $r-1$ successes in the first $x-1$ trials and one additional success occurring in the last trial (the $x$th trial).

A more common version of the negative binomial distribution is the number of Bernoulli trials in excess of r in order to produce the rth success. In other words, we consider the number of failures before the occurrence of the rth success. Let $X_2$ be this random variable. The following is the probability function of $X_2$:

$\displaystyle (2) \ \ \ \ \ P(X_2=x)=\binom{x+r-1}{x} p^r (1-p)^x \ \ \ \ \ \ \ x=0,1,2,\cdots$

The idea for $(2)$ is that there are $x+r$ trials and in the first $x+r-1$ trials, there are $x$ failures (or equivalently $r-1$ successes).

In both $(1)$ and $(2)$, the binomial coefficient is defined by

$\displaystyle (3) \ \ \ \ \ \binom{y}{k}=\frac{y!}{k! \ (y-k)!}=\frac{y(y-1) \cdots (y-(k-1))}{k!}$

where $y$ is a positive integer and $k$ is a nonnegative integer. However, the right-hand side of $(3)$ can be calculated even if $y$ is not a positive integer. Thus the binomial coefficient $\displaystyle \binom{y}{k}$ can be extended to work for any real number $y$. However, $k$ must still be a nonnegative integer.

$\displaystyle (4) \ \ \ \ \ \binom{y}{k}=\frac{y(y-1) \cdots (y-(k-1))}{k!}$

For convenience, we let $\displaystyle \binom{y}{0}=1$. When the real number $y>k-1$, the binomial coefficient in $(4)$ can be expressed as:

$\displaystyle (5) \ \ \ \ \ \binom{y}{k}=\frac{\Gamma(y+1)}{\Gamma(k+1) \Gamma(y-k+1)}$

where $\Gamma(\cdot)$ is the gamma function.
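The extended coefficient in $(4)$ and its gamma-function form $(5)$ can be computed directly. A minimal Python sketch (the value $y=5.5$ is hypothetical):

```python
from math import factorial, gamma, isclose

def gen_binom(y, k):
    # generalized binomial coefficient (4): y real, k a nonnegative integer
    num = 1.0
    for i in range(k):
        num *= y - i
    return num / factorial(k)

print(gen_binom(7, 3))    # ordinary coefficient: 35
print(gen_binom(5.5, 4))  # extended to a non-integer y

# agrees with the gamma-function form (5) when y > k - 1
y, k = 5.5, 4
print(gamma(y + 1) / (gamma(k + 1) * gamma(y - k + 1)))
```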

With the more relaxed notion of binomial coefficient, the probability function in $(2)$ above can be defined for any positive real number r. Thus the general version of the negative binomial distribution has two parameters r and $p$, both real numbers, such that $r>0$ and $0<p<1$. The following is its probability function.

$\displaystyle (6) \ \ \ \ \ P(X=x)=\binom{x+r-1}{x} p^r (1-p)^x \ \ \ \ \ \ \ x=0,1,2,\cdots$

Whenever r in $(6)$ is a real number that is not a positive integer, the interpretation of counting the number of failures until the occurrence of the rth success no longer applies. Instead we can think of it simply as a counting distribution.

The following alternative parametrization of the negative binomial distribution is also useful.

$\displaystyle (6a) \ \ \ \ \ P(X=x)=\binom{x+r-1}{x} \biggl(\frac{\alpha}{\alpha+1}\biggr)^r \biggl(\frac{1}{\alpha+1}\biggr)^x \ \ \ \ \ \ \ x=0,1,2,\cdots$

The parameters in this alternative parametrization are r and $\alpha>0$. Clearly, the ratio $\frac{\alpha}{\alpha+1}$ takes the place of $p$ in $(6)$. Unless stated otherwise, we use the parametrization of $(6)$.
________________________________________________________________________

What is negative about the negative binomial distribution?
What is negative about this distribution? What is binomial about this distribution? The name is suggested by the fact that the binomial coefficient in $(6)$ can be rearranged as follows:

\displaystyle \begin{aligned}(7) \ \ \ \ \ \binom{x+r-1}{x}&=\frac{(x+r-1)(x+r-2) \cdots r}{x!} \\&=(-1)^x \frac{(-r-(x-1))(-r-(x-2)) \cdots (-r)}{x!} \\&=(-1)^x \frac{(-r)(-r-1) \cdots (-r-(x-1))}{x!} \\&=(-1)^x \binom{-r}{x} \end{aligned}

The calculation in $(7)$ can be used to verify that $(6)$ is indeed a probability function, that is, all the probabilities sum to 1.

\displaystyle \begin{aligned}(8) \ \ \ \ \ 1&=p^r p^{-r}\\&=p^r (1-q)^{-r} \\&=p^r \sum \limits_{x=0}^\infty \binom{-r}{x} (-q)^x \ \ \ \ \ \ \ \ (8.1) \\&=p^r \sum \limits_{x=0}^\infty (-1)^x \binom{-r}{x} q^x \\&=\sum \limits_{x=0}^\infty \binom{x+r-1}{x} p^r q^x \end{aligned}

In $(8)$, we take $q=1-p$. The step $(8.1)$ above uses the following formula, known as Newton's binomial formula.

$\displaystyle (9) \ \ \ \ \ (1+t)^w=\sum \limits_{k=0}^\infty \binom{w}{k} t^k \ \ \ \ \ \ \ \lvert t \rvert<1$

________________________________________________________________________

The Generating Function
By definition, the following is the generating function of the negative binomial distribution, using $g(z)=E(z^X)$:

$\displaystyle (10) \ \ \ \ \ g(z)=\sum \limits_{x=0}^\infty \binom{r+x-1}{x} p^r q^x z^x$

where $q=1-p$. Using a similar calculation as in $(8)$, the generating function can be simplified as:

$\displaystyle (11) \ \ \ \ \ g(z)=p^r (1-q z)^{-r}=\frac{p^r}{(1-q z)^r}=\frac{p^r}{(1-(1-p) z)^r}; \ \ \ \ \ z<\frac{1}{1-p}$

As a result, the moment generating function of the negative binomial distribution is:

$\displaystyle (12) \ \ \ \ \ M(t)=\frac{p^r}{(1-(1-p) e^t)^r}; \ \ \ \ \ \ \ t<-\ln(1-p)$

________________________________________________________________________

Independent Sum

One useful property of the negative binomial distribution is that the independent sum of negative binomial random variables, all with the same parameter $p$, also has a negative binomial distribution. Let $Y=Y_1+Y_2+\cdots+Y_n$ be an independent sum such that each $Y_i$ has a negative binomial distribution with parameters $r_i$ and $p$. Then the sum $Y=Y_1+Y_2+\cdots+Y_n$ has a negative binomial distribution with parameters $r=r_1+\cdots+r_n$ and $p$.

Note that the generating function of an independent sum is the product of the individual generating functions. The following shows that the product of the individual generating functions is of the same form as $(11)$, thus proving the above assertion.

$\displaystyle (13) \ \ \ \ \ h(z)=\frac{p^{\sum \limits_{i=1}^n r_i}}{(1-(1-p) z)^{\sum \limits_{i=1}^n r_i}}$
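The closure under independent sums can be verified by direct convolution of probability functions. A Python sketch (the parameters $r_1=2$, $r_2=3$, $p=0.45$ are hypothetical):

```python
from math import comb

def nb_pmf(x, r, p):
    # negative binomial probability function (6) for integer r
    return comb(x + r - 1, x) * p**r * (1 - p) ** x

r1, r2, p = 2, 3, 0.45

# the convolution of NB(r1, p) and NB(r2, p) should match NB(r1 + r2, p)
for s in range(10):
    conv = sum(nb_pmf(j, r1, p) * nb_pmf(s - j, r2, p) for j in range(s + 1))
    print(s, conv, nb_pmf(s, r1 + r2, p))
```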
________________________________________________________________________

Mean and Variance
The mean and variance can be obtained from the generating function. From $E(X)=g'(1)$ and $E(X^2)=g'(1)+g^{(2)}(1)$, we have:

$\displaystyle (14) \ \ \ \ \ E(X)=\frac{r(1-p)}{p} \ \ \ \ \ \ \ \ \ \ \ \ \ Var(X)=\frac{r(1-p)}{p^2}$

Note that $Var(X)=\frac{1}{p} E(X)>E(X)$. Thus when the sample data suggest that the variance is greater than the mean, the negative binomial distribution is an excellent alternative to the Poisson distribution. For example, suppose that the sample mean and the sample variance are 3.6 and 7.1. In exploring the possibility of fitting the data using the negative binomial distribution, we would be interested in the negative binomial distribution with this mean and variance. Then plugging these into $(14)$ produces the negative binomial distribution with $r=3.7$ and $p=0.507$.
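Solving $(14)$ for the parameters gives a quick method-of-moments fit. A Python sketch of the example above (sample mean 3.6, sample variance 7.1):

```python
mean, var = 3.6, 7.1  # sample moments from the example

# from (14): Var(X) = E(X)/p, so p = mean/var; then solve the mean for r
p = mean / var
r = mean * p / (1 - p)

print(f"r = {r:.3f}, p = {p:.3f}")  # approximately r = 3.703, p = 0.507
```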
________________________________________________________________________

The Poisson-Gamma Mixture
One important application of the negative binomial distribution is that it is a mixture of a family of Poisson distributions with Gamma mixing weights. Thus the negative binomial distribution can be viewed as a generalization of the Poisson distribution. The negative binomial distribution can be viewed as a Poisson distribution where the Poisson parameter is itself a random variable, distributed according to a Gamma distribution. Thus the negative binomial distribution is known as a Poisson-Gamma mixture.

In an insurance application, the negative binomial distribution can be used as a model for claim frequency when the risks are not homogeneous. Let $N$ have a Poisson distribution with parameter $\theta$, which can be interpreted as the number of claims in a fixed period of time from an insured in a large pool of insureds. There is uncertainty in the parameter $\theta$, reflecting the risk characteristics of the insured. Some insureds are poor risks (with large $\theta$) and some are good risks (with small $\theta$). Thus the parameter $\theta$ should be regarded as a random variable $\Theta$. The following is the conditional distribution of $N$ (conditional on $\Theta=\theta$):

$\displaystyle (15) \ \ \ \ \ P(N=n \lvert \Theta=\theta)=\frac{e^{-\theta} \ \theta^n}{n!} \ \ \ \ \ \ \ \ \ \ n=0,1,2,\cdots$

Suppose that $\Theta$ has a Gamma distribution with rate parameter $\alpha$ and shape parameter $\beta$. The following is the probability density function of $\Theta$.

$\displaystyle (16) \ \ \ \ \ g(\theta)=\frac{\alpha^\beta}{\Gamma(\beta)} \theta^{\beta-1} e^{-\alpha \theta} \ \ \ \ \ \ \ \ \ \ \theta>0$

Then the joint density of $N$ and $\Theta$ is:

$\displaystyle (17) \ \ \ \ \ P(N=n \lvert \Theta=\theta) \ g(\theta)=\frac{e^{-\theta} \ \theta^n}{n!} \ \frac{\alpha^\beta}{\Gamma(\beta)} \theta^{\beta-1} e^{-\alpha \theta}$

The unconditional distribution of $N$ is obtained by integrating out $\theta$ in $(17)$.

\displaystyle \begin{aligned}(18) \ \ \ \ \ P(N=n)&=\int_0^\infty P(N=n \lvert \Theta=\theta) \ g(\theta) \ d \theta \\&=\int_0^\infty \frac{e^{-\theta} \ \theta^n}{n!} \ \frac{\alpha^\beta}{\Gamma(\beta)} \ \theta^{\beta-1} \ e^{-\alpha \theta} \ d \theta \\&=\int_0^\infty \frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \theta^{n+\beta-1} \ e^{-(\alpha+1) \theta} d \theta \\&=\frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \frac{\Gamma(n+\beta)}{(\alpha+1)^{n+\beta}} \int_0^\infty \frac{(\alpha+1)^{n+\beta}}{\Gamma(n+\beta)} \theta^{n+\beta-1} \ e^{-(\alpha+1) \theta} d \theta \\&=\frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \frac{\Gamma(n+\beta)}{(\alpha+1)^{n+\beta}} \\&=\frac{\Gamma(n+\beta)}{\Gamma(n+1) \ \Gamma(\beta)} \ \biggl( \frac{\alpha}{\alpha+1}\biggr)^\beta \ \biggl(\frac{1}{\alpha+1}\biggr)^n \\&=\binom{n+\beta-1}{n} \ \biggl( \frac{\alpha}{\alpha+1}\biggr)^\beta \ \biggl(\frac{1}{\alpha+1}\biggr)^n \ \ \ \ \ \ \ \ \ n=0,1,2,\cdots \end{aligned}

Note that the integral in the fourth step in $(18)$ is 1.0 since the integrand is the pdf of a Gamma distribution. The above probability function is that of a negative binomial distribution. It is of the same form as $(6a)$. Equivalently, it is also of the form $(6)$ with parameter $r=\beta$ and $p=\frac{\alpha}{\alpha+1}$.
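The mixture calculation in $(18)$ can be confirmed numerically by integrating the joint density $(17)$ over $\theta$. A Python sketch (hypothetical values $\alpha=1.5$, $\beta=2$, $n=3$; $\beta$ is taken to be an integer here only so that `comb()` applies):

```python
from math import comb, exp, factorial
from math import gamma as gamma_fn

alpha, beta, n = 1.5, 2, 3  # hypothetical parameters; beta an integer here

def integrand(theta):
    # joint density (17): Poisson pmf at n times the gamma density (16)
    poisson = exp(-theta) * theta**n / factorial(n)
    g = alpha**beta / gamma_fn(beta) * theta ** (beta - 1) * exp(-alpha * theta)
    return poisson * g

# midpoint-rule quadrature over [0, 60]; the tail beyond 60 is negligible
steps, width = 100_000, 60.0
h = width / steps
mixture = sum(integrand((i + 0.5) * h) for i in range(steps)) * h

# negative binomial probability (6a) with r = beta
nb = comb(n + beta - 1, n) * (alpha / (alpha + 1)) ** beta * (1 / (alpha + 1)) ** n
print(mixture, nb)  # the two values agree
```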

The variance of the negative binomial distribution is greater than the mean. In a Poisson distribution, the mean equals the variance. Thus the unconditional distribution of $N$ is more dispersed than its conditional distributions. This is a characteristic of mixture distributions. The uncertainty in the parameter variable $\Theta$ has the effect of increasing the unconditional variance of the mixture distribution of $N$. The variance of a mixture distribution has two components, the weighted average of the conditional variances and the variance of the conditional means. The second component represents the additional variance introduced by the uncertainty in the parameter $\Theta$ (see The variance of a mixture).

________________________________________________________________________

The Poisson Distribution as Limit of Negative Binomial
There is another connection to the Poisson distribution, that is, the Poisson distribution is a limiting case of the negative binomial distribution. We show that the generating function of the Poisson distribution can be obtained by taking the limit of the negative binomial generating function as $r \rightarrow \infty$. Interestingly, the Poisson distribution is also the limit of the binomial distribution.

In this section, we use the negative binomial parametrization of $(6a)$. By substituting $\frac{\alpha}{\alpha+1}$ for $p$, the following are the mean, variance, and generating function for the probability function in $(6a)$:

\displaystyle \begin{aligned}(19) \ \ \ \ \ \ &E(X)=\frac{r}{\alpha} \\&\text{ }\\&Var(X)=\frac{\alpha+1}{\alpha} \ \frac{r}{\alpha}=\frac{r(\alpha+1)}{\alpha^2} \\&\text{ } \\&g(z)=\frac{1}{[1-\frac{1}{\alpha}(z-1)]^r} \ \ \ \ \ \ \ z<\alpha+1 \end{aligned}

Let r go to infinity and $\displaystyle \frac{1}{\alpha}$ go to zero while keeping their product constant. Thus $\displaystyle \mu=\frac{r}{\alpha}$ is constant (this is the mean of the negative binomial distribution). We show the following:

$\displaystyle (20) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} [1-\frac{\mu}{r}(z-1)]^{-r}=e^{\mu (z-1)}$

The right-hand side of $(20)$ is the generating function of the Poisson distribution with mean $\mu$. The generating function in the left-hand side is that of a negative binomial distribution with mean $\displaystyle \mu=\frac{r}{\alpha}$. The following is the derivation of $(20)$.

\displaystyle \begin{aligned}(21) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} [1-\frac{\mu}{r}(z-1)]^{-r}&=\lim \limits_{r \rightarrow \infty} e^{\displaystyle \biggl(ln[1-\frac{\mu}{r}(z-1)]^{-r}\biggr)} \\&=\lim \limits_{r \rightarrow \infty} e^{\displaystyle \biggl(-r \ ln[1-\frac{\mu}{r}(z-1)]\biggr)} \\&=e^{\displaystyle \biggl(\lim \limits_{r \rightarrow \infty} -r \ ln[1-\frac{\mu}{r}(z-1)]\biggr)} \end{aligned}

We now focus on the limit in the exponent.

\displaystyle \begin{aligned}(22) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} -r \ \ln \biggl[1-\frac{\mu}{r}(z-1)\biggr]&=\lim \limits_{r \rightarrow \infty} \frac{-\ln \bigl(1-\frac{\mu}{r} (z-1)\bigr)}{r^{-1}} \\&=\lim \limits_{r \rightarrow \infty} \frac{\mu (z-1) \ r^{-2} \big/ \bigl(1-\frac{\mu}{r} (z-1)\bigr)}{r^{-2}} \\&=\lim \limits_{r \rightarrow \infty} \frac{\mu (z-1)}{1-\frac{\mu}{r} (z-1)} \\&=\mu (z-1) \end{aligned}

The middle step in $(22)$ uses L'Hopital's Rule (differentiating the numerator and the denominator with respect to r). The result in $(20)$ is obtained by combining $(21)$ and $(22)$.
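The limit $(20)$ is easy to observe numerically. A Python sketch (hypothetical mean $\mu=2$ and argument $z=0.7$):

```python
from math import exp

mu, z = 2.0, 0.7  # hypothetical mean and generating-function argument

def nb_gf(r):
    # negative binomial generating function from (19) with mu = r/alpha fixed
    return (1 - (mu / r) * (z - 1)) ** (-r)

target = exp(mu * (z - 1))  # Poisson generating function with mean mu

for r in (10, 1000, 100_000):
    print(r, nb_gf(r), target)
```

As $r$ grows, the negative binomial generating function approaches the Poisson one.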

________________________________________________________________________
