Getting from Binomial to Poisson

In many binomial problems, the number of Bernoulli trials n is large, relatively speaking, and the probability of success p is small such that n p is of moderate magnitude. For example, consider problems that deal with rare events where the probability of occurrence is small (as a concrete example, counting the number of people with July 1 as birthday out of a random sample of 1000 people). It is often convenient to approximate such binomial problems using the Poisson distribution. The justification for using the Poisson approximation is that the Poisson distribution is a limiting case of the binomial distribution. Now that cheap computing power is widely available, it is quite easy to use computer or other computing devices to obtain exact binomial probabiities for experiments up to 1000 trials or more. Though the Poisson approximation may no longer be necessary for such problems, knowing how to get from binomial to Poisson is important for understanding the Poisson distribution itself.

Consider a counting process that describes the occurrences of a certain type of events of interest in a unit time interval subject to three simplifying assumptions (discussed below). We are interested in counting the number of occurrences of the event of interest in a unit time interval. As a concrete example, consider the number of cars arriving at an observation point in a certain highway in a period of time, say one hour. We wish to model the probability distribution of how many cars that will arrive at the observation point in this particular highway in one hour. Let X be the random variable described by this probability distribution. We wish to konw the probability that there are k cars arriving in one hour. We start with using a binomial distribution as an approximation to the probability P(X=k). We will see that upon letting n \rightarrow \infty, the P(X=k) is a Poisson probability.

Suppose that we know E(X)=\alpha, perhaps an average obtained after observing cars at the observation points for many hours. The simplifying assumptions alluded to earlier are the following:

  1. The numbers of cars arriving in nonoverlapping time intervals are independent.
  2. The probability of one car arriving in a very short time interval of length h is \alpha h.
  3. The probability of having more than one car arriving in a very short time interval is esstentially zero.

Assumption 1 means that a large number of cars arriving in one period does not imply fewer cars will arrival in the next period and vice versa. In other words, the number of cars that arrive in any one given moment does affect the number of cars that will arrive subsequently. Knowing how many cars arriving in one minute will not help predict the number of cars arriving at the next minute. Assumption 2 means that the rate of cars arriving is dependent only on the length of the time interval and not on when the time interval occurs (e.g. not on whether it is at the beginning of the hour or toward the end of the hour). The assumptions 2 and 3 allow us to think of a very short period of time as a Bernoulli trial. Thinking of the arrival of a car as a success, each short time interval will result in only one success or one failure.

To start, we can break up the hour into 60 minutes (into 60 Bernoulli trials). We then consider the binomial distribution with n=60 and p=\frac{\alpha}{60}. So the following is an approximation to our desired probability distribution.

\displaystyle (1) \ \ \ \ \ P(X=k) \approx \binom{60}{k} \biggl(\frac{\alpha}{60}\biggr)^k \biggr(1-\frac{\alpha}{60}\biggr)^{60-k} \ \ \ \ \ k=0,1,2,\cdots, 60

Conceivably, there can be more than 1 car arriving in a minute and observing cars in a one-minute interval may not be a Bernoulli trial. For a one-minute interval to qualify as a Bernoulli trial, there is either no car arriving or 1 car arriving in that one minute. So we can break up an hour into 3,600 seconds (into 3,600 Bernoulli trials). We now consider the binomial distribution with n=3600 and p=\frac{\alpha}{3600}.

\displaystyle (2) \ \ \ \ \ P(X=k) \approx \binom{3600}{k} \biggl(\frac{\alpha}{3600}\biggr)^k \biggr(1-\frac{\alpha}{3600}\biggr)^{3600-k} \ \ \ \ \ k=0,1,2,\cdots, 3600

It is also conceivable that more than 1 car can arrive in one second and observing cars in one-second interval may still not qualify as a Bernoulli trial. So we need to get more granular. We can divide up the hour into n equal subintervals, each of length \frac{1}{n}. The assumptions 2 and 3 ensure that each subinterval is a Bernoulli trial (either it is a success or a failure; one car arriving or no car arriving). Assumption 1 tells us that all the n subintervals are independent. So breaking up the hour into n moments and counting the number of moments that are successes will result in a binomial distribution with parameters n and p=\frac{\alpha}{n}. So we are ready to proceed with the following approximation to our probability distribution P(X=k).

\displaystyle (3) \ \ \ \ \ P(X=k) \approx \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \ \ \ \ \ k=0,1,2,\cdots, n

As we get more granular, n \rightarrow \infty. We show that the limit of the binomial probability in (3) is the Poisson distribution with parameter \alpha. We show the following.

\displaystyle (4) \ \ \ \ \ P(X=k) = \lim \limits_{n \rightarrow \infty} \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k}=\frac{e^{-\alpha} \alpha^k}{k!} \ \ \ \ \ \ k=0,1,2,\cdots

In the derivation of (4), we need the following two mathematical tools. The statement (5) is one of the definitions of the mathematical constant e. In the statement (6), the integer n in the numerator is greater than the integer k in the denominator. It says that whenever we work with such a ratio of two factorials, the result is the product of n with the smaller integers down to (n-(k-1)). There are exactly k terms in the product.

\displaystyle (5) \ \ \ \ \ \lim \limits_{n \rightarrow \infty} \biggl(1+\frac{x}{n}\biggr)^n=e^x

\displaystyle (6) \ \ \ \ \ \frac{n!}{(n-k)!}=n(n-1)(n-2) \cdots (n-k+1) \ \ \ \ \ \ \ \  k<n

The following is the derivation of (4).

\displaystyle \begin{aligned}(7) \ \ \ \ \  P(X=k)&=\lim \limits_{n \rightarrow \infty} \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \\&=\lim \limits_{n \rightarrow \infty} \ \frac{n!}{k! (n-k)!} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \\&=\lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \biggl(\frac{\alpha^k}{k!}\biggr) \biggr(1-\frac{\alpha}{n}\biggr)^{n} \biggr(1-\frac{\alpha}{n}\biggr)^{-k} \\&=\biggl(\frac{\alpha^k}{k!}\biggr) \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \\&\times \ \ \ \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{n} \ \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{-k} \\&=\frac{e^{-\alpha} \alpha^k}{k!} \end{aligned}

In (7), we have \displaystyle \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k}=1. The reason being that the numerator is a polynomial where the leading term is n^k. Upon dividing by n^k and taking the limit, we get 1. Based on (5), we have \displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{n}=e^{-\alpha}. For the last limit in the derivation we have \displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{-k}=1.

We conclude with some comments. As the above derivation shows, the Poisson distribution is at heart a binomial distribution. When we divide the unit time interval into more and more subintervals (as the subintervals get more and more granular), the resulting binomial distribution behaves more and more like the Poisson distribution.

The three assumtions used in the derivation are called the Poisson postulates, which are the underlying assumptions that govern a Poisson process. Such a random process describes the occurrences of some type of events that are of interest (e.g. the arrivals of cars in our example) in a fixed period of time. The positive constant \alpha indicated in Assumption 2 is the parameter of the Poisson process, which can be interpreted as the rate of occurrences of the event of interest (or rate of changes, or rate of arrivals) in a unit time interval, meaning that the positive constant \alpha is the mean number of occurrences in the unit time interval. The derivation in (7) shows that whenever a certain type of events occurs according to a Poisson process with parameter \alpha, the counting variable of the number of occurrences in the unit time interval is distributed according to the Poisson distribution as indicated in (4).

If we observe the occurrences of events over intervals of length other than unit length, say, in an interval of length t, the counting process is governed by the same three postulates, with the modification to Assumption 2 that the rate of changes of the process is now \alpha t. The mean number of occurrences in the time interval of length t is now \alpha t. The Assumption 2 now states that for any very short time interval of length h (and that is also a subinterval of the interval of length t under observation), the probability of having one occurrence of event in this short interval is (\alpha t)h. Applyng the same derivation, it can be shown that the number of occurrences (X_t) in a time interval of length t has the Poisson distribution with the following probability mass function.

\displaystyle (8) \ \ \ \ \ P(X_t=k)=\frac{e^{-\alpha t} \ (\alpha t)^k}{k!} \ \ \ \ \ \ \ \  k=0,1,2,\cdots

Relating Binomial and Negative Binomial

The negative binomial distribution has a natural intepretation as a waiting time until the arrival of the rth success (when the parameter r is a positive integer). The waiting time refers to the number of independent Bernoulli trials needed to reach the rth success. This interpretation of the negative binomial distribution gives us a good way of relating it to the binomial distribution. For example, if the rth success takes place after k failed Bernoulli trials (for a total of k+r trials), then there can be at most r-1 successes in the first k+r trials. This tells us that the survival function of the negative binomial distribution is the cumulative distribution function (cdf) of a binomial distribution. In this post, we gives the details behind this observation. A previous post on the negative binomial distribution is found here.

A random experiment resulting in two distinct outcomes (success or failure) is called a Bernoulli trial (e.g. head or tail in a coin toss, whether or not the birthday of a customer is the first of January, whether an insurance claim is above or below a given threshold etc). Suppose a series of independent Bernoulli trials are performed until reaching the rth success where the probability of success in each trial is p. Let X_r be the number of failures before the occurrence of the rth success. The following is the probablity mass function of X_r.

\displaystyle (1) \ \ \ \ P(X_r)=\binom{k+r-1}{k} p^r (1-p)^k \ \ \ \ \ \ k=0,1,2,3,\cdots

Be definition, the survival function and cdf of X_r are:

\displaystyle (2) \ \ \ \ P(X_r > k)=\sum \limits_{j=k+1}^\infty \binom{j+r-1}{j} p^r (1-p)^j \ \ \ \ \ \ k=0,1,2,3,\cdots

\displaystyle (3) \ \ \ \ P(X_r \le k)=\sum \limits_{j=0}^k \binom{j+r-1}{j} p^r (1-p)^j \ \ \ \ \ \ k=0,1,2,3,\cdots

For each positive integer k, let Y_{r+k} be the number of successes in performing a sequence of r+k independent Bernoulli trials where p is the probability of success. In other words, Y_{r+k} has a binomial distribution with parameters r+k and p.

If the random experiment requires more than k failures to reach the rth success, there are at most r-1 successes in the first k+r trails. Thus the survival function of X_r is the same as the cdf of a binomial distribution. Equivalently, the cdf of X_r is the same as the survival function of a binomial distribution. We have the following:

\displaystyle \begin{aligned}(4) \ \ \ \ P(X_r > k)&=P(Y_{k+r} \le r-1) \\&=\sum \limits_{j=0}^{r-1} \binom{k+r}{j} p^j (1-p)^{k+r-j} \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}

\displaystyle \begin{aligned}(5) \ \ \ \ P(X_r \le k)&=P(Y_{k+r} > r-1) \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}

The relation (4) is analogous to the relationship between the Gamma distribution and the Poisson distribution. Recall that a Gamma distribution with shape parameter \alpha and scale parameter n, where n is a positive integer, can be interpreted as the waiting time until the nth change in a Poisson process. Thus, if the nth change takes place after time t, there can be at most n-1 arrivals in the time interval [0,t]. Thus the survival function of this Gamma distribution is the same as the cdf of a Poisson distribution. The relation (4) is analogous to the following relation.

\displaystyle (5) \ \ \ \ \int_t^\infty \frac{\alpha^n}{(n-1)!} \ x^{n-1} \ e^{-\alpha x} \ dx=\sum \limits_{j=0}^{n-1} \frac{e^{-\alpha t} \ (\alpha t)^j}{j!}

A previous post on the negative binomial distribution is found here.