In many binomial problems, the number of Bernoulli trials $n$ is large, relatively speaking, and the probability of success $p$ is small such that $np$ is of moderate magnitude. For example, consider problems that deal with rare events where the probability of occurrence is small (as a concrete example, counting the number of people with July 1 as birthday out of a random sample of 1000 people). It is often convenient to approximate such binomial problems using the Poisson distribution. The justification for using the Poisson approximation is that the Poisson distribution is a limiting case of the binomial distribution. Now that cheap computing power is widely available, it is quite easy to use a computer or other computing device to obtain exact binomial probabilities for experiments with 1000 trials or more. Though the Poisson approximation may no longer be necessary for such problems, knowing how to get from the binomial to the Poisson is important for understanding the Poisson distribution itself.
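The birthday example can be checked directly. The following Python sketch compares exact binomial probabilities with their Poisson approximation for $n=1000$ and $p=\frac{1}{365}$. Ignoring leap years is an assumption made here for simplicity, and the helper names `binom_pmf` and `poisson_pmf` are hypothetical.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k) for n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P(X = k) with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Birthday example: n = 1000 people, p = 1/365 (assumption: leap years ignored)
n, p = 1000, 1 / 365
lam = n * p  # np is of moderate magnitude, about 2.74

for k in range(7):
    exact = binom_pmf(k, n, p)
    approx = poisson_pmf(k, lam)
    print(f"k={k}: binomial={exact:.5f}  poisson={approx:.5f}  diff={exact - approx:+.5f}")
```

The differences printed are small, which is the point of the approximation; the exact binomial values are just as easy to compute.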

Consider a counting process that describes the occurrences of a certain type of event of interest in a unit time interval, subject to three simplifying assumptions (discussed below). We are interested in counting the number of occurrences of the event of interest in a unit time interval. As a concrete example, consider the number of cars arriving at an observation point on a certain highway in a period of time, say one hour. We wish to model the probability distribution of how many cars will arrive at the observation point on this particular highway in one hour. Let $X$ be the random variable described by this probability distribution. We wish to know the probability that there are $k$ cars arriving in one hour. We start by using a binomial distribution as an approximation to the probability $P(X=k)$. We will see that upon letting $n \rightarrow \infty$, the probability $P(X=k)$ is a Poisson probability.

Suppose that we know $E(X)=\alpha$, perhaps an average obtained after observing cars at the observation point for many hours. The simplifying assumptions alluded to earlier are the following:

1. The numbers of cars arriving in nonoverlapping time intervals are independent.
2. The probability of one car arriving in a very short time interval of length $h$ is $\alpha h$.
3. The probability of having more than one car arriving in a very short time interval is essentially zero.

Assumption 1 means that a large number of cars arriving in one period does not imply that fewer cars will arrive in the next period, and vice versa. In other words, the number of cars that arrive in any given period does not affect the number of cars that will arrive subsequently. Knowing how many cars arrive in one minute will not help predict the number of cars arriving in the next minute. Assumption 2 means that the rate of cars arriving depends only on the length of the time interval and not on when the time interval occurs (e.g. not on whether it is at the beginning of the hour or toward the end of the hour). Assumptions 2 and 3 allow us to think of a very short period of time as a Bernoulli trial. Thinking of the arrival of a car as a success, each short time interval results in either exactly one success or a failure.

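To start, we can break up the hour into 60 minutes (into 60 Bernoulli trials). We then consider the binomial distribution with $n=60$ and $p=\frac{\alpha}{60}$. So the following is an approximation to our desired probability distribution.

$$P(X=k)\approx\binom{60}{k}\left(\frac{\alpha}{60}\right)^k\left(1-\frac{\alpha}{60}\right)^{60-k}$$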

Conceivably, there can be more than 1 car arriving in a minute, in which case observing cars in a one-minute interval is not a Bernoulli trial. For a one-minute interval to qualify as a Bernoulli trial, there must be either no car or exactly 1 car arriving in that minute. So we can break up an hour into 3,600 seconds (into 3,600 Bernoulli trials). We now consider the binomial distribution with $n=3600$ and $p=\frac{\alpha}{3600}$.

$$P(X=k)\approx\binom{3600}{k}\left(\frac{\alpha}{3600}\right)^k\left(1-\frac{\alpha}{3600}\right)^{3600-k}$$
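To make the minute-level and second-level approximations concrete, here is a small Python sketch that evaluates both for one value of $k$. The rate $\alpha=10$ cars per hour and the count $k=8$ are hypothetical values chosen only for illustration, and `binom_pmf` is a hypothetical helper name.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

alpha = 10.0  # hypothetical mean number of cars per hour
k = 8         # hypothetical count of interest

# One Bernoulli trial per minute: n = 60, p = alpha / 60
p_minutes = binom_pmf(k, 60, alpha / 60)
# One Bernoulli trial per second: n = 3600, p = alpha / 3600
p_seconds = binom_pmf(k, 3600, alpha / 3600)

print(f"n=60:   P(X={k}) ~ {p_minutes:.6f}")
print(f"n=3600: P(X={k}) ~ {p_seconds:.6f}")
```

The two numbers differ noticeably, which reflects the fact that the minute-level model is a coarser approximation than the second-level one.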

It is also conceivable that more than 1 car can arrive in one second, so observing cars in a one-second interval may still not qualify as a Bernoulli trial. So we need to get more granular. We can divide up the hour into $n$ equal subintervals, each of length $\frac{1}{n}$. Assumptions 2 and 3 ensure that each subinterval is a Bernoulli trial (either it is a success or a failure; one car arriving or no car arriving). Assumption 1 tells us that all the subintervals are independent. So breaking up the hour into $n$ moments and counting the number of moments that are successes results in a binomial distribution with parameters $n$ and $p=\frac{\alpha}{n}$. So we are ready to proceed with the following approximation to our probability distribution $P(X=k)$.

$$P(X=k)\approx\binom{n}{k}\left(\frac{\alpha}{n}\right)^k\left(1-\frac{\alpha}{n}\right)^{n-k}$$
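The subinterval model can also be simulated directly: each of the $n$ subintervals is an independent Bernoulli trial with success probability $\frac{\alpha}{n}$, and the count of successes in one "hour" is recorded. The Python sketch below does this many times and compares the observed frequencies with Poisson probabilities. The parameters $\alpha=3$, $n=500$, 2000 simulated hours, and the seed are all hypothetical choices, as is the helper name `simulate_hour`.

```python
import math
import random

def simulate_hour(alpha: float, n: int, rng: random.Random) -> int:
    """Count successes among n independent subintervals, each Bernoulli(alpha / n)."""
    p = alpha / n
    return sum(1 for _ in range(n) if rng.random() < p)

alpha, n, hours = 3.0, 500, 2000  # hypothetical parameters
rng = random.Random(12345)        # fixed seed so the run is reproducible
counts = [simulate_hour(alpha, n, rng) for _ in range(hours)]

for k in range(8):
    empirical = counts.count(k) / hours
    poisson = math.exp(-alpha) * alpha**k / math.factorial(k)
    print(f"k={k}: simulated={empirical:.4f}  poisson={poisson:.4f}")
```

Because the simulated frequencies are subject to sampling noise, they should agree with the Poisson column only to within a couple of decimal places.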

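As we get more granular, $n \rightarrow \infty$. We show that the limit of the binomial probability $\binom{n}{k}\left(\frac{\alpha}{n}\right)^k\left(1-\frac{\alpha}{n}\right)^{n-k}$ is the Poisson probability with parameter $\alpha$. That is, we show the following.

$$P(X=k)=\lim_{n\to\infty}\binom{n}{k}\left(\frac{\alpha}{n}\right)^k\left(1-\frac{\alpha}{n}\right)^{n-k}=\frac{e^{-\alpha}\alpha^k}{k!},\qquad k=0,1,2,\ldots$$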
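Before the proof, the claimed limit can be checked numerically. This Python sketch computes, for increasing $n$, the largest gap over $k=0,\ldots,30$ between the binomial probability with $p=\frac{\alpha}{n}$ and the Poisson probability with parameter $\alpha$. The rate $\alpha=10$, the cutoff 30, and the helper names are hypothetical choices for illustration.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, alpha: float) -> float:
    return math.exp(-alpha) * alpha**k / math.factorial(k)

def max_gap(alpha: float, n: int, kmax: int = 30) -> float:
    """Largest |binomial(n, alpha/n) - Poisson(alpha)| over k = 0..kmax."""
    return max(abs(binom_pmf(k, n, alpha / n) - poisson_pmf(k, alpha))
               for k in range(kmax + 1))

alpha = 10.0  # hypothetical rate per unit time
ns = [60, 600, 3600, 36000, 360000]
gaps = {n: max_gap(alpha, n) for n in ns}
for n, gap in gaps.items():
    print(f"n={n:>6}: max |binomial - Poisson| = {gap:.2e}")
```

The gap shrinks roughly in proportion to $\frac{1}{n}$, consistent with the limit being exactly the Poisson probability.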

In the derivation of this limit, we need the following two mathematical tools.

$$\lim_{n\to\infty}\left(1+\frac{x}{n}\right)^n=e^x$$

$$\frac{n!}{(n-k)!}=n(n-1)(n-2)\cdots\big(n-(k-1)\big)$$

The first statement is one of the definitions of the mathematical constant $e$. In the second statement, the integer $n$ in the numerator is greater than the integer $n-k$ in the denominator. It says that whenever we work with such a ratio of two factorials, the result is the product of $n$ with the smaller integers down to $n-(k-1)$. There are exactly $k$ terms in the product.
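Both tools are easy to sanity-check numerically. The sketch below verifies the factorial identity for a few hypothetical $(n,k)$ pairs and shows $\left(1+\frac{x}{n}\right)^n$ approaching $e^x$ for the hypothetical value $x=-2.5$. The helper `falling_product` is a hypothetical name; Python's standard `math.perm(n, k)` computes the same quantity.

```python
import math

def falling_product(n: int, k: int) -> int:
    """n (n-1) ... (n-k+1): a product of exactly k factors (empty product = 1)."""
    result = 1
    for j in range(k):
        result *= n - j
    return result

# Second tool: n!/(n-k)! equals the product of k descending integers starting at n
for n, k in [(10, 3), (25, 7), (100, 0)]:
    assert math.factorial(n) // math.factorial(n - k) == falling_product(n, k)
    assert math.perm(n, k) == falling_product(n, k)

# First tool: (1 + x/n)^n approaches e^x as n grows
x = -2.5  # hypothetical value
for n in (10, 1_000, 100_000):
    print(f"n={n:>6}: (1 + x/n)^n = {(1 + x / n) ** n:.6f}   e^x = {math.exp(x):.6f}")
```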

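The following is the derivation of the limit.

$$
\begin{aligned}
\lim_{n\to\infty}\binom{n}{k}\left(\frac{\alpha}{n}\right)^k\left(1-\frac{\alpha}{n}\right)^{n-k}
&=\lim_{n\to\infty}\frac{n!}{k!\,(n-k)!}\,\frac{\alpha^k}{n^k}\left(1-\frac{\alpha}{n}\right)^{n}\left(1-\frac{\alpha}{n}\right)^{-k}\\
&=\frac{\alpha^k}{k!}\,\lim_{n\to\infty}\frac{n(n-1)\cdots(n-k+1)}{n^k}\left(1-\frac{\alpha}{n}\right)^{n}\left(1-\frac{\alpha}{n}\right)^{-k}\\
&=\frac{\alpha^k}{k!}\cdot 1\cdot e^{-\alpha}\cdot 1\\
&=\frac{e^{-\alpha}\alpha^k}{k!}
\end{aligned}
$$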

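In the derivation, we have $\lim_{n\to\infty}\frac{n(n-1)\cdots(n-k+1)}{n^k}=1$. The reason is that the numerator is a polynomial in $n$ of degree $k$ whose leading term is $n^k$. Upon dividing by $n^k$ and taking the limit, we get 1. Based on the definition of $e$ (with $x=-\alpha$), we have $\lim_{n\to\infty}\left(1-\frac{\alpha}{n}\right)^n=e^{-\alpha}$. For the last limit in the derivation, $k$ is fixed, so $\lim_{n\to\infty}\left(1-\frac{\alpha}{n}\right)^{-k}=1$. Since each of the three factors has a finite limit, the limit of their product is the product of the limits, which gives $\frac{e^{-\alpha}\alpha^k}{k!}$.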
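The three factor-by-factor limits used above can be watched converging. In this Python sketch, $\alpha=4$ and $k=3$ are hypothetical values; `math.perm(n, k)` supplies the product $n(n-1)\cdots(n-k+1)$.

```python
import math

alpha, k = 4.0, 3  # hypothetical values for illustration
for n in (10, 100, 10_000, 1_000_000):
    ratio = math.perm(n, k) / n**k   # n(n-1)...(n-k+1) / n^k, tends to 1
    power = (1 - alpha / n) ** n     # tends to e^(-alpha)
    tail = (1 - alpha / n) ** (-k)   # tends to 1 because k is fixed
    print(f"n={n:>7}: ratio={ratio:.6f}  power={power:.6f}  tail={tail:.6f}")
print(f"e^(-alpha) = {math.exp(-alpha):.6f}")
```

After the loop, `ratio`, `power`, and `tail` hold the values for the largest $n$, which sit very close to their limits $1$, $e^{-\alpha}$, and $1$.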

We conclude with some comments. As the above derivation shows, the Poisson distribution is at heart a binomial distribution. When we divide the unit time interval into more and more subintervals (as the subintervals get more and more granular), the resulting binomial distribution behaves more and more like the Poisson distribution.

The three assumptions used in the derivation are called the Poisson postulates, which are the underlying assumptions that govern a Poisson process. Such a random process describes the occurrences of some type of event of interest (e.g. the arrivals of cars in our example) in a fixed period of time. The positive constant $\alpha$ indicated in Assumption 2 is the parameter of the Poisson process, which can be interpreted as the rate of occurrences of the event of interest (or rate of changes, or rate of arrivals) in a unit time interval, meaning that $\alpha$ is the mean number of occurrences in the unit time interval. The derivation above shows that whenever a certain type of event occurs according to a Poisson process with parameter $\alpha$, the counting variable $X$ of the number of occurrences in the unit time interval has the Poisson distribution $P(X=k)=\frac{e^{-\alpha}\alpha^k}{k!}$, $k=0,1,2,\ldots$
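The interpretation of $\alpha$ as the mean number of occurrences can be checked against the probability mass function itself. The sketch below sums $P(X=k)$ and $k\,P(X=k)$ over $k=0,\ldots,100$ for a hypothetical rate $\alpha=6.5$; truncating at 100 is an assumption that is harmless here because the remaining tail probability is negligible.

```python
import math

def poisson_pmf(k: int, alpha: float) -> float:
    return math.exp(-alpha) * alpha**k / math.factorial(k)

alpha = 6.5  # hypothetical rate of arrivals per unit time
K = 100      # truncation point; the tail beyond K is negligible for this alpha
total = sum(poisson_pmf(k, alpha) for k in range(K + 1))
mean = sum(k * poisson_pmf(k, alpha) for k in range(K + 1))
print(f"sum of probabilities ~ {total:.12f}")
print(f"mean ~ {mean:.12f} (alpha = {alpha})")
```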

If we observe the occurrences of events over an interval of length other than unit length, say an interval of length $t$, the counting process is governed by the same three postulates, with the modification that the mean number of occurrences over the whole observation interval is now $\alpha t$ rather than $\alpha$. Assumption 2 still states that for any very short time interval of length $h$ (that is also a subinterval of the interval of length $t$ under observation), the probability of having one occurrence of the event in this short interval is $\alpha h$. Applying the same derivation, now dividing the interval of length $t$ into $n$ subintervals of length $\frac{t}{n}$ so that $p=\frac{\alpha t}{n}$, it can be shown that the number of occurrences $X_t$ in a time interval of length $t$ has the Poisson distribution with the following probability mass function.

$$P(X_t=k)=\frac{e^{-\alpha t}(\alpha t)^k}{k!},\qquad k=0,1,2,\ldots$$
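The same subdivision argument can be mirrored numerically for an interval of length $t$. In this Python sketch, the values $\alpha=10$ per hour, $t=0.25$ hour, $k=2$, and $n=100{,}000$ subintervals are hypothetical, as are the helper names. Each subinterval of length $\frac{t}{n}$ is treated as a Bernoulli trial with success probability $\alpha\cdot\frac{t}{n}$, and the binomial result is compared with the Poisson probability with mean $\alpha t$.

```python
import math

def poisson_pmf_t(k: int, alpha: float, t: float) -> float:
    """P(X_t = k) for a Poisson process with rate alpha observed over length t."""
    mu = alpha * t
    return math.exp(-mu) * mu**k / math.factorial(k)

def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

alpha, t, k = 10.0, 0.25, 2  # hypothetical: 10 cars/hour, 15-minute window, 2 cars
n = 100_000                  # subintervals of length t/n, each Bernoulli(alpha * t / n)
approx = binom_pmf(k, n, alpha * t / n)
exact = poisson_pmf_t(k, alpha, t)
print(f"binomial with n={n}: {approx:.6f}")
print(f"Poisson with mean alpha*t={alpha * t}: {exact:.6f}")
```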

Pingback: Gamma distribution and Poisson distribution | Applied Probability and Statistics