# The Poisson Distribution

Let $\alpha$ be a positive constant. Consider the following probability distribution:

$\displaystyle (1) \ \ \ \ \ P(X=j)=\frac{e^{-\alpha} \alpha^j}{j!} \ \ \ \ \ j=0,1,2,\cdots$

The above distribution is said to be a Poisson distribution with parameter $\alpha$. The Poisson distribution is usually used to model the random number of events occurring in a fixed time interval. As will be shown below, $E(X)=\alpha$. Thus the parameter $\alpha$ is the rate of occurrence of the random events; it indicates on average how many events occur per unit of time. Examples of random events that may be modeled by the Poisson distribution include the number of alpha particles emitted by a radioactive substance counted in a prescribed area during a fixed period of time, the number of auto accidents in a fixed period of time or the number of losses arising from a group of insureds during a policy period.

Each of the above examples can be thought of as a process that generates a number of arrivals or changes in a fixed period of time. If such a counting process leads to a Poisson distribution, then the process is said to be a Poisson process.

We now discuss some basic properties of the Poisson distribution. Using the Taylor series expansion of $e^{\alpha}$, the following shows that $(1)$ is indeed a probability distribution.

$\displaystyle . \ \ \ \ \ \ \ \sum \limits_{j=0}^\infty \frac{e^{-\alpha} \alpha^j}{j!}=e^{-\alpha} \sum \limits_{j=0}^\infty \frac{\alpha^j}{j!}=e^{-\alpha} e^{\alpha}=1$

The generating function of the Poisson distribution is $g(z)=e^{\alpha (z-1)}$ (see The generating function). The mean and variance can be calculated using the generating function.

\displaystyle \begin{aligned}(2) \ \ \ \ \ &E(X)=g'(1)=\alpha \\&\text{ } \\&E[X(X-1)]=g^{(2)}(1)=\alpha^2 \\&\text{ } \\&Var(X)=E[X(X-1)]+E(X)-E(X)^2=\alpha^2+\alpha-\alpha^2=\alpha \end{aligned}

The Poisson distribution can also be interpreted as an approximation to the binomial distribution. It is well known that the Poisson distribution is the limiting case of binomial distributions (see [1] or this post).

$\displaystyle (3) \ \ \ \ \ \lim \limits_{n \rightarrow \infty} \binom{n}{j} \biggl(\frac{\alpha}{n}\biggr)^j \biggl(1-\frac{\alpha}{n}\biggr)^{n-j}=\frac{e^{-\alpha} \alpha^j}{j!}$

One application of $(3)$ is that we can use Poisson probabilities to approximate Binomial probabilities. The approximation is reasonably good when the number of trials $n$ in a binomial distribution is large and the probability of success $p$ is small. The binomial mean is $n p$ and the variance is $n p (1-p)$. When $p$ is small, $1-p$ is close to 1 and the binomial variance is approximately $np \approx n p (1-p)$. Whenever the mean of a discrete distribution is approximately equaled to the mean, the Poisson approximation is quite good. As a rule of thumb, we can use Poisson to approximate binomial if $n \le 100$ and $p \le 0.01$.

As an example, we use the Poisson distribution to estimate the probability that at most 1 person out of 1000 will have a birthday on the New Year Day. Let $n=1000$ and $p=365^{-1}$. So we use the Poisson distribution with $\alpha=1000 \times 365^{-1}$. The following is an estimate using the Poisson distribution.

$\displaystyle . \ \ \ \ \ \ \ P(X \le 1)=e^{-\alpha}+\alpha e^{-\alpha}=(1+\alpha) e^{-\alpha}=0.2415$

Another useful property is that the independent sum of Poisson distributions also has a Poisson distribution. Specifically, if each $X_i$ has a Poisson distribution with parameter $\alpha_i$, then the independent sum $X=X_1+\cdots+X_n$ has a Poisson distribution with parameter $\alpha=\alpha_1+\cdots+\alpha_n$. One way to see this is that the product of Poisson generating functions has the same general form as $g(z)=e^{\alpha (z-1)}$ (see The generating function). One interpretation of this property is that when merging several arrival processes, each of which follow a Poisson distribution, the result is still a Poisson distribution.

For example, suppose that in an airline ticket counter, the arrival of first class customers follows a Poisson process with a mean arrival rate of 8 per 15 minutes and the arrival of customers flying coach follows a Poisson distribution with a mean rate of 12 per 15 minutes. Then the arrival of customers of either types has a Poisson distribution with a mean rate of 20 per 15 minutes or 80 per hour.

A Poisson distribution with a large mean can be thought of as an independent sum of Poisson distributions. For example, a Poisson distribution with a mean of 50 is the independent sum of 50 Poisson distributions each with mean 1. Because of the central limit theorem, when the mean is large, we can approximate the Poisson using the normal distribution.

In addition to merging several Poisson distributions into one combined Poisson distribution, we can also split a Poisson into several Poisson distributions. For example, suppose that a stream of customers arrives according to a Poisson distribution with parameter $\alpha$ and each customer can be classified into one of two types (e.g. no purchase vs. purchase) with probabilities $p_1$ and $p_2$, respectively. Then the number of “no purchase” customers and the number of “purchase” customers are independent Poisson random variables with parameters $\alpha p_1$ and $\alpha p_2$, respectively. For more details on the splitting of Poisson, see Splitting a Poisson Distribution.

Reference

1. Feller W. An Introduction to Probability Theory and Its Applications, Third Edition, John Wiley & Sons, New York, 1968

# The generating function

Consider the function $g(z)=\displaystyle e^{\alpha (z-1)}$ where $\alpha$ is a positive constant. The following shows the derivatives of this function.

\displaystyle \begin{aligned}. \ \ \ \ \ \ &g(z)=e^{\alpha (z-1)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ g(0)=e^{-\alpha} \\&\text{ } \\&g'(z)=e^{\alpha (z-1)} \ \alpha \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ g'(0)=e^{-\alpha} \ \alpha \\&\text{ } \\&g^{(2)}(z)=e^{\alpha (z-1)} \ \alpha^2 \ \ \ \ \ \ \ \ \ \ \ \ \ \ g^{(2)}(0)=2! \ \frac{e^{-\alpha} \ \alpha^2}{2!} \\&\text{ } \\&g^{(3)}(z)=e^{\alpha (z-1)} \ \alpha^3 \ \ \ \ \ \ \ \ \ \ \ \ \ \ g^{(3)}(0)=3! \ \frac{e^{-\alpha} \ \alpha^3}{3!} \\&\text{ } \\&\ \ \ \ \ \ \ \ \cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \cdots \\&\text{ } \\&g^{(n)}(z)=e^{\alpha (z-1)} \ \alpha^n \ \ \ \ \ \ \ \ \ \ \ \ \ \ g^{(n)}(0)=n! \ \frac{e^{-\alpha} \ \alpha^n}{n!} \end{aligned}

Note that the derivative of $g(z)$ at each order is a multiple of a Poisson probability. Thus the Poisson distribution is coded by the function $g(z)=\displaystyle e^{\alpha (z-1)}$. Because of this reason, such a function is called a generating function (or probability generating function). This post discusses some basic facts about the generating function (gf) and its cousin, the moment generating function (mgf). One important characteristic is that these functions generate probabilities and moments. Another important characteristic is that there is a one-to-one correspondence between a probability distribution and its generating function and moment generating function, i.e. two random variables with different cumulative distribution functions cannot have the same gf or mgf. In some situations, this fact is useful in working with independent sum of random variables.

————————————————————————————————————

The Generating Function
Suppose that $X$ is a random variable that takes only nonegative integer values with the probability function given by

$\text{ }$

$(1) \ \ \ \ \ \ P(X=j)=a_j, \ \ \ \ j=0,1,2,\cdots$

$\text{ }$

The idea of the generating function is that we use a power series to capture the entire probability distribution. The following defines the generating function that is associated with the above sequence $a_j$, .

$(2) \ \ \ \ \ \ g(z)=a_0+a_1 \ z+a_2 \ z^2+ \cdots=\sum \limits_{j=0}^\infty a_j \ z^j$

$\text{ }$

Since the elements of the sequence $a_j$ are probabilities, we can also call $g(z)$ the generating function of the probability distribution defined by the sequence in $(1)$. The generating function $g(z)$ is defined wherever the power series converges. It is clear that at the minimum, the power series in $(2)$ converges for $\lvert z \lvert \le 1$.

We discuss the following three properties of generating functions:

1. The generating function completely determines the distribution.
2. The moments of the distribution can be derived from the derivatives of the generating function.
3. The generating function of a sum of independent random variables is the product of the individual generating functions.

The Poisson generating function at the beginning of the post is an example demonstrating property 1 (see Example 0 below for the derivation of the generating function). In some cases, the probability distribution of an independent sum can be deduced from the product of the individual generating functions. Some examples are given below.

————————————————————————————————————
Generating Probabilities
We now discuss the property 1 indicated above. To see that $g(z)$ generates the probabilities, let’s look at the derivatives of $g(z)$:

\displaystyle \begin{aligned}(3) \ \ \ \ \ \ &g'(z)=a_1+2 a_2 \ z+3 a_3 \ z^2+\cdots=\sum \limits_{j=1}^\infty j a_j \ z^{j-1} \\&\text{ } \\&g^{(2)}(z)=2 a_2+6 a_3 \ z+ 12 a_4 \ z^2=\sum \limits_{j=2}^\infty j (j-1) a_j \ z^{j-2} \\&\text{ } \\&g^{(3)}(z)=6 a_3+ 24 a_4 \ z+60 a_5 \ z^2=\sum \limits_{j=3}^\infty j (j-1)(j-2) a_j \ z^{j-3} \\&\text{ } \\&\ \ \ \ \ \ \ \ \cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \cdots \\&\text{ } \\&g^{(n)}(z)=\sum \limits_{j=n}^\infty j(j-1) \cdots (j-n+1) a_j \ z^{j-n}=\sum \limits_{j=n}^\infty \binom{j}{n} n! \ a_j \ z^{j-n} \end{aligned}

$\text{ }$

By letting $z=0$ above, all the terms vanishes except for the constant term. We have:

$\text{ }$

$(4) \ \ \ \ \ \ g^{(n)}(0)=n! \ a_n=n! \ P(X=n)$

$\text{ }$

Thus the generating function is a compact way of encoding the probability distribution. The probability distribution determines the generating function as seen in $(2)$. On the other hand, $(3)$ and $(4)$ demonstrate that the generating function also determines the probability distribution.

————————————————————————————————————
Generating Moments
The generating function also determines the moments (property 2 indicated above). For example, we have:

\displaystyle \begin{aligned}(5) \ \ \ \ \ \ &g'(1)=0 \ a_0+a_1+2 a_2+3 a_3+\cdots=\sum \limits_{j=0}^\infty j a_j=E(X) \\&\text{ } \\&g^{(2)}(1)=0 a_0 + 0 a_1+2 a_2+6 a_3+ 12 a_4+\cdots=\sum \limits_{j=0}^\infty j (j-1) a_j=E[X(X-1)] \\&\text{ } \\&E(X)=g'(1) \\&\text{ } \\&E(X^2)=g'(1)+g^{(2)}(1) \end{aligned}

$\text{ }$

Note that $g^{(n)}(1)=E[X(X-1) \cdots (X-(n-1))]$. Thus the higher moment $E(X^n)$ can be expressed in terms of $g^{(n)}(1)$ and $g^{(k)}(1)$ where $k.
————————————————————————————————————
More General Definitions
Note that the definition in $(2)$ can also be interpreted as the mathematical expectation of $z^X$, i.e., $g(z)=E(z^X)$. This provides a way to define the generating function for random variables that may take on values outside of the nonnegative integers. The following is a more general definition of the generating function of the random variable $X$, which is defined for all $z$ where the expectation exists.

$\text{ }$

$(6) \ \ \ \ \ \ g(z)=E(z^X)$

$\text{ }$

————————————————————————————————————
The Generating Function of Independent Sum
Let $X_1,X_2,\cdots,X_n$ be independent random variables with generating functions $g_1,g_2,\cdots,g_n$, respectively. Then the generating function of $X_1+X_2+\cdots+X_n$ is given by the product $g_1 \cdot g_2 \cdots g_n$.

Let $g(z)$ be the generating function of the independent sum $X_1+X_2+\cdots+X_n$. The following derives $g(z)$. Note that the general form of generating function $(6)$ is used.

\displaystyle \begin{aligned}(7) \ \ \ \ \ \ g(z)&=E(z^{X_1+\cdots+X_n}) \\&\text{ } \\&=E(z^{X_1} \cdots z^{X_n}) \\&\text{ } \\&=E(z^{X_1}) \cdots E(z^{X_n}) \\&\text{ } \\&=g_1(z) \cdots g_n(z) \end{aligned}

The probability distribution of a random variable is uniquely determined by its generating function. In particular, the generating function $g(z)$ of the independent sum $X_1+X_2+\cdots+X_n$ that is derived in $(7)$ is unique. So if the generating function is of a particular distribution, we can deduce that the distribution of the sum must be of the same distribution. See the examples below.

————————————————————————————————————
Example 0
In this example, we derive the generating function of the Poisson distribution. Based on the definition, we have:

\displaystyle \begin{aligned}. \ \ \ \ \ \ g(z)&=\sum \limits_{j=0}^\infty \frac{e^{-\alpha} \alpha^j}{j!} \ z^j \\&\text{ } \\&=\sum \limits_{j=0}^\infty \frac{e^{-\alpha} (\alpha z)^j}{j!} \\&\text{ } \\&=\frac{e^{-\alpha}}{e^{- \alpha z}} \sum \limits_{j=0}^\infty \frac{e^{-\alpha z} (\alpha z)^j}{j!} \\&\text{ } \\&=e^{\alpha (z-1)} \end{aligned}

$\text{ }$

Example 1
Suppose that $X_1,X_2,\cdots,X_n$ are independent random variables where each $X_i$ has a Bernoulli distribution with probability of success $p$. Let $q=1-p$. The following is the generating function for each $X_i$.

$\text{ }$

$. \ \ \ \ \ \ g(z)=q+p z$

$\text{ }$

Then the generating function of the sum $X=X_1+\cdots+X_n$ is $g(z)^n=(q+p z)^n$. The following is the binomial expansion:

$\text{ }$

\displaystyle \begin{aligned}(8) \ \ \ \ \ \ g(z)^n&=(q+p z)^n \\&\text{ } \\&=\sum \limits_{j=0}^n \binom{n}{j} q^{n-j} \ p^j \ z^j \end{aligned}

$\text{ }$

By definition $(2)$, the generating function of $X=X_1+\cdots+X_n$ is:

$\text{ }$

$\text{ }$

$(9) \ \ \ \ \ \ g(z)^n=\sum \limits_{j=0}^\infty P(X=j) \ z^j$

$\text{ }$

Comparing $(8)$ and $(9)$, we have

$\displaystyle (10) \ \ \ \ \ \ P(X=j)=\left\{\begin{matrix}\displaystyle \binom{n}{j} p^j \ q^{n-j}&\ 0 \le j \le n\\{0}&\ j>n \end{matrix}\right.$

The probability distribution indicated by $(8)$ and $(10)$ is that of a binomial distribution. Since the probability distribution of a random variable is uniquely determined by its generating function, the independent sum of Bernoulli distributions must ave a Binomial distribution.

$\text{ }$

Example 2
Suppose that $X_1,X_2,\cdots,X_n$ are independent and have Poisson distributions with parameters $\alpha_1,\alpha_2,\cdots,\alpha_n$, respectively. Then the independent sum $X=X_1+\cdots+X_n$ has a Poisson distribution with parameter $\alpha=\alpha_1+\cdots+\alpha_n$.

Let $g(z)$ be the generating function of $X=X_1+\cdots+X_n$. For each $i$, the generating function of $X_i$ is $g_i(z)=e^{\alpha_i (z-1)}$. The key to the proof is that the product of the $g_i$ has the same general form as the individual $g_i$.

\displaystyle \begin{aligned}(11) \ \ \ \ \ \ g(z)&=g_1(z) \cdots g_n(z) \\&\text{ } \\&=e^{\alpha_1 (z-1)} \cdots e^{\alpha_n (z-1)} \\&\text{ } \\&=e^{(\alpha_1+\cdots+\alpha_n)(z-1)} \end{aligned}

The generating function in $(11)$ is that of a Poisson distribution with mean $\alpha=\alpha_1+\cdots+\alpha_n$. Since the generating function uniquely determines the distribution, we can deduce that the sum $X=X_1+\cdots+X_n$ has a Poisson distribution with parameter $\alpha=\alpha_1+\cdots+\alpha_n$.

$\text{ }$

Example 3
In rolling a fair die, let $X$ be the number shown on the up face. The associated generating function is:

$\displaystyle. \ \ \ \ \ \ g(z)=\frac{1}{6}(z+z^2+z^3+z^4+z^5+z^6)=\frac{z(1-z^6)}{6(1-z)}$

The generating function can be further reduced as:

\displaystyle \begin{aligned}. \ \ \ \ \ \ g(z)&=\frac{z(1-z^6)}{6(1-z)} \\&\text{ } \\&=\frac{z(1-z^3)(1+z^3)}{6(1-z)} \\&\text{ } \\&=\frac{z(1-z)(1+z+z^2)(1+z^3)}{6(1-z)} \\&\text{ } \\&=\frac{z(1+z+z^2)(1+z^3)}{6} \end{aligned}

Suppose that we roll the fair dice 4 times. Let $W$ be the sum of the 4 rolls. Then the generating function of $Z$ is

$\displaystyle. \ \ \ \ \ \ g(z)^4=\frac{z^4 (1+z^3)^4 (1+z+z^2)^4}{6^4}$

The random variable $W$ ranges from 4 to 24. Thus the probability function ranges from $P(W=4)$ to $P(W=24)$. To find these probabilities, we simply need to decode the generating function $g(z)^4$. For example, to find $P(W=12)$, we need to find the coefficient of the term $z^{12}$ in the polynomial $g(z)^4$. To help this decoding, we can expand two of the polynomials in $g(z)^4$.

\displaystyle \begin{aligned}. \ \ \ \ \ \ g(z)^4&=\frac{z^4 (1+z^3)^4 (1+z+z^2)^4}{6^4} \\&\text{ } \\&=\frac{z^4 \times A \times B}{6^4} \\&\text{ } \\&A=(1+z^3)^4=1+4z^3+6z^6+4z^9+z^{12} \\&\text{ } \\&B=(1+z+z^2)^4=1+4z+10z^2+16z^3+19z^4+16z^5+10z^6+4z^7+z^8 \end{aligned}

Based on the above polynomials, there are three ways of forming $z^{12}$. They are: $(z^4 \times 1 \times z^8)$, $(z^4 \times 4z^3 \times 16z^5)$, $(z^4 \times 6z^6 \times 10z^2)$. Thus we have:

$\displaystyle. \ \ \ \ \ \ P(W=12)=\frac{1}{6^4}(1+4 \times 16+6 \times 10)=\frac{125}{6^4}$

To find the other probabilities, we can follow the same decoding process.

————————————————————————————————————
Remark
The probability distribution of a random variable is uniquely determined by its generating function. This fundamental property is useful in determining the distribution of an independent sum. The generating function of the independent sum is simply the product of the individual generating functions. If the product is of a certain distributional form (as in Example 1 and Example 2), then we can deduce that the sum must be of the same distribution.

We can also decode the product of generating functions to obtain the probability function of the independent sum (as in Example 3). The method in Example 3 is quite tedious. But one advantage is that it is a “machine process”, a pretty fool proof process that can be performed mechanically.

The machine process is this: Code the individual probability distribution in a generating function $g(z)$. Then raise it to $n$. After performing some manipulation to $g(z)^n$, decode the probabilities from $g(z)^n$.

As long as we can perform the algebraic manipulation carefully and correctly, this process will be sure to provide the probability distribution of an independent sum.

————————————————————————————————————
The Moment Generating Function
The moment generating function of a random variable $X$ is $M_X(t)=E(e^{tX})$ on all real numbers $t$ for which the expected value exists. The moments can be computed more directly using an mgf. From the theory of mathematical analysis, it can be shown that if $M_X(t)$ exists on some interval $-a, then the derivatives of $M_X(t)$ of all orders exist at $t=0$. Furthermore, it can be show that $E(X^n)=M_X^{(n)}(0)$.

Suppose that $g(z)$ is the generating function of a random variable. The following relates the generating function and the moment generating function.

\displaystyle \begin{aligned}. \ \ \ \ \ \ &M_X(t)=g(e^t) \\&\text{ } \\&g(z)=M_X(ln z) \end{aligned}

————————————————————————————————————

Reference

1. Feller W. An Introduction to Probability Theory and Its Applications, Third Edition, John Wiley & Sons, New York, 1968