# Splitting a Poisson Distribution

We consider a remarkable property of the Poisson distribution that has a connection to the multinomial distribution. We start with the following examples.

Example 1
Suppose that the arrivals of customers in a gift shop at an airport follow a Poisson distribution with a mean of $\alpha=5$ per 10 minutes. Furthermore, suppose that each arrival can be classified into one of three distinct types – type 1 (no purchase), type 2 (purchase under \$20), and type 3 (purchase over \$20). Records show that about 25% of the customers are of type 1. The percentages of type 2 and type 3 are 60% and 15%, respectively. What is the probability distribution of the number of customers per hour of each type?

Example 2
Roll a fair die $N$ times where $N$ is random and follows a Poisson distribution with parameter $\alpha$. For each $i=1,2,3,4,5,6$, let $N_i$ be the number of times the upside of the die is $i$. What is the probability distribution of each $N_i$? What is the joint distribution of $N_1,N_2,N_3,N_4,N_5,N_6$?

In Example 1, the stream of customers arrives according to a Poisson distribution. It can be shown that the stream of customers of each type also has a Poisson distribution. One way to view this example is that we can split the one Poisson distribution into three Poisson distributions.

Example 2 also describes a splitting process, i.e. splitting a Poisson variable into 6 different Poisson variables. We can also view Example 2 as a multinomial distribution where the number of trials is not fixed but is random and follows a Poisson distribution. If the number of rolls of the die were fixed in Example 2 (say 10), then each $N_i$ would have a binomial distribution. Yet, with the number of trials being Poisson, each $N_i$ has a Poisson distribution with mean $\displaystyle \frac{\alpha}{6}$. In this post, we describe this Poisson splitting process in terms of a “random” multinomial distribution (the viewpoint of Example 2).

________________________________________________________________________

Suppose we have a multinomial experiment with parameters $N$, $r$, $p_1, \cdots, p_r$, where

• $N$ is the number of multinomial trials,
• $r$ is the number of distinct possible outcomes in each trial (type 1 through type $r$),
• the $p_i$ are the probabilities of the $r$ possible outcomes in each trial.

Suppose that $N$ follows a Poisson distribution with parameter $\alpha$. For each $i=1, \cdots, r$, let $N_i$ be the number of occurrences of the $i^{th}$ type of outcomes in the $N$ trials. Then $N_1,N_2,\cdots,N_r$ are mutually independent Poisson random variables with parameters $\alpha p_1,\alpha p_2,\cdots,\alpha p_r$, respectively.

Conditional on the total number of trials $N=n$, the variables $N_1,N_2,\cdots,N_r$ have a multinomial distribution and their joint probability function is:

$\displaystyle (1) \ \ \ \ P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r \lvert N=n)=\frac{n!}{n_1! n_2! \cdots n_r!} \ p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}$

where the $n_i$ are nonnegative integers such that $n=n_1+n_2+\cdots+n_r$.

Since the total number of multinomial trials $N$ is not fixed and is random, $(1)$ is not the end of the story. The following is the joint probability function of $N_1,N_2,\cdots,N_r$:

\displaystyle \begin{aligned}(2) \ \ \ \ P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r)&=P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r \lvert N=\sum \limits_{k=1}^r n_k) \\&\ \ \ \ \ \times P(N=\sum \limits_{k=1}^r n_k) \\&\text{ } \\&=\frac{(\sum \limits_{k=1}^r n_k)!}{n_1! \ n_2! \ \cdots \ n_r!} \ p_1^{n_1} \ p_2^{n_2} \ \cdots \ p_r^{n_r} \ \times \frac{e^{-\alpha} \alpha^{\sum \limits_{k=1}^r n_k}}{(\sum \limits_{k=1}^r n_k)!} \\&\text{ } \\&=\frac{e^{-\alpha p_1} \ (\alpha p_1)^{n_1}}{n_1!} \ \frac{e^{-\alpha p_2} \ (\alpha p_2)^{n_2}}{n_2!} \ \cdots \ \frac{e^{-\alpha p_r} \ (\alpha p_r)^{n_r}}{n_r!} \end{aligned}

The last step uses the fact that $\alpha^{n_1+\cdots+n_r}=\alpha^{n_1} \alpha^{n_2} \cdots \alpha^{n_r}$ and that $e^{-\alpha}=e^{-\alpha p_1} e^{-\alpha p_2} \cdots e^{-\alpha p_r}$ since $p_1+p_2+\cdots+p_r=1$.

To obtain the marginal probability function of $N_j$, $j=1,2,\cdots,r$, we sum out the other variables $N_k=n_k$ ($k \ne j$) in $(2)$ and obtain the following:

$\displaystyle (3) \ \ \ \ P(N_j=n_j)=\frac{e^{-\alpha p_j} \ (\alpha p_j)^{n_j}}{n_j!}$

Thus we can conclude that $N_j$, $j=1,2,\cdots,r$, has a Poisson distribution with parameter $\alpha p_j$. Furthermore, the joint probability function of $N_1,N_2,\cdots,N_r$ is the product of the marginal probability functions. It follows that $N_1,N_2,\cdots,N_r$ are mutually independent.
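The factorization in $(2)$ can also be checked numerically. The sketch below (with $\alpha$ and the $p_i$ borrowed from Example 1 and an arbitrary choice of counts) computes the left side of $(2)$ as a multinomial probability times a Poisson probability, and the right side as a product of independent Poisson probabilities.

```python
import math

# Numerical check of identity (2), a sketch: alpha and the p_i are borrowed
# from Example 1; the counts n_i below are an arbitrary illustrative choice.
alpha = 5.0
p = [0.25, 0.60, 0.15]

def poisson_pmf(mean, k):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def joint_pmf(counts):
    """Left side of (2): multinomial pmf given N = n, times the Poisson pmf of N."""
    n = sum(counts)
    multinomial = math.factorial(n)
    for n_i, p_i in zip(counts, p):
        multinomial *= p_i ** n_i / math.factorial(n_i)
    return multinomial * poisson_pmf(alpha, n)

def product_pmf(counts):
    """Right side of (2): product of independent Poisson(alpha * p_i) pmfs."""
    result = 1.0
    for n_i, p_i in zip(counts, p):
        result *= poisson_pmf(alpha * p_i, n_i)
    return result

print(joint_pmf((2, 4, 1)), product_pmf((2, 4, 1)))  # the two values agree
```

The agreement holds for any choice of nonnegative counts, which is exactly what mutual independence of the $N_i$ requires.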

________________________________________________________________________
Example 1
Let $N_1,N_2,N_3$ be the number of customers per hour of type 1, type 2, and type 3, respectively. Here, we attempt to split a Poisson distribution with mean 30 per hour (based on 5 per 10 minutes). Thus $N_1,N_2,N_3$ are mutually independent Poisson variables with means $30 \times 0.25=7.5$, $30 \times 0.60=18$, $30 \times 0.15=4.5$, respectively.

Example 2
As indicated earlier, each $N_i$, $i=1,2,3,4,5,6$, has a Poisson distribution with mean $\frac{\alpha}{6}$. According to $(2)$, the joint probability function of $N_1,N_2,N_3,N_4,N_5,N_6$ is simply the product of the six marginal Poisson probability functions.
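As a sanity check on Example 2, the following Monte Carlo sketch (the value $\alpha=6$ and the trial count are arbitrary choices) rolls a die $N$ times with $N$ Poisson and confirms that the count of one face has sample mean and variance close to $\alpha/6$, as a Poisson variable should.

```python
import math
import random

# Monte Carlo sketch of Example 2, assuming alpha = 6 for illustration:
# with N ~ Poisson(alpha) rolls of a fair die, the count of any one face
# should behave like a Poisson variable with mean alpha/6 = 1.
random.seed(0)
alpha, trials = 6.0, 100_000

def sample_poisson(mean):
    # Inverse-transform sampling of a Poisson variate (fine for small means).
    u = random.random()
    k, prob = 0, math.exp(-mean)
    cdf = prob
    while u > cdf and k < 500:  # the bound guards against float round-off
        k += 1
        prob *= mean / k
        cdf += prob
    return k

counts_face_1 = []
for _ in range(trials):
    n = sample_poisson(alpha)
    rolls = [random.randint(1, 6) for _ in range(n)]
    counts_face_1.append(rolls.count(1))

mean_1 = sum(counts_face_1) / trials
var_1 = sum((c - mean_1) ** 2 for c in counts_face_1) / trials
print(mean_1, var_1)  # both close to alpha/6 = 1
```

Equality of the sample mean and sample variance is the Poisson signature; a binomial count with a fixed number of rolls would instead have variance strictly below its mean.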

# The variance of a mixture

Suppose $X$ is a mixture distribution that is the result of mixing a family of conditional distributions indexed by a parameter random variable $\Theta$. The uncertainty in the parameter variable $\Theta$ has the effect of increasing the unconditional variance of the mixture $X$. Thus, $Var(X)$ is not simply the weighted average of the conditional variance $Var(X \lvert \Theta)$. The unconditional variance $Var(X)$ is the sum of two components. They are:

$\displaystyle Var(X)=E[Var(X \lvert \Theta)]+Var[E(X \lvert \Theta)]$

The above relationship is called the law of total variance, which is the proper way of computing the unconditional variance $Var(X)$. The first component $E[Var(X \lvert \Theta)]$ is called the expected value of conditional variances, which is the weighted average of the conditional variances. The second component $Var[E(X \lvert \Theta)]$ is called the variance of the conditional means, which represents the additional variance as a result of the uncertainty in the parameter $\Theta$.

We use an example of a two-point mixture to illustrate the law of total variance. The example is followed by a proof of the law.

Example
Let $U$ be the uniform distribution on the unit interval $(0, 1)$. Suppose that a large population of insureds is composed of “high risk” and “low risk” individuals. The proportion of insureds classified as “low risk” is $p$ where $0<p<1$. The random loss amount $X$ of a “low risk” insured is $U$. The random loss amount $X$ of a “high risk” insured is $U$ shifted by a positive constant $w>0$, i.e. $w+U$. What is the variance of the loss amount of an insured randomly selected from this population?

For convenience, we use $\Theta$ as a parameter to indicate the risk class ($\Theta=1$ is “low risk” and $\Theta=2$ is “high risk”). The following shows the relevant conditional distributional quantities of $X$.

$\displaystyle E(X \lvert \Theta=1)=\frac{1}{2} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ E(X \lvert \Theta=2)=w+\frac{1}{2}$

$\displaystyle Var(X \lvert \Theta=1)=\frac{1}{12} \ \ \ \ \ \ \ \ \ \ \ \ Var(X \lvert \Theta=2)=\frac{1}{12}$

The unconditional mean loss is the weighted average of the conditional mean loss amounts. However, the same idea does not work for variance.

\displaystyle \begin{aligned}E[X]&=p \times E(X \lvert \Theta=1)+(1-p) \times E(X \lvert \Theta=2) \\&=p \times \frac{1}{2}+(1-p) \times (w+\frac{1}{2}) \\&=\frac{1}{2}+(1-p) \ w \end{aligned}

\displaystyle \begin{aligned}Var[X]&\ne p \times Var(X \lvert \Theta=1)+(1-p) \times Var(X \lvert \Theta=2)=\frac{1}{12} \end{aligned}

The conditional variance is the same for both risk classes since the “high risk” loss is a shifted distribution of the “low risk” loss. However, the unconditional variance is more than $\frac{1}{12}$ since the mean losses for the two classes are different (heterogeneous risks across the classes). The uncertainty in the risk classes (i.e. uncertainty in the parameter $\Theta$) introduces additional variance in the loss for a randomly selected insured. The unconditional variance $Var(X)$ is the sum of the following two components:

\displaystyle \begin{aligned}E[Var(X \lvert \Theta)]&=p \times Var(X \lvert \Theta=1)+(1-p) \times Var(X \lvert \Theta=2) \\&=p \times \frac{1}{12}+(1-p) \times \frac{1}{12} \\&=\frac{1}{12} \end{aligned}

\displaystyle \begin{aligned}Var[E(X \lvert \Theta)]&=p \times E(X \lvert \Theta=1)^2+(1-p) \times E(X \lvert \Theta=2)^2 \\&\ \ \ -\biggl(p \times E(X \lvert \Theta=1)+(1-p) \times E(X \lvert \Theta=2)\biggr)^2 \\&=p \times \biggl(\frac{1}{2}\biggr)^2+(1-p) \times \biggl(w+\frac{1}{2}\biggr)^2 \\&\ \ \ -\biggl(p \times \frac{1}{2}+(1-p) \times (w+\frac{1}{2})\biggr)^2 \\&=p \ (1-p) \ w^2 \end{aligned}

\displaystyle \begin{aligned}Var(X)&=E[Var(X \lvert \Theta)]+Var[E(X \lvert \Theta)] \\&=\frac{1}{12}+p \ (1-p) \ w^2 \end{aligned}

The additional variance is in the amount of $p(1-p)w^2$. This is the variance of the conditional means of the risk classes. Note that $w$ is the additional mean loss for a “high risk” insured. The higher the additional mean loss $w$, the more heterogeneous in risk between the two classes, hence the larger the dispersion in unconditional loss.
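The decomposition can be verified by simulation. The sketch below uses the illustrative values $p=0.7$ and $w=2$ (arbitrary choices) and compares the sample variance of the mixed loss with the predicted $\frac{1}{12}+p(1-p)w^2$.

```python
import random

# Monte Carlo sketch of the two-point mixture, with illustrative values
# p = 0.7 and w = 2: the law of total variance predicts
# Var(X) = 1/12 + p(1-p)w^2.
random.seed(1)
p, w, trials = 0.7, 2.0, 200_000

samples = []
for _ in range(trials):
    u = random.random()              # the base loss U ~ uniform(0, 1)
    if random.random() < p:
        samples.append(u)            # low-risk insured: loss is U
    else:
        samples.append(w + u)        # high-risk insured: loss is w + U

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
predicted = 1 / 12 + p * (1 - p) * w ** 2
print(var, predicted)  # the two values are close
```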

The law of total variance gives the unconditional variance of a random variable $X$ that is indexed by another random variable $\Theta$. The unconditional variance of $X$ is the sum of two components, namely, the expected value of the conditional variances and the variance of the conditional means. The formula is:

$\displaystyle Var(X)=E[Var(X \lvert \Theta)]+Var[E(X \lvert \Theta)]$

The following is the derivation of the formula:

\displaystyle \begin{aligned}Var[X]&=E[X^2]-E[X]^2 \\&=E[E(X^2 \lvert \Theta)]-E[E(X \lvert \Theta)]^2 \\&=E\left\{Var(X \lvert \Theta)+E(X \lvert \Theta)^2 \right\}-E[E(X \lvert \Theta)]^2 \\&=E[Var(X \lvert \Theta)]+E[E(X \lvert \Theta)^2]-E[E(X \lvert \Theta)]^2 \\&=E[Var(X \lvert \Theta)]+Var[E(X \lvert \Theta)] \end{aligned}

See this blog post for practice problems on mixture distributions.


# An example of a mixture

We use an example to motivate the definition of a mixture distribution.

Example 1

Suppose that the loss arising from an insured randomly selected from a large group of insureds follows an exponential distribution with probability density function (pdf) $f_X(x)=\theta e^{-\theta x}$, $x>0$, where $\theta$ is a parameter that is a positive constant. The mean claim cost for this randomly selected insured is $\frac{1}{\theta}$. So the parameter $\theta$ reflects the risk characteristics of the insured. Since the population of insureds is large, there is uncertainty in the parameter $\theta$. It is more appropriate to regard $\theta$ as a random variable in order to capture the wide range of risk characteristics across the individuals in the population. As a result, the pdf indicated above is not an unconditional pdf, but rather a conditional pdf of $X$, conditional on a realized value of the random variable $\Theta$:

$\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\theta e^{-\theta x}, \ \ \ \ \ x>0$

What about the marginal (unconditional) pdf of $X$? Let’s assume that the pdf of $\Theta$ is given by $\displaystyle f_\Theta(\theta)=\frac{1}{2} \ \theta^2 \ e^{-\theta}$. Then the unconditional pdf of $X$ is the weighted average of the conditional pdf.

\displaystyle \begin{aligned}f_X(x)&=\int_0^{\infty} f_{X \lvert \Theta}(x \lvert \theta) \ f_\Theta(\theta) \ d \theta \\&=\int_0^{\infty} \biggl[\theta \ e^{-\theta x}\biggr] \ \biggl[\frac{1}{2} \ \theta^2 \ e^{-\theta}\biggr] \ d \theta \\&=\int_0^{\infty} \frac{1}{2} \ \theta^3 \ e^{-\theta(x+1)} \ d \theta \\&=\frac{1}{2} \frac{6}{(x+1)^4} \int_0^{\infty} \frac{(x+1)^4}{3!} \ \theta^{4-1} \ e^{-\theta(x+1)} \ d \theta \\&=\frac{3}{(x+1)^4} \end{aligned}
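The mixing integral above can also be checked numerically. The sketch below approximates it with a simple midpoint rule (the step count and the truncation of the $\theta$ range at 60 are arbitrary numerical choices) and compares the result with $\frac{3}{(x+1)^4}$ at a few points.

```python
import math

# Midpoint-rule sketch of the mixing integral; the step count and the
# truncation of the theta range at 60 are arbitrary numerical choices.
def mixture_pdf(x, steps=100_000, upper=60.0):
    """Integrate theta * e^(-theta x) * (1/2) theta^2 e^(-theta) over theta > 0."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += theta * math.exp(-theta * x) * 0.5 * theta ** 2 * math.exp(-theta) * h
    return total

for x in (0.5, 1.0, 2.0):
    print(x, mixture_pdf(x), 3 / (x + 1) ** 4)  # the last two columns agree
```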

Several other distributional quantities are also weighted averages, including the unconditional mean and the second moment.

\displaystyle \begin{aligned}E(X)&=\int_0^{\infty} E(X \lvert \Theta=\theta) \ f_\Theta(\theta) \ d \theta \\&=\int_0^{\infty} \biggl[\frac{1}{\theta} \biggr] \ \biggl[\frac{1}{2} \ \theta^2 \ e^{-\theta}\biggr] \ d \theta \\&=\int_0^{\infty} \frac{1}{2} \ \theta \ e^{-\theta} \ d \theta \\&=\frac{1}{2} \end{aligned}

\displaystyle \begin{aligned}E(X^2)&=\int_0^{\infty} E(X^2 \lvert \Theta=\theta) \ f_\Theta(\theta) \ d \theta \\&=\int_0^{\infty} \biggl[\frac{2}{\theta^2} \biggr] \ \biggl[\frac{1}{2} \ \theta^2 \ e^{-\theta}\biggr] \ d \theta \\&=\int_0^{\infty} e^{-\theta} \ d \theta \\&=1 \end{aligned}

As a result, the unconditional variance is $Var(X)=1-\frac{1}{4}=\frac{3}{4}$. Note that the unconditional variance is not the weighted average of the conditional variance. The weighted average of the conditional variance only produces $\frac{1}{2}$.

\displaystyle \begin{aligned}E[Var(X \lvert \Theta)]&=\int_0^{\infty} Var(X \lvert \Theta=\theta) \ f_\Theta(\theta) \ d \theta \\&=\int_0^{\infty} \biggl[\frac{1}{\theta^2} \biggr] \ \biggl[\frac{1}{2} \ \theta^2 \ e^{-\theta}\biggr] \ d \theta \\&=\int_0^{\infty} \frac{1}{2} \ e^{-\theta} \ d \theta \\&=\frac{1}{2} \end{aligned}

It turns out that the unconditional variance has two components, the expected value of the conditional variances and the variance of the conditional means. In this example, the former is $\frac{1}{2}$ and the latter is $\frac{1}{4}$. The additional variance in the amount of $\frac{1}{4}$ is a reflection that there is uncertainty in the parameter $\theta$.

\displaystyle \begin{aligned}Var(X)&=E[Var(X \lvert \Theta)]+Var[E(X \lvert \Theta)] \\&=\frac{1}{2}+\frac{1}{4}\\&=\frac{3}{4} \end{aligned}
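The same decomposition can be seen in simulation: draw $\Theta$ from the Gamma mixing density, then draw $X$ from the exponential distribution with rate $\Theta$. (The sample size below is an arbitrary choice; since the resulting Pareto mixture has heavy tails, the sample variance converges slowly.)

```python
import math
import random

# Simulation sketch: draw Theta from the Gamma mixing density
# (1/2) theta^2 e^(-theta) (shape 3, rate 1), then X given Theta = theta
# from the exponential distribution with rate theta.
random.seed(2)
trials = 500_000

def sample_x():
    # Gamma(3, 1) as a sum of three independent exponential(1) variates
    theta = sum(-math.log(1.0 - random.random()) for _ in range(3))
    return -math.log(1.0 - random.random()) / theta  # exponential, rate theta

xs = [sample_x() for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials
print(mean, var)  # roughly 1/2 and 3/4; note the variance exceeds the naive 1/2
```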

——————————————————————————————————————-

The Definition of Mixture
The unconditional pdf $f_X(x)$ derived in Example 1 is that of a Pareto distribution. Thus the Pareto distribution is a continuous mixture of exponential distributions with Gamma mixing weights.

Mathematically speaking, a mixture arises when a probability density function $f(x \lvert \theta)$ depends on a parameter $\theta$ that is uncertain and is itself a random variable with density $g(\theta)$. Then taking the weighted average of $f(x \lvert \theta)$ with $g(\theta)$ as weight produces the mixture distribution.

A continuous random variable $X$ is said to be a mixture if its probability density function $f_X(x)$ is a weighted average of a family of probability density functions $f(x \lvert \theta)$. The random variable $\Theta$ is said to be the mixing random variable and its pdf $g(\theta)$ is said to be the mixing weight. An equivalent definition of mixture is that the distribution function $F_X(x)$ is a weighted average of a family of distribution functions indexed by a mixing variable. Thus $X$ is a mixture if one of the following holds.

$\displaystyle f_X(x)=\int_{-\infty}^{\infty} f(x \lvert \theta) \ g(\theta) \ d \theta$

$\displaystyle F_X(x)=\int_{-\infty}^{\infty} F(x \lvert \theta) \ g(\theta) \ d \theta$

Similarly, a discrete random variable is a mixture if its probability function (or distribution function) is a weighted sum of a family of probability functions (or distribution functions). Thus $X$ is a mixture if one of the following holds.

$\displaystyle P(X=x)=\sum \limits_{y} P(X=x \lvert Y=y) \ P(Y=y)$

$\displaystyle P(X \le x)=\sum \limits_{y} P(X \le x \lvert Y=y) \ P(Y=y)$
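As a small discrete illustration (the component means and the mixing weights below are arbitrary choices), consider a two-point mixture of Poisson distributions; its probability function is the weighted sum of the component probability functions.

```python
import math

# A discrete mixture sketch: X is a two-point mixture of Poisson(1) and
# Poisson(4) with mixing weights 0.3 and 0.7 (all values chosen for
# illustration).  Its pmf is the weighted sum of the component pmfs.
weights = {1.0: 0.3, 4.0: 0.7}

def poisson_pmf(mean, k):
    return math.exp(-mean) * mean ** k / math.factorial(k)

def mixture_pmf(k):
    return sum(w * poisson_pmf(mean, k) for mean, w in weights.items())

# The weighted sum is still a genuine pmf: it sums to 1 over k.
print(sum(mixture_pmf(k) for k in range(60)))
```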

See this blog post for practice problems on mixture distributions.


# A basic look at joint distributions

This is a discussion of how to work with joint distributions of two random variables. We limit the discussion to continuous random variables. The discussion of the discrete case is similar (for the most part replacing the integral signs with summation signs). Suppose $X$ and $Y$ are continuous random variables where $f_{X,Y}(x,y)$ is the joint probability density function. What this means is that $f_{X,Y}(x,y)$ satisfies the following two properties:

• for each point $(x,y)$ in the Euclidean plane, $f_{X,Y}(x,y)$ is a nonnegative real number,
• $\displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \ dx \ dy=1$.

Because of the second bullet point, the function $f_{X,Y}(x,y)$ must be an integrable function. We will not overly focus on this point and instead be satisfied with knowing that it is possible to integrate $f_{X,Y}(x,y)$ over the entire $xy$ plane and its many reasonable subregions.

Another way to think about $f_{X,Y}(x,y)$ is that it assigns the density to each point in the $xy$ plane (i.e. it tells us how much weight is assigned to each point). Consequently, if we want to know the probability that $(X,Y)$ falls in the region $A$, we simply evaluate the following integral:

$\displaystyle \int_{A} f_{X,Y}(x,y) \ dx \ dy$.

For instance, to find $P(X<Y)$ and $P(X+Y \le z)$, where $z>0$, we evaluate the integral over the regions $x<y$ and $x+y \le z$, respectively. The integrals are:

$\displaystyle P(X<Y)=\int_{-\infty}^{\infty} \int_{-\infty}^{y} f_{X,Y}(x,y) \ dx \ dy$

$\displaystyle P(X+Y \le z)=\int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f_{X,Y}(x,y) \ dy \ dx$

Note that $P(X+Y \le z)$ is the distribution function $F_Z(z)=P(X+Y \le z)$ where $Z=X+Y$. Then the pdf of $Z$ is obtained by differentiation, i.e. $f_Z(z)=F_Z^{'}(z)$.

In practice, all integrals involving the density functions need be taken only over those $x$ and $y$ values where the density is positive.

——————————————————————————————————————–

Marginal Density

The joint density function $f_{X,Y}(x,y)$ describes how the two variables behave in relation to one another. The marginal probability density function (marginal pdf) is of interest if we are only concerned with one of the variables. To obtain the marginal pdf of $X$, we simply integrate $f_{X,Y}(x,y)$ and sum out the other variable. The following integral produces the marginal pdf of $X$:

$\displaystyle f_X(x)=\int_{-\infty}^{\infty} f_{X,Y}(x,y) \ dy$

The marginal pdf of $X$ is obtained by summing all the density along the vertical line that meets the $x$ axis at the point $(x,0)$ (see Figure 1). Thus $f_X(x)$ represents the sum total of all density $f_{X,Y}(x,y)$ along a vertical line.

Obviously, if we integrate the marginal pdf $f_X(x)$ over all $x$ (i.e. sum the totals from all the vertical lines), the result is 1.0. Thus $f_X(x)$ can be regarded as a single-variable pdf.

\displaystyle \begin{aligned}\int_{-\infty}^{\infty}f_X(x) \ dx&=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \ dy \ dx=1 \\&\text{ } \end{aligned}

The same can be said for the marginal pdf of the other variable $Y$, except that $f_Y(y)$ is the sum (integral in this case) of all the density on a horizontal line that meets the $y$ axis at the point $(0,y)$.

$\displaystyle f_Y(y)=\int_{-\infty}^{\infty} f_{X,Y}(x,y) \ dx$

——————————————————————————————————————–

Example 1

Let $X$ and $Y$ be jointly distributed according to the following pdf:

$\displaystyle f_{X,Y}(x,y)=y^2 \ e^{-y(x+1)}, \text{ where } x>0,y>0$

The following derives the marginal pdfs for $X$ and $Y$:

\displaystyle \begin{aligned}f_X(x)&=\int_0^{\infty} y^2 \ e^{-y(x+1)} \ dy \\&\text{ } \\&=\frac{2}{(x+1)^3} \int_0^{\infty} \frac{(x+1)^3}{2!} y^{3-1} \ e^{-y(x+1)} \ dy \\&\text{ } \\&=\frac{2}{(x+1)^3} \end{aligned}

\displaystyle \begin{aligned}f_Y(y)&=\int_0^{\infty} y^2 \ e^{-y(x+1)} \ dx \\&\text{ } \\&=y \ e^{-y} \int_0^{\infty} y \ e^{-y x} \ dx \\&\text{ } \\&=y \ e^{-y} \end{aligned}

In the middle step of the derivation of $f_X(x)$, the integrand is the Gamma pdf with rate parameter $x+1$ and shape parameter 3, hence the integral in that step becomes 1. In the middle step for $f_Y(y)$, the integrand is the pdf of an exponential distribution with rate $y$.
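Both marginals can be confirmed by numerical integration of the joint pdf. The sketch below uses a midpoint rule (the grid size, the truncation at 60, and the evaluation points $x_0$, $y_0$ are arbitrary choices).

```python
import math

# Midpoint-rule sketch confirming the two marginals of Example 1; the grid
# size, the truncation at 60, and the evaluation points are arbitrary choices.
def joint(x, y):
    return y ** 2 * math.exp(-y * (x + 1))

def integrate(f, upper=60.0, steps=100_000):
    h = upper / steps
    return sum(f((i + 0.5) * h) * h for i in range(steps))

x0, y0 = 1.5, 0.8
fx = integrate(lambda y: joint(x0, y))  # sum out y: marginal of X at x0
fy = integrate(lambda x: joint(x, y0))  # sum out x: marginal of Y at y0
print(fx, 2 / (x0 + 1) ** 3)   # these agree
print(fy, y0 * math.exp(-y0))  # these agree
```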

——————————————————————————————————————–

Conditional Density

Now consider the joint density $f_{X,Y}(x,y)$ restricted to a vertical line, treating the values on the vertical line as a probability distribution. In essence, we are restricting our focus to one particular realized value of $X$. Given a realized value $x$ of $X$, how do we describe the behavior of the other variable $Y$? Since the marginal pdf $f_X(x)$ is the sum total of all density on a vertical line, we express the conditional density as the joint density $f_{X,Y}(x,y)$ taken as a fraction of $f_X(x)$.

$\displaystyle f_{Y \lvert X}(y \lvert x)=\frac{f_{X,Y}(x,y)}{f_X(x)}$

It is easy to see that $f_{Y \lvert X}(y \lvert x)$ is a probability density function of $Y$. When we already know that $X$ has a realized value, this pdf tells us information about how $Y$ behaves. Thus this pdf is called the conditional pdf of $Y$ given $X=x$.

Given a realized value $x$ of $X$, we may want to know the conditional mean and the higher moments of $Y$.

$\displaystyle E(Y \lvert X=x)=\int_{-\infty}^{\infty} y \ f_{Y \lvert X}(y \lvert x) \ dy$

$\displaystyle E(Y^n \lvert X=x)=\int_{-\infty}^{\infty} y^n \ f_{Y \lvert X}(y \lvert x) \ dy \text{ where } n>1$

In particular, the conditional variance of $Y$ is:

$\displaystyle Var(Y \lvert X=x)=E(Y^2 \lvert X=x)-E(Y \lvert X=x)^2$

The discussion for the conditional density of $X$ given a realized value $y$ of $Y$ is similar, except that we restrict the joint density $f_{X,Y}(x,y)$ on a horizontal line. We have the following information about the conditional distribution of $X$ given a realized value $Y=y$.

$\displaystyle f_{X \lvert Y}(x \lvert y)=\frac{f_{X,Y}(x,y)}{f_Y(y)}$

$\displaystyle E(X \lvert Y=y)=\int_{-\infty}^{\infty} x \ f_{X \lvert Y}(x \lvert y) \ dx$

$\displaystyle E(X^n \lvert Y=y)=\int_{-\infty}^{\infty} x^n \ f_{X \lvert Y}(x \lvert y) \ dx \text{ where } n>1$

In particular, the conditional variance of $X$ is:

$\displaystyle Var(X \lvert Y=y)=E(X^2 \lvert Y=y)-E(X \lvert Y=y)^2$

——————————————————————————————————————–

Example 1 (Continued)

The following derives the conditional density functions:

\displaystyle \begin{aligned}f_{Y \lvert X}(y \lvert x)&=\frac{f_{X,Y}(x,y)}{f_X(x)} \\&\text{ } \\&=\displaystyle \frac{y^2 e^{-y(x+1)}}{\frac{2}{(x+1)^3}} \\&\text{ } \\&=\frac{(x+1)^3}{2!} \ y^2 \ e^{-y(x+1)} \end{aligned}

\displaystyle \begin{aligned}f_{X \lvert Y}(x \lvert y)&=\frac{f_{X,Y}(x,y)}{f_Y(y)} \\&\text{ } \\&=\displaystyle \frac{y^2 e^{-y(x+1)}}{y \ e^{-y}} \\&\text{ } \\&=y \ e^{-y \ x} \end{aligned}

The conditional density $f_{Y \lvert X}(y \lvert x)$ is that of a Gamma distribution with rate parameter $x+1$ and shape parameter 3. So given a realized value $x$ of $X$, $Y$ has a Gamma distribution whose rate parameter is $x+1$ and whose shape parameter is 3. On the other hand, the conditional density $f_{X \lvert Y}(x \lvert y)$ is that of an exponential distribution. Given a realized value $y$ of $Y$, $X$ has an exponential distribution with rate parameter $y$. Since the conditional distributions are familiar parametric distributions, we have the following conditional means and conditional variances.

$\displaystyle E(Y \lvert X=x)=\frac{3}{x+1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ Var(Y \lvert X=x)=\frac{3}{(x+1)^2}$

$\displaystyle E(X \lvert Y=y)=\frac{1}{y} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Var(X \lvert Y=y)=\frac{1}{y^2}$

Note that both conditional means are decreasing functions. The larger the realized value of $X$, the smaller the mean $E(Y \lvert X=x)$. Likewise, the larger the realized value of $Y$, the smaller the mean $E(X \lvert Y=y)$. It appears that $X$ and $Y$ move in opposite directions. This is also confirmed by the fact that $Cov(X,Y)=-1$.
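The negative covariance can be confirmed by simulation: draw $Y$ from its Gamma marginal, then $X$ from the exponential conditional distribution with rate $Y$. (The sample size below is an arbitrary choice.)

```python
import math
import random

# Simulation sketch of the joint distribution in Example 1: Y has marginal
# density y e^(-y) (Gamma, shape 2, rate 1) and X given Y = y is exponential
# with rate y.  The sample covariance should come out near Cov(X, Y) = -1.
random.seed(3)
trials = 400_000

xs, ys = [], []
for _ in range(trials):
    # Gamma(2, 1) as a sum of two independent exponential(1) variates
    y = -math.log(1.0 - random.random()) - math.log(1.0 - random.random())
    x = -math.log(1.0 - random.random()) / y   # exponential with rate y
    xs.append(x)
    ys.append(y)

mean_x, mean_y = sum(xs) / trials, sum(ys) / trials
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / trials
print(cov)  # close to -1
```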
——————————————————————————————————————–

Mixture Distributions

In the preceding discussion, the conditional distributions are derived from the joint distribution and the marginal distributions. In some applications, it is the opposite: we know the conditional distribution of one variable given the other and construct the joint distribution from it. We have the following:

\displaystyle \begin{aligned}f_{X,Y}(x,y)&=f_{Y \lvert X}(y \lvert x) \ f_X(x) \\&\text{ } \\&=f_{X \lvert Y}(x \lvert y) \ f_Y(y) \end{aligned}

The form of the joint pdf indicated above has an interesting interpretation as a mixture. Using an insurance example, suppose that $f_{X \lvert Y}(x \lvert y)$ is a model of the claim cost of a randomly selected insured, where $y$ is a realized value of a parameter $Y$ that indicates the risk characteristics of an insured. The members of this large population have a wide variety of risk characteristics, and the random variable $Y$ captures the risk characteristics across the entire population. Consequently, the unconditional claim cost for a randomly selected insured is:

$\displaystyle f_X(x)=\int_{-\infty}^{\infty} f_{X \lvert Y}(x \lvert y) \ f_Y(y) \ dy$

Note that the above unconditional pdf $f_X(x)$ is a weighted average of conditional pdfs. Thus a distribution derived in this manner is called a mixture distribution. The pdf $f_Y(y)$ is called the mixture weight or mixing weight. Some distributional quantities of a mixture distribution are also the weighted averages of their conditional counterparts. These include the distribution function, the mean, and higher moments. Thus we have:

$\displaystyle F_X(x)=\int_{-\infty}^{\infty} F_{X \lvert Y}(x \lvert y) \ f_Y(y) \ dy$

$\displaystyle E(X)=\int_{-\infty}^{\infty} E(X \lvert Y=y) \ f_Y(y) \ dy$

$\displaystyle E(X^k)=\int_{-\infty}^{\infty} E(X^k \lvert Y=y) \ f_Y(y) \ dy$

In the above derivations, the cumulative distribution function $F_X(x)$ and the moments $E(X^k)$ are weighted averages of their conditional counterparts. However, the variance $Var(X)$ is not simply the weighted average of conditional variances. To find out why, see the post The variance of a mixture.