A basic look at joint distributions

This is a discussion of how to work with joint distributions of two random variables. We limit the discussion to continuous random variables; the discrete case is similar (for the most part, replacing the integral signs with summation signs). Suppose X and Y are continuous random variables with joint probability density function f_{X,Y}(x,y). This means that f_{X,Y}(x,y) satisfies the following two properties:

  • for each point (x,y) in the Euclidean plane, f_{X,Y}(x,y) is a nonnegative real number,
  • \displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \ dx \ dy=1.

Because of the second bullet point, the function f_{X,Y}(x,y) must be an integrable function. We will not overly focus on this point and instead be satisfied with knowing that it is possible to integrate f_{X,Y}(x,y) over the entire xy plane and its many reasonable subregions.

Another way to think about f_{X,Y}(x,y) is that it assigns the density to each point in the xy plane (i.e. it tells us how much weight is assigned to each point). Consequently, if we want to know the probability that (X,Y) falls in the region A, we simply evaluate the following integral:

    \displaystyle \int_{A} f_{X,Y}(x,y) \ dx \ dy.

For instance, to find P(X<Y) and P(X+Y \le z), where z>0, we evaluate the integral over the regions x<y and x+y \le z, respectively. The integrals are:

    \displaystyle P(X<Y)=\int_{-\infty}^{\infty} \int_{x}^{\infty} f_{X,Y}(x,y) \ dy \ dx

    \displaystyle P(X+Y \le z)=\int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f_{X,Y}(x,y) \ dy \ dx

Note that P(X+Y \le z) is the distribution function F_Z(z)=P(X+Y \le z) where Z=X+Y. Then the pdf of Z is obtained by differentiation, i.e. f_Z(z)=F_Z^{'}(z).

In practice, integrals involving the density function need only be taken over those x and y values where the density is positive.
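To make the region integrals concrete, here is a numerical sketch using scipy. It assumes a hypothetical joint density (X and Y independent exponentials with rates 1 and 2, not the example developed later in this post) and evaluates P(X<Y) and P(X+Y \le z) over the regions x<y and x+y \le z.

```python
import numpy as np
from scipy import integrate

# Hypothetical joint density: X, Y independent exponentials with
# rates 1 and 2, so f(x, y) = e^{-x} * 2 e^{-2y} for x, y > 0.
def f(y, x):  # dblquad integrates its first argument (y) on the inside
    return np.exp(-x) * 2.0 * np.exp(-2.0 * y)

# P(X < Y): integrate the joint density over the region x < y.
p_lt, _ = integrate.dblquad(f, 0, np.inf, lambda x: x, lambda x: np.inf)
# closed form for independent exponentials: 1/(1 + 2) = 1/3

# F_Z(z) = P(X + Y <= z): integrate over the region x + y <= z.
z = 1.0
F_z, _ = integrate.dblquad(f, 0, z, lambda x: 0.0, lambda x: z - x)
# closed form: 1 - 2 e^{-z} + e^{-2z}
```

Differentiating the numerically tabulated F_Z(z) would then recover the pdf f_Z(z), as described above.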


Marginal Density

The joint density function f_{X,Y}(x,y) describes how the two variables behave in relation to one another. The marginal probability density function (marginal pdf) is of interest when we are concerned with only one of the variables. To obtain the marginal pdf of X, we simply integrate out the other variable. The following integral produces the marginal pdf of X:

    \displaystyle f_X(x)=\int_{-\infty}^{\infty} f_{X,Y}(x,y) \ dy

The marginal pdf of X is obtained by summing (integrating) all the density along the vertical line that meets the x axis at the point (x,0) (see Figure 1). Thus f_X(x) represents the sum total of all the density f_{X,Y}(x,y) along that vertical line.

Integrating these vertical sums over all x values (i.e. integrating f_X(x) over the entire real line) gives 1. Thus f_X(x) is a legitimate single-variable pdf:

    \displaystyle \int_{-\infty}^{\infty}f_X(x) \ dx=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \ dy \ dx=1

The same can be said for the marginal pdf of the other variable Y, except that f_Y(y) is the sum (integral in this case) of all the density on a horizontal line that meets the y axis at the point (0,y).

    \displaystyle f_Y(y)=\int_{-\infty}^{\infty} f_{X,Y}(x,y) \ dx


Example 1

Let X and Y be jointly distributed according to the following pdf:

    \displaystyle f_{X,Y}(x,y)=y^2 \ e^{-y(x+1)}, \text{ where } x>0,y>0

The following derives the marginal pdfs for X and Y:

    \displaystyle \begin{aligned}f_X(x)&=\int_0^{\infty} y^2 \ e^{-y(x+1)} \ dy \\&\text{ } \\&=\frac{2}{(x+1)^3} \int_0^{\infty} \frac{(x+1)^3}{2!} y^{3-1} \ e^{-y(x+1)} \ dy \\&\text{ } \\&=\frac{2}{(x+1)^3} \end{aligned}

    \displaystyle \begin{aligned}f_Y(y)&=\int_0^{\infty} y^2 \ e^{-y(x+1)} \ dx \\&\text{ } \\&=y \ e^{-y} \int_0^{\infty} y \ e^{-y x} \ dx \\&\text{ } \\&=y \ e^{-y} \end{aligned}

In the middle step of the derivation of f_X(x), the integrand is the Gamma pdf with shape parameter 3 and rate parameter x+1, hence the integral in that step equals 1. In the middle step for f_Y(y), the integrand is the pdf of an exponential distribution with rate y, so that integral also equals 1.
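As a sanity check, the two marginal integrals can be evaluated numerically (a sketch using scipy, evaluated at arbitrary sample points; the closed forms are those just derived):

```python
import numpy as np
from scipy import integrate

# Joint density from Example 1: f(x, y) = y^2 e^{-y(x+1)} for x, y > 0.
def joint(x, y):
    return y**2 * np.exp(-y * (x + 1.0))

# Marginal pdf of X at x = 1: integrate out y.
fx, _ = integrate.quad(lambda y: joint(1.0, y), 0, np.inf)
# closed form: 2/(x+1)^3 = 2/8 = 0.25

# Marginal pdf of Y at y = 2: integrate out x.
fy, _ = integrate.quad(lambda x: joint(x, 2.0), 0, np.inf)
# closed form: y e^{-y} = 2 e^{-2}
```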


Conditional Density

Now consider the joint density f_{X,Y}(x,y) restricted to a vertical line, treating the restriction as a probability distribution. In essence, we are restricting our focus to one particular realized value of X. Given a realized value x of X, how do we describe the behavior of the other variable Y? Since the marginal pdf f_X(x) is the sum total of all the density on that vertical line, we express the conditional density as the joint density f_{X,Y}(x,y) taken as a fraction of f_X(x):

    \displaystyle f_{Y \lvert X}(y \lvert x)=\frac{f_{X,Y}(x,y)}{f_X(x)}

It is easy to see that f_{Y \lvert X}(y \lvert x) is a probability density function of Y. Once we know that X has the realized value x, this pdf describes the behavior of Y. Thus it is called the conditional pdf of Y given X=x.

Given a realized value x of X, we may want to know the conditional mean and the higher moments of Y.

    \displaystyle E(Y \lvert X=x)=\int_{-\infty}^{\infty} y \ f_{Y \lvert X}(y \lvert x) \ dy

    \displaystyle E(Y^n \lvert X=x)=\int_{-\infty}^{\infty} y^n \ f_{Y \lvert X}(y \lvert x) \ dy \text{ where } n>1

In particular, the conditional variance of Y is:

    \displaystyle Var(Y \lvert X=x)=E(Y^2 \lvert X=x)-\left[E(Y \lvert X=x)\right]^2

The discussion for the conditional density of X given a realized value y of Y is similar, except that we restrict the joint density f_{X,Y}(x,y) to a horizontal line. We have the following information about the conditional distribution of X given a realized value Y=y.

    \displaystyle f_{X \lvert Y}(x \lvert y)=\frac{f_{X,Y}(x,y)}{f_Y(y)}

    \displaystyle E(X \lvert Y=y)=\int_{-\infty}^{\infty} x \ f_{X \lvert Y}(x \lvert y) \ dx

    \displaystyle E(X^n \lvert Y=y)=\int_{-\infty}^{\infty} x^n \ f_{X \lvert Y}(x \lvert y) \ dx \text{ where } n>1

In particular, the conditional variance of X is:

    \displaystyle Var(X \lvert Y=y)=E(X^2 \lvert Y=y)-\left[E(X \lvert Y=y)\right]^2


Example 1 (Continued)

The following derives the conditional density functions:

    \displaystyle \begin{aligned}f_{Y \lvert X}(y \lvert x)&=\frac{f_{X,Y}(x,y)}{f_X(x)} \\&\text{ } \\&=\displaystyle \frac{y^2 e^{-y(x+1)}}{\frac{2}{(x+1)^3}}  \\&\text{ } \\&=\frac{(x+1)^3}{2!} \ y^2 \ e^{-y(x+1)} \end{aligned}

    \displaystyle \begin{aligned}f_{X \lvert Y}(x \lvert y)&=\frac{f_{X,Y}(x,y)}{f_Y(y)} \\&\text{ } \\&=\displaystyle \frac{y^2 e^{-y(x+1)}}{y \ e^{-y}}  \\&\text{ } \\&=y \ e^{-y \ x} \end{aligned}

The conditional density f_{Y \lvert X}(y \lvert x) is that of a Gamma distribution with shape parameter 3 and rate parameter x+1. So given a realized value x of X, Y has a Gamma distribution whose rate parameter is x+1 (equivalently, scale parameter \frac{1}{x+1}) and whose shape parameter is 3. On the other hand, the conditional density f_{X \lvert Y}(x \lvert y) is that of an exponential distribution. Given a realized value y of Y, X has an exponential distribution with rate parameter y. Since the conditional distributions are familiar parametric distributions, we can read off the following conditional means and conditional variances.

    \displaystyle E(Y \lvert X=x)=\frac{3}{x+1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ Var(Y \lvert X=x)=\frac{3}{(x+1)^2}

    \displaystyle E(X \lvert Y=y)=\frac{1}{y} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Var(X \lvert Y=y)=\frac{1}{y^2}
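These values can be verified numerically from the conditional pdf derived above (a sketch using scipy, with x fixed at an arbitrary sample point):

```python
import numpy as np
from scipy import integrate

# Conditional pdf of Y given X = x from Example 1:
# a Gamma density with shape 3 and rate x + 1.
def cond_y_given_x(y, x):
    return (x + 1.0)**3 / 2.0 * y**2 * np.exp(-y * (x + 1.0))

x = 1.5
# First and second conditional moments of Y, then the variance.
m1, _ = integrate.quad(lambda y: y * cond_y_given_x(y, x), 0, np.inf)
m2, _ = integrate.quad(lambda y: y**2 * cond_y_given_x(y, x), 0, np.inf)
cond_var = m2 - m1**2
# E(Y|X=1.5) = 3/2.5 = 1.2, Var(Y|X=1.5) = 3/2.5^2 = 0.48
```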

Note that both conditional means are decreasing functions: the larger the realized value of X, the smaller the mean E(Y \lvert X=x), and likewise the larger the realized value of Y, the smaller the mean E(X \lvert Y=y). It appears that X and Y move in opposite directions. This is confirmed by the covariance: since E(XY)=E[Y \ E(X \lvert Y)]=E[Y \cdot \frac{1}{Y}]=1, E(X)=E[\frac{1}{Y}]=1 and E(Y)=2, we have Cov(X,Y)=1-(1)(2)=-1.
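The covariance can also be checked by direct numerical integration against the joint density (a sketch; note that scipy's dblquad integrates its first argument on the inside):

```python
import numpy as np
from scipy import integrate

# Joint density from Example 1, written as f(y, x) because dblquad
# integrates the first argument (y) on the inside.
def joint(y, x):
    return y**2 * np.exp(-y * (x + 1.0))

inf = np.inf
lo, hi = lambda x: 0.0, lambda x: inf

# E(X), E(Y), and E(XY) as double integrals over the joint density.
ex, _ = integrate.dblquad(lambda y, x: x * joint(y, x), 0, inf, lo, hi)
ey, _ = integrate.dblquad(lambda y, x: y * joint(y, x), 0, inf, lo, hi)
exy, _ = integrate.dblquad(lambda y, x: x * y * joint(y, x), 0, inf, lo, hi)

cov = exy - ex * ey
# E(X) = 1, E(Y) = 2, E(XY) = 1, so Cov(X,Y) = -1
```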

Mixture Distributions

In the preceding discussion, the conditional distributions are derived from the joint and marginal distributions. In some applications, it is the opposite: we know the conditional distribution of one variable given the other, and we construct the joint distribution from it together with a marginal distribution. We have the following:

    \displaystyle \begin{aligned}f_{X,Y}(x,y)&=f_{Y \lvert X}(y \lvert x) \ f_X(x) \\&\text{ } \\&=f_{X \lvert Y}(x \lvert y) \ f_Y(y) \end{aligned}

The form of the joint pdf indicated above has an interesting interpretation as a mixture. Using an insurance example, suppose that f_{X \lvert Y}(x \lvert y) models the claim cost of a randomly selected insured, where y is a realized value of a parameter Y that indicates the risk characteristics of the insured. The members of this large population have a wide variety of risk characteristics, and the random variable Y captures the risk characteristics across the entire population. Consequently, the unconditional claim cost for a randomly selected insured is:

    \displaystyle f_X(x)=\int_{-\infty}^{\infty} f_{X \lvert Y}(x \lvert y) \ f_Y(y) \ dy

Note that the above unconditional pdf f_X(x) is a weighted average of conditional pdfs. Thus a distribution derived in this manner is called a mixture distribution, and the pdf f_Y(y) is called the mixture weight or mixing weight. Some distributional quantities of a mixture distribution are also weighted averages of their conditional counterparts; these include the distribution function, the mean, and the higher moments. Thus we have:

    \displaystyle F_X(x)=\int_{-\infty}^{\infty} F_{X \lvert Y}(x \lvert y) \ f_Y(y) \ dy

    \displaystyle E(X)=\int_{-\infty}^{\infty} E(X \lvert Y=y) \ f_Y(y) \ dy

    \displaystyle E(X^k)=\int_{-\infty}^{\infty} E(X^k \lvert Y=y) \ f_Y(y) \ dy

In the above derivations, the cumulative distribution function F_X(x) and the moments E(X^k) are weighted averages of their conditional counterparts. However, the variance Var(X) is not simply the weighted average of the conditional variances. To find out why, see the post The variance of a mixture.
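Using the densities from Example 1 as the mixture components (X given Y=y is exponential with rate y, and the mixing weight f_Y(y)=y \ e^{-y} is a Gamma pdf with shape 2 and rate 1), a quick numerical sketch confirms that the mixture integral reproduces the marginal pdf and that the mean is the weighted average of the conditional means:

```python
import numpy as np
from scipy import integrate

# Conditional pdf of X given Y = y: exponential with rate y.
def cond_pdf(x, y):
    return y * np.exp(-y * x)

# Mixing weight: the marginal pdf of Y, f_Y(y) = y e^{-y}.
def mix_weight(y):
    return y * np.exp(-y)

# Mixture pdf of X at a sample point x = 1: integrate out y.
x = 1.0
fx, _ = integrate.quad(lambda y: cond_pdf(x, y) * mix_weight(y), 0, np.inf)
# matches the marginal 2/(x+1)^3 = 0.25 derived in Example 1

# E(X) as the weighted average of conditional means E(X|Y=y) = 1/y.
ex, _ = integrate.quad(lambda y: (1.0 / y) * mix_weight(y), 0, np.inf)
# gives E(X) = 1
```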