# The Pareto distribution

This post takes a closer look at the Pareto distribution. A previous post demonstrated that the Pareto distribution is a mixture of exponential distributions with Gamma mixing weights, and we elaborate on that point here. By examining various properties of the Pareto distribution, we also show that it is a heavy-tailed distribution. In insurance applications, heavy-tailed distributions are essential tools for modeling extreme losses, especially for riskier types of insurance such as medical malpractice insurance. In financial applications, the study of heavy-tailed distributions provides information about the potential for financial fiasco or financial ruin. The Pareto distribution is a good way to open a discussion of heavy-tailed distributions.


Update (11/12/2017). This blog post introduces a catalog of many other parametric severity models in addition to the Pareto distribution. The link to the catalog is found in that blog post. To go there directly, this is the link.

Update (10/29/2017). This blog post has updated information on the Pareto distribution. It also has links to more detailed content on the Pareto distribution in two companion blogs. These links are also given here: more detailed post on Pareto, Pareto Type I and Type II and practice problems on Pareto.


The continuous random variable $X$ with positive support is said to have the Pareto distribution if its probability density function is given by

$\displaystyle f_X(x)=\frac{\beta \ \alpha^\beta}{(x+\alpha)^{\beta+1}} \ \ \ \ \ x>0$

where $\alpha>0$ and $\beta>0$ are constants. The constant $\alpha$ is the scale parameter and $\beta$ is the shape parameter. The following are several other distributional quantities of the Pareto distribution, which will be used in the discussion below.

$\displaystyle S_X(x)=\frac{\alpha^\beta}{(x+\alpha)^\beta}=\biggl(\frac{\alpha}{x+\alpha}\biggr)^\beta \ \ \ \ \ \ \ \ \ \text{survival function}$

$\displaystyle F_X(x)=1-\biggl(\frac{\alpha}{x+\alpha}\biggr)^\beta \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{distribution function}$

$\displaystyle E(X)=\frac{\alpha}{\beta-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{mean},\beta>1$

$\displaystyle E(X^2)=\frac{2 \alpha^2}{(\beta-1)(\beta-2)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{second moment},\beta>2$

$\displaystyle Var(X)=\frac{\alpha^2 \beta}{(\beta-1)^2(\beta-2)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{variance},\beta>2$

$\displaystyle E(X^k)=\frac{k! \alpha^k}{(\beta-1) \cdots (\beta-k)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{higher moments},\beta>k, \ k \text{ a positive integer}$
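The quantities listed above translate directly into code. The following is a minimal Python sketch, not a library implementation; the function names (`pareto_pdf`, `pareto_sf`, `pareto_moment`, and so on) are made up here for illustration, and the parameter values in the example are arbitrary.

```python
import math

def pareto_pdf(x, alpha, beta):
    """Density beta * alpha^beta / (x + alpha)^(beta + 1) for x > 0."""
    return beta * alpha**beta / (x + alpha) ** (beta + 1)

def pareto_sf(x, alpha, beta):
    """Survival function (alpha / (x + alpha))^beta."""
    return (alpha / (x + alpha)) ** beta

def pareto_cdf(x, alpha, beta):
    """Distribution function, one minus the survival function."""
    return 1.0 - pareto_sf(x, alpha, beta)

def pareto_moment(k, alpha, beta):
    """E(X^k) = k! alpha^k / ((beta-1)...(beta-k)); exists only if beta > k."""
    if beta <= k:
        raise ValueError("E(X^k) does not exist unless beta > k")
    denom = math.prod(beta - j for j in range(1, k + 1))
    return math.factorial(k) * alpha**k / denom

def pareto_var(alpha, beta):
    """Variance computed as E(X^2) - E(X)^2; requires beta > 2."""
    return pareto_moment(2, alpha, beta) - pareto_moment(1, alpha, beta) ** 2

# Illustrative parameter values (an arbitrary choice with beta > 2).
alpha, beta = 2.0, 3.0
mean = pareto_moment(1, alpha, beta)
var = pareto_var(alpha, beta)
print(mean, var)
```

Computing the variance from the first two moments gives the same value as the closed form $\frac{\alpha^2 \beta}{(\beta-1)^2(\beta-2)}$, which is a quick consistency check on the list above.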

——————————————————————————————————————
The Pareto Distribution as a Mixture
The Pareto pdf indicated above can be obtained by mixing exponential distributions using Gamma distributions as weights. Suppose that $X$ follows an exponential distribution (conditional on a parameter value $\theta$). The following is the conditional pdf of $X$.

$\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\theta e^{-\theta x} \ \ \ x>0$

There is uncertainty in the parameter, which can be viewed as a random variable $\Theta$. Suppose that $\Theta$ follows a Gamma distribution with rate parameter $\alpha$ (that is, scale parameter $1/\alpha$) and shape parameter $\beta$. The following is the pdf of $\Theta$.

$\displaystyle f_{\Theta}(\theta)=\frac{\alpha^\beta}{\Gamma(\beta)} \ \theta^{\beta-1} \ e^{-\alpha \theta} \ \ \ \theta>0$

The unconditional pdf of $X$ is the weighted average of the conditional pdfs with the Gamma pdf as weight.

$\displaystyle \begin{aligned}f_X(x)&=\int_0^{\infty} f_{X \lvert \Theta}(x \lvert \theta) \ f_\Theta(\theta) \ d \theta \\&=\int_0^{\infty} \biggl[\theta \ e^{-\theta x}\biggr] \ \biggl[\frac{\alpha^\beta}{\Gamma(\beta)} \ \theta^{\beta-1} \ e^{-\alpha \theta}\biggr] \ d \theta \\&=\int_0^{\infty} \frac{\alpha^\beta}{\Gamma(\beta)} \ \theta^\beta \ e^{-\theta(x+\alpha)} \ d \theta \\&=\frac{\alpha^\beta}{\Gamma(\beta)} \frac{\Gamma(\beta+1)}{(x+\alpha)^{\beta+1}} \int_0^{\infty} \frac{(x+\alpha)^{\beta+1}}{\Gamma(\beta+1)} \ \theta^{\beta+1-1} \ e^{-\theta(x+\alpha)} \ d \theta \\&=\frac{\beta \ \alpha^\beta}{(x+\alpha)^{\beta+1}} \end{aligned}$

In the following discussion, $X$ will denote a random variable that follows the Pareto distribution as defined above. As will be shown below, the exponential distribution is considered a light tailed distribution. Yet mixing exponentials produces the heavy tailed Pareto distribution. Mixture distributions tend to be heavy tailed (see [1]). The Pareto distribution is a handy example.
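The mixing argument above can be checked by simulation. The following is a rough Python sketch using only the standard library: draw $\theta$ from the Gamma distribution, then draw $X$ from an exponential distribution with rate $\theta$, and compare the empirical survival probabilities with the Pareto survival function. The helper names, the seed, the sample size and the parameter values $\alpha=2$, $\beta=2$ are illustrative assumptions. Note that `random.gammavariate` takes a shape and a scale, so the Gamma pdf above (with $e^{-\alpha \theta}$) corresponds to scale $1/\alpha$.

```python
import random

def sample_pareto_by_mixing(alpha, beta, rng):
    """Draw theta ~ Gamma(shape=beta, rate=alpha), then X ~ Exponential(rate=theta)."""
    # gammavariate(shape, scale): a rate of alpha means a scale of 1 / alpha.
    theta = rng.gammavariate(beta, 1.0 / alpha)
    return rng.expovariate(theta)

def empirical_sf(samples, x):
    """Fraction of the samples that exceed x."""
    return sum(1 for s in samples if s > x) / len(samples)

rng = random.Random(12345)  # fixed seed so the run is repeatable
alpha, beta = 2.0, 2.0
samples = [sample_pareto_by_mixing(alpha, beta, rng) for _ in range(100_000)]

for x in (2, 10, 20):
    exact = (alpha / (x + alpha)) ** beta  # Pareto survival function
    print(x, empirical_sf(samples, x), exact)
```

With a sample of this size the empirical and exact survival probabilities should agree to roughly two decimal places, which is consistent with the derivation but of course not a proof of it.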

——————————————————————————————————————

The Tail Weight of the Pareto Distribution
When a distribution puts significantly more probability on larger values, the distribution is said to be a heavy tailed distribution (or said to have a larger tail weight). According to [1], there are four ways to look for indications that a distribution is heavy tailed.

1. Existence of moments.
2. Speed of decay of the survival function to zero.
3. Hazard rate function.
4. Mean excess loss function.

Existence of moments
Note that the existence of the Pareto higher moments $E(X^k)$ is capped by the shape parameter $\beta$. In particular, the mean $E(X)=\frac{\alpha}{\beta-1}$ does not exist for $\beta \le 1$. If the Pareto distribution is to model a random loss, and if the mean is infinite (when $\beta \le 1$), the risk is uninsurable! Likewise, when $\beta \le 2$, the Pareto variance does not exist. This shows that for a heavy tailed distribution, the variance may not be a good measure of risk.

For a given random variable $Z$, the existence of all moments $E(Z^k)$, for all positive integers $k$, indicates a light (right) tail for the distribution of $Z$. If the positive moments exist only up to a certain positive integer $k$, it is an indication that the distribution has a heavy right tail. By this criterion the Pareto distribution is heavy tailed, whereas the exponential distribution and the Gamma distribution are considered to have light tails since all of their moments exist.
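To see why the Pareto moments are capped by $\beta$, it helps to look at the integrand of $E(X^k)$ for large $x$. The following is a rough sketch of the argument rather than a full proof.

$\displaystyle x^k \ f_X(x)=\frac{\beta \ \alpha^\beta \ x^k}{(x+\alpha)^{\beta+1}} \approx \beta \ \alpha^\beta \ x^{k-\beta-1} \ \ \ \ \ \text{as } x \rightarrow \infty$

The integral $\int_1^{\infty} x^{k-\beta-1} \ dx$ converges only when $k-\beta-1<-1$, that is, when $k<\beta$. For comparison, an exponential distribution with rate $\lambda$ has $E(Y^k)=\frac{k!}{\lambda^k}$, which is finite for every positive integer $k$ because the factor $e^{-\lambda x}$ in the density overwhelms any power of $x$.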

The speed of decay of the survival function
The survival function $S_X(x)=P(X>x)$ captures the probability in the tail of a distribution. If the survival function of a distribution decays slowly to zero (equivalently the cdf goes slowly to one), it is another indication that the distribution is heavy tailed.

The following is a comparison of a Pareto survival function and an exponential survival function. The Pareto survival function has parameters ($\alpha=2$ and $\beta=2$). The two survival functions are set to have the same 75th percentile ($x=2$).

$\displaystyle \begin{pmatrix} \text{x}&\text{Pareto }S_X(x)&\text{Exponential }S_Y(x)&\displaystyle \frac{S_X(x)}{S_Y(x)} \\\text{ }&\text{ }&\text{ }&\text{ } \\{2}&0.25&0.25&1 \\{10}&0.027777778&0.000976563&28 \\{20}&0.008264463&9.54 \times 10^{-7}&8666 \\{30}&0.00390625&9.31 \times 10^{-10}&4194304 \\{40}&0.002267574&9.09 \times 10^{-13}&2.49 \times 10^{9} \\{60}&0.001040583&8.67 \times 10^{-19}&1.20 \times 10^{15} \\{80}&0.000594884&8.27 \times 10^{-25}&7.19 \times 10^{20} \\{100}&0.000384468&7.89 \times 10^{-31}&4.87 \times 10^{26} \\{120}&0.000268745&7.52 \times 10^{-37}&3.57 \times 10^{32} \\{140}&0.000198373&7.17 \times 10^{-43}&2.76 \times 10^{38} \\{160}&0.000152416&6.84 \times 10^{-49}&2.23 \times 10^{44} \\{180}&0.000120758&6.53 \times 10^{-55}&1.85 \times 10^{50} \end{pmatrix}$

Note that at the large values, the Pareto right tails retain much more probability. This is also confirmed by the ratio of the two survival functions, with the ratio approaching infinity. If a random loss is a heavy tailed phenomenon that is described by the above Pareto survival function ($\alpha=2$ and $\beta=2$), then the above exponential survival function is woefully inadequate as a model for this phenomenon even though it may be a good model for describing the loss up to the 75th percentile. It is the large right tail that is problematic (and catastrophic)!
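The comparison table above can be reproduced with a few lines of Python. This is a sketch under one derived assumption: the text only says that the two survival functions share the 75th percentile $x=2$, so the exponential rate is taken to be $\lambda=\frac{\ln 4}{2}=\ln 2$, the value that makes $e^{-2 \lambda}=0.25$. The function and variable names are illustrative.

```python
import math

ALPHA, BETA = 2.0, 2.0
# Rate chosen so that the exponential survival function equals 0.25 at x = 2.
LAM = math.log(4.0) / 2.0

def pareto_sf(x):
    """Pareto survival function with alpha = 2 and beta = 2."""
    return (ALPHA / (x + ALPHA)) ** BETA

def expon_sf(x):
    """Exponential survival function with rate LAM."""
    return math.exp(-LAM * x)

xs = (2, 10, 20, 30, 40, 60, 80, 100, 120, 140, 160, 180)
rows = [(x, pareto_sf(x), expon_sf(x), pareto_sf(x) / expon_sf(x)) for x in xs]

for x, sp, se, ratio in rows:
    print(f"{x:>4} {sp:.9f} {se:.3e} {ratio:.3e}")
```

Because $\lambda=\ln 2$, the exponential survival function is simply $2^{-x}$, which explains why the exponential column shrinks by a factor of about a thousand for every increase of 10 in $x$ while the Pareto column shrinks only like $1/x^2$.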

Since the Pareto survival function and the exponential survival function have closed forms, we can also look at their ratio directly.

$\displaystyle \frac{\text{pareto survival}}{\text{exponential survival}}=\frac{\displaystyle \frac{\alpha^\beta}{(x+\alpha)^\beta}}{e^{-\lambda x}}=\frac{\alpha^\beta e^{\lambda x}}{(x+\alpha)^\beta} \longrightarrow \infty \ \text{ as } x \longrightarrow \infty$

In the above ratio, the numerator has an exponential function with a positive quantity in the exponent, while the denominator is only a power of $x+\alpha$. Exponential growth dominates any power of $x$, so the ratio goes to infinity as $x \rightarrow \infty$.

In general, whenever the ratio of two survival functions diverges to infinity, it is an indication that the distribution in the numerator of the ratio has a heavier tail. When the ratio goes to infinity, the survival function in the numerator is said to decay slowly to zero as compared to the denominator. We reach the same conclusion in comparing the Pareto distribution and the Gamma distribution: the Pareto is heavier in the tails. In comparing tail weights, it is equivalent to consider the ratio of the density functions (by L'Hôpital's rule, since $S^{'}(x)=-f(x)$).

$\displaystyle \lim_{x \rightarrow \infty} \frac{S_1(x)}{S_2(x)}=\lim_{x \rightarrow \infty} \frac{S_1^{'}(x)}{S_2^{'}(x)}=\lim_{x \rightarrow \infty} \frac{f_1(x)}{f_2(x)}$

The Hazard Rate Function
The hazard rate function $h_X(x)$ of a random variable $X$ is defined as the ratio of the density function and the survival function.

$\displaystyle h_X(x)=\frac{f_X(x)}{S_X(x)}$

The hazard rate is called the force of mortality in a life contingency context and can be interpreted as the rate that a person aged $x$ will die in the next instant. The hazard rate is called the failure rate in reliability theory and can be interpreted as the rate that a machine will fail at the next instant given that it has been functioning for $x$ units of time. The following is the hazard rate function of the Pareto distribution.

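$\displaystyle \begin{aligned}h_X(x)&=\frac{f_X(x)}{S_X(x)} \\&=\frac{\displaystyle \frac{\beta \ \alpha^\beta}{(x+\alpha)^{\beta+1}}}{\displaystyle \frac{\alpha^\beta}{(x+\alpha)^\beta}} \\&=\frac{\beta}{x+\alpha} \end{aligned}$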

The interesting point is that the Pareto hazard rate function is a decreasing function of $x$. A decreasing hazard rate function is another indication of heavy tail weight. One key characteristic of the hazard rate function is that it can generate the survival function.

$\displaystyle S_X(x)=e^{\displaystyle -\int_0^x h_X(t) \ dt}$

Thus if the hazard rate function is decreasing in $x$, then the survival function will decay more slowly to zero. To see this, let $H_X(x)=\int_0^x h_X(t) \ dt$, which is called the cumulative hazard rate function. As indicated above, the survival function can be generated by $e^{-H_X(x)}$. If $h_X(x)$ is decreasing in $x$, then $H_X(x)$ grows more slowly than a cumulative hazard rate function $H_Y(x)$ whose hazard rate $h_Y(x)$ is constant in $x$ or increasing in $x$. Consequently $e^{-H_X(x)}$ decays to zero much more slowly than $e^{-H_Y(x)}$.

In contrast, the exponential distribution has a constant hazard rate function, making it a medium tailed distribution. As explained above, any distribution having an increasing hazard rate function is a light tailed distribution.
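As a worked illustration of the relationship between the hazard rate and the survival function, the Pareto cumulative hazard rate function can be computed explicitly from $h_X(t)=\frac{\beta}{t+\alpha}$:

$\displaystyle \begin{aligned}H_X(x)&=\int_0^x \frac{\beta}{t+\alpha} \ dt=\beta \ \ln \biggl(\frac{x+\alpha}{\alpha}\biggr) \\e^{-H_X(x)}&=\biggl(\frac{\alpha}{x+\alpha}\biggr)^\beta=S_X(x) \end{aligned}$

For an exponential distribution with rate $\lambda$ (a symbol used here as in the survival ratio earlier), the hazard rate is the constant $\lambda$ and $H_Y(x)=\lambda x$. The Pareto cumulative hazard grows only logarithmically in $x$ while the exponential one grows linearly, which is another way to see why the Pareto survival function decays so much more slowly.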

The Mean Excess Loss Function
Suppose that a property owner is exposed to a random loss $Y$. The property owner buys an insurance policy with a deductible $d$ such that the insurer will pay a claim in the amount of $Y-d$ if a loss occurs with $Y>d$. The insurer will pay nothing if the loss is below the deductible. Whenever a loss is above $d$, what is the average claim the insurer will have to pay? This is one way to look at the mean excess loss function, which represents the expected excess loss over a threshold conditional on the event that the threshold has been exceeded.

Given a loss variable $Y$ and given a deductible $d>0$, the mean excess loss function is $e_Y(d)=E(Y-d \lvert Y>d)$. For a continuous random variable, it is computed by

$\displaystyle e_Y(d)=\frac{\int_d^{\infty} (y-d) \ f_Y(y) \ dy}{S_Y(d)}$

Applying the technique of integration by parts produces the following formula:

$\displaystyle e_Y(d)=\frac{\int_d^{\infty} S_Y(y) \ dy}{S_Y(d)}$

It turns out that the mean excess loss function is one more way to examine the tail property of a distribution. The following is the mean excess loss function of the Pareto distribution:

$\displaystyle e_X(d)=\frac{d+\alpha}{\beta-1}=\frac{1}{\beta-1} \ d + \frac{\alpha}{\beta-1}$
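The Pareto formula above can be obtained from the survival function version of the mean excess loss function. The following sketch assumes $\beta>1$ so that the integral converges.

$\displaystyle \begin{aligned}e_X(d)&=\frac{\displaystyle \int_d^{\infty} \frac{\alpha^\beta}{(y+\alpha)^\beta} \ dy}{S_X(d)} \\&=\frac{\displaystyle \frac{\alpha^\beta}{(\beta-1) \ (d+\alpha)^{\beta-1}}}{\displaystyle \frac{\alpha^\beta}{(d+\alpha)^\beta}} \\&=\frac{d+\alpha}{\beta-1} \end{aligned}$

The same calculation for an exponential distribution with rate $\lambda$ gives $\displaystyle \frac{e^{-\lambda d} / \lambda}{e^{-\lambda d}}=\frac{1}{\lambda}$, which does not depend on the deductible $d$.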

Note that the Pareto mean excess loss function (which requires $\beta>1$) is a linearly increasing function of the deductible $d$. This means that the larger the deductible, the larger the expected claim if such a large loss occurs! If a random loss is modeled by such a distribution, it is a catastrophic risk situation. In general, an increasing mean excess loss function is an indication of a heavy tailed distribution. On the other hand, a decreasing mean excess loss function indicates a light tailed distribution. The exponential distribution has a constant mean excess loss function and is considered a medium tailed distribution.

——————————————————————————————————————
The Pareto distribution has many economic applications. Since it is a heavy tailed distribution, it is a good candidate for modeling income above a theoretical value and the distribution of insurance claims above a threshold value.

——————————————————————————————————————

Reference

1. Klugman S. A., Panjer H. H., Willmot G. E., Loss Models: From Data to Decisions, Second Edition, Wiley-Interscience, a John Wiley & Sons, Inc., New York, 2004.