

8.1 Random Variables

Random variables are mathematical quantities that are used to represent probabilistic uncertainty. They describe the probabilities associated with each value, or each sub-range of values, that an uncertain quantity can take. For example, if a random variable \ensuremath{\textstyle\mbox{\boldmath${c}$ }} represents the uncertainty in the concentration of a pollutant, the following questions can be answered by analyzing \ensuremath{\textstyle\mbox{\boldmath${c}$ }}: given a concentration $c_0$, what is the probability that \ensuremath{\textstyle\mbox{\boldmath${c}$ }} takes a value lower than $c_0$? Or, given two numbers $c_1$ and $c_2$, what is the probability that \ensuremath{\textstyle\mbox{\boldmath${c}$ }} takes values between $c_1$ and $c_2$? Mathematically, a random variable ${\ensuremath{\textstyle\mbox{\boldmath${x}$ }} }$ maps the outcomes of a probability space ${\ensuremath{\textstyle\mbox{\boldmath${\Omega}$ }} }$ to the real line. The random variable \ensuremath{\textstyle\mbox{\boldmath${x}$ }} can assume values from $-\infty$ to $+\infty$, and there is an associated probability for each value (or interval) that \ensuremath{\textstyle\mbox{\boldmath${x}$ }} takes. Based on the nature of the values that random variables can assume, they can be classified into three types:
(a)
continuous random variables: these variables can assume any value from a continuous interval. Examples include contaminant concentrations in the environment; emissions from an industrial source; and physical parameters in an exposure model, such as body weight or respiration rate. In these cases, it is not useful to ask for the probability that a random variable \ensuremath{\textstyle\mbox{\boldmath${x}$ }} is exactly equal to a value $x_0$: there are uncountably many possible values, and the probability of each individual point is zero. Hence, in such cases, probabilities are defined on intervals (e.g., the probability that \ensuremath{\textstyle\mbox{\boldmath${x}$ }} lies between $x_1$ and $x_2$). Further, a probability density can be defined at each point in the interval; the probability density at a point is representative of the probability in the vicinity of that point.
(b)
discrete random variables: these variables can assume only discrete values from a set. Examples include the following: the roll of a die, where the outcome can have only integer values between 1 and 6; the number of days in a year on which an air quality standard is violated, which can assume integer values between 0 and 365; and the number of defective cars in a production line. These variables have probabilities associated with each of the countable number of values they can assume. Continuous and discrete random variables can be contrasted as follows: the probability that the number of air quality exceedances is exactly equal to a given number $n$ is a meaningful, generally nonzero quantity, whereas the probability that the atmospheric concentration is exactly equal to a given concentration $c$ is zero.
(c)
mixed random variables: these variables can assume continuous as well as discrete values. For example, the sum of a discrete and a continuous random variable results in a mixed random variable. They may have the properties of continuous random variables in certain ranges, and may have properties of discrete random variables in others.
A major part of uncertainty analysis in environmental modeling and risk characterization involves uncertainties in continuous quantities. Examples include the uncertainties in measured or predicted concentrations, and the uncertainties in the estimated time of exposure to a contaminant. The present work focuses mainly on uncertainties described by continuous random variables.
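The contrast between discrete and continuous random variables described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fair die follows the example in the text, while the uniform distribution on [0, 10] and the names `die_pmf`, `prob_die_equals` and `prob_uniform_between` are hypothetical choices made here, not part of the original work.

```python
from fractions import Fraction

# Discrete case: a fair six-sided die. Each outcome has its own probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def prob_die_equals(n):
    """Pr{x = n} for the die; zero for values the die cannot take."""
    return die_pmf.get(n, Fraction(0))


# Continuous case (assumed for illustration): x is uniform on [LOW, HIGH].
LOW, HIGH = 0.0, 10.0


def prob_uniform_between(x1, x2):
    """Pr{x1 <= x <= x2} for x uniform on [LOW, HIGH]."""
    a = max(x1, LOW)
    b = min(x2, HIGH)
    return max(b - a, 0.0) / (HIGH - LOW)


print(prob_die_equals(3))              # probability of one discrete value
print(prob_uniform_between(2.0, 5.0))  # probability of an interval
print(prob_uniform_between(4.0, 4.0))  # an interval of zero width
```

For the discrete variable a single value carries a nonzero probability, whereas for the continuous variable only intervals of nonzero width do.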

8.1.1 Continuous Random Variables

Continuous random variables are characterized through the following functions:

(a)
Cumulative density function, \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath${x}$ }}}}(x)} (more commonly called the cumulative distribution function, or simply the cumulative distribution). This denotes the probability that the random variable \ensuremath{\textstyle\mbox{\boldmath${x}$ }} has a value less than or equal to $x$. The probability that \ensuremath{\textstyle\mbox{\boldmath${x}$ }} has a value greater than $x$ is given by $1 -$ \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath${x}$ }}}}(x)}. Further, the probability that \ensuremath{\textstyle\mbox{\boldmath${x}$ }} takes on values between $x_1$ and $x_2$, for $x_2 \ge x_1$, is given by:

\begin{displaymath}\ensuremath{\mbox{Pr}}\{x_1 \le {\ensuremath{\textstyle\mbox{\boldmath ${x}$}}} \le x_2\} = \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x_2)} - \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x_1)}\end{displaymath}

The important characteristics of the cumulative density function are:

\begin{displaymath}\ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} \mbox{\ is monotonically non-decreasing in\ } x
\end{displaymath}


\begin{displaymath}0 \le \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)}\le 1, \mbox{\ for\ } -\infty < x < \infty
\end{displaymath}


\begin{displaymath}\ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(-\infty)} = 0, \mbox{\ \ and\ \ } \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(\infty)} = 1
\end{displaymath}

In population risk characterization, the corresponding cumulative density function can be considered analogous to the fraction of the population that is at risk with respect to a given risk criterion.

(b)
Probability density function, \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath${x}$ }}}}(x)}. This function is also known as the probability distribution. This is the derivative of the cumulative density function.

\begin{displaymath}\ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} = \frac{d\, \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} }{d x}
\end{displaymath}

The main properties of the probability density function are:

\begin{displaymath}\ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)}\ge 0, \mbox{\ for\ } -\infty < x < \infty
\end{displaymath}


\begin{displaymath}\ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(-\infty)} = \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(\infty)} = 0
\end{displaymath}


\begin{displaymath}\ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} = \int_{-\infty}^{x} \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(u)} du
\end{displaymath}


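\begin{displaymath}\ensuremath{\mbox{Pr}}\{ x_1 \le {\ensuremath{\textstyle\mbox{\boldmath ${x}$}}} \le x_2 \} = \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x_2)} - \ensuremath{F_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x_1)} =
\int_{x_1}^{x_2} \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} dx
\end{displaymath}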

(c)
Expected value of a function, $g({\ensuremath{\textstyle\mbox{\boldmath${x}$ }} })$, of a random variable. This is defined as

\begin{displaymath}\ensuremath{\mbox{E}\{g({\ensuremath{\textstyle\mbox{\boldmath ${x}$}}})\}} = \int_{-\infty}^{\infty} g(x) \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} dx
\end{displaymath}
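The relations among the cumulative density function, the probability density function, and the expected value can be checked numerically. The following is a minimal sketch, assuming an exponential distribution with rate `LAM` as the example (the distribution, the rate value, the truncation point `upper`, and all function names are illustrative assumptions, not taken from the original text). It compares the interval probability obtained from the cumulative function with the integral of the density, and evaluates expected values by trapezoidal integration.

```python
import math

LAM = 0.5  # assumed rate parameter of an exponential distribution


def pdf(x):
    """Probability density f(x) of the assumed exponential distribution."""
    return LAM * math.exp(-LAM * x) if x >= 0.0 else 0.0


def cdf(x):
    """Cumulative function F(x) = Pr{x' <= x} of the same distribution."""
    return 1.0 - math.exp(-LAM * x) if x >= 0.0 else 0.0


def integrate(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h


def expected_value(g, upper=80.0):
    """E{g(x)}; the integral is truncated at `upper`, where the density is negligible."""
    return integrate(lambda x: g(x) * pdf(x), 0.0, upper)


x1, x2 = 1.0, 3.0
prob_from_cdf = cdf(x2) - cdf(x1)               # F(x2) - F(x1)
prob_from_pdf = integrate(pdf, x1, x2)          # integral of f over [x1, x2]
mean = expected_value(lambda x: x)              # analytic value: 1 / LAM
mean_square = expected_value(lambda x: x * x)   # analytic value: 2 / LAM**2

print(prob_from_cdf, prob_from_pdf)
print(mean, mean_square)
```

The two interval probabilities should agree to within the integration error, and the numerical expected values should be close to the analytic values noted in the comments.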

Either the probability density function or the cumulative density function fully characterizes a continuous random variable. In the following sections, the probability density function \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath${x}$ }}}}(x)} and the random variable \ensuremath{\textstyle\mbox{\boldmath${x}$ }} are used interchangeably, since they carry the same information. Further, the terms ``distribution'' and ``random variable'' are also used interchangeably.

Even though the density functions of a random variable provide all the information about that random variable, they do not convey the extent of the uncertainty at a quick glance, especially when the distribution functions consist of complex algebraic expressions. In such cases, the moments of a random variable serve as useful metrics that summarize much of the information about a distribution.

8.1.2 Moments of a Random Variable

The moments of a random variable provide concise information about a random variable. The following are the moments that describe a random variable.

(a)
The expected value or mean of a random variable: this denotes the average value for the distribution of the random variable. If a large number of samples from the distribution are considered, the arithmetic mean of the sample values approaches the expected value of the distribution. Mathematically, the mean $\eta$ of a random variable \ensuremath{\textstyle\mbox{\boldmath${x}$ }} is given by

\begin{displaymath}\eta = \ensuremath{\mbox{E}\{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}\}} = \int_{-\infty}^{\infty} x \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} dx
\end{displaymath}

It should be noted that the mean does not necessarily represent a realistic value from the distribution. For example, if a coin is tossed a large number of times, and if heads is assigned a value 1, and tails a value 0, the mean is 1/2, which is not a possible outcome in the coin tossing problem.
(b)
The variance or dispersion of a distribution: this indicates the spread of the distribution with respect to the mean value. A lower value of variance indicates that the distribution is concentrated close to the mean value, and a higher value indicates that the distribution is spread out over a wider range of possible values. The variance $\sigma^2$ of \ensuremath{\textstyle\mbox{\boldmath${x}$ }} is given by:

\begin{displaymath}\sigma^2 = \ensuremath{\mbox{E}\{({\ensuremath{\textstyle\mbox{\boldmath ${x}$}}} - \eta)^2\}} = \int_{-\infty}^{\infty} (x - \eta)^2 \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} dx
\end{displaymath}

The square root of variance is called the standard deviation ($\sigma$).

(c)
The skewness of a distribution indicates the asymmetry of the distribution around its mean, characterizing the shape of the distribution. It is given by

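\begin{displaymath}\gamma_1 = \frac{1}{\sigma^3}\ensuremath{\mbox{E}\{({\ensuremath{\textstyle\mbox{\boldmath ${x}$}}} - \eta)^3\}} = \frac{1}{\sigma^3}\int_{-\infty}^{\infty} (x - \eta)^3 \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} dx
\end{displaymath}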

A positive value of skewness indicates that the distribution is skewed towards values greater than the mean (i.e., it has a longer tail on the right side), and a negative value indicates that the distribution is skewed towards the left side.

(d)
The kurtosis of a distribution indicates the flatness of the distribution with respect to the normal distribution. It is given by

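\begin{displaymath}\gamma_2 = \frac{1}{\sigma^4}\ensuremath{\mbox{E}\{({\ensuremath{\textstyle\mbox{\boldmath ${x}$}}} - \eta)^4\}} = \frac{1}{\sigma^4}\int_{-\infty}^{\infty} (x - \eta)^4 \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} dx
\end{displaymath}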

The normal distribution has a kurtosis of 3. A value of kurtosis higher than 3 indicates a sharper peak around the mean and heavier tails than the normal distribution, and a value smaller than 3 indicates that the distribution is flatter than the normal distribution.

(e)
Higher order moments: the $k$th moment $m_k$ of \ensuremath{\textstyle\mbox{\boldmath${x}$ }} is given by

\begin{displaymath}m_k = \ensuremath{\mbox{E}\{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}^k\}} = \int_{-\infty}^{\infty} x^k \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} dx
\end{displaymath}

The $k$th central moment $\mu_k$ of a random variable \ensuremath{\textstyle\mbox{\boldmath${x}$ }} is defined as

\begin{displaymath}\mu_k = \ensuremath{\mbox{E}\{({\ensuremath{\textstyle\mbox{\boldmath ${x}$}}} - \eta)^k\}} = \int_{-\infty}^{\infty} (x - \eta)^k \ensuremath{f_{{\ensuremath{\textstyle\mbox{\boldmath ${x}$}}}}(x)} dx
\end{displaymath}

Clearly, $m_0$ = 1, $m_1$ = $\eta$, $\mu_0$ = 1, $\mu_1$ = 0, and $\mu_2$ = $\sigma^2$. Further, the central moments $\mu_k$'s and the moments $m_k$'s are related as follows:
    $\displaystyle \mu_k = \sum_{r=0}^{k} {\,}^k\!C_r (-1)^r\eta^rm_{k-r} \mbox{\ \ \ and}$ (8.1)
    $\displaystyle m_k = \sum_{r=0}^{k} {\,}^k\!C_r \eta^r\mu_{k-r}$ (8.2)

where

\begin{displaymath}^k\!C_r = \frac{k (k-1) \cdots (k-r+1)}{1 \cdot 2 \cdots r} =
\frac{k!}{(k-r)!\ r!}
\end{displaymath}
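Relations (8.1) and (8.2) can be exercised on a distribution whose moments are known in closed form. The sketch below assumes an exponential distribution with rate `LAM`, for which $m_k = k!/\lambda^k$; the choice of distribution, the rate value, and the function names are illustrative assumptions rather than part of the original text. It converts moments to central moments with (8.1), recovers the moments with (8.2), and evaluates the variance, skewness and kurtosis defined above.

```python
from math import comb, factorial

LAM = 2.0  # assumed rate of an exponential distribution (illustrative)


def raw_moment(k):
    """m_k = E{x^k} = k! / LAM**k for the assumed exponential distribution."""
    return factorial(k) / LAM**k


eta = raw_moment(1)  # the mean, m_1


def central_from_raw(k):
    """Central moment mu_k computed from the moments m_0..m_k, equation (8.1)."""
    return sum(comb(k, r) * (-1) ** r * eta**r * raw_moment(k - r)
               for r in range(k + 1))


def raw_from_central(k):
    """Moment m_k recovered from the central moments, equation (8.2)."""
    return sum(comb(k, r) * eta**r * central_from_raw(k - r)
               for r in range(k + 1))


variance = central_from_raw(2)
sigma = variance**0.5
skewness = central_from_raw(3) / sigma**3
kurtosis = central_from_raw(4) / sigma**4

print(eta, variance, skewness, kurtosis)
print([raw_from_central(k) - raw_moment(k) for k in range(5)])  # differences near zero
```

For the exponential distribution the skewness and kurtosis do not depend on the rate, which makes it a convenient check case.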

8.1.3 Median, Mode and Percentiles of a Distribution

In addition to the information provided by the moments of a distribution, some other metrics such as the median and the mode provide useful information. The median of a distribution is the value for the 50th percentile of the distribution (i.e., the probability that a random variable takes a value below the median is 0.5). The mode of a distribution is the value at which the probability density is the highest. For example, for the normal distribution N( $\mu,\sigma$), the mean, median, and mode are all equal to $\mu$, whereas for the lognormal distribution, LN( $\mu,\sigma$), the mean is $\displaystyle e^{(\mu + \sigma^2/2)}$, the median is $\displaystyle e^\mu$, and the mode is $\displaystyle e^{(\mu - \sigma^2)}$. Additionally, other percentiles of the distribution may sometimes be desired. For example, the 95th percentile indicates the value above which 5% of the samples occur.
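The lognormal example can be worked through with the Python standard library. This is a sketch under stated assumptions: LN( $\mu,\sigma$) is taken to mean that the logarithm of the variable is normal with mean $\mu$ and standard deviation $\sigma$, the parameter values `MU` and `SIGMA` are arbitrary, and the helper names are hypothetical. Percentiles are obtained by transforming the percentiles of the standard normal distribution.

```python
import math
from statistics import NormalDist

# Assumed parameters: ln(x) is normal with mean MU and standard deviation SIGMA.
MU, SIGMA = 0.0, 0.5


def lognormal_percentile(p):
    """Value below which a fraction p of the lognormal distribution lies."""
    z = NormalDist().inv_cdf(p)  # corresponding standard normal percentile
    return math.exp(MU + SIGMA * z)


def lognormal_cdf(x):
    """Pr{x' <= x} for the lognormal variable, via the normal CDF of ln(x)."""
    return NormalDist(MU, SIGMA).cdf(math.log(x))


mean = math.exp(MU + SIGMA**2 / 2)
median = lognormal_percentile(0.5)
mode = math.exp(MU - SIGMA**2)
p95 = lognormal_percentile(0.95)

print(mode, median, mean)       # mode < median < mean for this right-skewed case
print(p95, lognormal_cdf(p95))  # the second value should be close to 0.95
```

The ordering of the mode, median and mean reflects the positive skewness of the lognormal distribution.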


Sastry S. Isukapalli
1999-01-19