Confidence Interval

This appendix describes the method of confidence interval computation for a one-dimensional normal distribution.

Cumulative Probability

The cumulative probability is the likelihood that the value of a random variable is within a specific range.

\[ P\left( a \leq X \leq b \right) \]

Let us return to the pizza delivery distribution example (see the Essential background I section). We want to find the probability that a pizza in city 'A' is delivered within 33 minutes:

\[ P\left( 0 \leq X \leq 33 \right) \]

As a reminder, the pizza delivery time in city 'A' is normally distributed with a mean of 30 minutes and a standard deviation of 5 minutes \( \left( \mu=30, \sigma=5 \right) \).

We need to find the area under the PDF curve between zero and 33 minutes:

The filled area under the Gaussian between 0 and an upper limit \( b \) is given by:

\[ P\left( 0 \leq X \leq b \right) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \int_{0}^{b}exp \left(\frac{-(x-\mu)^2}{2\sigma^{2}} \right)dx \]

In our case:

\[ P\left( 0 \leq X \leq 33 \right) = \frac{1}{\sqrt{2\pi \cdot 5^{2}}} \int_{0}^{33}exp \left(\frac{-(x-30)^2}{2\cdot 5^{2}} \right)dx \]

Don't worry. We won't need to compute this integral.

Let us define a standardized score (also called a z-score) to simplify the problem.

z-score is a standardized random variable with a mean of 0 and a standard deviation of 1 \( \left( \mu=0, \sigma=1 \right) \).

\[ z = \frac{x-\mu}{\sigma} \]

A z-score defines the distance of \( x \) from the mean in units of standard deviations. For example:

If \( z=1 \), the value of \( x \) is one standard deviation above the mean.
If \( z=-2.5 \), the value of \( x \) is 2.5 standard deviations below the mean.
If \( z=0 \), the value of \( x \) equals the mean.

The pizza delivery time in city 'A' is a random variable with a mean of 30 and a standard deviation of 5 \( \left( \mu=30, \sigma=5 \right) \).

The z-score for 33 minutes is:

\[ z = \frac{33-30}{5}=0.6 \]

The z-score for 0 minutes is:

\[ z = \frac{0-30}{5}=-6 \]
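The two conversions above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `z_score` is a hypothetical choice for illustration and does not come from any library.

```python
def z_score(x, mu, sigma):
    """Return the distance of x from the mean mu, in units of sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Pizza delivery time in city 'A' (minutes)
mu, sigma = 30.0, 5.0

z_upper = z_score(33.0, mu, sigma)  # upper bound of the interval
z_lower = z_score(0.0, mu, sigma)   # lower bound of the interval

print(z_upper, z_lower)
```

The printed values should match the hand calculation: 0.6 for 33 minutes and -6 for 0 minutes.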

The PDF of \( z \) is a standard normal distribution:

\[ F \left( z \right) = \frac{1}{\sqrt{2\pi}}exp \left(-0.5z^{2} \right) \]

The cumulative probability is the area under the PDF between \( -\infty \) and \( z \).

The Cumulative Probability of \( z \) is given by:

\[ CP \left( z \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z}exp \left( -0.5z^{2} \right)dz \]

For our example, we need to find the following:

\[ P(-6 \leq z \leq 0.6)= CP(z=0.6)- CP(z=-6) \]

Calculating the PDF integral is not straightforward and requires much work. The faster method is to use statistical z-score tables or computer software packages.
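For readers who want to see the integral evaluated anyway, the following sketch integrates the standard normal PDF numerically and compares the result with a library routine. It assumes SciPy is installed; `scipy.integrate.quad` and `scipy.stats.norm.cdf` are real SciPy functions, while `standard_normal_pdf` and `cumulative_probability` are hypothetical helper names used only here.

```python
import math

from scipy.integrate import quad
from scipy.stats import norm


def standard_normal_pdf(t):
    """PDF of the standard normal distribution (mu=0, sigma=1)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def cumulative_probability(z):
    """Area under the standard normal PDF between -infinity and z."""
    area, _abs_error = quad(standard_normal_pdf, -math.inf, z)
    return area


cp_numeric = cumulative_probability(0.6)  # numerical integration
cp_library = norm.cdf(0.6)                # library implementation

print(round(cp_numeric, 4), round(cp_library, 4))
```

Both approaches should agree to many decimal places, which is why the library call (or a table lookup) is normally enough in practice.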

z-score tables contain cumulative probabilities for different z-scores. The following figure exemplifies the location of the cumulative probability for z-score (z=0.6).

\[ CP \left( z = 0.6 \right) = 0.7257 \]

Alternatively, scientific software packages can compute the cumulative probability of a z-score directly.

The following commands compute it in different software packages:

Computer Software Package	Command
Python	`from scipy.stats import norm` `norm.cdf(z)`
MATLAB	`normcdf(z)`
Excel	`NORM.DIST(z, 0, 1, TRUE)`


Python example

from scipy.stats import norm

norm.cdf(0.6)
0.7257468822499265

norm.cdf(-6)
9.865876450376946e-10


MATLAB example

normcdf(0.6)
0.7257

normcdf(-6)
9.8659e-10

\[ P(-6 \leq z \leq 0.6) = 0.7257 - 9.8659\times10^{-10} = 0.7257 \]

The probability that a pizza in city 'A' is delivered within 33 minutes is 72.57%.

In other words, 33 minutes is the 72.57th percentile of the pizza delivery time in city 'A'.

Hint: When using computer software packages, you don't need to calculate the z-score. You can pass the mean and standard deviation as arguments of the software function.

The following commands compute the cumulative distribution in different computer software packages:

Computer Software Package	Command
Python	`from scipy.stats import norm` `norm.cdf(x, mu, sigma)`
MATLAB	`normcdf(x, mu, sigma)`
Excel	`NORM.DIST(x, mu, sigma, TRUE)`


Python example

from scipy.stats import norm

norm.cdf(33, 30, 5)
0.7257468822499265


MATLAB example

normcdf(33, 30, 5)

0.7257
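The same idea extends to the full interval \( P\left( a \leq X \leq b \right) \) by subtracting two cumulative probabilities. The sketch below assumes SciPy is available; `interval_probability` is a hypothetical helper name, not a SciPy function.

```python
from scipy.stats import norm


def interval_probability(a, b, mu, sigma):
    """P(a <= X <= b) for a normally distributed X with mean mu and std sigma."""
    return norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma)


# Probability that the pizza in city 'A' arrives between 0 and 33 minutes
p = interval_probability(0.0, 33.0, 30.0, 5.0)
print(f"{p:.4f}")  # prints 0.7257
```

The lower bound contributes almost nothing here, because 0 minutes lies six standard deviations below the mean.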

Normal inverse cumulative distribution

In this section, we would like to answer the reverse question: what is the value of the random variable for a given cumulative probability (percentile)?

For example, what is the 80th percentile of the pizza delivery time in city 'A'?

One method is to use the z-score table:

In the table below, find the cumulative distribution value closest to 0.8.
The z-score is the sum of the row label and the column label (0.8 + 0.04): \( z=0.84 \).

Now, we must convert \( z \) to \( x \):

\[ z = \frac{x-\mu}{\sigma} \]

\[ x =z\sigma + \mu = 0.84 \times 5+30=34.2 \]

The 80th percentile of the pizza delivery time in city 'A' is 34.2 minutes.
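The table lookup and the conversion from \( z \) back to \( x \) can be sketched in Python as follows. This assumes SciPy is installed and uses `norm.ppf`, the inverse of `norm.cdf`, in place of the table; the variable names are illustrative only.

```python
from scipy.stats import norm

mu, sigma = 30.0, 5.0  # pizza delivery time in city 'A' (minutes)
p = 0.8                # requested cumulative probability (80th percentile)

z = norm.ppf(p)        # z-score whose cumulative probability is p
x = z * sigma + mu     # convert the z-score back to minutes

print(round(z, 4), round(x, 2))
```

The result differs slightly from the table-based value because the table rounds the z-score to two decimal places, while the software keeps full precision.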

If you use computer software, you can use the following commands:

Computer Software Package	Command
Python	`from scipy.stats import norm` `norm.ppf(p, mu, sigma)`
MATLAB	`norminv(p, mu, sigma)`
Excel	`NORM.INV(p, mu, sigma)`


Python example

from scipy.stats import norm

norm.ppf(0.8, 30, 5)
34.20810616786457


MATLAB example

norminv(0.8, 30, 5)

34.2081

Confidence interval

A normally distributed random variable is described by its mean \( (\mu) \) and standard deviation \( (\sigma) \). A confidence interval is a range of values that contains the true value of a parameter with a specified probability, called the confidence level.

Assume a weight measurement of 80kg with a measurement standard deviation \( (\sigma) \) of 2kg. The probability that the true weight falls between 78kg and 82kg (within \( \pm 1\sigma \)) is 68.27%.
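The probability of falling within a given number of standard deviations of the mean can be checked numerically. The sketch below assumes SciPy is available; `coverage` is a hypothetical helper name chosen for this illustration.

```python
from scipy.stats import norm


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return norm.cdf(k) - norm.cdf(-k)


# Coverage of the 1-, 2-, and 3-sigma intervals
for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
```

For one standard deviation the result is about 68%, which corresponds to the 78kg to 82kg range in the weight example.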

Usually, we are interested in higher confidence levels, such as 90% or 95%. Let us see how to find it.

The following plot describes the standard normal distribution \( (\mu=0, \sigma=1) \). We want to find a 90% confidence interval.

The area of the filled region under the curve is 90% of the total area, so the unfilled regions cover the remaining 10%. By symmetry, each unfilled tail covers 5% of the total area. We can therefore find the z-scores of the 5th and 95th percentiles.


Python example

from scipy.stats import norm

norm.ppf(0.05)
-1.6448536269514729

norm.ppf(0.95)
1.6448536269514722


MATLAB example

norminv(0.05)
-1.6449

norminv(0.95)
1.6449

The 90% confidence interval is \( (\pm 1.645 \sigma) \).

For the weight measurement example, the 90% confidence interval is ±3.29kg. The probability that the true weight falls between 76.71kg and 83.29kg is 90%.
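The whole procedure can be wrapped in a small function. This is a sketch under stated assumptions: SciPy is installed, the measurement error is normally distributed, and `confidence_interval` is a hypothetical helper name rather than a library function.

```python
from scipy.stats import norm


def confidence_interval(measurement, sigma, confidence):
    """Return (low, high) bounds of a symmetric confidence interval.

    measurement: measured value, used as the center of the interval
    sigma:       measurement standard deviation
    confidence:  confidence level between 0 and 1, e.g. 0.90
    """
    tail = (1.0 - confidence) / 2.0  # area of each unfilled tail
    z = norm.ppf(1.0 - tail)         # z-score of the upper percentile
    half_width = z * sigma
    return measurement - half_width, measurement + half_width


low, high = confidence_interval(80.0, 2.0, 0.90)
print(round(low, 2), round(high, 2))  # prints 76.71 83.29
```

SciPy also offers `norm.interval`, which should return the same bounds when given the confidence level, `loc=80`, and `scale=2`; the explicit version above is shown only to mirror the steps in the text.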
