Confidence Interval

This appendix describes the method of confidence interval computation for a one-dimensional normal distribution.

Cumulative Probability

The cumulative probability is the likelihood that the value of a random variable is within a specific range.

\[ P\left( a \leq X \leq b \right) \]

Let us return to the pizza delivery example (see the Essential background I section). We want to find the likelihood that a pizza in city 'A' is delivered within 33 minutes:

\[ P\left( 0 \leq X \leq 33 \right) \]

As a reminder, the pizza delivery time in the city 'A' is normally distributed with a mean of 30 minutes and a standard deviation of 5 minutes \( \left( \mu=30, \sigma=5 \right) \).

We need to find the area under the PDF curve between zero and 33 minutes:

[Figure: Cumulative Probability]

The filled area under the Gaussian between \( a \) and \( b \) is given by:

\[ P\left( a \leq X \leq b \right) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \int_{a}^{b}\exp \left(\frac{-(x-\mu)^2}{2\sigma^{2}} \right)dx \]

In our case:

\[ P\left( 0 \leq X \leq 33 \right) = \frac{1}{\sqrt{2\pi \cdot 5^{2}}} \int_{0}^{33}\exp \left(\frac{-(x-30)^2}{2\cdot 5^{2}} \right)dx \]

Don't worry. We won't need to compute this integral.
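Although the rest of this appendix avoids it, the integral can be checked numerically. The sketch below evaluates it with composite Simpson's rule, a numerical-integration method chosen here only for illustration (it is not part of the method described in this appendix); `gaussian_pdf` and `simpson` are hypothetical helper names.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior points get weight 4, even interior points get weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Area under the PDF for city 'A' (mu = 30, sigma = 5) between 0 and 33 minutes.
p = simpson(lambda x: gaussian_pdf(x, 30, 5), 0, 33)
print(f"{p:.4f}")  # 0.7257
```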

Let us define a standardized score (also called a z-score) to simplify the problem.

The z-score is a standardized random variable with a mean of 0 and a standard deviation of 1 \( \left( \mu=0, \sigma=1 \right) \).

\[ z = \frac{x-\mu}{\sigma} \]

A z-score defines the distance of \( x \) from the mean in units of standard deviations. For example:

  • If \( z = 1 \), the value of \( x \) is one standard deviation above the mean.
  • If \( z = -2.5 \), the value of \( x \) is 2.5 standard deviations below the mean.
  • If \( z = 0 \), the value of \( x \) equals the mean.

The pizza delivery time in city 'A' is a random variable with a mean of 30 and a standard deviation of 5 \( \left( \mu=30, \sigma=5 \right) \).

The z-score for 33 minutes is:

\[ z = \frac{33-30}{5}=0.6 \]

The z-score for 0 minutes is:

\[ z = \frac{0-30}{5}=-6 \]
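The two conversions above can be reproduced with a one-line helper; `z_score` is a hypothetical name used only for this sketch.

```python
def z_score(x, mu, sigma):
    """Return the distance of x from the mean mu, in units of the standard deviation sigma."""
    return (x - mu) / sigma


# Pizza delivery time in city 'A': mu = 30 minutes, sigma = 5 minutes.
print(z_score(33, 30, 5))  # 0.6
print(z_score(0, 30, 5))   # -6.0
```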

The PDF of \( z \) is a standard normal distribution:

\[ f \left( z \right) = \frac{1}{\sqrt{2\pi}}\exp \left(-0.5z^{2} \right) \]

The cumulative probability is the area under the PDF between \( -\infty \) and \( z \).

[Figure: Standard Normal Distribution]

The cumulative probability of \( z \) is given by:

\[ CP \left( z \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z}\exp \left( -0.5t^{2} \right)dt \]

For our example, we need to find the following:

\[ P(-6 \leq z \leq 0.6)= CP(z=0.6)- CP(z=-6) \]

The integral of the Gaussian PDF has no closed-form expression in elementary functions, so calculating it by hand requires numerical methods. The faster method is to use z-score tables or computer software packages.
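For reference, \( CP(z) \) can also be written with the complementary error function, \( CP(z) = \tfrac{1}{2}\operatorname{erfc}\left(-z/\sqrt{2}\right) \), which Python's standard library provides as `math.erfc`. The sketch below uses this identity to evaluate the probability from the example; the helper name `standard_normal_cdf` is hypothetical.

```python
import math


def standard_normal_cdf(z):
    """Cumulative probability CP(z) of the standard normal distribution.

    Uses CP(z) = 0.5 * erfc(-z / sqrt(2)); erfc keeps precision in the far-left tail.
    """
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


p = standard_normal_cdf(0.6) - standard_normal_cdf(-6)
print(f"{standard_normal_cdf(0.6):.4f}")  # 0.7257
print(f"{standard_normal_cdf(-6):.4e}")   # 9.8659e-10
print(f"{p:.4f}")                          # 0.7257
```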

z-score tables contain cumulative probabilities for different z-scores. The following figure shows where the cumulative probability for \( z = 0.6 \) is located in the table.

[Figure: z-score table]

\[ CP \left( z = 0.6 \right) = 0.7257 \]
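A z-score table can be regenerated row by row. The sketch below uses Python's standard-library `statistics.NormalDist` (an assumption; the appendix itself uses printed tables and SciPy) to print the row for 0.6: the row gives the first decimal of \( z \), and each column adds the second decimal.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

row = 0.6  # first decimal of z
for col in range(10):  # columns 0.00, 0.01, ..., 0.09
    z = round(row + col / 100, 2)
    print(f"z = {z:.2f}   CP = {std_normal.cdf(z):.4f}")
```

The first printed line, `z = 0.60   CP = 0.7257`, matches the table entry above.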

You can also use scientific software packages to compute the z-score integral.

The following commands compute the z-score integral in different computer software packages:

| Computer Software Package | Command |
|---|---|
| Python | `from scipy.stats import norm` then `norm.cdf(z)` |
| MATLAB | `normcdf(z)` |
| Excel | `NORM.DIST(z, 0, 1, TRUE)` |

```python
from scipy.stats import norm

# Standard normal CDF: mean 0 and standard deviation 1 by default.
print(norm.cdf(0.6))  # 0.7257468822499265
print(norm.cdf(-6))   # 9.865876450376946e-10
```

```matlab
normcdf(0.6)   % returns 0.7257
normcdf(-6)    % returns 9.8659e-10
```

\[ P(-6 \leq z \leq 0.6) = 0.7257 - 9.8659\times10^{-10} \approx 0.7257 \]

The likelihood of receiving a pizza in city 'A' within 33 minutes is 72.57%.

In other words, the 72.57th percentile of the pizza delivery time in city 'A' is 33 minutes.

Hint: When using software packages, you don't need to calculate the z-score. You can pass the mean and standard deviation as arguments to the function.

The following commands compute the cumulative distribution in different computer software packages:

| Computer Software Package | Command |
|---|---|
| Python | `from scipy.stats import norm` then `norm.cdf(x, mu, sigma)` |
| MATLAB | `normcdf(x, mu, sigma)` |
| Excel | `NORM.DIST(x, mu, sigma, TRUE)` |

```python
from scipy.stats import norm

# Arguments: x, mean (loc), standard deviation (scale); no z-score needed.
print(norm.cdf(33, 30, 5))  # 0.7257468822499265
```

```matlab
normcdf(33, 30, 5)   % returns 0.7257
```
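The hint can be checked directly: passing the mean and standard deviation gives the same value as standardizing first. This sketch uses `statistics.NormalDist` from Python's standard library as a dependency-free stand-in for SciPy's `norm` (an assumption, not the appendix's own tooling).

```python
from statistics import NormalDist

mu, sigma, x = 30, 5, 33

direct = NormalDist(mu, sigma).cdf(x)              # mean and sigma passed directly
standardized = NormalDist().cdf((x - mu) / sigma)  # z = 0.6 on the standard normal

print(f"{direct:.4f} {standardized:.4f}")  # 0.7257 0.7257
```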

Normal inverse cumulative distribution

In this section, we answer the reverse question: which value corresponds to a given cumulative probability (percentile)?

For example, what is the 80th percentile of the pizza delivery time in city 'A'?

[Figure: Normal Inverse Cumulative Distribution]

One method is to use the z-score table:

  • In the table below, find the cumulative probability value closest to 0.8.
  • The z-score is the sum of the row value and the column value: \( z = 0.8 + 0.04 = 0.84 \).
[Figure: z-score table]

Now, we must convert \( z \) to \( x \):

\[ z = \frac{x-\mu}{\sigma} \]

\[ x =z\sigma + \mu = 0.84 \times 5+30=34.2 \]

The 80th percentile of the pizza delivery time in city 'A' is 34.2 minutes.
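The table procedure can be mimicked in code: build the two-decimal grid of z-values, pick the entry whose cumulative probability is closest to 0.8, and convert back to minutes. The sketch below uses `statistics.NormalDist` from Python's standard library; the grid range (0.00 to 3.49) is an assumption matching typical printed tables. It also computes the exact inverse, which accounts for the small difference between 34.2 and the software result.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu = 0, sigma = 1
mu, sigma, target = 30, 5, 0.8

# Two-decimal z-values, as in a printed z-score table (range assumed: 0.00 to 3.49).
grid = [i / 100 for i in range(350)]
z_table = min(grid, key=lambda z: abs(std_normal.cdf(z) - target))
x_table = z_table * sigma + mu  # x = z * sigma + mu

# Exact inverse of the cumulative distribution, without rounding z.
x_exact = NormalDist(mu, sigma).inv_cdf(target)

print(z_table, round(x_table, 2))  # 0.84 34.2
print(round(x_exact, 4))           # 34.2081
```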

If you use computer software, you can use the following commands:

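| Computer Software Package | Command |
|---|---|
| Python | `from scipy.stats import norm` then `norm.ppf(p, mu, sigma)` |
| MATLAB | `norminv(p, mu, sigma)` |
| Excel | `NORM.INV(p, mu, sigma)` |

Here \( p \) is the cumulative probability (0.8 for the 80th percentile).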

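```python
from scipy.stats import norm

# Arguments: cumulative probability p, mean (loc), standard deviation (scale).
print(norm.ppf(0.8, 30, 5))  # 34.20810616786457
```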

```matlab
norminv(0.8, 30, 5)   % returns 34.2081
```

Confidence interval

A normally distributed random variable is described by its mean \( (\mu) \) and standard deviation \( (\sigma) \). A confidence interval is a range of values within which the true value falls with a specified probability, called the confidence level.

Assume a weight measurement of 80 kg with a measurement standard deviation \( (\sigma) \) of 2 kg. The probability that the true weight falls between 78 kg and 82 kg (within \( \pm 1\sigma \)) is 68.27%.
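The \( \pm 1\sigma \) probability is the difference of two cumulative probabilities. A minimal sketch with Python's standard-library `statistics.NormalDist`, assuming (as the example does) a normal distribution centered on the 80 kg measurement:

```python
from statistics import NormalDist

weight = NormalDist(mu=80, sigma=2)  # measurement 80 kg, standard deviation 2 kg

p = weight.cdf(82) - weight.cdf(78)  # P(78 kg <= X <= 82 kg)
print(f"{p:.2%}")  # 68.27%
```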

Usually, we are interested in higher confidence levels, such as 90% or 95%. Let us see how to find the corresponding intervals.

The following plot describes the standard normal distribution \( (\mu=0, \sigma=1) \). We want to find a 90% confidence interval.

[Figure: Confidence interval]

The area of the filled region under the curve is 90% of the total area, and the unfilled region holds the remaining 10%. Because the distribution is symmetric, each unfilled tail holds 5% of the total area. The interval bounds are therefore the z-scores of the 5th and 95th percentiles, which differ only in sign.


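```python
from scipy.stats import norm

# z-scores of the 5th and 95th percentiles of the standard normal distribution.
print(norm.ppf(0.05))  # -1.6448536269514729
print(norm.ppf(0.95))  # 1.6448536269514722
```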

```matlab
norminv(0.05)   % returns -1.6449
norminv(0.95)   % returns 1.6449
```

The 90% confidence interval is \( \mu \pm 1.645\sigma \).

For the weight measurement example, the half-width of the 90% confidence interval is \( 1.645 \times 2 = 3.29 \) kg. The probability that the true weight falls between 76.71 kg and 83.29 kg is 90%.
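The same steps generalize to any confidence level: split the excluded probability equally between the two tails, find the z-score of the upper percentile, and scale it by \( \sigma \). A sketch under that assumption, with a hypothetical helper `confidence_interval` built on Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist


def confidence_interval(mean, sigma, level):
    """Two-sided interval mean +/- z * sigma that holds `level` of a normal distribution."""
    tail = (1 - level) / 2              # probability in each unfilled tail
    z = NormalDist().inv_cdf(1 - tail)  # about 1.645 for level = 0.90
    return mean - z * sigma, mean + z * sigma


for level in (0.90, 0.95):
    low, high = confidence_interval(80, 2, level)  # weight example: 80 kg, sigma = 2 kg
    print(f"{level:.0%}: {low:.2f} kg to {high:.2f} kg")
# 90%: 76.71 kg to 83.29 kg
# 95%: 76.08 kg to 83.92 kg
```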