This appendix describes the method of confidence interval computation for a one-dimensional normal distribution.
The cumulative probability is the likelihood that the value of a random variable is within a specific range.
\[ P\left( a \leq X \leq b \right) \]
Let us return to the pizza delivery example (see the Essential background I section). We want to find the probability that a pizza in city 'A' is delivered within 33 minutes:
\[ P\left( 0 \leq X \leq 33 \right) \]
As a reminder, the pizza delivery time in city 'A' is normally distributed with a mean of 30 minutes and a standard deviation of 5 minutes \( \left( \mu=30, \sigma=5 \right) \).
We need to find the area under the PDF curve between zero and 33 minutes.
For a general interval between \( a \) and \( b \), the area under the Gaussian curve is given by:
\[ P\left( a \leq X \leq b \right) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \int_{a}^{b}\exp \left(\frac{-(x-\mu)^2}{2\sigma^{2}} \right)dx \]
In our case:
\[ P\left( 0 \leq X \leq 33 \right) = \frac{1}{\sqrt{2\pi \cdot 5^{2}}} \int_{0}^{33}\exp \left(\frac{-(x-30)^2}{2\cdot 5^{2}} \right)dx \]
Don't worry. We won't need to compute this integral.
Let us define a standardized score (also called a z-score) to simplify the problem.
The z-score is a standardized random variable with a mean of 0 and a standard deviation of 1 \( \left( \mu=0, \sigma=1 \right) \).
\[ z = \frac{x-\mu}{\sigma} \]
A z-score defines the distance of \( x \) from the mean in units of standard deviations. For example:
The pizza delivery time in city 'A' is a random variable with a mean of 30 and a standard deviation of 5 \( \left( \mu=30, \sigma=5 \right) \).
The z-score for 33 minutes is:
\[ z = \frac{33-30}{5}=0.6 \]
The z-score for 0 minutes is:
\[ z = \frac{0-30}{5}=-6 \]
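The two z-scores above can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `z_score` is introduced here for illustration only and is not part of any library.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean mu, in units of the standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Pizza delivery time in city 'A': mu = 30 minutes, sigma = 5 minutes
z_upper = z_score(33, 30, 5)  # 0.6
z_lower = z_score(0, 30, 5)   # -6.0
print(z_upper, z_lower)
```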
The PDF of \( z \) is the standard normal distribution:
\[ f \left( z \right) = \frac{1}{\sqrt{2\pi}}\exp \left(-0.5z^{2} \right) \]
The cumulative probability of \( z \) is the area under the PDF between \( -\infty \) and \( z \):
\[ CP \left( z \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z}\exp \left( -0.5t^{2} \right)dt \]
For our example, we need to find the following:
\[ P(-6 \leq z \leq 0.6)= CP(z=0.6)- CP(z=-6) \]
The integral of the normal PDF has no closed-form solution in elementary functions, so evaluating it by hand requires much work. The faster method is to use statistical z-score tables or computer software packages.
z-score tables contain cumulative probabilities for different z-scores. For example, the table entry for a z-score of 0.6 is:
\[ CP \left( z = 0.6 \right) = 0.7257 \]
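The table value can be cross-checked numerically. The sketch below integrates the standard normal PDF between \( z=-6 \) and \( z=0.6 \) with a simple trapezoidal rule and compares the result with the expression based on the error function from Python's standard `math` module. The function names and the number of trapezoids are choices made for this illustration, not part of the original text.

```python
import math

def std_normal_pdf(z):
    """PDF of the standard normal distribution (mu = 0, sigma = 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def integrate_trapezoid(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def std_normal_cdf(z):
    """Cumulative probability of the standard normal via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area under the standard normal PDF between z = -6 and z = 0.6
area_numeric = integrate_trapezoid(std_normal_pdf, -6.0, 0.6)
area_erf = std_normal_cdf(0.6) - std_normal_cdf(-6.0)
print(round(area_numeric, 4), round(area_erf, 4))  # both round to 0.7257
```

Both routes agree with the table to four decimal places, which suggests that the contribution of the lower limit \( z=-6 \) is negligible.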
You can use scientific computer software packages for a z-score integral computation.
The following commands compute the z-score integral in different computer software packages:
Computer Software Package | Command |
---|---|
Python | from scipy.stats import norm; norm.cdf(z) |
MATLAB | normcdf(z) |
Excel | NORM.DIST(z, 0, 1, TRUE) |
Python example:
>>> from scipy.stats import norm
>>> norm.cdf(0.6)
0.7257468822499265
>>> norm.cdf(-6)
9.865876450376946e-10
MATLAB example:
>> normcdf(0.6)
0.7257
>> normcdf(-6)
9.8659e-10
\[ P(-6 \leq z \leq 0.6) = 0.7257 - 9.8659\times10^{-10} = 0.7257 \]
The probability that the pizza in city 'A' is delivered within 33 minutes is 72.57%.
In other words, 33 minutes is the 72.57th percentile of the pizza delivery time in city 'A'.
Hint: When using computer software packages, you don't need to calculate the z-score. You can specify the mean and standard deviation as arguments of the software function.
The following commands compute the cumulative distribution in different computer software packages:
Computer Software Package | Command |
---|---|
Python | from scipy.stats import norm; norm.cdf(x, mu, sigma) |
MATLAB | normcdf(x, mu, sigma) |
Excel | NORM.DIST(x, mu, sigma, TRUE) |
Python example:
>>> from scipy.stats import norm
>>> norm.cdf(33, 30, 5)
0.7257468822499265
MATLAB example:
>> normcdf(33, 30, 5)
0.7257
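If SciPy is not available, the same computation can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is an alternative chosen for illustration, not the method used in the text above. The sketch also checks that the z-score route and the direct route give the same cumulative probability.

```python
from statistics import NormalDist

# Pizza delivery time in city 'A': mu = 30 minutes, sigma = 5 minutes
delivery_time = NormalDist(mu=30, sigma=5)
standard_normal = NormalDist()  # mu = 0, sigma = 1

# Direct route: pass the mean and standard deviation to the distribution
cp_direct = delivery_time.cdf(33)

# z-score route: standardize first, then use the standard normal
cp_via_z = standard_normal.cdf((33 - 30) / 5)

# Probability of delivery between 0 and 33 minutes
p_within_33 = delivery_time.cdf(33) - delivery_time.cdf(0)

print(round(cp_direct, 4), round(cp_via_z, 4), round(p_within_33, 4))
```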
Next, we would like to answer the reverse question: what value of the random variable corresponds to a given percentile?
For example, what is the 80th percentile for the pizza delivery time in city 'A'?
One method is to use the z-score table: find the cumulative probability closest to 0.8 and read off the corresponding z-score, which is approximately \( z = 0.84 \).
Now, we must convert \( z \) to \( x \):
\[ z = \frac{x-\mu}{\sigma} \]
\[ x =z\sigma + \mu = 0.84 \times 5+30=34.2 \]
The 80th percentile for the pizza delivery time in city 'A' is 34.2 minutes.
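The table lookup and the conversion from \( z \) to \( x \) can be sketched in Python. The example uses `statistics.NormalDist.inv_cdf` from the standard library in place of the printed table; that substitution is an assumption of this sketch. Because the table rounds the z-score to two decimals, the software result differs slightly in the second decimal place.

```python
from statistics import NormalDist

mu, sigma = 30, 5   # pizza delivery time in city 'A', in minutes
p = 0.80            # requested percentile, expressed as a probability

# Step 1: z-score whose cumulative probability is p (the table gives about 0.84)
z = NormalDist().inv_cdf(p)

# Step 2: convert the z-score back to delivery time, x = z * sigma + mu
x = z * sigma + mu

print(round(z, 4), round(x, 2))
```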
If you use computer software, you can use the following commands:
Computer Software Package | Command |
---|---|
Python | from scipy.stats import norm; norm.ppf(p, mu, sigma) |
MATLAB | norminv(p, mu, sigma) |
Excel | NORM.INV(p, mu, sigma) |
Python example:
>>> from scipy.stats import norm
>>> norm.ppf(0.8, 30, 5)
34.20810616786457
MATLAB example:
>> norminv(0.8, 30, 5)
34.2081
A normally distributed random variable is described by its mean \( (\mu) \) and standard deviation \( (\sigma) \). A confidence interval is a range of values within which a parameter is expected to lie with a specified probability, called the confidence level.
Assume a weight measurement of 80kg with a measurement standard deviation \( (\sigma) \) of 2kg. The probability that the true weight falls between 78kg and 82kg, that is, within \( \pm 1\sigma \) of the measurement, is 68.27%.
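The probability of falling within \( \pm k\sigma \) of the mean does not depend on \( \mu \) or \( \sigma \). For a normal distribution it equals \( \operatorname{erf}(k/\sqrt{2}) \). A minimal sketch, with a helper name chosen here for illustration:

```python
import math

def prob_within_k_sigma(k):
    """Probability that a normal random variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

# Weight example: 78kg to 82kg is the measurement plus or minus one sigma (k = 1)
for k in (1, 2, 3):
    print(k, round(prob_within_k_sigma(k), 4))
# k = 1 gives about 0.6827, k = 2 about 0.9545, k = 3 about 0.9973
```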
Usually, we are interested in higher confidence levels, such as 90% or 95%. Let us see how to find the corresponding intervals.
The following plot describes the standard normal distribution \( (\mu=0, \sigma=1) \). We want to find a 90% confidence interval.
The area of the filled region under the curve is 90% of the total area, so the area of the unfilled region is 10% of the total area. By symmetry, the unfilled regions on the left and on the right each cover 5% of the total area. We can therefore find the z-score for the 5th percentile or the 95th percentile.
Python example:
>>> from scipy.stats import norm
>>> norm.ppf(0.05)
-1.6448536269514729
>>> norm.ppf(0.95)
1.6448536269514722
MATLAB example:
>> norminv(0.05)
-1.6449
>> norminv(0.95)
1.6449
The 90% confidence interval is \( \pm 1.645 \sigma \) around the mean.
For the weight measurement example, the 90% confidence interval is ±3.29kg around the measured 80kg. The probability that the true weight falls between 76.71kg and 83.29kg is 90%.
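The procedure generalizes to any confidence level: split the remaining probability equally between the two tails, look up the z-score of the upper tail boundary, and scale it by \( \sigma \). The following sketch assumes a normal distribution with known \( \mu \) and \( \sigma \). The function name `confidence_interval` is hypothetical, and `statistics.NormalDist` from the standard library stands in for the SciPy and MATLAB calls shown earlier.

```python
from statistics import NormalDist

def confidence_interval(mu, sigma, confidence):
    """Symmetric interval around mu that contains the given share of probability."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    tail = (1.0 - confidence) / 2.0           # probability in each tail
    z = NormalDist().inv_cdf(1.0 - tail)      # e.g. about 1.645 for 90%
    half_width = z * sigma
    return mu - half_width, mu + half_width

# Weight example: measurement of 80kg, sigma of 2kg, 90% confidence level
low, high = confidence_interval(80.0, 2.0, 0.90)
print(round(low, 2), round(high, 2))  # 76.71 83.29
```

Calling the same function with a confidence level of 0.95 widens the interval, since the z-score grows to roughly 1.96.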