In statistics, a '''confidence interval''' ('''CI''') is an interval estimate of a population parameter. Instead of estimating the parameter by a single value, a whole interval of likely estimates is given. How likely the estimates are is determined by the confidence coefficient. All else being equal, the more likely it is for the interval to contain the parameter, the wider the interval will be.
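The trade-off between confidence and width can be illustrated with a small sketch. It assumes a normal-based interval for a mean with known standard deviation; the values of `SIGMA` and `N` and the helper name `z_half_width` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_half_width(confidence: float, sigma: float, n: int) -> float:
    """Half-width of a two-sided normal-based interval for a mean (sigma known)."""
    alpha = 1.0 - confidence
    # Standard normal quantile that leaves alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * sigma / n ** 0.5


# Hypothetical values, not taken from the article.
SIGMA, N = 2.0, 16

widths = {level: 2 * z_half_width(level, SIGMA, N)
          for level in (0.80, 0.90, 0.95, 0.99)}

for level, width in widths.items():
    print(f"{level:.0%} interval width: {width:.3f}")
```

With the sample size and standard deviation held fixed, the printed widths grow as the confidence level rises.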
Confidence intervals are used to indicate the reliability of an estimate. For example, a CI can be used to describe how reliable survey results are. All other things being equal, a survey result with a narrow CI is more reliable than a result with a wide CI.
More precisely, a CI for a population parameter is an interval with an associated probability ''p'' that is generated from a random sample of an underlying population, such that if the sampling were repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion ''p'' of the confidence intervals would contain the population parameter in question.
Confidence intervals are the most prevalent form of interval estimation.
A confidence interval is not in general equivalent to a (Bayesian) credible interval. The common error of equating the two is closely related to the prosecutor's fallacy.
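If ''U'' and ''V'' are statistics (i.e., observable random variables) whose probability distribution depends on some unobservable parameter θ, and
:<math>\Pr(U < \theta < V) = p</math>
for some number ''p'' between 0 and 1, then the random interval (''U'', ''V'') is a 100''p''% confidence interval for θ.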
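For ''n'' independent, normally distributed measurements with unknown mean μ and known standard deviation σ, the standardized sample mean
:<math>Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}}</math>
is dependent on μ, but has a standard normal distribution independent of the parameter μ to be estimated. Hence it is possible to find numbers −''z'' and ''z'', independent of μ, between which ''Z'' lies with probability 1 − α, a measure of how confident we want to be. We take 1 − α = 0.95. So we have:
:<math>\Pr(-z \le Z \le z) = 1 - \alpha = 0.95.</math>
The number ''z'' follows from:
:<math>\Phi(z) = \Pr(Z \le z) = 1 - \tfrac{\alpha}{2} = 0.975,</math>
::<math>z = \Phi^{-1}(0.975) = 1.96</math>
(see probit and cumulative distribution function), and we get:
:<math>0.95 = 1 - \alpha = \Pr(-z \le Z \le z)</math>
::<math>= \Pr\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right)</math>
::<math>= \Pr\left(\bar X - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 1.96\,\frac{\sigma}{\sqrt{n}}\right).</math>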
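This might be interpreted as: with probability 0.95 one will find the parameter μ between the stochastic endpoints
:<math>\bar X - 1.96\,\frac{\sigma}{\sqrt{n}}</math>
and
:<math>\bar X + 1.96\,\frac{\sigma}{\sqrt{n}}.</math>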
Every time the measurements are repeated, there will be another value for the mean <math>\bar X</math> of the sample. In 95% of the cases μ will be between the endpoints calculated from this mean, but in 5% of the cases it will not be. The actual confidence interval is calculated by entering the measured weights in the formula. With observed sample mean <math>\bar x</math>, our 0.95 confidence interval becomes:
:<math>\left(\bar x - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar x + 1.96\,\frac{\sigma}{\sqrt{n}}\right).</math>
This interval has fixed endpoints, and μ either lies between them or it does not. There is no probability attached to such an event, so we cannot say: "with probability 1 − α the parameter μ lies in the confidence interval." We only know that, under repetition, μ will be in the calculated interval in 100(1 − α)% of the cases. In 100α% of the cases, however, it will not be, and unfortunately we do not know in which cases this happens. That is why we say: "with '''confidence level''' 100(1 − α)%, μ lies in the confidence interval."
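The calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the list `weights` and the value of `SIGMA` are hypothetical stand-ins for the measured weights and the known standard deviation, and `z_confidence_interval` is a helper name introduced here.

```python
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Two-sided interval for the mean of normal data with known sigma."""
    n = len(sample)
    mean = fmean(sample)
    alpha = 1.0 - confidence
    # z = Phi^{-1}(1 - alpha/2); about 1.96 for a 0.95 confidence level.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / n ** 0.5
    return mean - margin, mean + margin


# Hypothetical measured weights and known standard deviation.
weights = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.0]
SIGMA = 0.3

low, high = z_confidence_interval(weights, SIGMA)
print(f"0.95 confidence interval: ({low:.3f}, {high:.3f})")
```

Once the data are entered, `low` and `high` are fixed numbers; the 0.95 refers to the procedure that produced them, not to this particular pair.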
The following picture shows 50 realisations of a confidence interval for μ.
Observing the sample amounts to choosing one interval from the population of all realisations. Before the observation, the probability is 95% that we end up with an interval that contains the parameter. After realisation we just have our chosen interval. As the picture shows, there was a good chance of choosing an interval containing μ; however, we may have been unlucky and picked one that does not. We will never know; we are stuck with our interval.
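The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes normally distributed data with known standard deviation; the values `MU`, `SIGMA` and `N` and the function name `simulate_coverage` are hypothetical. It draws many samples, builds the interval from each, and counts how often the true mean is covered.

```python
import random
from statistics import NormalDist, fmean


def simulate_coverage(mu, sigma, n, repetitions, confidence=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = fmean(sample)
        # The endpoints change with every sample; mu stays fixed.
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / repetitions


# Hypothetical population parameters and sample size.
MU, SIGMA, N = 10.0, 2.0, 25

few = simulate_coverage(MU, SIGMA, N, repetitions=50)
many = simulate_coverage(MU, SIGMA, N, repetitions=20_000)
print(f"coverage over 50 realisations:     {few:.2f}")
print(f"coverage over 20,000 realisations: {many:.3f}")
```

With only 50 realisations the observed fraction can deviate noticeably from 0.95; over many repetitions it settles close to the nominal level. In a real study the true mean is unknown, so this check is only possible in simulation.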
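Suppose ''X''<sub>1</sub>, ..., ''X''<sub>''n''</sub> are an independent sample from a normally distributed population with mean μ and variance σ<sup>2</sup>. Let
:<math>\bar X = \frac{X_1 + \cdots + X_n}{n},</math>
:<math>S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar X\right)^2.</math>
Then
:<math>T = \frac{\bar X - \mu}{S/\sqrt{n}}</math>
has a Student's t-distribution with ''n'' − 1 degrees of freedom. Note that the distribution of ''T'' does not depend on the values of the unobservable parameters μ and σ<sup>2</sup>; i.e., it is a pivotal quantity. If ''c'' is the 95th percentile of this distribution, then
:<math>\Pr(-c < T < c) = 0.9.</math>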
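Rearranging the inequality −''c'' < ''T'' < ''c'' for μ gives the interval with endpoints <math>\bar X \pm c\,S/\sqrt{n}</math>, which has confidence level 0.9. A minimal sketch of that computation follows, using only the Python standard library. The helpers `t_pdf`, `t_cdf`, `t_quantile` and `t_interval` are illustrative names introduced here, the percentile is found by numerical integration and bisection rather than by a statistics package, and the data in `sample` are hypothetical.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 <= p < 1, found by bisection on the CDF."""
    if not 0.5 <= p < 1.0:
        raise ValueError("p must satisfy 0.5 <= p < 1")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(sample, percentile=0.95):
    """Interval mean +/- c*S/sqrt(n); confidence level is 2*percentile - 1."""
    n = len(sample)
    c = t_quantile(percentile, n - 1)
    margin = c * stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor
    mean = fmean(sample)
    return mean - margin, mean + margin


# Hypothetical data, for illustration only.
sample = [4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8, 5.2, 5.1, 4.9]
low, high = t_interval(sample)
print(f"0.90 confidence interval for the mean: ({low:.3f}, {high:.3f})")
```

Because ''T'' is a pivotal quantity, the same percentile ''c'' applies whatever the true values of μ and σ<sup>2</sup>; only the degrees of freedom matter.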