Regression analysis distinguishes between the response variables (also called dependent variables, explained variables, predicted variables, or regressands), usually named <math>Y</math>, and the predictors (also called independent variables, explanatory variables, control variables, or regressors), usually named <math>X_1,\cdots,X_p</math>.
If there is more than one response variable, we speak of multivariate regression, which is not covered in this article.
Regression analysis is most commonly associated with fitting a curve (function) to some set of measurement data (curve fitting), but it can have several other objectives:
- Prediction of future observations, as in curve fitting
- Determining how closely the response can be predicted by the predictors
- Assessing the relationship between the predictors
In an experiment, the variables controlled by the experimenter are generally the predictors, while the resulting measurements are the response variables. The name ''dependent variable'' was given because the response variable depends on the predictors, which are then called ''independent variables''. However, the predictors can very well be ''statistically'' dependent (for example, if one takes ''X'' and a power of ''X'' as predictors). Therefore, the terminology "dependent" and "independent" can be confusing and should be avoided.
Note that a random variable is a function rather than a variable in the usual sense.
The simplest type of regression uses a procedure to find the correlation between a quantitative response variable and a quantitative predictor. The relationship between the two random variables can be described by a linear equation. This technique is known as linear regression and can be used with a single predictor variable or with several. Linear regression assumes the best estimate of the response is a linear function of some parameters (though not necessarily linear in the predictors). If the relationship is not linear in the parameters, a number of nonlinear regression techniques may be used to obtain a more accurate regression.
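The distinction between "linear in the parameters" and "linear in the predictors" can be illustrated with a minimal sketch. The data below are made up and noise-free, and the helper name `fit_simple` is hypothetical. The model ''y'' = ''a'' + ''b''·''x''<sup>2</sup> is quadratic in ''x'' but linear in (''a'', ''b''), so ordinary least squares still applies once ''x''<sup>2</sup> is treated as the predictor.

```python
def fit_simple(z, y):
    """Ordinary least squares for y ≈ a + b*z, using the closed-form solution."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    b = s_zy / s_zz
    a = y_bar - b * z_bar
    return a, b

# Made-up, noise-free data generated from y = 1 + 2*x**2 (illustration only)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi ** 2 for xi in x]

# Transform the predictor; the model stays linear in the parameters (a, b)
z = [xi ** 2 for xi in x]
a, b = fit_simple(z, y)
print(a, b)
```

Because the data were generated without noise, the fit recovers the generating coefficients; with real measurements the estimates would only approximate them.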
If the response variable can take only discrete values (for example, a Boolean or yes/no variable), logistic regression is preferred. The outcome of this type of regression is a function that describes how the probability of a given event (e.g. the probability of getting "yes") varies with the predictors.
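As a rough sketch of what such a fit looks like, the following code estimates a one-predictor logistic model by plain gradient ascent on the log-likelihood. The data, the function names, the learning rate and the number of steps are all illustrative assumptions; statistical packages normally use faster algorithms such as iteratively reweighted least squares.

```python
import math

def sigmoid(t):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Gradient of the mean log-likelihood with respect to a and b
        grad_a = sum(yi - sigmoid(a + b * xi) for xi, yi in zip(x, y)) / n
        grad_b = sum((yi - sigmoid(a + b * xi)) * xi for xi, yi in zip(x, y)) / n
        a += lr * grad_a
        b += lr * grad_b
    return a, b

# Made-up data: a quantitative predictor and a yes/no (1/0) response
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(x, y)
p_low = sigmoid(a + b * 0.5)   # estimated probability of "yes" at x = 0.5
p_high = sigmoid(a + b * 5.0)  # estimated probability of "yes" at x = 5.0
print(p_low, p_high)
```

The fitted function returns a probability for every value of the predictor, which is exactly the kind of outcome described above.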
Predictor variables may be quantitative or qualitative (also called ''categorical''). Categorical predictors are sometimes called ''factors''. Depending on the nature of these predictors, a different regression technique is used:
- If the predictors are all quantitative, we speak of multiple regression.
- If the predictors are all qualitative, one performs an analysis of variance.
- If some predictors are quantitative and some qualitative, one performs an analysis of covariance.
Although these three types are the most common, there also exist Poisson regression, supervised learning, and unit-weighted regression.
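To see how a qualitative predictor enters a regression, the sketch below codes a three-level factor as indicator (dummy) columns. The data and level names are made up. With one indicator column per level and no intercept, the least-squares coefficients are simply the group means, which is the quantity an analysis of variance compares.

```python
# Made-up data: a response measured under three treatments (a qualitative predictor)
groups = ["a", "a", "b", "b", "b", "c", "c"]
y = [4.0, 6.0, 9.0, 10.0, 11.0, 1.0, 3.0]

levels = sorted(set(groups))

# Design matrix: one indicator column per level, no intercept column
X = [[1.0 if g == lev else 0.0 for lev in levels] for g in groups]

# With this coding, the least-squares coefficients are the group means
beta = []
for lev in levels:
    vals = [yi for g, yi in zip(groups, y) if g == lev]
    beta.append(sum(vals) / len(vals))

# Check the normal equations X^t (y - X beta) = 0
residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta))
             for row, yi in zip(X, y)]
normal_eq = [sum(row[j] * r for row, r in zip(X, residuals))
             for j in range(len(levels))]
print(beta, normal_eq)
```

Adding a column of quantitative values to the same design matrix would give the mixed setting handled by an analysis of covariance.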
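Let <math>(\Omega,\mathcal{A},P)</math> be a probability space and <math>(\Gamma_1,\mathcal{S}_1),\cdots,(\Gamma_p,\mathcal{S}_p)</math> be measure spaces. <math>\Theta\subseteq\mathbb{R}^p</math> will denote a ''p''-dimensional parameter space. Then:
- <math>Y:(\Omega,\mathcal{A})\rightarrow(\mathbb{R},\mathcal{B}(\mathbb{R}))</math> is the response and <math>X_j:(\Omega,\mathcal{A})\rightarrow(\Gamma_j,\mathcal{S}_j)</math>, <math>j=1,\cdots,p</math>, are the predictors.
The relationship between the response and the predictors is represented mathematically by a function <math>\eta</math>:
:<math>\eta:(\Gamma_1\times\cdots\times\Gamma_p)\times\Theta\rightarrow\mathbb{R}</math>
We define the error <math>\varepsilon:=Y-\eta(X_1,\cdots,X_p;\theta)</math>, which means that <math>Y=\eta(X_1,\cdots,X_p;\theta)+\varepsilon</math> or more concisely:
:<math>Y=\eta(X;\theta)+\varepsilon</math>
where <math>X:=(X_1,\cdots,X_p)</math>.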
Let <math>\mathcal{C}</math> be the σ-algebra generated by <math>X=(X_1,\cdots,X_p)</math>. Then <math>\mathbb{E}[Y|X]</math> is the only <math>\mathcal{C}</math>-measurable random variable <math>Y_0\in L^2(P)</math> for which <math>\mathbb{E}[(Y-Y_0)^2]</math> is minimal. Moreover, by the factorization lemma, there exists a measurable function <math>\eta:(\Gamma_1\times\cdots\times\Gamma_p)\rightarrow\mathbb{R}</math> such that <math>\mathbb{E}[Y|X]=\eta(X)</math>.
In regression analysis, what we are in fact doing is supposing that we already know the form of the function <math>\eta</math> and are only looking for the right coefficients. In other words, we are looking for the function <math>\eta</math>, but we already know that it lies in a certain space.
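The minimality of the conditional expectation can be checked with a short, standard computation, sketched here under the assumption that <math>Y\in L^2(P)</math> and that <math>Y_0</math> is any square-integrable random variable measurable with respect to the σ-algebra generated by <math>X</math>:

```latex
\begin{aligned}
\mathbb{E}\big[(Y-Y_0)^2\big]
 &= \mathbb{E}\big[(Y-\mathbb{E}[Y|X])^2\big]
  + 2\,\mathbb{E}\big[(Y-\mathbb{E}[Y|X])\,(\mathbb{E}[Y|X]-Y_0)\big]
  + \mathbb{E}\big[(\mathbb{E}[Y|X]-Y_0)^2\big] \\
 &= \mathbb{E}\big[(Y-\mathbb{E}[Y|X])^2\big]
  + \mathbb{E}\big[(\mathbb{E}[Y|X]-Y_0)^2\big]
\end{aligned}
```

The cross term vanishes because <math>\mathbb{E}[Y|X]-Y_0</math> is a function of <math>X</math>, so conditioning on <math>X</math> leaves the factor <math>\mathbb{E}[Y-\mathbb{E}[Y|X]\,|\,X]=0</math>. The remaining second term is non-negative and is zero exactly when <math>Y_0=\mathbb{E}[Y|X]</math> almost surely.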
<math>\mathbf{X}\widehat{\theta}_{LS}</math> and <math>\widehat{\sigma}^2:=\frac{1}{n-p}\|\vec{Y}-\eta(\mathbf{X}\hat{\theta}_{LS})\|^2</math> (with <math>\|u\|^2=u^t u</math>), we get:
In regression, we usually test the null hypothesis that one or more specified parameters are zero against the alternative that at least one of them is non-zero.
Maximum likelihood is one method of estimating the parameters of a regression model; it behaves well for large samples. However, for small amounts of data, the estimates can have high variance or bias.
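A small sketch may make the small-sample remark concrete. It assumes a straight-line model with independent Gaussian errors, and the data are made up. Under that assumption the maximum-likelihood coefficients coincide with the least-squares ones, while the maximum-likelihood variance estimate divides the residual sum of squares by ''n'' rather than ''n'' − ''p'' and is therefore biased downwards.

```python
import math

# Made-up data for a straight-line model y = a + b*x + error
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 2  # p = number of coefficients (a and b)

def log_likelihood(a, b, sigma2):
    """Gaussian log-likelihood of the data for given coefficients and error variance."""
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

# Closed-form least-squares coefficients (also the ML coefficients here)
x_bar = sum(x) / n
y_bar = sum(y) / n
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar

rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
sigma2_ml = rss / n              # maximum-likelihood estimate (biased)
sigma2_unbiased = rss / (n - p)  # usual unbiased estimate

print(a_hat, b_hat, sigma2_ml, sigma2_unbiased)
```

With only five observations the two variance estimates differ by a factor of 5/3; the gap shrinks as ''n'' grows, in line with the good large-sample behaviour of maximum likelihood.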
Bayesian methods can also be used to estimate regression models. A prior is placed over the parameters, which incorporates everything known about the parameters. (For example, if one parameter is known to be non-negative, a non-negative distribution can be assigned to it.) A posterior distribution is then obtained for the parameter vector. Bayesian methods have the advantage that they use all the information that is available. They are exact, not asymptotic, and thus work well for small data sets if some contextual information is available to be used in the prior. Some practitioners use maximum a posteriori (MAP) methods, a simpler alternative to full Bayesian analysis, in which the parameters are chosen to maximize the posterior. MAP methods are related to Occam's razor: there is a preference for simplicity among a family of regression models (curves) just as there is a preference for simplicity among competing theories.
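As an illustration of a MAP estimate, consider the one-parameter model ''y'' = θ''x'' + error. The sketch assumes Gaussian errors with known variance and a zero-mean Gaussian prior on θ; these choices, the data and the function names are assumptions made for the example only. Under them the posterior mode has a closed form in which the prior acts as a penalty that shrinks the estimate towards zero.

```python
def ml_estimate(x, y):
    """Least-squares (maximum-likelihood) slope for the model y ≈ theta * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def map_estimate(x, y, noise_var, prior_var):
    """Posterior mode of theta with N(0, noise_var) errors and a N(0, prior_var) prior."""
    penalty = noise_var / prior_var  # a tighter prior gives a larger penalty
    return (sum(xi * yi for xi, yi in zip(x, y))
            / (sum(xi * xi for xi in x) + penalty))

# Made-up data
x = [1.0, 2.0, 3.0]
y = [2.2, 3.9, 6.1]

theta_ml = ml_estimate(x, y)
theta_map = map_estimate(x, y, noise_var=1.0, prior_var=0.5)
theta_flat = map_estimate(x, y, noise_var=1.0, prior_var=1e12)  # nearly flat prior

print(theta_ml, theta_map, theta_flat)
```

The informative prior pulls the estimate towards zero, whereas a nearly flat prior reproduces the maximum-likelihood value. This is one way to read the preference for simpler models mentioned above.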
To illustrate the various goals of regression, we will give three examples.
The following data set gives the average heights and weights for American women aged 30-39 (source: ''The World Almanac and Book of Facts, 1975'').
We would like to see how the weight of these women depends on their height. We are therefore looking for a function <math>\eta</math> such that <math>Y=\eta(X)+\varepsilon</math>, where ''Y'' is the weight of the women and ''X'' their height. Intuitively, we can guess that if the women's proportions and density are constant, then their weight must depend on the cube of their height. A plot of the data set confirms this supposition:
<math>\vec{X}</math> will denote the vector containing all the measured heights and <math>\vec{Y}</math> the vector containing all the measured weights. We can suppose that the errors are independent of each other and have constant variance, which means the Gauss-Markov assumptions hold. We can therefore use the least-squares estimator, i.e. we are looking for coefficients <math>\theta_0, \theta_1</math> and <math>\theta_2</math> satisfying as well as possible (in the sense of the least-squares estimator) the equation:
:<math>\vec{Y}=\theta_0+\theta_1\vec{X}+\theta_2\vec{X}^3+\vec{\varepsilon}</math>
Geometrically, what we will be doing is an orthogonal projection of ''Y'' on the subspace generated by the variables <math>1, X</math> and <math>X^3</math>. The matrix <math>\mathbf{X}</math> is constructed simply by putting a first column of 1's (the constant term in the model), a column with the original values (the ''X'' in the model) and a third column with these values cubed (<math>X^3</math>). The realization of this matrix (i.e. for the data at hand) can be written:
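The construction of the matrix and the projection can be sketched in a few lines of code. The values used are those of the `women` data set distributed with R, which cites the same almanac; that they match the table intended here is an assumption. Heights are converted from inches to metres only to keep the normal equations well conditioned, which does not change the subspace being projected onto. The helper `solve` is a hypothetical name for a small Gaussian-elimination routine.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Assumed data: R's built-in `women` data set (heights in inches, weights in pounds)
height_in = list(range(58, 73))
weight_lb = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]

# Rescale heights to metres for numerical stability
x = [h * 0.0254 for h in height_in]

# Design matrix: a column of 1's, the heights, and the heights cubed
X = [[1.0, xi, xi ** 3] for xi in x]
p = len(X[0])

# Normal equations (X^t X) theta = X^t Y
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
XtY = [sum(row[i] * yi for row, yi in zip(X, weight_lb)) for i in range(p)]
theta = solve(XtX, XtY)

# Residuals are orthogonal to every column of X (orthogonal projection)
fitted = [sum(c * t for c, t in zip(row, theta)) for row in X]
residuals = [yi - fi for yi, fi in zip(weight_lb, fitted)]
orthogonality = [sum(row[j] * r for row, r in zip(X, residuals)) for j in range(p)]

y_bar = sum(weight_lb) / len(weight_lb)
r_squared = 1.0 - sum(r * r for r in residuals) / sum((yi - y_bar) ** 2 for yi in weight_lb)
print(theta, r_squared)
```

Solving the normal equations directly is adequate for a small, rescaled problem like this one; for larger or more ill-conditioned problems a QR-based least-squares routine is the more usual choice.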
The matrix <math>(\mathbf{X}^t\mathbf{X})^{-1}</math> (sometimes called "information matrix" or "dispersion matrix") is:
The vector <math>\widehat{\theta}_{LS}</math> is therefore:
hence
A plot of this function shows that it lies quite close to the data:
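The confidence intervals are computed using:
:<math>\widehat{\theta}_j\pm t_{n-p;1-\alpha/2}\,\widehat{\sigma}\sqrt{\left[(\mathbf{X}^t\mathbf{X})^{-1}\right]_{jj}}</math>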
with:
:
:
:
:
Therefore, we can say that with a probability of 0.95,
:
:
:
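The interval computation can be sketched as follows, using the standard formula: estimate ± ''t'' quantile × estimated standard error. As before, the values are those of R's `women` data set, assumed to match the table intended here, with heights rescaled to metres, so the coefficients and intervals are expressed per metre and per cubic metre. The ''t'' quantile is hard-coded from a standard table for 12 degrees of freedom, and `solve` is a hypothetical helper name.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Assumed data: R's built-in `women` data set (inches converted to metres, pounds)
x = [h * 0.0254 for h in range(58, 73)]
y = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]

X = [[1.0, xi, xi ** 3] for xi in x]
n, p = len(X), len(X[0])

XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
XtY = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
theta = solve(XtX, XtY)

# Estimated error standard deviation with n - p degrees of freedom
residuals = [yi - sum(c * t for c, t in zip(row, theta)) for row, yi in zip(X, y)]
sigma_hat = math.sqrt(sum(r * r for r in residuals) / (n - p))

# j-th diagonal entry of (X^t X)^{-1}, obtained by solving against the j-th unit vector
diag_inv = [solve(XtX, [1.0 if i == j else 0.0 for i in range(p)])[j] for j in range(p)]

T_QUANTILE = 2.179  # 0.975 quantile of Student's t with 12 degrees of freedom (table value)

intervals = []
for est, s in zip(theta, diag_inv):
    half_width = T_QUANTILE * sigma_hat * math.sqrt(s)
    intervals.append((est - half_width, est + half_width))

for est, (low, high) in zip(theta, intervals):
    print(est, low, high)
```

Each interval is centred on the corresponding least-squares estimate; its width reflects both the residual scatter and how well the data determine that coefficient.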