Gradient Descent Article Index for
Gradient
Website Links For
Gradient
 

Information About

Gradient Descent




Gradient descent is also known as steepest descent, or the '''method of steepest descent'''. When known as the latter, gradient descent should not be confused with the Method Of Steepest Descent for approximating integrals.


DESCRIPTION


Gradient descent is based on the observation that if the real-valued function F(\mathbf{x}) is defined and differentiable in a neighborhood of a point \mathbf{a}, then F(\mathbf{x}) decreases ''fastest'' if one goes from \mathbf{a} in the direction of the negative gradient of F at \mathbf{a}, -
abla F(\mathbf{a}). It follows that, if

:\mathbf{b}=\mathbf{a}-\gamma
abla F(\mathbf{a})

for \gamma>0 a small enough number, then F(\mathbf{a})\geq F(\mathbf{b}). With this observation in mind, one starts with a guess \mathbf{x}_0 for a local minimum of F, and considers the sequence
\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \dots such that

:\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n
abla F(\mathbf{x}_n),\ n \ge 0.

We have

:F(\mathbf{x}_0)\ge F(\mathbf{x}_1)\ge F(\mathbf{x}_2)\ge \dots,

so hopefully the sequence (\mathbf{x}_n) converges to the desired local minimum. Note that the value of the ''step size'' \gamma is allowed to change at every iteration.

This process is illustrated in the following picture.
:
Here F is assumed to be defined on the plane, and that its graph has a Bowl shape. The blue curves are the Contour Line s, that is, the regions on which the value of F is constant. A red arrow originating at a point shows the direction of the negative gradient at that point. Note that the (negative) gradient at a point is perpendicular to the contour line going through that point. We see that gradient ''descent'' leads us to the bottom of the bowl, that is, to the point where the value of the function F is minimal.


SOME EXAMPLES

Gradient descent has problems with pathological functions such as the Rosenbrock Function shown here:
:

The gradient descent method applied to F(x,y)=\sin\left( rac{1}{2} x^2 - rac{1}{4} y^2 + 3 ight) \cos(2 x+1-e^y):


COMMENTS


Note that gradient descent works in spaces of any number of dimensions, even in infinite-dimensional ones. In the latter case the search space is typically a Function Space , and one calculates the Gâteaux Derivative of the functional to be minimized to determine the descent direction.

Two weaknesses of gradient descent are:
#The algorithm can take many iterations to converge towards a local minimum, if the curvature in different directions is very different.
#Finding the optimal \gamma per step can be time-consuming. Conversely, using a fixed \gamma can yield poor results. Methods based on Newton's Method and inversion of the Hessian using Conjugate Gradient techniques are often a better alternative.

A more powerful algorithm is given by the BFGS Method which consists in calculating on every step a matrix by which the gradient vector is multiplied to go into a "better" direction, combined with a more sophisticated Line Search algorithm, to find the "best" value of \gamma.


SEE ALSO



REFERENCES


  • Mordecai Avriel (2003). ''Nonlinear Programming: Analysis and Methods.'' Dover Publishing. ISBN 0-486-43227-0.

  • Jan A. Snyman (2005). ''Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms.'' Springer Publishing. ISBN 0-387-24348-8