Consider a function f(x,y) of two variables. At any critical point (x0,y0) of f(x,y), the tangent plane is horizontal; in other words, the best linear approximation h(x,y) to f(x,y) at (x0,y0) is constant:
h(x,y) = f(x0,y0)
So at the level of the best linear approximation, all critical points look pretty much alike.
But there are different kinds of critical points: maxima and minima, for example, and there are more. The differences show up when we look at the best quadratic approximation q(x,y). This has the form:
q(x,y) = f(x0,y0) + (1/2)*(A*(x - x0)^2 + B*(y - y0)^2 + 2*C*(x - x0)*(y - y0))
where
A = fxx(x0,y0)
B = fyy(x0,y0)
C = fxy(x0,y0)
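As a rough illustration, the quadratic approximation can be written as a small Python function. This is only a sketch: it takes the quadratic terms with the usual factor 1/2 from Taylor's formula, and the test function cos(x)*cos(y), with its critical point at (0,0), is an assumed example that does not come from the text.

```python
import math


def quadratic_approx(f0, A, B, C, x0, y0):
    """Return q(x, y), the quadratic approximation at a critical point.

    f0 = f(x0, y0), A = fxx, B = fyy, C = fxy, all evaluated at (x0, y0).
    There is no linear term because the gradient vanishes at a critical point.
    """
    def q(x, y):
        dx, dy = x - x0, y - y0
        return f0 + 0.5 * (A * dx**2 + B * dy**2 + 2 * C * dx * dy)
    return q


# Assumed example: f(x, y) = cos(x) * cos(y) has a critical point at (0, 0),
# where f = 1, fxx = -1, fyy = -1 and fxy = 0.
def f(x, y):
    return math.cos(x) * math.cos(y)


q = quadratic_approx(f0=1.0, A=-1.0, B=-1.0, C=0.0, x0=0.0, y0=0.0)

for x, y in [(0.1, 0.0), (0.1, 0.1), (-0.05, 0.2)]:
    print(f"f({x}, {y}) = {f(x, y):.6f}   q({x}, {y}) = {q(x, y):.6f}")
```

Close to the critical point the two columns should agree to several decimal places, and the agreement degrades as the sample point moves away.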
The problem to be solved: how do we use the three numbers A, B and C to decide which type of critical point we are looking at?
The answer comes from one more application of our "looking at one-dimensional slices" method. Before explaining that, though, let's ask what types of critical points there are, and how we can recognize them geometrically. More specifically, what do the different types of critical points look like on a contour plot?
Here is an example of a contour plot in which you can see two critical points.
The one marked "point A", at about (a,b), is a local maximum.
Notice that the contour curves around it, close by, are pretty much a family
of concentric ellipses, centered on point A.
Let's now see why this pattern is one visual signature of a critical point -- that is, wherever you see it, you see a critical point.
Recall that the gradient must point in a direction perpendicular to the contour curve's tangent. Since the contour curves wrap around A, and hence have tangents pointing in every direction, arbitrarily near to A there are points at which the gradient is perpendicular to any given direction. By continuity, it must be perpendicular to every direction at A. The only vector perpendicular to every other vector is the zero vector, so the gradient is zero at A, and A is a critical point.
On the other hand, a local maximum or minimum will typically have just such a contour plot.
The concentric ellipses are the visual signature of local maxima and minima.
Now the point marked "point B" is clearly something else.
Notice what happens there: there is a nested family of hyperbolas between which two contour curves cross. At the point of crossing, namely B, the gradient must be perpendicular to both tangent directions. The only vector that is perpendicular to two different -- non-collinear -- directions is the zero vector, so the gradient is zero at B, and B is a critical point. This nested family of hyperbolas between which two contour curves cross is the visual signature of a type of critical point that we call a saddle point.
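Definition of Saddle Point: A critical point (x0,y0) of a function f(x,y) is called a saddle point in case arbitrarily close to (x0,y0) there are points (x,y) where
(1) f(x,y) < f(x0,y0)
and points (x,y) where
(2) f(x,y) > f(x0,y0)
The two crossing contour curves divide the neighborhood of the critical point into two pairs of "wedges". The inequality (1) holds in one pair of wedges, and the inequality (2) in the other pair.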
The classic -- simplest -- example of a saddle point is the point (0,0)
for the function f(x,y) = x*y.
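To make the wedge picture concrete for f(x,y) = x*y, here is a small sketch that prints the sign of f(x,y) - f(0,0) on a grid around the origin. The grid spacing and the helper name sign_at are illustrative choices, not part of the text.

```python
def f(x, y):
    return x * y


def sign_at(x, y):
    """Return '+', '-' or '0' according to the sign of f(x, y) - f(0, 0)."""
    d = f(x, y) - f(0.0, 0.0)
    return "+" if d > 0 else "-" if d < 0 else "0"


# Sample a small grid around the origin; rows run from y > 0 down to y < 0.
h = 0.01
for j in (2, 1, 0, -1, -2):
    print(" ".join(sign_at(i * h, j * h) for i in (-2, -1, 0, 1, 2)))
```

The zeros along the two axes are the crossing contour curves through the critical point. The "+" entries fill one pair of opposite wedges (the first and third quadrants) and the "-" entries fill the other pair.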
We need one more definition, which singles out and names the particular combination of second partial derivatives of f(x,y) that is relevant to the classification of types of critical points.
Definition of Discriminant: The discriminant D(x,y) of a function f(x,y) is the function
D(x,y) = fxx(x,y)*fyy(x,y) - (fxy(x,y))^2
Notice that in terms of our A, B and C notation above, at a critical point (x0,y0)
D(x0,y0) = A*B - C^2
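When the second partial derivatives are awkward to compute by hand, the discriminant fxx*fyy - (fxy)^2 can be estimated numerically. The following is a minimal sketch using central differences; the step size h and the function names are assumptions made for illustration, and the estimate is only as good as the finite-difference approximation.

```python
def second_partials(f, x0, y0, h=1e-4):
    """Estimate (fxx, fyy, fxy) at (x0, y0) with central differences."""
    f00 = f(x0, y0)
    fxx = (f(x0 + h, y0) - 2 * f00 + f(x0 - h, y0)) / h**2
    fyy = (f(x0, y0 + h) - 2 * f00 + f(x0, y0 - h)) / h**2
    fxy = (f(x0 + h, y0 + h) - f(x0 + h, y0 - h)
           - f(x0 - h, y0 + h) + f(x0 - h, y0 - h)) / (4 * h**2)
    return fxx, fyy, fxy


def discriminant(f, x0, y0, h=1e-4):
    """Estimate D = fxx*fyy - fxy^2 at (x0, y0)."""
    A, B, C = second_partials(f, x0, y0, h)
    return A * B - C**2


print(discriminant(lambda x, y: x * y, 0.0, 0.0))            # close to -1
print(discriminant(lambda x, y: x**2 + 3 * y**2, 0.0, 0.0))  # close to 12
```

For f(x,y) = x*y the exact values are A = 0, B = 0, C = 1, so D = -1. For x^2 + 3*y^2 they are A = 2, B = 6, C = 0, so D = 12.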
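Theorem on the Classification of Critical Points: Let (x0,y0) be a critical point of f(x,y). Then the following relations between the discriminant D(x0,y0) and the type of the critical point hold:
If (x0,y0) is a local maximum, then
D(x0,y0) >= 0
fxx(x0,y0) <= 0 and
fyy(x0,y0) <= 0
Conversely, if D(x0,y0) > 0 and either
fxx(x0,y0) < 0 or
fyy(x0,y0) < 0
then the critical point (x0,y0) is a local maximum.
If (x0,y0) is a local minimum, then
D(x0,y0) >= 0
fxx(x0,y0) >= 0 and
fyy(x0,y0) >= 0
Conversely, if D(x0,y0) > 0 and either
fxx(x0,y0) > 0 or
fyy(x0,y0) > 0
then the critical point (x0,y0) is a local minimum.
If (x0,y0) is a saddle point, then
D(x0,y0) <= 0
Conversely, if
D(x0,y0) < 0,
then the critical point (x0,y0) is a saddle point.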
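The classification can be sketched as a short function of the three numbers A = fxx, B = fyy and C = fxy at the critical point. This sketch uses only the strict cases, D > 0 and D < 0, in which the second derivatives settle the question; when D = 0 it reports that the test is undecided. The function name and return strings are illustrative.

```python
def classify(A, B, C):
    """Classify a critical point from A = fxx, B = fyy, C = fxy there.

    Only the strict cases are decided: D < 0 gives a saddle point and
    D > 0 gives a local maximum or minimum. D == 0 is left undecided.
    """
    D = A * B - C**2
    if D < 0:
        return "saddle point"
    if D > 0:
        # D > 0 forces A*B > 0, so A and B are nonzero with the same sign.
        return "local maximum" if A < 0 else "local minimum"
    return "undecided (D = 0)"


print(classify(0, 0, 1))     # f = x*y at (0,0)
print(classify(2, 2, 0))     # f = x^2 + y^2 at (0,0)
print(classify(-2, -2, 1))   # f = -x^2 + x*y - y^2 at (0,0)
print(classify(0, 0, 0))     # e.g. f = x^4 + y^4 at (0,0)
```

The last example shows why the D = 0 case is left open: x^4 + y^4 has a local minimum at the origin, but its second derivatives all vanish there, so they cannot detect it.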
Now, why is this true? Why the discriminant?
The answer, as we indicated, comes from our "one dimensional slicing" method.
To see how, suppose we have a local maximum
of f(x,y) at (x0,y0). Then pick
any number a and consider the parameterized path that goes from
(x0,y0)
to the point
(x0+1,y0 + a)
as t varies between 0 and 1. This is given by
(x(t),y(t)) = (x0+t,y0 + a*t)
Now define
F(t) = f(x(t),y(t))
By the definition, since the path passes through a local maximum of f(x,y)
at t=0, F(t) has a local maximum at t=0.
As we know, this means that
F''(0) <= 0
If we compute this second derivative using the chain rule -- twice, of course -- we get, using our A, B and C notation,
F''(0) = A + 2*C*a + B*a^2
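The formula for F''(0) can be sanity-checked numerically on one slice at a time. The sketch below assumes an example function, exp(-(x^2 - x*y + y^2)), which has a critical point at (0,0) with fxx = -2, fyy = -2 and fxy = 1. It compares a central-difference estimate of F''(0) with A + 2*C*a + B*a^2 for several slopes a.

```python
import math


def f(x, y):
    # Assumed example with a critical point at (0, 0):
    # there fxx = -2, fyy = -2 and fxy = 1.
    return math.exp(-(x**2 - x * y + y**2))


A, B, C = -2.0, -2.0, 1.0


def slice_second_derivative(a, h=1e-4):
    """Central-difference estimate of F''(0) for F(t) = f(t, a*t)."""
    def F(t):
        return f(0.0 + t, 0.0 + a * t)
    return (F(h) - 2 * F(0.0) + F(-h)) / h**2


for a in (-3.0, -1.0, 0.0, 0.5, 2.0):
    numeric = slice_second_derivative(a)
    formula = A + 2 * C * a + B * a**2
    print(f"a = {a:5.1f}   numeric F''(0) = {numeric:10.5f}   formula = {formula:10.5f}")
```

In this example every slice gives a negative second derivative, as expected at a local maximum.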
The thing to notice now is that a is an arbitrary number.
Therefore, combining the last two statements about F''(0) we see that
A + 2*C*a + B*a^2 <= 0
for all a. Now this is a quadratic function of a. Thus its graph is a parabola, and the inequality just above says that the graph of this parabola stays on one side of the a axis. This will still be true if we multiply through by A, which will help us complete the square:
A*(A + 2*C*a + B*a^2) = (A + C*a)^2 + (A*B - C^2)*a^2
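Now you see the discriminant on the right hand side. And you see that unless the discriminant is non-negative, the graph would cross the a axis: if the discriminant were negative, then C would be non-zero, and (for A non-zero) the right hand side would be positive at a = 0 but negative at a = -A/C, where the squared term vanishes. Therefore, since this crossing is impossible, the discriminant is non-negative.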
This explains why the discriminant must be non-negative at a local maximum.
The exact same argument applies to local minima.
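The completing-the-square identity used in this argument can be checked by expanding its right hand side. A short worked version in LaTeX notation, using the same A, B and C as above and writing a for the slope of the slice:

```latex
\begin{align*}
(A + Ca)^2 + (AB - C^2)\,a^2
  &= A^2 + 2ACa + C^2a^2 + ABa^2 - C^2a^2 \\
  &= A^2 + 2ACa + ABa^2 \\
  &= A\,(A + 2Ca + Ba^2).
\end{align*}
```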
Now suppose that the discriminant is strictly positive; i.e., D(x0,y0) > 0. Then the parabola A + 2*C*a + B*a^2 has no real roots, since (2*C)^2 - 4*A*B = -4*D(x0,y0) < 0, so it is always strictly above or strictly below the a axis. That means that F''(0) has just one sign -- either strictly positive or strictly negative -- no matter what a is.
So, on each slice, (x0,y0) is a local maximum in the negative case, and a local minimum in the positive case. So if D(x0,y0) > 0, (x0,y0) is either a local maximum or minimum.
To decide which, just look along the slice parallel to the x-axis: if we have a local maximum of f(x,y), then we have one on this slice, so that
fxx(x0,y0) <= 0
But if it is a minimum, we must have
fxx(x0,y0) >= 0
And of course, all of this applies to fyy(x0,y0) just as well.
The rest of the theorem can be seen to be true by similar considerations.
The Problem To Be Solved:
How do we use the three numbers A,
B and C to decide which type of critical point we are looking at?
Definition of Saddle Point
A critical point (x0,y0) of a function f(x,y) is called a saddle point
in case arbitrarily close to (x0,y0) there are points
(x,y) where
Definition of Discriminant
The discriminant D(x,y) of a function f(x,y)
is the function
Theorem on the Classification of Critical Points
Let (x0,y0) be a critical point of f(x,y)
Then the following relations between the discriminant D(x0,y0) and the type of the critical point hold: