TOPIC: GRADIENTS, DIRECTIONAL DERIVATIVES AND TANGENT PLANES

Again recall the very simple idea that is behind many of the methods of multivariable calculus. Namely, one studies functions of several variables by applying the single variable calculus to one dimensional "slices" of them.

We have seen how this leads directly from Taylor's Theorem for a single variable to Taylor's Theorem for several variables. This led us to the notion of best linear approximation. Now let's understand the geometry behind this.

Recall that the strategy is to build single variable functions F(t) by combining multivariable functions f(x,y) with parameterized paths (x(t),y(t)) as follows: We define

F(t) = f(x(t),y(t))

Now F(t) is a function of a single variable -- namely t -- and we can differentiate it or integrate it with the many single variable methods at our disposal.
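As a concrete illustration of this construction, here is a minimal Python sketch. The altitude function f, the circular path, and the helper name derivative are all assumptions chosen for illustration; none of them come from these notes. The derivative is estimated with a central difference, so it is an approximation rather than an exact value.

```python
import math

def f(x, y):
    # Hypothetical altitude function, chosen only for illustration.
    return x**2 + 3.0 * x * y

def path(t):
    # Hypothetical parameterized path (x(t), y(t)): the unit circle.
    return math.cos(t), math.sin(t)

def F(t):
    # The one-variable "slice" F(t) = f(x(t), y(t)).
    x, y = path(t)
    return f(x, y)

def derivative(g, t, h=1e-6):
    # Central-difference estimate of g'(t); an approximation only.
    return (g(t + h) - g(t - h)) / (2.0 * h)

print(F(0.0))              # altitude at the start of the path
print(derivative(F, 0.0))  # estimated rate of change of altitude at t = 0
```

Once F is an ordinary function of t, any single variable tool, here a numerical derivative, applies to it directly.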

We started out with a function f(x,y) of two variables. What does differentiating F(t) tell us about f(x,y)? We can do it, but what is the point?

You can understand, in non-mathematical terms, what this tells you if you take the following point of view: Think of a piece of land that is to be surveyed. Think of x and y as the coordinates on a map of this piece of land, and finally think of f(x,y) as giving the altitude at the point (x,y).

Imagine you walk along a path that goes all over this piece of land, taking notes on the rate of change in your altitude, i.e., the slope, at points all along your path.

If you took enough notes, and knew the altitude at your starting point, you could afterwards construct an accurate three dimensional model of the region. That is, you can learn everything there is to know about the altitude function f(x,y) this way. And since slopes are given by derivatives, you can see how by studying F'(t) for lots of parameterized paths (x(t),y(t)) -- which amounts to just another way of saying "keeping track of how your altitude changes as you walk all over the place" -- you can learn all about the altitude function.

Now, how do we make mathematics out of this? In crude terms, how do we turn these musings about strategy into formulas?

The key to this is to extract the geometrical content from F'(t). Geometry will be the key to everything we do in this course.

Directional Derivatives and Gradients

To carry this discussion back to mathematics as such, let's consider a path (x(t),y(t)) that passes through some point (x0,y0) at t=0, and let's keep track of how our altitude changes as we walk along the path. Now let's compute the derivative of F(t) = f(x(t),y(t)) at t = 0. By the chain rule,

F'(0) = fx(x0,y0)*x'(0) + fy(x0,y0)*y'(0)
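To see the chain rule at work numerically, here is a short Python sketch. The function f, its hand-computed partial derivatives, and the curved path through (x0,y0) = (1,2) are hypothetical choices made for illustration. The finite-difference quotient only approximates F'(0), so the two printed numbers should agree closely but need not agree exactly.

```python
def f(x, y):
    # Hypothetical example function (not from the notes).
    return x**2 * y + y

def fx(x, y):
    # Partial derivative of f with respect to x, computed by hand.
    return 2.0 * x * y

def fy(x, y):
    # Partial derivative of f with respect to y, computed by hand.
    return x**2 + 1.0

def x_of(t):
    # Hypothetical path component with x(0) = 1 and x'(0) = 3.
    return 1.0 + 3.0 * t + t**2

def y_of(t):
    # Hypothetical path component with y(0) = 2 and y'(0) = -1.
    return 2.0 - t

def F(t):
    return f(x_of(t), y_of(t))

x0, y0 = x_of(0.0), y_of(0.0)
xp0, yp0 = 3.0, -1.0  # x'(0) and y'(0), read off from the path formulas

# Right-hand side of the chain rule.
chain_rule_value = fx(x0, y0) * xp0 + fy(x0, y0) * yp0

# Central-difference estimate of F'(0), for comparison.
h = 1e-6
numeric_value = (F(h) - F(-h)) / (2.0 * h)

print(chain_rule_value, numeric_value)
```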

This is a combination of four numbers, namely

fx(x0,y0), x'(0), fy(x0,y0), y'(0)

Two came from f, and two are determined by the particular path through the point (x0,y0), namely the two components of the velocity vector

V = (x'(0),y'(0))

For different velocity vectors, we get a different value for the derivative. Suppose for example that V = (3,-1). Then we have

F'(0) = fx(x0,y0)*3 + fy(x0,y0)*(-1)

But suppose the velocity vector had been V = (2,4)? Then we would have had

F'(0) = fx(x0,y0)*2 + fy(x0,y0)*4

There are clearly infinitely many such derivatives, since there are infinitely many possible velocity vectors V. Each one of these derivatives is called a directional derivative. Specifically, we make the following definition:

Definition of directional derivatives

Given a vector (a,b), the directional derivative of f in the direction (a,b) at the point (x0,y0) is the number

fx(x0,y0)*a + fy(x0,y0)*b

or, what is the same, the derivative at t = 0 of the function

F(t) = f(x0 + a*t,y0 + b*t)

To see where this last bit came from, suppose we are given an arbitrary vector (a,b). Now clearly there are infinitely many paths through (x0,y0) that have this velocity vector (a,b) at t=0. But a particularly simple one is the straight-line path

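(x(t),y(t)) = (x0 + a*t,y0 + b*t)

Composing f with this path gives exactly the function F(t) = f(x0 + a*t,y0 + b*t) in the definition, and by the chain rule its derivative at t = 0 is fx(x0,y0)*a + fy(x0,y0)*b.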
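By the chain rule, F'(0) depends on the path only through its velocity at t = 0, so the straight-line path is as good as any other. Here is a minimal Python sketch of that fact. The function f, the point (x0,y0) = (1,2), the vector (a,b) = (3,-1), and the curved path are all hypothetical choices, and both derivatives are central-difference estimates.

```python
import math

def f(x, y):
    # Hypothetical example function.
    return math.sin(x) * y + x * y**2

x0, y0 = 1.0, 2.0
a, b = 3.0, -1.0

def straight(t):
    # The straight-line path through (x0, y0) with velocity (a, b).
    return x0 + a * t, y0 + b * t

def curved(t):
    # A different path through (x0, y0) with the same velocity at t = 0:
    # the extra t**2 terms do not change the first derivative there.
    return x0 + a * t + 5.0 * t**2, y0 + b * t - 2.0 * t**2

def slope_at_zero(path, h=1e-6):
    # Central-difference estimate of d/dt f(path(t)) at t = 0.
    return (f(*path(h)) - f(*path(-h))) / (2.0 * h)

d_straight = slope_at_zero(straight)
d_curved = slope_at_zero(curved)
print(d_straight, d_curved)  # the two estimates should agree closely
```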

The definition of directional derivatives becomes a bit more intelligible if we remember two things:

First, the vector (x'(0),y'(0)) is the velocity of the parameterized curve (x(t),y(t)) at t = 0. Second, given two vectors X = (a,b) and Y = (c,d), the dot product dot(X,Y) is given by

dot(X,Y) = a*c + b*d

Remembering these two things, we make the following definition:

Definition of the gradient

The gradient of f at (x0,y0), denoted by Gradf(x0,y0), is defined to be the vector

Gradf(x0,y0) = (fx(x0,y0),fy(x0,y0)).

Therefore, if we write V = (x'(0),y'(0)) for our velocity vector, then we can rewrite our directional derivatives in the following geometrical form:

F'(0) = dot(Gradf(x0,y0), V)

In fact, we will usually suppress explicit mention of the arguments t, x0 and y0 -- which after all are taking on "general", i.e., non-specific, values in our discussion anyway -- and simply write

F' = dot(Gradf,V)

This last form of the chain rule certainly looks more geometric, written out in terms of vectors and a dot product. It also separates what comes from f, and what comes from the path. In doing so, it singles out the data we need from f to compute directional derivatives. The reason for defining the gradient as a vector is exactly that it contains all the data from f that we need to compute arbitrary directional derivatives, and a simple vector operation, the dot product, can be used to do the computation.
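The separation between what comes from f and what comes from the path is easy to mirror in code. In this Python sketch, grad_f and dot are hypothetical helper names, f(x,y) = x^2*y is a sample function chosen for illustration, and the two velocity vectors are the ones used as examples above.

```python
def grad_f(x, y):
    # Gradient (fx, fy) of the hypothetical function f(x, y) = x**2 * y,
    # with the partial derivatives computed by hand.
    return (2.0 * x * y, x**2)

def dot(X, Y):
    # Dot product of two vectors in the plane.
    return X[0] * Y[0] + X[1] * Y[1]

x0, y0 = 1.0, 2.0
g = grad_f(x0, y0)  # all the data we need from f at (x0, y0)

# The same gradient serves every velocity vector V.
for V in [(3.0, -1.0), (2.0, 4.0)]:
    print(V, dot(g, V))
```

The gradient is computed once; each new velocity vector costs only one dot product.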

A good mathematical definition is never made for mysterious reasons. It is made for practical reasons with a particular class of computations in view.

The nomenclature "directional derivative" may seem a bit inappropriate, since the value of F'(0) depends not only on the direction of the velocity vector, but also on its magnitude. However, if we double the magnitude of the velocity vector, the directional derivative doubles too. On the other hand, if we change the direction by, say, doubling the angle the velocity vector makes with the x axis, anything at all can happen to the size of the directional derivative. So the name is not so bad: the really interesting dependence on V is in the dependence on the direction of V. Once we know the directional derivatives of f for all unit vector velocities, we know them for all velocities.
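Both observations can be seen in a few lines of Python. The gradient value (4,1) and the velocity (3,-4) below are hypothetical examples, and the helper names are assumptions. The sketch doubles V, and then splits V into its length times a unit vector to recover the same directional derivative from the unit direction alone.

```python
import math

def dot(X, Y):
    # Dot product of two vectors in the plane.
    return X[0] * Y[0] + X[1] * Y[1]

grad = (4.0, 1.0)  # hypothetical value of Gradf at some point
V = (3.0, -4.0)    # a velocity vector of length 5

# Doubling the velocity doubles the directional derivative.
d_V = dot(grad, V)
d_2V = dot(grad, (2.0 * V[0], 2.0 * V[1]))

# Split V into its length and its unit direction U.
length = math.hypot(V[0], V[1])
U = (V[0] / length, V[1] / length)

# The derivative along V is the length times the derivative along U.
d_from_unit = length * dot(grad, U)

print(d_V, d_2V, d_from_unit)
```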

Tangency and Tangent Planes

Now that we know about partial derivatives, we ask: what is the derivative of a function f(x,y) of two variables? For a function g(x) of one variable, we were always working with the derivative g'(x). Now we've got all these infinitely many directional derivatives. Isn't there something we can single out as the derivative? It may seem we can make progress by considering the two numbers fx(x0,y0) and fy(x0,y0). This is better than infinitely many, but these are just two particular directional derivatives: namely those corresponding to the directions (1,0) and (0,1) respectively. Any special place given to these directions would be an artifact of our coordinate system.

There is a good notion of the derivative of a function f(x,y) of two variables. To find it we just have to stop thinking in terms of numbers, and to think instead in terms of geometry.

Recall that geometrically speaking the derivative g'(x0) for a function g(x) of a single variable x specifies the tangent line to the graph y= g(x) at x0.

For a function f(x,y) of two variables, the corresponding quantity of interest is the tangent plane to the graph z = f(x,y) at the point (x0,y0). Clearly, this is a single, well defined geometrical entity, and whatever specifies this plane has every bit as much right to be considered the derivative of f(x,y) as what specifies the tangent line of g(x) -- namely g'(x).

So, what does specify this plane, and how is it related to the directional derivatives that we have been considering? That is the question we shall now answer.

To do this, recall that the equation of a non-vertical plane takes the general form

z = a*x + b*y + c

and note that this plane is the graph of a linear function; i.e. the function h(x,y)

h(x,y) = a*x + b*y + c

So what we are looking for is a way to compute the numbers a, b and c that give the tangent plane to f at (x0,y0).

This will be easy as soon as we have a precise definition of what it means for the graphs of two functions to be tangent to one another.

Definition of tangency for graphs in two variables

Let f(x,y) and h(x,y) be two functions of x and y. Consider their graphs; i.e., z = f(x,y) and z = h(x,y). These graphs are said to be tangent to one another at (x0,y0) provided

(1) f(x0,y0) = h(x0,y0)

(2) For every vector (a,b), the directional derivatives of f and h along (a,b) are equal at (x0,y0).

That is, not only do the values of the functions agree at the point of tangency, but also their slopes in any direction.
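The two conditions can be spot-checked numerically. In the Python sketch below, f, h, and the point (1,1) are hypothetical examples, and directional is an assumed helper that estimates a directional derivative by a central difference. Checking twelve sample directions is evidence of tangency, not a proof, since condition (2) refers to every vector (a,b).

```python
import math

def f(x, y):
    # Hypothetical example function.
    return x * y

def h(x, y):
    # Hypothetical candidate whose graph should be tangent to f's at (1, 1).
    return x + y - 1.0

def directional(g, x0, y0, a, b, step=1e-6):
    # Central-difference estimate of the derivative at t = 0 of
    # g(x0 + a*t, y0 + b*t).
    return (g(x0 + a * step, y0 + b * step)
            - g(x0 - a * step, y0 - b * step)) / (2.0 * step)

x0, y0 = 1.0, 1.0

# Condition (1): the values agree at the point.
values_agree = math.isclose(f(x0, y0), h(x0, y0))

# Condition (2), spot-checked on a sample of directions around the circle.
slopes_agree = True
for k in range(12):
    angle = 2.0 * math.pi * k / 12
    a, b = math.cos(angle), math.sin(angle)
    df = directional(f, x0, y0, a, b)
    dh = directional(h, x0, y0, a, b)
    if abs(df - dh) > 1e-6:
        slopes_agree = False

print(values_agree, slopes_agree)
```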

Now suppose the graphs of f(x,y) and h(x,y) are tangent to one another at (x0,y0). Suppose further that h(x,y) is linear; i.e., for some a, b and c,

h(x,y) = a*x + b*y + c

Then by (1), plugging in x0 and y0 we get

f(x0,y0) = a*x0 + b*y0 + c

or

c = f(x0,y0) - a*x0 - b*y0

Now, the directional derivative of f(x,y) in the direction (1,0) is fx(x0,y0), while the directional derivative of h(x,y) in the direction (1,0) is h_x(x0,y0) = a. In the same way, the directional derivative of f(x,y) in the direction (0,1) is fy(x0,y0), while the directional derivative of h(x,y) in the direction (0,1) is h_y(x0,y0) = b.

So we must have

a = fx(x0,y0) and b = fy(x0,y0)

Returning to our expression for c, we have

c = f(x0,y0) - fx(x0,y0)*x0 - fy(x0,y0)*y0

Now h(x,y) = a*x + b*y + c becomes

h(x,y) = fx(x0,y0)*x + fy(x0,y0)*y + [f(x0,y0) - fx(x0,y0)*x0 - fy(x0,y0)*y0]

or what is the same

h(x,y) = fx(x0,y0)*(x - x0) + fy(x0,y0)*(y - y0) + f(x0,y0)

Note that h(x,y) is just the best linear approximation to f(x,y) at (x0,y0) as defined in the section on the multivariable Taylor's theorem.
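The formula for h(x,y) translates directly into code. The following Python sketch uses a hypothetical function f(x,y) = x^2*y with hand-computed partial derivatives, and tangent_plane is an assumed helper name. It builds h at the point (1,2) and prints how far h is from f at nearby points.

```python
def f(x, y):
    # Hypothetical example function.
    return x**2 * y

def fx(x, y):
    # Partial derivative with respect to x, computed by hand.
    return 2.0 * x * y

def fy(x, y):
    # Partial derivative with respect to y, computed by hand.
    return x**2

def tangent_plane(x0, y0):
    # Return the linear function h whose graph is the tangent plane at (x0, y0).
    a, b, c = fx(x0, y0), fy(x0, y0), f(x0, y0)
    def h(x, y):
        return a * (x - x0) + b * (y - y0) + c
    return h

h = tangent_plane(1.0, 2.0)

# The gap between f and h shrinks much faster than the distance to (1, 2).
for d in [0.1, 0.01, 0.001]:
    print(d, abs(f(1.0 + d, 2.0 + d) - h(1.0 + d, 2.0 + d)))
```

Each time the offset d shrinks by a factor of 10, the gap shrinks by roughly a factor of 100, which is what "best linear approximation" leads us to expect.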

We can, however, rewrite this a bit more clearly in vector terms. Using the definitions made above:

Theorem on the tangent plane of f in terms of Gradf

The tangent plane to f at (x0,y0) is the graph of the linear function

h(x,y) = dot(Gradf,(x-x0,y-y0)) + f(x0,y0)

which is of course the plane

z - dot(Gradf,(x-x0,y-y0)) - f(x0,y0) = 0

WORKED PROBLEMS

Worked Problem 1

Let

f(x,y) = x^2 - y^2 - 2*x - 2*y

  • (a) Find the equation of the tangent plane at (0,0)
  • (b) Find the equation of the tangent plane at (1,-1)

Solution to Worked Problem 1

    One easily computes that

    f(0,0) = 0,  fx(0,0) = -2,  fy(0,0) = -2

    Therefore, the gradient Gradf(0,0) is

    Gradf(0,0) = (-2,-2)

    and the equation of the tangent plane at (0,0) is

    z = - 2*x -2*y

    For the next point, (1,-1), one easily computes that

    f(1,-1) = 0,  fx(1,-1) = 0,  fy(1,-1) = 0

    Therefore, the gradient Gradf(1,-1) is

    Gradf(1,-1) = (0,0)

    and the equation of the tangent plane at (1,-1) is

    z = 0
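A computation like this one is easy to double-check by machine. The Python sketch below reads the function as f(x,y) = x^2 - y^2 - 2*x - 2*y, uses partial derivatives computed by hand, and returns the coefficients a, b, c of the tangent plane z = a*x + b*y + c. The helper name plane_coefficients is an assumption made for illustration.

```python
def f(x, y):
    return x**2 - y**2 - 2.0 * x - 2.0 * y

def fx(x, y):
    # Partial derivative with respect to x.
    return 2.0 * x - 2.0

def fy(x, y):
    # Partial derivative with respect to y.
    return -2.0 * y - 2.0

def plane_coefficients(x0, y0):
    # Coefficients (a, b, c) of the tangent plane z = a*x + b*y + c,
    # using a = fx, b = fy and c = f(x0,y0) - a*x0 - b*y0.
    a, b = fx(x0, y0), fy(x0, y0)
    c = f(x0, y0) - a * x0 - b * y0
    return a, b, c

print(plane_coefficients(0.0, 0.0))   # plane at (0,0)
print(plane_coefficients(1.0, -1.0))  # plane at (1,-1)
```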

Worked Problem 2

Let

f(x,y) = x^3 + 2*x*y + y^2 - 1

  • (a) Find the equation of the tangent plane at (0,0)
  • (b) Find the equation of the tangent plane at (1,-1)

Solution to Worked Problem 2

    One easily computes that

    f(0,0) = -1,  fx(0,0) = 0,  fy(0,0) = 0,  fxx(0,0) = 0,  fyy(0,0) = 2,  fxy(0,0) = 2

    Therefore, the gradient Gradf(0,0) is

    Gradf(0,0) = (0,0)

    and the equation of the tangent plane at (0,0) is

    z = -1

    For the next point, (1,-1), one easily computes that

    f(1,-1) = -1,  fx(1,-1) = 1,  fy(1,-1) = 0,  fxx(1,-1) = 6,  fyy(1,-1) = 2,  fxy(1,-1) = 2

    Therefore, the gradient Gradf(1,-1) is

    Gradf(1,-1) = (1,0)

    and the equation of the tangent plane at (1,-1) is

    z = (x - 1) - 1 = x - 2
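When the partial derivatives are tedious to compute by hand, a numerical gradient gives an independent check of a tangent plane. The Python sketch below reads the function as f(x,y) = x^3 + 2*x*y + y^2 - 1 and estimates fx and fy at (1,-1) by central differences. The helper names and the step size are assumptions, and the estimates are approximate.

```python
def f(x, y):
    return x**3 + 2.0 * x * y + y**2 - 1.0

def numeric_gradient(x0, y0, step=1e-5):
    # Central-difference estimates of fx and fy; approximate values only.
    gx = (f(x0 + step, y0) - f(x0 - step, y0)) / (2.0 * step)
    gy = (f(x0, y0 + step) - f(x0, y0 - step)) / (2.0 * step)
    return gx, gy

def tangent_plane_at(x0, y0):
    # Build h(x,y) = fx*(x - x0) + fy*(y - y0) + f(x0,y0) from the estimates.
    gx, gy = numeric_gradient(x0, y0)
    z0 = f(x0, y0)
    def h(x, y):
        return gx * (x - x0) + gy * (y - y0) + z0
    return h

h = tangent_plane_at(1.0, -1.0)
print(numeric_gradient(1.0, -1.0))  # should be close to (1, 0)
print(h(1.0, -1.0), f(1.0, -1.0))   # the plane must touch the graph here
print(h(0.0, 0.0))                  # constant term c of z = a*x + b*y + c
```

Since h is linear, its value at (0,0) is the constant term of the plane, so the three printed lines together determine the plane's equation.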

POSED PROBLEMS

Posed Problem 1

Let

f(x,y) = x^4*y + y^4*x - 2*x*y + 3

  • (a) Find the equation of the tangent plane at (0,0)
  • (b) Find the equation of the tangent plane at (1,-1)

Posed Problem 2

Let

f(x,y) = e^(x^2 + y^2)

  • (a) Find the equation of the tangent plane at (0,1)
  • (b) Find the equation of the tangent plane at (2,1)