TOPIC: TAYLOR'S THEOREM WITH SEVERAL VARIABLES

There is a very simple idea behind many of the methods of multivariable calculus. Namely, one studies functions of several variables by applying single variable calculus to "one dimensional slices" of them.

The simplest version of this would be to apply it one variable at a time; that is, using vertical or horizontal slices. But that would be a bit too simple to take us very far.

A good way to do this, which will take us far, is to build single variable functions F(t) by combining multivariable functions f(x,y) with parameterized paths (x(t),y(t)) as follows: We define

F(t) = f(x(t),y(t))

For example, if

f(x,y) = x^2 + y^4 and (x(t),y(t)) = (t^2 + 1,t)

then

F(t) = 2*t^4 + 2*t^2 + 1

Now F(t) is a simple function of one variable -- namely t -- and we can differentiate it or integrate it with ease.
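As a quick check of this example, here is a minimal Python sketch that builds F(t) by composing f with the path and compares the result with the formula worked out by hand. The names path and F_formula are ours, chosen only for illustration.

```python
def f(x, y):
    # The two-variable function from the example.
    return x**2 + y**4

def path(t):
    # The parameterized path (x(t), y(t)) = (t^2 + 1, t).
    return (t**2 + 1, t)

def F(t):
    # One-variable "slice" of f along the path.
    x, y = path(t)
    return f(x, y)

def F_formula(t):
    # The closed form worked out by hand: 2*t^4 + 2*t^2 + 1.
    return 2 * t**4 + 2 * t**2 + 1

# With integer inputs the arithmetic is exact, so the two must agree exactly.
for t in [-2, -1, 0, 1, 3]:
    assert F(t) == F_formula(t)
```

The composition is nothing more than plugging the path into f, which is why F(t) can be handled with one variable tools.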

But why is this useful?

The best way to answer this is by giving an application. We will do this here by using this method to derive the several variable version of Taylor's theorem from the one variable version -- very easily.

Recall that for functions F(t) of one variable, Taylor's theorem with remainder tells us which polynomial of degree N in t is the "best" approximation to F(t) near a given point t0, and moreover, it tells us how good the approximation is.

The problem to be considered here is: how can we do this for functions of several variables?

Consider a function

f(x,y)

and a "base point"

(x0,y0)

about which we shall expand.

Now here is where the path comes in. Fix any other point (x,y) and define the parameterized path

(x(t),y(t)) = (x0 + t*(x-x0), y0 + t*(y-y0))

Notice that

  • (x(0),y(0)) = (x0,y0)
  • (x(1),y(1)) = (x,y)

    That is, our parameterized path goes along a straight line segment from the base point (x0,y0) to our other point (x,y) as t varies from 0 to 1. So if we define

    F(t) = f(x(t),y(t))

    Then

    F(0) = f(x0,y0)

    F(1) = f(x,y).

    In what follows -- and, in fact, in most applications anywhere -- the main interest is in the Taylor approximations of degrees one and two. So we concentrate on these.

    By Taylor's Theorem applied to F(t) up to first order about a point t0, we have that

    F(t) = F(t0) + F'(t0)*(t-t0) + R1(t,t0)

    where

    R1(t,t0) = (1/2)*F''(s)*(t-t0)^2

    for some s between t0 and t.

    And by Taylor's Theorem applied to F(t) up to second order, we have that

    F(t) = F(t0) + F'(t0)*(t-t0) + (1/2)*F''(t0)*(t-t0)^2 + R2(t,t0)

    where

    R2(t,t0) = (1/6)*F'''(s)*(t-t0)^3

    for some s between t0 and t.

    If we work out what this says for F(t) = f(x(t),y(t)) with t0 = 0 and t = 1, we get, first writing things still in terms of F,

    F(1) = F(0) + F'(0) + R1(1,0)

    in the first order case, and

    F(1) = F(0) + F'(0) + (1/2)*F''(0) + R2(1,0)

    in the second order case.

    We now express these in terms of the original function f. We have already said what F(0) and F(1) are in terms of f(x,y). For the derivatives of F(t), we just need the chain rule. First,

    F'(t) = fx(x(t),y(t))*x'(t) + fy(x(t),y(t))*y'(t)

    Evaluating this at t = 0 gives us

    F'(0) = fx(x0,y0)*x'(0) + fy(x0,y0)*y'(0)

    Finally, we easily compute from the definition that

    x'(0) = (x-x0) and y'(0) = (y-y0)

    so that

    F'(0) = fx(x0,y0)*(x-x0) + fy(x0,y0)*(y-y0)

    Computing F''(0) is the same in principle, but requires a bit more care. Differentiating the first derivative again yields

    F''(t) = fxx(x(t),y(t))*(x'(t))^2 + 2*fxy(x(t),y(t))*x'(t)*y'(t) + fyy(x(t),y(t))*(y'(t))^2

    Note that there are no terms involving x''(t) or y''(t) because x(t) and y(t) are linear functions of t. Evaluating this at t = 0 gives us

    F''(0) = fxx(x0,y0)*(x'(0))^2 + 2*fxy(x0,y0)*x'(0)*y'(0) + fyy(x0,y0)*(y'(0))^2

    Finally, evaluating x'(0) and y'(0) as before gives us

    F''(0) = fxx(x0,y0)*(x-x0)^2 + 2*fxy(x0,y0)*(x-x0)*(y-y0) + fyy(x0,y0)*(y-y0)^2
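The chain rule formulas for F'(0) and F''(0) can be checked numerically. The sketch below is only an illustration: the sample function f(x,y) = x^2*y + y^3, the base point (1,2) and the other point (1.5,1) are our own choices, not part of the notes, and the derivatives of F are estimated by central finite differences in t.

```python
def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x**2 * y + y**3

# Its partial derivatives, worked out by hand.
def fx(x, y):
    return 2 * x * y

def fy(x, y):
    return x**2 + 3 * y**2

def fxx(x, y):
    return 2 * y

def fxy(x, y):
    return 2 * x

def fyy(x, y):
    return 6 * y

x0, y0 = 1.0, 2.0          # base point (assumed for the example)
x1, y1 = 1.5, 1.0          # the other point (x, y)
dx, dy = x1 - x0, y1 - y0  # these are x'(0) and y'(0)

def F(t):
    # f restricted to the straight line from (x0, y0) to (x1, y1).
    return f(x0 + t * dx, y0 + t * dy)

# Chain rule formulas for F'(0) and F''(0).
dF_formula = fx(x0, y0) * dx + fy(x0, y0) * dy
d2F_formula = (fxx(x0, y0) * dx**2
               + 2 * fxy(x0, y0) * dx * dy
               + fyy(x0, y0) * dy**2)

# Central finite differences in t, for comparison.
step = 1e-3
dF_numeric = (F(step) - F(-step)) / (2 * step)
d2F_numeric = (F(step) - 2 * F(0.0) + F(-step)) / step**2

print(dF_formula, dF_numeric)
print(d2F_formula, d2F_numeric)
```

The two numbers on each printed line should agree to several digits; the small discrepancy comes from the finite difference step, not from the formulas.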

    This takes care of everything but the remainder terms. Working them out is pretty much more of the same. The details are left as an exercise.

    In the meantime we state the results, and discuss how we shall use them.

    Taylor's Theorem In Two Variables: First Order Case

    For any function f(x,y) with continuous second order derivatives

    f(x,y)= f(x0,y0) + fx(x0,y0)*(x-x0) + fy(x0,y0)*(y-y0) + R1

    Where

    |R1| <= M*((x - x0)^2 + (y - y0)^2)

    and M is an upper bound on |fxx|, |fxy| and |fyy|

    along the line segment connecting (x0,y0) and (x,y)

    Taylor's Theorem In Two Variables: Second Order Case

    For any function f(x,y) with continuous third order derivatives

    f(x,y) = f(x0,y0) + fx(x0,y0)*(x-x0) + fy(x0,y0)*(y-y0) +

    (1/2)*( fxx(x0,y0)*(x-x0)^2 + 2*fxy(x0,y0)*(x-x0)*(y-y0) + fyy(x0,y0)*(y-y0)^2 ) +

    R2

    Where

    |R2| <= M*((x - x0)^2 + (y - y0)^2)^(3/2)

    and M is an upper bound on |fxxx|, |fxxy|, |fxyy| and |fyyy|

    along the line segment connecting (x0,y0) and (x,y)
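To see the two remainder bounds in action, here is a rough numerical check. Everything specific in it is an assumption made for illustration: the sample function f(x,y) = sin(x)*cos(y), the base point (0.3,-0.2), and the grid of test points. For this f every partial derivative is a product of sines and cosines, so M = 1 works for both theorems. The quadratic part carries the factor 1/2 that comes from the (1/2)F''(0) term in the expansion of F(1).

```python
import math

def f(x, y):
    # Hypothetical sample function; all of its partial derivatives are
    # products of sines and cosines, so M = 1 bounds every one of them.
    return math.sin(x) * math.cos(y)

x0, y0 = 0.3, -0.2   # base point (assumed for the example)
M = 1.0

# Values of f and its first and second partials at the base point.
f0 = f(x0, y0)
fx = math.cos(x0) * math.cos(y0)
fy = -math.sin(x0) * math.sin(y0)
fxx = -math.sin(x0) * math.cos(y0)
fxy = -math.cos(x0) * math.sin(y0)
fyy = -math.sin(x0) * math.cos(y0)

def remainders(x, y):
    # Returns (R1, R2): what is left after the first and second order parts.
    dx, dy = x - x0, y - y0
    first = f0 + fx * dx + fy * dy
    second = first + 0.5 * (fxx * dx**2 + 2 * fxy * dx * dy + fyy * dy**2)
    return f(x, y) - first, f(x, y) - second

def bounds_hold(x, y):
    # Checks |R1| <= M*r^2 and |R2| <= M*r^3, where r is the distance.
    r_squared = (x - x0)**2 + (y - y0)**2
    R1, R2 = remainders(x, y)
    return abs(R1) <= M * r_squared and abs(R2) <= M * r_squared**1.5

# A grid of points around the base point (the base point itself is skipped).
points = [(x0 + 0.1 * i, y0 + 0.1 * j)
          for i in range(-10, 11)
          for j in range(-10, 11)
          if (i, j) != (0, 0)]

all_ok = all(bounds_hold(x, y) for x, y in points)
print(all_ok)
```

A check like this on a finite grid is of course not a proof, but it is a useful sanity test when you work out the partial derivatives by hand.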

    You may be wondering what happened to the factors of 1/N! in the remainder term. They got eaten up -- partially -- by effects of the multidimensionality. Actually, one could do better. Though this is not so important for small N, it becomes useful for large N. For more details, see the proof and derivation of the bound on the remainder.

    Best Linear and Quadratic Approximation

    Now we recall one of the basic truths of mathematics: The nicest functions are the linear functions. After that, the nicest are the quadratic functions; i.e., the second degree polynomials.

    In two variables, a function h(x,y) is linear in case it has the form:

    h(x,y) = a*x + b*y + c

    for constants a, b and c.

    Notice that

    h(x,y) = a*(x - x0) + b*(y - y0) + c

    is also linear because the terms can be regrouped into the specified form.

    Linear functions are so much easier to work with than non-linear functions that we often want to approximate non-linear functions with linear ones. And, as we will see soon when we study Newton's method in several variables, we can get answers of arbitrarily good accuracy this way.

    But to do this well, we generally need to choose the best linear approximation. This is given to us by Taylor's formula, just as it is in one dimension.

    Definition of Approximation of Functions

    Let h(x,y) and f(x,y) be two functions. Then if

    |f(x,y) - h(x,y)|

    gets closer and closer to 0, as (x,y) gets closer and closer to (x0,y0), then we say that h(x,y) approximates f(x,y) at the point (x0,y0).

    Now clearly any linear function k(x,y) of the form

    k(x,y) = dot((a,b),(x-x0,y-y0)) + f(x0,y0)

    approximates f(x,y) at the point (x0,y0). But there is something special about the case (a,b) = Gradf(x0,y0): This choice gives us the best linear approximation to f(x,y) at (x0,y0).

    To explain this, we have to say what it means for one approximation to be better than another.

    A formal definition will follow, but let's try to grasp the point with an example first.

    Consider the function

    f(x,y) = x^2 + y^2

    And let's compute the Taylor expansion at (1,1). This is:

    f(x,y) = 2 + 2*(x-1) + 2*(y-1) + R1 = 2*x + 2*y - 2 + R1

    The best linear approximation is therefore

    h(x,y) = 2 + 2*(x-1) + 2*(y-1)

    Now let's consider another function

    k(x,y) = 2 + 3*(x-1) + 2*(y-1)

    Where we've changed one of the coefficients. Note that

    k(1,1) = h(1,1) = f(1,1) = 2

    so both k(x,y) and h(x,y) are approximations to f(x,y) at (1,1).

    However,

    |f(x,y)-h(x,y)| = |R1|

    while

    |f(x,y)-k(x,y)| = |(x-1) - R1|

    Now, which of these discrepancies is bigger?

    Well, for (x,y) close to (1,1) it is going to be the second one. Consider

    (x,y) = (1.000001,1) = (1 + 10^-6,1)

    Then it is easy to see from the bound on the size of R1 in our theorem that

    |R1| < 10^-11 but that

    |(x-1) - R1| is almost equal to 10^-6

    Here is the point: The size of R1 is at most a fixed multiple of the square of the distance from the base point. If we modify any of the linear coefficients in the linear function we get by truncating the Taylor expansion, we will get an error that is proportional to the distance at some points close by. (Note that this happened for (x,y) = (1.000001,1) in our example, but wouldn't have for (x,y) = (1,1.000001).)

    Now when the distance is small, it is better to have an error going like the square of the distance, since the square of a small number is really small -- as with 10^-6 and 10^-12.
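The numbers in this example are easy to reproduce. The sketch below uses Python's exact rational arithmetic (the standard fractions module) so that quantities as small as 10^-12 are not blurred by rounding; the variable names are ours.

```python
from fractions import Fraction

def f(x, y):
    return x**2 + y**2

def h(x, y):
    # Truncated Taylor expansion of f at (1,1).
    return 2 + 2 * (x - 1) + 2 * (y - 1)

def k(x, y):
    # Same as h, but with the x coefficient changed from 2 to 3.
    return 2 + 3 * (x - 1) + 2 * (y - 1)

eps = Fraction(1, 10**6)

# Step away from (1,1) in the x direction.
err_h_x = abs(f(1 + eps, 1) - h(1 + eps, 1))
err_k_x = abs(f(1 + eps, 1) - k(1 + eps, 1))

# Step away from (1,1) in the y direction instead.
err_h_y = abs(f(1, 1 + eps) - h(1, 1 + eps))
err_k_y = abs(f(1, 1 + eps) - k(1, 1 + eps))

print(float(err_h_x), float(err_k_x))
print(float(err_h_y), float(err_k_y))
```

In the x direction the error of k is about a million times larger than the error of h, while in the y direction the two errors coincide, because the coefficient we changed multiplies (x-1), which is zero there.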

    That is the idea. If you understand that, you don't really need the definition that follows.

    Definition of Better Approximation

    Let h(x,y) and k(x,y) be two functions that approximate f(x,y) at the point (x0,y0). Let R > 0 be a given radius, and let H(R) denote the "worst case" disagreement between h(x,y) and f(x,y) for (x,y) in the disk of radius R centered at (x0,y0). That is, H(R) is the maximum value of |f(x,y) - h(x,y)| for (x,y) in the disk of radius R centered at (x0,y0). Let K(R) be the corresponding quantity for the function k(x,y).

    Then h(x,y) is a better approximation to f(x,y) at (x0,y0) than k(x,y) is provided there is an R0 > 0 such that

    H(R) < K(R) for all R < R0.
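The quantities H(R) and K(R) can be estimated by sampling the disk. The sketch below is a rough illustration for the example f(x,y) = x^2 + y^2 at (1,1) with the functions h and k used earlier; the helper worst_case_error and its sampling parameters are our own invention. Since it only looks at finitely many points, it can underestimate the true maximum.

```python
import math

def f(x, y):
    return x**2 + y**2

def h(x, y):
    return 2 + 2 * (x - 1) + 2 * (y - 1)

def k(x, y):
    return 2 + 3 * (x - 1) + 2 * (y - 1)

def worst_case_error(func, approx, x0, y0, R, n_r=50, n_theta=360):
    # Largest sampled value of |func - approx| over the disk of radius R
    # centered at (x0, y0), using a polar grid of sample points.
    worst = 0.0
    for i in range(n_r + 1):
        r = R * i / n_r
        for j in range(n_theta):
            theta = 2 * math.pi * j / n_theta
            x = x0 + r * math.cos(theta)
            y = y0 + r * math.sin(theta)
            worst = max(worst, abs(func(x, y) - approx(x, y)))
    return worst

for R in [0.5, 0.1, 0.01]:
    H = worst_case_error(f, h, 1.0, 1.0, R)
    K = worst_case_error(f, k, 1.0, 1.0, R)
    print(R, H, K)
```

For this example one can work out by hand that H(R) = R^2 while K(R) = R + R^2, so H(R) < K(R) for every R > 0, and the sampled values should reflect that.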

    Of course we say that a function is the best approximation if it is better than any other. We get the best approximations, in the sense defined above, by simply throwing out the remainder terms in Taylor's theorem:

    Theorem on the Best Linear Approximation

    The best linear approximation to f(x,y) at the point (x0,y0) is

    h(x,y) = a*(x - x0) + b*(y - y0) + c

    with

    a = fx(x0,y0)

    b = fy(x0,y0)

    c = f(x0,y0)

    The proof of this theorem is just a simple analysis of the error term in Taylor's Theorem. Similarly, we have

    Theorem on the Best Quadratic Approximation

    The best quadratic approximation to f(x,y) at the point (x0,y0) is

    h(x,y) = A*(x - x0)^2 + B*(y - y0)^2 + C*(x - x0)*(y - y0) + a*(x - x0) + b*(y - y0) + c

    with

    2*A = fxx(x0,y0)

    2*B = fyy(x0,y0)

    C = fxy(x0,y0)

    and a, b and c are as above.
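The recipe in these two theorems is mechanical, so it is easy to put into code. In the sketch below the helper best_quadratic and the sample function f(x,y) = x*y^2 with base point (1,2) are assumptions made for illustration; the partial derivatives are supplied by hand.

```python
def best_quadratic(f0, fx, fy, fxx, fxy, fyy, x0, y0):
    # Coefficients exactly as in the theorem: 2*A = fxx, 2*B = fyy, C = fxy,
    # and a = fx, b = fy, c = f, all evaluated at the base point (x0, y0).
    A = fxx / 2
    B = fyy / 2
    C = fxy
    a, b, c = fx, fy, f0

    def q(x, y):
        dx, dy = x - x0, y - y0
        return A * dx**2 + B * dy**2 + C * dx * dy + a * dx + b * dy + c

    return q

def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x * y**2

# At (1,2): f = 4, fx = y^2 = 4, fy = 2*x*y = 4,
#           fxx = 0, fxy = 2*y = 4, fyy = 2*x = 2.
q = best_quadratic(4, 4, 4, 0, 4, 2, 1, 2)

# Compare f and q at a nearby point.
x, y = 1.1, 2.1
print(f(x, y), q(x, y), f(x, y) - q(x, y))
```

For this sample function the difference f - q works out by hand to (x-1)*(y-2)^2, which is cubic in the distance from the base point, as the second order theorem leads us to expect.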

    In the next section of the notes we will study the geometry of best linear approximations to f(x,y) -- and see the connection with tangent planes to the graph of f(x,y).

    WORKED PROBLEMS

    Worked Problem 1

    Let

    f(x,y) = x^2 - y^2 - 2*x - 2*y

    Notice that this function is quadratic, so you should already have a pretty good idea about what the best linear and quadratic approximations are.

  • (a) Find the best linear approximation at (0,0)
  • (b) Find the best quadratic approximation at (0,0)
  • (c) Find the best linear approximation at (1,-1)
  • (d) Find the best quadratic approximation at (1,-1)

    Solution to Worked Problem 1

    One easily computes that

    f(0,0) = 0,  fx(0,0) = -2,  fy(0,0) = -2,  fxx(0,0) = 2,  fyy(0,0) = -2,  fxy(0,0) = 0

    Therefore, the best linear approximation h(x,y) is

    h(x,y) = -2*x - 2*y

    and the best quadratic approximation q(x,y) is

    q(x,y) = x^2 - y^2 - 2*x - 2*y

    which is just f(x,y) as you should have expected.

    For the next point, (1,-1), note that fx(x,y) = 2*x - 2 and fy(x,y) = -2*y - 2 both vanish there, so one easily computes that

    f(1,-1) = 0,  fx(1,-1) = 0,  fy(1,-1) = 0,  fxx(1,-1) = 2,  fyy(1,-1) = -2,  fxy(1,-1) = 0

    Therefore, the best linear approximation at (1,-1), h(x,y), is

    h(x,y) = 0

    and the best quadratic approximation at (1,-1), q(x,y), is

    q(x,y) = (x-1)^2 - (y+1)^2

    which is another way of writing f(x,y).

    Worked Problem 2

    Let

    f(x,y) = x^3 + 2*x*y + y^2 - 1

    Notice that this function is not quadratic, but perhaps you already have a pretty good idea about what the best linear and quadratic approximations are.

  • (a) Find the best linear approximation at (0,0)
  • (b) Find the best quadratic approximation at (0,0)
  • (c) Find the best linear approximation at (1,-1)
  • (d) Find the best quadratic approximation at (1,-1)

    Solution to Worked Problem 2

    One easily computes that

    f(0,0) = -1,  fx(0,0) = 0,  fy(0,0) = 0,  fxx(0,0) = 0,  fyy(0,0) = 2,  fxy(0,0) = 2

    Therefore, the best linear approximation h(x,y) is

    h(x,y) = -1

    and the best quadratic approximation q(x,y) is

    q(x,y) = y^2 + 2*x*y - 1

    which is just what you should have expected.
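It is instructive to watch how fast the errors of these two approximations shrink as we approach (0,0). The sketch below evaluates f, h and q along the diagonal points (t,t), a choice of ours made only for illustration, using exact rational arithmetic.

```python
from fractions import Fraction

def f(x, y):
    return x**3 + 2 * x * y + y**2 - 1

def h(x, y):
    # Best linear approximation at (0,0) from the solution above.
    return -1

def q(x, y):
    # Best quadratic approximation at (0,0) from the solution above.
    return y**2 + 2 * x * y - 1

for t in [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]:
    err_h = abs(f(t, t) - h(t, t))   # equals t^3 + 3*t^2
    err_q = abs(f(t, t) - q(t, t))   # equals t^3
    print(float(t), float(err_h), float(err_q))
```

Each time t shrinks by a factor of 10, the error of h shrinks by a factor of roughly 100, while the error of q shrinks by a factor of exactly 1000. This is the quadratic versus cubic behavior of the remainders R1 and R2.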

    For the next point, (1,-1), one easily computes that

    f(1,-1) = -1,  fx(1,-1) = 1,  fy(1,-1) = 0,  fxx(1,-1) = 6,  fyy(1,-1) = 2,  fxy(1,-1) = 2

    Therefore, the best linear approximation at (1,-1), h(x,y), is

    h(x,y) = (x-1) - 1 = x - 2

    and the best quadratic approximation at (1,-1), q(x,y), is

    q(x,y) = 3*(x-1)^2 + (y+1)^2 + 2*(x-1)*(y+1) + (x-1) - 1

    Is this what you expected?

    POSED PROBLEMS

    Posed Problem 1

    Let

    f(x,y) = x^4*y + y^4*x - 2*x*y + 3

    Notice that this function is not quadratic, but perhaps you already have a pretty good idea about what the best linear and quadratic approximations are.

  • (a) Find the best linear approximation at (0,1)
  • (b) Find the best quadratic approximation at (0,1)
  • (c) Find the best linear approximation at (1,1)
  • (d) Find the best quadratic approximation at (1,1)
  • (e) The function f(x,y) has a symmetry property; namely that for all x and y, f(y,x) = f(x,y). This is invariance under reflection about the line y=x: The value of the function is the same at a point and its reflection. Do the linear and quadratic approximations you found above have this symmetry? Can you explain this?
  • (f) Could you have solved parts (a) through (d) just using algebra?

    Posed Problem 2

    Let

    f(x,y) = e^(x^2 + y^2)

    Notice that this function is not a polynomial, so it is harder to find the approximations by algebra.

  • (a) Find the best linear approximation at (0,1)
  • (b) Find the best quadratic approximation at (0,1)
  • (c) Find the best linear approximation at (2,1)
  • (d) Find the best quadratic approximation at (2,1)