NEWTON'S METHOD IN TWO VARIABLES

Newton's method is an approach to solving the folowing problem:

The Problem To Be Solved

Given a system of two equations

(1) f(x,y) = 0

and

(2) g(x,y) = 0

find all points (x,y) that satisfy these equations.

The method takes certain input -- namely a good educated guess for about where each solution should be.

Usually, one arrives at this guess by graphical methods. Let's take that part for granted here, and suppose we already have a point (x0,y0) which approximately solves the system (i.e., makes the right hand sides of (1) and (2) both "small". How do we get from here to a better solution; i.e. one which works to some more decimal places? This is the problem toward which Newton's method is aimed.

The key idea is this: To get a good approximate solution, we replace the exact problem above with the approximate problem

h(x,y) = 0

and

k(x,y) = 0

where h(x,y) and k(x,y) are the best linear approximations to f(x,y) and g(x,y) at the point (x0,y0).

In geometric terms, this that means h(x,y) and k(x,y) are the linear functions such that

z= h(x,y)

and

z = k(x,y)

are the equations for the tangent planes to the graphs of f and g respectively at the point (x0,y0)

Linear eqautions are easy to deal with, and the plan is to use their solution to get a better approximate solution. The hope then is that repeating this often enough, we can get as good an approximation as we require.

Now you will notice that this description of Newton's method does not contain any formulas really. This is an important point. I want to make a strong distinction here between a method and an algorithm.

There are however many ways to turn Newton's method into an algorithm. Each has its own advantages and disadvantages, so which one is the best one depends on the application. A typical good book on numerical analysis will contain five or six implementations of the method as an algortihm that can be programed on a computer.

Here we will present two implementations of Newton's method, and then we will study their relative strengths.

First Implementation of Newton's Method as an Algortihm

The most straightforward approach is the following algorithm for producing a sequence of sucessive approximations starting from the first approximation (x0,y0):

Newton's Method: Two-At-A-Time Approach

For each positive integer j define (xj,yj) to be the solution of

h(x,y) = 0

and

k(x,y) = 0

where h(x,y) and k(x,y) are the best linear approximations to f(x,y) and g(x,y) at the point (xj-1,yj-1).

In the worked problems below we will generate the first two terms in the approximating sequence by hand. We will then generate more with the aid of a computer program.

Second Implementation of Newton's Method as an Algortihm

Here is another, simpler, algorithm for producing a sequence of sucessive approximations starting from the first approximation (x0,y0): In this simpler implementation, you work with one function at a time. Here are the steps.

Start with, say, f. We want to find a good approximate solution of f(x,y) = 0. Think of f(x,y) as representing the "altitude" at the point (x,y), and suppose for the sake of the argument that this is positive -- i.e., we're above sea level. To get down to sea level as quickly as possible, you'd go down hill by the steepest slope. (Well, maybe not, that might be a cliff -- but on paper we're not troubled by that sort of detail...) So just go down hill against the gradient. That is, the steepest way down hill is opposite to the direction the gradient points, so go in this direction. Let n denote the unit vector that points in this direction.

Now we not only need to know which way to go, but how far to go in that direction. Let d denote this distance, measured "as the crow flies".

So the question that we need to answer is: what is n, and what is d?

We can easily express the answers in terms of the gradient; here's how:

First d. Well, our altitude is f(x0,y0), and we need to bring this to zero. The slope in our chosen direction is

|gradf(x0,y0)|

so the distance d we should go (slope is rise over run) is

d = f(x0,y0)/ |gradf(x0,y0)|

The unit vector n in the direction we will go is

n = -gradf(x0,y0)/ |gradf(x0,y0)|

Hence the new point should be

(x1,y1) = (x0,y0) + d n

or

(1) (x1,y1) = (x0,y0) - f(x0,y0)grad f(x0,y0)/ |gradf(x0,y0)|2

This is the key formula in this approach. We derived it assuming that f(x0,y0) was positive, so that we needed to go down hill, but it works in the other case too: if f(x0,y0) is negative, then

-f(x0,y0)grad f(x0,y0)/ |gradf(x0,y0)|2

is a positvie multiple of the gradient, so the displacement dn will be in the direction of the gradient;,i.e., the uphill direction as it should be. So the key formula (1) works no matter what the sign of f(x0,y0) is.

Now, we would have f(x1,y1) = 0 exactly if f were linear. In a region where f doesn't deviate much from being linear, i.e., where its graph is close to the tangent plane at (x0,y0), then this will be correspondingly close to being satisfied. Hence, we've got what we want for f: a point (x1,y1) at which f(x1,y1) is pretty close to zero.

So now that we've got the value of f in good shape, what about g?

Just take the new point as a fresh starting point, and apply the same procedure for g.

Then go back to f, and so on , alternating.

We summarize as follows:

Newton's Method: One-At-A-Time Approach

For each positive integer j define (xj,yj) to be given by

(xj,yj) = (xj-1,yj-1) - f(xj-1,yj-1)grad f(xj-1,yj-1)/ |gradf(xj-1,yj-1)|2

for j odd, and by and

(xj,yj) = (xj-1,yj-1) - g(xj-1,yj-1)grad g(xj-1,yj-1)/ |gradg(xj-1,yj-1)|2

for j even.

In the worked problems below. we will generate the first three terms in the approximating sequence by hand.

But first, let's see why it works. A picture helps. In the following graph, the green curve is part of the solution set of the cubic eqaution

y-x3 +2*x +1 = 0

and the blue curve is part of the circle

x2 + y2 - 3 = 0

The blue and green curves cross near to

(1.46,-1.04)

which means there is a solution near there. The graph is a close-up on this region. The dots in the picture are the succesive points generated by the "one-at-a-time" implementation of Newton's method, starting from the point off to the right of the curves. The next point, after this starting guess is the one you get to by going directly toward the green curve. The next is the one you get going from there directly toward the blue curve. And so on.

Notice how the trail of dots "bounces back and forth" in the wedge, closing in on the intersection point as it goes.

How fast the sequence converges to the crossing point -- the solution -- depends on what the angle of crossing is.

What is better -- an angle close to 0, or close to pi/2? Think about it!

Comparison of the Two Implementations

Now, how do the two methods compare?

One thing is for sure: if you are doing things "by hand", each step of the "one at a time" method is easoer to do.

But that is not such a good answer -- your'e not likely to do to many of these by hand. In fact, you want to do just enough to get a good feeling for what is going on.

To compare them in the context of a computer program, having an interactive computer program with the two implementations is a big help. But with a computer program, we can experiment painlessly!

Here is a link to an applet that will let you experiment. It is the applet used to generate this pictur up above.

The applet interface looks like this:

What you see are the two equations graphed with the functions given in the boxes below the graph. From the graph, we see there are four solutions in this example.

A radio button changes the panel on the right and lets you select the imlementation that you want to use.

At the bottom are two text fields in which you to specify the two functions to be set equal to zero. The solution curves will be graphed in the colors indicated in the squares beside the functions.

If you select the "graph range controls" radio button, you will be able to specify the region that you want graphed.

If you then select the "step button" radio button, you will bring up a panel with a -- you guessed it -- step button. You can then click on your choice of a starting point, and then press this step button to step through the sequence of points generated by whichever implementation of Newton's method you are using.

Clicking the "algorithm" radio button brings up a menu of implementations. There are four of them to choose from, including the two discussed in these notes. Ignore the third for now. The fourth is a compbination of the first two. It starts out doing the one at a time implementation, and switches to the two at a time implementation at a carefully chosen stage in the process. See the lab notes for more information on this and source code for doing your own implementation.

Try it out. You will see that the solution point you finally converge to -- when you do converge -- is not always the one that is closest to where you started.

In fact, the question of which point one finally converges to has a suprisingly interesting and complicated answer -- which depends on the implementation.

Here are some computer generated pictures that show what can happen.

These are gennerated by another applet, that you can also experiment with. Here is what is going on in them.

The applet takes the point corresponding to each pixel as a starting point. It runs the selected implementation of Newton's method until |g(x,y)| + |f(x,y)| is less than 10-9, or 80 steps have been taken. After 80 steps, it gives up then no matter what the size of |g(x,y)| + |f(x,y)| is.

If convergence didn't happen by 80 steps, it colors the pixel white, and goes on the the next one. If however |g(x,y)| + |f(x,y)| got below 10-9, it takes the point at which the iteration was stopped -- a solution point -- and compares it with a list of previously found solutions. If the point is in the list already, it was assigned a color (the first one to get listed gets red as its color, the second gets green, the third blue... and so on). If it is found on the list, the pixel gets the previosly assigned color. If not, it is added to the list and assigned a color. The pixel gets that color.

This goes on, until all the pixels are covered in a row by row left to right sweep. At the end, one has a picture like those below.

The one on the left is for the one function at a time approach. (White comes out as gray here because of the way I produced this picture...) The next one is for the two at a time approach. I'll say what the third one is a bit later. But first notice that:

  • Of the four solutions, in the first picture, the two on the left have large "basins of attraction" i.e., sets of starting points that converge to them, as seen in the large red and green regions. The red region has a stable part -- solid red -- and an unstable part that is full of white points where there was no convergence. The other two have almost non-existent basins of attraction. (You may or may not be able to see the yellow spike of convergence points coming out of the lower right solution. It shows up on my mac, but I don't see it on my sparc station... it is very thin.)
  • The two solutions curves cross at a "big" angle -- close to pi/2 at the solutions with large basins of attraction, and cross at "small" angles at the other two. This answers a question raised above about whether small angle or large angle crossings are better. Can you see why this is the answer? Obviously one should be very careful in small angle situations. Here's a harder question: how small should the angle be before things go so wrong as here?
  • In the second picture, the one with the two function method, the basins of attraction are very, very complicated. Apart from the region in the upper left hand corner, there is something interesting to notice at, say, the place off to the right where four patches of green and red meet. Look closely at the bounday. You'll see that on the boundary, there are lost of blue and yellow dots to. So something very complicated goes on when you move across the boundary from the red basin to the green basin. If you use the applet to investigate what goes on here by zooming in and looking at the structure on a finer scale, you will see something suprising.
  • The final picture is a third implementation that starts off using the one at a time appraoch, and then when it rekons it has gotten close to a solution, it switches over to the two at a time. Notice that the white "region of no convergencein 80 steps" is almost eliminated -- so this has the speed advantages of the two function method -- but that the basins of attraction are "simple and fat", which is what one wants in a basin of attraction. From this picture you see that the set of starting points such that the iteration converged quickly to the solution point that was closest to the starting point has been made much larger than with the other two.
  • WORKED PROBLEMS

    Worked Problem 1

    Let

    f(x,y) = y - x3 + 2*x =1

    and

    f(x,y) = x2 + y2 -1

  • (a) Graph the two solution curves and determine the number of solutions.
  • (b) Take (1,1) as a starting point, and work out two iterations of the two at a time implementation of Newton's method.
  • (c) Take (1,1) as a starting point, and work out two iterations (i.e, one for f and one for g) of the one at a time implementation of Newton's method.
  • Solution to Worked Problem 1

    (a) Below is a graph showing the two curves -- blue for g(x,y) = 0 and green for f(x,y) = 0. As you see, there are two solutions. The center of the circle is at the origin, and the red dot -- our starting point -- is at (1,1). If you wnat more coordinate values, go to the applet, and bring up the cursor position panel... The graph suggests that the two solutions shouls be near (0,1) and (1,0). And in fact, these are solutions! So it is not necesary to approximate them. On the other hand, right now we are trying to learn about the approximation itself, and then it is good to know the exact answer that you are approximating. That's why this problems was cooked up to be solvable in closed form.

    (b) We begine by computing the values of the functions and their gradients at the starting point:

    gradf(x,y) = (-3*x2 + 2,1)

    gradg(x,y) = (2*x,2*y)

    Therefore

    f(1,1) = 1, gradf(1,1) = (-1,1) and g(1,1) = 1, gradg(1,1) = (2,2)

    and hence, the linearization of f(x,y) at (1,1) is:

    1 - (x-1) + (y-1) = -x + y + 1

    and the linearization of g(x,y) at (1,1) is:

    1 +2*(x-1) + 2*(y-1) = 2*x + 2*y -3

    We now approximate by setting the linearizations -- instead of the functions themselves -- equal to zero. We have

    -x + y = -1

    2*x + 2*y = 3

    Adding twice the first equations to the second yields the solution

    y = 1/4 and x = 5/4

    This is one complete iteration. At the new point (5/4, 1/4) we readily compute that

    f(5/4, 1/4) = -13/64 and g(5/4, 1/4) = 5/8

    Which is an improvement -- both values got closer to zero.

    We can check this with the applet. Pushing the "step" button in the applet yields:

    The numerical results agree exactly with our "by hand" calculations. (So you see you can use the applet to check the homework!)

    On to the next iteration!!!

    We've already computed the values of the functions at the new point, so we next need the values of the gradients. We compute:

    f(5/4,1/4) = -13/64, gradf(5/4,1/4) = (-43/16,1) and g(5/4,1/4) = 5/8, gradg(5/4,1/4) = (5/2,1/2)

    and hence, the linearization of f(x,y) at (1,1) is:

    -13/64 - (43/64)(x-5/4) + (y-1/4)

    and the linearization of g(x,y) at (1,1) is:

    5/8 + (5/2)*(x-5/4) + (1/2)*(y-1/4)

    We now approximate by setting the linearizations -- instead of the functions themselves -- equal to zero. After multiplying through to get rid of the denomonators, and simplifying, we get:

    -86*x + 32*y = -93

    20*x + 4*y = 21

    Solving this, one has

    x = 261/246 and y = 17/572

    Now you notice two things: (x,y) is close to (1,0) which we know is the answer, but the fractions are getting messy! It is time to let a computer take over!

    Here is the third itteration:

    and the fourth

    and the fifth:

    After five iterations, we have four decimal places of accuracy. But notice that there was a really big improvement from the fourth to the fifth. What happens next?

    Try the applet and find out!!

    (c) We will start with f. As before

    f(1,1) = 1, gradf(1,1) = (-1,1)

    So the new point we move to (x1,y1) is

    (x1,y1) = (1,1) - (1/(1+1))*(-1,1) = (1,1) - (-1/2,1/2) = (3/2,1/2)

    That's it, the first iteration. At this point (3/2,1/2), we have

    f(3/2,1/2) = -7/8 and g(3/2,1/2) = 3/2

    So we overshot a bit, and made g worse. But then we were working only with f. So now lets work with g. We have:

    g(3/2,1/2) = 3/2, gradf(3/2,1/2) = (3,1)

    So the new point we move to (x2,y2) is

    (x2,y2) = (3/2,1/2) - ((3/2)/(9+1))*(3,1) = (3/2,1/2) - (-9/20,3/20) = (21/20,7/20)

    At this point (21/20,7/20), we have

    f(21/20,7/20) = 2339/8000 and g(21/20,7/20) = 9/40

    That is, we've reduced the size of both f and g by a factor of four in just two steps. Here are pictures generated by the applet. First, the starting point:

    Next, after the first step (Noyice that the numerical values agree exactly with our "by-hand" calculations):

    Next, after the second step (Noyice that again the numerical values agree exactly with our "by-hand" calculations):

    Next, after the third step:

    And finally, after eight (count 'em!) steps:

    So it is taking quite a while to get two decimal places of accuracy here. But each step is much simpler...

    POSED PROBLEMS

    Posed Problem 1

    For this problem, consider the general third degree polynomial in two variables. This has 10 coefficients

    a, b, c, d, e, f, g, h, i, j

    as in

    f(x,y) = a*x3 + b*x2*y + c*x*y2 + d*y3 + e*x2 + f*x*y + g*y2 + h*x + i*y + j Assign values to the coefficients as follows: Toss a coin. If it comes up heads, put

    a=1

    Otherwise, put

    a=0

    Do the same for b, c, d, e, f, g, h, i, j except with j. Here use -1 instead of 1 for heads. (Why the minus sign for j? This is in case you toss, say,

    T T T T H T H T T H

    With these rules the eqaution f(x,y) =0 becomes

    f(x,y) = x2 + y2 -1 = 0

    which has solutions, as opposed to

    f(x,y) = x2 + y2 +1 = 0

    which doesn't have solutions.)

    Now that you have the eqaution f(x,y) = 0, use the same rule to randomly specify g(x,y) = 0.

    Then

  • (a) Use the applet -- or any other graphing aid if you need one -- to graph the two solution curves, and determine the number of solutions of the system.
  • (b) Pick one solution, and pick a point (x0,y0) with integer coefficients that is close to one of the solutions -- if any -- showing in your graph. Do one iteration of the two-function implementation of Newton's method for this starting point.
  • (c) For the same (x0,y0), do two steps of the "one at a time" implementation.
  • (d) Check your answers to (b) and (c) using the applet. Then find all solutions of the system to 6 decimal places of accuracy.
  • Repeat the procedure for a new pair, again, and again,...

    There are 220 different problems that can be generated this way. That is more than one million. So you have an effectively infinite store of practice problems here! Try several. You be suprised perhaps by the variety of graphs you see.