From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Mon, 22 Dec 2003 23:55:53 GMT
Message-ID: <20031222.185553.916@pkmfgvm4.POK.IBM.COM>
In article <20031221182639.26181.00001280@mb-m15.aol.com>,
on 21 Dec 2003 23:26:39 GMT,
rmyers1400@aol.com (Rmyers1400) writes:
<snip>
>But it all begins with knowing there is a potential problem, and interval
>arithmetic is the only routine way I know to identify a potential problem.
This is so wrong.
First, interval arithmetic is not a routine way to identify
problems. It is almost never used for this and for good reasons.
Second, there are routine ways to identify problems.
These include
1. Running test cases for which the answer is known.
2. Running test cases with slightly perturbed input data.
3. Running test cases in higher precision.
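A sketch of method 2 in Python (the model function, the input data
and the perturbation size are all made up for illustration): perturb
the inputs at the level of their uncertainty, rerun, and look at the
spread of the outputs.

    import random

    def model(xs):
        # stand-in for the real computation under test
        return sum(x * x for x in xs) / len(xs)

    random.seed(0)
    inputs = [float(i) / 7.0 for i in range(1, 101)]
    baseline = model(inputs)

    # Perturb each input by about 1 part in 10**12 and rerun.
    spread = []
    for trial in range(20):
        perturbed = [x * (1.0 + random.uniform(-1e-12, 1e-12))
                     for x in inputs]
        spread.append(abs(model(perturbed) - baseline))

    # An output spread wildly larger than the input perturbation
    # suggests an ill-conditioned problem or an unstable algorithm.
    print("max relative change: %.3e" % (max(spread) / abs(baseline)))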
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Tue, 23 Dec 2003 03:40:01 GMT
Message-ID: <20031222.224001.346@pkmfgvm4.POK.IBM.COM>
In article <hs2fuvk4ebigdpbrnmi2ec7cb0pnhaiehg@4ax.com>,
on Mon, 22 Dec 2003 19:38:34 -0500,
Robert Myers <rmyers@rustuck.com> writes:
>On Mon, 22 Dec 2003 23:55:53 GMT, jbs@watson.ibm.com wrote:
>
>>In article <20031221182639.26181.00001280@mb-m15.aol.com>,
>> on 21 Dec 2003 23:26:39 GMT,
>> rmyers1400@aol.com (Rmyers1400) writes:
>>
>> <snip>
>>
>>>But it all begins with knowing there is a potential problem, and interval
>>>arithmetic is the only routine way I know to identify a potential problem.
>>
>> This is so wrong.
>> First, interval arithmetic is not a routine way to identify
>>problems. It is almost never used for this and for good reasons.
>> Second, there are routine ways to identify problems.
>> These include
>> 1. Running test cases for which the answer is known.
>> 2. Running test cases with slightly perturbed input data.
>> 3. Running test cases in higher precision.
>
>Perhaps I should have said it is the only automatic way I know of to
>identify problems. The way that most people do numerical analysis
>corresponds to running a compiler with all the warning flags turned
>off. For Terje, that might be a safe practice. For most people, it
>wouldn't.
>
>Who knows what kinds of problems you are used to doing and under what
>kind of pressure. What we're talking about here are huge applications
>of critical importance often being run by people with little or no
>insight into the numerics and often under considerable time pressure.
>No time to do sensitivity analysis, no time to take the code apart, no
>time to do anything but put the code on the computer and run it.
Interval arithmetic is useless here. You do not have time
to run the problem with interval arithmetic, which is certainly not
currently automatic. Not that it matters much, since you know it
will report error bounds of (-Inf,Inf) regardless of the actual
errors.
>If you don't think that engineering models are used that way, you
>didn't pay attention to the most recent incident with the shuttle.
>That was a small model, but some of the models are not small.
>
>You've got thousands, hundreds of thousands of lines of code, you're
>running it under circumstances you've not encountered before, and
>you're not sitting in ibm's watson labs.
I don't use tests which will always report the code is
broken even when it isn't.
The lesson of the shuttle is you will get nowhere telling
management there might be a problem. You had best be able to prove there
is a problem. If you run the code on a test problem where the answer
is known and the code gets the wrong answer you have a chance of
convincing management that the code might have a problem. If on the
other hand the code gets the right answer on all the test cases but
interval arithmetic says the error bounds are (-Inf,Inf) management
is unlikely to do anything.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Wed, 24 Dec 2003 01:32:58 GMT
Message-ID: <20031223.203258.441@pkmfgvm4.POK.IBM.COM>
In article <6N2Gb.94$n13.100704@news.uswest.net>,
on Tue, 23 Dec 2003 16:51:37 -0500,
jonah thomas <j2thomas@cavtel.net> writes:
>jbs@watson.ibm.com wrote:
>
>> I believe interval arithmetic will almost always give bounds
>> of (-inf,inf) for large real codes even when those codes are completely
>> correct.
>
>A quick lit search shows this opinion repeated several times, though
>never by people who are actively doing interval arithmetic.
>
>That's completely plausible. People who believe that will quit doing
>interval arithmetic if they ever did. Meanwhile some people keep
>publishing about it, and they talk like they think it works. Are all
>those people merely fooling themselves?
Interval arithmetic is not a new idea. It has been around
for many decades. It is the sort of idea which initially seems
plausible and promising but which just doesn't work. I believe it
is now the consensus among experts that it is unlikely to ever work.
I suspect recent publications tend to be on very narrow applications.
I don't think anyone has any plausible roadmap for making interval
arithmetic generally useful. There are claims that if interval
arithmetic is made easier to use, a lot more research will be done on
it and that this research will magically solve all of the problems
that are holding interval arithmetic back. I believe these claims
are wishful thinking with no rational basis.
>You mentioned gaussian elimination with pivoting. Do you have other
>examples that gave too-wide bounds for you? Of course the people who
>tried it and gave up won't have the most experience, but their
>experience should still count more than people who just repeat urban
>legends.
How about some examples of nontrivial real world algorithms
for which interval arithmetic gives useful bounds (preferably with
modest effort)? I think after decades of failure the burden of
proof is now on the interval arithmetic cultists.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Tue, 23 Dec 2003 03:57:25 GMT
Message-ID: <20031222.225725.478@pkmfgvm4.POK.IBM.COM>
In article <o04fuv0ogbceiu9h6dmidvl0cfn3o1q7cg@4ax.com>,
on Mon, 22 Dec 2003 20:00:34 -0500,
Robert Myers <rmyers@rustuck.com> writes:
>On Mon, 22 Dec 2003 23:51:55 GMT, jbs@watson.ibm.com wrote:
>
>>In article <20031221082254.26181.00001229@mb-m15.aol.com>,
>> on 21 Dec 2003 13:22:54 GMT,
>> rmyers1400@aol.com (Rmyers1400) writes:
>>>>Subject: Re: Sun researchers: Computers do bad math ;)
>>>>From: zghuo@ncic.ac.cn (Andrew Huo)
>>>>Date: 12/21/2003 7:39 AM Eastern Standard Time
>>>>Message-id: <bs449a$hd6$1@news.yaako.com>
>>>>
>>>>Rounding is a fundamental problem for any radix.
>>>>Could we use 256, even 512 bit floating-point unit?
>>>>
>>>Properly-implemented interval arithmetic would provide absolute bounds on the
>>>rounding error in a calculation, and it would tell you if you needed to use a
>>>greater degree of precision, whether the needed precision was supported in
>>>hardware or not. That's a much bigger win than just implementing wider
>>>floating point units.
>>
>> Interval arithmetic on real problems will generally give an
>>error bound of (-Inf,Inf) which tells you absolutely nothing.
>
>That has happened with code you run, and it doesn't trouble you? You
>made no effort to find out where the error bound got out of control
>and why?
>
>>Wider
>>floating point is far more useful which is why it is far more widely
>>available.
>>
>
>And what happens to the error bound when you widen the precision?
>Still [-inf,+inf], then you still have a problem.
This is not how you use wider precision. If the double
precision (64 bit) answer agrees with the extended precision (128
bit) answer to 12 decimal places, it is likely (but not certain)
that the double precision answer is correct to 12 decimal places.
If on the other hand the double precision answer and the
extended precision answer agree to 0 decimal places, then you know
you have a problem.
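As a sketch of the technique (Python, with the decimal module at
34 digits standing in for 128-bit hardware; the expression is an
arbitrary example known to lose digits in double precision):

    from decimal import Decimal, getcontext

    def approx_e(one, n):
        # (1 + 1/n)**n, evaluated in whatever arithmetic `one` lives in
        return (one + one / n) ** n

    n = 10 ** 8
    double = approx_e(1.0, n)          # IEEE double (53 bits)
    getcontext().prec = 34             # roughly quad precision
    extended = approx_e(Decimal(1), n)

    # Decimal(double) converts the float exactly, so this measures the
    # true disagreement between the two runs.
    rel = abs(Decimal(double) - extended) / extended
    print("relative disagreement: %.1e" % float(rel))

Here the two runs agree to only about 8 decimal places, so only
about 8 digits of the double precision answer deserve any trust.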
>What Tom Womack said, I will agree with: the law of large numbers
>generally lets you get a lot more than what the worst case analysis
>would predict, but you are counting on luck.
No, you are not counting on luck. Interval arithmetic
assumes errors are not correlated so in the worst case they will
always reinforce. But in many real algorithms the errors are
correlated in such a way that they will tend to cancel out as
the calculation proceeds. There is no luck involved.
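The effect shows up even in a toy Python experiment (the data are
arbitrary random numbers): the worst case charges roughly one rounding
error per addition, but the observed error is orders of magnitude
smaller because the individual errors partly cancel.

    import random
    from fractions import Fraction

    random.seed(1)
    xs = [random.uniform(0.0, 1.0) for _ in range(100000)]

    total = 0.0
    for x in xs:
        total += x                        # float64, rounding at every add

    exact = sum(Fraction(x) for x in xs)  # Fraction(float) is exact
    actual = abs(Fraction(total) - exact) / exact

    eps = 2.0 ** -53
    worst = len(xs) * eps                 # the worst-case style charge
    print("worst-case relative bound: %.1e" % worst)         # ~1e-11
    print("observed relative error:   %.1e" % float(actual))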
>You can run sensitivity analyses if you have the time, but you still
>have to pull the code apart and find out where the poorly conditioned
>problem is.
Not if there is no poorly conditioned problem.
>The compiler warning is a fairly good analogy: with interval
>arithmetic you would wind up doing some work juggling arithmetic to
>get more reasonable error bounds without really changing the
>reliability of the calculation. That is to say, you will respond to
>a lot of false alarms. Just as with compiler warnings, it's the case
>you fix that isn't a false alarm that counts.
Modifying code to eliminate compiler warnings is usually
easy even when there is no real problem.
Modifying code to get rid of false interval arithmetic
alarms is generally extremely difficult.
>Back to the case I mentioned: running an engineering model under
>pressure. Even if there is no time to fix the problem, wouldn't you
>rather know that there might be a problem, and that the prediction
>can't necessarily be counted on (conservative) than just to hope for
>the best? That wouldn't have happened in the shuttle orbiter
>incident, but if it had been such a model, and you decided that you
>couldn't rely on it, the story might have turned out differently.
>Same story with the dead Americans in Saudi Arabia.
Almost any real code might have problems. Interval
arithmetic isn't telling you anything you don't already know.
>Hope everything is safe and comfy at the labs.
You got something against my employer?
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Tue, 23 Dec 2003 21:06:44 GMT
Message-ID: <20031223.160644.742@pkmfgvm4.POK.IBM.COM>
In article <AEZFb.23$s04.29029@news.uswest.net>,
on Tue, 23 Dec 2003 11:01:08 -0500,
jonah thomas <j2thomas@cavtel.net> writes:
>Nick Maclaren wrote:
>> jbs@watson.ibm.com writes:
>> |> Robert Myers <rmyers@rustuck.com> writes:
>
>> |> >What Tom Womack said, I will agree with: the law of large numbers
>> |> >generally lets you get a lot more than what the worst case analysis
>> |> >would predict, but you are counting on luck.
>
>The question here is, what are the odds? If you can rely on the odds of
>an error to be less than the odds of an undetected hardware error, then
>what's the harm? If you have a good way to estimate the odds and
>they're well enough in your favor then you're set. But you're counting
>ignorantly on luck when you hope you luck out and you can't guess the odds.
>
>> |> No, you are not counting on luck. Interval arithmetic
>> |> assumes errors are not correlated so in the worst case they will
>> |> always reinforce. But in many real algorithms the errors are
>> |> correlated in such a way that they will tend to cancel out as
>> |> the calculation proceeds. There is no luck involved.
>
>If you can prove that your errors are correlated that way then in that
>particular case interval arithmetic is clearly inappropriate.
This is provably the case for Gaussian elimination with
pivoting and I believe other common algorithms. There may also be
cases where it is believed to be the case but this has not been
proven. Note people were using Gaussian elimination to accurately
solve large linear systems before it was understood why it works
and when some smart people (including, I believe, von Neumann) believed
based on an interval arithmetic type analysis that it could not
achieve accurate results.
<snip>
>How about this technique
>
> 5. The thousand monkeys approach -- Use Mathematica to state the
>problem in a variety of different ways. Give it to several independent
>NA-challenged programmers who code different solutions. Run them all.
>Since the big errors are not common, majority-rule will eliminate most
>of them -- provided we can assume the programmers are not correlated.
Unfortunately studies have shown programmers are correlated.
However checking against independent implementations that are
supposed to be doing the same thing could certainly help. I don't
know how cost effective it is however.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Wed, 24 Dec 2003 20:46:06 GMT
Message-ID: <20031224.154606.998@pkmfgvm4.POK.IBM.COM>
In article <7_iGb.51$D35.10525@news.uswest.net>,
on Wed, 24 Dec 2003 11:17:47 -0500,
Jonah Thomas <j2thomas@cavtel.net> writes:
<snip>
>A good numerical analyst can spend a good long time and get solid error
>estimates for a Fortran routine that's, say, 200 lines long.
>
>It isn't uncommon to have Fortran projects a million lines long. Is
>there any possibility of sanity in this?
Sure, as long as it is easy to check the results.
Consider weather forecasting codes. I don't know if they are
a million lines long but they are certainly much more than a couple of
hundred lines long and I doubt they have rigorous error estimates.
Nevertheless they provide useful results.
On the other hand if the results cannot be easily checked
(climate forecasting codes for example) then there is every reason
to be skeptical about the validity of the results. However this
isn't really due to the use of floating point; I would have no more
confidence in the validity of a million line purely integer code
whose results cannot be easily checked.
Problems due to finite precision arithmetic are hardly the
sole or even dominant source of error in large floating point
computations.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Tue, 23 Dec 2003 20:41:29 GMT
Message-ID: <20031223.154129.849@pkmfgvm4.POK.IBM.COM>
In article <qg1guvs63eh1g7ounrpf51thv28b85e6e3@4ax.com>,
on Tue, 23 Dec 2003 04:35:23 -0500,
Robert Myers <rmyers@rustuck.com> writes:
>On Tue, 23 Dec 2003 03:57:25 GMT, jbs@watson.ibm.com wrote:
>
>>In article <o04fuv0ogbceiu9h6dmidvl0cfn3o1q7cg@4ax.com>,
>> on Mon, 22 Dec 2003 20:00:34 -0500,
>> Robert Myers <rmyers@rustuck.com> writes:
>
><snip>
>
>>
>>>What Tom Womack said, I will agree with: the law of large numbers
>>>generally lets you get a lot more than what the worst case analysis
>>>would predict, but you are counting on luck.
>>
>> No, you are not counting on luck. Interval arithmetic
>>assumes errors are not correlated so in the worst case they will
>>always reinforce. But in many real algorithms the errors are
>>correlated in such a way that they will tend to cancel out as
>>the calculation proceeds. There is no luck involved.
>>
>If I only isolated this statement as a measure of your competence and
>judgment, I would conclude that I didn't want you to do any numerical
>analysis for me, because, as you started out responding to my comments
>about interval arithmetic, "it is so wrong."
>
>I will assume that, just like me, you occasionally turn a statement in
>which you have strong personal belief into a statement in which
>everyone should have a strong personal belief and overstate it.
>
>You are _counting_ on the errors not correlating. You don't know that
>they don't. The fact that they very often don't, and with
>overwhelming likelihood, doesn't mean that you have any certainty that
>they don't. You are counting on luck.
In the common case of Gaussian elimination with pivoting
to solve linear equations it is known that interval arithmetic
gives overly pessimistic bounds for the reason I stated. There is
no luck involved, you can decide which way to round each individual
floating point operation and you still will achieve reasonable
accuracy although the interval arithmetic error bounds may be
(-inf,inf).
<snip>
>> Almost any real code might have problems. Interval
>>arithmetic isn't telling you anything you don't already know.
>>
>It would have in Saudi Arabia.
I believe this is wrong. The problem with the Patriot
code was that the spec said the system only had to work for x
hours after being initialized. I believe the problem in question
could not occur before x hours and was not detected in testing for
that reason. It is hard to see how interval arithmetic would have
helped except in the sense that a car alarm that is always on will
in fact be on when your car is stolen.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Wed, 24 Dec 2003 01:05:03 GMT
Message-ID: <20031223.200503.319@pkmfgvm4.POK.IBM.COM>
In article <7bdhuvsgvh5ratpl0h1176ktiab48nfrt7@4ax.com>,
on Tue, 23 Dec 2003 16:48:37 -0500,
Robert Myers <rmyers@rustuck.com> writes:
>On Tue, 23 Dec 2003 20:41:29 GMT, jbs@watson.ibm.com wrote:
>
>>In article <qg1guvs63eh1g7ounrpf51thv28b85e6e3@4ax.com>,
>> on Tue, 23 Dec 2003 04:35:23 -0500,
>> Robert Myers <rmyers@rustuck.com> writes:
>>>On Tue, 23 Dec 2003 03:57:25 GMT, jbs@watson.ibm.com wrote:
>>>
>
><snip>
>>
>> In the common case of Gaussian elimination with pivoting
>>to solve linear equations it is known that interval arithmetic
>>gives overly pessimistic bounds for the reason I stated. There is
>>no luck involved, you can decide which way to round each individual
>>floating point operation and you still will achieve reasonable
>>accuracy although the interval arithmetic error bounds may be
>>(-inf,inf).
>>
>If Gaussian elimination with pivoting were the world of numerical
>analysis, that might make your claims defensible. It isn't the world
>of numerical analysis, and I can present examples where you use the
>same inaccurate results over and over again in such a way that the
>errors are highly correlated.
So what? Of course there are examples of bad algorithms.
Of course interval arithmetic warns about them. The problem is
interval arithmetic also warns about most good algorithms as well.
How about presenting examples of nontrivial numerical methods that
interval arithmetic verifies as problem free?
<snip>
>>>> Almost any real code might have problems. Interval
>>>>arithmetic isn't telling you anything you don't already know.
>>>>
>>>It would have in Saudi Arabia.
>>
>> I believe this is wrong. The problem with the Patriot
>>code was that the spec said the system only had to work for x
>>hours after being initialized. I believe the problem in question
>>could not occur before x hours and was not detected in testing for
>>that reason. It is hard to see how interval arithmetic would have
>>helped except in the sense that a car alarm that is always on will
>>in fact be on when your car is stolen.
>>
>So there you are in the middle of the desert, and the system informs
>you: sorry, results no longer reliable. Too much roundoff error. If
>somebody had to think that the roundoff error of repeatedly adding the
>same clock tick over and over again and if they were in the habit of
>using interval arithmetic, they could just as well have provided such
>a warning in the design as simply to have concluded that it wasn't a
>problem because the system would never run so long. Who _wouldn't_
>get into that habit if they used interval arithmetic, they would
>periodically check if the results were reasonable? As it is, it isn't
>currently part of anyone's thinking except in the crudest way.
>
>So the system tells you that it is no longer reliable. If no more
>sophisticated recovery mechanism is available, you have to shut down
>under live fire and resynchronize. You might just as well, because
>the system isn't providing any protection at that point.
The system does not tell you any such thing. It tells you
interval arithmetic is no longer providing useful error bounds. It
will probably tell you this within a few seconds of the time it is
turned on. This does not mean the system is not providing protection and
should be shut down, especially just at the point it is needed.
Furthermore the error was not caused by repeatedly adding a
clock tick with accumulating error. This would not have been a problem
because only time differences matter and any drift will mostly cancel.
The problem appears to have been caused by the fact that the constant
0.1 used by the code to convert from tenths of a second (as an integer)
to seconds (as a floating point number) appeared (in effect) in the code
as two different values, as a 24 bit approximation and as a 48 bit
approximation. If either had been used consistently there would have
been no problem. However, as it was, in places the code computed
t1-t2 where t1 had been converted with one value and t2 with the other
value. This introduced an error which grew as the integer clock
counter counted up from 0 and which was large enough after 100 hours
to cause problems.
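The published analyses make the size of the effect easy to
reproduce. A Python sketch, assuming (as those accounts describe) that
the 24 bit fixed point value kept 23 fractional bits:

    from fractions import Fraction

    TENTH = Fraction(1, 10)

    def chopped(frac_bits):
        # 0.1 truncated to frac_bits binary digits after the point
        return Fraction(int(TENTH * 2 ** frac_bits), 2 ** frac_bits)

    a24 = chopped(23)          # the 24 bit value (23 fractional bits)
    a48 = chopped(47)          # the extended 48 bit value

    ticks = 100 * 3600 * 10    # integer tenth-second count at 100 hours
    # t1-t2 when t1 was converted with one value and t2 with the other:
    skew = float(ticks * (a48 - a24))
    print("time skew after 100 hours: %.2f seconds" % skew)

This reproduces the roughly 0.34 second error in the published
accounts, which at Scud closing speeds was enough to move the range
gate off the target.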
You appear somewhat insulated from the realities of actual
real world software. The code in question was a modified 20 year old
assembly program. Revising such a code to use interval arithmetic is
simply not practical.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Wed, 24 Dec 2003 19:31:50 GMT
Message-ID: <20031224.143150.565@pkmfgvm4.POK.IBM.COM>
In article <748iuv07nreahg8an9dj42r9rmu61gnfsh@4ax.com>,
on Wed, 24 Dec 2003 00:51:37 -0500,
Robert Myers <rmyers@rustuck.com> writes:
<snip>
>That doesn't mean that we should accept what you apparently regard as
>inevitable realities. DARPA, the funding agency for the work in
>question, is supposed to function like a venture capitalist: many
>failures, a few spectacular successes. The OP said that, in effect,
>the work being funded sounded like a waste of money. I disagreed,
>saying that we should be funding more of this kind of work.
>
>If the Sun work is a dead end, it won't get continued funding. That's
>how it works. No need for you and me to speculate about it. The
>project doesn't fit into a Gantt chart where a result, good or bad,
>must be delivered on schedule and things have to move forward, whether
>it makes any sense to or not.
There is plenty of evidence that interval arithmetic is a dead
end (at least as regards being generally applicable). It has been
around for decades, has received plenty of funding and never lived up
to early expectations.
Somewhat like tokamak hot fusion, another apparent dead end
which continues to receive funding for no good reason.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Tue, 23 Dec 2003 20:30:31 GMT
Message-ID: <20031223.153031.086@pkmfgvm4.POK.IBM.COM>
In article <ZQTFb.16147$Pg1.13401@newsread1.news.pas.earthlink.net>,
on Tue, 23 Dec 2003 09:24:09 GMT,
"Norbert Juffa" <juffa@earthlink.net> writes:
><jbs@watson.ibm.com> wrote in message news:20031222.185155.362@pkmfgvm4.POK.IBM.COM...
>> In article <20031221082254.26181.00001229@mb-m15.aol.com>,
>> on 21 Dec 2003 13:22:54 GMT,
>> rmyers1400@aol.com (Rmyers1400) writes:
>[...]
>> >Properly-implemented interval arithmetic would provide absolute bounds on the
>> >rounding error in a calculation, and it would tell you if you needed to use a
>> >greater degree of precision, whether the needed precision was supported in
>> >hardware or not. That's a much bigger win than just implementing wider
>> >floating point units.
>>
>> Interval arithmetic on real problems will generally give an
>> error bound of (-Inf,Inf) which tells you absolutely nothing. Wider
>> floating point is far more useful which is why it is far more widely
>> available.
>> James B. Shearer
>
>
>I don't think this is strictly true. Certainly a naive approach to interval
>arithmetic won't do much good. But improved versions have been somewhat
>successful as I recall. I remember that about 10 years ago, IBM sold software
>called ACRITH-XSC. I think it was based mostly on the work of Prof. Kulisch,
>then at the University of Karlsruhe, Germany. One basic idea was to combine
>interval arithmetic with an accurate dot-product. The other idea was to use
>variable precision arithmetic in conjunction with interval arithmetic.
>
>Certainly the ARITH-XSC software wasn't without flaws and a quick check on
>IBM's website seems to indicate that they are no longer selling the product.
>Kahan et al published a critical paper on the predecessor software ACRITH in
>the mid 1980s or thereabouts.
Myers talked about an "automatic" method. Any automatic
application of interval arithmetic will necessarily be naive. More
sophisticated approaches generally involve substantially altering
the code to allow interval arithmetic to give useful error bounds.
But this is very difficult and is seldom the best way to analyze
a code.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Wed, 24 Dec 2003 19:21:33 GMT
Message-ID: <20031224.142133.759@pkmfgvm4.POK.IBM.COM>
In article <81f0f84e.0312232117.5aa8d054@posting.google.com>,
on 23 Dec 2003 21:17:26 -0800,
bhurt@spnz.org (Brian Hurt) writes:
>"John Plank" <john plank@mail.com> wrote in message news:<brqv0d$q4i$1@hercules.btinternet.com>...
>> http://www.infoworld.com/article/03/12/17/HNbadmath1.html
>>
>
>I haven't read the 100+ messages in this thread yet. But two comments
>I can make-
>
>1) I know someone who worked on the Patriot. They helped debug that
>problem with the Patriot- basically, the Patriot leaks memory, and had
>run out of memory and thus couldn't autoengage. The only thing the
>Patriot cares about is relative times- being confused about what time
>it is by a third of a second wouldn't hurt anything. In fact, the
>person I know commented that the test Patriot they worked on generally
>thought it was sometime in early 1970, as they'd never bother to set
>the time. Being confused as to what decade it was didn't hurt.
>
>2) Who keeps time in floating point anyways? You keep time in fixed
>point- seconds or fractions of a second from some epoch.
For an account of what actually happened see "Roundoff Error
and the Patriot Missile":
http://www.siam.org/siamnews/general/patriot.htm
The GAO report is also available online:
http://www.fas.org/spp/starwars/gao/im92026.htm
Ironically the problem had already been discovered and a
fix was in the pipeline but it arrived a day late.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Fri, 26 Dec 2003 22:39:07 GMT
Message-ID: <20031226.173907.317@pkmfgvm4.POK.IBM.COM>
In article <0e2kuv4s66ph8q29p3qp22cr55843dvng4@4ax.com>,
on Wed, 24 Dec 2003 17:11:47 -0500,
Robert Myers <rmyers@rustuck.com> writes:
>On Wed, 24 Dec 2003 20:46:06 GMT, jbs@watson.ibm.com wrote:
>
><snip>
>
>> Problems due to finite precision arithmetic are hardly the
>>sole or even dominant source of error in large floating point
>>computations.
>>
>
>That's true right now, but as you move to petaflop machines, the ops
>count gets scary. A fraction of an hour past, and you've run up
>enough operations to swamp a 64-bit word with bit noise.
>
>Easiest solution (and probably what will happen): move to 128 bits.
Any move to 128 bits will be gradual as some codes will
need it more than others. Since 128 bits is hardly used at the
moment, widespread adoption is probably not imminent.
Note also that operation count is a pretty poor predictor
of whether a code will have numerical problems.
>In the meantime, just to satisfy my curiosity, I'll find a reasonable
>implementation of interval arithmetic and run it on a gasdynamics
>code. I predict that it will produce results less pessimistic than
>the ops count argument I just gave. With double precision arithmetic,
>I did just manage to get the dynamic range I needed out of that code,
>one part per million.
That would be interesting. Perhaps as a control you could
also convert it to quad (128 bit) precision and compare the effort
required and the information obtained.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Fri, 26 Dec 2003 23:30:01 GMT
Message-ID: <20031226.183001.221@pkmfgvm4.POK.IBM.COM>
In article <3fec0a6f.379626007@news.eircom.net>,
on Fri, 26 Dec 2003 10:22:10 GMT,
wallacethinmintr@eircom.net (Russell Wallace) writes:
<snip>
>That sounds like a good idea to me. What sort of problems would such a
>program have to look for in typical real-life code? What would be good
>ways of finding and solving (or suggesting solutions for) a reasonable
>percentage of these problems?
>
>Put another way: There's a standard list of basic "How to optimize
>code" things that everyone knows a compiler should do (and how to make
>a compiler do them).
>
>Is there a similar list of "How to improve the accuracy of
>floating-point calculations" things that could be used as a starting
>point for automation?
Some fortran compilers have an option (perhaps autodbl) which
attempts to automatically double the precision of a floating point
calculation. For example from 64 bits to 128 bits. For large codes
this is likely to require some manual intervention to get working. It
will find a reasonable percentage of problems due to the finite
precision of floating point arithmetic.
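(Option spellings from memory, so treat these as assumptions to be
checked against your compiler's documentation: IBM's xlf spells it
-qautodbl, and several other Fortran compilers accept -r8 to promote
default reals. Something like

    xlf -qautodbl=dbl8 bigcode.f
    ifort -r8 bigcode.f

recompiles the program with wider reals without touching the source.)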
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Mon, 29 Dec 2003 20:10:20 GMT
Message-ID: <20031229.151020.528@pkmfgvm4.POK.IBM.COM>
In article <fsvuuvsqul5al1jk2b2am85stc3ac9k29d@4ax.com>,
on Sun, 28 Dec 2003 20:39:44 -0500,
Robert Myers <rmyers@rustuck.com> writes:
>On Fri, 26 Dec 2003 22:39:07 GMT, jbs@watson.ibm.com wrote:
>
>>In article <0e2kuv4s66ph8q29p3qp22cr55843dvng4@4ax.com>,
>> on Wed, 24 Dec 2003 17:11:47 -0500,
<snip>
>> Robert Myers <rmyers@rustuck.com> writes:
>>>In the meantime, just to satisfy my curiosity, I'll find a reasonable
>>>implementation of interval arithmetic and run it on a gasdynamics
>>>code. I predict that it will produce results less pessimistic than
>>>the ops count argument I just gave. With double precision arithmetic,
>>>I did just manage to get the dynamic range I needed out of that code,
>>>one part per million.
>>
>> That would be interesting. Perhaps as a control you could
>>also convert it to quad (128 bit) precision and compare the effort
>>required and the information obtained.
>
>Oh, I know how that comparison will come out. That's why I'm sure
>that people will simply move to wider floating point as a solution.
Well, if you know wider floating point is easier to use and
gives more useful information than interval arithmetic, why have you
been pushing interval arithmetic on this newsgroup?
<snip>
>I'm a practical person, like most everybody participating in this
>newsgroup, but I don't see any point in spending money on huge
>calculations if you have no idea, and maybe don't even care, how
>accurate they are.
The government wastes a lot of money. Is this news to you?
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Mon, 29 Dec 2003 20:20:40 GMT
Message-ID: <20031229.152040.072@pkmfgvm4.POK.IBM.COM>
In article <3ff035f0.227987032@news.eircom.net>,
on Mon, 29 Dec 2003 14:12:38 GMT,
wallacethinmintr@eircom.net (Russell Wallace) writes:
>On Fri, 26 Dec 2003 23:30:01 GMT, jbs@watson.ibm.com wrote:
>
>> Some fortran compilers have an option (perhaps autodbl) which
>>attempts to automatically double the precision of a floating point
>>calculation. For example from 64 bits to 128 bits. For large codes
>>this is likely to require some manual intervention to get working. It
>>will find a reasonable percentage of problems due to the finite
>>precision of floating point arithmetic.
>
>That sounds like a great idea. There are times when you'd want to say
>"I want to try this in 128 bits, I'm willing to take the speed hit";
>and as far as labor costs go, it's about the cheapest testing method
>you can get. What manual intervention does it require? Do you mean
>things like cases where the code makes assumptions about how many
>bytes an array of numbers will take?
That is one issue. There are others. For example:
1. Explicit constants. For example 3.14159265358979324D0. It is
pretty clear to a human what should be done with this but even a human
might not immediately recognize 1.144729885849400174D0 as ln(pi).
Expecting a compiler to automatically handle this sort of thing
correctly is not realistic. (A sketch of the effect of a stale
constant follows this list.)
2. Convergence criteria. A convergence test which is appropriate
for double precision may have to be modified for quad precision.
3. Special functions. The compiler can replace a call to the
double precision exponential function with a call to a quad precision
exponential routine (if it exists which it may not for some of the
more obscure functions). But if the special function was implemented
by the user then the compiler is helpless. Just changing the
variables in the user's implementation to quad precision is unlikely
to magically give the function quad precision accuracy.
4. Really big codes are likely to have bugs which are harmless
in one environment but will show up if you start tweaking things
by trying for example to double the precision (or using a different
compiler etc.).
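Point 1 above is easy to demonstrate (a Python sketch with the
decimal module playing the role of quad arithmetic; the constants are
the whole point): a constant left at double precision silently caps
the accuracy of everything computed from it.

    from decimal import Decimal, getcontext

    getcontext().prec = 34                  # quad-like precision

    pi_old = Decimal("3.141592653589793")   # stale double precision constant
    pi_new = Decimal("3.141592653589793238462643383279503")

    r = Decimal("2.5")
    area_old = pi_old * r * r               # constant left as-is
    area_new = pi_new * r * r               # constant properly extended

    rel = abs((area_old - area_new) / area_new)
    print("error kept by the stale constant: %.1e" % float(rel))  # ~1e-16

The computation runs happily at 34 digits but never gets better than
about 16 of them.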
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Wed, 31 Dec 2003 02:06:15 GMT
Message-ID: <20031230.210615.189@pkmfgvm4.POK.IBM.COM>
In article <Fx3Ib.145$%A5.57291@news.uswest.net>,
on Mon, 29 Dec 2003 19:21:34 -0500,
jonah thomas <j2thomas@cavtel.net> writes:
>jbs@watson.ibm.com wrote:
>> wallacethinmintr@eircom.net (Russell Wallace) writes:
>>> jbs@watson.ibm.com wrote:
>
>>>> Some fortran compilers have an option (perhaps autodbl) which
>>>>attempts to automatically double the precision of a floating point
>>>>calculation. For example from 64 bits to 128 bits. For large codes
>>>>this is likely to require some manual intervention to get working. It
>>>>will find a reasonable percentage of problems due to the finite
>>>>precision of floating point arithmetic.
>
>>>That sounds like a great idea. There are times when you'd want to say
>>>"I want to try this in 128 bits, I'm willing to take the speed hit";
>>>and as far as labor costs go, it's about the cheapest testing method
>>>you can get. What manual intervention does it require? Do you mean
>>>things like cases where the code makes assumptions about how many
>>>bytes an array of numbers will take?
>
>> That is one issue. There are others. For example:
>> 1. Explicit constants. For example 3.14159265358979324D0. It is
>> pretty clear to a human what should be done with this but even a human
>> might not immediately recognize 1.144729885849400174D0 as ln(pi).
>> Expecting a compiler to automatically handle this sort of thing
>> correctly is not realistic.
>
>If you use the *same* constants but calculate to twice the precision,
>you should get the same results but with different error. That's what
>we're looking for. If we get almost the same result at different
>precision then we can hope that it isn't coincidence due to two
>different sets of roundoff errors. But if you need to do everything
>just right to get a correct result at twice the precision, how does it
>help to do it the short way first?
You are correct that you can just leave most of the constants
as is. This will not be as informative as really converting the
program to quad precision but will still catch many problems.
>You could get almost the same result by doing it at *half* the precision
>instead of double the precision. If the results are in shouting
>distance assume they're likely both right. Doing double precision
>instead of half precision only gets you a better hope that the results
>will come out close.
64 bit will often be ok while 32 bit fails, so comparing
against single precision will give a lot of false alarms. Also note
IEEE 32 bit has a very limited exponent range which will cause some
programs to fail which would otherwise be ok.
<snip>
>> 4. Really big codes are likely to have bugs which are harmless
>> in one environment but will show up if you start tweaking things
>> by trying for example to double the precision (or using a different
>> compiler etc.).
>
>Great! Fix the code. There's no reason to depend on bugs to be
>harmless, fix them and they'll be harmless in both environments.
Of course this is good practice, but this is still in some
sense a false alarm if the bug was harmless in the original
environment. In any case some manual effort will be required to
figure out what is going wrong. An annoying possibility is that
the bug is in the compiler; autodbl is probably not the most
thoroughly tested part of the compiler.
>Since nobody has claimed that doubling the precision will find all
>errors anyway, what we get is you can easily double the precision and
>find some fraction of errors. You can go to more trouble as you
>suggested and likely pick up more of them, but still not all.
>
>I haven't yet even seen a reasonable estimate of what fraction of the
>errors you can pick up by repeating the calculations in naive higher
>precision, or in sophisticated higher precision. Say it turns out that
>the naive approach will find 20% of errors, and the high-labor
>complicated double-precision approach will find 25% of errors. Then it
>wouldn't be worth it to do it the hard way.
Well I will make a SWAG that the naive approach will find
90%+ of the errors (due to use of finite precision floating point)
while the sophisticated approach will pick up 99%+. However I doubt
this sort of error represents even 10% of serious errors in large
double precision floating point codes.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Wed, 31 Dec 2003 20:50:22 GMT
Message-ID: <20031231.155022.874@pkmfgvm4.POK.IBM.COM>
In article <jCrIb.491$nC.77694@news.uswest.net>,
on Tue, 30 Dec 2003 22:44:56 -0500,
jonah thomas <j2thomas@cavtel.net> writes:
>jbs@watson.ibm.com wrote:
<snip>
>> Well I will make a SWAG that the naive approach will find
>> 90%+ of the errors (due to use of finite precision floating point)
>> while the sophisticated approach will pick up 99%+. However I doubt
>> this sort of error represents even 10% of serious errors in large
>> double precision floating point codes.
>
>No easy method will find all the conceptual errors in a model. If you
>can find 90% of the numerical-analysis errors with little effort, that's
>a very good start. If you then look at what happens in the code to
>produce those errors, that might give clues to some of the conceptual
>errors too. It at least gives everyone involved a good reason to look
>at some of the details again.
>
>The effort in converting to doubled precision will be less if everyone
>starts out with the idea they'll be doing just that. It's always easier
>to design it in than to retrofit it. And if your guess is right, that
>would be enough to find 99%+ of this sort of error. That's pretty good.
>
>How could we get a good estimate as opposed to a SWAG?
A good estimate is very hard, as much of the needed data
is proprietary (or classified or otherwise hard to obtain). And of
course it is hard to say too much about undiscovered bugs.
You could try collecting and classifying bugs which become
public for one reason or another and hope these are somewhat
representative.
Fly by wire systems have been mentioned. I believe a number
of fly by wire bugs have become public. Perhaps someone could compile
a list. I don't recall any due to finite precision. I think Airbus
has had some user interface issues and there was a story (urban
legend?) about a military plane which had problems flying over the
equator.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Sat, 3 Jan 2004 01:02:13 GMT
Message-ID: <20040102.200213.006@pkmfgvm4.POK.IBM.COM>
In article <7a3cvvcto5c3g56577l5q96aj4p5ntjfvf@4ax.com>,
on Fri, 02 Jan 2004 19:35:11 -0500,
Robert Myers <rmyers@rustuck.com> writes:
>On Fri, 2 Jan 2004 16:50:56 -0600, "Del Cecchi"
><cecchinospam@us.ibm.com> wrote:
<snip>
>>If I were to apply Interval Arithmetic and Fancy Numerical Analysis to my
>>Spice run, would it tell me that I could expect what I see, ie reasonable
>>accuracy? Or would it predict very large errors?
>>
>
>If you go on a fishing expedition in such a large model, the odds are
>you will find something going on that you didn't know about and that
>you at least think is interesting. Do you have time for fishing
>expeditions?
This is not answering the question.
Interval arithmetic alone will probably give very pessimistic
bounds. Fancy numerical analysis might do better. However in general
providing rigorous tight error estimates is very difficult and usually
not worth the trouble as less rigorous methods will work satisfactorily
in practice.
James B. Shearer
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Sat, 3 Jan 2004 23:31:05 GMT
Message-ID: <20040103.183105.047@pkmfgvm4.POK.IBM.COM>
In article <MPG.1a5ffcdbb351f50398972f@news.verizon.net>,
on Sat, 03 Jan 2004 06:11:39 GMT,
Christer Ericson <christer_ericson@NOTplayTHISstationBIT.sony.com> writes:
>In article <20040102.190048.482@pkmfgvm4.POK.IBM.COM>,
<snip>
>First, note that these quotes do _not_ address the claim I was
>objecting to: that of IA giving "(-inf,inf)" results on "real"
>problems. There's a _huge_ difference between returning
>pessimistic bounds and (-inf,inf).
The implication of the quotes I gave is that the bounds are so
pessimistic as to be useless. There is little difference between that
and (-inf,inf).
What constitutes a useless bound? It is a bound that
guarantees no significant digits. Soon after this point is reached
you will try to divide by an interval containing zero, which will
produce an interval of (-inf,inf), which will then propagate through the
rest of the calculation.
Look at GEPP (Gaussian elimination with partial pivoting).
Suppose interval arithmetic gives bounds which
grow like 2**n where n is the number of pivot steps. So using IEEE
double arithmetic, around the 53rd pivot you will try to pivot on an
interval containing zero and the calculation will blow up. So if your
matrix size is 30, interval arithmetic will predict you will lose 30
bits or 9 decimal digits, very pessimistic but still useful if you
started with 53 bits (or about 16 decimal digits). But the results for
size 60 or more will be a bunch of (-inf,inf) intervals.
For very small problems like solving a set of 30 linear
equations the exponentially growing intervals may not have time to grow
large enough to wipe out all your precision. But for realistic large
computations this will occur, followed shortly thereafter by a bunch of
(-inf,inf) intervals.
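The arithmetic of the blow-up takes only a few lines to sketch in
Python (the doubling per step is just the assumed 2**n growth above):

    import math

    # An interval around 1.0 whose width starts at one rounding error
    # and doubles at every pivot step.
    w = 2.0 ** -53
    steps = 0
    while w < 1.0:        # once w >= 1, [1-w, 1+w] contains zero
        w *= 2.0
        steps += 1
    print("interval around 1.0 first contains zero after", steps, "steps")

    def idiv(a, b):
        # interval division; a denominator straddling zero forces (-inf,inf)
        if b[0] <= 0.0 <= b[1]:
            return (-math.inf, math.inf)
        q = [a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1]]
        return (min(q), max(q))

    print(idiv((1.0, 1.0), (1.0 - w, 1.0 + w)))   # -> (-inf, inf)

Fifty-three steps, and from then on every interval downstream is
(-inf,inf).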
James B. Shearer
Bill Gorton: "How did you go bankrupt?"
Mike Campbell: "Gradually, and then suddenly."
Hemingway ("The Sun Also Rises")
From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: Sun researchers: Computers do bad math ;)
Date: Wed, 7 Jan 2004 00:59:57 GMT
Message-ID: <20040106.195957.639@pkmfgvm4.POK.IBM.COM>
In article <MPG.1a63d8af118efcd1989735@news.verizon.net>,
on Tue, 06 Jan 2004 06:36:58 GMT,
Christer Ericson <christer_ericson@NOTplayTHISstationBIT.sony.com> writes:
>In article <3ff96351$1@news.wineasy.se>, mhkristiansen@yahoo.dk says...
>> [...]
>> You replaced a stable expression with an unstable one to reduce the
>> "error" ?
>
>In this case, the "unstable" expression results in a tighter
>interval for most values, since it does not suffer from the
>overestimation caused by the dependency problem of the
>original expression. As such, it does indeed reduce the
>error.
Let x = 2**(-60) and use IEEE 64-bit arithmetic.
x = 2**(-60)
(x+1) = 1
1/(x+1) = 1
x/(x+1) = 2**(-60)
1-1/(x+1) = 0
So compared to the exact answer (2**(-60)-2**(-120) ... )
x/(x+1) computes a much more accurate answer. Now try intervals
x = ( 2**(-60) , 2**(-60) )
(x+1) = ( 1 , 1+2**(-52) )
1/(x+1) = ( 1-2**(-52) , 1 )
x/(x+1) = ( 2**(-60)-2**(-112) , 2**(-60) )
1-1/(x+1) = ( 0 , 2**(-52) )
So the interval computed by x/(x+1) is much tighter.
So your example is wrong, as I said. Furthermore it is pointless,
as more complicated dependencies cannot be eliminated in this way.
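The comparison above is easy to reproduce with a few lines of
outward-rounded interval arithmetic (a Python sketch; math.nextafter
needs Python 3.9+, and the enclosures are one ulp wider than true
directed rounding, which only strengthens the conclusion):

    import math

    def out(lo, hi):
        # push endpoints outward by one ulp: a valid, slightly wide enclosure
        return math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf)

    def iadd(a, b): return out(a[0] + b[0], a[1] + b[1])
    def isub(a, b): return out(a[0] - b[1], a[1] - b[0])

    def idiv(a, b):
        assert b[0] > 0.0 or b[1] < 0.0, "denominator contains 0"
        q = [a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1]]
        return out(min(q), max(q))

    x, one = (2.0 ** -60, 2.0 ** -60), (1.0, 1.0)

    form1 = isub(one, idiv(one, iadd(x, one)))   # 1 - 1/(x+1)
    form2 = idiv(x, iadd(x, one))                # x/(x+1)
    print("width of 1-1/(x+1): %.1e" % (form1[1] - form1[0]))  # about 1e-15
    print("width of x/(x+1):   %.1e" % (form2[1] - form2[0]))  # about 1e-33

The first form's interval is wider than the value 2**(-60) itself (it
even contains zero); the second stays a few ulps wide.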
>Here's a simpler example that you might be able to understand:
>using interval arithmetic a number should be squared using
>f(x)=x^2, not f(x)=x*x. The former gives a tight bound, the
>latter an overestimating bound.
As Maclaren pointed out, it doesn't matter unless your
interval contains 0, in which case you are probably doomed anyway.
And again, this sort of thing is not going to magically make (-inf,inf)
intervals go away.
Also as others have pointed out, the compiler should be able
to do this if it is desirable.
>Perhaps you should actually study IA some before commenting
>further?
Perhaps IA advocates should study some numerical analysis.
The consensus among NA experts is that IA has little to offer.
James B. Shearer