Result forwarding (Terje Mathisen)

From: Terje Mathisen <terje.mathisen@hda.hydro.com>
Newsgroups: comp.arch
Subject: Re: 1teraflops cell processor possible?
Date: Tue, 02 Dec 2003 10:07:23 +0100
Message-ID: <bqhkpe$80c$1@osl016lin.hda.hydro.com>

Jan C. Vorbrüggen wrote:
>>Two posters have stated without proof that they *know* that a streaming
>>architecture won't beat a classical architecture on realistic code.
>
> From your description, "streaming" basically means re-using operands as
> much as possible, by passing the result(s) of one FU to the next FU with
> as little intermediate storage as possible. What the "two posters" told
> you was that current microprocessors already do that if the programmer
> can make it happen, and the energy cost of reading and writing registers
> and L1 is negligible compared to other costs.

Even stronger: Since the P6, current cpus have worked as well as they do
by mostly forwarding results directly from one operation to the next,
without even making the detour via the register bank, much less L1 or
any other level of cache.
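A toy timing model can make the forwarding point concrete. This is a sketch under simplifying assumptions, not a model of the real P6: every operation takes a hypothetical `alu_latency` of one cycle, one instruction issues per cycle with unlimited execution units, and `writeback_penalty` is a made-up number of extra cycles a consumer waits when a result has to go through the register file instead of a bypass network.

```python
from typing import Dict, List, Tuple

# One instruction: (destination register, source registers).
Instr = Tuple[str, List[str]]


def completion_cycles(prog: List[Instr], alu_latency: int = 1,
                      writeback_penalty: int = 0) -> List[int]:
    """Return the cycle in which each instruction finishes.

    Hypothetical model: in-order issue, one instruction per cycle,
    unlimited execution units. writeback_penalty = 0 means results are
    forwarded straight to the next consumer; a positive value means a
    consumer must wait for a register-file write and read first.
    """
    ready: Dict[str, int] = {}  # register -> cycle a consumer may use it
    done: List[int] = []
    for issue_slot, (dest, srcs) in enumerate(prog):
        # Start when issued AND every source operand is available.
        start = max([issue_slot] + [ready.get(s, 0) for s in srcs])
        finish = start + alu_latency
        ready[dest] = finish + writeback_penalty
        done.append(finish)
    return done


# A strictly dependent chain: r1 <- r0, r2 <- r1, r3 <- r2, r4 <- r3.
chain: List[Instr] = [("r1", ["r0"]), ("r2", ["r1"]),
                      ("r3", ["r2"]), ("r4", ["r3"])]

if __name__ == "__main__":
    print(completion_cycles(chain))                       # with bypass
    print(completion_cycles(chain, writeback_penalty=2))  # via register file
```

With forwarding the chain finishes at cycles 1, 2, 3, 4; with the hypothetical two-cycle detour through the register file it stretches to 1, 4, 7, 10, so every link in a dependency chain pays the penalty again.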

I.e. for the last 8 years, our cpus have effectively been turning
regular programs into dataflow graphs.
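The "programs into dataflow graphs" step is essentially register renaming: every write gets a fresh physical tag, so only true producer-to-consumer dependencies remain. The sketch below is a simplified illustration, not any particular CPU's rename logic; the `pN` tag names, the opcode strings and the example program are all hypothetical.

```python
from typing import Dict, List, Tuple

# One instruction: (opcode, destination register, source operands).
Instr = Tuple[str, str, List[str]]


def to_dataflow(prog: List[Instr]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Rename each write to a fresh tag pN (N = instruction index) and
    return (renamed instructions, true-dependence edges producer->consumer).

    Reusing an architectural register name (WAR/WAW hazards) creates no
    edge, because each new write simply gets its own tag.
    """
    latest: Dict[str, int] = {}  # arch register -> index of latest writer
    renamed: List[str] = []
    edges: List[Tuple[int, int]] = []
    for i, (op, dest, srcs) in enumerate(prog):
        tags: List[str] = []
        for s in srcs:
            if s in latest:
                edge = (latest[s], i)
                if edge not in edges:  # one edge per producer/consumer pair
                    edges.append(edge)
                tags.append(f"p{latest[s]}")
            else:
                tags.append(s)  # live-in value, produced outside this block
        latest[dest] = i
        renamed.append(f"p{i} = {op} {', '.join(tags)}")
    return renamed, edges


# r1 is reused at index 2, yet that load does not depend on indices 0-1.
prog: List[Instr] = [
    ("load", "r1", ["a"]),
    ("add", "r2", ["r1", "r1"]),
    ("load", "r1", ["b"]),
    ("mul", "r3", ["r1", "r1"]),
    ("add", "r4", ["r2", "r3"]),
]

if __name__ == "__main__":
    ops, deps = to_dataflow(prog)
    print("\n".join(ops))
    print(deps)
```

The resulting edges are (0,1), (2,3), (1,4) and (3,4): the second load has no incoming edge, so an out-of-order core is free to run it alongside the first load and add even though the source text reuses `r1`.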

The increasing imbalance between computation and transportation costs
(not just memory access, AKA "The Memory Wall"), in time & energy, will
force this trend to continue.

It is _not_ as if nobody cares about RM's main beef.

Terje

--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"