From: firstname.lastname@example.org (John R. Mashey)
Subject: Instructions Per Clock (IPC), SPECint/Mhz .. not so useful
Date: Thu, 2 Sep 1993 00:01:27 GMT
(I've gotten a couple questions about this, some stirred up by
recent Sun claims to be "the industry leader in work per clock cycle".)
1) Once upon a time, there were a lot of arguments about CPU efficiency,
often expressed as IPC or Instructions Per Clock, with bigger = better.
The reasoning was:
a) On any given date, vendors might have different clock rates,
and performance is affected by clock rates.
b) Over time, clock rates would tend to improve for everybody at about
the same rate, simply because people tended to get the newest process
technology at about the same times, and clock rates were often driven
by access times to affordable SRAMs for off-chip L1 caches.
c) Hence, to predict how people would do in the future, without getting
misled by the back-and-forth leapfrogging of clock rates, if you could
compute an IPC measure, you could get an idea of the relative efficiencies
of different architecture+implementation+software combinations.
d) Some architectures need more instructions to accomplish the same work,
so it is hard to compare between different instruction set architectures.
e) In any case, usually only computer vendors had enough tools to know
how many instructions were actually being done.
f) Hence, it was better to measure the time for an actual benchmark,
and then use (relative performance)/Mhz. In particular, for integer
code, it turned out that SPECint/Mhz
was a good approximation to the actual IPC numbers (for MIPS R3000s,
anyway), and of course, this number was something anybody could compute.
Inside a given instruction set, IPCs are fairly interesting for computer
designers. For competitive arguing, they are relevant IF-AND-ONLY-IF
vendors tend to achieve similar clock rates over time.
In 1987, this was pretty much true: most RISCs were around 16Mhz. 68020s
(@ 25Mhz) had slightly higher clock rates (1.5X difference, but that's it).
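To make the (relative performance)/Mhz idea in f) concrete, here is a minimal sketch. The machine names, benchmark times, and clock rates are made up for illustration, and the helper names are hypothetical; only the arithmetic reflects the text above.

```python
# Sketch: relative performance per Mhz as a stand-in for IPC.
# All machines and numbers here are hypothetical, not measurements.

def relative_performance(reference_seconds: float, measured_seconds: float) -> float:
    """How many times faster than the reference machine the benchmark ran."""
    return reference_seconds / measured_seconds


def per_mhz(rel_perf: float, clock_mhz: float) -> float:
    """Relative performance divided by clock rate (the IPC-like figure)."""
    return rel_perf / clock_mhz


REFERENCE_SECONDS = 1000.0  # hypothetical time on the reference machine

machines = {
    "machine_a": {"seconds": 50.0, "mhz": 25.0},
    "machine_b": {"seconds": 40.0, "mhz": 40.0},
}

for name, m in machines.items():
    rel = relative_performance(REFERENCE_SECONDS, m["seconds"])
    print(f"{name}: relative perf {rel:.1f}, per Mhz {per_mhz(rel, m['mhz']):.3f}")
```

Anybody with a stopwatch and a data sheet can compute this, which is the whole appeal; it says nothing about whether the two machines can ever reach the same clock rate.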
2) CPU designers have a wide range of choices. Consider starting with
a base design for a CPU that ends up with N gate delays in the critical
path, and a clock rate of M Mhz. One can think of two opposite variations:
2a) Do more work in parallel per cycle, or shorten the pipeline, usually
ending up with >N gate delays, and <M Mhz.
2b) Lengthen the pipeline, or parts thereof, to get <N gate delays, and >M Mhz.
It is quite possible that with exactly the same technology, you can build 2
chips with the same performance, cost, etc ... but 2a) will have higher
IPC than 2b) ... which is pretty much IRRELEVANT, because the assumption
that design 2a) will scale up Mhz at the same rate as 2b) is simply wrong:
one design has more gate delays in the critical path, and that's why the
achieved Mhz are different.
Of course, there are many other design choices that are important,
but this philosophical choice is important. Note that really extreme
choices (do a near-infinite amount of work per near-infinitely long cycle,
or do a near-zero amount of work in a near-zero cycle) are probably not
good ideas :-)
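The 2a) versus 2b) point can be sketched with a little arithmetic. The IPC and Mhz figures below are invented for illustration (they are not measurements of any real chip), and the sketch assumes both designs run the same instruction set, so that instruction counts are comparable.

```python
# Sketch: two hypothetical designs in the same technology and the same ISA.
# Performance is modeled simply as IPC * clock rate (millions of instr/sec).

def performance(ipc: float, clock_mhz: float) -> float:
    return ipc * clock_mhz


wide = {"ipc": 1.5, "mhz": 50.0}    # style 2a: more work per (longer) cycle
deep = {"ipc": 0.75, "mhz": 100.0}  # style 2b: less work per (shorter) cycle

ipc_ratio = wide["ipc"] / deep["ipc"]
perf_ratio = performance(wide["ipc"], wide["mhz"]) / performance(deep["ipc"], deep["mhz"])
print(f"IPC ratio 2a/2b: {ipc_ratio:.2f}, performance ratio 2a/2b: {perf_ratio:.2f}")

# A process shrink speeds up gates for both designs by the same factor, so
# both clocks scale together and the Mhz gap between the designs persists.
SHRINK_SPEEDUP = 1.5  # hypothetical
wide_after = performance(wide["ipc"], wide["mhz"] * SHRINK_SPEEDUP)
deep_after = performance(deep["ipc"], deep["mhz"] * SHRINK_SPEEDUP)
print(f"after shrink: 2a = {wide_after:.1f}, 2b = {deep_after:.1f}")
```

In this toy model the IPC ratio is 2X and stays 2X forever, while the performance ratio is 1X and also stays 1X. The IPC number predicts nothing, because the clock rates never converge.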
3) Unlike 1987, these days, there are *vast* differences in clock rates
achieved with reasonably comparable technology; hence, IPC (SPECint/Mhz)
is rather less useful as a predictor than it once was. Of the major chip
families, the design style for high-end chips:
2a) RS/6000 & PPC, Pentium, SuperSPARC
2b) HP PA, MIPS R4xxx, Alpha
Note that the clock rates range from 50Mhz (.7micron, 3-metal BiCMOS
SuperSPARC) to 200Mhz (.75micron, 3-metal CMOS Alphas, including some unusual
process tweaking, of course), with .8micron, 2-metal, vanilla CMOS
R4000s getting 100Mhz (or actually, many at 120Mhz, such as in some NEC
Anyway, it is absolutely clear that people these days *do not* get the
similarity of clock rates that used to exist, and hence simple IPC
(or SPECint/Mhz) computations are not very useful any more.
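For a quick sense of how far things have drifted, the clock rates quoted in this post can be compared directly. This is a small sketch using only the Mhz figures mentioned above (the 120Mhz R4000 variants are left out); the `spread` helper is a name made up here.

```python
# Sketch: ratio of fastest to slowest clock rate, using figures from this post.

def spread(clocks_mhz: dict) -> float:
    """Fastest clock divided by slowest clock in the given set."""
    return max(clocks_mhz.values()) / min(clocks_mhz.values())


clocks_1987 = {"typical RISC": 16, "68020": 25}
clocks_1993 = {"SuperSPARC": 50, "R4000": 100, "Alpha": 200}

print(f"1987 spread: {spread(clocks_1987):.2f}X")
print(f"1993 spread: {spread(clocks_1993):.2f}X")
```

The 1987 spread is the "1.5X difference" mentioned earlier; the 1993 spread is far larger, which is why a per-Mhz number no longer tells you much about delivered performance.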
4) Finally, to make things even worse, it is an over-simplification to
think that all parts of a chip are running at the same clock rate. People
sometimes do things like having self-timed circuits to run small pieces
of the chip at multiples of the clock rate generally used. Given such
things, IPCs really become meaningless, because what exactly is the clock rate?
More often than not, one thinks of a cycle either as the time to do an
ALU operation (specifically, an integer add), or maybe, a cache access.
In general, the scalability of a design (which was what IPC was trying to get
at, i.e., what would be achieved by better processes) is *not* determined
by the number of gate delays in most paths, but by the number of gate delays
in the *slowest* path, and it only takes one. Put another way, if
you have a chip that, to run most parts at 50Mhz requires that some part
be run at 100Mhz, it is a 100Mhz chip [in terms of technology usage],
not a 50Mhz chip. In fact, this is what SuperSPARC does with its
"cascaded" ALU. That is, a 50Mhz (20ns) SuperSPARC is required to
run 2 back-to-back ADDs in 20ns, which means you need a 10-ns adder,
just like a 100Mhz R4000 or HP PA or Alpha.
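The adder arithmetic above is simple enough to sketch. The function names are hypothetical, and the model ignores latch overhead, clock skew, and everything else in the cycle besides the dependent adds; it only reproduces the 20ns / 10ns reasoning.

```python
# Sketch: how fast must the adder be if N dependent adds share one cycle?
# Simplified model: the whole cycle is divided evenly among the adds.

def cycle_time_ns(clock_mhz: float) -> float:
    return 1000.0 / clock_mhz


def required_adder_ns(clock_mhz: float, dependent_adds_per_cycle: int) -> float:
    return cycle_time_ns(clock_mhz) / dependent_adds_per_cycle


def equivalent_adder_mhz(clock_mhz: float, dependent_adds_per_cycle: int) -> float:
    """Clock rate of a one-add-per-cycle design needing the same adder speed."""
    return 1000.0 / required_adder_ns(clock_mhz, dependent_adds_per_cycle)


# 50Mhz chip doing 2 back-to-back adds per cycle vs. 100Mhz chip doing 1.
for label, mhz, adds in [("cascaded ALU", 50.0, 2), ("single add", 100.0, 1)]:
    print(f"{label}: {mhz:.0f}Mhz, adder must finish in "
          f"{required_adder_ns(mhz, adds):.0f}ns "
          f"(equivalent to {equivalent_adder_mhz(mhz, adds):.0f}Mhz)")
```

Both cases need the same adder speed, which is the sense in which the 50Mhz part is a 100Mhz chip in terms of technology usage.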
(This came up in a discussion at Hot Chips 1991, I think, where
the SPARC presenter said they did 2 integer ADDs per cycle; somebody
pointed out that a lot of integer code has sequential dependencies
and that 2 ALUs don't do you so much good, and the presenter
said they'd fixed that by doing 2 ALUs back-to-back in a cycle
.. i.e., they have to do ALU ops twice as fast as the clock.)
Of course, I don't know exactly where the critical paths are, but it is
absolutely clear that there must be vast differences in gate delays,
when R4000s were getting 100-120Mhz from .8micron, 2-metal vanilla CMOS,
and SuperSPARCs were (barely) getting 40Mhz from .8micron, 3-metal BiCMOS.
Presumably SuperSPARC+ not only got better speeds from the shrink, but also
must have reduced some gate delays in critical paths to get 50Mhz.
Now, this is a perfectly legitimate design strategy; proponents of the
various styles can argue, but that's a different argument.
To summarize:
1) In an era when vendors' high-end clock rates vary by a factor of 4X,
talking about IPCs (while ignoring persistent clock rate differences)
is pretty irrelevant, because the old assumption of similar-achieved Mhz
has broken down badly.
2) In any case, to get work/cycle superiority by calling a 50Mhz SuperSPARC
50Mhz, when it needs 10ns adders, seems *really* bogus. I'm sure the
engineers know better... If you consider a SuperSPARC to be really
100Mhz, then its work per cycle (SPECint/Mhz, for example) is about the same
(within fuzziness of the measurement) as R4400s and Alphas, all with
512KB-1MB caches. I.e., the SPECint/Mhz ratio is around .6-.65.
Systems with 1-level caches (R3000s, HP7100s), or with more silicon
(RS/6000) get .8-.9ish numbers.
3) I will happily retract this, if somebody can convincingly show that:
a) The apparent gate-delay (and thus Mhz) difference is a bug
that will get fixed soon.
b) One should expect SS clock rates to get close to
other people's Real Soon.
4) Again, one can legitimately argue for either of the two design styles,
but enlightening discussion requires *both* IPC and Mhz; discussing either
one alone is at best confusing.
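As a footnote to point 2), here is a small sketch of how much the choice of denominator matters. It takes the .6-.65 SPECint/Mhz range quoted above for a SuperSPARC counted as 100Mhz and re-expresses it against the nominal 50Mhz; the `renormalize` helper is a made-up name, and the results are pure arithmetic on the ratios in this post, not new measurements.

```python
# Sketch: re-express a per-Mhz ratio against a different clock rate.
# The underlying benchmark score is unchanged; only the denominator moves.

def renormalize(ratio: float, from_mhz: float, to_mhz: float) -> float:
    return ratio * from_mhz / to_mhz


EFFECTIVE_MHZ = 100.0  # counting the 10ns adder requirement
NOMINAL_MHZ = 50.0     # the advertised clock rate

for ratio_at_effective in (0.60, 0.65):
    ratio_at_nominal = renormalize(ratio_at_effective, EFFECTIVE_MHZ, NOMINAL_MHZ)
    print(f"{ratio_at_effective:.2f} at {EFFECTIVE_MHZ:.0f}Mhz "
          f"reads as {ratio_at_nominal:.2f} at {NOMINAL_MHZ:.0f}Mhz")
```

Halving the denominator doubles the apparent work per cycle without changing anything the chip actually does.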
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
DDD: 415-390-3090 FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311