From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: sci.math.num-analysis,comp.arch Subject: Re: [++] Why 64-bit games? (Was: Re: Speed, and its need ...) Date: 4 Jun 1996 16:53:17 GMT In article <JAN.96Jun4091421@cora.neuroinformatik.ruhr-uni-bochum.de>, jan@neuroinformatik.ruhr-uni-bochum.de (Jan Vorbrueggen) writes: |> In article <4p0cd1$es6@murrow.corp.sgi.com> mash@mash.engr.sgi.com (John |> R. Mashey) writes: |> |> If one took a MIPS-like design and added a carry bit, you'd get 3 |> instructions: add-high-pieces, add-low-pieces, |> add-fromcarry-to-high-result |> |> No, you get two instruction: add-lower-parts, add-higher-parts-with-carry - |> i.e., you have the with and without variants of the integer add/substract |> instructions instead of a seperate add-carry. No, you get 3 instructions: that's why I said a MIPS-like design, which uses 2-input adders; while there may be other useful reasons for doing 3-input adders, most designers wouldn't do this just to save 1 clock for multiprecision adds... Note, of course, that out-of-order-register-renamed designs may get even more complexified by the existence of things like carry bits or other status flags. For instance, this is relatively straightforward on MIPS, since each integer operation creates 1 64-bit result (with exception of integer mul/div, which of course caused a painful special case), which is renamed onto a set of physical registers 2X larger than the set of logical registers. Now, one could widen the result register to contain carry flags, condition codes and such; however, you have probably introduced yet another set of dependency checks required to be done by the instruction scheduler. Remember: in aggressive modern implementations, something like a carry bit is *not* just one simple bit of state in a register somewhere; there may well be a copy of it with every result, and there may be some extensive logic to check its setting and use, if you don't want it to become a resource that ends up serializing most instructions... -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: mash@sgi.com DDD: 415-933-3090 FAX: 415-967-8496 USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.arch Subject: Re: [++] Why 64-bit games? (Was: Re: Speed, and its need ...) Date: 5 Jun 1996 04:36:38 GMT In article <zalman-0406961727100001@198.95.245.190>, zalman@macromedia.com (Zalman Stern) writes: |> I'm not sure what John is disagreeing with me about... Well, I wasn't disagreeing much :-), but I was disagreeing with the the thought that MIPS was bad for multiprecision arithmetic because it lacked a carry bit ... i.e., while one might do better with a carry bit, the sequence is not very long to do it without, and it did offer 32x32->64 ... in practice, with many email conversations on this topic over 10 years, nobody gave me convincing evidence that saving a couple cycles here made very much difference in real codes; I actually got more requests for 64/32->32... -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: mash@sgi.com DDD: 415-933-3090 FAX: 415-967-8496 USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

From: mash@mash.engr.sgi.com (John R. Mashey) Newsgroups: comp.arch,comp.lang.misc,comp.lang.smalltalk Subject: Re: The Architect's Trap (was Object orientation without GC is nonsense) Date: 28 Jul 1997 07:44:41 GMT In article <5rcfpp$ed9@lyra.csx.cam.ac.uk>, nmm1@cus.cam.ac.uk (Nick Maclaren) writes: |> Yes, except for one thing. I have great difficulty in believing that |> carry bit support is any harder than the support of the near-universal |> constructions like 'a(n+m)'. In both cases, you need the result of one |> operation as input to the next. If there IS a problem with carry bits, |> then it obviously has to be with the implementation of the individual |> operations and not the pipelining and multi-issue. No, carry-bit support casues more problems. |> In particular, it is immediately obvious that implementing a carry bit for a |> N-bit addition or subtraction can be no harder than implementing a (N+1)-bit |> addition or subtraction, This sounds plausible, but turns out not to be the case. This is an example of what is the most common misapprehension that I encounter, i.e., analyzing implementations while THINKing of really simple, non-pipelined designs (like the early 8-bit CPUs). Many features that work fine in such designs are famous for causing great pain in more aggressive designs; current CPUs do *not* work that way. - pipelined - pipelined superscalar - pipelined, speculative out-of-order. The fundamental problem, for the bulk of RISC architectures, is that a particular property helps simplify implementations, and every time that property is violated, additional complexity results. The desired property: each operation produces at most one result, and all results are the same size, and there is a regular mechanism for dealing appropriately with data hazards in the pipeline. In a simple pipelined design, each result is written back to its target register, and the result is normally made available via a bypass network, in case it is immediately needed by the next instruction. If you take a typical current RISC and include "add-with-carry", for instance (and this is certainly doable), the irregularity definitely adds complexity, andthe carry bit can become a bottleneck in more aggressive implementations, as operations that either set it or test it become serialized on it , and also require extra hardware (comparators & muxes) in an awkward place. (See H&P: look up bypassing & condition codes in the index). As the CPU becomes superscalar, the extra logic for checking for dependencies gets worse: implementors especially hate irregularities. As it becomes speculative & out-of order, a typical design would require not just a register-rename unit, but a "carry-bit-rename-unit", as there is no such thing as "The carry bit", but rather an independent set of carry bits, with appropriate rename logic to select the correct carry bit set by the logically most recent instruction that sets the bit. hence, it actually can be *much* more complex to implement N-bit-add + carry bit, compared to N+1-bit add... And finally, in some (most, actually) RISC architectures, it is fairly easy to do multiprecision add, for example: Assume you have data in registers to do double-wide a = b + c, with inputs in bhigh,blow, chigh, clow, etc. Suppose you have addc: 3-input adder (oops, that costs a little more also), that also sets the carry bit (and I do the unsigned case): addc zero,zero,zero # clear carry bit addc alow, blow,clow addc ahigh,bhigh,chigh Suppose you don't have addc. MIPS code would look like: addu alow, blow, clow sltu tmp, alow, clow (set tmp = 1 if alow < clow, else 0) addu ahigh, bhigh, chigh * addu ahigh, ahigh, tmp On an out-of-order chip, the 2 sequences might well take the same time, because the 3 addcs are serialized through the carry bit, whereas the add marked * can execute in parallel with one of the previous instructions, assuming there are 2 ALUs. But even if it's a few more cycles, it's not like it's huge. Multiply is harder, and divide is harder yet, but especially the latter is dominated by the time to do the division itself. Anyway, aggressive RISC CPUs: carry bits make much less sense than they used to. Of course, as Nick suggests, it is good to have standard code for this available, but I-caches work fine for code dominated by multiprecision integer arithmetic. -- -john mashey DISCLAIMER: <generic disclaimer: I speak for me only...> EMAIL: mash@sgi.com DDD: 415-933-3090 FAX: 415-932-3090 [415=>650 August!] USPS: Silicon Graphics/Cray Research 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94043-1389

Index Home About Blog