From: jbs@watson.ibm.com
Newsgroups: comp.arch
Subject: Re: IA64 integer performance - was Re: The Forrest Curve (annual
Date: Thu, 28 Nov 2002 02:11:22 GMT
Message-ID: <20021127.211122.630@yktvmv.WATSON.IBM.COM>
In article <45022fc8.0211201907.72caccee@posting.google.com>,
on 20 Nov 2002 19:07:27 -0800,
iain-3@truecircuits.com (Iain McClatchie) writes:
>Michael> how many more GP registers (pointers) and FP
>Michael> registers (data) do you need to properly pipeline FP calculations ?
>
>This is a very interesting question!
>
>Michael> Let's assume a machine has four parallel FPUs. So we need
>Michael> registers for 20 parallel calculations.
>
>Okay. Now imagine that you're blocking a matrix multiply. For each inner
>block, you'll need to do a 4x5 matrix multiply in the register file.
>That's 20 registers for the output, 20 for each input, for 60. So far,
>less than 64.
This is not how to do it. You should do 4xn by nx5 matrix
multiplies. You need 20 registers to accumulate the inner products,
5 for the current row (in the nx5 matrix) and 1 for the current
column element (in the 4xn matrix). That is 26 in all, which fits in 32 registers.
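
To make the register count concrete, here is a minimal sketch of the 4xn by nx5 scheme, written as a sequence of rank-1 updates. The function name, the list-of-lists layout and the constant LIVE_VALUES are illustrative assumptions, not anything from the original thread. Python has no registers, of course; the point is only to show which values are live at once in the inner loop.

```python
def block_4xn_by_nx5(a, b):
    """Multiply a 4xn block `a` by an nx5 block `b` (both lists of rows).

    Hypothetical helper for illustration: each step k adds the outer
    product of column k of `a` and row k of `b` to the 4x5 result.
    """
    n = len(b)
    # 20 accumulators, one per element of the 4x5 output block.
    acc = [[0.0] * 5 for _ in range(4)]
    for k in range(n):
        # 5 values: the current row k of the nx5 matrix.
        b0, b1, b2, b3, b4 = b[k]
        for i in range(4):
            # 1 value: the current element of column k of the 4xn matrix.
            a_ik = a[i][k]
            row = acc[i]
            row[0] += a_ik * b0
            row[1] += a_ik * b1
            row[2] += a_ik * b2
            row[3] += a_ik * b3
            row[4] += a_ik * b4
    return acc


# Values that must be held at once: accumulators + current row + one element.
LIVE_VALUES = 20 + 5 + 1
```

Each column element is loaded once and used for five multiply-adds, and each row of five is reused across four column elements, so neither input block ever needs to sit in the register file whole.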
James B. Shearer
PS: Repeat of post which my news server appears to have lost.