Subject: Re: IA64 integer performance - was Re: The Forrest Curve (annual
Date: Thu, 28 Nov 2002 02:11:22 GMT
In article <firstname.lastname@example.org>,
on 20 Nov 2002 19:07:27 -0800,
email@example.com (Iain McClatchie) writes:
>Michael> how many more GP registers (pointers) and FP
>Michael> registers (data) do you need to properly pipeline FP calculations ?
>This is a very interesting question!
>Michael> Let's assume a machine has four parallel FPUs. So we need
>Michael> registers for 20 parallel calculations.
>Okay. Now imagine that you're blocking a matrix multiply. For each inner
>block, you'll need to do a 4x5 matrix multiply in the register file.
>That's 20 registers for the output, 20 for each input, for 60. So far,
>less than 64.
This is not how to do it. You should do 4xn by nx5 matrix
multiplies. You need 20 registers to accumulate the inner products,
5 for the current row (in the nx5 matrix) and 1 for the current
column element (in the 4xn matrix). That is 20 + 5 + 1 = 26, which
will fit in 32 registers.
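
A minimal sketch of this blocking in C, assuming row-major storage and
double precision. The name kernel_4x5 and the lda/ldb/ldc stride
parameters are made up for illustration, and whether the 20 + 5 + 1
live values really stay in registers is up to the compiler and the
target.

```c
#include <stddef.h>

/*
 * Illustrative kernel (hypothetical name): C(4x5) += A(4xn) * B(nx5).
 * All matrices are row-major; lda, ldb, ldc are the row strides.
 */
void kernel_4x5(size_t n,
                const double *a, size_t lda,
                const double *b, size_t ldb,
                double *c, size_t ldc)
{
    /* 20 accumulators, one per element of the 4x5 output block. */
    double acc[4][5] = {{0.0}};

    for (size_t k = 0; k < n; k++) {
        /* 5 values: row k of the nx5 matrix, loaded once per step. */
        double brow[5];
        for (int j = 0; j < 5; j++)
            brow[j] = b[k * ldb + j];

        for (int i = 0; i < 4; i++) {
            /* 1 value: element k of row i of the 4xn matrix. */
            double aik = a[i * lda + k];
            for (int j = 0; j < 5; j++)
                acc[i][j] += aik * brow[j];
        }
    }

    /* Add the accumulated inner products into the output block. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 5; j++)
            c[i * ldc + j] += acc[i][j];
}
```

Each step of the k loop does 9 loads (5 + 4) for 20 multiply-adds, and
nothing is stored until the loop finishes.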
James B. Shearer
PS: Repeat of post which my news server appears to have lost.