Index Home About Blog
Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFC][PATCH] Faster generic_fls
Original-Message-ID: <Pine.LNX.4.44.0304300709300.7157-100000@home.transmeta.com>
Date: Wed, 30 Apr 2003 14:12:06 GMT
Message-ID: <fa.m9e2e2r.17gk8on@ifi.uio.no>

On 30 Apr 2003, Falk Hueffner wrote:
>
> gcc 3.4 will have a __builtin_ctz function which can be used for this.
> It will emit special instructions on CPUs that support it (i386, Alpha
> EV67), and use a lookup table on others, which is very boring, but
> also faster.

Classic mistake. Lookup tables are only faster in benchmarks, they are
almost always slower in real life. You only need to miss in the cache
_once_ on the lookup to lose all the time you won on the previous one
hundred calls.

"Small and simple" is almost always better than the alternatives. I
suspect that's one reason why older versions of gcc often generate code
that actually runs faster than newer versions: the newer versions _look_
like they do a better job, but..

			Linus



Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [RFC][PATCH] Faster generic_fls
Original-Message-ID: <Pine.LNX.4.44.0304300824190.7157-100000@home.transmeta.com>
Date: Wed, 30 Apr 2003 15:29:45 GMT
Message-ID: <fa.m8emeim.16go88q@ifi.uio.no>

On 30 Apr 2003, Falk Hueffner wrote:
> Linus Torvalds <torvalds@transmeta.com> writes:
>
> > Classic mistake. Lookup tables are only faster in benchmarks, they
> > are almost always slower in real life. You only need to miss in the
> > cache _once_ on the lookup to lose all the time you won on the
> > previous one hundred calls.
>
> It seems to me if you call the function so seldom the table drops out
> of the cache, it is irrelevant how long it takes anyway.

Yeah, but then this whole discussion is irrelevant too.

I'm just saying that micro-benchmarks of operations that do not make any
sense on their own are _bad_ measures of real performance. That lookup is
going to lose to the "natural" code even if it hits the L2, and misses
only the L1.

Any piece of code that only works well when it sits in (and owns) the L1
is a piece of crap.

> Well, if a lookup table isn't "small and simple", I don't know what
> is.

Something that calculates it directly? Like, perchance, the current code?

There is _never_ any excuse to use a lookup table for something that can
be calculated with a few simple instructions. That's just stupid.

		Linus


Index Home About Blog