From: torvalds@transmeta.com (Linus Torvalds)
Subject: Re: __ucmpdi2
Date: 	19 Sep 2000 11:59:19 -0700
Newsgroups: fa.linux.kernel

In article <20000919083901.A463@twiddle.net>,
Richard Henderson  <rth@twiddle.net> wrote:
>On Tue, Sep 19, 2000 at 12:22:41PM +0200, Andreas Schwab wrote:
>> IMHO it's a bug in gcc that it does not inline the comparison inside the
>> switch expression, since it already does it in all other places.  Perhaps
>> some problem with the patterns in the machine description.
>
>Perhaps, but without a test case it's hard to know.  My guess is that
>he's using gcc 2.7.2 or something decrepit like that; I couldn't reproduce
>the problem on current source with a simple test case.
>
>That said, I also think it is a bug that the kernel does not link 
>against libgcc.

I'd love to link against libgcc, but for the fact that

 - gcc developers have sometimes done horribly stupid things, and NOT
   linking against libgcc has been an excellent way of not getting
   bitten by it. Things like the exception handling etc, where not
   linking against libgcc caused the kernel to not link - where that was
   the RIGHT thing to do, because gcc had inserted completely bogus
   exception handling into the code.

   Proper fix: -fno-exceptions

 - Linux developers often do horribly stupid things, and use 64-bit
   division etc instead of using a simple shift. Again, not linking
   against libgcc finds those things early rather than late, because the
   horribly stupid things end up requiring libgcc support.

In the case of __ucmpdi2, it appears to be a combination of kernel and
compiler stupidity - there's no reason why __ucmpdi2 should not be done
inline by the compiler, but at the same time there is probably also
little reason to use a slow "long long" comparison in the kernel.

So again, not linking libgcc showed a problem. Good for us.

But yes, it is often much more convenient to not know about problems
like this. And some people don't think they are a big deal. I'd rather
nip them in the bud early than be "convenient", myself.

		Linus

From: torvalds@transmeta.com (Linus Torvalds)
Subject: Re: __ucmpdi2
Date: 	20 Sep 2000 11:02:32 -0700
Newsgroups: fa.linux.kernel

In article <10009192200.ZM278817@classic.engr.sgi.com>,
Jeremy Higdon <jeremy@classic.engr.sgi.com> wrote:
>>  - Linux developers often do horribly stupid things, and use 64-bit
>>    division etc instead of using a simple shift. Again, not linking
>>    against libgcc finds those things early rather than late, because the
>>    horribly stupid things end up requiring libgcc support.
>
>I would have thought that the compiler would generate a shift if it
>could (I'm presuming you're talking about shifting by a constant
>here -- or are you talking about code that always shifts by a
>computed power of two).

The compiler is smart, but the compiler doesn't have ESP.

For example, what some filesystems did was basically

	blocknumber = offset_64bit / filesystem->blocksize;

which is not optimizable. Because while it _is_ a division by a power of
two, gcc has no way of knowing that, nor _what_ power of two. Gcc
doesn't know that the ext2 blocksize is 1024, 2048 or 4096 ;)

The fix is to hit such Linux developers virtually on the head (by having
a kernel that doesn't link ;), and rewrite the code as

	blocknumber = offset_64bit >> filesystem->blocksize_bits;

which does exactly the same thing, except it is about a hundred times
faster.

See?

>> In the case of __ucmpdi2, it appears to be a combination of kernel and
>> compiler stupidity - there's no reason why __ucmpdi2 should not be done
>> inline by the compiler, but at the same time there is probably also
>> little reason to use a slow "long long" comparison in the kernel.
>
>Little reason or no reason?  If there is a reason, and it doesn't
>work, then the coder is forced to rewrite using 32 bit variables,
>synthesizing the result.  Then you have belabored C code as well
>as belabored machine code, and it doesn't automatically clean up
>when you move to a 64 bit machine.

Oh, but usually it does.

For example, most of the time these things are due to issues like

	if (offset >= page_offset(page))
		...

where page_offset() is simply "(unsigned long long)page->index <<
PAGE_CACHE_SHIFT".

Very readable, no?

But it doesn't get any worse by doing the comparison the other way
around, and instead doing

	if (index(offset) >= page->index)

which is faster (because now you have only one long long shift, not two
shifts and a comparison), and equally readable (yeah, you have to think
about it for a bit if you want to convince yourself that it's the same
thing due to the low-order bits you lost, but in many cases where we did
this conversion the end result was _more_ readable, because the end
result was that we always worked on index+offset parts, and there was no
confusion). 

And on 64-bit machines the code is exactly the same too.  No slow-down. 

This was why I hated the original LFS patches.  They mindlessly just
increased a lot of stuff to 64 bits, with no regard for what the code
really _meant_.  I ended up re-writing the core code completely before
LFS was accepted into the 2.3.x series - using page index calculations
instead, which meant that most of the actual critical routines _still_
did the same old 32-bit calculations, they just did them with the part
of the values that really mattered - thus giving effectively a 44 bit
address space. 

And btw, doing it this way means that on the alpha we could potentially
have a "77-bit address space" for file mapping. So yes, it actually
means other improvements too - even for 64-bit machines.

(Now, the 77-bit address space that the new VM potentially gives to
64-bit architectures is only useful for people who use the page cache
directly, because obviously file sizes are still just 64-bit integers.
But it could be useful for the internal implementation of distributed
memory, for example.)

Ehh.. Basically, my argument boils down to the old truism: by thinking
about the problem and doing the smart thing, you can often do more with
less work.

>So what we've said is: 64 bit is okay, except in a switch statement,
>or other random expressions that might cause gcc to embed a call to
>similar libgcc function.

No, what Linux really says is that you should avoid using "long long"
(and thus 64-bit values), because on many architectures it is slower. 

And I further say that it is usually very easy to avoid it. 

But you shouldn't go overboard. Simple "long long" arithmetic is useful
and easy, even on 32-bit platforms. The kernel does quite a lot of it,
as all file offsets are basically 64 bits. But by thinking about the
problem some more, you can often limit it to those simple operations,
which are fast anyway.

			Linus

From: torvalds@transmeta.com (Linus Torvalds)
Subject: Re: __ucmpdi2
Date: 	22 Sep 2000 13:51:04 -0700
Newsgroups: fa.linux.kernel

In article <10009202214.ZM224250@classic.engr.sgi.com>,
Jeremy Higdon <jeremy@classic.engr.sgi.com> wrote:

>In my case, it is simple "long long" arithmetic.  Shifts, ANDs, ORs,
>comparisons, etc.  Not even any addition (which should be pretty efficient
>with the HW carry bit on X86).  I don't know why:

Oh, I agree.  In your case this didn't show a Linux developer doing
anything stupid.  Instead, it showed the compiler doing something
stupid. 

You win some, you lose some.  But the end result will hopefully be that
the compiler gets fixed for this case.  Which is, after all, something
that benefits everybody. 

		Linus
