gcc assembly (Linus Torvalds)

Index Home About Blog

From: Linus Torvalds <torvalds@transmeta.com>
Date: 	Sat, 17 Nov 2001 11:20:45 -0800
Subject: Re: i386 flags register clober in inline assembly
Newsgroups: fa.linux.kernel

In article <20011117161436.B23331@atrey.karlin.mff.cuni.cz> you write:
>
>They don't need to be. On i386, the flags are (partly for historical reasons) clobbered
>by default.

However, this is one area where I would just be tickled pink if gcc were
to allow asm's to return status in eflags, even if that means that we
need to fix all our existing asms.

We have some really _horrid_ code where we use operations that
intrinsically set the flag bits, and we actually want to use them.
Using things like cmpxchg, and atomic decrement-and-test-with-zero have
these horrid asm statements that have to move the eflags value (usually
just one bit) into a register, so that we can tell gcc where it is.

(Example code:

atomic_sub_and_test:
	__asm__ __volatile__(
		LOCK "subl %2,%0; sete %1"
		:"=m" (v->counter), "=qm" (c)
		:"ir" (i), "m" (v->counter) : "memory");

Where we first get the value we _really_ want in ZF in eflags, then we
use "sete" to move it to a register, and then gcc will end up generating
code to test that register by hand, so the end result is usually
something like:

#APP
        lock ; decl 20(%edi); sete %al
#NO_APP
        testb   %al, %al
        je      .L1570

even though what we'd _want_ is really

	lock ; decl 20(%edi)
	jne .L1570

which is not only smaller and faster, but is often _really_ faster
because at least some Intel CPU's will forward the flags values to the
branch prediction stuff, and going through a register dependency will
add non-forwarded state and thus extra cycles.

So I would personally _really_ really like for some way to expose the
internal gcc

	"(set (cc0) ..asm..)"

construct, together with some way of setting the cc_status.flags.

From what I can tell, all the x86 machine description already uses "cc0"
together with the notion of comparing it to zero (either signed or
unsigned), so something like this _might_ just work

	unsigned long result;
	asm volatile(
		LOCK "decl %m"
		:"+m" (v->counter),
		 "=cc" (result)
		: :"memory");
	if (result > 0)		/* "jnb" */
		...

which would be wonderful, and would expand to

	(set (cc0) ..asm..)
	(set (pc)
		(if_then_else (gtu (cc0) (const_int 0))
			(label_ref (match_operand ..
			(pc))

Which _should_ just automatically give us

	lock ; decl ..
	ja ..

which is exactly what we want.

I know this used to be impossible in gcc, because the x86 didn't
actually track the flags values, and conditional jumps were really a
_combination_ of the conditional and the jump, and splitting it up so
that the conditional would be in an asm was thus not possible.

But I think gcc makes cc0 explicit on x86 these days, and that the above
kind of setup might be possible today, no?

		Linus

Date: 	Sat, 17 Nov 2001 12:30:17 -0800 (PST)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: i386 flags register clober in inline assembly
Newsgroups: fa.linux.kernel

On 17 Nov 2001, Momchil Velikov wrote:
>
> Indeed, with pattern like:
> (define_insn "*ja"
>   [(set (pc)
>         (if_then_else (gtu (match_operand:CC 0 "" "") (const_int 0))
>                       (label_ref (match_operand 1 "" ""))
> 		      (pc)))]
>   ""
>   "ja ..."
>   ...)

As far as I can tell, that pattern already exists in i386.md.

So it shouldn't need any new patterns, it would only need a way to allow
the setting of cc0 from the asm - so that we can get the first part,
namely the "(set (cc0) ..asm..)" thing.

		Linus

Date: 	Sat, 17 Nov 2001 12:42:35 -0800 (PST)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: i386 flags register clober in inline assembly
Newsgroups: fa.linux.kernel

On Sat, 17 Nov 2001, Jan Hubicka wrote:
>
> Actually the main dificulty I see is storing cc0 to variable.  CC0 is hard
> register and pretty strange one - you can't move it, you can't spill.

Well, you _can_ spill it, but you need to use "pushfl/popfl" to
spill/restore.

Also, if you're clever you don't spill cc0 itself, but the _comparison_,
ie if you need to spill in

	asm(.. "=cc" (cc0))
	if (cc0 > 0)

a sufficiently clever spill-engine would spill not eflags,  but instead
spill "cc0 > 0", which it can do with the "seq" expansions..

gcc already does know about "store-flag" instructions, although I
certainly agree that the _patterns_ of usage may end up being very
different than existing conditional comparisons..

		Linus

Date: 	Sat, 17 Nov 2001 18:48:22 -0800 (PST)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: i386 flags register clober in inline assembly
Newsgroups: fa.linux.kernel

On Sun, 18 Nov 2001, Jan Hubicka wrote:
> >
> > Well, you _can_ spill it, but you need to use "pushfl/popfl" to
> > spill/restore.
>
> I know, I can, but we are speaking about sollution how to get programs faster,
> not slow down them to scrawl.

Agreed, which is why I suggested the "spill the _comparison_, not the
actual cc0 register" approach.

That way, if you have to spill, you'll end up at worst with the same code
we already have to have, ie the code will end up something like

	lock ; decl mem
	seta %al		<- spill comparison to %al
	..
	testb %al,%al		<- re-do comparison test later
	jne ..

> Actually what can be feasible is to make asm statement set flags and follow
> it by store flag instruction that will be used in the conditional. Later
> the combine pass should be able to get it connected.

That sounds pretty ideal - have some way of telling gcc to add a "seta
%reg", while at the same time telling gcc that if it can elide the "seta"
and use a direct jump instead, do so..

> The design of asm statements should be IMO re-tought.  I think it has been
> mistake to make them so low level and allow user to write constraints directly,
> so perhaps we can think about big change for future gcc...

I don't see many alternatives. The fact is, asm's end up being used
exactly when gcc simply doesn't know what to do, so gcc doesn't know what
the constraints are either.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 3/8] i386: bitops: Rectify bogus "+m" constraints
Date: Mon, 23 Jul 2007 17:47:05 UTC
Message-ID: <fa.GF4c/3vj6oif9quAJsw83M1x9Hw@ifi.uio.no>

On Mon, 23 Jul 2007, Satyam Sharma wrote:
>
> [3/8] i386: bitops: Rectify bogus "+m" constraints
>
> From the gcc manual:
>
>   Extended asm supports input-output or read-write operands. Use the
>   constraint character `+' to indicate such an operand and list it with
>   the output operands. You should only use read-write operands when the
>   constraints for the operand (or the operand in which only some of the
>   bits are to be changed) allow a register.
>
> So, using the "+" constraint modifier for memory, like "+m" is bogus.
> We must simply specify "=m" which handles the case correctly.

No. This is wrong.

"=m" means that the new value of the memory location is *set*.

Which means that gcc will potentially optimize away any previous stores to
that memory location.

And yes, it happens, and yes, we've seen it.

The gcc docs are secondary. They're not updated, they aren't correct, they
don't reflect reality, and they don't matter. The only correct thing to
use is "+m" for things like this, or alternatively to just use the
"memory" specifier at the end and make gcc generate really crappy code.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered
Date: Tue, 24 Jul 2007 19:41:01 UTC
Message-ID: <fa.OrJwyvB5AbJQ9zVC6O7xah0OJ9M@ifi.uio.no>

On Tue, 24 Jul 2007, Trent Piepho wrote:
>
> Speaking of that, why are all the asm functions in arch/i386/lib/string.c
> defined as having a memory clobber, even those which don't modify memory
> like strcmp, strchr, strlen and so on?

That's because the memory clobber will serialize the inline asm with
anything else that reads or writes memory.

So even if we don't actually change any memory, if we cannot describe what
we *read*, then we need to tell gcc to not re-order us wrt things that
could *write*. And the simplest way to do that is to say that you clobber
memory, even if you don't.

So as a more concrete example: imagine that we're doing a "strlen()" on
some local variable. We need to tell gcc that that variable has to be
updated in memory before it schedules the asm. The memory clobber does
that.

(Yes, the "asm volatile" may do so too, but it's very unclear what the
"volatile" on the asm actually does, so ..)

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered
Date: Tue, 24 Jul 2007 20:10:02 UTC
Message-ID: <fa.gd/efXeKjiBNGacVA1Ty1gOYzIQ@ifi.uio.no>

On Tue, 24 Jul 2007, Andi Kleen wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> >
> > (Yes, the "asm volatile" may do so too, but it's very unclear what the
> > "volatile" on the asm actually does, so ..)
>
> Without the volatile they get completely optimized away :/
> [tried that, cost a lot of debugging time -- empty string functions
> cause a lot of strange side effects]

Sure, that's *one* thing that "volatile" guarantees (it guarantees that
gcc won't optimize away things where the end result isn't actually visibly
used).

But gcc docs also talk about the other things volatile means, including
"not significantly moved".

Is that "not significantly moved" already equivalent to "not moved past
something that can change memory"? Probably not. Which is why it's a good
idea to have the "memory" clobber there too, because the semantics of the
"volatile" just aren't very well defined from a code movement standpoint
(while a general memory clobber is *very* clear: if gcc moves the inline
asm across another thing that uses/changes memory, that's a gcc bug, plain
and simple (*)).

		Linus

(*) Other than pure internal gcc spills/reloads. Spilling/reloading local
registers that haven't had their address taken obviously cannot be seen as
memory accesses.

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered
Date: Tue, 24 Jul 2007 21:47:25 UTC
Message-ID: <fa.TLiD4N63sId9pRu5KaXEzKPbykM@ifi.uio.no>

On Tue, 24 Jul 2007, Jeremy Fitzhardinge wrote:
> >
> > But gcc docs also talk about the other things volatile means, including
> > "not significantly moved".
>
> Actually, it doesn't.  In fact it goes out of its way to say that "asm
> volatile" statements can be moved quite a bit, with respect to other
> asms, other code, jumps, basic blocks, etc.

Ahh. That's newer.

Historically, gcc manuals used to say "may not be deleted or significantly
reordered".

So they've weakened what it means, probably exactly because it wasn't
well-defined before either.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered
Date: Thu, 26 Jul 2007 01:23:02 UTC
Message-ID: <fa.cq+rL/Awl4wrqJ0P083Vpqiwnqw@ifi.uio.no>

On Wed, 25 Jul 2007, Linus Torvalds wrote:
>
> Hmm. I really think you should take this up with the gcc people. That
> looks like a gcc bug - because there really is nothing that guarantees
> that the asm doesn't change the array that "x" points to, and the asm
> clearly talks about clobbering memory.

Actually, I take that back. I think gcc does the right thing, and yes,
it's explained by the memory clobber being just a blind "write to memory"
rather than read memory. My bad.

It does leave us with very few ways of saying that an asm can *read*
memory, and so it might be good to have it clarified that "volatile"
implies that (at least with the memory clobber).

Your examples are good, I think.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] make atomic_t volatile on all architectures
Date: Sun, 12 Aug 2007 09:56:17 UTC
Message-ID: <fa.FWq3azdY2kHabbiyKydvww20VnU@ifi.uio.no>

On Sun, 12 Aug 2007, Martin Schwidefsky wrote:
>
> The duplication "=m" and "m" with the same constraint is rather
> annoying.

It's not only annoying, it causes gcc to generate bad code too. At least
certain versions of gcc will generate the address *twice*, even if there
is obviously only one address used.

If you have problems with "+m", you are often actually better off using
just "m" and then adding a memory clobber. But that has other code
generation downsides.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] make atomic_t volatile on all architectures
Date: Sun, 12 Aug 2007 18:12:58 UTC
Message-ID: <fa.GaeoruIIW8FLBu90yr75M2r3Tdk@ifi.uio.no>

On Sun, 12 Aug 2007, Segher Boessenkool wrote:
>
> Yeah.  Compiler errors are more annoying though I dare say ;-)

Actually, compile-time errors are fine, and easy to work around. *Much*
more annoying is when gcc actively generates subtly bad code. We've had
use-after-free issues due to incorrect gcc liveness calculations etc, and
inline asm has beeen one of the more common causes - exactly because the
kernel is one of the few users (along with glibc) that uses it at all.

Now *those* are hard to find - the code works most of the time, but the
compiler has inserted a really subtle race condition into the code
(deallocated a local stack entry before last use). We had that with our
semaphore code at some point.

		Linus

Index Home About Blog