From: mash@mips.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: RISC vs CISC? Call a spade a spade?
Message-ID: <7037@spim.mips.COM>
Date: 16 Aug 91 23:32:33 GMT

In article <7006@spim.mips.COM> rogerk@mips.com (Roger B.A. Klorese) writes:
>In article <PCG.91Aug16141606@aberda.aber.ac.uk> pcg@aber.ac.uk (Piercarlo Grandi) writes:
>>I remember a report of a talk by John Mashey, in which he gave the
>>harrowing details of how unpleasant it had been to rewrite the MIPS Unix
>>kernel from Classic C to ANSI C, and all the dangers thereof.
 
>I think you are imagining this.  RISC/os is written in Classic C, with the
>addition of "volatile."

I think there is multiple confusion here, caused by the typical
telephone-game problem.

Here is the standard story:
1) In 4Q85, we had a C compiler that could compile itself with global
optimization turned on, use the result to compile itself again,
and get the same thing.  It was also adequate to compile the UNIX
kernel, albeit without global optimization turned on.
2) In early 1986, we started to do -O on the kernel, and it was
indeed harrowing at first, because:
	a) hardly anyone had implemented volatile in an optimizing
	compiler yet, and its true implications weren't quite
	understood.
	b) Of course there were bugs in the optimizer.
3) hence, when we just blindly turned on -O (as volatile was in the
process of being implemented), things broke everywhere, i.e.,
loops like:
	while (p->devicestatus != OK)
		junk = p->deviceinput;
and the compiler optimized everything away.

Then, we got volatile in the compiler, and declared  volatile .... *p;
...and it still broke, because it still saw that junk (which had never been
mentioned) was never used again, and hence that statement disappeared.

4) It became clear after a while that:
	any load or store to a volatile variable that would have happened
	with simplistic code ... must happen in exactly the same order and
	number ... or systems programmers go nuts.
So, our compiler folks did that.

5) And finally, there was the general issue of debugging an optimizer
when using it on the kernel.  This was the case where the kernel would
almost work optimized, and we had to do a binary search to find the
module where 1 store was being omitted.

Now, the only ANSI C issue in this whole story is the fact that
we were able to add volatile to our existing compilers, rather than
some MIPS-specific keyword, and know we were at least heading in
a standards-oriented direction.

Most of the problem was dealing with new compilers doing global optimization
on code, where (at that time) the number of people in the world who had
ever dealt with the resulting issues inside the kernel was small...
Figuring out what to make volatile was pretty straightforward: make
every pointer to a device structure volatile, plus a few other places.
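For illustration, a minimal sketch of that pattern (the struct layout,
DEV_BASE and OK are hypothetical stand-ins, modeled on the loop above):

	struct device_regs {
		int devicestatus;	/* status register */
		int deviceinput;	/* data-in register */
	};

	volatile struct device_regs *p =
		(volatile struct device_regs *) DEV_BASE;
	int junk;

	while (p->devicestatus != OK)
		junk = p->deviceinput;	/* both loads must happen on every
					   iteration, even though junk is
					   never read afterwards */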

6) We started on this 1Q86, and had most of it in pretty reasonable shape
by 2Q86, and shipped -O'd kernels in production around Sept/Oct of 1986.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650


Date: Wed, 9 Feb 2000 13:56:02 -0800 (PST)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [linux-usb] Re: [patch-2343pre5/#4] warnings
Newsgroups: fa.linux.kernel

On Wed, 9 Feb 2000, Vojtech Pavlik wrote:
> 
> I thought the same, so I looked deeper into the cause. The warning in
> question is:
> 
> cpia.c:1138: warning: passing arg 1 of `interruptible_sleep_on' discards
> 		`volatile' from pointer target type

So the fix is to remove "volatile", no? 

There aren't very many really good reasons for ever using "volatile"
anyway. It's a broken concept, without any well-defined semantics. It's
definitely completely unrealistic to use it for a whole structure that
contains obviously non-volatile data.

I have a simple rule: in 99.9% of all cases, the existence of "volatile"
is just a sign that somebody doesn't lock the data structure correctly,
and hopes the compiler will magically make the problem go away when told
that it cannot rely on the data remaining the same.

The 0.1% case is for things like "jiffies": read-only variables where
it doesn't actually matter exactly which version of the variable you get,
so you don't care about locking.
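A sketch of that kind of use (with the kernel's time_before() and HZ):

	unsigned long timeout = jiffies + HZ;	/* one second from now */

	while (time_before(jiffies, timeout))
		schedule();	/* any recent value of jiffies is good enough */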

		Linus

Date: 	Mon, 23 Jul 2001 11:11:25 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: user-mode port 0.44-2.4.7
Newsgroups: fa.linux.kernel

On Mon, 23 Jul 2001, Andrea Arcangeli wrote:
>
> GCC internally is allowed to generate code that relies on the contents
> of the memory to stay constant, this because of the C standard, the
> usual example is a fast path jump table for the "case" statement.

Bah.

If we have a case where we _really_ care about the value being stable, I
_still_ claim that that is the one that we need to protect. Not waving
some magic wand over the thing, and saying that "volatile" makes an
unstable value ok.

"volatile" (in the data sense, not the "asm volatile" sense) is almost
always a sign of a bug. If you have an algorithm that depends on the
(compiler-specific) behaviour of "volatile", you need to change the
algorithm, not try to hide the fact that you have a bug.

Now, the change of algorithm might be something like

	/*
	 * We need to get _one_ value here, because our
	 * state machine ....
	 */
	unsigned long stable = *(volatile unsigned long *)ptr;

	switch (stable) {
	....
	}

where the volatile is in the place where we care, and the volatile is
commented on WHY it actually is the correct thing to do.

The C standard doesn't say crap about volatile. It leaves it up to the
compiler to do something about it.

This is _doubly_ true of multi-word data structures like "xtime". The
meaning of "volatile" on the data structure is so unclear that it's not
even funny.

		Linus


Date: 	Mon, 23 Jul 2001 15:51:21 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: user-mode port 0.44-2.4.7
Newsgroups: fa.linux.kernel

On Mon, 23 Jul 2001, Jonathan Lundell wrote:
>
> If jiffies were not volatile, this initializing assignment and the
> test at the end could be optimized away, leaving an unconditional
> "return 0". A lock is of no help.

Right.

We want optimization barriers, and there's an explicit "barrier()"  thing
for Linux exactly for this reason.

For historical reasons "jiffies" is actually marked volatile, but at least
it has reasonably good semantics with a single data item. Which is not to
say I like it. But grep for "barrier()" to see how many times we make this
explicit in the algorithms.

And really, THAT is my whole point. Notice in the previous mail how I used
"volatile" when it was part of the _algorithm_. You should not hide
algorithmic choices in your data structures. You should make them
explicit, so that when you read the code you _see_ what assumptions people
make.

For example, if you fix the code by adding an explicit barrier(), people
see that (a) you're aware of the fact that you expect the values to change
and (b) they see that it is taken care of.
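As a minimal sketch, with "flag" standing in for whatever shared datum
the loop watches:

	while (!flag)
		barrier();	/* compiler must assume memory changed,
				   so it re-loads flag every iteration */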

In contrast, if your data structure is marked volatile, how is anybody
reading the code ever going to be sure that the code is correct? You have
to look at the header file defining all the data structures. That kind of
thing is NOT GOOD.

So make the algorithm be correct. Then you will notice that there is
_never_ any reason (except for being lazy with tons of existing code) to
add "volatile" to data structures.

Ponder. Understand.

		Linus


Date: 	Tue, 24 Jul 2001 08:35:15 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: user-mode port 0.44-2.4.7
Newsgroups: fa.linux.kernel

On Tue, 24 Jul 2001, Jan Hubicka wrote:
>
> What I was concerned about is for example scenario:
> 1) cse realizes that given variable is constant in some region of program
>    (by seeing a condjump).

Note that just about _every_ single global in the whole kernel is
"volatile" in this sense.

In fact, I can't think of a single (interesting) global that cannot be
changed by another thread at any time.

What makes this even more interesting, of course, is that with SMP you
also have various memory ordering models, so many classic algorithms
simply will not work: one classic algorithm is to know about update order
and avoid locking by using optimistic algorithms:

	do {
		x = (volatile) value1;
		y = (volatile) value2;
	} while (x != (volatile) value1);

the above is a completely valid algorithm in a strictly ordered system,
but it simply WILL NOT WORK on SMP.

How do we handle globals in the kernel? We use locking.

It's really that simple. We _always_ use locking to a first approximation.
And the locking code takes care to tell gcc that gcc cannot optimize
memory movements or assume that memory is constant.

This is just a basic fact of life.
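As a sketch (the lock name is made up):

	spin_lock(&value_lock);
	x = value1;
	y = value2;
	spin_unlock(&value_lock);

The lock and unlock are the compiler and CPU barriers, so neither value
needs any "volatile".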

There are other valid algorithms, but they have to be secondary if only
because they are a lot harder to think about and validate.

One is to use memory ordering constraints. Note that in this one
"volatile" simply doesn't help. Even with "volatile" maybe (and this is
debatable - gcc does not actually have a very good definition of what
volatile means even from a purely gcc implementation standpoint) forcing
ordering with the compiler, it doesn't force any ordering in the hardware.

In fact, there have been CPU's at least designed that simply will not be
guaranteed to pick up changes by another CPU without some kind of memory
barrier. On such a CPU it is simply not _possible_ to use

	while ((volatile) jiffies < x)
		/* do nothing */;

because if "jiffies" is updated on another CPU, the first CPU will not
necessarily see it AT ALL. Even though the compiler causes the code to
re-load the value constantly.

An example of this kind of machine? Imagine a P4 that didn't have some
backwards compatibility cruft.. And wonder about some of the reasons that
Intel _really_ wants people to start using "rep ; nop" in these kinds of
loops.

Do we potentially have these bugs? Yes. In the case of "jiffies", we'd be
ok on these kinds of machines simply because all places that do this do so
with interrupts enabled (it wouldn't work on UP otherwise), and interrupts
would end up working as memory barriers in any case.
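On such a CPU the loop body would need an explicit barrier of its own;
roughly:

	while (jiffies < x) {
		rmb();		/* CPU read barrier: actually pick up the
				   other CPU's update */
		cpu_relax();	/* "rep; nop" on x86: be nice to the core */
	}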

Ok, so memory ordering is one valid algorithm. And clearly "volatile" is
useless for memory ordering.

Another valid algorithm: "we don't care what the value is, old or new".
This is the one most often used for "jiffies". We simply don't care
enough. We know that jiffies can change at any time, and the code had
better not do anything strange about it anyway. And the code had better be
such that even the compiler cannot do anything really strange with it
either.

"xtime" simply isn't this kind of value. Why? Because it's not even
atomic. It's two 32-bit words, and the operations gcc would use on it
aren't things that we could improve with "volatile".

For "xtime", we're ok with (for example) only looking at one of the fields
(usually xtime.sec), and just not caring _what_ the value is, because we
just copy it into "inode->i_mtime" or something.

But if we have an algorithm that really _does_ care, then "volatile"
really doesn't help. See above on memory ordering etc constraints anyway.

> For instance gcc 3.1 now does load store motion that makes gcc keep
> global variables in registers much longer than it did originally making
> such risk greater.

I bet that we'll find kernel bugs thanks to it. And I look forward to it,
as that is just a sign of gcc doing a better job. And we'll have to make
sure that we add the locks or the memory ordering things that we missed.

Looking back a few years, we used to have lots of code of the type

	while (!SCpnt->ready) {
		.. do some waiting ..
	}

and gcc was simply too stupid to notice that it could hoist the load of
"SCpnt->ready" outside the loop because the loop did nothing.

Then gcc improved, and for a while we had a _lot_ of "barrier()" calls
added. And we've been fine with this ever since.

Maybe we'll need to add some more. Maybe for xtime. Maybe for something we
haven't even thought about. Bugs are nothing new, and nothing to be really
afraid of.

		Linus



Date: 	Tue, 24 Jul 2001 09:04:28 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: user-mode port 0.44-2.4.7
Newsgroups: fa.linux.kernel

On Tue, 24 Jul 2001, Andrea Arcangeli wrote:
> On Mon, Jul 23, 2001 at 05:47:04PM -0600, Richard Gooch wrote:
> > I don't think it should be allowed to do that. That's a whipping
>
> it is allowed to do that, period. This is not your choice or my choice.
> You may ask gcc folks not to do that and I think they just do.

Stop this stupid argument.

A C compiler is "allowed" to do just about anything. The introduction of
"volatile" does not change that in any really meaningful way. It doesn't
change the fact that gcc is "allowed" to do a really shitty job on _any_
code it is given.

From a pure standards standpoint, gcc can change something like

	int i = *int_ptr;

into the equivalent of (assuming a little-endian machine with only byte
load/store instructions)

	unsigned char *tmp = (unsigned char *)int_ptr + 3;
	int j = 4;
	int i = 0;

	do {
		i <<= 8;
		i += *tmp;
		tmp--;
	} while (--j);

The fact that a C compiler is _allowed_ to create code like just about
anything is not an argument at all.

The above, btw, is NOT as ridiculous as it sounds. We've had the exact
opposite problem on alpha: byte stores are not "atomic", and would
"corrupt" the bytes around it due to the non-atomic nature of having to do

	load word
	mask value
	insert byte
	store word

Did it help to mark things "volatile" there? No. We had to change the code
to (a) either use locking so that nobody would ever touch adjacent bytes
concurrently or (b) stop using byte values.

			Linus



Date: 	Tue, 24 Jul 2001 09:59:48 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: user-mode port 0.44-2.4.7
Newsgroups: fa.linux.kernel

On Tue, 24 Jul 2001, Davide Libenzi wrote:
>
> Look, you're not going to request any kind of black magic over that variable.
> You're simply telling the compiler the way it has to ( not ) optimize the code.

Ehh.

But it shouldn't optimize it that way _every_ time. You only want the
specific optimizations in specific places. Which is why you use
"barrier()" or volatile in the _code_, not the data declaration.

For example, if you're holding a lock that protects it or you otherwise
know that nothing is touching it at the same time, you do NOT want to have
the compiler generate bad code.

And trust me, "volatile" generates _bad_ code a lot more often than it
generates correct code.

Look at this:

	volatile int i;
	int j;

	int main()
	{
	        i++;
	        j++;
	}

turning into this:

	main:
	        movl i,%eax
	        incl %eax
	        movl %eax,i
	        incl j
	        ret

Now, ask yourself why? The two _should_ be the same. Both do a
read-modify-write cycle. But the fact is, that when you add "volatile" to
the variable, it really tells gcc "Be afraid.  Be very afraid. This user
expects some random behaviour that is not actually covered by any
standard, so just don't ever use this variable for any optimizations, even
if they are obviously correct. That way he can't complain".

See? "volatile" is evil. It has _no_ standard semantics, which makes it
really hard to implement sanely for the compiler. It also means that the
compiler can change the semantics of what "volatile" means, without you
really being able to complain.

Also note how the "incl j" instruction is actually _better_ from an
"atomicity" standpoint than the "load+inc+store" sequence. In this
case, adding a "volatile" actually made the accesses to "i" be _less_
likely to be correct - you could have had an interrupt happen in between
that also updated "i", and got lost when we wrote the value back.

Moral of the story: don't use volatile. If you want to have a counter that
is updated in interrupts etc, use "atomic_t" or locking. "volatile" makes
things worse or better based on completely random criteria that don't
necessarily have anything to do with what you _want_ to do.
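For example, a sketch of the counter case:

	static atomic_t intr_count = ATOMIC_INIT(0);

	atomic_inc(&intr_count);	/* safe against interrupts, and no
					   "volatile" anywhere in sight */
	printk("%d\n", atomic_read(&intr_count));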

			Linus



Date: 	Tue, 24 Jul 2001 10:38:45 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: user-mode port 0.44-2.4.7
Newsgroups: fa.linux.kernel

On Tue, 24 Jul 2001, Davide Libenzi wrote:
>
> Not that much if you look at how incl is "decomposed" internally ( w/o LOCK )
> by the CPU. If you really care about  j  you need an atomic op here, in any case.

Yes, but the "inc" is atomic at least on a UP system. So here "volatile"
might actually _show_ bugs instead of hiding them.

The real issue, though, is that that is all volatile ever does. It can
show or hide bugs, but it can't fix them.

Of course, some people consider hidden bugs to _be_ fixed. I don't believe
in that particular philosophy myself.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH] fix get_jiffies_64 to work on voyager
Original-Message-ID: <Pine.LNX.4.58.0401061033490.9166@home.osdl.org>
Date: Thu, 8 Jan 2004 10:11:41 GMT
Message-ID: <fa.j8c6qld.1b2ikgv@ifi.uio.no>

On Tue, 6 Jan 2004, Paulo Marques wrote:

> What about this instead? I don't like very much this kind of construction, but
> it seems that it would prevent the lock altogether:
>
> 	u32 jiff_high1, jiff_high2, jiff_low;
>
> 	do {
> 		jiff_high1 = ((volatile) jiffies_64) >> 32;
> 		jiff_low = ((volatile) jiffies_64);
> 		jiff_high2 = ((volatile) jiffies_64) >> 32;
> 	}
> 	while(jiff_high1 != jiff_high2);

Yes, we used to do things like that at some point (and your code is
buggy: by leaving out the size, the "volatile" cast casts to the implicit
"int" size in C).

It doesn't work in SMP environments, since the "ordering" by volatile is
only a single-CPU ordering. This is just one of the reasons why "volatile"
isn't very useful.

Also, the above still assumes an update ordering between low and high (it
assumes that both set of bits get written back simultaneously). Which is
not necessarily true on a software _or_ hardware ordering level.

So on the reader side you'd need to add an explicit "rmb()" between the
two reads of the high bits, and on the writer side you'd need to always
make sure that you do an atomic 64-bit write.

Since the "rmb()" is as expensive as the sequence lock, the above wouldn't
much help.
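Spelled out, the reader would have to look roughly like this (with
jiffies_hi/jiffies_lo as made-up names for the two words):

	u32 high1, high2, low;

	do {
		high1 = jiffies_hi;
		rmb();		/* order the reads on the CPU, too */
		low = jiffies_lo;
		rmb();
		high2 = jiffies_hi;
	} while (high1 != high2);

and those rmb()'s are exactly the cost you were trying to avoid.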

> If there is any way to avoid the volatiles there, it would be much cleaner.

Not only is there a need to avoid them, they don't help in the least,
since you need the explicit CPU ordering macro anyway. And that ordering
macro is the true cost of "seq_read_lock()", so...

In contrast, the reason the simple "assume some values are stable" patch
actually _does_ help is that it doesn't depend on any ordering constraints
at all.  So it literally can be done lock-less, because it knows about
something much more fundamental: it depends on the _behaviour_ of the
values.

Basically, in the kernel, the expensive part about any lock is literally
the ordering. There are other things that can be expensive too (if the
lock is bouncing back and forth between CPU's, that becomes really
expensive really quickly), but to a first order and especially on locks
that aren't themselves fundamentally highly contended for, the CPU
serialization implied in the lock is the expensive part.

So converting that serialization to a "lockless" algorithm that still
depends on ordering usually doesn't much help for those locks. The
"ordering" part is still serializing, and the main win ends up being on
64-CPU systems etc where you can at least avoid the cacheline bounces.

Indeed, I think it was 64-CPU systems that caused the sequence lock stuff,
not so much the "normal" 2- and 4-way boxes.

If you want to improve on the sequence lock, you need to take advantage of
inherent data knowledge (in this case the knowledge that you don't care
about the exact value, and that you know how the bits are updated).
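For reference, the sequence-lock reader under discussion looks roughly
like this:

	u64 get_jiffies_64(void)
	{
		unsigned long seq;
		u64 ret;

		do {
			seq = read_seqbegin(&xtime_lock);
			ret = jiffies_64;
		} while (read_seqretry(&xtime_lock, seq));
		return ret;
	}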

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH] fix get_jiffies_64 to work on voyager
Original-Message-ID: <Pine.LNX.4.58.0401060945090.9166@home.osdl.org>
Date: Thu, 8 Jan 2004 10:12:13 GMT
Message-ID: <fa.j5soqtf.19i0l8p@ifi.uio.no>

[ This is a big rant against using "volatile" on data structures. Feel
  free to ignore it, but the fact is, I'm right. You should never EVER use
  "volatile" on a data structure. ]

On Tue, 6 Jan 2004, Tim Schmielau wrote:
>
> We then need to make jiffies_64 volatile, since we violate the rule to
> never read it.

No, we should _never_ make anything volatile. There just isn't any reason
to. The compiler will never do any "better" with a volatile, it will only
ever do worse.

If there are memory ordering constraints etc, the compiler won't be able
to handle them anyway, and volatile will be a no-op. That's why we have
"barrier()" and "mb()" and friends.

The _only_ acceptable use of "volatile" is basically:

 - in _code_ (not data structures), where we might mark a place as making
   a special type of access. For example, in the PCI MMIO read functions,
   rather than use inline assembly to force the particular access (which
   would work equally well), we can force the pointer to a volatile type.

   Similarly, we force this for "test_bit()" macros etc, because they are
   documented to be SMP-safe. But it's the _code_ that works that
   way, not the data structures.

   And this is an important distinction: there are specific pieces of
   _code_ that may be SMP-safe (for whatever reasons the writer thinks).
   Data structures themselves are never SMP safe.

   Ergo: never mark data structures "volatile". It's a sure sign of a bug
   if the thing isn't a memory-mapped IO register (and even then it's
   likely a bug, since you really should be using the proper functions).

   (Some driver writers use "volatile" for mailboxes that are updated by
   DMA from the hardware. It _can_ be correct there, but the fact is, you
   might as well put the "volatile" in the code just out of principle).

That said, the "sure sign of a bug" case has one specific sub-case:

 - to paste over bugs that you really don't think are worth fixing any
   other way. This is why "jiffies" itself is declared volatile: just to
   let people write code that does "while (time_before(xxx, jiffies))".

But the "jiffies" case is safe only _exactly_ because it's an atomic read.
You always get a valid value - so it's actually "safe" to mark jiffies as
being volatile. It allows people to be sloppy (bad), but at least it
allows people to be sloppy in a safe way.

In contrast, "jiffies_64" is _not_ an area where you can safely let the
compiler read an unstable value, so "volatile" is fundamentally wrong for
it. You need to have some locking, or to explicitly say "we don't care in
this case", and in both cases it would be wrong to call the thing
"volatile". With locking, it _isn't_ volatile, and with "we don't care",
it had better not make any difference. In either case the "volatile" is
wrong.

We had absolutely _tons_ of bugs in the original networking code, where
clueless people thought that "volatile" somehow means that you don't need
locking. EVERY SINGLE CASE, and I mean EVERY ONE, was a bug.

There are some other cases where the "volatile" keyword is fine, notably
inline asm has a specific meaning that is pretty well-defined and very
useful. But in all other cases I'd suggest having the volatile be part of
the code, possibly with an explanation _why_ it is ok to use volatile
there unless it is obvious.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: more (insane) jiffies ranting
Original-Message-ID: <Pine.LNX.4.58.0406271022430.16079@ppc970.osdl.org>
Date: Sun, 27 Jun 2004 17:41:51 GMT
Message-ID: <fa.ivnflo9.10k6ajn@ifi.uio.no>

On Sat, 26 Jun 2004, Chris Wedgwood wrote:
>
> On Sat, Jun 26, 2004 at 03:48:34PM -0700, Linus Torvalds wrote:
>
> > But for most data structures, the way to control access is either
> > with proper locking (at which point they aren't volatile any more)
> > or through proper accessor functions (ie "jiffies_64" should
> > generally only be accessed with something that understands about
> > low/high word and update ordering and re-testing).
>
> I don't entirely buy this.  Right now x86 code just assumes 32-bit
> loads are atomic and does them blindly in lots of places (ie. every
> user of jiffies just about).
>
> Without the volatile it seems entirely reasonable gcc will produce
> correct, but wrong code here so I would argue 'volatile' is a property
> of the data in this case.

It's a property of the data _iff_:
 - it is _always_ volatile
 - it is only ever used atomically: this also means that it must be
   totally independent of _all_ other data structures and have no linkages
   to anything else.

And basically, the above is pretty much never true except possibly for
real I/O accesses and sometimes things like simple "flags" (ie it's fine
to use "volatile sigatomic_t flag;" in user programs to have signal
handlers say "something happened" in a single-threaded environment).
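A user-space sketch of that flag case:

	#include <signal.h>

	static volatile sig_atomic_t got_signal;

	static void handler(int sig)
	{
		got_signal = 1;	/* single flag, single thread: ok */
	}

	...
	while (!got_signal)
		do_more_work();	/* made-up work function */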

NOTE! The "single-threaded environment" part really is important, and is
one of the historical reasons for volatile having been more useful than it
is today. If you are single-threaded and don't have issues like CPU memory
ordering etc, then you can let the compiler do more of the work, and there
are a lot of lockless algorithms that you can use that only depend on
fairly simple semantics for "volatile".

But the fact is, for the kernel none of the above is ever really true.
A 32-bit-atomic "jiffies" comes the closest, but even there the "always"
property wasn't true - it wasn't true in the update phase, and we
literally used to have something like this:

	*((unsigned long *)&jiffies)++;

to update jiffies and still get good code generation (now that we have a
64-bit jiffies and need to do more complex stuff anyway, we don't have
that any more, but you should be able to find it in 2.3.x kernels if I
remember correctly).

And for _anything_ that has any data dependencies, "volatile" is totally
useless. Even the (acceptable in single-threaded user-space) "flag" thing
is not valid usage in the kernel, since for a flag in a multi-threaded
environment you still need an explicit CPU memory barrier in the code,
making it impossible for the compiler to do the right thing anyway.

> > I repeat: it is the _code_ that knows about volatile rules, not the
> > data structure.
>
> Except as I mentioned we have exceptions to this right now.

No we don't. The _only_ accepted exception is the special case of "the low
bits of jiffies", and that's accepted partly because of historical
reasons, and partly because it's fundamentally a data structure we don't
really care that much about. There should be no other ones.

And that special case _literally_ is only for people who don't care that
much. Anybody who cares about "real time" needs to get xtime_lock and do
the proper magic to get a real date.

So I don't see your argument. I'm obviously saying that "yes, we have
_one_ case where we make a data structure volatile", but at the same time,
that case is very much a "we don't really care too much about precision
there, and even so people think we should have real accessor functions".

So I stand by the rule: we should make _code_ have the access rules, and
the data itself should never be volatile. And yes, jiffies breaks that
rule, but hey, that's not something I'm proud of.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: Question about memcpy_fromio prototype
Original-Message-ID: <Pine.LNX.4.58.0412141209150.3279@ppc970.osdl.org>
Date: Tue, 14 Dec 2004 20:19:52 GMT
Message-ID: <fa.gttluui.aiorpi@ifi.uio.no>

On Tue, 14 Dec 2004, Matthew Wilcox wrote:
>
> Hi Linus.  On x86 and ia64, memcpy_fromio is prototyped as:
>
> static inline void memcpy_fromio(void *dst, volatile void __iomem *src, int count)
>
> ALSA does this (except on x86 and sparc32, so you don't see it):
>
> int copy_to_user_fromio(void __user *dst, const void __iomem *src, size_t count)
> [...]
>                 memcpy_fromio(buf, src, c);
>
> which provokes a warning from gcc that we're discarding a qualifier (the
> 'const') from src.  Is ALSA just wrong?  Or is the 'volatile' wrong?

Neither. The right thing for a read-only IO pointer is actually

	const volatile void __iomem *

which looks funny ("const volatile"?) but makes sense for prototypes,
exactly because a "const volatile" pointer is the most permissive kind of
pointer there is. And it actually does describe the thing perfectly: it is
"const" because we don't write to it ("const" in C does not mean that the
thing is constant, and never has, confusing name and some C++ semantic
changes aside) and obviously as an IO area it's both "volatile" and
"__iomem".

On x86, readb/w/l already gets that right, so I'll just fix
memcpy_fromio(). Other architectures can sort out themselves (ppc64 is
already correct, at least for eeh).

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] uninline init_waitqueue_*() functions
Date: Wed, 05 Jul 2006 22:10:11 UTC
Message-ID: <fa.uUPxfD+K1dvqQIcs4KSbad9lecQ@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607051458200.12404@g5.osdl.org>

On Wed, 5 Jul 2006, Ingo Molnar wrote:
>
> yeah, i'd not want to skip over some interesting and still unexplained
> effect either, but 35 bytes isn't all that outlandish and from everything
> i've seen it's a real win. Here is an actual example:
>
>  c0fb6137:       c7 44 24 08 00 00 00    movl   $0x0,0x8(%esp)
>  c0fb613e:       00
>  c0fb613f:       c7 44 24 08 01 00 00    movl   $0x1,0x8(%esp)
>  c0fb6146:       00
>  c0fb6147:       c7 43 60 00 00 00 00    movl   $0x0,0x60(%ebx)
>  c0fb614e:       8b 44 24 08             mov    0x8(%esp),%eax
>  c0fb6152:       89 43 5c                mov    %eax,0x5c(%ebx)
>  c0fb6155:       8d 43 64                lea    0x64(%ebx),%eax
>  c0fb6158:       89 40 04                mov    %eax,0x4(%eax)
>  c0fb615b:       89 43 64                mov    %eax,0x64(%ebx)

Ahh, it's _that_ old gcc problem.

That's actually a different thing.

Gcc is HORRIBLY BAD at doing the simple

	some_structure = (struct somestruct) { INITIAL };

assignments. It is so ludicrously bad that it's sad. It tends to do that
as a local "struct somestruct" on the stack that gets initialized,
followed by a memcpy().

In this case, the problem appears to be the spinlock initialization code.

In other words, I suspect 90% of your improvement was because you got that
braindamage out of line.

It would be _much_  better to just fix "spin_lock_init()" instead. That
would help a lot of _other_ users too, not just the waitqueue
initializations.

Making that a real function (and inline only for the non-debug case, at
which point it's just a simple and small store) would be much better.
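Something like this sketch, a plain store instead of a struct
assignment (using the x86 "slock" field, where 1 means unlocked):

	static inline void spin_lock_init(spinlock_t *lock)
	{
		lock->slock = 1;	/* unlocked */
	}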

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] uninline init_waitqueue_*() functions
Date: Wed, 05 Jul 2006 23:12:09 UTC
Message-ID: <fa.plBIdzhRfQ5ctDumDmcA43/a8qQ@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607051555140.12404@g5.osdl.org>

On Wed, 5 Jul 2006, Linus Torvalds wrote:
> >
> >  c0fb6137:       c7 44 24 08 00 00 00    movl   $0x0,0x8(%esp)
> >  c0fb613e:       00
> >  c0fb613f:       c7 44 24 08 01 00 00    movl   $0x1,0x8(%esp)
> >  c0fb6146:       00
> >  c0fb6147:       c7 43 60 00 00 00 00    movl   $0x0,0x60(%ebx)
> >  c0fb614e:       8b 44 24 08             mov    0x8(%esp),%eax
> >  c0fb6152:       89 43 5c                mov    %eax,0x5c(%ebx)

Btw, this is even worse than usual.

I've seen the "create struct on the stack and copy it" before, but looking
closer, this is doing something I haven't noticed before: that double
assignment to the stack is _really_ strange.

I wonder why it first stores a zero to the stack location, and then
overwrites it with the proper spinlock initialized (one).

As far as I can tell, the spinlock initializer should really end up being
a perfectly normal "(struct spinlock_t) { 1 }", and I don't see where the
zero comes from.

It may actually be that we're double penalized because for the lock
"counter", we use a "volatile unsigned int", and that "0" is from some
internal gcc "initialize all base structures to zero" to make sure that
any padding gets zeroed, and then the "volatile" means that gcc ends up
not optimizing it away, even though it was a totally bogus store that
didn't even exist in the sources.

I wonder if we should remove the "volatile". There really isn't anything
_good_ that gcc can do with it, but we've seen gcc code generation do
stupid things before just because "volatile" seems to just disable even
proper normal working.

[ Test-case built as time passes ]

Try compiling this example file with "-fomit-frame-pointer -O2 -S" to see
the effect:

	horrid:
	        subl    $16, %esp
	        movl    $1, 12(%esp)
	        movl    12(%esp), %eax
	        movl    %eax, a1
	        movl    $1, b1
	        addl    $16, %esp
	        ret

	notbad:
	        movl    $1, a2
	        movl    $1, b2
	        ret


I really think that "volatile" in the kernel sources is _always_ a kernel
bug. It certainly seems to be so in this case.

(But at least with the compiler version I'm using, I'm not seeing that
extra unnecessary "movl $0" - I have gcc version 4.1.1 here)

Does removing just the "volatile" on the "slock" member shrink the size of
the kernel noticeably? And do you see an extra "movl $0" in this case with
your compiler version?

Maybe removing the volatile allows us to keep the initializer the way it
is (although going by past behaviour, I've seen gcc generate horrible code
in more complicated settings, so maybe the reason it works well on this
case without the "volatile" is just that it was simple enough that gcc
wouldn't mess up?)

		Linus
---
struct duh {
	volatile int i;
};

void horrid(void)
{
	extern struct duh a1;
	extern struct duh b1;

	a1 = (struct duh) { 1 };
	b1.i = 1;
}

struct gaah {
	int i;
};

void notbad(void)
{
	extern struct gaah a2;
	extern struct gaah b2;

	a2 = (struct gaah) { 1 };
	b2.i = 1;
}


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Thu, 06 Jul 2006 19:59:03 UTC
Message-ID: <fa.C+obq7s9qMFcdBmTMpoGiRhs+6c@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607061257130.3869@g5.osdl.org>

On Thu, 6 Jul 2006, Chris Friesen wrote:
>
> The C standard requires the use of volatile for signal handlers and setjmp.

Actually, the C standard requires "sig_atomic_t".

> For userspace at least the whole discussion of "barriers" is sort of
> moot--there are no memory barriers defined in the C language, which makes it
> kind of hard to write portable code that uses them.

Any locking primitive BY DEFINITION has a barrier in it.

If it doesn't, it's not a locking primitive, it's a random sequence of
code that does something pointless.

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Thu, 06 Jul 2006 19:56:46 UTC
Message-ID: <fa.d5F2Khv8ILtc+Rbhq7502gxgx8E@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607061237300.3869@g5.osdl.org>

On Thu, 6 Jul 2006, Ingo Molnar wrote:
>
> yeah. I tried this and it indeed slashed 42K off text size (0.2%):
>
>  text            data    bss     dec             filename
>  20779489        6073834 3075176 29928499        vmlinux.volatile
>  20736884        6073834 3075176 29885894        vmlinux.non-volatile
>
> i booted the resulting allyesconfig bzImage and everything seems to be
> working fine. Find patch below.

Ok, the patch itself looks fine, but looking at some of the usage of the
spinlocks, I worry that the 'volatile' may actually have hidden a real
bug.

Here's what __raw_spin_lock() looks like on x86:

        alternative_smp(
                __raw_spin_lock_string,
                __raw_spin_lock_string_up,
                "=m" (lock->slock) : : "memory");

where the actual spinlock sequence itself isn't important.

What's important is what we tell the compiler, and the "=m" in particular.

The compiler will validly think that the spinlock assembly will overwrite
the "slock" location _WITHOUT_ caring about the old value.

And that's dangerous, because it means that in theory a sequence like

	spin_lock_init(&lock);
	spin_lock(&lock);

might end up having the compiler optimize away the "unnecessary"
initialization code because the compiler (correctly, as far as we've told
it) thinks that the spin_lock() will overwrite the initial value.

And the "volatile" would have hidden that bug, for totally stupid and
unrelated reasons..

Now, admittedly, the above kind of code is insane, and possibly doesn't
exist, but still..

So I _think_ that we should change the "=m" to the much more correct "+m"
at the same time (or before - it's really a bug-fix regardless) as
removing the "volatile".

It would be interesting to hear whether that actually changes any code
generation (hopefully it doesn't, but if it does, that in itself is
interesting).

Btw, grepping for "=m" shows that semaphores may have the same bug, and in
fact, we seem to have that "volatile" there too (perhaps, in fact, because
somebody hit the bug and decided to fix it the simple way with "volatile"?
Who knows, it might even have been me, back before I realized how badly
gcc messes up volatiles)

Interestingly (or perhaps not), "atomic_add()" and friends (along with the
local_t stuff that seems to largely have been copied from the atomic code)
use a combination of "=m" and "m" in the assembler dst/src to avoid the
bug. They too would probably be better off using "+m".

I think that's because the whole "+m" thing is fairly new (and not well
documented either).
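With "+m", the x86 atomic_add() would look roughly like this sketch
(LOCK being the usual SMP lock-prefix macro):

	static inline void atomic_add(int i, atomic_t *v)
	{
		__asm__ __volatile__(
			LOCK "addl %1,%0"
			:"+m" (v->counter)
			:"ir" (i));
	}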

Appended is my previous test-program extended to show the difference
between "=m" and "+m"..

For me, I get:

	horrid:
	        subl    $16, %esp
	        movl    $1, 12(%esp)
	        movl    12(%esp), %eax
	        movl    %eax, a1
	        movl    $1, b1
	#APP
	        # overwrite a1
	        # modify b1
	#NO_APP
	        addl    $16, %esp
	        ret

	notbad:
	        movl    $1, b2
	#APP
	        # overwrite a2
	        # modify b2
	#NO_APP
	        ret

which shows that "horrid" (with volatile) generates truly crapola code due
to the volatile, but that the "overwrite a1" will not optimize away the
initializer.

And "notbad" has good code, but exactly because it's good code, the
compiler has also noticed that the "=m" means that the initialization of
"a2" is unnecessary, and has thus promptly removed it.

(Removing the initializer is _correct_ for that particular source code -
it's just that with the current x86 spinlock() macros, it would be
disastrous, and shows that your "just remove volatile" patch is dangerous
because of our incorrect inline assembly)

		Linus
---
struct duh {
	volatile int i;
};

void horrid(void)
{
	extern struct duh a1;
	extern struct duh b1;

	a1 = (struct duh) { 1 };
	b1.i = 1;
	asm("# overwrite %0":"=m" (a1.i));
	asm("# modify %0":"+m" (b1.i));
}

struct gaah {
	int i;
};

void notbad(void)
{
	extern struct gaah a2;
	extern struct gaah b2;

	a2 = (struct gaah) { 1 };
	b2.i = 1;
	asm("# overwrite %0":"=m" (a2.i));
	asm("# modify %0":"+m" (b2.i));
}


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Fri, 07 Jul 2006 20:27:09 UTC
Message-ID: <fa.VKklA/zvSBrHL8EhpWMBgrC1tWU@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607071318570.3869@g5.osdl.org>

On Fri, 7 Jul 2006, linux-os (Dick Johnson) wrote:
>
> Now Linus declares that instead of declaring an object volatile
> so that it is actually accessed every time it is referenced, he wants
> to use a GNU-ism with assembly that tells the compiler to re-read
> __every__ variable existing in memory, instead of just one. Go figure!

Actually, it's not just me.

Read things like the Intel CPU documentation.

IT IS ACTIVELY WRONG to busy-loop on a variable. It will make the CPU
potentially over-heat, causing degraded performance, and you're simply
not supposed to do it.

> Reference:
> /usr/src/linux-2.6.16.4/include/linux/compiler-gcc.h:
> #define barrier() __asm__ __volatile__("": : :"memory")

Actually, for your kind of stupid loop, you should use

 - include/asm-i386/processor.h:
	#define cpu_relax()        rep_nop()

where rep_nop() is

	/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
	static inline void rep_nop(void)
	{
		__asm__ __volatile__("rep;nop": : :"memory");
	}

on x86, and can be other things on other CPU's. On ppc64 it is

	#define cpu_relax()     do { HMT_low(); HMT_medium(); barrier(); } while (0)

where those HMT macros adjust thread priority.

In other words, you just don't know what you're talking about. "volatile"
is simply not useful, and the fact that you cannot seem to grasp that is
_your_ problem.

Repeat after me (or just shut up about things that you not only don't know
about, but are apparently not willing to even learn):

	"'volatile' is useless. The things it did 30 years ago are much
	 more complex these days, and need to be tied to much more
	 detailed rules that depend on the actual particular problem,
	 rather than one keyword to the compiler that doesn't actually
	 give enough information for the compiler to do anything useful"

Comprende?

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sat, 08 Jul 2006 21:44:27 UTC
Message-ID: <fa.sLt/rMGNcNdREJNKcQ+A7eP+Wbk@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607081442440.3869@g5.osdl.org>

On Sat, 8 Jul 2006, Pavel Machek wrote:
>
> Actually, because volatile is a big hammer, it can be used to work
> around compiler bugs. If the compiler dies with an internal error in function
> foo, just sprinkle a few volatiles into it, and you can usually work
> around that compiler problem.

Heh. That's probably an even better use of "volatile" than using it for
hiding bugs in the sources. The bugs in the sources you'd be better off
just _fixing_, while the compiler problems you may have a much harder time
working around..

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sun, 09 Jul 2006 03:10:15 UTC
Message-ID: <fa.zGe2Jr18SLxPb4AzGnJREdj/nto@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607081851470.5623@g5.osdl.org>

On Sat, 8 Jul 2006, Ralf Baechle wrote:
>
> I tried the same on MIPS, for laziness' sake at first only in atomic.h.  With
> gcc 3.3 the code size is exactly the same with both "=m" and "+m", so I
> didn't look into details of the generated code.  With gcc 4.1 "+m" results
> in a size increase of about 1K for the ip27_defconfig kernel.  For example:
>
> <unlock_kernel>:
>        df830000        ld      v1,0(gp)
>        8c620028        lw      v0,40(v1)
>        04400014        bltz    v0,a80000000029944c <unlock_kernel+0x5c>
>        00000000        nop
>        2442ffff        subiu   v0,v0,1
>        ac620028        sw      v0,40(v1)	# current->lock_depth
>        8c630028        lw      v1,40(v1)	# current->lock_depth
>        0461000b        bgez    v1,a80000000029943c <unlock_kernel+0x4c>
>
> The pointless load isn't generated with "=m".  The interesting thing is
> that in all the instances of bloat I looked at it was actually happening
> not as part of the asm statement itself, so maybe gcc's reload is getting
> a little confused.

Indeed, that looks like gcc confusion, because that pointless load is
literally just re-loading the "task->lock_depth" that is all part of
perfectly normal C code _before_ the inline assembly even is reached.

Of course, if a "+m" causes gcc confusion, that's bad in itself, and
indicates that "=m" + "m" may actually be preferable due to some internal
gcc buglet.

I do _not_ see the same extra load on ppc64 or indeed x86 (gcc-4.1.1 in
both cases), so there seems to be something MIPS-specific here.

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sun, 09 Jul 2006 03:29:52 UTC
Message-ID: <fa.oQVkKoynluUUQf5ncxmlyXW3rL8@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607082019180.5623@g5.osdl.org>

On Sun, 9 Jul 2006, Keith Owens wrote:
>
> This disagrees with the gcc (4.1.0) docs.  info --index-search='Extended Asm' gcc
>
>   The ordinary output operands must be write-only; GCC will assume
>   that the values in these operands before the instruction are dead and
>   need not be generated.  Extended asm supports input-output or
>   read-write operands.  Use the constraint character `+' to indicate
>   such an operand and list it with the output operands.  You should
>   only use read-write operands when the constraints for the operand (or
>   the operand in which only some of the bits are to be changed) allow a
>   register.

I'm fairly sure the docs are outdated (but may well be correct for older
gcc versions - as I already discussed elsewhere, that "+" thing was not
historically useful).

We've been using "+m" for some time in the kernel on several
architectures.

git aficionados can do

	git grep -1 '"+m"' v2.6.17

to see the pre-existing usage of this (most of them go back a lot further,
although some of them are newer - the <asm-i386/bitops.h> ones were added
in January).

So if "+m" didn't work, we've been in trouble for at least the last year.

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sun, 09 Jul 2006 04:01:11 UTC
Message-ID: <fa.IAImhf7/jrxPKEcGtup+EmcdMBk@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607082041400.5623@g5.osdl.org>

On Sun, 9 Jul 2006, Keith Owens wrote:
>
>   			 "... Extended asm supports input-output or
>   read-write operands.  Use the constraint character `+' to indicate
>   such an operand and list it with the output operands.  You should
>   only use read-write operands when the constraints for the operand (or
>   the operand in which only some of the bits are to be changed) allow a
>   register."

Btw, gcc-4.1.1 docs seem to also have this language, although when you
actually go to the "Constraint Modifier Characters" section, that thing
doesn't actually say anything about "only for registers".

It would be good to have the gcc docs fixed. As mentioned, we've been
using "+m" for at least a year (most of our current "+m" usage was there
in 2.6.13), and some of those uses have actually been added by people that
are at least active on the gcc development lists (eg Andi Kleen).

But let's add a few more people who are more deeply involved with gcc.
Jan? Richard? Davem? Who would be the right person to check this out?

We can certainly write

	...
	:"=m" (*ptr)
	:"m" (*ptr)
	...

instead of the much simpler

	:"+m" (*ptr)

but we've been using that "+m" format for a long time already (and I
_think_ we did so at the suggestion of gcc people), and it would be much
better if the gcc documentation was just fixed here.

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Fri, 07 Jul 2006 17:10:27 UTC
Message-ID: <fa.d1sqjRbmGHjaqL9aFzZ94yxj9v0@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607071003140.3869@g5.osdl.org>

On Fri, 7 Jul 2006, Chuck Ebbert wrote:
>
> >  #define __raw_spin_unlock_string \
> >       "movb $1,%0" \
> > -             :"=m" (lock->slock) : : "memory"
> > +             :"+m" (lock->slock) : : "memory"
>
> This really is just an overwrite of whatever is there.  OTOH I can't see
> how this change could hurt anything..

Yeah, I don't think any non-buggy sequence can make it matter.

In theory, you could have something like

 - create a spinlock already locked
 - add the object that contains the spinlock to some global queues
 - do some op on the object to finalize it
 - unlock the spinlock

and then the "unlock" had better not optimize away the previous
initialization.

HOWEVER, it can't really do that anyway, since anything that made the
spinlock visible to anybody else would have had to serialize the lock
state for _that_, so if the "+m" vs "=m" makes a difference, you'd have
had a serious bug already for other reasons.

I'll leave it with a "+m". I've tested it locally, and looking at the
patch it definitely fixes real bugs in the inline asms, but I still hope
other people will take a look before I commit it, in case there are any
other subtle cases.

(A "+m" should always be safer than a "=m", so the patch should be very
safe, but hey, bugs happen to "obvious" code too)

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sat, 08 Jul 2006 18:53:54 UTC
Message-ID: <fa.IY77/ZoAdRrWgI9tq74CJUk2hqY@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607081125440.3869@g5.osdl.org>

On Sat, 8 Jul 2006, Nick Piggin wrote:
>
> The volatile casting in atomic_* and *_bit seems to be a good idea
> (now that I think about it) [1].
>
> Because if you had a barrier there, you'd have to reload everything
> used after an atomic_read or set_bit, etc.

Yes and no. The problem _there_ is that we use the wrong inline asm
constraints.

The "+m" constraint is basically undocumented, and while I think it has
existed _internally_ in gcc for a long time (gcc inline asms are actually
fairly closely related to the internal gcc code generation templates, but
the internal templates generally have capabilities that the inline asms
don't have), I don't think it has been actually supported for inline asms
all that time.

But "+m" means "I will both read and write from this memory location",
which is exactly what we want to have for things like atomic_add() and
friends.

However, because it is badly documented, and because it didn't even exist
long ago, we have lots of code (and lots of people) that doesn't even
_know_ about "+m". So we have code that fakes out the "both read and
write" part by marking things either volatile, or doing

	...
	:"=m" (*ptr)
	:"m" (*ptr)
	...

in the constraints (or, in many cases, both).

> [1] Even though I still can't tell the exact semantics of these
>     operations eg. why do we need volatile at all? why do we have
>     volatile in the double underscore (non-atomic) versions?

So we tend to have "volatile" for a couple of different reasons:

 - the above kind of "we couldn't tell the inline assembly well enough
   what the instruction actually _does_, so we just tell gcc to not mess
   with it".

   This _generally_ should be replaced with using "+m", so that gcc just
   knows that we both read and write to the location, and that allows gcc
   to generate the best possible code, while still generating _correct_
   code because gcc knows what is going on and doesn't think the write
   totally clobbers the old value.

 - many functions are used with random data, and if the caller has a
   "volatile long array[]" (bad - but hey, it happens), then a function
   that _acts_ on that array, like the bitops functions, need to take
   an argument like "volatile long *".

   So for example, "test_bit()", which can act both on volatile arrays
   _and_ on const arrays, will look like

	int test_bit(int nr, const volatile void * addr);

   which some people think is strange ("Both 'const' _and_ 'volatile'?
   Isn't that a contradiction in terms?"), but the fact is, it reflects
   the different callers, not necessarily test_bit() itself.

 - some functions actually really want the volatile access. The x86 IO
   functions are the best actual example of this:

	static inline unsigned int readl(const volatile void __iomem *addr)
	{
		return *(volatile unsigned int __force *) addr;
	}

   which actually has a _combination_ of the above reason (the incoming
   argument is already marked "volatile" just because the _caller_ may
   have marked it that way) and the cast to volatile would be there
   _regardless_ of the calling convention "volatile", because in this case
   we actually use it as a way to avoid inline assembly (which a number of
   other architectures need to do, and which x86 too needs to do for the
   PIO accesses, but we can avoid it in this case)

So those are all real reasons to use volatile, although the first one is
obviously a case where we no longer should (but at least we have
reasonably good historical reasons for why we did).

The thing to note is that all of the above reasons are basically
"volatile" markers on the _code_. We haven't really marked any _data_
volatile, we're just saying that certain _code_ will want to act on
the data in a certain way.

And I think that's generally a sign of "good" use of volatile - it's
making it obvious that a certain specific _use_ of the data may have certain
rules.

As mentioned, there is _one_ case where it is valid to use "volatile" on
real data: it's when you have a "I don't care about locking, and I'm not
accessing IO memory or something else that may need special care"
situation.

In the kernel, that _one_ special case used to basically be the "jiffies"
counter. There's nothing to "lock" there - it just keeps ticking. And it's
still obviously normal memory, so there's no question about any special
rules for accesses. And there are no SMP memory ordering issues for
reading it (for the low bits), since the "jiffies" counter is not really
tied to anything else, so there are no larger "coherency" rules either.

So in that ONE case, "volatile" is actually fine. We really don't care if
we read the old value or the new value when it changes, and there's no
reason to try to synchronize with anything else.

There _may_ be some other cases where that would be true, but quite
frankly, I can't think of any. If the CPU were to have a built-in random
number generator mapped into memory, that would fall under the same kind
of rules, but that's basically it.

One final word: in user space, because of how signal handlers work,
"volatile" can still make sense for exactly the same reasons that
"jiffies" makes sense in the kernel. You may, for example, have a signal
handler that updates some flag in memory, and that would basically look
exactly like the "jiffies" case for your program.

(In fact, because signals are very expensive to block, you may have more
of a reason to use a "jiffies" like flag in user space than you have in
kernel. In the kernel, you'd tend to use a spinlock to protect things. In
user space, with signals, you may have to use some non-locking algorithm,
where the generation count etc might well look like "jiffies").

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sat, 08 Jul 2006 20:11:43 UTC
Message-ID: <fa.aSSYYm3S3gOFXerH35mvKW1PmiA@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607081256170.3869@g5.osdl.org>

On Sat, 8 Jul 2006, Albert Cahalan wrote:
> >
> > 1. The volatile implementation of gcc is correct. The standard does not
> > talk about busses, not even about SMP.
>
> The standard need not. An implementation must deal
> with whatever odd hardware happens to be in use.

Not really.

The fact is, "volatile" simply doesn't inform the compiler enough about
what the effect of an access could be under various different situations.

So the compiler really has no choice. It has to balance the fact that the
standard requires it to do _something_ different, with the fact that
there's really not a lot of information that the user gave, apart from the
one bit of "it's volatile".

So the compiler really has no choice.

Btw, I think that the whole standard definition of "volatile" is pretty
weak and useless. The standard could be improved, and a way to improve the
definition of volatile would actually be to say something like

	"volatile" implies that the access to that entity can alias with
	any other access.

That's actually a lot simpler for a compiler writer (a C compiler already
has to know about the notion of data aliasing), and gives a lot more
useful (and strict) semantics to the whole concept.

So to look at the previous example of

	extern int a;
	extern int volatile b;

	void testfn(void)
	{
		a++;
		b++;
	}

_my_ definition of "volatile" is actually totally unambiguous, and not
just simpler than the current standard, it is also stronger. It would make
it clearly invalid to read the value of "b" until the value of "a" has
been written, because (by my definition), "b" may actually alias the value
of "a", so you clearly cannot read "b" until "a" has been updated.

At the same time, there's no question that

	addl $1,a
	addl $1,b

is a clearly valid instruction sequence by my simpler definition of
volatile. The fact that "b" can alias with itself is a tautology, and is
true of normal variables too, so any combination of ops on one variable
(any variable always aliases _itself_) is by definition clearly always
valid on a "volatile" variable too, and thus a compiler that can do the
combination of "load + increment + store" on a normal variable should
always do so on a volatile one too.

In contrast, the current C standard definition of "volatile" is not only
cumbersome and inconvenient, it's also badly defined when it comes to
accesses to _other_ data, making it clearly less useful.

I personally think that my simpler definition of volatile is actually a
perfectly valid implementation of the current definition of volatile, and
I suggested it to some gcc people as a better way to handle "volatile"
inside gcc while still being standards-conforming (ie the "can alias
anything" thing is not just clearer and simpler, it's strictly a subset of
what the C standard allows, meaning that I think you can adopt my
definition _without_ breaking any old programs or standards).

But there really is no way to "fix" volatile. You will invariably need
other things too (inline assembly with "lock" prefixes etc) to actually
create true lock primitives. The suggested "can alias anything" semantics
just clarify what it means, and thus make it less ambiguous. They don't
make it fundamentally more useful in general.
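
For example, a bare-bones x86 test-and-set lock (a sketch only - a real
spinlock also has to worry about preemption, interrupts and fairness)
needs the inline asm no matter what you do with "volatile":

	typedef struct { int locked; } raw_spinlock_t;

	static inline void raw_spin_lock(raw_spinlock_t *lock)
	{
		int old = 1;
		do {
			/* "xchg" with a memory operand is implicitly locked */
			asm volatile("xchgl %0, %1"
				     : "+r" (old), "+m" (lock->locked)
				     : : "memory");
		} while (old);	/* old == 1: somebody else still holds it */
	}

	static inline void raw_spin_unlock(raw_spinlock_t *lock)
	{
		asm volatile("" ::: "memory");	/* compiler barrier */
		lock->locked = 0;	/* plain store is a release on x86 */
	}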

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sat, 08 Jul 2006 19:15:26 UTC
Message-ID: <fa.JfP7UT9ZV1ZaXre2mWnlbx64UV4@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607081154480.3869@g5.osdl.org>

On Sat, 8 Jul 2006, Andi Kleen wrote:
>
> How do you define reference? While you could do locked mem++ you can't
> do locked mem *= 2 (or any other non-trivial operation that doesn't
> map directly to a memory-operand x86 instruction which allows lock
> prefixes)

Actually, this is one reason why I don't like "volatile".

The compiler has exactly the same problems saying "what can you do", so if
you do

	extern int a;
	extern volatile int b;

	void testfn(void)
	{
		a++;
		b++;
	}

the compiler will generally generate _worse_ code for "b++" than for
"a++".

There's really no _reason_ not to do

	addl $1,a
	addl $1,b

for the above on x86 (or any other architecture that has direct-to-memory
operations), but the compiler will almost certainly be totally paranoid
about the fact that "b" was marked volatile, and it won't dare combine
_any_ operations on it, even though they are obviously equivalent.

So try the above on gcc, and you'll see something like this:

	testfn:
	        movl    b, %eax
	        addl    $1, a
	        addl    $1, %eax
	        movl    %eax, b
	        ret

where the "volatile" on b means that the code generated was
_fundamentally_ more expensive, for absolutely zero gain.

Btw, this also shows another weakness with "volatile", which is very
fundamental, and very much a problem: look where the read-vs-write of "b"
takes place.

It _reads_ the old value of "b" before it reads and updates "a", and
_writes_ the new value of "b" after it has updated "a".

Why? It's perfectly correct code as far as the compiler is concerned: the
"volatile" was only on the "b", and accesses to "a" were not constrained,
so the compiler is perfectly correct in moving the access to "a" around,
since that one doesn't have any "side effects" as far as the definition of
the C language is concerned.

Now, people should realize what this means when it comes to things like
locking: even in the _absence_ of the CPU re-ordering accesses, and even
in a single-threaded program with signals, there is some real re-ordering
by the compiler going on.

> IMHO the "barrier model" used by Linux and implemented by gcc is much
> more useful for multithreaded programming, and MMIO needs other
> mechanisms anyways.
>
> I wish the C standard would adopt barriers.

The problem ends up being that you really can have a _lot_ of different
kinds of barriers.

For example, if you really want to implement spinlocks in some kind of
"smart C" without needing inline assembly, different architectures will
want different barriers. In many architectures the barriers are not
stand-alone things, but are tied to the accesses themselves.

That's _not_ just the x86 kind of "locked instructions", btw, and it's not
even just an "ugly special case". Some locking models are _defined_ as
having the barriers not be separate from the operations, and in fact,
those locking models tend to be the BEST ONES.

So you either need to do barriers on a high level (ie "spinlock", "mutex"
etc) or you need to be able to describe things on a very low level indeed.
The low level is very hard to come to grips with, but the high level has
the problem that different classes of programs need totally different
kinds of high-level barriers, so it's almost impossible to do them as
something that the compiler can generate directly.

For example, the kernel kind of spinlocks are basically useless in user
space, because signals (the equivalent of an "interrupt" in user space)
are hard to block, so "spin_lock_irq()" is hard. Also, since there is some
external entity that does scheduling, the "I'm holding a spinlock, don't
schedule me away" kind of thing doesn't work. In the kernel, we just
increment the preemption count. In user space, you need to do something
different.

So the power of C is really that it makes the easy things easy, and the
hard things are _possible_ using valid ways to break the model. Notably,
you have type-casts to break the type model (and hey, many of the things
you can do when you break the type model may not be portable any more),
and you have inline assembly when you need to break the "abstract CPU
model".

Yeah, inline assembly is a "cop-out" in the sense that it's just admitting
that the language doesn't cover every possibility. But hey, _no_ language
can cover every single possibility, so the fact that C _does_ have a
cop-out is actually one of its biggest strengths. Many other languages
don't have that capability at all, and are strictly weaker as a result.

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sun, 09 Jul 2006 22:19:41 UTC
Message-ID: <fa.rCszDVqeyKNCKs010yPaWcGriIs@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607091506250.5623@g5.osdl.org>

On Sun, 9 Jul 2006, Pavel Machek wrote:
>
> volatile int a; a=1; a=2;
>
> ...under old definition, there's nothing to optimize but AFAICT, your
> definition allows optimizing out a=1.

If 'a' can alias anything, then by definition the first 'a=1' could have
changed something else than the second one. Otherwise, it couldn't have
aliased "anything", it would have aliased only something _particular_.

IOW, you can think of "aliasing anything" as being equivalent to saying
"the address is indeterminate". Two writes could literally go to two
different things.

But yeah, maybe that's not really perfect either. It leaves the
read-vs-read ordering still open.

			Linus


From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] spinlocks: remove 'volatile'
Date: Sun, 09 Jul 2006 19:51:44 UTC
Message-ID: <fa.z0+LWt3vbO+panVcW/WI0C0i5lo@ifi.uio.no>
Original-Message-ID: <20060709195114.GB17128@thunk.org>

On Sun, Jul 09, 2006 at 12:16:15PM -0700, David Schwartz wrote:
> > Volatile is useful for non device driver work, for example VJ-style
> > channels.  A portable volatile can help to code such things in a
> > compiler-neutral and platform-neutral way.  Linux doesn't care about
> > compiler neutrality, being coded in GNU C, and about platform
> > neutrality, having a per-arch abstraction layer, but other programs may
> > wish to run on multiple compilers and multiple platforms without
> > per-platform glue layers.
>
> 	There is a portable volatile, it's called 'pthread_mutex_lock'. It allows
> you to code such things in a compiler-neutral and platform-neutral way. You
> don't have to worry about what the compiler might do, what the hardware
> might do, what atomic operations the CPU supports, or anything like that.
> The atomicity issues I've mentioned in my other posts make any attempt at
> creating a 'portable volatile' for shared memory more or less doomed from
> the start.

The other thing to add here is that if you're outside of the kernel,
you *don't* want to be implementing your own spinlock if for no other
reason than it's a performance disaster.  The scheduler doesn't know
that you are spinning waiting for a lock, so you will be scheduled and
burning cpu time (and energy, and heat budget) while waiting for the
the lock.  I once spent over a week on-site at a customer complaining
about Linux performance problems, and it turned out the reason why was
because they thought they were being "smart" by implementing their own
spinlock in userspace.  Bad bad bad.
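
If you want the portable primitive in userspace, it really is as simple
as the following sketch (plain pthreads; the scheduler can actually
block the waiter instead of letting it spin):

	#include <pthread.h>

	/* No volatile anywhere: the mutex provides all of the ordering */
	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static int shared_count;

	void bump(void)
	{
		pthread_mutex_lock(&lock);	/* blocks instead of burning cpu */
		shared_count++;
		pthread_mutex_unlock(&lock);
	}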

So if a userspace program ever uses volatile, it's almost certainly a
bug, one way or another.

						- Ted


From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: [OT] 'volatile' in userspace
Date: Mon, 10 Jul 2006 03:43:55 UTC
Message-ID: <fa.IBUXcPVbnQoDVy51Evv1wqYgqdE@ifi.uio.no>
Original-Message-ID: <20060710034250.GA15138@thunk.org>

On Sun, Jul 09, 2006 at 10:40:06PM +0200, Rutger Nijlunsing wrote:
> > So if a userspace progam ever uses volatile, it's almost certainly a
> > bug, one way or another.
>
> Without 'volatile' and disabling optimizations altogether, how do we
> prevent gcc from optimizing away pointers? As can be seen on
> http://wiki.rubygarden.org/Ruby/page/show/GCAndExtensions (at
> 'Compiler over-optimisations and "volatile"'), volatile is used to
> prevent a specific type of optimization. This is because of the
> garbage collector, which scans the stack and registers to find
> referenced objects. So you don't want local variables containing
> references to objects optimized away.

Well, if you look at the Wiki, it admits that this is a bug:

	(Warning: This section is not strictly correct. volatile
	instructs the C compiler that it should not do certain
	optimisations to code that accesses the variable - the value
	cannot be stored in a register and must be read from memory
	each time it is accessed. It is still perfectly legal for the
	compiler to overwrite the VALUE's stack location with other
	data, if the compiler decides there are no further uses of the
	VALUE. Fortunately, a side effect of volatile in common C
	compilers like GCC and Visual Studio is to prevent the
	dangerous optimisation described above. The Ruby source itself
	uses volatile for this purpose, so it is an "accepted hack"
	for Ruby C extensions.)

"Accepted hack" is basically another way of saying bug.  Some day GCC
will be made smart enough to optimize the use of str on the stack, and
then the Ruby will be screwed.  (Mental note to self: don't use Ruby
in any future project.)

This is really an architectural bug.  RSTRING() should have explicitly
bumped a use count, which the C code should have then explicitly
decremented, to protect the underlying pointer from getting GC'ed
unexpectedly.  It would have made RSTRING() more difficult to use, but
that's the price you pay when you try to graft a garbage-collected
paradigm into C code, since the C language really was never designed
for it.

So this would tend to confirm the rule of thumb: use of "volatile" in
a userspace program tends to indicate a bug.

						- Ted


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] remove volatile from nmi.c
Date: Fri, 14 Jul 2006 15:25:14 UTC
Message-ID: <fa.xyBjttYN8/Vu+qZ9U+7W+CppieQ@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607140757080.5623@g5.osdl.org>

On Fri, 14 Jul 2006, Steven Rostedt wrote:
>
> OK, I'm using this as something of an exercise to completely understand
> memory barriers.  So if something is incorrect, please let me know.

It's not an incorrect change, but perhaps more importantly, the old code
was buggy in other ways too. Which is sadly more-than-common with anything
that uses volatile - the issues that make people think using "volatile" is
a good idea also tend to cause other problems if the person in question
isn't careful (and using "volatile" obviously means that he/she/it wasn't
very careful when writing it).

In particular, notice how "endflag" is on the _stack_ of the CPU that
wants to send out the NMI to another CPU?

Now, think what that means for the case where we time out and return from
the function with an error.. In particular, think about the case of the
other CPU having been very busy, and now having a stale pointer that
points _where_ exactly?
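
In case it's not obvious, the pattern is something like this (a
simplified sketch - the helper names here are made up, not the real
nmi.c interfaces):

	extern void tell_other_cpu(int *flag);	/* hypothetical hand-off */
	extern int wait_timed_out(void);	/* hypothetical timeout */

	void watchdog_check(void)
	{
		int endflag = 0;	/* automatic: lives on OUR stack */

		tell_other_cpu(&endflag); /* other CPU now loops on *flag */
		if (wait_timed_out()) {
			endflag = 1;	/* compiler may legally drop this */
			return;		/* stack slot dies here, and the
					 * other CPU is left reading through
					 * a dangling pointer */
		}
	}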

Also, when the caller sets "endflag", it doesn't actually need to use a
write barrier in either of the two cases, because of some _other_ issues
(more on the barrier reasons below). There are two places where the
caller sets endflag, and neither of them needs "set_wmb()", but it's
perhaps instructive to show _why_.

The first case is the initialization to zero. That one doesn't need a
write barrier, because it has _other_ serialization to any reader. In
order for another CPU to read that value, the other CPU needs to have
_gotten_ the pointer to it in the first place, and that implies that it
got the "smp_call_function()" thing.

And "smp_call_function()" will better have a serialization in it, because
otherwise _any_ user of smp_call_function() would potentially set up data
structures that aren't then readable from other CPUs. So for the
particular case of x86, see the "mb()" in smp_call_function() just before
it does the "send_IPI_allbutself()".

Now, the other case is the case where we set endflag to 1 because we're no
longer interested in the other CPU's. And the reason we don't need a
barrier there is that WE OBVIOUSLY NO LONGER CARE when the other side
sees the value - at that point, it's all moot, because there isn't any
serialization left, and it's just a flag to the other CPU's saying "don't
bother".

So let's go back to the bigger problem..

Now, there is a "reason" we'd want "endflag" to either be volatile, or
have the "set_wmb()", and that is that the code is incorrect in the first
place.

Without the volatile, or the "set_wmb()", the compiler could decide to not
do the last "endflag = 1" write _at_all_, because

 - endflag is an automatic variable

 - we're going to return from the function RSN, which de-allocates it

and as such, the "volatile" or "set_wmb()" actually forces that write to
happen at all. It so happens that because we have a printk() in there, and
gcc doesn't know that the printk() didn't get the address of the variable
through the "smp_call_function()" thing, gcc won't dare to remove the
write anyway, but let's say that the final 'printk("OK.\n");' wasn't
there, then the compiler could have removed it.

So in that sense, "volatile" and "set_wmb()" superficially "remove a bug",
since optimizing out the write is wrong. However, the REAL bug was totally
elsewhere, and is the fact that "endflag" is an automatic variable in the
first place! The compiler would have been _correct_ to optimize the store
away, because the compiler (unlike the programmer) would have correctly
realized that it cannot matter.

> The first removal is trivial, since the barrier in the while loop makes
> it unnecessary.

Yes, and the first removal is also very much correct.

> The second is what I think is correct.

See above. The second is "correct", in the sense that from a "volatile
removal" standpoint it does all the right things. But it's incorrect,
because it misses the bigger problem with the code.

So I would suggest that the _real_ fix is actually something like the
appended, but I have to say that I didn't really look very closely into
it.

I think that in _practice_ it probably doesn't really matter (in practice,
the other CPU's will either get the NMI or not, and in practice, the stack
location - even after it is released - will probably be overwritten by
something non-zero later anyway), but I think that my fix makes it more
obvious what is really going on, and it's easier to explain why it does
what it does because it no longer depends on insane code.

But somebody like Ingo should probably double-check this.

(The "Have we done this already" test is just covering my ass - I don't
think we should be calling that function more than once, but one of the
things that happens when the "endflag" semantics are fixed is that the
function now has history and the variable is no longer "per CPU". The
point is, that changes how initializations etc may need to be done: in
this case we only want to do it once, but in other cases this kind of
change may have more far-reaching implications).

		Linus

---
diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c
index 2dd928a..eb8bbbb 100644
--- a/arch/i386/kernel/nmi.c
+++ b/arch/i386/kernel/nmi.c
@@ -106,7 +106,7 @@ #ifdef CONFIG_SMP
  */
 static __init void nmi_cpu_busy(void *data)
 {
-	volatile int *endflag = data;
+	int *endflag = data;
 	local_irq_enable_in_hardirq();
 	/* Intentionally don't use cpu_relax here. This is
 	   to make sure that the performance counter really ticks,
@@ -121,10 +121,14 @@ #endif

 static int __init check_nmi_watchdog(void)
 {
-	volatile int endflag = 0;
+	static int endflag = 0;
 	unsigned int *prev_nmi_count;
 	int cpu;

+	/* Have we done this already? */
+	if (endflag)
+		return 0;
+
 	if (nmi_watchdog == NMI_NONE)
 		return 0;



From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] remove volatile from nmi.c
Date: Fri, 14 Jul 2006 16:48:24 UTC
Message-ID: <fa.6GS6o/G5jfLfCzYH5Pa14e+Qg/Y@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607140941170.5623@g5.osdl.org>

On Fri, 14 Jul 2006, Chase Venters wrote:
> >
> > static int __init check_nmi_watchdog(void)
> > {
> > -	volatile int endflag = 0;
> > +	static int endflag = 0;
>
> Now that this is static, isn't this a candidate for __initdata?

Yes, that would be good.

Somebody want to test that it actually still _works_, and go through all
the logic?

On a similar vein: Steven, looking at the cmos version of the patch, I
have a hard time knowing whether the added barriers are needed, because I
didn't spend any time on looking at the context of the patch. But I
suspect that generally you do _not_ want to add barriers when you remove
volatiles.

Basically, "volatile" is not a sign that a barrier is needed per se. In
many cases, the _only_ thing that "volatile" implies is that the original
programmer was confused and/or lazy.

So replacing volatiles with accesses with barriers is usually the _wrong_
thing to do. The right thing to do is generally to just _remove_ the
volatile entirely, and then think hard about whether there was some _real_
reason why it existed in the first place.

Note that the only thing a volatile can do is a _compiler_ barrier, so if
you add a real memory barrier or make it use a "set_wmb()" or similar,
you're literally changing code that has been tested to work, and you're
in the process also removing the hint that the code may actually have
fundamental problems.
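
(For comparison, this is roughly what the two kinds of barrier look like
on x86 - the exact kernel definitions vary by architecture and config:

	#define barrier()	asm volatile("" ::: "memory")	    /* compiler only */
	#define mb()		asm volatile("mfence" ::: "memory") /* CPU too */

A volatile access can at most get you the effect of the first one; it
never implies the second.)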

So I'd argue that it's actually _worse_ to do a "mindless" conversion away
from volatile, than it is to just remove them outright. Removing them
outright may show a bug that the volatile hid (and at that point, people
may see what the _deeper_ problem was), but at least it won't add a memory
barrier that isn't necessary and will potentially just confuse people.

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] remove volatile from nmi.c
Date: Fri, 14 Jul 2006 17:29:12 UTC
Message-ID: <fa.nC4p4e3rd92tZQto0K0drTM7RSI@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607141017520.5623@g5.osdl.org>

On Fri, 14 Jul 2006, Linus Torvalds wrote:
>
> Now, there is a "reason" we'd want "endflag" to either be volatile, or
> have the "set_wmb()", and that is that the code is incorrect in the first
> place.

Btw, and this may just be me, but I personally don't much like the
"set_wmb()" macro. I think it should be removed.

I don't think we actually use it anywhere, and the thing is, it's not
really useful. It is basically _always_ equivalent to

	var = value;
	smp_wmb();

except I think some architectures could _in_theory_ make the assignment be
a "store with release consistency". The only architecture where that might
make sense that I can think of is Itanium, and even there the ia64
set_wmb() macro doesn't actually do that.

Yeah, the

	endflag = 1;
	smp_wmb();

is a bit longer, but is actually easier to understand, I think.

I suspect "set_wmb()" was added just from an incorrect sense of
consistency with "set_mb()" (which I don't particularly like either, but
at least that one makes a difference on a real platform, ie on x86 that
"set_mb()" ends up being implemented as a single "xchg" instruction).

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] remove volatile from nmi.c
Date: Fri, 14 Jul 2006 17:42:33 UTC
Message-ID: <fa.cblmPgPDd+owfMyEF2C4Zxkt4ic@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0607141040550.5623@g5.osdl.org>

On Fri, 14 Jul 2006, Steven Rostedt wrote:

> > 	endflag = 1;
> > 	smp_wmb();
>
> This was what I originally wrote, and then I saw the set_wmb which made
> me think that it was the proper way to do things (why else is it
> there?). So if it shouldn't be used, then we should get rid of it or at
> least mark it deprecated, otherwise you have people like me thinking
> that we should use it.

Yeah, we should probably get rid of it. No need to even mark it
deprecated, since nobody uses it anyway.

At a minimum, I think we should not document it in the locking
documentation, making people incorrectly think it might be a good idea.

Hmm? Andrew?

		Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all
Date: Fri, 17 Aug 2007 03:44:43 UTC
Message-ID: <fa.CQDLUB55VyaFl/Uwd1DqglMeqJM@ifi.uio.no>

On Fri, 17 Aug 2007, Paul Mackerras wrote:
>
> I'm really surprised it's as much as a few K.  I tried it on powerpc
> and it only saved 40 bytes (10 instructions) for a G5 config.

One of the things that "volatile" generally screws up is a simple

	volatile int i;

	i++;

which a compiler will generally get horribly, horribly wrong.

In a reasonable world, gcc should just make that be

	addl $1,i(%rip)

on x86-64, which is indeed what it does without the volatile. But with the
volatile, the compiler gets really nervous, and doesn't dare do it in one
instruction, and thus generates crap like

        movl    i(%rip), %eax
        addl    $1, %eax
        movl    %eax, i(%rip)

instead. For no good reason, except that "volatile" just doesn't have any
good/clear semantics for the compiler, so most compilers will just make it
be "I will not touch this access in any way, shape, or form". Including
even trivially correct instruction optimization/combination.

This is one of the reasons why we should never use "volatile". It
pessimises code generation for no good reason - just because compilers
don't know what the heck it even means!
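
(For reference, the two variants being argued about look roughly like
this - the exact per-architecture definitions differ, and the names here
are just labels for the two choices:

	#define atomic_read_plain(v)	((v)->counter)
	#define atomic_read_volatile(v)	(*(volatile int *)&(v)->counter)

Everything below is about the code-generation cost of the second form.)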

Now, people don't do "i++" on atomics (you'd use "atomic_inc()" for that),
but people *do* do things like

	if (atomic_read(..) <= 1)
		..

On ppc, things like that probably don't much matter. But on x86, it makes
a *huge* difference whether you do

	movl i(%rip),%eax
	cmpl $1,%eax

or if you can just use the value directly for the operation, like this:

	cmpl $1,i(%rip)

which is again a totally obvious and totally safe optimization, but is
(again) something that gcc doesn't dare do, since "i" is volatile.

In other words: "volatile" is a horribly horribly bad way of doing things,
because it generates *worse*code*, for no good reason. You just don't see
it on powerpc, because it's already a load-store architecture, so there is
no "good code" for doing direct-to-memory operations.

		Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all
Date: Fri, 17 Aug 2007 03:07:38 UTC
Message-ID: <fa.K4ygY9CMnvGButwdYMA9I9At2wE@ifi.uio.no>

On Fri, 17 Aug 2007, Paul Mackerras wrote:
>
> Volatile doesn't mean it can't be reordered; volatile means the
> accesses can't be eliminated.

It also does limit re-ordering.

Of course, since *normal* accesses aren't necessarily limited wrt
re-ordering, the question then becomes one of "with regard to *what* does
it limit re-ordering?".

A C compiler that re-orders two different volatile accesses that have a
sequence point in between them is pretty clearly a buggy compiler. So at a
minimum, it limits re-ordering wrt other volatiles (assuming sequence
points exists). It also means that the compiler cannot move it
speculatively across conditionals, but other than that it's starting to
get fuzzy.
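
A small example of the part that *is* well-defined:

	extern volatile int x, y;
	extern int z;

	void f(void)
	{
		x = 1;	/* must stay before the store to y ... */
		y = 2;	/* ... volatile-to-volatile order is kept */
		z = 3;	/* non-volatile: may move almost anywhere */
	}

The compiler has to keep the x and y accesses in source order relative
to each other, but the store to z is free to move around both.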

In general, I'd *much* rather we used barriers. Anything that "depends" on
volatile is pretty much set up to be buggy. But I'm certainly also willing
to have that volatile inside "atomic_read/atomic_set()" if it avoids code
that would otherwise break - ie if it hides a bug.

		Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all
Date: Fri, 17 Aug 2007 03:52:13 UTC
Message-ID: <fa.G9uco0tuFbvcxJJn+2GWRC546KM@ifi.uio.no>

On Fri, 17 Aug 2007, Nick Piggin wrote:
>
> I'm surprised too. Numbers were from the "...use asm() like the other
> atomic operations already do" thread. According to them,
>
>   text    data     bss     dec     hex filename
> 3434150  249176  176128 3859454  3ae3fe atomic_normal/vmlinux
> 3436203  249176  176128 3861507  3aec03 atomic_volatile/vmlinux
>
> The first one is a stock kernel, the second is with atomic_read/set
> cast to volatile. gcc-4.1 -- maybe if you have an earlier gcc it
> won't optimise as much?

No, see my earlier reply. "volatile" really *is* an incredible piece of
crap.

Just try it yourself:

	volatile int i;
	int j;

	int testme(void)
	{
	        return i <= 1;
	}

	int testme2(void)
	{
	        return j <= 1;
	}

and compile with all the optimizations you can.

I get:

	testme:
	        movl    i(%rip), %eax
	        subl    $1, %eax
	        setle   %al
	        movzbl  %al, %eax
	        ret

vs

	testme2:
	        xorl    %eax, %eax
	        cmpl    $1, j(%rip)
	        setle   %al
	        ret

(now, whether that "xorl + setle" is better than "setle + movzbl", I don't
really know - maybe it is. But that's not the point. The point is the
difference between

                movl    i(%rip), %eax
                subl    $1, %eax

and

                cmpl    $1, j(%rip)

and imagine this being done for *every* single volatile access.

Just do a

	git grep atomic_read

to see how atomics are actually used. A lot of them are exactly the above
kind of "compare against a value".

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all
Date: Fri, 17 Aug 2007 16:57:34 UTC
Message-ID: <fa.sQTnALWcixMIOPtNoELUMC/lssU@ifi.uio.no>

On Fri, 17 Aug 2007, Nick Piggin wrote:
>
> That's not obviously just taste to me. Not when the primitive has many
> (perhaps, the majority) of uses that do not require said barriers. And
> this is not solely about the code generation (which, as Paul says, is
> relatively minor even on x86). I prefer people to think explicitly
> about barriers in their lockless code.

Indeed.

I think the important issues are:

 - "volatile" itself is simply a badly/weakly defined issue. The semantics
   of it as far as the compiler is concerned are really not very good, and
   in practice tends to boil down to "I will generate so bad code that
   nobody can accuse me of optimizing anything away".

 - "volatile" - regardless of how well or badly defined it is - is purely
   a compiler thing. It has absolutely no meaning for the CPU itself, so
   it at no point implies any CPU barriers. As a result, even if the
   compiler generates crap code and doesn't re-order anything, there's
   nothing that says what the CPU will do.

 - in other words, the *only* possible meaning for "volatile" is a purely
   single-CPU meaning. And if you only have a single CPU involved in the
   process, the "volatile" is by definition pointless (because even
   without a volatile, the compiler is required to make the C code appear
   consistent as far as a single CPU is concerned).

So, let's take the example *buggy* code where we use "volatile" to wait
for other CPU's:

	atomic_set(&var, 0);
	while (!atomic_read(&var))
		/* nothing */;


which generates an endless loop if we don't have atomic_read() imply
volatile.

The point here is that it's buggy whether the volatile is there or not!
Exactly because the user expects multi-processing behaviour, but
"volatile" doesn't actually give any real guarantees about it. Another CPU
may have done:

	external_ptr = kmalloc(..);
	/* Setup is now complete, inform the waiter */
	atomic_inc(&var);

but the fact is, since the other CPU isn't serialized in any way, the
"while-loop" (even in the presence of "volatile") doesn't actually work
right! Whatever the "atomic_read()" was waiting for may not have
completed, because we have no barriers!
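
(What the non-buggy version looks like, as a sketch - the flag needs to
be paired with explicit barriers, using the same names as above:

	/* writer */
	external_ptr = kmalloc(..);
	smp_wmb();		/* setup must be visible _before_ the flag */
	atomic_inc(&var);

	/* reader */
	while (!atomic_read(&var))
		cpu_relax();
	smp_rmb();		/* flag must be read _before_ using the setup */

and no amount of "volatile" on either side replaces those barriers.)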

So if "volatile" makes a difference, it is invariably a sign of a bug in
serialization (the one exception is for IO - we use "volatile" to avoid
having to use inline asm for IO on x86) - and for "random values" like
jiffies).

So the question should *not* be whether "volatile" actually fixes bugs. It
*never* fixes a bug. But what it can do is to hide the obvious ones. In
other words, adding a volatile in the above kind of situation of
"atomic_read()" will certainly turn an obvious bug into something that
works "practically all of the time".

So anybody who argues for "volatile" fixing bugs is fundamentally
incorrect. It does NO SUCH THING. By arguing that, such people only show
that they have no idea what they are talking about.

So the only reason to add back "volatile" to the atomic_read() sequence is
not to fix bugs, but to _hide_ the bugs better. They're still there, they
are just a lot harder to trigger, and tend to be a lot subtler.

And hey, sometimes "hiding bugs well enough" is ok. In this case, I'd
argue that we've successfully *not* had the volatile there for eight
months on x86-64, and that should tell people something.

(Does _removing_ the volatile fix bugs? No - callers still need to think
about barriers etc, and lots of people don't. So I'm not claiming that
removing volatile fixes any bugs either, but I *am* claiming that:

 - removing volatile makes some bugs easier to see (which is mostly a good
   thing: they were there before, anyway).

 - removing volatile generates better code (which is a good thing, even if
   it's just 0.1%)

 - removing volatile removes a huge mental *bug* that lots of people seem
   to have, as shown by this whole thread. Anybody who thinks that
   "volatile" actually fixes anything has a gaping hole in their head, and
   we should remove volatile just to make sure that nobody thinks that it
   means something that it doesn't mean!

In other words, this whole discussion has just convinced me that we should
*not* add back "volatile" to "atomic_read()" - I was willing to do it for
practical and "hide the bugs" reasons, but having seen people argue for
it, thinking that it actually fixes something, I'm now convinced that the
*last* thing we should do is to encourage that kind of superstitious
thinking.

"volatile" is like a black cat crossing the road. Sure, it affects
*something* (at a minimum: before, the black cat was on one side of the
road, afterwards it is on the other side of the road), but it has no
bigger and longer-lasting direct affects.

People who think "volatile" really matters are just fooling themselves.

		Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH 0/24] make atomic_read() behave consistently across all
Date: Fri, 17 Aug 2007 19:12:25 UTC
Message-ID: <fa.B/I5vjeo/7R/Z7IosCgdPVKbGRs@ifi.uio.no>

On Fri, 17 Aug 2007, Chris Friesen wrote:
>
> I assume you mean "except for IO-related code and 'random' values like
> jiffies" as you mention later on?

Yes. There *are* valid uses for "volatile", but they have remained the
same for the last few years:
 - "jiffies"
 - internal per-architecture IO implementations that can do them as normal
   stores.

> I assume other values set in interrupt handlers would count as "random"
> from a volatility perspective?

I don't really see any valid case. I can imagine that you have your own
"jiffy" counter in a driver, but what's the point, really? I'd suggest not
using volatile, and using barriers instead.

>
> > So anybody who argues for "volatile" fixing bugs is fundamentally
> > incorrect. It does NO SUCH THING. By arguing that, such people only
> > show that you have no idea what they are talking about.

> What about reading values modified in interrupt handlers, as in your
> "random" case?  Or is this a bug where the user of atomic_read() is
> invalidly expecting a read each time it is called?

Quite frankly, the biggest reason for using "volatile" on jiffies was
really historical. So even the "random" case is not really a very strong
one. You'll notice that anybody who is actually careful will be using
sequence locks for the jiffy accesses, if only because the *full* jiffy
count is actually a 64-bit value, and so you cannot get it atomically on a
32-bit architecture even on a single CPU (ie a timer interrupt might
happen in between reading the low and the high word, so "volatile" is only
used for the low 32 bits).
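
(The careful version looks roughly like the kernel's own
"get_jiffies_64()" - the name of the seqlock protecting it has varied
over kernel versions:

	u64 get_jiffies_64(void)
	{
		unsigned long seq;
		u64 ret;

		do {
			seq = read_seqbegin(&xtime_lock);
			ret = jiffies_64;
		} while (read_seqretry(&xtime_lock, seq));
		return ret;
	}

No "volatile" needed: the retry loop is what guarantees a consistent
64-bit value.)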

So even for jiffies, we actually have:

	extern u64 __jiffy_data jiffies_64;
	extern unsigned long volatile __jiffy_data jiffies;

where the *real* jiffies is not volatile: the volatile one is using linker
tricks to alias the low 32 bits:

 - arch/i386/kernel/vmlinux.lds.S:

	...
	jiffies = jiffies_64;
	...

and the only reason we do all these games is (a) it works and (b) it's
legacy.

Note how I do *not* say "(c) it's a good idea".

			Linus


