From: Linus Torvalds <email@example.com>
Subject: Re: MMIO and gcc re-ordering issue
Date: Wed, 04 Jun 2008 02:47:49 UTC
On Wed, 4 Jun 2008, Nick Piggin wrote:
> Actually, according to the document I am looking at (the AMD one), a UC
> store may pass a previous WC store.
Hmm. Intel arch manual, Vol 3, 10.3 (page 10-7 in my version):
"If the WC buffer is partially filled, the writes may be delayed until
the next occurrence of a serializing event; such as, an SFENCE or MFENCE
instruction, CPUID execution, a read or write to uncached memory, ..."
Any typos mine.
Anyway, Intel certainly seems to document that WC memory is serialized by
any access to UC memory.
But yes, I can well imagine that AMD is different, and I would also
heartily recommend being safe rather than sorry. Putting an explicit memory
barrier in between those accesses when you know it might make a difference
is just a good idea.
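To make that concrete, here is a minimal user-space sketch of the pattern, not real driver code. The names fake_wc_framebuffer, fake_uc_control, wc_write_barrier() and fill_then_kick() are all made up for illustration, and the two arrays are ordinary memory standing in for what would really be a WC mapping and a UC mapping. In kernel code the barrier would normally be wmb() rather than a hand-rolled sfence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: in a real driver these would be two I/O
 * mappings, one write-combining (data) and one uncached (control). */
static volatile uint32_t fake_wc_framebuffer[64];
static volatile uint32_t fake_uc_control;

/* Drain pending write-combining stores before any later store.
 * Assumption: sfence on x86, a full fence elsewhere. */
static inline void wc_write_barrier(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("sfence" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Write 'count' pixels, then tell the (pretend) device to go. */
static void fill_then_kick(uint32_t pixel, size_t count)
{
    assert(count <= 64);

    for (size_t i = 0; i < count; i++)
        fake_wc_framebuffer[i] = pixel;   /* may linger in a WC buffer */

    /* Explicit barrier: do not depend on the UC store below
     * happening to flush the WC buffer on every vendor's CPU. */
    wc_write_barrier();

    fake_uc_control = (uint32_t)count;    /* control port, UC */
}
```

Calling fill_then_kick(0xff00ff, 4) stores four pixels and then writes 4 to the control word. The barrier costs very little on a CPU where the UC access would have serialized anyway.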
But basically, as far as I know the thing was designed to be invisible to
old software: that is the whole idea behind WC memory. So the design was
certainly intended to be that you can generally mark a framebuffer-like
structure WC without any software _ever_ caring, as long as you keep all
control ports in UC memory.
Of course, because burst writes from the WC buffer are _so_ much more
efficient on the PCI bus than dribbling them out one write at a time, it
didn't take long before all the graphics cards etc wanted to _also_
mark their command queues as WC memory, so that you could burst out the
commands to the ring buffers as fast as possible. So now you have both
your frame buffer *and* your command buffers mapped WC, and now ordering
really has to be ensured in software if you access both.
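A rough sketch of what "ensured in software" could look like when both regions are WC. Again this is user-space illustration only: fake_wc_pixels, fake_wc_ring, ring_tail, RING_SLOTS and draw_and_queue() are hypothetical names, plain arrays stand in for the WC mappings, and a real driver would use the kernel's own barrier primitives.

```c
#include <assert.h>
#include <stdint.h>

#define FB_WORDS   16u
#define RING_SLOTS 8u

/* Hypothetical stand-ins for two separate WC mappings. */
static volatile uint32_t fake_wc_pixels[FB_WORDS];   /* frame buffer */
static volatile uint32_t fake_wc_ring[RING_SLOTS];   /* command ring */
static uint32_t ring_tail;                           /* next free slot */

/* Assumption: sfence on x86, a full fence elsewhere. */
static inline void wc_write_barrier(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("sfence" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Fill the frame buffer, then queue a command that consumes it.
 * Returns the ring slot the command was written to. */
static uint32_t draw_and_queue(uint32_t color, uint32_t opcode)
{
    for (uint32_t i = 0; i < FB_WORDS; i++)
        fake_wc_pixels[i] = color;

    /* Both regions are WC, so nothing orders them for us: the
     * pixels must be pushed out before the command is written. */
    wc_write_barrier();

    uint32_t slot = ring_tail % RING_SLOTS;
    fake_wc_ring[slot] = opcode;

    /* Push the command out too before anything that follows. */
    wc_write_barrier();

    ring_tail++;
    return slot;
}
```

The point of the first barrier is that the command must never become visible to the device before the data it refers to.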
[ And then there are the crazy people who mark *main memory* as WC,
because they don't want to pollute the cache with all the data, and then
you have the issue of cache coherency etc crap. Which only gets worse
with SMP, especially if one processor thinks it has part of memory
exclusively cached, and another one - or even the same one,
through another aliasing address - ignores the cache protocol.
And you now get unhappy CPUs that think that there is a bug in the
cache protocol and they get machine check faults.
So what started out as a "we can do accesses to the frame buffer more
efficiently without anybody ever even having to know or care" has
turned into a whole nightmare of people using it for other things, and
then you very much _do_ have to care! ]
And it doesn't surprise me if AMD then didn't get exactly the same