Write combining (Linus Torvalds)

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: MMIO and gcc re-ordering issue
Date: Wed, 04 Jun 2008 02:47:49 UTC
Message-ID: <fa.xJWSg2nW7mla3B0KF9p+qIu/leM@ifi.uio.no>

On Wed, 4 Jun 2008, Nick Piggin wrote:
>
> Actually, according to the document I am looking at (the AMD one), a UC
> store may pass a previous WC store.

Hmm. Intel arch manual, Vol 3, 10.3 (page 10-7 in my version):

  "If the WC buffer is partially filled, the writes may be delayed until
   the next occurrence of a serializing event; such as, an SFENCE or MFENCE
   instruction, CPUID execution, a read or write to uncached memory, ..."

Any typos mine.

Anyway, Intel certainly seems to document that WC memory is serialized by
any access to UC memory.

But yes, I can well imagine that AMD is different, and I would also heartily
recommend being safe rather than sorry. Putting an explicit memory
barrier in between those accesses when you know it might make a difference
is just a good idea.
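
To make that concrete, here is a minimal user-space sketch of the pattern: fill a buffer that stands in for a WC mapping, issue an explicit barrier, and only then touch the location that stands in for a UC control register. The names `fake_device`, `wc_flush_barrier` and `submit_commands` are hypothetical and are not from any real driver; in kernel code the barrier would normally be something like `wmb()`. The non-x86 fallback is only an assumption to keep the sketch compilable, not a statement about how WC buffers behave on other architectures.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <xmmintrin.h>   /* _mm_sfence() */
#endif

#define RING_SIZE 64

/* Hypothetical device layout, for illustration only. */
struct fake_device {
    volatile uint32_t ring[RING_SIZE]; /* stands in for a WC-mapped command ring */
    volatile uint32_t doorbell;        /* stands in for a UC-mapped control register */
};

/* Explicit barrier between the WC writes and the UC access.
 * On x86-64 this is SFENCE, one of the serializing events the Intel
 * manual lists for draining partially filled WC buffers. */
static inline void wc_flush_barrier(void)
{
#if defined(__x86_64__)
    _mm_sfence();
#else
    /* Assumption: a full fence as a portable placeholder. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Copy n commands into the ring starting at tail, then ring the doorbell.
 * Returns the new tail index. */
static size_t submit_commands(struct fake_device *dev, size_t tail,
                              const uint32_t *cmds, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dev->ring[(tail + i) % RING_SIZE] = cmds[i];

    /* Do not rely on the UC write alone to flush the WC buffers. */
    wc_flush_barrier();

    tail = (tail + n) % RING_SIZE;
    dev->doorbell = (uint32_t)tail;
    return tail;
}
```

On ordinary cached memory the barrier has no visible effect, so this only shows where the barrier goes, not the hardware behaviour itself.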

But basically, as far as I know the thing was designed to be invisible to
old software: that is the whole idea behind WC memory. So the design was
certainly intended to be that you can generally mark a framebuffer-like
structure WC without any software _ever_ caring, as long as you keep all
control ports in UC memory.

Of course, because burst writes from the WC buffer are _so_ much more
efficient on the PCI bus than dribbling them out one write at a time, it
didn't take long before all the graphics cards etc wanted to _also_
mark their command queues as WC memory, so that you could burst out the
commands to the ring buffers as fast as possible. So now you have both
your frame buffer *and* your command buffers mapped WC, and now ordering
really has to be ensured in software if you access both.

[ And then there are the crazy people who mark *main memory* as WC,
  because they don't want to pollute the cache with all the data, and then
  you have the issue of cache coherency etc crap. Which only gets worse
  with SMP, especially if one processor thinks it has part of memory
  exclusively cached, and another one - or even the same one,
  through another aliasing address - ignores the cache protocol.

  And you now get unhappy CPUs that think that there is a bug in the
  cache protocol and they get machine check faults.

  So what started out as a "we can do accesses to the frame buffer more
  efficiently without anybody ever even having to know or care" has
  turned into a whole nightmare of people using it for other things, and
  then you very much _do_ have to care! ]

And it doesn't surprise me if AMD then didn't get exactly the same
rules.

Oh, well.

		Linus