Streaming data (John R. Mashey)

From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch
Subject: Re: How forward/backward compatible are cache management ISAs?
Date: 18 Sep 1998 23:17:56 GMT

In article <zalmanEzHvr0.HD1@netcom.com>, zalman@netcom.com (Zalman
Stern) writes:

|> (Other possibilities present themselves too. The entire cache could be
|> scrubbed on process switches. The cache could behave differently depending
|> on whether the current process has read access to the block or not, etc. I
|> don't see these options being very attractive. And at some level, leaving
|> the architecture spec open doesn't really win. For example if the first
|> implementation zeroes the cache block, applications may come to depend on
|> that. Probably unintentionally. But that gets us back to the whole
|> philosophical debate about architectural flexibility vs. application
|> reliability.)

This whole issue is another of my favorites, traditionally very difficult to
solve in a long-term way, part of a complex of problems best termed:
	"How can you make cache hierarchies get out of the way for
	streaming data when you want them out of the way, but without giving
	up the benefits of caches?"

1) Optimal solutions vary according to the cache design.
	This kind of thing can generate a surprising amount of logic in
	a 2-level writeback cache, especially when L1 lines are
	smaller than L2 lines.

2) And worse, even with the same CPU/cache, they vary with the system
design.  What works on a simple uniprocessor can fail on a snooping SMP,
and what works there can fail on a ccNUMA...


3) The semantics pretty much have to say:
	(a) The cache line is zeroed, and what you're really saving,
	in a system with a writeback cache, is the initial read from memory.
OR	(b) Having offered the intent-to-write hint, any read reference
	to bytes not yet written is an error that is trapped. (This might
	require a status bit per byte, which is not pleasant.)

	(a) seems easier to implement, and in some systems would be pretty good.
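Option (b) can be sketched in C as a per-line mask with one bit per byte. This is a hypothetical software model, not any real ISA: it assumes a 64-byte line so the mask fits in a `uint64_t`, and the names `hinted_line`, `hint_allocate`, `store_byte` and `load_byte` are invented for illustration. A `false` return from `load_byte` stands in for the trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 64  /* assumed line size; one mask bit per byte fits a uint64_t */

/* Hypothetical model of option (b): a line allocated under the
   intent-to-write hint carries one "written" bit per byte. */
typedef struct {
    uint8_t  data[LINE_BYTES];
    uint64_t written;  /* bit i set => byte i has been stored to */
} hinted_line;

/* The hint allocates the line without reading memory, so no byte is valid yet. */
static void hint_allocate(hinted_line *l) {
    l->written = 0;
}

/* A store makes exactly that byte readable. */
static void store_byte(hinted_line *l, unsigned off, uint8_t v) {
    assert(off < LINE_BYTES);
    l->data[off] = v;
    l->written |= (uint64_t)1 << off;
}

/* Returns false where the hardware would trap: a read of a byte not yet written. */
static bool load_byte(const hinted_line *l, unsigned off, uint8_t *out) {
    assert(off < LINE_BYTES);
    if (!(l->written & ((uint64_t)1 << off)))
        return false;
    *out = l->data[off];
    return true;
}
```

Even this toy shows the cost: 64 extra bits of state per 64-byte line, plus a mask check on every load.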

4) Suppose you do (a).  This can be trickier than you'd think.

4a) Uniprocessor:
		(a) If the cache-line is not present, allocate it,
		evicting another if necessary.
		(b) Zero the cache line (which is present by now).
		No need to tell the outside world anything.
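The uniprocessor case can be sketched as a tiny direct-mapped writeback cache model in C. The geometry (64-byte lines, 256 sets) and all names are assumptions for illustration. The point it shows is that the only memory traffic is the write-back of a dirty victim, never a fill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64   /* assumed line size */
#define NUM_SETS   256  /* assumed: direct-mapped, 256 lines */

typedef struct {
    bool     valid, dirty;
    uint64_t tag;
    uint8_t  data[LINE_BYTES];
} cache_line;

typedef struct {
    cache_line lines[NUM_SETS];
    unsigned   mem_reads;   /* fills from memory (an ordinary write miss would add one) */
    unsigned   mem_writes;  /* write-backs of dirty victims */
} cache;

/* Hypothetical uniprocessor "zero the line" operation, steps (a) and (b). */
static void zero_line(cache *c, uint64_t addr) {
    assert(c != NULL);
    uint64_t    line_no = addr / LINE_BYTES;
    cache_line *l       = &c->lines[line_no % NUM_SETS];
    uint64_t    tag     = line_no / NUM_SETS;

    if (!(l->valid && l->tag == tag)) {
        /* (a) Not present: allocate it, evicting the current occupant. */
        if (l->valid && l->dirty)
            c->mem_writes++;          /* victim's data goes back to memory */
        l->valid = true;
        l->tag   = tag;
        /* No fill from memory: skipping this read is the whole point. */
    }
    /* (b) Zero the line, now present, and mark it dirty so the zeros
       eventually reach memory. Nothing is sent to the outside world. */
    memset(l->data, 0, LINE_BYTES);
    l->dirty = true;
}
```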

4b) Snooping-bus SMP
	Cache line is:
	(a) Present, dirty:	zero it
	(b) Present, shared:  	broadcast invalidate, then zero it
	(c) Present, clean, owned:	mark it dirty, then zero it
	(d) Not present:	broadcast invalidate, allocate it, zero it
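The four snooping-bus cases can be written as a small state function in C. The state names are hypothetical labels for the cases in the list (roughly the Modified, Shared, Exclusive and Invalid states of a MESI-style protocol). The function only decides whether a bus transaction is needed and leaves the zeroing itself out.

```c
#include <assert.h>

typedef enum { LINE_INVALID, LINE_SHARED, LINE_CLEAN_OWNED, LINE_DIRTY } line_state;
typedef enum { BUS_NONE, BUS_INVALIDATE } bus_action;

/* Hypothetical: what a zero-line must put on a snooping bus, by current state. */
static bus_action zero_line_snoop(line_state *st) {
    bus_action act;
    switch (*st) {
    case LINE_DIRTY:       act = BUS_NONE;       break; /* (a) just zero it */
    case LINE_SHARED:      act = BUS_INVALIDATE; break; /* (b) invalidate other copies */
    case LINE_CLEAN_OWNED: act = BUS_NONE;       break; /* (c) mark dirty locally */
    case LINE_INVALID:     act = BUS_INVALIDATE; break; /* (d) invalidate, then allocate */
    default:               assert(0); act = BUS_NONE;
    }
    *st = LINE_DIRTY;  /* in every case the line ends up present, dirty and zeroed */
    return act;
}
```

Cases (b) and (d) still need a bus transaction even though no data is read from memory.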


4c) Directory-based ccNUMA
	(a) Present, dirty: zero it
	(b) Present, shared:	request ownership from memory controller, zero
	(c) Present, clean, owned: tell memory controller of state change, zero
	(d) Not present:	request ownership from memory controller, zero

Requesting ownership from the memory controller for a write will cause
invalidates to be sent to any other nodes with copies; of course, the
normal coherency protocols would cause another node that had a dirty copy
to write it back to the home memory first.
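The home node's side of an ownership request (cases (b) and (d) above) might look like the following C sketch. It assumes a full bit-vector directory with at most 64 nodes and counts messages in a simplified way; `dir_entry`, `request_ownership` and the counters are hypothetical, not any particular machine's protocol.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64  /* assumed: one bit per node in the sharer mask */

typedef enum { DIR_UNCACHED, DIR_SHARED, DIR_DIRTY } dir_state;

typedef struct {
    dir_state state;
    uint64_t  sharers;  /* bit n set => node n may hold a copy */
    int       owner;    /* meaningful only when state == DIR_DIRTY */
} dir_entry;

typedef struct {
    unsigned invalidates;  /* invalidation messages sent to other nodes */
    unsigned writebacks;   /* dirty copies written back to home memory */
} msg_count;

/* Hypothetical home-node handling of an ownership request from node `req`. */
static void request_ownership(dir_entry *d, int req, msg_count *m) {
    assert(req >= 0 && req < MAX_NODES);
    if (d->state == DIR_DIRTY && d->owner != req) {
        m->writebacks++;   /* the remote dirty copy goes home first */
        m->invalidates++;  /* and that copy is then invalidated */
    } else if (d->state == DIR_SHARED) {
        for (int n = 0; n < MAX_NODES; n++)
            if (n != req && ((d->sharers >> n) & 1))
                m->invalidates++;  /* every other sharer loses its copy */
    }
    d->state   = DIR_DIRTY;           /* requester now owns the line */
    d->owner   = req;
    d->sharers = (uint64_t)1 << req;
}
```

Even when the zero-line saves the memory read, the invalidation and write-back messages remain.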

5) Making this work really well, with minimal wasted accesses,
probably requires some additional mechanisms, because in many system designs,
you may save a memory access, but you still have coherency transactions.

6) Among the issues that must be dealt with:
	(a) Suppose the process that is doing this is context-switched to
	another CPU before it finishes filling the cache line.  If the
	CPU has any special state for that cache line, and that state affects
	the semantics, the OS had better be able to save/restore that state,
	or transfer it to the other CPU. (For example, suppose you imagine
	that you had a special piece of hardware, with a bit-per-byte,
	to trap references to data read-but-not-written.)
	(b) Suppose it happens that the cache line gets evicted before
	the process is finished filling it.
	(c) Suppose a speculative execution CPU generates a read memory
	access that wants to read the data not yet filled in, but that was
	speculated along a path not actually taken.  It is possible to do
	the right thing [which is to notice the case and ignore it],
	but care must be taken.
For all of these reasons, zero-the-line is probably a cleaner, safer idea,
if all of this is exposed to user-level code.
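Issue (b), eviction before the line is filled, shows why zero-the-line is the safer choice. In this hypothetical C model (64-byte lines assumed, all names invented), only the data travels on write-back and refill; a per-byte written mask, as in the trap-on-unwritten-read option, has nowhere to go unless some extra mechanism saves it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64  /* assumed line size */

typedef struct {
    uint8_t  data[LINE_BYTES];
    uint64_t written;  /* used only by the trap-on-unwritten-read option */
} line_t;

/* Write back to memory, then later refill: only the data makes the trip. */
static void evict_and_refill(line_t *l, uint8_t mem[LINE_BYTES]) {
    memcpy(mem, l->data, LINE_BYTES);  /* write-back */
    memcpy(l->data, mem, LINE_BYTES);  /* ordinary refill */
    l->written = ~(uint64_t)0;         /* after a normal fill every byte looks valid */
}

/* True if reading byte `off` would trap under the trap option. */
static bool unwritten_read_traps(const line_t *l, unsigned off) {
    assert(off < LINE_BYTES);
    return !(l->written & ((uint64_t)1 << off));
}
```

Under zero-the-line, an unwritten byte reads as 0 both before and after the eviction. Under the trap option, the same read traps before the eviction and afterwards silently returns whatever the line happened to contain.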

7) None of this is as appealing as it first sounds, and there probably is
a better way, but it takes additional semantics at the user level.
--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-969-6289
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389