From: (John R. Mashey)
Newsgroups: comp.arch
Subject: Re: SPARC Vs MIPS  (really, embedded control usage)
Date: 22 Apr 1995 03:52:53 GMT

(catching up - most of the dust seems to have settled ... but there were some
potentially-misleading comments among the sequence that need to be fixed):

	a) Most of the structure of the R2000 was designed in the Nov '84-
	June '85 time period, and the first chips came back at the end of 1985.

	b) There were certainly some features that might have been different,
	given the short time.  NMI is indeed one of them...  and I might have
	liked a little more flexibility in not having fixed addresses
	(some of which was fixed later).

	c) However, some features that people have sometimes complained about
	as accidents or unintended omissions ... were done on purpose, or in
	a few cases were conscious implementation tradeoffs given the limited
	die area available.

	a) In 1994, there were 1.67M MIPS chips sold, of which 10% went
	into computer systems, and the other 90% went into: avionics, laser
	printers, copiers, communications boards, games, telephone switches
	and lots of other things, and, increasingly, consumer products.

	b) This is nice, because it adds volume, and helps amortize the
	development costs - many of these chips were derived from R3000s,
	which after all, first appeared in an $80K rack-mount system in 1988.

	c) Recall that MIPS had a somewhat different model than Sun:
	do a design and then provide the design & verification information
	to the various chip partners, and encourage them to modify the chip,
	or use the information as desired to create new versions for
	various markets.  Such proliferation is especially necessary at the
	lower price points, which are very sensitive to cost and features.

	d) As a result, I think there are more different flavors of MIPS-
	architecture chips being sold than any other RISC ... or if not,
	there are at least enough of them that I lose track myself.

	e) Anyway, this wasn't particularly accidental, even if it wasn't
	always consistently handled ... and some thought was being given to
	this from very early in the architecture's life, as detailed below.

	a) The architecture was required to be good for running UNIX ...
	but not be UNIX-specific.  In fact, it was a specific hope that
	the same chips actually have some use in very non-UNIX OS's like
	telephone switches (which require low-overhead context-switching),
	and high-end embedded applications.  Low/mid-range applications would
	require different chip variants, as actually happened.

	b) Uncached usage was planned from day one, but only in certain ways.
		1) Running code uncached/unmapped simplified the hardware
		needed when coming out of reset: let the code do it,
		then go cached/mapped when it felt like it.
		2) The idea of running serious amounts of time-critical code
		uncached was simply thought to be irrelevant, because the
		speed was dragged down to memory speed.  The whole instruction
		set design assumed there would always be an I-cache.

	c) The exception mechanism was actually spec'd/negotiated between
	operating system people and the chip designers ... and (we) OS people
	got most of what we asked for. We specifically didn't care much for the
	typical exception vectors that we'd seen on many past CPUs, as we
	*counted cycles* not instructions, as counting the latter can often
	be a fallacy; people have often been misled by the fact that some
	microcoded engine does *everything* ... so it looks simple, but
	can take 100s of cycles to do it.

		a) Each exception vectors to a different location.

		b) But the locations are not very far apart, so they
		   don't have much code ... in fact, a very typical
		   sequence would be:

		c) Hardware saves all the registers in some place ... and the
		  speed of this is determined by the ability to store to
		  memory ... which in some microcoded CPUs could be made to
		  go faster than regular instructions ... whereas RISC designs
		  have usually expected that a series of store instructions
		  found in the I-cache would store data as fast as could be done.

		d) Hardware vectors the PC somewhere depending on the CAUSE;
		  a common thing would be to do this as:
			VECTOR-BASE | CAUSE<<n, where n was fixed enough
			to allow a small number of instructions at each vector
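		A minimal sketch of that address computation, in C; the
		base address, shift, and 32-byte spacing are invented for
		illustration, not taken from any particular CPU:

		```c
		#include <assert.h>
		#include <stdint.h>

		/* Hypothetical layout: VECTOR_BASE | (cause << n), where
		 * n = 5 gives each vector 32 bytes, i.e. 8 four-byte
		 * instruction slots, before the next vector begins. */
		#define VECTOR_BASE  0x80000000u
		#define VECTOR_SHIFT 5

		static uint32_t vector_addr(uint32_t cause)
		{
		    return VECTOR_BASE | (cause << VECTOR_SHIFT);
		}

		int main(void)
		{
		    assert(vector_addr(0) == 0x80000000u);
		    assert(vector_addr(1) == 0x80000020u); /* 32 bytes apart */
		    assert(vector_addr(2) == 0x80000040u);
		    return 0;
		}
		```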

		e) The code at the vector targets would look like:

		exception1:	reg1 = 1
				jump common
		exception2:	reg1 = 2
				jump common

		common:		manipulate state
			set up overall kernel / C environment
			reg1 = function-table[reg1]
			jump (reg1)

		f) Now, on a RISC machine, there's no great benefit to
		adding a pile of special hardware that saves state:
			1) All of the state is visible.
			2) Normal stores can store as fast as anything else.
			3) If there are a lot of exceptions, you'd expect the
			exception-handling code to live in the I-cache.
			(Yes, I understand the hard-real-time issues ... which
			is why some chips have lockable cache segments.)

		g) In addition, on a machine that *depends* on I-cache for
		reasonable performance, and knowing that branches are always
		bad for pipeline bubbles anyway, the *last* thing I want to do
		is vector one place for a few instructions, and then go off to
		common code anyway ...  especially if the cache line size
		(which may well be different for different implementations)
		is larger than the set of instructions at the vector.

		h) Hence, for a UNIX or UNIX-like system, we preferred:

			1) Hardware saves a CAUSE code, goes to the common
			   exception address, and arrives with interrupts
			   masked off ... all of which is simple and quick.
			2) The OS saves the registers, then uses the CAUSE
			   register to vector off as it wishes.
			3) Yes, there are a bunch of store instructions ...
			   but this was deemed trivial.
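		The single-entry scheme can be sketched in C as below; the
		cause names, the handler table, and all function names are
		made up for illustration, not MIPS specifics, and the
		register-save step is reduced to a comment:

		```c
		#include <assert.h>

		/* Invented cause codes for the demo. */
		enum cause { C_INT, C_TLB, C_SYSCALL, N_CAUSES };

		static int last_handled; /* records which handler ran */

		static void handle_int(void)     { last_handled = C_INT; }
		static void handle_tlb(void)     { last_handled = C_TLB; }
		static void handle_syscall(void) { last_handled = C_SYSCALL; }

		static void (*const handlers[N_CAUSES])(void) = {
		    handle_int, handle_tlb, handle_syscall,
		};

		/* The one common exception entry point: a real kernel would
		 * first save the general registers with ordinary stores,
		 * then use the hardware-recorded cause code to vector off. */
		static void common_exception(int cause)
		{
		    /* ... save registers, set up kernel environment ... */
		    handlers[cause]();
		}

		int main(void)
		{
		    common_exception(C_SYSCALL);
		    assert(last_handled == C_SYSCALL);
		    common_exception(C_TLB);
		    assert(last_handled == C_TLB);
		    return 0;
		}
		```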

		i) On the other hand, we considered the possibility of other
		OS flavors, and observed that one could:

			1) Save a few registers, just enough to work with.
			2) Use the CAUSE code to vector off right away  ...
			3) To individual routines that could then save
			   a lot of state ... or do a little work, and return.
		and that cycle-wise, especially with potential cache misses,
		we didn't see why this wouldn't be competitive with multiple
		vectors ... and it was certainly easier to implement.
	d) Summary of this part: having lots of exception vector addresses
	was left out on purpose, and with software involvement, and with
	people counting *cycles*;  the answer we got may or may not be right,
	and there may well be circumstances where there are better ways,
	but it certainly wasn't accidental. 


-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
DDD:    415-390-3090	FAX: 415-967-8496
USPS:   Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311
