CPU feature economics (John R. Mashey)

Index Home About Blog
From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch
Subject: Re: malloc in hardware?
Date: 25 Mar 1998 21:56:49 GMT

I believe that the question "is it a good idea to put malloc in hardware"
falls into the general class of "should XX be done in hardware",
and that there are some rules of thumb that people have used pretty well.

(Note: Any proposed feature cannot be considered in isolation, but as an
addition to a set of features that already exists, or is in a design proposal.
I.e., to add feature X to ISA A might be deemed useful if it were found to
produce 1% performance improvement for a minimal cost, but the same
feature might not be added to ISA B, if there were already hardware/software
mechanisms for covering 90% of the uses.)

1. Don't put it in hardware if:
	1A. It implements an algorithm with sequential dependencies
	that can be exactly emulated by a series of existing instructions
	at about the same speed.
	2A. Lots of people are still arguing about what the semantics should
	be, and newer algorithms are likely to change it.

2. Do put it in hardware.
	2A. If there is parallelism available to the hardware that cannot
	be obtained by sequences of existing instructions.
	(This is why FPUs exist, and things like MMX: it is hard to
	synthesize a 2-3-clock FP multiply out of typical integer operations,
	whis is why the argument "IEEE FP is complicated and CPUs do it,
	why can't we have XXX is inapplicable.)
	2B. If it's well-understood, and unlikely to have semantic changes.
	2C. If, despite fitting 1A, the code sequence occurs often enough
	that there are substantial code space savings.
	For example, some architectures supply load-multiple/store-multiple
	instructions, which load/store registers no faster than the sequence
	of loads and stores, but save code space on function prolog and epilog,
	and can be made reasonably straightforward.
	On the other hand, complex function-call instructions have somewhat
	fallen out of favor, as compilers can often do fairly well.
	2D. If the hardware has access to extra state, or abilities difficult
	to provide to normal user-level instructions, like:
		- extra state, like guard bits
		- ability to guarantee non-interruptability
		- abilty to access a piece of state that is otherwise restricted
	2E. If hardware checks can be done in parallel, such that they can
	eliminate numerous software checks that would hardly ever fail.


As far as I can tell, malloc in hardware:
	fits 1A and 2A (and therefore not 2A and 2B)
	Probably doesn't fit 2C, i.e., if you had a malloc instruction,
	it would usually be called as a library function anyway.

So, that leaves 2D, or maybe 2A.  So, a reasonable question would be:

Can you start with a vanilla architecture, like any of the current
RISCs, and add a few instructions that really help memory allocation,
deallocation?  Is there some way you can get at more parallelism in
the machine for this?  (It's not enough to just take a sequence of
instructions andcast them into hardware, unless it does a lot for you).
(I'm sort of reminded of the SOAR project at UCB.)

We had a long discussion with Alan Kay about this ~10 years ago,
i.e., what facilities could be added for languages of the Smalltalk/LISP
sort, and for dynamic allocation in general, and considered a lot of
things like programmable masks that would be compared against every memory
reference, but we didn't come with any consensus on useful, implementable
features.  A lot of such features in various chips actually ended up getting
bypassed by implementors of such langauges, for one reason or another.




--
-john mashey    DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL:  mash@sgi.com  DDD: 650-933-3090 FAX: 650-932-3090
USPS:   Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389
Index Home About Blog