Vectored interrupts (John Mashey)

From: "John Mashey" <old_systems_guy@yahoo.com>
Newsgroups: comp.arch
Subject: Re: Vectored Interrupt Fetch
Date: 28 Feb 2006 10:52:51 -0800
Message-ID: <1141152771.570411.215550@t39g2000cwt.googlegroups.com>

Eric P. wrote:
....
> To expand on this...
> It is not just the interrupt vector (the table of routine pointers)
> but the driver code, the interrupt stack and all data objects that
> are touched that must be considered. Caching just the vector would
> have little effect.
.....

Yes, and more...

In an earlier post, Mitch wrote: "The early RISC argument is not to do
vectored interrupts, but to interrupt to a specified location and then
have software read a surprise register and use that to determine the
interrupt handler. All in all, the vectored interrupt eliminates this
software and the overhead of loading the instruction to determine the
vector. "

At MIPS (designed in early 1985):
a) We had a separate exception location for UTLBMISS, the software TLB
refill routine.

b) The other exceptions and interrupts trapped to a common location,
having set a CAUSE register.

We probably could have gotten a more general vector scheme, but:
a) We (i.e., the OS group) had done plenty of UNIX ports and other OSs
on CPUs with vectored interrupts, and rather than having them again,
what we wanted was: "Get us into the OS quickly, in a clean state, and
let us figure out what to do."  Why?

b) Consider the typical vectored scheme:

BASE => vector of addresses, indexed by exception cause code.

BASE: @exception0
BASE+4: @exception1
......

exception0: code for exception 0
   ....
exception1: code for exception 1
....
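The table lookup in this scheme can be modelled in a few lines of Python. This is a minimal sketch, not any real CPU's mechanism: it assumes 4-byte table entries (matching the BASE, BASE+4 layout above) and uses Python functions in place of handler addresses; `vector_entry_address` and `hardware_vectored_entry` are hypothetical names.

```python
# Minimal model of a hardware-vectored exception scheme (illustrative only).
# Assumption: each table entry is 4 bytes, as in the BASE, BASE+4, ... layout.
ENTRY_SIZE = 4


def vector_entry_address(base: int, cause: int) -> int:
    """Address the hardware would read to find the handler for `cause`."""
    return base + ENTRY_SIZE * cause


def exception0() -> str:
    return "handled exception 0"


def exception1() -> str:
    return "handled exception 1"


# Index = cause code, value = handler (standing in for a code address).
VECTOR = [exception0, exception1]


def hardware_vectored_entry(cause: int) -> str:
    # Hardware step 1: read VECTOR[cause] from memory (may miss in the cache).
    handler = VECTOR[cause]
    # Hardware step 2: jump to the address it just read.
    return handler()


print(hex(vector_entry_address(0x1000, 1)))  # prints 0x1004
print(hardware_vectored_entry(1))            # prints handled exception 1
```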

Looks good ... but in fact, in various real OSs of the time, it turns
out that most interrupts and exceptions were heavyweight, because:
- the OS might be responding to an I/O interrupt
- the OS might have to transfer large amounts of data to/from the user
program
- the OS might have to handle a page fault
- the OS might have to be prepared to do a context switch

So, as a result the *actual* code often found in such systems was:

exception0: mov cause,0  # record the cause code
  jump commoncode
exception1: mov cause,1
  jump commoncode

commoncode:
 do common register saving, setup for kernel environment
 jump vector[cause]
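The double dispatch in this pattern is easy to see in a small simulation. A sketch under stated assumptions: Python functions stand in for code addresses, a `trace` list records each control transfer, and the handler names (`handle_io`, `handle_page_fault`) and helpers (`take_exception`, `SOFTWARE_VECTOR`, `HARDWARE_VECTOR`) are hypothetical.

```python
# Simulation of the stub-plus-common-code pattern (illustrative only).
# All function names are hypothetical stand-ins for code addresses.


def handle_io(state: dict) -> str:
    return f"I/O interrupt, saved {len(state)} registers"


def handle_page_fault(state: dict) -> str:
    return f"page fault, saved {len(state)} registers"


# Software table used by commoncode: "jump vector[cause]".
SOFTWARE_VECTOR = [handle_io, handle_page_fault]


def commoncode(cause: int, registers: dict, trace: list) -> str:
    trace.append("jump commoncode")
    saved_state = dict(registers)  # common register saving
    trace.append(f"jump vector[{cause}]")
    return SOFTWARE_VECTOR[cause](saved_state)  # second, software dispatch


def exception0(registers: dict, trace: list) -> str:
    trace.append("stub 0: mov cause,0")
    return commoncode(0, registers, trace)


def exception1(registers: dict, trace: list) -> str:
    trace.append("stub 1: mov cause,1")
    return commoncode(1, registers, trace)


# Hardware table: each entry points at a one-instruction stub.
HARDWARE_VECTOR = [exception0, exception1]


def take_exception(cause: int, registers: dict) -> tuple[str, list]:
    trace = [f"fetch hardware vector[{cause}]"]
    result = HARDWARE_VECTOR[cause](registers, trace)
    return result, trace


result, trace = take_exception(1, {"r1": 5, "r2": 7})
print(result)      # page fault, saved 2 registers
print(len(trace))  # 4 control-flow steps for one exception
```

The cause code is looked up twice, once by hardware and once by software, and the stub in between does nothing but re-create information the hardware already had.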

Oops.  In this case, having the vectored setup actually didn't help at
all, and might well have hurt, because:
A. The CPU must fetch the original vector address [potential cache
miss 1]*
B. The CPU jumps to that location [potential cache miss 2]*
C. The CPU executes one instruction, then jumps to the common code
[potential miss 3]*
D. The CPU executes the common code [potential misses]
E. The software eventually does an indexed jump to the real exception
routines [potential miss]*
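A rough tally of the steps marked * (hard-to-predict transfers) shows what a common-entry design saves. This is a back-of-the-envelope sketch: the 20-cycle miss penalty is a made-up placeholder rather than a figure for any real machine, step D is left uncharged because both designs run the common code, and the jump to a single fixed exception location is not charged either, on the assumption that one constant target stays cache-resident.

```python
# Back-of-the-envelope tally of hard-to-predict transfers (steps marked *).
# MISS_PENALTY_CYCLES is a hypothetical placeholder, not a measured value.
MISS_PENALTY_CYCLES = 20

# (step label, is it a starred, hard-to-predict transfer?)
VECTORED_WITH_STUBS = [
    ("A: fetch vector address", True),
    ("B: jump to vector target", True),
    ("C: jump to common code", True),
    ("D: run common code", False),
    ("E: indexed jump to real routine", True),
]

# A common-entry design skips A, B and C: the cause code arrives in a register.
COMMON_ENTRY = [s for s in VECTORED_WITH_STUBS if s[0][0] in ("D", "E")]


def hard_transfers(steps: list) -> int:
    return sum(1 for _, starred in steps if starred)


def worst_case_penalty(steps: list, penalty: int = MISS_PENALTY_CYCLES) -> int:
    # Charge the full penalty on every starred transfer (pessimistic).
    return hard_transfers(steps) * penalty


for name, steps in (("vectored+stubs", VECTORED_WITH_STUBS),
                    ("common entry", COMMON_ENTRY)):
    print(name, hard_transfers(steps), worst_case_penalty(steps))
# vectored+stubs 4 80
# common entry 1 20
```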

The *'d items are effectively branches that are difficult to predict,
or if predictable, are hard to get much overlap on, even on
aggressively-speculated designs.
The MIPS-style design simply skipped A, B, and C, just asking the
hardware to give us the cause code in a register.  Note that in some
designs, the jump E looks like:
   load register,vector[cause]
   jumpreg register

and the load instruction can be interleaved with earlier register-save
code, giving more overlap.
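The common-entry style can be sketched the same way. This assumes a MIPS32-style Cause register, where the 5-bit exception code sits in bits 6..2, so `cause & 0x7C` is already a byte offset into a table of 4-byte pointers; the handler functions and `common_exception_entry` are hypothetical, and the separate UTLBMISS entry point is not modelled. Python has no instruction scheduling, so the early table load only marks where real code could overlap it with the register saves.

```python
# Single-entry exception dispatch (illustrative sketch).
# Assumption: MIPS32-style Cause register, exception code in bits 6..2.
EXC_CODE_MASK = 0x7C
EXC_CODE_SHIFT = 2
EXC_INT = 0  # interrupt
EXC_SYS = 8  # syscall


def exc_code(cause_reg: int) -> int:
    """Extract the exception code from a Cause register value."""
    return (cause_reg & EXC_CODE_MASK) >> EXC_CODE_SHIFT


def handle_interrupt(saved: dict) -> str:
    return "interrupt"


def handle_syscall(saved: dict) -> str:
    return "syscall"


def handle_unexpected(saved: dict) -> str:
    return "unhandled"


# Software table indexed by exception code; unused codes fall through.
VECTOR = [handle_unexpected] * 32
VECTOR[EXC_INT] = handle_interrupt
VECTOR[EXC_SYS] = handle_syscall


def common_exception_entry(cause_reg: int, live_regs: dict) -> str:
    # "load register,vector[cause]": issued early so that, on real hardware,
    # it can overlap with the register-save code below.
    handler = VECTOR[exc_code(cause_reg)]
    saved = dict(live_regs)  # register-save code
    return handler(saved)    # "jumpreg register"


print(common_exception_entry(EXC_SYS << EXC_CODE_SHIFT, {"a0": 1}))  # syscall
```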

In general, you really only need vectored interrupts where there are
exceptions that:
a) Can be frequent
b) Have very minimal handling, so that overhead really matters
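A quick calculation shows why condition (b) matters. The cycle counts here are hypothetical, chosen only to contrast a tiny fixup with a heavyweight handler: the same fixed entry overhead dominates the first case and is noise in the second.

```python
# Share of an exception's total cost spent on entry/dispatch overhead.
# All cycle counts are hypothetical placeholders.


def overhead_fraction(dispatch_cycles: int, handler_cycles: int) -> float:
    return dispatch_cycles / (dispatch_cycles + handler_cycles)


ENTRY_OVERHEAD = 30  # hypothetical cycles for a generic, common entry path

# Tiny fixup (e.g. emulating a misaligned load): overhead dominates.
print(f"{overhead_fraction(ENTRY_OVERHEAD, 10):.0%}")    # 75%
# Heavyweight work (I/O, page fault, context switch): overhead is noise.
print(f"{overhead_fraction(ENTRY_OVERHEAD, 5000):.1%}")  # 0.6%
```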

For example, if one wanted to have fast emulations for missing
instructions, or misaligned operands, or tag-check fixups, etc., then
one would probably want an additional, very low-overhead mechanism for
user-level traps to do this.  As I noted in some posts last year, that
sort of feature is something I wish we'd had time to invent.

Finally, note that every person in our OS group had been burnt, some
time or other, by bugs in some CPU's exception mechanism, and we were
pretty fanatic that hardware complexity had to be justified in this
area, since bugs in it were nightmares.



RULES
1) Count cycles, not just instructions.

Even in mid-1980s scalar CPUs with simple pipelines, cache and TLB
effects were already relevant.  These days, mispredicted branches and
speculative unwinds can add lots of cycles.

2) It's hard to have meaningful discussions in the absence of frequency
information.

3) If infrequent events are efficiently handled, that's nice, assuming
that the implementation cost was essentially free.  It's rarely worth
complexifying a design to speed up events that are relatively
infrequent.
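Rules 2 and 3 combine into one line of arithmetic: the CPU time a speedup buys is (event rate × cycles saved per event) / clock rate. The rates, savings and clock speed below are hypothetical placeholders, not data from any system.

```python
# Fraction of CPU time recovered by speeding up one kind of event.
# All inputs are hypothetical placeholders.


def fraction_of_time_saved(events_per_sec: float,
                           cycles_saved_per_event: float,
                           clock_hz: float) -> float:
    return events_per_sec * cycles_saved_per_event / clock_hz


CLOCK_HZ = 100e6  # hypothetical 100 MHz CPU

# Rare event: 100/s, 50 cycles saved each -> negligible.
print(f"{fraction_of_time_saved(100, 50, CLOCK_HZ):.4%}")        # 0.0050%
# Frequent event: 1,000,000/s, 50 cycles saved each -> half the machine.
print(f"{fraction_of_time_saved(1_000_000, 50, CLOCK_HZ):.1%}")  # 50.0%
```

The same 50-cycle saving is worthless or decisive depending entirely on the rate, which is why the frequency data has to come first.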

4) One really has to understand how the OS is going to work.