From: mash@mips.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: RISC vs CISC? Call a spade a spade?
Message-ID: <7037@spim.mips.COM>
Date: 16 Aug 91 23:32:33 GMT
In article <7006@spim.mips.COM> rogerk@mips.com (Roger B.A. Klorese) writes:
>In article <PCG.91Aug16141606@aberda.aber.ac.uk> pcg@aber.ac.uk (Piercarlo Grandi) writes:
>>I remember a report of a talk by John Mashey, in which he gave the
>>harrowing details of how unpleasant it had been to rewrite the MIPS Unix
>>kernel from Classic C to ANSI C, and all the dangers thereof.
>I think you are imagining this. RISC/os is written in Classic C, with the
>addition of "volatile."
I think there is multiple confusion caused by the typical telephone series
problem.
Here is the standard story:
1) In 4Q85, we had a C Compiler that could compile itself with global
optimization turned on, and use the result to compile itself again,
and get the same thing. It was also adequate to compile the UNIX
kernel, albeit without global optimization turned on.
2) In early 1986, we started to do -O on the kernel, and it was
indeed harrowing at first, because:
a) hardly anyone had implemented volatile in an optimizing
compiler yet, and its true implications weren't quite
understood.
b) Of course there were bugs in the optimizer.
3) hence, when we just blindly turned on -O (as volatile was in
process of being implemented), things broke everywhere, i.e.,
loops like:
	while (p->devicestatus != OK)
		junk = p->deviceinput;
and the compiler optimized everything away.
Then, we got volatile in the compiler, and declared volatile .... *p;
...and it still broke, because it still saw that junk (which had never been
mentioned) was never used again, and hence that statement disappeared.
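For readers who want to try the loop from item 3, here is a minimal sketch in present-day C. It is not the MIPS kernel code: the struct layout, the DEV_OK/DEV_BUSY values, and the max_spins bound are all hypothetical, added only so the function can be run against ordinary memory. The point it illustrates is that with the pointer declared volatile, both the status load and the discarded input load are accesses the compiler is expected to keep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; the post only names OK. */
enum { DEV_BUSY = 0, DEV_OK = 1 };

/* Hypothetical register layout; field names follow the loop in the post. */
struct device {
    unsigned devicestatus;
    unsigned deviceinput;
};

/*
 * Poll until the device reports OK, reading and discarding input each pass.
 * Returns the number of discarded reads. max_spins is an assumption of this
 * sketch: it bounds the loop so it terminates on plain memory.
 */
unsigned drain_until_ok(volatile struct device *p, unsigned max_spins)
{
    unsigned reads = 0;

    assert(p != NULL);
    while (p->devicestatus != DEV_OK && reads < max_spins) {
        /* The value is thrown away, but the volatile load itself must occur. */
        (void)p->deviceinput;
        reads++;
    }
    return reads;
}
```

Called on a struct whose status is already DEV_OK it returns 0; on one stuck at DEV_BUSY it returns max_spins. On real hardware the status would change underneath the loop, which is exactly what the compiler cannot see without volatile.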
4) It became clear after a while that:
any load or store to a volatile variable that would have happened
with simplistic code ... must happen in exactly the same order and
number ... or systems programmers go nuts.
So, our compiler folks did that.
5) And finally, there was the general issue of debugging an optimizer
when using the kernel. This was the case where it would almost
work optimized, and we had to do binary search to find the module
where 1 store was being omitted.
Now, the only ANSI C issue in this whole story is the fact that
we were able to add volatile to our existing compilers, rather than
some MIPS-specific keyword, and know we were at least heading in
a standards-oriented direction.
Most of the problem was dealing with new compilers doing global optimization
on code, where (at that time) the number of people in the world who had
ever dealt with the resulting issues inside the kernel was small...
Figuring out what to make volatile was pretty straightforward: make
every pointer to a device structure volatile, plus a few other places.
6) We started on this 1Q86, and had most of it in pretty reasonable shape
by 2Q86, and shipped -O'd kernels in production around Sept/Oct of 1986.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94088-3650
From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.std.c,comp.arch
Subject: Re: speculative stores
Date: 23 May 1997 00:04:20 GMT
This thread of discussion [on freedom or lack thereof of compilers to
do things with C volatile variables]
is a good example of an interesting issue that
combines computer architecture, operating systems, compiler design,
language design, and standards design ... odd interactions among them,
and the nonobvious difficulty of getting it all right, i.e., some of this
looks simple unless you've actually had to do it.
As a good example of this kind of issue, I recommend:
Michael O'Dell, "Putting UNIX on Very Fast Computers or What the Speed of
Light Means to Your Favorite System Call", Summer USENIX June 11-15, 1990,
239-246.
1. Simple model, but wrong
1.1 It is all too easy to think of a CPU as executing one instruction
at a time, with no concern for caches & MMUs, and with loads and adds
about equal time, i.e., akin to Knuth's MIX. Of course, current real
CPUs:
- May have uncached, or cache-miss loads that take 100+ clocks
- May have many instructions in progress at once
- May perform operations out of order
- May issue speculative loads or stores along paths that in fact
will never be reached (i.e., incorrect branch predict).
[Speculative stores? for write-back cache, you can do any cache-miss
part speculatively, but actual store has to wait].
1.2 It is easy to think of compilers as generating straightforward obvious
code with little optimization. At least, this *was* true when I first
encountered C in 1973, with no global register allocation for example,
or even real common subexpression elimination.
Of course, modern compilers do numerous optimizations.
1.3 Operating systems often do caching for performance, and then have
to be careful about file system corruption.
1.4 In all 3 cases, the scenario is:
- To go fast, allow internal state to become temporarily inconsistent with
the simple model, most of the time
- Then, restore consistency when necessary, such as:
CPU: interrupt, exception, or sync/barrier instructions
Compiler: save register back into a variable
Operating system: deal with interrupts, signals, some explicit
system calls (like sync or fsync)
and the problem is:
- Discovering that programs done for simpler, earlier environments have made
assumptions about consistency and sequentiality that are "almost always"
true, but not quite always. There are legions of bugs and surprises
that have occurred because of this, and it certainly is a challenge
to people who design CPUs, operating systems, compilers, and language
standards to achieve performance without breaking things.
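On the operating system side, the "restore consistency when necessary" step can be sketched with the fsync call mentioned above. This is only an illustration using standard POSIX calls; the helper name write_durably and the choice to return -1 on any failure are assumptions of the sketch, not anything from the systems discussed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path, then ask the kernel to
 * push them to stable storage. Returns 0 on success, -1 on failure.
 */
int write_durably(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing; retry */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /*
     * A successful write() only means the kernel has buffered the data.
     * fsync() is the explicit point where the fast, temporarily
     * inconsistent state is reconciled with what is on the device.
     */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A program that skips the fsync is "almost always" fine, which is the kind of assumption that tends to surface only after a crash.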
2. Some examples in addition to the O'Dell paper
2.1 Original Bourne shell, ~1976/1977 and then early 1980s
For speed, Steve B had used a clever trick of using a memory arena without
checking for the end, but placing it so that running off the end would cause
a memory fault, which the shell then trapped, allocated more memory,
then returned to the instruction that caused the trap and continued.
The MC68000 (in order to go fast) had an exception model that
broke this (among other things) and caused some grief to a whole generation
of people porting UNIX to 68Ks in the early 1980s.
2.2 MIPS R2000, optimizing compilers, volatile, 1985/1986
During 1Q86, we were bringing up UNIX on MIPS chips.
We had a good optimizing compiler for C, with good global register
allocation, common subexpression elimination, etc, and the MIPS
instruction set had been designed assuming such a compiler.
(Such C compilers were fairly rare at this point).
Most UNIX systems are on machines with memory-mapped I/O devices,
i.e., with drivers that do uncached loads/stores from/to memory mapped
hardware registers, whose semantics are *not* memory.
I.e., it is really bad to read a register location and get an old
cached copy of an I/O status [this occasionally happened from
programming errors.] MIPS chips provided a way to turn regular
loads/stores into uncached ones simply by diddling address bits,
which meant that it was fairly reasonable to start with an existing
driver, and just make sure the device registers were mapped into the
right place in the address space to get uncached accesses.
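The post does not spell out which address bits are involved. As an assumption, the sketch below uses the conventional 32-bit MIPS kernel segments: kseg0 (0x80000000 to 0x9FFFFFFF, cached) and kseg1 (0xA0000000 to 0xBFFFFFFF, uncached), both of which map the same low 512 MB of physical memory. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u  /* unmapped, cached window   */
#define KSEG1_BASE 0xA0000000u  /* unmapped, uncached window */
#define KSEG_END   0xC0000000u  /* first address past kseg1  */
#define PHYS_MASK  0x1FFFFFFFu  /* low 512 MB physical range */

/* Strip the segment bits from a kseg0/kseg1 address. */
uint32_t kseg_to_phys(uint32_t vaddr)
{
    assert(vaddr >= KSEG0_BASE && vaddr < KSEG_END);
    return vaddr & PHYS_MASK;
}

/* Same physical location, seen through the uncached window. */
uint32_t uncached_alias(uint32_t vaddr)
{
    return kseg_to_phys(vaddr) | KSEG1_BASE;
}

/* Same physical location, seen through the cached window. */
uint32_t cached_alias(uint32_t vaddr)
{
    return kseg_to_phys(vaddr) | KSEG0_BASE;
}
```

Under that layout, a driver that keeps its device pointer at uncached_alias(addr) gets uncached loads and stores with ordinary instructions. It still needs volatile, because the compiler knows nothing about the address trick.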
Of course, if you want to use global optimizing compilers for such
code, you desperately need "volatile" or equivalent to keep the
compiler from eliminating loads you were counting on, or generating
extra stores, or moving code around that had side-effects.
Thank goodness, in late 1985, "volatile" proposals had firmed up enough
that our engineers felt OK putting it into the compiler.
[Note: it was not yet in the official standard; nevertheless, people who
actually had to make something work were at least happy there was a way
to do it that looked like it would get to be a standard, rather than
having to invent a clearly-proprietary extension.]
So, existing device drivers acquired a bunch of volatile declarations,
and so did certain other pieces of code. It took a while to understand
what was required, which was that the optimized references to volatiles had to
happen in exactly the same number and order as they would with the intuitive
unoptimized model; anything less strict led to ugly bugs.
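To make the "same number and order" rule concrete, here is a small hedged sketch. The register block, the command values, and the idea that this device wants two reset writes are all invented for illustration. What matters is that every marked access looks redundant to an optimizer treating the registers as plain memory, and none of them is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical controller registers. */
struct ctrl_regs {
    uint32_t command;
    uint32_t status;
};

/* Hypothetical command codes. */
enum { CMD_RESET = 0x1, CMD_ENABLE = 0x2 };

/*
 * Reset and enable the (imaginary) controller. With a volatile-qualified
 * pointer, the three stores and two loads below are expected to be emitted
 * exactly as written: same count, same order.
 */
uint32_t reset_and_enable(volatile struct ctrl_regs *r)
{
    assert(r != NULL);

    r->command = CMD_RESET;   /* store 1 */
    r->command = CMD_RESET;   /* store 2: same value, still a distinct access */
    (void)r->status;          /* load 1: value unused, the read is the point */
    r->command = CMD_ENABLE;  /* store 3: must not move ahead of the loads/stores above */
    return r->status;         /* load 2: must be a fresh read, not a reused value */
}
```

Without volatile, a global optimizer could legitimately keep only the final store and a single load, which is the sort of single missing access that the binary search described next was hunting for.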
This made things work, modulo bugs. At one point, UNIX ran fine unoptimized,
but crashed optimized, so we did a binary search on kernel modules,
optimizing half of them, testing, then optimizing half of what was left, etc.
The compiler had optimized away a single store instruction that it should
not have....
This is why it's hard: this is an example of interaction among:
CPU architecture, operating system, compiler design, language design,
programmer expectations, and standards, where doing the right thing means
having to understand most of these at once.
--
-john mashey DISCLAIMER: <generic disclaimer: I speak for me only...>
EMAIL: mash@sgi.com DDD: 415-933-3090 FAX: 415-967-8496
USPS: Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389