Specs (Al Viro; Linus Torvalds; Theodore Ts'o)

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: I request inclusion of SAS Transport Layer and AIC-94xx into
Date: Thu, 29 Sep 2005 20:03:14 UTC
Message-ID: <fa.g0a33ji.m0e6ii@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0509291247040.3308@g5.osdl.org>

On Thu, 29 Sep 2005, Arjan van de Ven wrote:
>
> a spec describes how the hw works... how we do the sw piece is up to
> us ;)

How we do the SW is indeed up to us, but I want to step in on your first
point.

Again.

A "spec" is close to useless. I have _never_ seen a spec that was both big
enough to be useful _and_ accurate.

And I have seen _lots_ of total crap work that was based on specs. It's
_the_ single worst way to write software, because it by definition means
that the software was written to match theory, not reality.

So there's two MAJOR reasons to avoid specs:

 - they're dangerously wrong. Reality is different, and anybody who thinks
   specs matter over reality should get out of kernel programming NOW.
   When reality and specs clash, the spec has zero meaning. Zilch. Nada.
   None.

   It's like real science: if you have a theory that doesn't match
   experiments, it doesn't matter _how_ much you like that theory. It's
   wrong. You can use it as an approximation, but you MUST keep in mind
   that it's an approximation.

 - specs have an inevitable tendency to try to introduce abstraction
   levels and wording and documentation policies that make sense for a
   written spec. Trying to implement actual code off the spec leads to the
   code looking and working like CRAP.

   The classic example of this is the OSI network model protocols. Classic
   spec-design, which had absolutely _zero_ relevance for the real world.
   We still talk about the seven layers model, because it's a convenient
   model for _discussion_, but that has absolutely zero to do with any
   real-life software engineering. In other words, it's a way to _talk_
   about things, not to implement them.

   And that's important. Specs are a basis for _talking_about_ things. But
   they are _not_ a basis for implementing software.

So please don't bother talking about specs. Real standards grow up
_despite_ specs, not thanks to them.

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: I request inclusion of SAS Transport Layer and AIC-94xx into
Date: Fri, 30 Sep 2005 00:36:16 UTC
Message-ID: <fa.fvqh53c.hg06im@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0509291730360.3378@g5.osdl.org>

On Thu, 29 Sep 2005, Luben Tuikov wrote:
>
> >    It's like real science: if you have a theory that doesn't match
> >    experiments, it doesn't matter _how_ much you like that theory. It's
> >    wrong. You can use it as an approximation, but you MUST keep in mind
> >    that it's an approximation.
>
> But this is _the_ definition of a theory.  No one is arguing that
> a theory is not an approximation to observed behaviour.

No.

A scientific theory is an approximation of observed behaviour WITH NO
KNOWN HOLES.

Once there are known holes in the theory, it's not a scientific theory. At
best it's an approximation, but quite possibly it's just plain wrong.

And that's my point. Specs are not only almost invariably badly written,
they also never actually match reality.

At which point at _best_ it's just an approximation. At worst, it's much
worse. At worst, it causes people to ignore reality, and then it becomes
religion.

And that's way _way_ too common. People who ignore reality are sadly not
at all unusual.

"But the spec says ..." is pretty much always a sign of somebody who has
just blocked out the fact that some device doesn't.

So don't talk about specs.

Talk about working code that is _readable_ and _works_.

There's an absolutely mindbogglingly huge difference between the two.

			Linus

From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: I request inclusion of SAS Transport Layer and AIC-94xx into the 
	kernel
Date: Fri, 30 Sep 2005 05:32:48 UTC
Message-ID: <fa.d899e68.1vlmsb8@ifi.uio.no>
Original-Message-ID: <20050930053149.GA22199@thunk.org>

On Thu, Sep 29, 2005 at 04:20:13PM -0700, Luben Tuikov wrote:
>
> A spec defines how a protocol works and behaves.  All SCSI specs
> are currently very layered and defined by FSMs.

A spec defines how a protocol works and behaves --- *if* it is
well-specified and unambiguous, and *if* vendors actually implement
the spec correctly.  (And sometimes vendors have major economic
incentives to cheat and either intentionally violate the
specification, or simply not bother to test to make sure whether or
not they implemented their hardware correctly.)

Computing history has been littered with specifications that were
incompetently written and/or incompetently implemented --- from the
disaster known as ACPI, to FDDI (early FDDI networking gear was
interoperable only if you bought all of your gear from one vendor,
natch), consumer-grade disks which lied about when data had been
safely written to iron oxide to garner better Winbench scores, and
many, many, many others.

This is one of the reasons why the IETF doesn't bless a networking
standard until there are multiple independent, interoperable
implementations --- and even _then_ there will be edge cases that
won't be caught until much, much later.

In those cases, if you implement something which is religiously
adherent to the specification, and it doesn't interoperate with the
real world (i.e., everybody else, or some large part of the industry)
--- do you claim that you are right because you are following the
specification, and everyone else in the world is wrong?  Or do you
adapt to reality?  People who are too in love with specifications so
that they are not willing to be flexible will generally not be able to
achieve complete interoperability.  This is the reason for the IETF
Maxim --- be conservative in what you send, liberal in what you will
accept.  And it's why interoperability testing and reference
implementations are critical.

But it's also important to remember when there is a reference
implementation, or pseudo-code in the specification, it's not the only
way you can implement things.  Very often, as Linus has pointed out,
there are reasons why the pseudo-code in the specification is wholly
inappropriate for a particular implementation.  But that's OK; the
implementation can use a different implementation, as long as the
result is interoperable.

Regards,

						- Ted
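
[Editor's sketch, not part of the original message.] The maxim Ted quotes
("be conservative in what you send, liberal in what you will accept") can
be illustrated with a tiny header parser. The function names and the
"Name: value" line format are assumptions made for illustration only;
they are not taken from any protocol discussed in this thread.

```python
from typing import Tuple


def parse_header(line: str) -> Tuple[str, str]:
    """Liberal receiver: tolerate odd case, stray spaces, CRLF or bare LF."""
    name, sep, value = line.partition(":")
    if not sep:
        # Still reject input that cannot be interpreted at all.
        raise ValueError(f"not a header line: {line!r}")
    return name.strip().lower(), value.strip()


def format_header(name: str, value: str) -> str:
    """Conservative sender: always emit one canonical form ending in CRLF."""
    canonical = "-".join(part.capitalize() for part in name.strip().split("-"))
    return f"{canonical}: {value.strip()}\r\n"


if __name__ == "__main__":
    # A sloppy peer sends odd case and spacing; we accept it and re-emit
    # the strict form.
    name, value = parse_header("content-LENGTH :   42 \n")
    print(repr(format_header(name, value)))
```

The asymmetry is the point: the receiving side normalizes whatever the
real world sends, while the sending side never relies on the peer being
equally forgiving.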

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: I request inclusion of SAS Transport Layer and AIC-94xx into
Date: Mon, 03 Oct 2005 22:58:18 UTC
Message-ID: <fa.hb3p6ck.j7k380@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0510031531170.31407@g5.osdl.org>

On Mon, 3 Oct 2005, Ryan Anderson wrote:
>
> Let me rephrase what Linus said, to help remove the misreading that
> seems so common today.  I think a fair rewording would be, "A spec is a
> guideline.  When it fails to match reality, continuing to follow it is a
> tremendous mistake."

Yes (and that _should_ be obvious, but seldom is). But even stronger than
that.

Even in the case where a spec follows reality, the organization of the
spec very seldom has anything to do with organization of code.

A lot of people seem to think that spec abstractions should be translated
into code abstraction. Not so. It often makes no sense to do so at all.

There are exceptions. I suspect that pretty much all of them are specs
that _used_ to be code (ie they are documentation of real
implementations).

For example, Al Viro pointed out privately that the C preprocessor spec
actually matches what a C preprocessor is supposed to do, and that it was
easy to generate code from the spec. The reason? The code existed first,
the spec was written from that. Writing it back into software "just
works", because the spec really _was_ software to begin with, just
re-written as a spec.

But when it comes to hardware, almost all specs are written from the
standpoint of the hardware, not the standpoint of the software driving it.
The spec might even tell you accurately what the hardware does (hey,
miracles happen!), but that doesn't mean that you should organize your
software around it.

And the undeniable fact is, that once a spec gets big and complex enough,
it won't be exhaustively tested. For example, we've seen time and time
again that the hardware testing has been totally not based on any spec,
but on just testing against (usually just one, and usually Windows) one
single implementation of the "other side".

So for example, you'll have a general spec that says that the hardware
reacts in certain ways, but the only case that has been _tested_ is the
particular ways that Windows uses. Which is why we have hardware that
locks up when given commands in the wrong order - where "wrong" is not
defined by the spec, but by what Windows just happens to do.

This is especially common in the "cheap" market. For example, for SCSI,
most of the violations tend to be USB storage - which is supposed to act
largely like SCSI, but in reality really doesn't. It locks up if you
try to access sectors that aren't there, etc.

And this is where the spec people come in. They think that the "spec" is
right, and the reality is wrong. So they blame the broken hardware. Which
is "true" to some degree - there's a lot of broken hardware out there. But
it's _pointless_. Broken hardware is not an excuse - it's just a fact of
life. It's not acceptable to say "but the spec says.."

And yes, the real problems with people ignoring reality are often in the
high end. The high end tends to be the place where vendors are used to
saying "we don't use broken hardware". The high end is where people say
"if it doesn't conform to spec, we don't care: it's broken". In short, the
high end is where people are the most likely to just ignore the realities
_outside_ the high end. They'll point to the spec, and say "do it like
this". Without ever caring that doing it like that simply may not _work_
on a lot of setups.

So when the SAS people say that the SCSI layer should conform to their
needs, next time they should remember that it _also_ needs to conform to
the needs of things like USB storage. Which has totally different goals,
implementation issues, and bugs.

		Linus
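
[Editor's sketch, not part of the original message.] One common way to
live with devices that misbehave when asked for sectors that aren't there
is a per-device quirk table plus defensive clamping of requests. The
sketch below is a simplified illustration in Python, not kernel code: the
vendor/product IDs, the `capacity_off_by_one` quirk, and all function
names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Quirks:
    # Hypothetical quirk: the device reports one sector more than it has.
    capacity_off_by_one: bool = False


# Hypothetical (vendor_id, product_id) entries for known-bad devices.
QUIRK_TABLE = {
    (0x1234, 0x5678): Quirks(capacity_off_by_one=True),
}


def usable_sectors(vendor: int, product: int, reported: int) -> int:
    """Trust the reported capacity unless the device is known to lie."""
    quirks = QUIRK_TABLE.get((vendor, product), Quirks())
    if quirks.capacity_off_by_one and reported > 0:
        return reported - 1
    return reported


def clamp_read(start: int, count: int, total: int) -> Optional[Tuple[int, int]]:
    """Trim a read so it never touches a sector past the end of the device.

    Returns (start, count) for the part that is safe to issue, or None if
    no part of the request is safe.
    """
    if start < 0 or count <= 0 or start >= total:
        return None
    return start, min(count, total - start)


if __name__ == "__main__":
    total = usable_sectors(0x1234, 0x5678, 1000)
    print(total, clamp_read(990, 20, total))
```

The table encodes observed behaviour rather than anything a spec
promises, which is the kind of adaptation to reality the message argues
for.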

From: Al Viro <viro@ftp.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: I request inclusion of SAS Transport Layer and AIC-94xx into the 
	kernel
Date: Mon, 03 Oct 2005 23:23:46 UTC
Message-ID: <fa.it3c95u.1k60too@ifi.uio.no>
Original-Message-ID: <20051003232244.GZ7992@ftp.linux.org.uk>

On Mon, Oct 03, 2005 at 03:56:50PM -0700, Linus Torvalds wrote:
> For example, Al Viro pointed out privately that the C preprocessor spec
> actually matches what a C preprocessor is supposed to do, and that it was
> easy to generate code from the spec. The reason? The code existed first,
> the spec was written from that. Writing it back into software "just
> works", because the spec really _was_ software to begin with, just
> re-written as a spec.

Not quite, AFAIK.  Existing code was a fscking mess of subtly incompatible
implementations; the thing that had helped was simple - the people who
would have to implement the damn thing had a lot of presence in the
committee.  So it boiled down to
	* observation: attempt to describe it as text transformation leads
to horrors; it really acts on token stream; give up treating it like a text
filter.
	* after figuring out what it should do to sequences of tokens they
ended up with a reasonably simple algorithm that matched the existing
behaviour sans the nasty corner cases everyone handled differently.
	* _that_ had been turned into spec.
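
[Editor's sketch, not part of the original message.] The difference
between treating macro expansion as a text filter and treating it as an
operation on a token stream can be seen in a toy example. This is a
deliberately minimal illustration, assuming only object-like macros with
no rescanning, function-like macros, or directives; the tokenizer regex
and function names are made up here and do not follow the C standard's
actual algorithm.

```python
import re
from typing import Dict

# Very rough C-like tokenizer: string literals, identifiers, numbers,
# whitespace runs, and any other single character.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\s+|.')


def expand_text(source: str, macros: Dict[str, str]) -> str:
    """Text-filter approach: blind substring replacement."""
    for name, body in macros.items():
        source = source.replace(name, body)
    return source


def expand_tokens(source: str, macros: Dict[str, str]) -> str:
    """Token-stream approach: replace only whole identifier tokens."""
    return "".join(macros.get(tok, tok) for tok in TOKEN.findall(source))


if __name__ == "__main__":
    src = 'int MAXLEN = MAX; puts("MAX");'
    macros = {"MAX": "10"}
    print(expand_text(src, macros))    # mangles MAXLEN and the string literal
    print(expand_tokens(src, macros))  # touches only the MAX identifier
```

The text version corrupts both the longer identifier and the string
literal, which hints at the "horrors" of describing the preprocessor as a
text transformation.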

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: I request inclusion of SAS Transport Layer and AIC-94xx into
Date: Tue, 04 Oct 2005 15:10:58 UTC
Message-ID: <fa.ha3j5kn.j723g5@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0510040803010.31407@g5.osdl.org>

On Tue, 4 Oct 2005, Tomasz Kłoczko wrote:
>
> On Mon, 3 Oct 2005, Linus Torvalds wrote:
> [..]
> > This is especially common in the "cheap" market. For example, for SCSI,
> > most of the violations tend to be USB storage - which is supposed to act
> > largely like SCSI, but in reality really doesn't. It locks up if you
> > try to access sectors that aren't there, etc.
>
> Yes .. of course .. but please don't tap some words (without this kind
> comment) which sounds like rules [1]. *Especially if* talk is about *one*
> specified piece of hardware.

What "one" piece of hardware? There's a hell of a lot more broken USB
devices out there (and no, it's not "one" type either) than there will
probably _ever_ be SAS devices.

And the thing is, from a kernel _maintenance_ standpoint, the broken
hardware is the one that is expensive. Maybe only 0.1% of all hardware
ends up having some bugs - but that doesn't matter. It may look like a
"small" percentage to you, but it ends up being a _huge_ burden on
developers to try to figure out what is going on, often _exactly_ because
it's a small percentage, and the developers don't have it.

So the argument that "most hardware conforms to spec" is not a valid
argument. Not if it's 51%, and not if it's 99.9%. Because the cost is in
the ones that don't.

And that is why I'm trying to educate people that specs are purely paper.
Often much less valuable than a roll of TP.

Because what matters is not the spec, but real life. For example, in the
SCSI layer we've ended up being much more successful with the approach of
trying to use the same discovery sequence as Windows - because unlike the
spec, that's REAL LIFE, and that's the case that actually works.

The same way software inevitably has bugs in areas that haven't been
tested, hardware has bugs in areas that haven't been tested. It has
nothing to do with specs, and no, specs don't make people test it.

			Linus
