Index Home About Blog
From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 00:38:00 UTC
Message-ID: <fa.EIGikejAW3oPiK5q8UzRZD5ctCk@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0602211631310.30245@g5.osdl.org>

On Tue, 21 Feb 2006, Andrew Morton wrote:
>
> We.  Don't.  Do. That.
>
> Please either restore the old events so we can have a 6-12 month transition
> period or revert the patch.

I agree.

This stupid argument of "HAL is part of the kernel, so we can break it" is
_bogus_.

The fact is, if changing the kernel breaks user-space, it's a regression.
IT DOES NOT MATTER WHETHER IT'S IN /sbin/hotplug OR ANYTHING ELSE. If it
was installed by a distribution, it's user-space. If it got installed by
"vmlinux", it's the kernel.

The only piece of user-space code we ship with the kernel is the system
call trampoline etc that the kernel sets up. THOSE interfaces we can
really change, because it changes with the kernel.

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 01:10:52 UTC
Message-ID: <fa.FD9XBrZm/N1OZe5sUuDx1i/Jn2s@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0602211700580.30245@g5.osdl.org>

On Tue, 21 Feb 2006, Linus Torvalds wrote:
>
> The only piece of user-space code we ship with the kernel is the system
> call trampoline etc that the kernel sets up. THOSE interfaces we can
> really change, because it changes with the kernel.

Side note: if people want to, we could have other "trampolines" like that,
so that we could have more user-level code that gets distributed with the
kernel. It doesn't have to be something that gets mapped into every binary
either: we could - if we wanted to - have things like shared libraries or
helper shell scripts or whatever that we expose in /sys/shlib/ that are
kernel-version dependent.

Then we could perhaps change more things, just because we could change the
wrappers that actually use them together with the kernel.

To some degree, /initrd was supposed to do things like that, and in
theory, it still could. However, realistically, 99% of any /initrd is more
about the distribution than the kernel, so right now we have to count
/initrd as a distribution thing, not a kernel thing.

		Linus


From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 11:22:57 UTC
Message-ID: <fa.Wj8bj7V0JFqEeCbpQ4SgUy+Cerc@ifi.uio.no>
Original-Message-ID: <20060222112158.GB26268@thunk.org>

On Tue, Feb 21, 2006 at 05:06:48PM -0800, Linus Torvalds wrote:
>
> To some degree, /initrd was supposed to do things like that, and in
> theory, it still could. However, realistically, 99% of any /initrd is more
> about the distribution than the kernel, so right now we have to count
> /initrd as a distribution thing, not a kernel thing.

... and if we're truly going to be pouring more and more complexity
into initrd (such as userspace swsusp), then (a) we probably should
make it more of a kernel-specific thing, and not a distro-specific
thing, since without that you can be pretty much guaranteed that more
and more people will be using and testing swsusp2, and not uswsusp,
and (b) we need to add _way_ better debugging provisions so that if
something dies in early boot, you don't go pulling out your hair
trying to figure out what went wrong, and having to spend a good 20
minutes or so between each try-to-fix the initrd, watch the boot fail,
reboot into a working setup, cursing Red Hat's nash, modifying the
initrd, and trying again.

Usually I break the loop by giving up, and ripping out whatever kernel
feature requires initrd, such as dm, and installing on hard partitions
with all of the kernel modules I need compiled into the kernel.  I
still have no idea why mptscsi fails to detect SCSI disks when loaded
as a module via initrd on various bits of IBM hardware (including the
e326 and ls-20 blade), but works fine when compiled directly into the
kernel....

If we want more and more stuff to be poured into initrd, it's got to
be made easier to debug and consistent across distributions, such that
more people can test initrd configurations, and flush out the bugs,
never mind the question of programs like udev randomly breaking
between kernel releases.  Maybe it's time to consider moving all of
that into the kernel source; if they wanted to be treated as part of
the kernel, then let them literally become part of the kernel from a
source code and release management perspective.

						- Ted


From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 16:26:15 UTC
Message-ID: <fa.F29rPFDjzHp8wL9zBmyyeEjY2Jg@ifi.uio.no>
Original-Message-ID: <20060222162533.GA30316@thunk.org>

On Wed, Feb 22, 2006 at 07:48:21AM -0800, Joel Becker wrote:
> On Wed, Feb 22, 2006 at 06:21:58AM -0500, Theodore Ts'o wrote:
> > with all of the kernel modules I need compiled into the kernel.  I
> > still have no idea why mptscsi fails to detect SCSI disks when loaded
> > as a module via initrd on various bits of IBM hardware (including the
> > e326 and ls-20 blade), but works fine when compiled directly into the
> > kernel....
>
> Ted,
> 	Do you mean that you are using a distro (eg, RHEL4 or something)
> with a mainline kernel?  We've seen something similar, and what we've
> determined is happening is that insmod is returning before the module is
> done initializing.  It's not that mptscsi fails to detect the disks.
> Rather, it's still in the detection process when the boot process tries
> to mount /.  So there's no / yet, and the thing hangs.

Yep, that's exactly what I'm doing; RHEL4U2 with a 2.6.14 or 2.6.15
kernel.  Thanks for the tip, that should help me investigate further!

> In the case we
> see, it's some interaction between the RHEL4/SLES9 version of
> module-init-tools and the latest version of the kernel.  Our first
> attempt at fixing it was to change the linuxrc to sleep between each
> insmod.  This works, but only if the modules load and initialize
> themsleves fast enough.  Get a FC card in there, and it just doesn't
> work.  So we've taken to compiling the modules in-kernel.

Sounds like this is another of example of system support programs
(insmod in this case) breaking with modern kernels.  Hopefully now
that Linus has laid down the law about how breaking userspace is
uncool, people will agree that it's a bug.  (That is unless Red Hat
made some kind of incompatible change and it's Red Hat's fault, but I
kinda doubt that.)  Anyway, I'll look into this some more and see why
you can't use a mainline kernel with RHEL4, at least not in this
configuration.

						- Ted


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 18:01:47 UTC
Message-ID: <fa.LvzG1II48GM8N8ZRfd1KAlr2GqE@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0602220942040.30245@g5.osdl.org>

On Wed, 22 Feb 2006, Gabor Gombas wrote:
>
> I don't think isnmod is broken. It's job is to load a chunk of code into
> the kernel, and it's doing just that.
>
> The asynchronous device discovery is caused/required by hotplug. If you
> can recreate the problem with a kernel that has CONFIG_HOTPLUG disabled,
> then I agree that this is a kernel bug which should be fixed.

I think it currently can happen without HOTPLUG too. In fact,
CONFIG_HOTPLUG is really a "special drivers that do hot-plugging", not
about "devices that show up on their own".

The thing is, "insmod" really just tends to introduce the driver to the
kernel. It leaves it pretty open what that driver will actually _do_. And
a lot of drivers tend to do discovery independently of actually plugging
in the driver.

For example, any USB host driver will always discover its devices
asynchronously, and has no dependency on CONFIG_HOTPLUG. It can take
several seconds for all the hubs to have powered up and discovered what is
behind them.

The same is true of most SCSI buses - CONFIG_HOTPLUG may talk about
whether the actual _controller_ is hotpluggable, but not about whether a
disk is, or how disk discovery takes place.

Now, arguably "insmod" (both the user binary and the kernel side) is doing
the right thing: it's inserting the driver. The fact that all the devices
that the driver uses may not be immediately available is not insmod's
issue. That's a very valid way to look at it.

At the same time, it's also arguable that from an ease of use standpoint,
"insmod" should generally try to wait until the driver has enumerated what
it knows about. That's a totally non-technical argument, but it's an
equally valid standpoint.

Of course, the technical argument is that discovery can take a long long
time (minutes to wait for disks to spin up etc), so if insmod were to wait
for it all, we'd be really screwed and our bootup times would go through
the roof.

The usability argument is that right now we don't have any easy way at all
to wait for bus scanning to have finished, and that's a very valid
argument. You could wait for the hotplug event, but since you don't even
_know_ that you'll get such an event, that's really not a very good answer
either.

We could improve.

I _think_ that in this particular case, the best particular choice might
be for the "mount" binary to be taught to re-try after a few seconds:
either with a command line argument, or with just the early bootup initrd
code being encouraged to have a loop like

	if (mounting root failed)
		echo "Please press F1 to continue"
		do
			read-keyboard-with-5-second-timeout
		while (mounting root failed)
	endif

so that the user would have to press a key (or we'd just re-try every five
seconds).

That way, the boot wouldn't just fail immediately over something that can
be fixed (sometimes the root partition might just be hot-pluggable too:
"insert disk and continue" can be a valid way to handle issues).

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 19:43:59 UTC
Message-ID: <fa.ajvdiuqyExDadiG+RMCaIgG3XpU@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0602221135290.30245@g5.osdl.org>

On Wed, 22 Feb 2006, Greg KH wrote:
>
> RHEL is a very different kernel from mainline (just like SLES is).  Have
> you looked through their patches to see if they are including something
> that causes this behavior?

Quite apart from that, we have definitely had issues where pure timing
makes a difference - the kernel does the same exact thing, but just
switches the order of some driver initialization, so that when /sbin/init
starts, some discovery is still on-going.

It's _rare_, but it's one kind of bug that the kernel really can't do a
lot about. For example, for the longest time we held off from the
scheduler running child programs before returning to the parent after a
fork(), simply because that triggered a real race condition in "bash".

Eventually, we could say "screw it, it's a user-level timing bug", but the
point being that sometimes timing changes, and while we _can_ try to keep
even timing-related behaviour like that similar, sometimes it just isn't
possible.

It's quite possible that nothing has really "changed", and that some part
of the kernel just finishes more quickly (or slowly), triggering this
problem.

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 20:16:46 UTC
Message-ID: <fa.YW/YMLQtdcb7jv249NFQ6Iaeud4@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0602221205040.30245@g5.osdl.org>

On Wed, 22 Feb 2006, Andrew Morton wrote:
>
> Yes, I tend to think that insmod should just block until all devices are
> ready to be used.  insmod doesn't just "insert a module".  It runs that
> module's init function.

It really is very hard to accept the "blocking" behaviour.

Some things can take a _loong_ time to be ready, including even requiring
user intervention. And even when scanning takes "only" a few seconds, if
there are multiple modules, you really want to scan things in parallel.

Not finding a disk is often a matter of timing out - not all buses even
have any real "enumeration" capability, and enumeration literally ends up
being "try these addresses, and if nothing answers in 500 msec, assume
it's empty".

Now, 500 msec may not sound very bad, but it all adds up. I get unhappy if
my bootup is a minute. I'd prefer booting up in a couple of seconds.

Also, how ready do you want things to be? Do you want to know the device
is there ("disk at physical location X exists"), or do you want to have
read the UUID off the disk and partitioned it? The latter is what is
needed for a mount, but it's often a _lot_ more expensive than just
knowing the disk is there, and it's not even necessarily needed in many
circumstances.

For example, say that you have more than just a couple of disks attached
to the system, but many of them are for non-critical stuff. You do not
necessarily want to wait for them all to spin up at all. You usually only
care about one of them - the root device.

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 15:46:39 UTC
Message-ID: <fa.p/ppH6LPGZ6i05xxyEGaDUTeSGs@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0602220737170.30245@g5.osdl.org>

On Wed, 22 Feb 2006, Kay Sievers wrote:
>
> Well, that's part of the contract by using an experimental version of HAL,
> it has nothing to do with the kernel

NO NO NO!

Dammit, if this is the logic and mode of operation of HAL people, then we
must stop accepting patches to the kernel from HAL people.

THIS IS NOT DEBATABLE.

If you cannot maintain a stable kernel interface, then you damn well
should not send your patches in for inclusion in the standard kernel. Keep
your own "HAL-unstable" kernel and ask people to test it there.

It really is that easy. Once a system call or other kernel interface goes
into the standard kernel, it stays that way. It doesn't get switched
around to break user space.

Bugs happen, and sometimes we break user space by mistake. Sometimes it
really really is inevitable. But we NEVER EVER say what you say: "it's
your own fault". It's _our_ fault, and it's _our_ problem to work out.

Guys: you now have two choices: fix it by sending me a patch and an
explanation of what went wrong, or see the patch that broke things be
reverted. And STOP THIS DAMN APOLOGIA.

I'm fed up with hearing how "breaking user space is ok because it's HAL or
hotplug". IT IS NOT OK. Get your damn act together, and stop blaming other
people.

If the interfaces were bad, we keep them around. Look in fs/stat.c some
day. Realize that some of those interfaces are from 1991. They were bad,
but that doesn't change _anything_. People used them, and we had
implemented them.

			Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 17:12:43 UTC
Message-ID: <fa.dFhcP/u9xIa5a062m7pxfZlFFcE@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0602220848280.30245@g5.osdl.org>

On Wed, 22 Feb 2006, David Zeuthen wrote:
>
> Oh, you know, I don't think that's exactly how it works; HAL is pretty
> much at the mercy of what changes goes into the kernel. And, trust me,
> the changes we need to cope with from your so-called stable API are not
> so nice.

Why do you "cope"?

Start complaining. If kernel changes screw up something, COMPLAIN. Loudly.
They shouldn't.

> It also makes me release note that newer HAL releases require newer
> kernel and udev releases and that's alright.

It's _somewhat_ ok to have a well-defined one-way dependency. It's sad,
but inevitable sometimes.

For example, the kernel does have a dependency on the compiler used to
compile it. We try to avoid it as far as possible, but we've slowly been
updating it, first from 1.40 to 2.75 to 2.9x and now to 3.1. But the
kernel obviously shouldn't have any other run-time dependencies, because
everything else is "on top of" the kernel.

What is NOT ok is to have a two-way dependency. If user-space HAL code
depends on a new kernel, that's ok, although I suspect users would hope
that it wouldn't be "kernel of the week", but more a "kernel of the last
few months" thing.

But if you have a TWO-WAY dependency, you're screwed. That means that you
have to upgrade in lock-step, and that just IS NOT ACCEPTABLE. It's
horrible for the user, but even more importantly, it's horrible for
developers, because it means that you can't say "a bug happened" and do
things like try to narrow it down with bisection or similar.

> For just one example of API breaking see
>
>  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175998

So the kernel obviously shouldn't be just randomly changing the type
numbers around.

The _real_ bug seems to be that some people think it is OK to do this kind
of user-visible changes, without even considering the downstream, or
indeed, without even telling anybody else (like Andrew or me) about it.

> This breaks stuff for end users in a stable distribution. Not good.

Indeed. Not good at all.

And yes, some of it may be just HAL being a fragile mess, and some of it
may end up being just user-level code that must be made to be more robust
("I see a new type I don't understand" "Ok, assume a lowest common
denominator, and stop whining about it").

But a lot of it is definitely some kernel people being _waayy_ too
cavalier about userspace-visible changes.

> I think maintaining a stable syscall interface makes sense. Didn't you
> once say that only the syscall interface was supposed to be stable? Or
> was that Greg KH? I can't remember...

It's _not_ just system calls. It's any user-visible stuff. That very much
includes /proc, /sys, and any "kernel pipes" aka netlink etc bytestreams.

What is not stable is the _internal_ data structures. We break external
modules, and we sometimes break even in-kernel drivers etc with abandon,
if that is what it takes to fix something or make it prettier.

So fcntl and ioctl numbers etc are _inviolate_, because they are part of
the system interface. As is /proc and /sys. We don't change them just
because it's "convenient" to change them in the kernel.

If /sys needs an extended type to describe the command set of a device, we
do NOT just change an existing attribute in /sys.

> And I also think that breaking things like sysfs can be alright as long
> as you coordinate it with major users of it, e.g. udev and HAL.

The major users are USERS. Not developers. It doesn't help to "coordinate"
things, when what gets screwed is the end-user who no longer can upgrade
his kernel without worrying that something might break.

THIS IS WHY WE MUST MAKE THE KERNEL INTERFACES STABLE!

If users cannot upgrade their kernels safely, we will have two totally
unacceptable end results:

 - users won't upgrade. They don't dare to, because it's too painful, and
   they don't understand HAL or hotplug, or whatever.

   If a developer cannot see that this is unacceptable, then that
   developer is a nincompoop and needs to be educated.

 - users upgrade, and generate bug reports and waste other developers time
   because those other developers didn't realize that the HAL cabal had
   decided that that breakage was "ok".

   Or worse, they don't generate the bug reports, and then six months from
   now, when they test again, and it's still broken, they generate a
   really bad one ("it doesn't work") when everybody - including the HAL
   cabal - has forgotten what it was all about.

   Again, if a developer cannot see that this is unacceptable, then that
   developer is not playing along, and needs to have his mental compass
   re-oriented.

The fact is, regressions are about 10x more costly than fixing old bugs.
They cause problems downstream that just waste everybodys time. It's a
_hell_ of a lot more efficient to spend extra time to keep old interfaces
stable than it is to cause regressions.

> One day perhaps sysfs will be "just right" and you can mark it as being
> stable. I just don't think we're there yet. And I see no reason
> whatsoever to paint things as black and white as you do.

Nothing will _ever_ be "just right", and this has been going on too long.
We had better get our act together.

		Linus


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 17:35:52 UTC
Message-ID: <fa.qnl7xA6Oi2kPSyh4HNEqglsgcAs@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0602220915500.30245@g5.osdl.org>

On Wed, 22 Feb 2006, Linus Torvalds wrote:
>
> > For just one example of API breaking see
> >
> >  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175998
>
> And yes, some of it may be just HAL being a fragile mess, and some of it
> may end up being just user-level code that must be made to be more robust
> ("I see a new type I don't understand" "Ok, assume a lowest common
> denominator, and stop whining about it").

Btw, having looked at that bug report some more, I have to say that this
particular one seems to be of the "HAL is just being an ass about things"
variety.

Why the hell anybody would care about what the command transport type is,
when all that matters is that it's a block device, I don't understand. The
exact details of what kind of block device it is are totally secondary,
and shouldn't affect basic desktop behaviour.

The patch (to HAL) that the bugzilla entry points to doesn't seem to make
anything better either. It just adds _another_ magic case-statement.
Instead, it should just default to doing something sane.

			Linus


From: Al Viro <viro@ftp.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 18:04:49 UTC
Message-ID: <fa.4Z4aJbQxcuP3QGrsvnfCpjFouUY@ifi.uio.no>
Original-Message-ID: <20060222180423.GD27946@ftp.linux.org.uk>

On Wed, Feb 22, 2006 at 09:31:59AM -0800, Linus Torvalds wrote:
> Why the hell anybody would care about what the command transport type is,
> when all that matters is that it's a block device, I don't understand. The
> exact details of what kind of block device it is are totally secondary,
> and shouldn't affect basic desktop behaviour.

Actually, it's not about transport, it's about command _set_.  So there
is legitimate userland code that would want to know that (especially since
a lot of external enclosures have incredibly brittle and crappy firmware
and go tits-up when they see anything they don't recognize), but
	a) the last thing that code wants is to have TYPE_RBC mislabeled
as TYPE_DISK and
	b) hal has nothing to do with that.

The only place where _transport_ enters the picture is that RBC is very common
in e.g. firewire-to-IDE bridges, so sbp2 had to deal with it somehow.  And
instead of teaching sd.c to deal with those (it's very easy) it went ahead
and just marked those as type 0 (disk).  Almost worked...


From: Al Viro <viro@ftp.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 17:52:09 UTC
Message-ID: <fa.l/SNzaL+3EJcRCrAlSHiZhphE8o@ifi.uio.no>
Original-Message-ID: <20060222175131.GC27946@ftp.linux.org.uk>

On Wed, Feb 22, 2006 at 09:08:47AM -0800, Linus Torvalds wrote:
> >  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175998
>
> So the kernel obviously shouldn't be just randomly changing the type
> numbers around.

> The _real_ bug seems to be that some people think it is OK to do this kind
> of user-visible changes, without even considering the downstream, or
> indeed, without even telling anybody else (like Andrew or me) about it.

That's not quite true...

Some background: sbp2 took SCSI devices of type 14 (very reduced and slightly
incompatible version of "SCSI disk", fairly common for external disks) and
forcibly marked them as type 0.  Since sd.c had no way to tell whether it's
dealing with normal SCSI disk or with RBC one, it was unable to tell how
to find out whether the cache on that disk is write-through or write-behind
(that being one of the incompatibilities).

That leads to actual data corruption on reboot, BTW - some of these guys
simply lose the contents of cache at that.

Obvious fix?  Make sd.c deal with RBC (note that it's a valid SCSI type -
you bloody well can have it for a device attached to any SCSI bus, not
just firewire) and leave the sdev->type intact, so that sd.c could know
what's going on.  Right?

As it turns out, sdev->type is not just exposed to userland via sysfs
(that has legitimate uses), it's exposed to userland that happens to be
braindead.  There are two questions:
	* what commands does that device accept?
	* is there an sd<...> block device for it?
Both are valid for userland.  E.g. stuff like scsiinfo, etc., is issuing
SCSI commands via SG_IO.  And yes, knowing the device type is very, very
useful there.  For that we actually would want accurate type, for the same
reasons why we want it in sd.c.  The second question ("is there an sd.c
block device for this guy?") also is valid and has a sane answer in sysfs.

What got broken?  Code that used to assume that sd.c will never, ever handle
openly RBC disks.  As long as that remained true, userland could assume that
"sd fodder" and "has type 0" were the same.  Which was never guaranteed.


From: Al Viro <viro@ftp.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Wed, 22 Feb 2006 18:10:46 UTC
Message-ID: <fa.QrqGFDi+yfz3tRQdo2x34I+4AUA@ifi.uio.no>
Original-Message-ID: <20060222181010.GE27946@ftp.linux.org.uk>

On Wed, Feb 22, 2006 at 05:55:06PM +0000, Christoph Hellwig wrote:
> On Wed, Feb 22, 2006 at 05:51:31PM +0000, Al Viro wrote:
> > What got broken?  Code that used to assume that sd.c will never, ever handle
> > openly RBC disks.  As long as that remained true, userland could assume that
> > "sd fodder" and "has type 0" were the same.  Which was never guaranteed.
>
> sd also has been handling TYPE_MOD forever, which HAL still doesn't deal
> with.  Not that it should care at all about the scsi command type as
> you mentioned..

Oh, right - magneto-optical is also there.  I suspect that HAL doesn't
really care, along the lines of "they are all removable anyway"...


From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.16-rc4: known regressions
Date: Thu, 23 Feb 2006 04:20:20 UTC
Message-ID: <fa.z+/BlOd8vMQ2u3DPBHX4aPfHDqU@ifi.uio.no>
Original-Message-ID: <20060223041707.GA9645@thunk.org>

On Wed, Feb 22, 2006 at 09:14:59AM -0800, Martin Bligh wrote:
> >But I realize these changes are important because it's progress and back
> >in 2.6.0 things were horribly broken for at least desktop workloads [1].
> >It also makes me release note that newer HAL releases require newer
> >kernel and udev releases and that's alright. In fact it's perfectly
> >fine. We get users to upgrade to the latest and greatest and we keep
> >making good progress. That's open source at it's finest I think.
>
> If it's all that fragile, surely it just means that someone picked the
> wrong point at which to try to form the API abstraction?
>
> Frankly, that seems to be the issue behind a lot of these problems -
> people decide to shove stuff into userspace for some religions reason,
> without thinking about the API implications at all.

Martin has hit the nail on the head.

There is currently a religion going on in some circles (we see it in
the uswsusp vs suspend2 debate) which states that moving functionality
to userspace is always better because it makes the kernel "simpler".
Well, maybe.  To the extent that we move policy to userspace, that is
(usually) goodness, but we have to weigh the resulting _interface_
complexity.  When you take a piece of work and split it up between the
kernel and userspace, by definition there will have to be some kind of
interface between the kernel and the userspace code.

Some people assume the only thing that makes up the interface is
syscalls, ioctl's, and fnctls, but that's not true; /proc and /sys are
interfaces too.  And as Linus has stated, once we introduce an
interface, that's it; we have to maintain it forever.  No gratuitous
changes.  If that is too hard, because we can imagine potential
changes that might require us to change the interface, or painful
backwards compatibility kludges to maintain the old interface for at
least 12 months --- then maybe it was a bad idea to move certain
pieces of functionality into userspace in the first place.

> We don't have a sane way to package all the userspace crud together
> with the microkernel that people are turning Linux into. Either people
> quit pretending that divesting things to userspace is a solution to all
> hard problems, or we create a packaging / bundling mechanism for all
> this shite. Frankly, I prefer the former, but whichever ... it's
> getting insane.

Precisely.  These days, using initrd is an exercise in pain, and
wasted time; anything goes wrong, and there is no recovery whatsoever,
except a reboot back to a working setup, and if you have multiple SCSI
drivers, a 5-10 minute wait for the boot to cycle.  So I don't use it.
And if there is functionality that requires initrd's, as a general
rule I don't use it, and I don't test it.  And if it's just me being
stupid, then everyone can ignore me.  But if it's many people then
maybe folks should start considering that we either need to make
initrd more robust, and probably start bundling initrd setup's into
the kernel, or we should start reconsidering the whole plan of moving
more and more into initrd in the first place.

						- Ted


From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: [RFC] Add kernel<->userspace ABI stability documentation
Date: Tue, 28 Feb 2006 06:34:07 UTC
Message-ID: <fa.TUVC7lToZHF8uUDV//XgfwkJUAw@ifi.uio.no>
Original-Message-ID: <20060228063207.GA12502@thunk.org>

On Mon, Feb 27, 2006 at 03:45:25PM -0800, Greg KH wrote:
> > So I just don't see any upsides to documenting anything private or
> > unstable. I see only downsides: it's an excuse to hide behind for
> > developers.
>
> So should we just not even document anything we consider "unstable"?
> The first trys at things are usually really wrong, and that only can be
> detected after we've tried it out for a while and have a few serious
> users.  Should we brand anything new as "testing" if the developer feels
> it is ready to go?

How about "we don't let anything into mainline that we consider
'unstable' from an interface point of view"?

There seems to be a fetish going on today that everything possible
should be mindlessly pushed out into userspace regardless of whether
or not it makes sense.  What folks don't seem to understand is that
there is a tradeoff between implementational complexity (in terms of
lines of code in /usr/src/linux) and interface complexity (see Rusty's
talk about designing good interfaces and how hard that can be).

If we're not sure we can get the interface right, then maybe it's a
sign it needs to stay in -mm longer, or maybe we were trying to push
the wrong thing out into userspace.  If the interface isn't easy to
understand, and we aren't confident that we can promise to never
change it once we put it out there (although of course we can always
add additional interfaces as we add new features), then maybe the
mistake was in trying to create the interface in the first place.

Don't get me wrong; I'm a big fan of pushing policy out of the kernel;
but only if the interface that we use to expose the kernel
functionality has been very carefully designed.


Another alternative, as a few people including myself have noted, is to
shipping that part of the userspace with the kernel sources, so that
it is part of the kernel sources from a release management point of
view, even if it lives in userspace.

							- Ted


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [git pull] drm request 3
Date: Fri, 05 Mar 2010 17:25:32 UTC
Message-ID: <fa.PzP25tU9r8WjNEECEUpqoxoLpq0@ifi.uio.no>

On Fri, 5 Mar 2010, Daniel Stone wrote:
>
> So you're saying that there's no way to develop any reasonable body of
> code for the Linux kernel without committing to keeping your ABI
> absolutely rock-solid stable for eternity, no exceptions, ever?

I think that's what David ended up saying, but I think he is being _too_
strict.

It's not how we've done other things either. We've changed ABI's over time
many times. And we've even had complete breakage of old tools (although
that is very much reserved for system tools, not regular applications: I
think we've been almost religious about "normal" apps not breaking,
unless there was some really overriding reason like security that forced
us to really remove some interface).

But most of the changes have been extending things, leaving the old
interfaces around (often as wrappers around the new internal world order,
sometimes by effectively having actual duplicated code).

And then the old interface is maintained for quite a while (sometimes
decades, often years, and generally at least for several kernel versions
so that people have time to upgrade, and a distro can generally pick a
newer kernel without having to change anything else).

Sometimes we've done things that really end up requiring new tools. It's
pretty rare, but it does happen. It happens a lot more for "esoteric"
things that aren't every-day-in-your-face (I've seen at least _one_ mutter
about "sysfs" in this thread ;) and might break something like a
temperature sensor, for example.

So the machine might _work_ and you could go for days without even
noticing, but you might have some very specific functionality missing.
Maybe your power meter doesn't work, or maybe you need to upgrade your
kernel profiler to get good profiles again. Things like that.

I suspect you as an X person know this very well, in fact. X itself has
carried along a _lot_ of cruft exactly like this, that you guys have been
removing only in the last few years - sometimes after decades of it being
there. The whole switch to modern font handling is an obvious example of a
_major_ fundamental feature change like that.

So in general, what the kernel strives for is that very kind of "the old
model will still work - but it might be slow and emulated on top of a new
way of dong things, and not get a lot of attention any more".

And sometimes, there's really no good way of maintaining two interfaces at
the kernel level, and then you have the downstream tools that have to
learn to pick either the old or the new one, so that the tool still works
regardless.

And again, the old code _eventually_ bitrots or gets cleaned up, but what
you really really want to avoid is to have a flag-day when you switch from
one to the other, and you can't switch between adjacent versions of the
kernel.

In the 2.4.x/2.6.x split, for example, we did have system tools that
needed to be upgraded if you came from a 2.4.x environment. You can still
see signs of that in the kernel tree: we have that whole
Documentation/Changes file that _still_ remains in our tree, even though
it's purely historical.

But if you look at that Documentation/Changes file, I don't think there is
_any_ flag-day issue except for the removal of "devfs". People _still_
talk about devfs in hushed tones. Everything else is about having to
upgrade system tools _before_ upgrading the kernel (iow, they still worked
on 2.4.x, but you needed recent enough versions of them to compile and run
a 2.6.x kernel).

In other words, it wasn't a "flag day" (apart from the already mentioned
devfs users, and possibly something else I can't think of). It was an
upgrade, yes, and it required some other things to be recent, but you
could go back-and-forth between kernels if you had to.

(Of course, it's now many years since that, so maybe my rose-colored
glasses makes me forget the pain involved. And I obviously personally
never made the whole 2.4.x -> 2.6.x jump, since I'd been running the
development kernels in between. So maybe I forgot some painful part).

			Linus

Index Home About Blog