Device probing (Linus Torvalds)

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please revert 5adc55da4a7758021bcc374904b0f8b076508a11
Date: Wed, 09 May 2007 17:22:42 UTC
Message-ID: <fa.YbV4IR7GqzFho8dE1CbNkj2MTTQ@ifi.uio.no>

On Wed, 9 May 2007, Greg KH wrote:

> On Tue, May 08, 2007 at 01:58:10PM -0700, David Miller wrote:
> >
> > 1) A proper dependency system is necessary
> >
> > 2) Proper mutual exclusion for shared system resources/registers/etc.
> >    that are poked at in an ad-hoc unlocked manner currently
> >
> > Is that basically what it boils down to?
>
> Yes, that's about it.
>
> Number 1 seemed to cause the most crashes, I don't think number 2 ever
> caused any problems, but it might have, there were too many weird oopses
> to be able to rule that out.

One issue is that a lot of shared resources and their locking really
aren't known until *after* you've done a first-level probing.

The classic example of this really is a cardbus controller, and almost any
multi-function PCI device. Yes, they are "independent" PCI devices in
their own right, but they almost invariably have some shared state.

A bus driver that probes them concurrently is simply broken.

And no, the solution is not to special-case multi-function devices and
always probe the subfunctions serially. That would suck for many things
(disk controllers are *also* often subfunctions). The solution really *is*
to probe the devices serially, and let the layer that actually knows what
it is doing (the low-level driver) decide how it goes from there.

I can almost guarantee that the same is true of most other buses. For
example, I wouldn't be surprised at all to hear that you shouldn't probe
the individual LUN's of many "SCSI" devices concurrently. The number of
bugs in things like multifunction card readers (total lockup if you read
the wrong config pages or even try to read past the end of the flash etc)
is just scary.

(Server people seem to think that "SCSI" == "high-end expensive hardware
that we paid too much for", and yes, that's sometimes true, but "SCSI"
also equals "el-cheapo stuff that sells for $5 and talks something that
looks enough like SCSI commands that we want to consider it SCSI").

So I'm really convinced that the bus layer should be serial and then have
some capability to allow lower levels to do the things *they* know are fine
to do independently in a parallel way. But anything that makes that be a
bus-level choice is almost guaranteed to be broken on just about all
buses!

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please revert 5adc55da4a7758021bcc374904b0f8b076508a11
Date: Tue, 08 May 2007 15:47:22 UTC
Message-ID: <fa.8Zh5k6xqVa8bryAam5QIxQ79pqM@ifi.uio.no>

On Tue, 8 May 2007, Adrian Bunk wrote:
>
> Let's revert it, and I'll then send a new patch containing only the
> PCI_MULTITHREAD_PROBE removal.

I really don't want to revert that removal. If somebody wants to resurrect
a part of the patch that has nothing to do with PCI, in order to do it for
some other bus, just send that as a patch (not as a revert). But no, I'm
not going to revert that patch.

And no, we should not do it at the device core level. In fact, I don't
think we should do it at that level at all.

I'm pretty sure that the performance problems are at individual device
drivers, and that the right solution is to thread at *that* level. Not
higher up.

Threading at the bus level just inevitably means things like random
numbers for devices depending on some timing/scheduling issue. That's
nasty.

Threading at a driver level still does that (ie individual disks may be
attached in some order that depends on how fast they are to respond), but
in a much more controlled fashion, and only for drivers that explicitly
say that they can do it.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please revert 5adc55da4a7758021bcc374904b0f8b076508a11
Date: Tue, 08 May 2007 18:30:40 UTC
Message-ID: <fa.F/RQEWHWDLwmseesn6jF10MP2IA@ifi.uio.no>

On Tue, 8 May 2007, Cornelia Huck wrote:
>
> > Threading at a driver level still does that (ie individual disks may be
> > attached in some order that depends on how fast they are to respond), but
> > in a much more controlled fashion, and only for drivers that explicitly
> > say that they can do it.
>
> How is that better? You still must rely on udev for persistent device
> names.

It's better because then the *driver* can decide whether it supports
threaded probing or not.

The threaded PCI bus probing was totally and utterly broken exactly
because many drivers were simply not willing or able to handle it.

Also, quite frankly, when this came up I already posted a much better
approach for allowing arbitrary threading. It was ignored, but maybe you
want to take a look.

See

	http://lkml.org/lkml/2006/10/27/185

for the last time this came up. Any driver can decide to do

	execute_in_parallel(myprobe, myarguments);

and that really *simple* thing allows synchronization points
("wait_for_parallel()") and arbitrary setting of the allowed parallelism.

I really think that is a *much* better approach. It allows, for example,
the driver to do certain things (like name allocations and any
particularly *fast* parts of probing) synchronously, and then do slow
things (like spinning up disks and actually reading the partition tables)
asynchronously.
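
A minimal userspace sketch of that pattern follows. It is not the kernel
patch from the linked post: the signatures, the pthread-based
implementation, and the "limit" argument to wait_for_parallel() are all
assumptions made for illustration, as are the "struct disk" type and the
probe_disks() and slow_spin_up() helpers. The fast part (assigning an id,
standing in for name allocation) runs in order in the caller, the slow part
runs in a thread, and wait_for_parallel() serves both as a cap on the
parallelism and as the final synchronization point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace analog of the idea; names reuse the email's. */
static pthread_mutex_t par_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t par_cond = PTHREAD_COND_INITIALIZER;
static int par_outstanding;            /* calls started but not yet finished */

struct par_call {
	void (*fn)(void *);
	void *arg;
};

static void *par_trampoline(void *p)
{
	struct par_call call = *(struct par_call *)p;

	free(p);
	call.fn(call.arg);

	/* Signal completion so that waiters can re-check the count. */
	pthread_mutex_lock(&par_lock);
	par_outstanding--;
	pthread_cond_broadcast(&par_cond);
	pthread_mutex_unlock(&par_lock);
	return NULL;
}

/* Block until at most `limit` parallel calls are still running. */
void wait_for_parallel(int limit)
{
	assert(limit >= 0);
	pthread_mutex_lock(&par_lock);
	while (par_outstanding > limit)
		pthread_cond_wait(&par_cond, &par_lock);
	pthread_mutex_unlock(&par_lock);
}

/* Run fn(arg) in a detached thread; degrade to a direct call on failure. */
void execute_in_parallel(void (*fn)(void *), void *arg)
{
	struct par_call *call = malloc(sizeof(*call));
	pthread_t tid;

	if (call) {
		call->fn = fn;
		call->arg = arg;
		pthread_mutex_lock(&par_lock);
		par_outstanding++;
		pthread_mutex_unlock(&par_lock);
		if (pthread_create(&tid, NULL, par_trampoline, call) == 0) {
			pthread_detach(tid);
			return;
		}
		/* Thread creation failed: undo the accounting. */
		pthread_mutex_lock(&par_lock);
		par_outstanding--;
		pthread_mutex_unlock(&par_lock);
		free(call);
	}
	fn(arg);
}

/* Illustrative driver: fast part in order, slow part in parallel. */
struct disk {
	int id;
	int spun_up;
};

static void slow_spin_up(void *p)
{
	struct disk *d = p;

	d->spun_up = 1;                /* stands in for the slow work */
}

int probe_disks(struct disk *disks, int n)
{
	int ready = 0;

	for (int i = 0; i < n; i++) {
		disks[i].id = i;       /* synchronous, deterministic order */
		disks[i].spun_up = 0;
		execute_in_parallel(slow_spin_up, &disks[i]);
		wait_for_parallel(4);  /* allow at most 4 in flight */
	}
	wait_for_parallel(0);          /* synchronization point */
	for (int i = 0; i < n; i++)
		ready += disks[i].spun_up;
	return ready;
}
```

Because the ids are handed out before any thread starts, the naming stays
stable no matter which disk finishes spinning up first.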

Anyway, I will refuse to merge anything that does generic bus-level
(unless the bus is something off-the-board like SCSI) parallelism, unless
you can really show me how superior and stable it is. I seriously doubt
you can. We tried it once, and it was a total disaster.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please revert 5adc55da4a7758021bcc374904b0f8b076508a11
Date: Tue, 08 May 2007 19:32:29 UTC
Message-ID: <fa.fLhY76DlJ1mLC3U6QMiNEWTAJS4@ifi.uio.no>

On Tue, 8 May 2007, Cornelia Huck wrote:
>
> Is there any reason why a driver shouldn't be able to handle probing
> several devices at once?

Internal driver reasons?

Look, Cornelia, there's about a million drivers in the kernel. They are
not all safe. End of discussion.

If you are willing to go through them all, fix everything, test it, talk
with all maintainers, all the more power to you.

As it is, that's not going to happen. So you should *assume* that drivers
are not expecting to be called multiple times in parallel. And if they
are, you may then assume that people don't want them to be, because they
want stable naming.

> What if two devices become hotplugged at the same time? Does this imply
> that the bus _always_ needs to do some serializing there?

You want to test it? Be my guest.

But the final nail in the coffin is that doing it at the bus level is
INFERIOR. As you yourself admitted, it's better to do some things
synchronously.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please revert 5adc55da4a7758021bcc374904b0f8b076508a11
Date: Tue, 08 May 2007 20:02:58 UTC
Message-ID: <fa.1rc0xeithPfG9Bh03EA8ZciD/Rc@ifi.uio.no>

On Tue, 8 May 2007, Linus Torvalds wrote:
>
> But the final nail in the coffin is that doing it at the bus level is
> INFERIOR. As you yourself admitted, it's better to do some things
> synchronously.

Side note: there may well be clever combinations of "bus side" support
*together* with per-device rules.

For example, right now we probe devices by calling their "probe()"
routines synchronously. Changing that to be asynchronous simply isn't an
option, because we've seen drivers that get unhappy (and the hotplug
argument isn't an argument: *most* drivers aren't even hotplug-capable
anyway).

BUT.

Instead of changing existing probe functionality to be asynchronous, we
could *add* a new and asynchronous part to it. For example, we could make
the rule for PCI - or other bus - devices be:

 - the bus will *first* call the "probe()" function synchronously.

 - after that one has completed, we will call "probe_async()"
   asynchronously at some point (ie it might be scheduled immediately
   after the "probe()" call, but delayed by some arbitrary issues like
   just already having too many asynchronous probes on-going or similar)

(A variation of the above might be that *everybody's* synchronous probe
function will be called first, and then the asynchronous probe functions
will be called only when they are all done. That might help with drivers
that have dependencies between different PCI functions - Cardbus comes to
mind, where the different slots look like independent PCI devices, but
slot zero is literally the master and controls some of the functions on
slot 1 too - similar issues may well happen in other multi-function
devices, and it might simplify things if you knew that the serial probe
had completed fully before the asynchronous parallel part even starts).

So an unmodified driver would basically work exactly like it does now, but
if a driver is happy with being called asynchronously, it could just
change its

	.probe = mydriver_probe

thing into a

	.probe_async = mydriver_probe

and we can do that on a per-driver basis with that kind of really simple
one-liner change.

In fact, there is nothing wrong with having *both* a synchronous part, and
an async part:

	.probe = mydriver_setup,
	.probe_async = mydriver_spin_up_and_probe_devices,

and it would do basic setup (including, for example, the fast enumeration
of whatever devices are connected) synchronously, but then do anything
more in the async part - and the async part would still be guaranteed that
the setup has been run by the time it is scheduled (but not really have
any other guarantees).
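
The two-phase rule can be sketched in userspace C as below. This is only an
illustration under stated assumptions: "probe_async" is the hook proposed in
this mail, not an existing kernel interface, and "struct device",
"struct driver", bus_probe_all() and MAX_DEVS are simplified stand-ins
invented here. The sketch follows the variation in which every synchronous
probe finishes before any asynchronous part starts, and it joins the threads
before returning purely to keep the example self-contained.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_DEVS 16                    /* arbitrary limit for the sketch */

struct device {
	int id;
	int setup_done;                /* set by the synchronous part */
	int spun_up;                   /* set by the asynchronous part */
};

struct driver {
	int  (*probe)(struct device *dev);        /* synchronous, 0 on success */
	void (*probe_async)(struct device *dev);  /* hypothetical async hook */
};

struct async_job {
	const struct driver *drv;
	struct device *dev;
	pthread_t tid;
	int threaded;                  /* nonzero if a thread was started */
};

static void *async_trampoline(void *p)
{
	struct async_job *job = p;

	job->drv->probe_async(job->dev);
	return NULL;
}

/* Returns the number of devices whose synchronous probe succeeded. */
int bus_probe_all(const struct driver *drv, struct device *devs, size_t n)
{
	struct async_job jobs[MAX_DEVS];
	size_t njobs = 0;
	int bound = 0;

	assert(n <= MAX_DEVS);

	/* Phase 1: serial probing, strictly in bus order. */
	for (size_t i = 0; i < n; i++) {
		if (drv->probe && drv->probe(&devs[i]) != 0)
			continue;      /* probe failed: no async part either */
		bound++;
		if (drv->probe_async) {
			jobs[njobs].drv = drv;
			jobs[njobs].dev = &devs[i];
			njobs++;
		}
	}

	/* Phase 2: async parts, started only after all of phase 1 is done. */
	for (size_t i = 0; i < njobs; i++) {
		jobs[i].threaded = pthread_create(&jobs[i].tid, NULL,
						  async_trampoline,
						  &jobs[i]) == 0;
		if (!jobs[i].threaded)
			drv->probe_async(jobs[i].dev);  /* run it inline */
	}
	for (size_t i = 0; i < njobs; i++)
		if (jobs[i].threaded)
			pthread_join(jobs[i].tid, NULL);

	return bound;
}

/* Example driver with both a synchronous and an asynchronous part. */
static int mydriver_setup(struct device *dev)
{
	dev->setup_done = 1;
	return 0;
}

static void mydriver_spin_up_and_probe_devices(struct device *dev)
{
	assert(dev->setup_done);       /* the one guarantee the async part has */
	dev->spun_up = 1;
}

const struct driver mydriver = {
	.probe = mydriver_setup,
	.probe_async = mydriver_spin_up_and_probe_devices,
};
```

A driver that leaves .probe_async unset takes only the first loop, so in
this sketch it is probed exactly as it would be without the extra hook.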

Hmm? Would something like this work? I dunno, but it seems a hell of a lot
safer and more capable than the aborted PCI multithreaded probing that was
an "all or nothing" approach.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please revert 5adc55da4a7758021bcc374904b0f8b076508a11
Date: Wed, 09 May 2007 17:26:41 UTC
Message-ID: <fa.w2Z0C1fnVQL1YKN9I1Hvq5o1W34@ifi.uio.no>

On Wed, 9 May 2007, Greg KH wrote:
> >
> >  but then won't the devices get registered in a random order? (i.e. whenever
> >  the async portion finishes the probing and finds the details of what there
> >  is to register)
>
> So?  We have busses today that have devices get registered in random
> order, our userspace tools can handle it just fine now.

However, that does *not* translate to: "..so we can do it for all buses".

People *do* depend on simple things like internal harddisks showing up in
a particular order. The fact that you *can* handle it with UUID's etc does
not mean that people do, or that they should be forced to do so.

There is no inherent goodness in making the bootup more random. In fact,
there's a lot of badness to it, ranging from just not being very nice to
"really a bitch to debug". And no amount of "user level _could_ handle it"
changes that.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please revert 5adc55da4a7758021bcc374904b0f8b076508a11
Date: Wed, 09 May 2007 17:10:15 UTC
Message-ID: <fa.7lwa7eF3dlvH+IvZZMMmXQ70GpI@ifi.uio.no>

On Wed, 9 May 2007, Greg KH wrote:
>
> Why is it dead?  Since when is PCI the only bus in the system?

Quite frankly, any probing strategy that isn't relevant to PCI simply
isn't relevant - full stop!

No, it's not the only bus, but if something isn't relevant to PCI it
shouldn't be in any general bus layer abstraction.  PCI is _that_ dominant
(and all the modern variations are just extensions on PCI, PCI didn't go
away just because it's called PCI-X or whatever).

I think buses like SCSI, USB etc are valid things to worry about as being
very different from PCI, but they are not something that the generic bus
probing code (as exemplified by the commit in question) should know/care
about. Whatever probing semantics that a SCSI adapter ends up using for
probing its bus should be up to the SCSI layer, and there is no point in
thinking that it should be a "generic bus" abstraction.

In fact, different SCSI adapters would likely have different rules. iSCSI
probably won't have anything in common with "normal" SCSI when it comes to
probing, for example.

		Linus
