From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Wed, 30 Apr 2008 23:21:13 UTC
Message-ID: <fa.zrkRmIOIf8Rjl34wZax7LhWsbUw@ifi.uio.no>

On Thu, 1 May 2008, Willy Tarreau wrote:
> >
> > Any suggestions on how to convince people that their code is not worth
> > merging?
>
> I think you're approaching a solution, Linus. If developers take a refusal
> as a punishment, maybe you can use that for trees which have too many
> unresolved regressions.

Heh. It's been done. In fact, it's done all the time on a smaller scale.
It's how I've enforced some cleanliness or process issues ("I won't pull
that because it's too ugly"). I see similar messages floating around about
individual patches.

That said, I don't think it really works that well as "the solution": it
works as a small part of the bigger picture, but no, we can't see
punishment as the primary model for encouraging better behaviour.

First off, and maybe this is not true, but I don't think it is a very
healthy way to handle issues in general. I may come off as an opinionated
bastard in discussions like these, and I am, but when it actually comes to
maintaining code, I really prefer a much softer approach.

I want to _trust_ people, and I really don't want to be a "you need to do
'xyz' or else" kind of guy.

So I'll happily say "I can't merge this, because xyz", where 'xyz' is
something that is related to the particular code actually being merged.
But quite frankly, holding up _unrelated_ fixes because some other issue
hasn't been resolved - I really try not to do that.

So I'll say "I don't want to merge this, because quite frankly, we've had
enough code for this merge window already, it can wait". That tends to
happen at the end of the merge window, but it's not a threat, it's just me
being tired of worrying about the inevitable new issues at the end of the
window.

And I personally feel that this is important to keep people motivated.
Being too stick-oriented isn't healthy.

The other reason I don't believe in the "won't merge until you do 'xyz'"
kind of thing as a main development model is that it traditionally hasn't
worked.  People simply disagree, the vendors will take the code that their
customers need, the users will get the tree that works for them, and
saying "I won't merge it" won't help anybody if it's actually useful.

Finally, the people I work with may not be perfect, but most maintainers
are pretty much experts within their own area. At some point you have to
ask yourself: "Could I do better? Would I have the time? Could I find
somebody else to do better?" and not just in a theoretical way. And if the
answer is "no", then at that point, what else can you do?

Yes, we have personalities that clash, and merge problems. And let's face
it, as kernel developers, we aren't exactly a very "cuddly" group of
people. People are opinionated and not afraid to speak their mind. But on
the whole, I think the kernel development community is actually driven a
lot more by _positive_ things than by the stick of "I won't get merged
unless I shape up".

So quite frankly, I'd personally much rather have a process that
encourages people to have so much _pride_ in what they do that they want
it to be seen as being really good (and hopefully then that pride means
that they won't take crap!) than having a chain of fear that trickles
down.

So this is why, for example, I have so strongly encouraged git maintainers
to think of their public trees as "releases". Because I think people act
differently when they *think* of their code as a release than when they
think of it as a random development tree.

I do _not_ want to slow down development by setting some kind of "quality
bar" - but I do believe that we should keep our quality high, not because
of any hoops we need to jump through, but because we take pride in the
thing we do.

[ An example of this: I don't believe code review tends to help much in
  itself, but I *do* believe that the process of doing code review makes
  people more aware of the fact that others are looking at the code they
  produce, and that in turn often makes the code better to start with.

  And I think publicly announced git trees and -mm and linux-next are
  great partly because they end up doing that same thing. I heartily
  encourage submaintainers to always Cc: linux-kernel when they send me a
  "please pull" request - I don't know if anybody else ever really pulls
  that tree, but I do think that it's very healthy to write that message
  and think of it as a publication event. ]

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Thu, 01 May 2008 15:27:26 UTC
Message-ID: <fa.KB2UzyjGZNuKH8HEWux9gHfGSpc@ifi.uio.no>

On Thu, 1 May 2008, Rafael J. Wysocki wrote:
>
> Okay, so what exactly are we going to do to address the issue that I described
> in the part of my last message that you skipped?

Umm. I don't really see anything to say. You said:

> Still, the issue at hand is that
> (1) The code merged during a merge window is somewhat opaque from the tester's
>     point of view and if a regression is found, the only practical means to
>     figure out what caused it is to carry out a bisection (which generally is
>     unpleasant, to put it lightly).
> (2) Many regressions are introduced during merge windows (relative to the
>     total amount of code merged they are few, but the raw numbers are
>     significant) and because of (1) the process of removing them is generally
>     painful for the affected people.
> (3) The suspicion is that the number of regressions introduced during merge
>     windows has something to do with the quality of code being below
>     expectations, which in turn may be related to the fact that it's being
>     developed very rapidly.

And quite frankly, (2) and (3) are both: "merge windows introduce new
bugs", and that's such an uninteresting tautology that I'm left
wordless.  And (1) is just a result of merging lots of stuff.

Of course the new bugs / regressions are introduced during the merge
window.  That's when we merge new code.  New bugs don't generally happen
when you don't get new code.

And of course finding bugs is always painful to everybody involved.

And of course the bugs indicate something about the quality of code
being merged.  Perfect code wouldn't have bugs.

So what you are stating isn't interesting, and isn't even worthy of
discussion.  The way you state it, the only answer is: don't take new
code, then.  That's what your whole argument always seems to boil down
to, and excuse me for (yet again) finding that argument totally
pointless.

So let me repeat:

 (1) we have new code. We always *will* have new code, hopefully. A few
     million lines per year.

     If you don't accept this, I don't have anything to say.

 (2) we need a merge window.  That is a direct result not of wanting to
     have lots of code at the same time, but of the _reverse_ issue: we
     want to have times of relative calm.

     And again, if you continue to see the merge window as the
     "problem", rather than as the INEVITABLE result of wanting to have
     a calm period, there's no point in talking to you.

 (3) Ergo, there's a very fundamental and basic and inescapable result:
     we absolutely _will_ have times when we get lots and lots of new
     code.

So these are not "problems".  They are *facts*.  Stating them as
problems is stupid and pointless.  I'm not going to discuss this with
you if you cannot get over this.

So please accept the facts.

Once you accept the facts, you can state the things you can change.  But
the things you cannot change are the merge window, and the fact that we
get a lot of new code at a high rate (where the merge window will
inevitably compress that rate, so that we have _another_ window where
the rate is lower).

So stop arguing against facts, and start arguing about other things that
can be argued about. That's all I'm saying.

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Thu, 01 May 2008 18:28:27 UTC
Message-ID: <fa.qV91Lf5RUnoDubuVWHGOg4VqoG0@ifi.uio.no>

On Thu, 1 May 2008, Al Viro wrote:
> On Thu, May 01, 2008 at 10:41:21AM -0700, Linus Torvalds wrote:
> >
> > Same goes for "we should all just spend time looking at each other's
> > patches and trying to find bugs in them". That's not a solution, that's a
> > drug-induced dream you're living in.
>
> As one of those obviously drug-addled freaks who _are_ looking for bugs...
> Thank you so fucking much ;-/

That's not what I meant, and I think you know it.

Of course as many people as possible should look at other people's patches
and comment on them. But saying so won't _make_ it so.  And it's also
something that we have done since day #1 _anyway_, so anybody who thinks
that it would improve code quality from where we already are should
explain how he thinks the increase would be caused, and how it would
happen.

So when we're looking at improvement suggestions, they should be real
suggestions that have realistic goals, not just wishes. And they
shouldn't be the things we *already* do, because then they wouldn't
be improvements.

In other words: do people have realistic ideas for how to make others
spend _more_ time looking at patches? And not just _wishing_ people did
that?

			Linus


From: Al Viro <viro@ZenIV.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Thu, 01 May 2008 19:38:06 UTC
Message-ID: <fa.Rzz6bXpDv+/Js3UWOZfVJ1wuFUY@ifi.uio.no>

On Thu, May 01, 2008 at 11:23:43AM -0700, Linus Torvalds wrote:
> On Thu, 1 May 2008, Al Viro wrote:
> > On Thu, May 01, 2008 at 10:41:21AM -0700, Linus Torvalds wrote:
> > >
> > > Same goes for "we should all just spend time looking at each other's
> > > patches and trying to find bugs in them". That's not a solution, that's a
> > > drug-induced dream you're living in.
> >
> > As one of those obviously drug-addled freaks who _are_ looking for bugs...
> > Thank you so fucking much ;-/
>
> That's not what I meant, and I think you know it.

FWIW, the way I'd read that had been "face it, normal folks don't *do*
that and if you hope for more people doing code review - put down your
pipe, it's not even worth talking about".  Which managed to get under
my skin, and that's not something that happens often...

Anyway, I'm glad it had been a misparsing; my apologies for the reaction.

> So when we're looking at improvement suggestions, they should be real
> suggestions that have realistic goals, not just wishes. And they
> shouldn't be the things we *already* do, because then they wouldn't
> be improvements.
>
> In other words: do people have realistic ideas for how to make others
> spend _more_ time looking at patches? And not just _wishing_ people did
> that?

The obvious answer: the number of areas where one _can_ do that depends on
some things that can be changed.  Namely:
	* one needs to understand enough of the area, or know where/how
to get the information needed for that.  I've got some experience with
the latter and I suspect that most of the folks who do active reviews
have their own set of tricks for getting into an unfamiliar area fast.
Moreover, having such a set of tricks is probably _the_ thing that makes
us able to do that kind of work.
	Sharing such tricks (i.e. "here's how one wades through an unfamiliar
area and gets a sense of what's going on there; here's what one looks
out for; here's how to deal with data structures; here are the signs
of problematic lifetime logics; here's how one formulates a hypothesis
about refcounting rules; here's how one verifies such and looks for
possible bugs in that area; etc.") is a Good Idea(tm); a sketch of the
kind of bug in question follows at the end of this message.
	Having the critical areas documented with ease of review in
mind is another thing that would probably help.  And yes, it won't
happen overnight, it won't happen for all areas, and it won't be mandatory
for maintainers, etc.  The previous part (i.e. which questions to ask
about data structures, etc.) would help with that.
	FWIW, I'm trying to do that - right now I'm flipping between
wading through the Cthulhu-damned fs/locks.c and its friends and getting
the notes I've got from the last month's work into edible form (which
includes translation into something that resembles normal English,
among other things - more than half of that is in... well, let's call
it idiom-rich Russian).
	* patches should be visible *when* *they* *can* *be* *changed*.
If it's "Linus had pulled from linux-foo.git and that included a merge
from linux-foobar.git, which is developed on foobar-wank@hell.knows.where",
it's too late.  It's not just that you don't revert; it's that you _can't_
realistically revert in such a situation - not without very massive work.
And I don't know what _can_ be done about that, other than making it
socially discouraged.  To some extent it's OK, but my impression is that
some areas are as bad as the CVS-based "communities" had been, and the
switch to git has simply hidden the obvious signs of trouble...
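
[ A minimal sketch, for illustration only, of the kind of lifetime /
  refcounting bug such a review hunts for.  frob_fd() and do_something()
  are made-up names; fget()/fput() are the real struct file reference
  get/put pair:

	#include <linux/file.h>		/* fget(), fput() */
	#include <linux/errno.h>

	/* Hypothetical helper standing in for whatever the code does. */
	static int do_something(struct file *file) { return 0; }

	static int frob_fd(int fd)
	{
		struct file *file = fget(fd);	/* takes a reference */
		int err;

		if (!file)
			return -EBADF;

		err = do_something(file);
		if (err) {
			fput(file);	/* the drop that error paths tend to forget */
			return err;
		}

		fput(file);		/* every exit path drops what the entry path took */
		return 0;
	}

  The review question is simply: does every exit path drop what the
  entry path took? ]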


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Thu, 01 May 2008 01:41:30 UTC
Message-ID: <fa.URVKFVL6Q1dvqLiHcqJyKbqA/Mo@ifi.uio.no>

On Wed, 30 Apr 2008, Linus Torvalds wrote:
>
> You (and Andrew) have tried to argue that slowing things down results in
> better quality,

Sorry, not Andrew. DavidN.

Andrew argued the other way (quality->slower), which I also happen to not
necessarily believe in, but that's a separate argument.

Nobody should ever argue against raising quality.

The question could be about "at what cost"? (although I think that's not
necessarily a good argument, since I personally suspect that good quality
code comes from _lowering_ costs, not raising them).

But what's really relevant is "how?"

Now, we do know that open-source code tends to be higher quality (along a
number of metrics) than closed source code, and my argument is that it's
not because of bike-shedding (aka code review), but simply because the
code is out there and available and visible.

And as a result of that, my personal belief is that the best way to raise
quality of code is to distribute it. Yes, as patches for discussion, but
even more so as a part of a cohesive whole - as _merged_ patches!

The thing is, the quality of individual patches isn't what matters! What
matters is the quality of the end result. And people are going to be a lot
more involved in looking at, testing, and working with code that is
merged, rather than code that isn't.

So _my_ answer to the "how do we raise quality" is actually the exact
reverse of what you guys seem to be arguing.

IOW, I argue that the high speed of merging very much is a big part of
what gives us quality in the end. It may result in bugs along the way, but
it also results in fixes, and lots of people looking at the result (and
looking at it in *context*, not just as a patch flying around).

And yes, maybe that sounds counter-intuitive. But hey, people thought open
source was counter-intuitive. I spent years explaining why it should work
at all!

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Thu, 01 May 2008 02:01:50 UTC
Message-ID: <fa.rkmXNHRndsl1BaGq1bl8+1WYITI@ifi.uio.no>

On Wed, 30 Apr 2008, David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Wed, 30 Apr 2008 18:40:39 -0700 (PDT)
>
> > IOW, I argue that the high speed of merging very much is a big part of
> > what gives us quality in the end. It may result in bugs along the way, but
> > it also results in fixes, and lots of people looking at the result (and
> > looking at it in *context*, not just as a patch flying around).
>
> This is a huge burden to put on people.
>
> The more broken stuff you merge, the more people are forced to track
> these problems down so that they can get their own work done.

I'm not saying we should merge crap.

You can take any argument too far, and clearly it doesn't mean that we
should just accept *anything*, as if it would magically be gilded by its
mere inclusion into the kernel. No, I'm not going to argue that.

But I do want to argue against the notion that the only way to raise
quality is to do it before it gets merged. It's often better to merge
early, and fix the issues the merge brings up early too!

Release early, release often. That was the watch-word early in Linux
kernel development, and there was a reason for it. And it _worked_. Did it
mean "release crap, release anything"? No. But it did mean that things got
lots more exposure - even if those "things" were sometimes bugs.

			Linus


From: Al Viro <viro@ZenIV.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Thu, 01 May 2008 02:22:30 UTC
Message-ID: <fa.Ci2i0Mh98TAC7sYXk94QTXTJ3es@ifi.uio.no>

On Wed, Apr 30, 2008 at 06:40:39PM -0700, Linus Torvalds wrote:

> Now, we do know that open-source code tends to be higher quality (along a
> number of metrics) than closed source code, and my argument is that it's
> not because of bike-shedding (aka code review), but simply because the
> code is out there and available and visible.

Really?  And how, pray tell, will being out there magically improve the
code?  "With enough eyes all bugs are shallow" stuff out of ESR's arse?

FWIW, after the last month's flamefests I decided to actually do something
about review density of code in the areas I'm theoretically responsible
for.  Namely, do a systematic review of core data structure handling (starting
with the place where most of the codepaths get into VFS - descriptor tables
and struct file), doing both a blow-by-blow writeup on how that sort of thing
is done and documentation of the life cycle/locking rules/assertions made
by the code/etc.  I made one bad mistake that held things back for quite
a while - sending a heads-up about one of the worse bugs found in the process
to never-sufficiently-damned vendor-sec.  The last time I'm doing that, TYVM...

Anyway, I'm going to get the notes on that stuff in order and put them in
the open.  I really hope that other folks will join the fun afterwards.
The goal is to get a coherent braindump that would be sufficient for
people new to the area wanting to understand and review VFS-related code -
both in the tree and in new patches.

files_struct/fdtable handling is mostly dealt with, struct file is only
partially done - unfortunately, struct file_lock has to be dealt with
before that and it's a (predictable) nightmare.  On the other end of
things, fs_struct is not really started, vfsmount review is partially
done, dentry/superblock/inode not even touched.

Even with what little has been covered... well, let's just say that it
caught quite a few fun turds, with a typical age of around 3-4 years.  And
VFS is not the messiest part of the tree...


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Thu, 01 May 2008 00:39:15 UTC
Message-ID: <fa.Bbdf2+Pi+s9nXSt5JcnUXKkWGfA@ifi.uio.no>

On Wed, 30 Apr 2008, david@lang.hm wrote:
>
> look at the mess of the distro kernels in the 2.5 and earlier days. having
> them maintain a large body of patches didn't work for them or for the mainline
> kernel.

Exactly.

I do think Rafael's TCP analogy is somewhat germane, but it misses the
point that the longer the queue gets, the *worse* the quality gets. It
gets worse because the queued-up patches don't actually get tested any
more during their queueing, and because everybody else who isn't
intimately involved with the production of said patches just gets *less*
inclined to look at a big patch-queue than a small one.

So having a long queue and trying to manage it (by some kind of negative
feedback) is counter-productive, because by the time that situation
happens, you're basically screwed already.

That's what we largely had with the Xen merge, for example. A lot of the
code had been around for basically _forever_, and the people involved in
reviewing it got really tired of it, and there was no way in *hell* a new
person would ever start reviewing the huge backlog. Once it is massive,
it's just too massive.

So trying to push back from the destination is really painful. It's also
aggravating for everybody else. When people were complaining about me not
scaling (remember those flame-wars? Now the complaint is basically the
reverse), it was very painful for everybody, and most of all me.

So I really really hope that if we need throttling (and I do want to point
out that I'm not entirely sure we do - I think the issue is not "number of
commits", but "quality of code", and I do _not_ agree that the two are
directly related in any way), it should be source-based.

Trying to make sure that the source throttles, that is - but not by making
developers feel unproductive. And quite frankly, most things that throttle
the source are of the annoying and non-productive kind. The classic source
throttle tends to be to make it very "expensive" to do development, by
introducing various barriers.

The barriers are usually "you need to have <n> other people look at it",
or "you need to pass this five-hour test-suite", and almost invariably,
the big issue is not code quality, but literally slowing things down. And
call me crazy, but I think that a process that is designed not primarily
to get quality, but to slow things down, is likely to generate not just
bad feelings, but actually much worse code too!

And the thing is, I don't even think our main problem is "lots of
changes". I think we've actually been very successful at managing lots of
change. Our problems are elsewhere.

So I think our primary problems are:

 - making mistakes is inevitable and cannot be avoided, but we can still
   add more layers to make them less likely. But these should *not* be aimed
   at being cumbersome to slow things down - they should basically
   pipeline perfectly, so that there is no frustrating ping-pong latency.

   And linux-next falls into this kind of category: it doesn't really slow
   down development, but it would be another "pipeline stage" in the
   process.

   (In contrast, requiring every patch to have <n> reviewed-by's etc would
   add huge latencies and slow things down hugely, and just generally
   become make-believe work once everybody started gaming the system
   because it's so irritating)

 - we do want more testing as part of the pipeline (but again, not
   synchronously - the point is to speed up feedback for when things go
   wrong. So it wouldn't get rid of the errors, but if feedback happens
   quickly enough, maybe we'd catch things early in the development
   pipeline before they even hit my tree)

   Having more linux-next testing would be great.

 - Regular *user* debuggability and reporting.

   Quite frankly, I think the reason a lot of people really like being
   able to bisect bugs is not that "git bisect" is such an inherently cool
   program, but because it is a really great tool for *users* to
   participate in the debugging, in ways oops reports etc were not.

   Similarly, I love the oops/warning report statistics that Arjan sends
   out. With vendor support, users help with debugging and reporting
   without even necessarily knowing about it. Things like *that* matter a
   lot.

Notice how none of the above are about slowing down development.  I don't
think quality and speed of development are related. In fact, I think
quality and speed often go hand-in-hand: the same way some of the best
programmers are also the most productive, I think some of the most
productive flows are likely to generate the best code!

		Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Thu, 01 May 2008 03:48:15 UTC
Message-ID: <fa.uc69JJ64iP7y4Buvr4h3vl5lAgk@ifi.uio.no>

On Thu, 1 May 2008, Paul Mackerras wrote:
>
> Having things ready by the time the merge window opens is difficult
> when you don't know when the merge window is going to open.  OK, after
> you release a -rc6 or -rc7, we know it's close, but it could still be
> three weeks off at that point.  Or it could be tomorrow.

Well, if the tree is ready, you shouldn't need to care ;)

That said:

> By the way, if you do want to make that rule, then there's a really
> easy way to do it - just pull linux-next, and make that one pull be
> the entire merge window. :)  But please give us at least a week's
> notice that you're going to do that.

I'm not going to pull linux-next, because I hate how it gets rebuilt every
time it gets done, so I would basically have to pick one at random, and
then that would be it.

I also do actually try to spread the early pulls out a _bit_, so that
if/when problems happen, there's some amount of information in the fact
that something started showing up between -git2 and -git3.

HOWEVER.

One thing that was discussed when linux-next was starting up was whether I
would maintain a next branch myself, that people could actually depend on
(unlike linux-next, which gets rebuilt).

And while I could do that for really core infrastructure changes, I really
would hate to see something like that become part of the flow - because
I'd hope things that really require it should be so rare that it's not
worth it for me to maintain a separate branch for it.

But there could be some kind of carrot here - maybe I could maintain a
"next" branch myself, not for core infrastructure, but for stuff where the
maintainer says "hey, I'm ready early, you can pull me into 'next'
already".

In other words, it wouldn't be "core infrastructure", it would simply be
stuff that you already know you'd send to me on the first day of the merge
window. And if by maintaining a "next" branch I could encourage people to
go early, _and_ let others perhaps build on it and sort out merge
conflicts (which you can't do well on linux-next, exactly because it's a
bit of a quick-sand and you cannot depend on merging the same order or
even the same base in the end), maybe me having a 'next' branch would be
worth it.

But it would have to be low-maintenance. Something I might open after
-rc4, say, and something where I'd expect people to only ask me to pull
_once_ (because they really are mostly ready, and can sort out the rest
after the merge window), and if they have no open regressions (again, the
"carrot" for good behaviour).

I'm not saying it's a great idea, but if that kind of flow makes sense to
people, maybe it should be on the table as an idea, or at least something
to try to see if it might work.

But let's see how linux-next works out. Maybe all the subsystem
maintainers can just get their tree in shape, see that it merges in
linux-next, and not even need anything else. Then, when the merge window
opens, if you're ready, just let me know.

			Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Wed, 30 Apr 2008 20:32:14 UTC
Message-ID: <fa.9zRFuDRyuUDRaRFcgZJugIETTm8@ifi.uio.no>

On Wed, 30 Apr 2008, Andrew Morton wrote:
>
> <jumps up and down>
>
> There should be nothing in 2.6.x-rc1 which wasn't in 2.6.x-mm1!

The problem I see with both -mm and linux-next is that they tend to be
better at finding the "physical conflict" kind of issues (ie the merge
itself fails) than the "code looks ok but doesn't actually work" kind of
issue.

Why?

The tester base is simply too small.

Now, if *that* could be improved, that would be wonderful, but I'm not
seeing it as very likely.

I think we have fairly good penetration these days with the regular -git
tree, but I think that one is quite frankly a *lot* less scary than -mm or
-next are, and it has been an absolutely huge boon to get the kernel
into the Fedora test-builds etc (and I _think_ Ubuntu and SuSE also
started something like that).

So I'm very pessimistic about getting a lot of test coverage before -rc1.

Maybe too pessimistic, who knows?

		Linus


From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Slow DOWN, please!!!
Date: Wed, 30 Apr 2008 22:09:30 UTC
Message-ID: <fa.0nxkbyRh3LH42GPrl/rbHFTJdGo@ifi.uio.no>

On Wed, 30 Apr 2008, Rafael J. Wysocki wrote:
>
> How bisectable is linux-next, BTW?

Each _individual_ release will be entirely bisectable, since it's all git
trees, and at no point does anything collapse individual commits together
like -mm does.

HOWEVER.

Due to the way linux-next works, each individual release will be basically
unrelated to the previous one, so it gets a bit more exciting indeed when
you say "the last linux-next version worked for me, but the current one
does not".

Git can actually do this - you can make the previous (good) linux-next
version be one branch, and the not-directly-related next linux-next build
be another, and then "git bisect" will _technically_ work, but:

 - it will not necessarily be as efficient (because the linux-next trees
   will have re-done all the merges, so there will be new commits and
   patterns in between them)

 - but much more distressingly, if the individual git trees that got
   merged into linux-next were also using rebasing etc, now even all the
   *base* commits will be different, and saying that the old release was
   good tells you almost nothing about the new release!

   (The good news is that if only a couple of trees do that, the bisection
   information from the other trees that don't do it will still be valid
   and useful and help bisection)

 - also, while it's very easy for somebody who knows and understands git
   branches, it's technically still quite a bit more challenging than just
   following a single tree that never rebases (ie mine) and just bisecting
   within that one.

So yes, git bisect will work in linux-next, and the fundamental nature of
git-bisect will not change at all, but it's going to be a bit weaker
"between different versions" of linux-next than it would be for the normal
git tree that doesn't do the "merge different trees all over again" thing
that linux-next does.

		Linus


From: Theodore Tso <tytso@MIT.EDU>
Newsgroups: fa.linux.kernel
Subject: Re: RFC: starting a kernel-testers group for newbies
Date: Thu, 01 May 2008 17:26:33 UTC
Message-ID: <fa.11o/fAI1FXV5tIsJ3fK1VUHRFKU@ifi.uio.no>

On Thu, May 01, 2008 at 08:49:19AM -0700, Andrew Morton wrote:
> Another fallacy which Arjan is pushing (even though he doesn't appear to
> have realised it) is "all hardware is the same".
>
> Well, it isn't.  And most of our bugs are hardware-specific.  So, I'd
> venture, most of our bugs don't affect most people.  So, over time, by
> Arjan's "important to enough people" observation we just get more and more
> and more unfixed bugs.
>
> And I believe this effect has been occurring.

So the question is: if we have a thousand bugs which only affect one
person each, and 70 million Linux users, how much should we beat
ourselves up that 1,000 people can't use a particular version of the
Linux kernel, versus the 99.9% of the people for which the kernel
works just fine?

Sometimes, we can't make everyone happy.

At the recent Linux Collaboration Summit, we had a local user walk up
to a microphone and, loosely paraphrased, say, "WHINE WHINE WHINE
WHINE I have a $30 DVD drive that doesn't work with Linux.  WHINE
WHINE WHINE WHINE WHINE What are *you* going to do to fix my problem?"

Some people, like James, responded very diplomatically, with "Well, you
have to understand, the developer might not have your hardware, and
there's a lot of broken hardware out there, etc., etc."  What I wanted to
tell this user was, "Ask not what the Linux development community can do
for you.  Ask what *you* can do for Linux?"  Suppose this person had
filed a kernel bugzilla bug, and it was one of the hundreds or
thousands of non-handled bugs.  Sure, it's a tragedy that bugs pile
up.  But if they pile up because of crappy hardware, that's not a
major tragedy.  If we can figure out how to blacklist it, and move on,
we should do so.

> And why can't they work on the bug?  Usually, because they found a
> workaround.  People aren't going to spend months sitting in front of a
> non-functional computer waiting for kernel developers to decide if their
> machine is important enough to fix.  They will find a workaround.  They
> will buy new hardware.

Hey, in this particular case, if this user worked around the problem
by buying new hardware, it was probably the right solution.  As far as
we know, we don't have a systematic problem where huge numbers of DVD
drives aren't working, so if there are a few odd-ball ones out there,
we just CAN'T flagellate ourselves over not fixing all bugs and
letting some bugs pile up.

> Which leads us to Arjan's third fallacy:
>
>    "How many bugs that a sizable portion of users will hit in reality
>    are there?" is the right question to ask...
>
> well no, it isn't.  Because approximately zero of the hardware bugs affect
> a sizeable portion of users.  With this logic we will end up with more and
> more and more and more bugs each of which affect a tiny number of users.
> Hundreds of different bugs.  You know where this process ends up.

... and maybe we can't solve hardware bugs.  Or maybe crappy hardware
isn't worth holding back Linux development for.  And I'm not sure ignoring
it is that horrible of a thing.  And in practice, if it's a hardware
bug in something which is very common, it *will* get noticed very
quickly and fixed.  But if it's a hardware bug in some rare piece
of hardware, the user is going to have to either (a) help us fix it,
or (b) decide that his time is more valuable and that buying another
$30 DVD drive might be a better use of his and our time.

Back when I was the serial driver maintainer, I certainly made those
kinds of triage decisions.  I knew the serial driver was working for
the vast majority of Linux users, because if it broke in a major
way, I would hear about it, in spades, and get lots and lots of hate
mail.  And there were plenty of crappy ISA boards out there; I
would help their owners out when I could, and sometimes spend more
volunteer time helping them by changing one or two outb()'s to outb_p()'s
(yes, that really made a difference; remember, we're talking about crappy
PC-class hardware with hardware bugs), but at the end of the day, past a
certain point, even with a willing and cooperative end-user, I would
have to call it a day, give up, and tell them to get another
serial card.  (And back in the days of ISA boards, we couldn't even
use blacklists.)
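
[ A minimal sketch of that tiny class of fix, for illustration only.
  The function, register and port here are made up; outb() and outb_p()
  are the real port-I/O write helpers, and the "_p" variant simply adds
  a short pause after the write:

	#include <linux/types.h>	/* u8, u16 */
	#include <asm/io.h>		/* outb(), outb_p() */

	/* Hypothetical write to a register on one of those flaky ISA boards. */
	static void write_uart_reg(u16 port, u8 val)
	{
		/*
		 * outb() issues the write back-to-back with whatever comes
		 * next; outb_p() does the same write but pauses briefly
		 * afterwards, which is all some of that old hardware needed.
		 */
		outb_p(val, port);	/* was: outb(val, port); */
	}

  Same write, one extra I/O delay - that's the whole trick. ]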

And you know what?  Linux didn't collapse into a steaming pile of dung
when I did that.  We're all volunteers, and we need to recognize there
are limits to what we can do --- otherwise, it will be way too easy to
burn out and become a bitter shell of a maintainer....

Even BSD fan boys will realize that in BSD land, you have to do even
more of this; if there's random broken hardware, or simply a lack of a
device driver, very often your only recourse is to work around the
problem by buying another serial card, or wifi card, or whatever.  And
this happens much more with BSD than Linux, simply because they
support fewer devices to begin with.

					- Ted

P.S.  We should really try to categorize bugs so we can figure out
what percentage of the bugs are device driver bugs, what percentage
are core kernel bugs, which are "if you stress the system too badly"
sort of bugs, and which are the "if you do something bad like yank the
USB stick without unmounting the filesystem first" sort of thing.  I
think if we did this, the numbers wouldn't look quite so scary, because
things like device driver problems with weird sh*t bugs are not
comparable with core functionality bugs in the SLUB allocator, for
example.


