From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 17:03:56 UTC
Message-ID: <fa.P0/>

On Tue, 19 Aug 2008, David Miller wrote:
>  114 files changed, 1533 insertions(+), 898 deletions(-)

David, this absolutely _has_ to stop.

We're after -rc3. Your network merges continue to be too f*cking large,
and this has been going on for many months now. If you cannot throttle
people, I will have to throttle you and stop pulling things.

I'm going to take this, but really - this isn't just new drivers or
something like that that you've used as an excuse for big pulls before,
this is a _lot_ of changes to existing code.

Tell your people to look at the regression list, and if it's not there,
they should stop.

I realize that this problem is partly because when I see the pull requests
from you, I effectively see a combined pull from multiple different
sources, and in that sense it's not quite as big. But the networking pulls
have _consistently_ had the problem that they keep on being big not just
after -rc3, but after -rc4 and on, and I get the distinct feeling that
you're not moving the pain downwards, and aren't telling the people under
you to keep it clean and minimal and regressions only.

For example, those BT updates looked in no way like regression fixes. So
what the f*ck were they doing there? And why do you think all those driver
updates cannot cause new regressions?

If it's not a regression fix, it shouldn't be there. It should be in the
queue for the next version. Why is that apparently so hard for the
networking people to understand?

From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 20:54:28 UTC
Message-ID: <fa.Z+dF7engD6uzQZCyIKGM/>

On Tue, 19 Aug 2008, Marcel Holtmann wrote:
> Also I cleaned up the MAINTAINERS file entries for Bluetooth. Are these
> considered harmful now and should be postponed to the next merge window?
> They can obviously not introduce any regressions?

What I consider harmful is not any individual commit per se, but the
mindset that clearly says "hey, this particular commit is good, let's
push it up".

And all of the commits are _individually_ fine and the likelihood for
breakage is probably damn low, but when you have lots of them, that
doesn't work any more.

The whole point of the merge window is that you should be sending good,
tested commits _then_. And if you miss the merge window, then you queue
them up for the next one.

As it is, it seems like some people think that the merge window is when
you send any random crap that hasn't even been tested, and then after the
merge window you send the stuff that looks "obviously good".

How about raising your quality control a bit, so that I don't have to
berate you? Send the _obviously good_ stuff during the merge window, and
don't send the "random crap" AT ALL. And then, during the -rc series, you
don't do any "obviously good" stuff at all, but you do the "absolutely
required" stuff.

The rule should be that if you have any doubt _what-so-ever_ that
something is absolutely required, you simply don't send it during the -rc
phase. And if you have any doubt at all about something not working, you
don't send it during the merge window either!

The merge window is not for "let's get this tested, so that we can fix it
during the -rc". And the stabilization phase is not for "this one looks
obviously correct and safe".


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 21:32:15 UTC
Message-ID: <>

On Tue, 19 Aug 2008, David Miller wrote:
> I agree, we should be working on regression fixes now.

Not just now. For the last two weeks, yes.

> And we should essentially be doing so up until the merge window opens up
> again, right?

Yes. But any new code should go into another branch (or delayed entirely,
but that probably doesn't work well for you guys) so that by the time the
merge window opens up, it's already ready and raring to go, and
preferably pretty well tested too.
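The branch discipline described here can be sketched with plain git. The
following is a toy example only: the demo repo, branch names, and commit
messages are all illustrative, not the actual netdev workflow.

```shell
set -e
# Toy repo standing in for a maintainer tree (names are illustrative).
git init -q -b master demo-tree && cd demo-tree
git config user.email dev@example.com
git config user.name Dev
git commit -q --allow-empty -m "base release (think -rc1)"

# The branch Linus pulls during -rc: regression fixes only.
git branch fixes

# New development parks on a topic branch instead of going upstream.
git checkout -q -b topic/new-work
git commit -q --allow-empty -m "new driver work, queued for next window"

# Meanwhile, real regression fixes land on the fixes branch.
git checkout -q fixes
git commit -q --allow-empty -m "fix regression reported against -rc1"

# When the merge window opens, the queued topic branch is ready to go
# and merges alongside the accumulated fixes.
git checkout -q master
git merge -q --no-edit topic/new-work fixes
```

During the -rc phase only `fixes` would ever be offered upstream; the
topic branch keeps accumulating and, ideally, getting tested, so it is
ready the day the window opens.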

The problem is, you guys end up accepting a lot of stuff even after the
merge window. I know why - it's easy to do. It looks obviously fine. And
yeah, I let things slide.

The problem is, I've let things slide for a long time, and you guys don't
feel the pain.

> When do people following those rules have time to work on new stuff?

You can work on the new stuff too, but DON'T F*CKING SEND IT TO ME!

What's so hard to understand about that?


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 20:48:38 UTC
Message-ID: <>

On Tue, 19 Aug 2008, David Miller wrote:
> The BT bits were the only part I really considered borderline,
> and I was going to push back on Marcel.

I really don't see the e1000 and netxen updates as being critical either.
Sure, they look like driver improvement, but "improvement" is not what the
-rc3+ series is about.

Same goes for all the loopback changes. They look like cleanups or feature
work, not regression fixes.

IOW, it all looks like good commits, but quite a _lot_ of that queue looks
like good commits that should happen during the merge window, not during
the stabilization phase.

And this is by no means unique to _this_ pull request. It's been a very
clear pattern for a long time now. The networking area tends to be one of
the absolutely *most* active ones during the post-rc1 phase.

[ Yeah, in all fairness some architectures also do that, but at least I
  feel like I _really_ don't need to care when I get a diffstat that only
  touches arch/sh/* or something like that. ]

> But to be honest, I haven't seen bluetooth updates from him
> for such a long time I felt that being strict here would just
> exacerbate the problem.

I pointed out the BT ones as standing out (they were larger than some of
the other patches too), but I really don't think this was in any way
limited to BT in any shape, form or color. Quite frankly, looking through
the thing, my gut feel is that about _half_ the commits over-all should
probably have been in the queue for the next release.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 21:29:19 UTC
Message-ID: <>

On Tue, 19 Aug 2008, Rafael J. Wysocki wrote:
> FWIW, they fix the recent regression tracked as
> .

Yeah, and the real cause was apparently another commit that *ALSO*
happened after the merge window!

Guys, you're making excuses for the problem.

The problem that triggered this bogus loopback change was commit
e5a4a72d4f88f4389e9340d383ca67031d1b8536. Look at when that one was done.

This is my whole _point_. The networking layer is doing development during
the -rc window. And you guys are making excuses for it. Wake up, guys!


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 21:47:30 UTC
Message-ID: <fa.+glS/>

On Tue, 19 Aug 2008, David Miller wrote:
> That change was made under the pretext that it was tested heavily and
> that if we hit any problem whatsoever with it that we couldn't fix
> quickly it would be reverted.

David, I will say this one more time:

 - as long as you concentrate on individual commits, you're missing the
   big picture.

you can _always_ make excuses for individual commits. That's not my point.
Or rather, it actually very much _is_ my point. If you have the mindset
that you're looking for excuses why any individual commit is ok to merge,
then you don't end up with a couple of individual commits any more: you
end up with a LOT OF CHURN.

It's not the individual commits. You're looking at the individual trees,
and you're missing the forest. The problem isn't the individual trees. The
problem is that there's a metric sh*tload of individual trees, what we in
the tree industry call a 'forest'. You're not seeing it.

And btw, don't get me wrong - you're not the only problem spot. During the
-rc's leading up to 2.6.26, drivers/media was actually a _bigger_
problem. I happen to care less about that (the same way I care less about
some odd-ball architectures), but I have to admit that drivers/media was a
total disaster last time around.

So if it makes you feel any better, others have been even worse.  But this
networking problem has been going on for quite a while.

So the problem here really is that you seem overly eager to make excuses
for individual patches. And if they _stayed_ "individual" it would all be
good. But they don't seem to.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 21:23:15 UTC
Message-ID: <fa.6gq/>

On Tue, 19 Aug 2008, David Miller wrote:
> Those fix a performance regression reported by a real user.

Since when?

The thing is, I can do a

	gitk v2.6.24.. drivers/net/loopback.c

as well as anybody else, and TSO has not been enabled for loopback at
least since 2.6.24. Going back to 2.6.23 (which has more changes that I
won't comment on), it looks like that LOOPBACK_TSO thing you removed was
there back then too.
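That spot-check can also be scripted rather than done interactively in
gitk. The sketch below builds a toy repo that merely mimics the situation
(the file and tag names mirror the message, but the history is invented
for illustration; in a real kernel tree you would run only the two
`git log` commands at the end):

```shell
set -e
# Toy repo mimicking the check above (names mirror the message).
git init -q -b master demo-kernel && cd demo-kernel
git config user.email dev@example.com
git config user.name Dev
mkdir -p drivers/net
printf '/* LOOPBACK_TSO not enabled */\n' > drivers/net/loopback.c
git add -A && git commit -qm "loopback: old state"
git tag v2.6.24
printf '/* TSO enabled */\n' > drivers/net/loopback.c
git add -A && git commit -qm "loopback: enable TSO"

# Everything that touched the file since the release tag:
git log --oneline v2.6.24.. -- drivers/net/loopback.c

# Pickaxe: only commits that added or removed the LOOPBACK_TSO string.
git log --oneline -S LOOPBACK_TSO
```

The `-S` (pickaxe) form is the useful one here: it lists exactly the
commits that introduced or removed a symbol, which is how you verify that
something really hadn't been touched since a given release.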

So the performance regression if it happened must have been due to
something else, no?

Oh, I'm sure that enabling TSO speeds things up, but apparently it also
basically enables a code-path that hasn't been enabled since at least
2.6.23, no?

Really, David. Was the performance regression due to something else, and
then by enabling LOOPBACK_TSO it hid the problem? Or what? The thing is,
-rc3 is _not_ the point to apparently change something that hasn't been
changed in about a year (I didn't go any further back in history).

So what's going on? Do you seriously think it's a good point in time to
enable TSO for loopback after a long time of it apparently _not_ being
enabled?

It smells like excuses to me. Was this really a "must be in 2.6.27" thing?

And no, it wouldn't bother me if this was a rare thing. Again, let me
repeat: the problem is not any of the individual commits _per_se_. The
problem is that the network layer stands out. And not in a good way. It
stands out as being a layer that gets a _lot_ of churn late in the -rc
series.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 21:53:36 UTC
Message-ID: <>

On Tue, 19 Aug 2008, David Miller wrote:
> But I think you're throwing the baby out with the bath water, the
> majority of that pull contained legitimate real bug regression fixes.

.. and notice how I

 (a) took it, but

 (b) am asking for you to be more careful?

In other words, I would be a lot happier if you didn't say "majority". I
would be a ton happier if you could HONESTLY say that every single one
was a regression.

And the thing is, you cannot. Some of the ones I pointed you to were
actually regressions due to _other_ patches you had much too happily sent
me after the merge window had already closed.

> That's why some other developers are coming out of the woods and
> defending me, they don't have to do that, but they do it because they
> feel I'm being slighted at least a little bit.

Umm. The only defending I have seen was a F*CKING DISGRACE, since nobody
apparently had the balls to stand up and admit that the whole problem
happened after -rc1 in the first place!

In other words, the "defense" was just making excuses for EXACTLY the
behaviour I'm trying to tell you shouldn't have happened in the first
place!

Please. You're still making excuses for this, even after I pointed out
that ALL of the problems with the whole loopback driver thing happened
after the merge window.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Tue, 19 Aug 2008 22:41:22 UTC
Message-ID: <>

On Wed, 20 Aug 2008, Evgeniy Polyakov wrote:
> I believe it was you who said that there is no black and white (another
> guy said that there is no spoon; I frequently confuse the two).


> Any changes, no matter when they are made, cannot be 100% tested in a
> laboratory environment, even fixes which look obvious.

100% agreed.

Please note that I'm not against these things slipping in occasionally.
The reason I brought this up in the first place really wasn't the loopback
driver issue at all. The reason I brought it up was simply the fact that
when I compare the size and frequency of changes, the networking pulls
tend to be the worst of the lot of the "core" kernel changes.

I say "core" kernel changes, because things are usually worse for the
outliers. As mentioned, networking is actually one of the _better_ guys if
you start comparing to the DVB people, or to some of the architectures
that often slip the merge window _entirely_, and *all* their changes come
in during -rc2 or something.

So it's not that networking is especially bad on an absolute scale in this
regard. And it's not like it doesn't happen all the time for everybody
else too. But I think networking has been a bit more cavalier about things
than many other core areas.

So no, I'm not asking for black-and-white absolutes here. But I'm asking
for a "tightening of the belts". Please don't let it all hang out, ok?


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT]: Networking
Date: Wed, 20 Aug 2008 16:10:52 UTC
Message-ID: <>

On Wed, 20 Aug 2008, Marcel Holtmann wrote:
> John was just pointing out (like myself before) that a lot of people are
> under the impression that documentation updates and new drivers should
> not be queued up and merged as soon as possible.

I think (and hey, I'm flexible, and we can discuss this) that the rules
should be:

 - by default, the answer should always be "don't push anything after the
   merge window unless it fixes a regression or a nasty bug".

   Here "nasty bug" is something that is a problem in practice, and not
   something theoretical that people haven't really reported.

 - but as a special case, we relax that for totally new drivers (and that
   includes things like just adding a new PCI or USB ID's to old drivers),
   because (a) it can't really regress and (b) support for a specific
   piece of hardware can often be critical.

With regard to that second case, I'd like to note that obviously even a
totally new driver _can_ regress, in the sense that it can cause build
errors, or problems that simply wouldn't have happened without that
driver. So the "cannot regress" obviously isn't strictly true, but I
think everybody understands what I really mean.

It should also be noted that the "new driver" exception should only be an
issue for things that _matter_.

For example, a machine without networking support (or without support for
some other really core driver that provides basic functionality) is
practically useless. But a machine without support for some particular
webcam or support for some special keys on a particular keyboard? That
really doesn't matter, and might as well wait for the next release.

So the "merge drivers early" is for drivers that reasonably _matter_ in
the sense that it allows people to test Linux AT ALL on the platform. It
shouldn't be "any possible random driver".

IOW, think about the drivers a bit like a distro would think about
backporting drivers to a stable kernel. Which ones are really needed?

Also, note that "new driver" really should be that. If it's an older
driver, and you need to touch _any_ old code to add a new PCI ID or
something, the whole argument about it not breaking falls away. Don't do
it. I think, for example, that the SCSI people seem to be a bit too eager
sometimes to update their drivers for new revisions of cards, and they do
it to old drivers.

And finally - the rules should be guidelines. It really isn't always
black-and-white, but most of the time the simple question of "could this
_possibly_ be just queued for the next release without hurting anything"
should be the basic one. If the answer is "yes", then wait.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH v3.1415] Documentation: add documentation summary for
Date: Tue, 23 Jun 2009 19:06:03 UTC
Message-ID: <>

On Tue, 23 Jun 2009, Krzysztof Halasa wrote:
> I would say, to address regressions and fix bugs. Not only those added
> recently, though obviously older bugs should have already been fixed.

The thing is, I don't take bug fixes late in the -rc just because they are
bug fixes.

And I really shouldn't.

If it's an old bug, and doesn't cause an oops or a security issue, it had
damn well better wait for the next merge window. There is absolutely _no_
reason to just blindly "fix bugs" at the end of the rc stage, because
quite frankly, the risks that come from fixing a bug are often bigger than
the risks of just leaving it alone.

Even "obvious bugs" may be things that people depend on, or that other
parts of the kernel simply rely on indirectly. For a recent example of
this, see what happened when we fixed an obvious bug on x86-64 to check
user space addresses properly: it turns out that 'strnlen_user()' depended
on the bug ("misfeature"), and had to be fixed when the bug was fixed.

So no. Regressions really are _different_ from "fixing bugs".

Regressions need to be fixed even if the fix may re-introduce another
long-time bug - simply because we're much better off with a _consistent_
set of bugs where people can depend on their machine either working or
not, than with some kind of unstable situation that never gets anywhere
(we found that out the hard way with both ACPI and power management).

So the end result is:

 - we always want to fix bugs

 - but the primary time to fix bugs is during the merge window

 - after the merge window closes, the effort should be on _regressions_,
   nothing else.

 - security issues, oopses, and other catastrophic bugs obviously need to
   be handled at any stage.

IOW, "it fixes a bug" is _not_ sufficient. The real issue is "it _really_
can't wait for the next merge window", not "bug or not".

