git rebase (Al Viro; Linus Torvalds; Theodore Tso)

Index Home About Blog

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 20:09:34 UTC
Message-ID: <fa.v22Gl2Sq+O+kAPIbhi3/O6Q7Mg0@ifi.uio.no>

On Tue, 12 Feb 2008, J. Bruce Fields wrote:
>
> But the "author" is still preserved, right?  Why do you need the
> committer name to be preserved?  (I'm not denying that there could be
> reasons, I'm just curious what they are.)

It's not that the committer should be preserved, but:

 - the chain from author -> committer should be visible in the
   Signed-off-by: lines.

   If you rebase somebody elses tree, you screw that up. You need to add
   your sign-off, since now *you* are the new committer, and *you* took
   somebody elses work!

 - you should respect the down-stream developer, and if that downstream
   developer continues to work with his branch or works with other people,
   you shouldn't screw that up!

Both of those basically say that you should never rebase somebody elses
work. You can use rebase to rebase your *own* work on top of somebody
elses thing (since that doesn't change the sign-off chain, and you still
respect the downstream developers development model)!

But of course, if you rebase, you should respect the wishes of the
up-stream developer too. I don't do rebases. So if you asked me to pull,
the stuff I pulled can never be rebased, because it just *is* in my tree.

Put another way: think of the absolute *chaos* that would happen if I were
to rebase instead of just merging. Every time I pull from you I'd
invalidate your whole tree, and you'd have to re-generate. It gets
unmaintainable very quickly.

And that's actually ignoring a real issue: stability of commits. The nice
thing about stable commit naming is that all bug-reports from other people
that told where the bug happened are basically 100% trust-worthy and the
code is 100% reproducible not just for you, but for everybody else.

In other words, you really shouldn't rebase stuff that has been exposed
anywhere outside of your own private tree. But *within* your own private
tree, and within the commits that have never seen the light of day,
rebasing is fine.

(And yes, there are exceptions. If it's a clear "throw-away tree" all the
rules go out the window, of course, as long as everybody involved *knows*
it's a throw-away tree, and know that if they pull it they have to
synchronise 100% with you - so within a very tight-knit case or within a
very specific small detail that is actively being worked on, those rebases
with cleanups make tons of sense).

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 01:33:08 UTC
Message-ID: <fa.Z/Ob0SLov7zP7PKFbkR8yBXnHog@ifi.uio.no>

On Tue, 12 Feb 2008, David Miller wrote:
>
> > Put another way: think of the absolute *chaos* that would happen if I were
> > to rebase instead of just merging. Every time I pull from you I'd
> > invalidate your whole tree, and you'd have to re-generate. It gets
> > unmaintainable very quickly.
>
> I actually wouldn't mind that, the first thing I do when sending a
> pull request is I stop putting things into my tree and as soon as the
> recipient pulls I wipe out my tree and clone a fresh copy of their's.

You *really* don't see the problem here?

> I really like that mode of operation.

*YOU* like it, because it never generates any issues for *you*. You're the
top in your heap, and the people above you don't do that insane thing, so
you get all of the advantages, with none of the downsides. Of *course* you
like it.

But as people have pointed out, it generates issues for the people under
you! If I did it, the people who now complain about networking would
not just be a couple, it would be everybody. Nobody could depend on
anything out there, because everything would have to rebase.

You just don't see the problems, because the only person above you isn't
crazy enough to do what you propose. You also don't do ten merges a day of
subsystems you don't know.

The importance of merging (rather, not screwing up history in general)
becomes really obvious when things go tits-up. Then they go tits-up
*without* screwing up the history of the trees that were hopefully tested
individually.

If you re-base things that others developed, you lose that. Imagine if I
merged first Greg's tree (by rebasing), and then there was some
fundamental thing that didn't cause a conflict, but just made something
not work, when I rebased yours on top. Think about what happens.

Now I've merged (say) 1500 networking-related commits by rebasing, but
because I rebased on top of Greg's tree that I had also rebased,
absolutely *none* of that has been tested in any shape of form. I'd not
use most of the things I pulled, so I'd never see it, I'd just push out
something that was very different from *both* trees I pulled, with no way
to really blame the merge - because it doesn't even exist.

So as a result, some *random* commit that was actually fine on its own has
now become a bug, just because it was re-written.

You don't see the problem here?

Yes, this is the *crap* you do all the time. You don't see the problems as
much, because you merge probably only about a tenth of the volume I merge,
and you can keep track of the subsystem more. But even though you don't
have nearly the same kinds of problems, people have complained about your
process.

So there's a real reason why we strive to *not* rewrite history. Rewriting
history silently turns tested code into totally untested code, with
absolutely no indication left to say that it now is untested.

You can limit the damage by keeping it to a single subsystem and by
serializing that subsystem by - for example - talking it over amongst
yourself what order you do things in, and yes, most of the time rewriting
doesn't hurt anything at all, but I guarantee you that it's a big mistake
to do, and the mistake gets bigger the more _independent_ people you have
involved.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 01:59:40 UTC
Message-ID: <fa.mmT3DcI20pDu2TzHf84+doLhilQ@ifi.uio.no>

On Tue, 12 Feb 2008, David Miller wrote:
>
> Now how do I remove a bogus commit for a tree that I've already pushed
> out and published for other people, without any record of it appearing
> in the GIT tree any more?

So, the answer is: if others have actually pulled, it's simply not
possible.

There simply is no way to undo history that isn't local any more. You just
*have* to revert, or you need to find every single person that pulled
(directly or indirectly) and ask them to undo all the work they did on top
of your bad commit.

This is not git-related, btw. That's just how the world works. It's a bit
like the internet - if you said something stupid on #IRC and it made it to
bash.org, there's not a whole lot you can do to "undo" your stupidity. You
can only hope to do better in the future and assume people forget.

> How do I insert build fixes into existing changesets so that the tree
> is more bisectable?

Just delay pushing out. There really is _zero_ downside to this. None at
all. There are only upsides.

> If Jeff merged in a tree that introduced a ton of whitespace errors
> git is complaing about, is there a way I can fixup that changeset
> in-place? (or should I tell Jeff to start adhering to GIT's whitespace
> warning messages when he applies patches?)

Umm. Git doesn't complain about whitespace _except_ when applying patches.
So if you don't rebase his work, you'll never see the whitespace warnings
either!

Of course, we'd probably wish that Jeff cared about the whitespace
warnings and pushed back on them, but the fact is, that warning isn't
meant for you - because by the time you pull Jeff's tree, it's simply not
your issue any more. Jeff already applied it. Either he's your trusted
lietenant or he's not.

Quite frankly, to me it sounds like you're not ready to "let go" and trust
the people under you. Trust me, it's worth it. It's why your life is easy:
I have let go and I trust you.

Also, I'd *much* rather have a few problems in the tree than have people
screw up history in order to hide them. Sure, we want to keep things
bisectable, but quite frankly, if you do a reasonable job and compile the
kernel before you push out, it will be "mostly bisectable".

And yes, mistakes happen. Mistakes will *always* happen. It's ok. Relax.

Let me put it another way: You're _both_ going to be *much* better off
pushing back on Jeff, telling him that "I can't pull from you because your
tree is ugly and doesn't compile", than taking his tree and rebasing it.

Remember? I used to do that all the time. I berated the ACPI people for
creating monster trees that were horrible and contained fifteen merges and
two real commits. I didn't try to clean it up for them, I just told them
what the problem was, and you know what? The ACPI tree is one of the
cleanest ones out there now!

So in short:

 - clean trees and bisectability are all wonderful things. No doubt about
   that at all.

 - but if getting there means that you lose a lot of _other_ wonderful
   things (like being able to trust history, and the people being under
   your watchful eyes having to deal with you re-writing their trees),
   we'd be much better off taking the occasional commit that fixes things
   up _after_ the fact rather than before!

 - and you actually can help fix your issues by doing some simple things
   *before* pushing out, rather than push out immediately. IOW, do your
   whitespace sanity fixes, your compile checks etc early, and don't push
   out until after you've done them.

Hmm?

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 02:20:14 UTC
Message-ID: <fa.c9IkgAsU/RxSF7B7AdO8v1Ku1xg@ifi.uio.no>

On Tue, 12 Feb 2008, Andrew Morton wrote:
>
> So it would not be efficient for David to do all this queue-cleaning
> *prior* to putting the tree into linux-next, because more stuff will pop up
> anyway.

Well, what others have done is to have special "temporary branches".

This is what git itself does, for example. The "pu" branch in git is used
for experimental stuff, and it's _declared_ to be rebased, redone, and
generally just unsafe at any moment.

So it is easy to have a special "testing" branch that is just declared to
be unsafe.  Make Linux-next pull that testing branch - it will pollute the
Linux-next tree (and anybody else who just wants to see what the current
state is), but since those are re-generatd from scratch every day
_anyway_, so who cares?

But don't make it something people pull by mistake (ie never call it
"master", and when mentioning it in some email message, always mention the
fact that it's not a stable branch, and never ask anybody to pull it
without making it very clear that it's just for testing, not for real
merging).

So git does have support for those things. They are very much "secondary"
branches (any tree they are pulled into will itself become "poisoned" and
unstable), but it's easy enough to have something like that for testing
purposes. And if it all tests out fine, you can just move it as-is into
the "real" branch if you want to.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 05:45:22 UTC
Message-ID: <fa.aDGitWhzDbPN7o4q+UuSjj41eZ8@ifi.uio.no>

On Tue, 12 Feb 2008, J. Bruce Fields wrote:

> > So as a result, some *random* commit that was actually fine on its own has
> > now become a bug, just because it was re-written.
>
> If there was a "fundamental thing that didn't cause a conflict", then
> the two trees in question probably didn't touch the same code, so would
> probably merge cleanly, for the same reason that one rebased onto the
> other cleanly.  But depending on what the "fundamental thing" was, the
> merge might still introduce the same bug, right?

Absolutely. But if you do a true merge, the bug is clearly in the merge
(automatedly clean or not), and the blame is there too. IOW, you can blame
me for screwing up. Now, I will say "oh, me bad, I didn't realize how
subtle the interaction was", so it's not like I'll be all that contrite,
but at least it's obvious where the blame lies.

In contrast, when you rebase, the same problem happens, but now a totally
innocent commit is blamed just because it happened to no longer work in
the location it was not tested in. The person who wrote that commit, the
people who tested it and said it works, all that work is now basically
worthless: the testing was done with another version, the original patch
is bad, and the history and _reason_ for it being bad has been lost.

And there's literally nothing left to indicate the fact that the patch and
the testing _used_ to be perfectly valid.

That may not sound like such a big deal, but what does that make of code
review and tested-by, and the like? It just makes a mockery of trying to
do a good job testing any sub-trees, when you know that eventually it will
all quite possibly be pointless, and the fact that maybe the networking
tree was tested exhaustively is all totally moot, because in the end the
stuff that hit the main tree is something else altogether?

I don't know about you, but I'd personally be really disappointed if it
happened to me, and I felt that I did a really good job as a
submaintainer. I'd also feel that the source control management sucked.

Contrast that to the case where somebody simply does a merge error. The
original work doesn't lose it's validity - so the original maintainer
hasn't lost anything. And quite frankly, even the person who "screwed up"
with the merge hasn't really done anything bad: these things _do_ happen.

So bugs happen; not big deal. But the fact that the bugs are correctly
attributed - or rather, not mis-attributed to somebody blameless - that
_is_ a big deal.

It's not like I will guarantee that all my manual merges are always 100%
correct, much less try to guarantee that no subtle merge issue can make
things not work even if it all merged totally cleanly. That isn't my
point. And others will make merge mistakes too. But the people they merged
from will not be blamed.

So just the fact that the right commit gets blamed when somebody does a
"git bisect" is I think a big issue. It's just fundamentally more fair to
everybody. And it means that the people who push their work to me can
really choose to stand behind it, knowing that whatever happens, their
work won't get diluted by bad luck or others' incompetence.

And no, maybe most people don't feel things like that matters. But I do
think it's important.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 17:10:53 UTC
Message-ID: <fa.wB4gyi77c52JUMRBUzDezTF9cvs@ifi.uio.no>

On Tue, 12 Feb 2008, Jeff Garzik wrote:

> David Miller wrote:
> > This is why, with the networking, we've just tossed all of the network
> > driver stuff in there too.  I can rebase freely, remove changesets,
> > rework them, etc. and this causes a very low amount of pain for Jeff
> > Garzik and John Linville.
>
> Rebasing is always a pain, and John and I both agreed the other day that you
> do it too often.

I do think that some people rebase too often (and that David is in that
number), often with no real discernible reason. I think rebasing is great
when you are doing active development (ie you're really acting in "quilt
mode") and I actually think git could and should integrate more of the
queues modes, but I don't think it should be a default action for an
up-stream developer.

I also don't think rebasing helps the particular problem under discussion
(ie conflicts due to having to sort out dependencies between different
trees), and in some ways hurts it.

One thing that I personally react to is that

 - I think sysfs and the device model layer has had too much churn, and
   I'm unhappy that people seem to expect that to continue.

   [ NOTE!! I'm picking on the device model/sysfs stuff here not because
     it's the only one, but because it's the obvious and good example. I
     do think we have other cases of the same thing. ]

   Really. I do agree that we need to fix up bad designs, but I disagree
   violently with the notion that this should be seen as some ongoing
   thing. The API churn should absolutely *not* be seen as a constant
   pain, and if it is (and it clearly is) then I think the people involved
   should start off not by asking "how can we synchronize", but looking a
   bit deeper and saying "what are we doing wrong?"

   It may well be that part of the problem is that the people causing the
   churn don't realize the downsides of the pain they are causing, because
   THEY aren't the generally ones that see it!

   For example, it's easy for Greg to change his driver core, and he can
   obviously synchronize with himself in the other trees (because his left
   hand is hopefully somewhat aware of what his right hand is doing), so I
   suspect Greg simply doesn't see the pain that much. So Greg thinks that
   the solution is to just have me merge his changes early, and the pain
   is all gone as far as he is concerned.

 - That said, I'm also a bit unhappy about the fact you think all merging
   has to go through my tree and has to be visible during the two-week
   merge period. Quite frankly, I think that you guys could - and should -
   just try to sort API changes out more actively against each other, and
   if you can't, then that's a problem too.

   In other words, please do use the distributed nature of git to your
   advantage, when there are things you guys know you need to sort out.

So there are two separate and totally independent issues here.

One is that I suspect some people are a bit too willing to do cleanup for
its own sake, and do not realize that backwards compatibility does
actually help too, and that "better solutions" are sometimes worse than
"keep things stable". We should always *allow* major breakage when
necessary, but I think the threshold for them should be higher than I
think it currently is.

The other is that once somebody says "ok, I *really* need to cause this
breakage, because there's a major bug or we need it for fundamental reason
XYZ", then that person should

 (a) create a base tree with _just_ that fundamental infrastructure change,
     and make sure that base branch is so obviously good that there is no
     question about merging it.

 (b) tell other people about the reason for the infrastructure change, and
     simply allow others to merge it. You don't have to wait for *me* to
     open the merge window, you need to make sure that the people that get
     impacted most can continue development!

This is where "rebases really are bad" comes in. When the above sequence
happens, the fundamental infrastructure change obviously does need to be
solid and not shifting under from other people who end up merging it. I do
not want to see five different copies of the fundamental change either
because the original source fixed it up and rebased it, or because the
people who merged it rebased _their_ trees and rebased the fundamental
change in the process.

Can that (b) be my tree? Sure. That's been the common case, and I'll
happily continue it, of course, so I'm not arguing for that to go away.
Merging is my job, I'll do it. But when the merge window is a problem, my
merge window should *not* hold up people from using the distributed nature
of git for their advantage.

But yes, obviously when doing cross-merges, you'd better be really
*really* sure that the base is solid and will get merged. But let's face
it, all the really core maintainers should damn well know that by now:
you've all worked with me for years, so you should be able to trivially be
able to tell whether you *might* need to worry about something, and when
it's a slam dunk.

And it's the "it's a slam dunk" cases that I think are (a) the common ones
and (b) the ones where you can just do cross-merges to satisfy each others
needs.

Hmm? Does that sound palatable to people?

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 18:50:15 UTC
Message-ID: <fa.tPCENopPlFqcYCCZZdpMRn5fp5k@ifi.uio.no>

On Tue, 12 Feb 2008, James Bottomley wrote:
>
> Hm ... I think net is a counter example to this.  Rebases certainly work
> for them.

They consider themselves to be "one tree" and are thus largely a totally
different issue than the one discussed here.

Also, I actually flamed David a lot last round over his rebasing of
netfilter etc. It was no acceptably done. That network tree was all
screwed up, with the git committer information not matching the signed-off
path etc.

If you do cross-tree rebasing, you need to consider it 100% equivalent to
just passing patches around in emails. Because it really is.

> Yes ... I don't do that ... Like I said, I only rebase for an actual
> conflict.

And this is how things should work.

> Well, it came at me because Jens was rebasing the block tree as he
> worked through issues in the two branches I was based on.

Yes, and I am in no way saying that the core driver model has been the
only problem spot.

And also, I do not like "hard rules". Every rule always has an exception,
and sometimes a rebase-based strategy can be the right thing even across
trees.

But you're all ignoring my fundamental objection: you're talking as if
cross-tree fundamental API changes should be the "norm", and that we
should try to solve the workflow issues that stem from that. And I'm
saying that I think we should try to FIX the issue, and make sure that
it simply *isn't* the norm.

In other words, I'm perfectly happy to be an a*hole and tell people that I
simply won't merge things that cause undue API churn at all, and that were
not thought out sufficiently.

We've had too many issues like that (SG chaining, iommu, driver core, not
to mention the upheavals in x86) lately, but realistically, which
subsystem remains a problem for the future? And maybe the correct thing to
do is to just say "enough!".

I'm perfectly happy being hardnosed and saying "nope, that's crap, it
doesn't matter if the code is slightly better if you cause those kinds of
issues".

The thing is, sometimes the answer really *is* "Don't do that then!". If
our model becomes bogged up by cross-subsystem serialization issues, that
means that we (a) spend too much time handling the fall-out from that and
(b) too little time just refining the details.

And quite frankly, while "good core architecture" matters a lot, in the
end, "getting all the boring small things right" matters even more! Big
architectural churn that cleans up and fixes the "big issues" is totally
anti-productive if it means that we don't look at the boring small detail
stuff.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 19:00:06 UTC
Message-ID: <fa.O5Ko/o53kZW8No0KtLGVLnQnl70@ifi.uio.no>

On Tue, 12 Feb 2008, Linus Torvalds wrote:
>
> In other words, I'm perfectly happy to be an a*hole and tell people that I
> simply won't merge things that cause undue API churn at all, and that were
> not thought out sufficiently.

.. btw: I'd need to know this in advance. I usually don't see the problem
until it's too late.

And this is very much an area where "Linux-next" can help: if some
subsystem causes problems in Linux-next for other maintainers, I really
think it shouldn't just be a matter of "drop the git tree that didn't
merge cleanly", but it should literally be a question of "maybe we should
drop the _earlier_ git tree that caused the later one not to merge
cleanly".

In other words, maybe things like core block layer changes or device model
changes should be *last* in the merge-list (or if first, also be first to
be dropped if they cause merge errors downstream!).

That way, infrastructure changes that screw up others can only happen if
the maintainer actively works with the others to make sure it works even
before it would ever merge into Linux-next successfully.

That may sound odd, but it actually matches what I personally believe in:
we have more driver code and other "outlying" things than we have core
things, and most of our problems come from that - so we should prioritize
*those* things, not the "fundmantal core changes".

So how about making that the default situation: drivers and other outliers
merge first. If fundamental API changes happen, they merge last, and if
their maintainers can't make it in time in the merge window, they just get
dropped.

That sure as hell would put the pain on API changes solidly where it
belongs.

			Linus

From: Al Viro <viro@ZenIV.linux.org.uk>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 19:42:18 UTC
Message-ID: <fa.JMzhKPk03DyNKlAqGka9bVU4eJc@ifi.uio.no>

On Tue, Feb 12, 2008 at 10:59:00AM -0800, Linus Torvalds wrote:
>
>
> On Tue, 12 Feb 2008, Linus Torvalds wrote:
> >
> > In other words, I'm perfectly happy to be an a*hole and tell people that I
> > simply won't merge things that cause undue API churn at all, and that were
> > not thought out sufficiently.
>
> .. btw: I'd need to know this in advance. I usually don't see the problem
> until it's too late.

We could simply decide that API changes affecting more than one subsystem
Must Be Serialized(tm).  Explicitly.  As in "any such change is posted
and discussed in advance, order of merges decided upon and we have merge
window for one decided set of API changes + fallout *ONLY*".  With merge
window in question normally taking a few days.

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 00:54:43 UTC
Message-ID: <fa.IC9dUVprMCmwGxD9ARtqf79gXWg@ifi.uio.no>

On Tue, 12 Feb 2008, David Miller wrote:

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Tue, 12 Feb 2008 10:59:00 -0800 (PST)
>
> > That sure as hell would put the pain on API changes solidly where it
> > belongs.
>
> If a person does a driver API change and does all the work to sweep
> the entire tree updating all the drivers, doesn't it penalize that
> person a bit much to stick a new driver in front of that work?

If that API change doesn't conflict with the work that hundreds of other
people do, it's obviously not a problem whichever way it goes.

And if the API change *does* cause conflicts, then yes, the onus of fixing
those conflicts (again) goes to the person who changed the API. Everybody
else did everything right.

> People write code on top of infrastructure, both new and old, not the
> other way around.  At least to me, that seems how the merging ought to
> work too.

You think that infrastructure is more important than outlying code. But
you do that only because you write the infrastructure, not because you
have any logical reason to think so.

The fact is, that "outlying code" is where we have all the bulk of the
code, and it's also where we have all those developers who aren't on the
"inside track". So we should help the outliers, not the core code.

And very fundamentally, API changes are to be discouraged. If we make them
harder to do and make people think twice (and occasionally say "not worth
it"), that sounds like a damn good thing to me.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 18:27:57 UTC
Message-ID: <fa.O90JgBQCp/TUnVaFgERSAuRdpos@ifi.uio.no>

On Tue, 12 Feb 2008, Greg KH wrote:
>
> I may be a bit defensive here, but I hope that all of the recent
> kobject/kset/driver core changes have been done with the thought of
> "what are we doing wrong".

.. but are we expecting it to be finished?

That's the point.

This whole "Linux-next" discussion so far has almost been predicated on
the whole assumption that this is an on-going concern. And it really
should NOT be.

If it's an on-going concern, we need to tackle *that* issue, not the issue
that cross-subsystem merges are hard. They simply seem to happen too much.

In other words, I'm not AT ALL interested in the merges we've already
done. That's over and done with, and we'll never ever do those merges
again. Who cares? I don't.

I'm purely and _only_ interested in the merges of the future. You don't
need to be defensive about the things that led up to this discussion, I'm
more hoping that we can aim at fixing the problem at the source, rather
than trying to work around it.

We simply shouldn't have all that many conflicts. We've had *way* too many
of them lately, and I think it's because people have felt it wasn't too
painful.

Put another way: back when we worked with just patches, we avoided renames
like hell, and we also tried to simply even re-architect the whole tree so
that you didn't have so many patch conflicts. One main reason as far as I
was concerned for things like per-directory Kconfig files and the whole
initcall() stuff was the fact that the old single Kconfig file and the old
crazy init/main.c file were total *nightmares* when it came to conflict
resolution.

So we split things up more, and we didn't do renames (or were very careful
about it). We avoided the things that caused pain.

I think we need to remember that: yes, we'll always have to have ways to
fix the pain that does happen, but even more importantly, we should strive
for models where it doesn't happen in the first place!

And simply avoiding cross-subsystem API changes unless there is a major
*MAJOR* reason for them is the obvious thing to do. Simply face the fact
that even in open source there are major reasons to stay with an old
interface even if it's not optimal.

We absolutely MUST NOT have the mindset that "cross-subsystem conflicts
happen all the time".

That was my point.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 21:37:45 UTC
Message-ID: <fa.ggMNyLiHQvj9t+pqn0dUeUOq3cw@ifi.uio.no>

On Tue, 12 Feb 2008, Greg KH wrote:
>
> Yes, I agree, there are lots of examples of this, but the overall
> majority are reviewed by 2 people at least (or sure as hell should be,
> maybe we need to bring into existance the "reviewed-by" marking to
> ensure this.)

Well, I don't really "review" any patches that come through Andrew. What I
do is:

 - global search-and-replace Andrew's "acked-by:" with one that is both
   him and me (that way I make sure that I _only_ sign off on patches that
   he has signed off on!)

 - look through all the commit *messages* (but not patches). This
   sometimes involves also editing up grammar etc - some of those messages
   just make me wince - but it also tends to include things like adding
   commit one-liner information if only a git commit ID is mentioned etc.

 - and only for areas that I feel competent in, I look at the patches too.

So, to take an example, when Andrew passes on uml patches that only touch
arch/um and include/asm-um, my sign-off does not mean *any* kind of review
at all. It's purely a sign that it's passed the sign-off requirements
properly.

When it comes to VM issues or other things, things are different, and I
actually review the patch (and occasionally send it back with "nope, I'm
not applying this"). But for stuff that comes through Andrew, that's
probably less than a quarter of the patches. And I don't mark the ones
I've reviewed specially in any way.

And I suspect I'm not at all alone in this. People simply have maintainers
they trust (and _need_ to trust in order to not become a bottleneck).

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 19:56:44 UTC
Message-ID: <fa.tF0tt1NIFfPFS+8Nr1rluZvMYeE@ifi.uio.no>

On Tue, 12 Feb 2008, Greg KH wrote:
>
> > That's the point.
>
> Not it isn't.  To quote you a number of years ago:
> 	"Linux is evolution, not intelligent design"

Umm. Have you read a lot of books on evolution?

It doesn't sound like you have.

The fact is, evolution often does odd (and "suboptimal") things exactly
because it does incremental changes that DO NOT BREAK at any point.

The examples are legion. The mammalian eye has the retina "backwards",
with the blind spot appearing because the fundmanetal infrastructure (the
optical nerves) actually being in *front* of the light sensor and needing
a hole in the retina to get the information (and blood flow) to go to the
brain!

In other words, exactly *because* evolution requires "bisectability" (any
non-viable point in between is a dead end by definition) and does things
incrementally, it doesn't do big flips. It fixes the problems on an
incremental scale both when it comes to the details and when it comes to
both "details" (actual protein-coding genes that code directly for some
expression) and "infrastructure" (homeobox and non-coding genes).

So quite frankly, you're the "intelligent designer" here. You're the one
who seems to claim that we need those leaps of faith and wild jumps.

> Oh, it's been painful at times, but they are, overall, very rare.

No, overall, they have *not* been rare lately. We've had them all over.
And not just the one introduced by you.

> If you look at the rate of change we are currently running at, it's
> amazing that we do not get _more_ of these kinds of problems.

I don't think that's a valid argument.

Sure, we have lots of changes, but 99.9% of them have no cross-subsystem
effect what-so-ever.

> > And simply avoiding cross-subsystem API changes unless there is a major
> > *MAJOR* reason for them is the obvious thing to do. Simply face the fact
> > that even in open source there are major reasons to stay with an old
> > interface even if it's not optimal.
>
> I strongly disagree here.  We lived with that kset/ktype crap for years,
> and I finally broke down and cleaned it up, simplifying things, removing
> code, making the kernel smaller, leaner, and easier for others to change
> and use in the future.  With your statement, such a change should have
> never taken place as it what we had at the time was "not optimal", but
> good enough to live with.

You didn't listen at all.

I said that the threshold should be high, not that it should be
impossible. I also said that we should strive for making it unnecessary to
have the painful total synchronization points.

The fact is, we *have* been able to do things like this gradually and
well, without introducing breakage. Take the VM changes, for example:
those were pretty damn fundamental, where we've changed the calling
convention totally for fault handling.

But that thing was done without at any point really seriously breaking
code. It involved adding the new interface, and letting the old one live
in parallel.

The last remnant of the old "nopage()" interface still exists, but I think
right now it's only used by DRM.

Did it require the drivers to be updated? Yes. But it did NOT require the
total synchronization, because it still worked with the old interface.

> But they do happen about once or twice a kernel release, just by virtue
> of the way things need to happen.

And I violently disagree.

It should not be "once of twice a kernel release".

It should be "once or twice a year" that you hit a flag-day issue. The
rest of the time you should be able to do it without breakage. It's
doable. You just HAVEN'T EVEN TRIED, and seem to be actively against even
doing so.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Tue, 12 Feb 2008 20:33:36 UTC
Message-ID: <fa.iLIKP6VrU4asedw/gKFaznDACmY@ifi.uio.no>

On Tue, 12 Feb 2008, Russell King wrote:
>
> 3. rebase the branch on top of the conflicting change, throw out the
>    patches which prove to be a problem and ask the original author of
>    those patches to fix them up for the conflicting change.  The result
>    is a completely bisectable tree.
>
> (3) is the solution which I chose, and it worked _extremely_ well.

Don't get me wrong at all. Rebasing is fine for stuff you have committed
yourself (which I assume was the case here).

Rebasing is also a fine conflict resolution strategy when you try to
basically turn a "big and complex one-time merge conflict" into "multiple
much smaller ones by doing them one commit at a time".

But what rebasing is _not_ is a fine "default strategy", especially if
other people are depending on you.

> (3) is effectively what akpm does with his tree - when a patch conflicts
> with other changes, he throws the changes out and bangs peoples heads
> together to get a new set of patches generated which work together.

Right. And it's one of the fundamental differences between git and a patch
queue.

Patch queues are very flexible, but they simply don't scale. They don't
scale in history (ie you cannot sanely keep track of multiple queues as
they grow in the long run - you need a way to regularly "freeze" things
into a release tar-ball or something like that). But they also don't scale
with number of users - even just read-only ones.

The latter example is something we see in -mm right now. When people
report problems against -mm, it's usually an all-or-nothing thing
(so-and-so mm release doesn't work), but even when people sometimes bisect
to a specific point in mm, it's not "stable" in the sense that that point
may not make any sense in a subsequent -mm queue version.

And all of these issues are not about "-mm" per se, but are about patch
queues in general. And "rebase" turns a git repository effectively to a
patch queue too, with all the same downsides.

And I don't think patch queues are evil: I use git rebase all the time
myself (not for the kernel, but for git, where I'm not the top-level
thing). But I'd not rebase commits from other peoples trees: I rebase
commits that I applied as patches (ie the *author* may be somebody else,
but the committer is me!) or that I committed myself, and that I haven't
pushed out.

Note that difference between "committer" and "author". There's nothing
wrong with rebasing commits that are _authored_ by other people. But there
absolutely is something wrong rebasing commits that are _commmitted_ by
somebody else, unless you two are best buddies and know and trust each
other and are ok with each other messing up the commits (and use some
external method of serialization, so that you don't step on each others
toes).

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 00:50:59 UTC
Message-ID: <fa.JnpcI8eCDHCJ3EdwW0QnDmaYg9M@ifi.uio.no>

On Tue, 12 Feb 2008, Greg KH wrote:
>
> Perhaps you need to switch to using quilt.  This is the main reason why
> I use it.

Btw, on that note: if some quilt user can send an "annotated history file"
of their quilt usage, it's something that git really can do, and I'll see
if I can merge (or rather, coax Junio to merge) the relevant part of stgit
to make it possible to just basically get "quilt behaviour" for the parts
of a git tree that you haven't pushed out yet.

A pure patch-stack will be faster at that thing than git would be (it's
simply easier to just track patches), but on the other hand, using git
would get some other advantages outside of the integration issue (eg the
cherry-pick thing really is a proper three-way merge, not just an "apply
patch", so it can do better).

It wasn't the original goal of git, but not only are really doing all the
series management anyway (that is largely what "rebase" is all about,
after all), but the git goals have obviously expanded over time too.

			Linus

From: Theodore Tso <tytso@MIT.EDU>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 02:18:13 UTC
Message-ID: <fa.Du4ACaRbg9JHkAf8E3ul8VyS1FY@ifi.uio.no>

On Tue, Feb 12, 2008 at 04:49:46PM -0800, Linus Torvalds wrote:
> On Tue, 12 Feb 2008, Greg KH wrote:
> >
> > Perhaps you need to switch to using quilt.  This is the main reason why
> > I use it.
>
> Btw, on that note: if some quilt user can send an "annotated history file"
> of their quilt usage, it's something that git really can do, and I'll see
> if I can merge (or rather, coax Junio to merge) the relevant part of stgit
> to make it possible to just basically get "quilt behaviour" for the parts
> of a git tree that you haven't pushed out yet.

So this is what I do for ext4 development.  We maintain a quilt series
in git, which is located here at: http://repo.or.cz/w/ext4-patch-queue.git

A number of ext4 developers have write access to commit into that
tree, and we coordinate amongst ourselves and on
linux-ext4@vger.kernel.org.  I tend to suck it into git using the
"guilt" package, and do periodic merge testing with a number of git
queues to detect potential merge conflicts.  Not as many as James
does, but I may start doing more of that once I steal his scripts.  :-)

The patch queue also gets automatic testing on a number different
platforms; for that reason the series files comments which version of
the kernel it was last based off of, so the ABAT system can know what
version of the kernel to use as the base of the quilt series.

I do a fair amount of QA, including copy editing and in some cases
rewriting the patch descriptions (which are often pretty vile, due to
a number of the ext4 developers not being native English speakers; not
their fault, but more than once I've had no idea what the patch
description is trying to say until I read through the patch very
closely, which is also good for me to do from a code QA point of view  :-).

Periodically, the patch queue gets pushed into the ext4.git tree and
as a patch series on ftp.kernel.org.

I've never been very happy with stgit because of past experiences
which has scarred me when it got get confused and lost my entire patch
series (this was before git reflogs, so recovery was.... interesting).
There's always been something deeply comforting about having the ASCII
patch series since it's easy to back it up and know you're in no
danger of losing everything in case of a bug.  Also, having the patch
series stored in ASCII as a quilt stack means that we can store the
quilt stack itself in git, and with repo.or.cz it allows us to have
multiple write access to the shared quilt stack, while still giving us
the off-line access advantages of git.  (Yes, I've spent plane rides
rewriting patch descriptions.  :-)

The other advantage of storing the patch stack as a an ASCII quilt
series is we have a history of changes of the patches, which we don't
necessarily have if you just use stgit to rewrite the patch.  So we
have the best of both worlds; what gets checked into Linus's tree is a
clean patch series, but we keep some history of different versions of
a patch over time in the ext4-patch-queue git repository.  (I wish we
had better changelog comments there too, but I'll take what I can
get.)

						- Ted

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 00:45:49 UTC
Message-ID: <fa.glojDm0g6N8ZuMLqSeaPpM0ue1Y@ifi.uio.no>

On Tue, 12 Feb 2008, David Miller wrote:
>
> At 1500 changesets, a merge conflict shows up about once
> every day or two as 2.6.N nears it's release into final
> as bug fixes trickle in.
>
> I find using GIT to fixup merge errors on a tree of that
> scale to be really painful.  And it only fixes up the final
> result in a merge changeset.

Heh. I've had the reverse situation: "git rebase" often results in *more*
conflicts than "git merge" (ie "pull").

But one issue is also that when conflicts happen, different people are
used to different things. I'm particularly used to merge-type conflicts,
and in fact there's some fairly advanced support in git for helping
resolve them that people who *don't* do merge-level conflict resolution
may not even be aware of.

In particular, if you want to try it, do something that conflicts and then
do

	gitk --merge

to see what the conflict is all about. That is just fancy shorthand for

	gitk HEAD...MERGE_HEAD -- <list of conflicting files>

so what it does is to show only the relevant history (the three dots means
that it's a _symmetric_ set difference from HEAD to MERGE_HEAD) for the
merge, and only for the particular files that had conflicts!

This often means that even when you merge a thousand commits (or the thing
you merge *into* has thousands of commits since the merge base, which is
the common case for me), you actually only see a couple of commits - only
the ones that actually modified the conflicting files!

(If you have many files that conflict, you can further narrow it down to
just one at a time by explicitly listing the file/directory you want to
work on, ie do "gitk --merge <pathname-here>").

> Let me give you a good example, just yesterday I had to rebase
> my net-2.6 tree a few times.  It's because I put a patch in there
> that the person said was OK but it broke the build.  There is
> zero reason for me to push that tree to Linus with the bad
> commit and the revert, it's just noise and makes the tree harder
> for people to review.

Actually, better than rebase in that situation is to just remove the bad
commit. Yes, you'd use "rebase" for it, but you'd use it not to move the
whole series to a newer place, you'd use it just to rebase the commits
*after* the commit you remove.

This is something where I actually think git could and should do better:
git has the capability to act as more of a "quilt replacement", but
because it wasn't part of the original design, we never actually exposed
the simple queue management commands to do this (stgit does things like
that, though).

So if you haven't pushed out, right now you'd have to do this stupid
thing:

	[ start (and activate) a 'fixup' branch at X ]
	git checkout -b fixup X

	[ edit edit edit to fix it up ]
	..

	[ commit the fixed state ]
	git commit --amend

	[ go back to the old broken state ]
	git checkout master

	[ now, rebase 'master' on top of the fix ]
	git rebase fixup

	[ ok, done, forget the fixup branch ]
	git branch -d fixup

and I don't discourage this kind of behaviour at all, but it is only good
iff:

 - you have not pushed things out (obviously), so nobody ever even notices
   that you've fixed up stuff

 - you haven't pulled anything from outside (so you aren't trying to
   rebase other peoples commits).

   If you *really* want to try to do this even across merges you've done,
   there is fancy a "--preserve-merges" thing that you can try to use
   (needs "-i" to work, but "-i" is often cool for other reasons too!)

Basically, what I'm trying to say is that "git rebase" can be used in
fancy ways to do things that people outside your repository will never
even *know* were done. It's only when outsiders can see the effects of git
rebase that you're in trouble!

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 01:42:16 UTC
Message-ID: <fa.C5IXM8SF+98RrpTneVRavo/uFYQ@ifi.uio.no>

On Tue, 12 Feb 2008, David Miller wrote:
>
> But as soon as I've applied any patches to my tree I've "pushed out".
> So this scheme doesn't work for me.  The first thing I do when I have
> changes to apply is clone a tree locally and on master.kernel.org,
> then I apply that first patch locally and push it out to master.

I actually suggest you literally delay your push-out.

I don't generally delay things by a lot, but I tend to try to at least do
a compile in between pushing out - and even if I've pulled something in
between the thing that broke, I'll just "git reset --hard" to a working
state if something broke, and just re-pull instead of even trying to
rebase or anything like that.

(IOW, I often find it much easier to just start over and re-do than
actually doing a rebase).

I don't do it all the time, by any means, but there's really no huge
reason to push out all the time. And that's doubly true for subsystem
maintainers. Quite often, the right thing to do is to only push out when
you are ready to do the "please pull" message.

> What would be really cool is if you could do the rebase thing, push
> that to a remote tree you were already pushing into and others could
> pull from that and all the right things happen.

It would also be really cool if Claudia Schiffer had decided that hiding
under my desk is a good idea.

IOW, you really haven't though that through. That is how TLA and darcs
worked, and it's a total disaster.

Trust me, you don't know how good you have it.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 02:36:09 UTC
Message-ID: <fa.xfKJNMzGfeSGbxC5KNaMjgCqyqs@ifi.uio.no>

On Tue, 12 Feb 2008, James Bottomley wrote:
>
> Yes, this is exactly the feature I'm looking for.  It would allow the
> downstream users of a rebased tree to rebase themselves correctly.
>
> All the information about the rebase is in the reflog ... it can't be
> too difficult to pass it through on a pull and allow the downstream tree
> to do the right thing.

Guys, you simply have no idea what you're talking about.

Those downstream trees can have done things themselves. They *depended* on
the state you gave them.

You can't just say "oops, I lied, this is the state you should have used,
now it's _your_ mess to sort out".

OF COURSE it's what you'd like to use - it absolves you of any and all
actual responsibility. But dammit, that's not what anybody else wants than
the irresponsible person who cannot be bothered to stand up for his work!

If you're not confident enough about your work, don't push it out! It's
that simple. Pushing out to a public branch is a small "release".

Have the f*cking back-bone to be able to stand behind what you did!

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 03:33:04 UTC
Message-ID: <fa.4zru8Jba+djr4fRMQAxhaCq4tZk@ifi.uio.no>

On Tue, 12 Feb 2008, James Bottomley wrote:
>
> Right at the moment, I maintain a <branch> and a <branch>-base and
> simply cherry pick the commits between the two to do the right thing
> when I know my volatile base has changed.  It would be very helpful to
> have a version of rebase that new my base had been rebased.

Hey, I know, you could use.. drumroll..

	"git rebase"

I know that's a big leap of faith, to use git rebase for rebasing, but
there you have it. Us git people are kind of odd that way.

IOW, if you know the old broken base, and the new base, just do

	git rebase --onto newbase oldbase

and it should do exactly that (basically lots of automated cherry-picks).

[ But the fact is, if you did anything fancy (like pulled in other peoples
  work), you cannot sanely rebase _those_ peoples work. They didn't screw
  up to begin with! You can play with "git rebase -i --preserve-merges",
  of course, but I really think you're doing something wrong if you start
  pulling other peoples work into an unstable thing, so while it may work,
  I'd strongly suggest against even trying, because the problem is your
  workflow ]

So let's say that you have a remote branch that you track that goes
rebasing (let's call it "origin/pu" to match the real-life git behaviour),
then you should literally be able to do

	old=$(git rev-parse origin/pu)	&&
	git fetch			&&
	new=$(git rev-parse origin/pu)	&&
	git rebase --onto $new $old

and no, I didn't actually test it, but hey, it really should be that
simple.

[ And no, you don't really need to do those "old=" and "new=" things, they
  are there to make it explicit - you could easily just have done

	git fetch
	.. oh, noticed that origin/pu changed ..
	git rebase --onto origin/pu origin/pu@{1}

  where we just let git take care of the old/new itself using the reflog,
  so that "origin/pu@{1}" assumes that you just know that the only thing
  that has changed origin/pu was that previous "git fetch", and that
  really *did* change it. ]

In other words, git does give you exactly what you want, but nothing
really changes the fact that you should only rebase like this only if:

 - you haven't already exported the result (or only exported it as those
   unstables branches that people know to avoid)

 - your changes on top are just your own linear series of commits (where
   "applying a patch from somebody else" is still _your_ commit, of
   course, just with authorship attributed to somebody else)

so that part really is very fundamental.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 03:49:37 UTC
Message-ID: <fa.aaIfpaB8voNbi03ZhZ/TP1iFUZE@ifi.uio.no>

On Tue, 12 Feb 2008, Linus Torvalds wrote:
>
> 	git rebase --onto $new $old

..and in case it wasn't clear - this is just a general way of saying "move
the commits on this branch since $old to be based on top of $new" instead.

You can pick out those old/new commit ID's using gitk or whatever if you
wish. Neither the $new or the $old needs to even be an existing branch -
just pick them with gitk.

So if you literally want to just move the top 5 commits (assuming those
top five commits are just a nice linear thing you did) from the current
branch to be on top on another branch instead, you can literally do this:

	# save this state, maybe we want to keep it around. Call it "old"
	git branch old-branch

	# rebase the top five commits onto $target
	git rebase --onto $target HEAD~5

ta-daa - all done. The branch you are on will now have been rewritten to
be the top five commits moved to be on top of the $target you chose, and
if you want to get back the old state, it's nicely squirrelled away in
"old-branch".

(That obviously assumes no merge conflicts - you'll have to resolve those
yourself ;)

Of course, if you didn't even want to save the old branch, just skip the
first step. If you have reflogs enabled (and git does that by default in
any half-way recent version), you can always find it again, even without
having to do "git fsck --lost-found", at least as long as you don't delete
that branch, and it hasn't gotten pruned away (kept around for the next 90
days by default, iirc)

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream :-))
Date: Wed, 13 Feb 2008 18:11:39 UTC
Message-ID: <fa.KAZxUw7n3stMebUsXAodbHAr08Y@ifi.uio.no>

On Wed, 13 Feb 2008, Roel Kluin wrote:
>
> In nature there is a lot of duplication: several copies of genes can exist
> and different copies may have a distinct evolution.

This is true of very complex animals, but much less so when looking at
things like bacteria (and arguably, any current sw project is closer to
bacteria in complexity than anything mammalian).

In bacteria (and viruses), duplication of DNA/RNA is a big cost of living
in general, and as a result there is *much* less junk DNA. So in an
evolutionary sense, it's much closer to what the kernel should have (with
occasional duplication of code and interfaces to allow new functionality,
but rather aggressive pruning of the excess baggage).

In other words, all of these choices are a matter of "balance". In some
areas, excess code is not a sufficient downside, and we keep even broken
source code around with no actual function, "just because" (or rather,
because the cost of carrying it around is so small that nobody cares).

That's true in the kernel as in biology: check out not just deprecated
code, but the drivers and other odds-and-ends that are explicitly marked
as non-coding DNA (we just happen to call them BROKEN in our Kconfig).

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Announce: Linux-next (Or Andrew's dream  :-))
Date: Thu, 14 Feb 2008 18:03:38 UTC
Message-ID: <fa.ISvXCDhUMDnzFEciw6xkeYFg/qA@ifi.uio.no>

On Thu, 14 Feb 2008, Stephen Rothwell wrote:
>
> Originally, I assumed the stable branch would be for our "usual" API
> changes, but it appears we are not having any more of those. :-)

It's not that we should _never_ have them, it's that they shouldn't be
"business as usual".

I'm happy with them being a "a couple of times a year". I'm not happy with
them being "once or twice for every release cycle". That's the big deal
for me.

If we have a big flag-day that affects a lot of drivers (or architectures)
once or twice a year, I think everybody involved will be happy to stand up
and say "ok, that fixes problem X, and the new thing really is better, so
let's do it, it's worth it".

But if it's something that happens essentially every single release, that
is something else altogether. Then it's not a "ok, let's bite the bullet
and make the kernel better" thing any more, but instead it devolves into
"f*ck, the merge window is open again, now I have to fix up all the crap
people pushed on me".

See? That's a *huge* difference, even if it is "only" a mental one (and
clearly it isn't - there's the actual real work of the "I have to fix
things up" part too).

So to recap: I have absolutely nothing against fixing up bad internal
API's and breaking things. But 99% of the time that should be something we
can do incrementally (ie introduce the new API, and simply accept the
fact that removing the old API will take a few months). And the case when
that _really_ doesn't work should be rare enough that it doesn't wear
people down.

Because if you listen to the tone of people in this discussion, much of it
is about people being _tired_ of having to fix things up. It's not exactly
been a "wow, the end result sure was nice!" kind of discussion, is it?

And this is where "process" really matters. Making sure people don't get
too frustrated about the constant grind.

>							  However,
> I see an argument for attempting to stabilise possible conflicting
> changes get Linus' review/ack and add them to the stable branch.
>
> Linus suggested that such changes should go into an independent tree that
> everyone could pull into their trees with the full confidence that that
> tree would be merged into Linus' tree when the merge window opens.  I am
> suggesting that that tree be the stable branch of linux-next.

I absolutely have no problem with having a "this is the infrastrcture
changes that will go into the next release". In fact, I can even
*maintain* such a branch.

I've not wanted to open up a second branch for "this is for next release",
because quite frankly, one of the other problems we have is that people
already spend way too much time on the next release compared to just
looking at regressions in the current one. But especially if we're talking
about _purely_ API changes etc infrastructure, I could certainly do a
"next" branch.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: If you want me to quit I will quit
Date: Sat, 26 Apr 2008 17:32:15 UTC
Message-ID: <fa.B1cmxnutVGCK9gkHdM8wm22PZms@ifi.uio.no>

On Sat, 26 Apr 2008, Adrian Bunk wrote:
> On Sat, Apr 26, 2008 at 08:44:20AM -0700, Andrew Morton wrote:
> >...
> > git-tree owners might need, umm, some encouragement here.  It's much easier
> > for them to slap the oh-let's-fix-that-up commit at the tail of their
> > queue, which leaves us with the straggly commit record.
>
> As far as I understand Linus on these matters people David Miller
> mustn't edit older commits in their trees once their tree got pushed
> out.

I wouldn't say "mustn't", because the _one_ thing I hate is totally rigid
rules.

What I do try to encourage is for people to think publicising their git
trees as "version announcements". They're obviously _development_
versions, but they're still real versions, and before you publicize them
you should try to make sure that they make sense and are something you can
stand behind.

And once you've publicized them, you don't know who has that tree, so just
from a sanity and debugging standpoint, you should try to avoid mucking
with already-public versions. If you made a mistake, add a patch on top to
fix it (and announce the new state), but generally try to not "hide" the
fact that the state has changed.

But it's not a hard rule. Sometimes simple cleanliness means that you can
decide to go "oops, that was *really* wrong, let's just throw that away
and do a whole new set of patches". But it should be something rare - not
normal coding practice.

Because if it becomes normal coding practice, now people cannot work with
you sanely any more (ie some random person pulls your tree for testing,
and then I pull it at some other time, and the tester reports a problem,
but now the commits he is talking about don't actually even exist in my
tree any more, and it's all really messy!).

The x86 tree still does this. I absolutely detest it. Ingo claims that his
model is better, and I'm pretty damn sure he's wrong. But until it starts
causing bigger problems, I'll give him the benefit of the doubt.

				Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: If you want me to quit I will quit
Date: Sat, 26 Apr 2008 18:27:22 UTC
Message-ID: <fa.5c2Up/GgUrVId2bz49hk25iR0aY@ifi.uio.no>

On Sat, 26 Apr 2008, Sam Ravnborg wrote:
>
> It also depends on whare you are located in the dependency tree.

Absolutely.

> Being kbuidl maintainer I have very few people that actually pull me git
> tree (except from -mm and -next). So I rebase at will and have so far
> not got a single complaint from anyone pulling my tree.

I agree. Some trees are so specific (and/or simply don't have enough
patches in them) that it simply doesn't matter if two different people
pull the same tree. Even if it might end up causing some duplication of
commits (because the pulled tree might end up being then pulled further),
it's not a big deal if it's rare.

In fact, we have always had duplicated commits even when they are passed
around as email - just because perhaps two different trees simply needed
the same fix, and rather than wait for it, they both integrated it (and
then when they get merged, the same patch exists twice in the history,
just with different committer info etc).

So yeah, rebasing ends up being really convenient if you really don't
expect to have any other "real" end users than eventually being pulled
into my tree (or, even more commonly, and when rebasing is *really*
convenient: when it's just you keeping track of your own private patches
in your own private tree and don't know if they will *ever* go upstream at
all).

> But people like Davem and Ingo sits much higher in the dependency chain
> and thus they have a very different set of users and thus a different
> set of problems to take into account.

Yes. David has changed his workflow to accomodate others, while Ingo still
does the rebasing (and it works out because nobody else works on his trees
using git).

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: If you want me to quit I will quit
Date: Sat, 26 Apr 2008 19:05:54 UTC
Message-ID: <fa.zmIVeTMGg7+FOE059EwbUTLVHPY@ifi.uio.no>

On Sat, 26 Apr 2008, Stefan Richter wrote:
>
> Well, the need to amend single patches --- and folding the amendment in before
> mainline submission to correct important problems of the first shot --- is
> something which happens all the time.

.. and you simply SHOULD NOT PUBLICIZE the tree before it has gotten to a
reasonable point.

Keep the rough-and-not-ready thing that is being discussed as patches on
lkml as your own working tree, and just don't expose it as a public git
branch. You can't do any sane discussion over git anyway - if things are
being actively worked-on among people, you'd be passing patches around as
emails etc.

Yes, people may be (and I would strongly suggest _should_ be) using
something like git or quilt etc to keep track of the patches that they
(and others) have been discussing over email, but that has absolutely
nothing to do with making a public git tree available to others.

So:
 - making things public is *different* from developing them. Don't push
   out just because you committed something!

 - you shouldn't publicize a tree until it's in reasonable shape. EVER.
   Even -mm or -next is *not* better off with a pile of sh*t just because
   you're working in that area.

   I cannot stress this enough. I think Andrew has been way too polite to
   some people.

 - and once it is, you generally shouldn't mess with old commits even when
   you fix things. Full cleanliness or always being able to bisect
   specific configurations is not an excuse for messing up all the other
   things, and if this problem happens a lot, I would like to point you to
   the two previous points.

Really.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: If you want me to quit I will quit
Date: Sat, 26 Apr 2008 19:19:24 UTC
Message-ID: <fa.F53BEVbv7ZBd1EhtXG/UV/3YFNM@ifi.uio.no>

On Sat, 26 Apr 2008, Linus Torvalds wrote:
>
> So:
>  - making things public is *different* from developing them. Don't push
>    out just because you committed something!
>
>  - you shouldn't publicize a tree until it's in reasonable shape. EVER.
>    Even -mm or -next is *not* better off with a pile of sh*t just because
>    you're working in that area.
>
>    I cannot stress this enough. I think Andrew has been way too polite to
>    some people.
>
>  - and once it is, you generally shouldn't mess with old commits even when
>    you fix things. Full cleanliness or always being able to bisect
>    specific configurations is not an excuse for messing up all the other
>    things, and if this problem happens a lot, I would like to point you to
>    the two previous points.

And btw, a *big* part of the above is also:

 - mistakes happen.

There will be bugs. There will be cases where things aren't bisectable
(although they should generally be bisectable for *your* configuration,
because if they aren't, that shows that you didn't even compile the
commits you made).

And there will be kernels that don't boot. Even expecting people to always
boot-test every single commit would be unrealistic - let's face it, most
things look really obvious, and the fact that even obvious fixes can have
bugs doesn't mean that there should be hard rules about "every single
commit has to be boot-tested on X machines".

So it's an important part of the process to try to do a good job, and not
publicizing crap - but it's *equally* important to realize that crap
happens, and that it's easily *more* distracting to try to clean it up
after-the-fact than it is to just admit that it happened.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: If you want me to quit I will quit
Date: Sat, 26 Apr 2008 20:36:00 UTC
Message-ID: <fa.VBYD0X/UVUao53e72V/YtX1e5fg@ifi.uio.no>

On Sat, 26 Apr 2008, Andrew Morton wrote:

> On Sat, 26 Apr 2008 12:18:34 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > So it's an important part of the process to try to do a good job, and not
> > publicizing crap - but it's *equally* important to realize that crap
> > happens, and that it's easily *more* distracting to try to clean it up
> > after-the-fact than it is to just admit that it happened.
>
> Fact is, this is the way in which developers want to work.  That is their
> workflow, and their tools should follow their workflow.  If a tool's
> behaviour prevents them from implementing their desired workflow, it isn't
> the workflow which should be changed ;)

But that was exactly my point. Bugs *will* happen. Follow-up patches
*will* happen. Don't fight it. Do the best you can do - there's no way
people will ever avoid all bugs to begin with.

And trying to white-wash things later is just pointless and actively
*bad*, when others have already seen and merged the original patches.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT pull] x86 fixes for 2.6.26
Date: Sat, 17 May 2008 03:21:03 UTC
Message-ID: <fa.hKU0C1F2vkEXty2ncbjk+tLz7jc@ifi.uio.no>

On Fri, 16 May 2008, Theodore Tso wrote:
>
> Why do you consider rebasing topic branches a bad thing?

Rebasing branches is absolutely not a bad thing for individual developers.

But it *is* a bad thing for a subsystem maintainer.

So I would heartily recommend that if you're a "random developer" and
you're never going to have anybody really pull from you and you
*definitely* don't want to pull from other peoples (except the ones that
you consider to be "strictly upstream" from you!), then you should often
plan on keeping your own set of patches as a nice linear regression.

And the best way to do that is very much by rebasing them.

That is, for example, what I do myself with all my git patches, since in
git I'm not the maintainer, but instead send out my changes as emails to
the git mailing list and to Junio.

So for that end-point-developer situation "git rebase" is absolutely the
right thing to do. You can keep your patches nicely up-to-date and always
at the top of your history, and basically use git as an efficient
patch-queue manager that remembers *your* patches, while at the same time
making it possible to efficiently synchronize with a distributed up-stream
maintainer.

So doing "git fetch + git rebase" is *wonderful* if all you keep track of
is your own patches, and nobody else ever cares until they get merged into
somebody elses tree (and quite often, sending the patches by email is a
common situation for this kind of workflow, rather than actually doing git
merges at all!)

So I think 'git rebase' has been a great tool, and is absolutely worth
knowing and using.

*BUT*. And this is a pretty big 'but'.

BUT if you're a subsystem maintainer, and other people are supposed to be
able to pull from you, and you're supposed to merge other peoples work,
then rebasing is a *horrible* workflow.

Why?

It's horrible for multiple reasons. The primary one being because nobody
else can depend on your work any more. It can change at any point in time,
so nobody but a temporary tree (like your "linux-next release of the day"
or "-mm of the week" thing) can really pull from you sanely. Because each
time you do a rebase, you'll pull the rug from under them, and they have
to re-do everything they did last time they tried to track your work.

But there's a secondary reason, which is more indirect, but despite that
perhaps even more important, at least in the long run.

If you are a top-level maintainer or an active subsystem, like Ingo or
Thomas are, you are a pretty central person. That means that you'd better
be working on the *assumption* that you personally aren't actually going
to do most of the actual coding (at least not in the long run), but that
your work is to try to vet and merge other peoples patches rather than
primarily to write them yourself.

And that in turn means that you're basically where I am, and where I was
before BK, and that should tell you something. I think a lot of people
are a lot happier with how I can take their work these days than they
were six+ years ago.

So you can either try to drink from the firehose and inevitably be bitched
about because you're holding something up or not giving something the
attention it deserves, or you can try to make sure that you can let others
help you. And you'd better select the "let other people help you", because
otherwise you _will_ burn out. It's not a matter of "if", but of "when".

Now, this isn't a big issue for some subsystems. If you're working in a
pretty isolated area, and you get perhaps one or two patches on average
per day, you can happily basically work like a patch-queue, and then other
peoples patches aren't actually all that different from your own patches,
and you can basically just rebase and work everything by emailing patches
around. Big deal.

But for something like the whole x86 architecture, that's not what te
situation is. The x86 merge isn't "one or two patches per day". It easily
gets a thousand commits or more per release. That's a LOT. It's not quite
as much as the networking layer (counting drivers and general networking
combined), but it's in that kind of ballpark.

And when you're in that kind of ballpark, you should at least think of
yourself as being where I was six+ years ago before BK. You should really
seriously try to make sure that you are *not* the single point of failure,
and you should plan on doing git merges.

And that absolutely *requires* that you not rebase. If you rebase, the
people down-stream from you cannot effectively work with your git tree
directly, and you cannot merge their work and then rebase without SCREWING
UP their work.

And I realize that the x86 tree doesn't do git merges from other
sub-maintainers of x86 stuff, and I think that's a problem waiting to
happen. It's not a problem as long as Ingo and Thomas are on the net every
single day, 12 hours a day, and respond to everything. But speaking from
experience, you can try to do that for a decade, but it won't really work.

I've talked to Ingo about this a bit, and I'm personally fairly convinced
that part of the friction with Ingo has been that micro-management on a
per-patch level. I should know. I used to do it myself. And I still do it,
but now I do it only for really "core" stuff. So now I get involved in
stuff like really core VM locking, or the whole BKL thing, but on the
whole I try to be the anti-thesis of a micro-manager, and just pull from
the submaintainers.

It's easier for me, but more importantly, it's actually easier for
everybody *else*, as long as we can get the right flow working.

Which is why I still spend time on git, but even more so, why I also try
to spend a fair amount of time on explaining flow issues like this.
Because I want to try to get people on the same page when it comes to how
patches flow - because that makes it easier for *everybody* in the end.

[ IOW, from my personal perspective, in the short run the easiest thing to
  do is always "just pull".

  But in the long run, I want to know I can pull in the future too, and
  part of that means that I try to explain what I expect from downstream,
  but part of that also means that I try to push down-stream developers
  into directions where I think they'll be more productive and less
  stressed out so that they'll hopefully *be* there in the long run.

  And I think both Ingo and Thomas would be more produtive and less
  stressed out if they could actually pull from some submaintainers of
  their own, and try to "spread the load" a bit. It involves them finding
  the right people they can trust, but it also involves them having a
  workflow in place that _allows_ those kinds of people to then work with
  them! ]

> Is there a write up of what you consider the "proper" git workflow?

See above. It really depends on where in the work-flow you are.

And it very much does depend on just how big the flow of patches is. For
example, during 2.6.24..26, net/ and drivers/net had ~2500 commits.
arch/x86 and include/asm-x86 had ~1300 commits. Those are both big
numbers. We're talking a constant stream of work.

But Ted, when you look at fs/ext4, you had what, 67 commits in the
2.6.24..25 window? That's a whole different ballgame. If you have 67
commits in a release window of two months, we're talking roughly one a
day, and you probably didn't have a single real conflict with anybody else
during that whole release window, did you?

In *that* situation, you don't need to try to stream-line the merging. You
are better off thinking of them as individual patches, and passing them
around as emails on the ext4 mailing lists. People won't burn out from
handling an average of one patch a day, even for long long times. Agreed?

Realistically, not many subsystems really need to try to find
sub-sub-maintainers. Of the architectures, x86 is the most active one
*by*far*. That said, I think PowerPC actually has a chain of maintenance
that is better structured, in that there is more of a network of people
who have their own areas and they pull from each other. And POWERPC only
has about half the number commits that x86 has. I bet that lower number of
commits, coupled with the more spread out maintenance situation makes it
*much* more relaxed for everybody.

Networking, as mentioned, is about twice the number of patches (in
aggregate) from x86, but the network layer too has a multi-layer
maintenance setup, so I suspect that it's actually more relaxed about that
*bigger* flow of commits than arch/x86 is. Of course, that's fairly
recent: David had to change how he works, exactly so that the people who
work with him don't have to jump through hoops in order to synchronize
with his tree.

In other words, I very heavily would suggest that subsystem maintainers -
at least of the bigger subsystems, really see themselves as being in the
same situation I am: rather than doing the work, trying to make it easy
for *others* to do the work, and then just pulling the result.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT pull] x86 fixes for 2.6.26
Date: Sat, 17 May 2008 17:05:59 UTC
Message-ID: <fa.TMeLysTesZjBpEYi3yqe/DRhjOY@ifi.uio.no>

On Sat, 17 May 2008, Theodore Tso wrote:
>
> Right, but so long as a subsystem maintainer doesn't publish his/her
> topic branches, and only sends out patches on their topic branches for
> discussion via e-mail, they're fine, right?

Yes. But then you really cannot work with other people with git.

That's what i was saying - you can use "git rebase" as long as you're a
"leaf developer" from a git standpoint, and everything you do is just
emailing patches around.

And quite frankly, if the x86 maintainer is a "leaf developer", we are
going to be in trouble in the long run. Unless some other architecture
comes out an takes away all the users and developers (which obviously
isn't going to happen).

> Basically, this would be the subsystem maintainer sometimes wearing an
> "end-point-developer" hat, and sometimes wearing a "subsystem
> maintainer" hat.  So rebasing is fine as long as it's clear that it's
> happening on branches which are not meant as a base for
> submaintainers.

It's not about "not meant as a base". It's about "cannot *possibly* be a
base". And the difference is that while *you* may not want others to base
their work off it, are you sure others agree?

And realize that while "git rebase" may be making things easier for the
person that does the rebase, what it ends up doing for *others* is to take
away options from them, and making for more work for them.

Again, if there are not enough others to matter, then you _should_ make
the workflow be around your own personal sandbox. So 'git rebase' makes
sense then.

Basically, it boils down to whether you're a technical manager or a grunt.

A grunt should use 'git rebase' to keep his own work in line. A technical
manager, while he hopefully does some useful work on his own, should
strive to make _others_ do as much work as possible, and then 'git rebase'
is the wrong thing, because it will always make it harder for the people
around you to track your tree and to help you update your tree.

And it's absolutely true that Ingo has been a 'grunt' in many ways. Not
only does everybody start out that way, but if you ask the question "who
does the actual work" (as a way to find out who is not a manager, because
managers by definition are useless bloodsucking parasites), then Ingo
clearly doesn't look very managerial.

But I definitely think we want Ingo and Thomas to be managers, not grunts.

Yes, both Ingo and Thomas are the top committers when looked at
individually. Here's the top five committers since 2.6.24 in arch/x86:

	Ingo Molnar (194):
	Thomas Gleixner (125):
	Glauber de Oliveira Costa (117):
	Roland McGrath (103):
	Glauber Costa (92):
	...

and in that sense they look very much non-managerial. But those ~200
commits are still just two hundred out of 1900 commits total! We *need*
managers, not just grunts. And I can well imagine how stressful it is to
not just do the two hundred commits, but also try to orchestrate the other
~1700 ones.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please pull ACPI updates
Date: Thu, 17 Jul 2008 19:13:58 UTC
Message-ID: <fa.449AXtoqVZe1f7yoFxihbMtiEQE@ifi.uio.no>

On Thu, 17 Jul 2008, Len Brown wrote:
>
> ps.
>
> One thing I wish I had in git is a way to make this sequence easier...

Actually, it _is_ easier, although apparently the thing that made it
easier isn't necessarily widely known or documented.

I'm going to quote your whole sequence, because it's not a really odd
thing do want to do, and quoting how you do it now also makes the "proper
git way" actually much more understandable:

> Say I have a big topic branch with 30 patches in it.
> The 3rd patch turns out to have a bug in it, but the
> rest of the series is okay.  Today I invoke gitk on
> the branch and keep that open.
>
> Then I create a new topic branch at the broken patch.
>
> I always consult ~/src/git/Documentation/git-reset.txt
> so I can remember the following sequence...
>
> $ git reset --soft HEAD^
> $ edit
> $ git commit -a -c ORIG_HEAD
>
> Now I've got the fixed 3rd patch checked in,
> but 27 patches in the original branch are hanging
> off the original broken 3rd patch.
> So I git-cherry-pick 27 patches
> I hope I get them in the right order and don't miss any...
>
> It would be nice if we could somehow git rebase those
> 27 patches in a single command, but if we do,
> that pulls with it the broken 3rd patch.
>
> come to think of it, I can probably export 4..27 as
> an mbox and then import it on top of the new 3,
> maybe that is what others do.

No, what others do (if they know the tricks) is to say something like:

	git rebase -i <starting-point>

("-i" is short for "--interactive") where the starting point is just where
you want to start your work. It might be "origin", or it might be
"HEAD~30" to say "30 commits ago" or whatever. Anyway, it's basically the
stable point before the place you want to fix up.

That thing will literally show you the list of commits in an editor, and
then you can re-order them or mark them for special actions.

The normal action is to "pick" the commit (ie you just cherry-pick it).
But rather than just picking a commit (remember - you can change the order
by just editing the list), you can also do

 - "edit": this just basically pauses the rebase after committing the
    cherry-pick, so that you can then edit things and fix them with "git
    commit --amend", and when you're happy, you do "git rebase --continue"
    to continue your series.

 - "squash": this squashes the commit in togethr with the previous one,
   and is very useful together with moving commits around. IOW, maybe you
   committed a fix to a previous commit, and want to integrate the fix
   into the original commit - in that case you'd move the commit in the
   list up to just after the commit you want to fix, and change the "pick"
   into a "squash"

so it actually makes doing what you do above by hand much easier.

[ Honesty in advertising: I actually don't use this at all. I've tested
  it, and I think it's very useful, but I have so far mostly ended up
  doing this by hand in the very few cases I do it at all. Part of the
  reason is that "git rebase -i" is fairly recent, so it's not part of my
  normal tool set.

  But the bigger reason is that obviously all the commits I tend to do are
  just merges, and I don't maintain "patch series" in my trees except
  sometimes for git itself ]

Git rebase can also try to rebase merges (the "-p" flag - short form of
"--preserve-merges"), so this _does_ work with a non-linear history too to
some degree, but quite frankly, I would seriously suggest people try to
avoid getting quite that funky with it. It's useful for emergencies, but
you'd better know what you are doing, and you should look at the
before-and-after state very carefully.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please pull ACPI updates
Date: Thu, 17 Jul 2008 19:17:19 UTC
Message-ID: <fa.zQ9T8aA6tdEe096hNtT66LPNXRw@ifi.uio.no>

On Thu, 17 Jul 2008, Linus Torvalds wrote:
>
> No, what others do (if they know the tricks) is to say something like:
>
> 	git rebase -i <starting-point>

Oh, and before anybody goes any further with this: it's a very convenient
feature, but it _is_ still technically nothing but a very nice interface
to cherry-picking and rebasing history.

So all the caveats about _not_ doing this with public history that others
have already seen and possibly merged are still in place. That part
doesn't change at all.

So it's great to do with private changes to clean them up before
publicizing them, or possibly with branches that you have explicitly told
people are _not_ stable, but it is still very much about creating a
totally new history.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please pull ACPI updates
Date: Thu, 17 Jul 2008 15:19:20 UTC
Message-ID: <fa.VoAdMSD6KMajuc+SsOOPmPIH714@ifi.uio.no>

On Thu, 17 Jul 2008, Andi Kleen wrote:
>
> Hmm, but if you're dependent ACPI needs to go in first anyways, doesn't it?

Umm. The particular PART of ACPI you depend on needs to go in first, yes.

That's the whole point of topic branches. They allow you to separate out
work to different areas, so that people who are interested in (say) the
PCI-impacting ones can merge with one part, without having to wait for the
other parts to stabilize.

> I don't think the ACPI tree is dependent on PCI at least. Or at least I didn't
> notice any problems in this area.

The PCI tree merged the suspend branch from the ACPI tree. You can see it
by looking at the PCI merge in gitk:

	gitk dc7c65db^..dc7c65db

and roughly in the middle there you'll find Jesse's commit 53eb2fbe, in
which he merges branch 'suspend' from Len's ACPI tree.

So Jesse got these three commits:

	0e6859d... ACPI PM: Remove obsolete Toshiba workaround
	8d2bdf4... PCI ACPI: Drop the second argument of platform_pci_choose_state
	0616678... ACPI PM: acpi_pm_device_sleep_state() cleanup

from Len's tree. Then look at these three commits that I got when I
actually merged from you:

	741438b... ACPI PM: Remove obsolete Toshiba workaround
	a80a6da... PCI ACPI: Drop the second argument of platform_pci_choose_state
	2fe2de5... ACPI PM: acpi_pm_device_sleep_state() cleanup

Look familiar? It's the same patches - just different commit ID's. You
rebased and moved them around, so they're not really the "same" at all,
and they don't show the shared history any more, and the fact that they
were pulled earlier into the PCI tree (and then into mine).

This is what rebasing causes.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please pull ACPI updates
Date: Thu, 17 Jul 2008 20:01:45 UTC
Message-ID: <fa.wRoYUgxZz2K3yfYYFj4+PyqE85k@ifi.uio.no>

On Thu, 17 Jul 2008, Andi Kleen wrote:
>
> The whole point of the exercise of cleaning up/rewriting the history is to make
> the tree as bisectable as possible.

No.

"git bisect" is perfectly able to handle merges. They are _fine_.

The problem with rebasing is that it *changes* something that was already
tested (and possibly merged into somebody elses tree) into SOMETHING ELSE.

And that means that a large portion of the previous testing is basically
thrown away.

In particular, if something worked for somebody before, it also removes
the "known good state" from a bisection standpoint, so rebasing actually
makes things _harder_ to bisect - because now you cannot sanely bisect
between two versions of the tree (when you mark the old tree "good", it
has no relevance to the new tree that had all the old history rewritten).

So no, rebasing does _not_ make bisection easier. It makes it easier to
understand, perhaps, but it actually makes many things much much harder,
and removes all trace of any testing coverage that the old commit had.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please pull ACPI updates
Date: Thu, 17 Jul 2008 20:17:41 UTC
Message-ID: <fa.vhTehQSb0tyyoZMxNoEeJVB34oE@ifi.uio.no>

On Thu, 17 Jul 2008, Linus Torvalds wrote:
>
> In particular, if something worked for somebody before, it also removes
> the "known good state" from a bisection standpoint, so rebasing actually
> makes things _harder_ to bisect - because now you cannot sanely bisect
> between two versions of the tree (when you mark the old tree "good", it
> has no relevance to the new tree that had all the old history rewritten).

Let me do a made-up example of this just to kind of illustrate the point.

Let's say that _I_ rebased history to always make it linear.

That would obviously make things much "easier" to bisect, since now it's
just a linear list of commits, and bisection is just taking the midpoint
of that list and trying it. Right? Also, I'd fix up the linear list so
that any found bugs are squashed into the code that caused them, so all
_known_ bugs are non-issues from the standpoint of bisection: because the
code you are bisecting already has those removed entirely.

That's a clean nice linear history with no unnecessary breakages (no
compile failures, no other unrelated and solved bugs) for bisection, so
bisecting new bugs must be much simpler, right?

WRONG.

It means that a person who ran my tree as of yesterday, and had a working
setup, but then updated to my tree as of today, and things break, can no
longer bisect sanely AT ALL - because the state that he was at yesterday
is basically completely *gone*, because it has been cleaned-up and
sanitized, since I happened to rewrite history from a week ago due to
finding a bug.

Also, related to the same thing, if that person had some patches of his
own that he was working on on top of the state I had yesterday, since I
rebased, it's now almost impossible for him to be able to judge what is
really _new_ stuff, and what is just the old stuff he was working on,
except it's been cleaned up and sanitized.

But at the same time, the new history is clearly _simpler_, isn't it? Yes,
it is simpler, BUT ONLY IF YOU DON'T TAKE INTO ACCOUNT THAT SOMEBODY
ALREADY SAW AND USED THE _OTHER_ HISTORY YESTERDAY!

I'm shouting, because this is really really important from a very
fundamental standpoint. It's not just important from a git standpoint:
this really is _not_ some odd git-specific implementation issue. No, it's
much much more fundamental than git. It's a very basic fact that would be
true with _any_ SCM.

So git just happens to encode that fundamental truth a bit more explicitly
and make it very obvious. Git is very careful at _not_ losing that state
as it existed somewhere yesterday.

So rebasing and cleanups may indeed result in a "simpler" history, but it
only look that way if you then ignore all the _other_ "simpler" histories.
So anybody who rebases basically creates not just one simple history, but
a _many_ "simple" histories, and in doing so actually creates a
potentially much bigger mess than he started out with!

As long as you never _ever_ expose your rewriting of history to anybody
else, people won't notice or care, because you basically guarantee that
nobody can ever see all those _other_ "simpler" histories, and they only
see the one final result. That's why 'rebase' is useful for private
histories.

But even then, any testing you did in your private tree is now suspect,
because that testing was done with the old history that you threw away.
So even if you delete all the old histories and never show them, they kind
of do exist conceptually - they existed in the sense that you tested them,
and you've just hidden the fact that what you release is different from
what you tested.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: Please pull ACPI updates
Date: Thu, 17 Jul 2008 20:29:35 UTC
Message-ID: <fa.0U8Wyxi5RnpnsfZxAvaVXdcmfcc@ifi.uio.no>

On Thu, 17 Jul 2008, Linus Torvalds wrote:
>
> But even then, any testing you did in your private tree is now suspect,
> because that testing was done with the old history that you threw away.
> So even if you delete all the old histories and never show them, they kind
> of do exist conceptually - they existed in the sense that you tested them,
> and you've just hidden the fact that what you release is different from
> what you tested.

One final note on this: the above is obviously not a problem for simple
code that only really does one thing, and in particular for code that you
wrote yourself. Moving your own commits around to make them make more
sense - or splitting them up etc - is often a _good_ thing, even if it
obviously does change history.

So using rebase to clean up and simplify and/or fix up stupid ordering
issues is good, and "git rebase -i" is actually really good for this
thing.

No, the problems start happening when you do it on a larger scale, or (and
this is very common in the kernel), your rebase _moves_ the commits over
big distances because you update to the top of my development tree. In
that case, while your patches themselves may not have changed, the base
that you based them on may have changed really subtly - it still compiles,
it still "works", but maybe it doesn't work as well as it used to. And
that's why the old testing you did is basically almost worthless.

So rebasing can be good or bad. It's a matter of degree. Rebasing private
and small things is generally good. Rebasing bigger things can cause
problems. And the further away you rebase something, the more problems it
will generally cause.

		Linus

Subject: Re: [git pull] drm-next
From: Linus Torvalds
Date: Sun, 29 Mar 2009 14:48:18 -0700

On Sun, 29 Mar 2009, Dave Airlie wrote:

> My plans from now on are just to send you non-linear trees, whenever I 
> merge a patch into my next tree thats when it stays in there, I'll pull 
> Eric's tree directly into my tree and then I'll send the results, I 
> thought we cared about a clean merge history but as I said without some 
> document in the kernel tree I've up until now had no real idea what you 
> wanted.

I want clean history, but that really means (a) clean and (b) history.

People can (and probably should) rebase their _private_ trees (their own 
work). That's a _cleanup_. But never other peoples code. That's a "destroy 
history"

So the history part is fairly easy. There's only one major rule, and one 
minor clarification:

 - You must never EVER destroy other peoples history. You must not rebase 
   commits other people did. Basically, if it doesn't have your sign-off 
   on it, it's off limits: you can't rebase it, because it's not yours.

   Notice that this really is about other peoples _history_, not about 
   other peoples _code_. If they sent stuff to you as an emailed patch, 
   and you applied it with "git am -s", then it's their code, but it's 
   _your_ history.

   So you can go wild on the "git rebase" thing on it, even though you 
   didn't write the code, as long as the commit itself is your private 
   one.

 - Minor clarification to the rule: once you've published your history in 
   some public site, other people may be using it, and so now it's clearly 
   not your _private_ history any more.

   So the minor clarification really is that it's not just about "your 
   commit", it's also about it being private to your tree, and you haven't 
   pushed it out and announced it yet.

That's fairly straightforward, no?

Now the "clean" part is a bit more subtle, although the first rules are 
pretty obvious and easy:

 - Keep your own history readable

   Some people do this by just working things out in their head first, and 
   not making mistakes. but that's very rare, and for the rest of us, we 
   use "git rebase" etc while we work on our problems. 

   So "git rebase" is not wrong. But it's right only if it's YOUR VERY OWN 
   PRIVATE git tree.

 - Don't expose your crap.

   This means: if you're still in the "git rebase" phase, you don't push 
   it out. If it's not ready, you send patches around, or use private git 
   trees (just as a "patch series replacement") that you don't tell the 
   public at large about.

It may also be worth noting that excessive "git rebase" will not make 
things any cleaner: if you do too many rebases, it will just mean that all 
your old pre-rebase testing is now of dubious value. So by all means 
rebase your own work, but use _some_ judgement in it.

NOTE! The combination of the above rules ("clean your own stuff" vs "don't 
clean other peoples stuff") have a secondary indirect effect. And this is 
where it starts getting subtle: since you must not rebase other peoples 
work, that means that you must never pull into a branch that isn't already 
in good shape. Because after you've done a merge, you can no longer rebase 
you commits.

Notice? Doing a "git pull" ends up being a synchronization point. But it's 
all pretty easy, if you follow these two rules about pulling:

 - Don't merge upstream code at random points. 

   You should _never_ pull my tree at random points (this was my biggest 
   issue with early git users - many developers would just pull my current 
   random tree-of-the-day into their development trees). It makes your 
   tree just a random mess of random development. Don't do it!

   And, in fact, preferably you don't pull my tree at ALL, since nothing 
   in my tree should be relevant to the development work _you_ do. 
   Sometimes you have to (in order to solve some particularly nasty 
   dependency issue), but it should be a very rare and special thing, and 
   you should think very hard about it.

   But if you want to sync up with major releases, do a

        git pull linus-repo v2.6.29

   or similar to synchronize with that kind of _non_random_ point. That 
   all makes sense. A "Merge v2.6.29 into devel branch" makes complete 
   sense as a merge message, no? That's not a problem.

   But if I see a lot of "Merge branch 'linus'" in your logs, I'm not 
   going to pull from you, because your tree has obviously had random crap 
   in it that shouldn't be there. You also lose a lot of testability, 
   since now all your tests are going to be about all my random code.

 - Don't merge _downstream_ code at random points either.

   Here the "random points" comment is a dual thing. You should not mege 
   random points as far as downstream is concerned (they should tell you 
   what to merge, and why), but also not random points as far as your tree 
   is concerned.

   Simple version: "Don't merge unrelated downstream stuff into your own 
   topic branches."

   Slightly more complex version: "Always have a _reason_ for merging 
   downstream stuff". That reason might be: "This branch is the release 
   branch, and is _not_ the 'random development' branch, and I want to 
   merge that ready feature into my release branch because it's going to 
   be part of my next release".

See? All the rules really are pretty simple. There's that somewhat subtle 
interaction between "keep your own history clean" and "never try to clean 
up _other_ proples histories", but if you follow the rules for pulling, 
you'll never have that problem.

Of course, in order for all this to work, you also have to make sure that 
the people you pull _from_ also have clean histories.

And how do you make sure of that? Complain to them if they don't. Tell 
them what they should do, and what they do wrong. Push my complaints down 
to the people you pull from. You're very much allowed to quote me on this 
and use it as an explanation of "do this, because that is what Linus 
expects from the end result".

                        Linus

Index Home About Blog