From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Kernel SCM saga..
Date: Mon, 11 Apr 2005 00:41:21 UTC
Message-ID: <>
Original-Message-ID: <>

On Sat, 9 Apr 2005, Paul Jackson wrote:
> I must be missing something here ...
> If the stat shows a possible change, then you shouldn't have to open the
> original version to determine if it really changed - just compute the
> SHA1 of the new file, and see if that changed from the original SHA1.

Yes. However, I've got two reasons for this:

 (a) it may actually be cheaper to just unpack the compressed thing than
     it is to compute the sha, _especially_ since it's very likely that
     you have to do that anyway (ie if it turns out that they _are_
     different, you need the unpacked data to then look at the
     differences).

     So when you come from your backup angle, you only care about "has it
     changed", and you'll do a backup. In "git", you usually care about
     the old contents too.

 (b) while I depend on the fact that if the SHA of an object matches, the
     objects are the same, I generally try to avoid the reverse
     dependency. Why? Because if I end up changing the way I pack objects,
     and still want to work with old objects, I may end up in the
     situation that two identical objects could get different object
     names.

I don't actually know how valid a point "(b)" is, and I don't think it's
likely, but imagine that SHA1 ends up being broken (*) and I decide that I
want to pack new objects with a new-and-improved-SHA256 or something. Such
a thing would obviously mean that you end up with lots of _duplicate_ data
(any new data that is repackaged with the new name will now cause a new
git object), but "duplicate" is better than "broken".

I don't actually guarantee that "git" could handle that right, but I've
been idly trying to avoid locking myself into the mindset that "file
equality has to mean name equality over the long run". So while the system
right now works on the 1:1 "name" <-> "content" mapping, it's possible
that it _could_ work with a more relaxed 1:n "content" -> "name" mapping.
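
A rough illustration of that relaxed 1:n mapping, as a minimal Python
sketch. The ObjectStore class and its method names are hypothetical and
not anything from git itself; the only point is that the same content can
be filed under an old SHA1 name and a new SHA256 name, which costs
duplicate entries but breaks nothing.

```python
import hashlib


class ObjectStore:
    """Toy store: many names may point at the same content (1:n)."""

    def __init__(self):
        self.by_name = {}  # object name -> content bytes

    def add(self, content: bytes, algo: str = "sha1") -> str:
        # The name is just a digest of the content under the chosen hash.
        name = hashlib.new(algo, content).hexdigest()
        self.by_name[name] = content
        return name

    def get(self, name: str) -> bytes:
        return self.by_name[name]
```

Adding the same bytes once with "sha1" and once with "sha256" gives two
different names that both resolve to identical content.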

But it's entirely possible that I'm being a git about this.


(*) yeah, yeah, I know about the current theoretical case, and I don't
care. Not only is it theoretical, the way my objects are packed you'd have
to not just generate the same SHA1 for it, it would have to _also_ still
be a valid zlib object _and_ get the header to match the "type + length"
of object part. IOW, the object validity checks are actually even stricter
than just "sha1 matches".
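
To make the footnote concrete, here is a minimal Python sketch of those
stacked checks. It assumes the object layout that later git versions
settled on (a "type length" header, a NUL byte, then the body, with the
SHA1 taken over the uncompressed bytes); the code at the time of this mail
may have hashed differently, and the function names are made up for
illustration.

```python
import hashlib
import zlib


def pack_object(obj_type: str, body: bytes) -> tuple[str, bytes]:
    """Return (object name, compressed bytes as they would be stored)."""
    raw = f"{obj_type} {len(body)}".encode() + b"\0" + body
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def check_object(name: str, stored: bytes) -> bool:
    """Accept an object only if every layer of validation passes."""
    try:
        raw = zlib.decompress(stored)       # must be a valid zlib stream
    except zlib.error:
        return False
    header, sep, body = raw.partition(b"\0")
    if not sep:                             # header terminator is missing
        return False
    parts = header.split(b" ")
    if len(parts) != 2 or not parts[1].isdigit():
        return False
    if int(parts[1]) != len(body):          # declared length must match
        return False
    return hashlib.sha1(raw).hexdigest() == name
```

A forged object would have to pass all of these tests at once, which is
the sense in which the checks are stricter than "sha1 matches".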

From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Kernel SCM saga..
Date: Mon, 11 Apr 2005 00:44:00 UTC
Message-ID: <>
Original-Message-ID: <>

On Fri, 8 Apr 2005, Chris Wedgwood wrote:
> Actually, I could probably make this *much* still faster with a
> caveat.  Given that my editor when I write a file will write a
> temporary file and rename it, for files in directories where nlink==2
> I can check that first and skip the stat of the individual files.

Yes, doing the stat just on the directory (on leaf directories only, of
course, but nlink==2 does say that on most filesystems) is indeed a huge
potential speedup.

It doesn't matter so much for the cached case, but it _does_ matter for
the uncached one. Makes a huge difference, in fact (I was playing with
exactly that back when I started doing "bkr" in BK/tools - three years
ago).

It turns out that I expect to cache my source tree (at least the
mainline), and that guides my optimizations, but yes, your dir stat does
help in the case of "occasionally working with lots of large projects"
rather than "mostly working on the same ones with enough RAM to cache it
all".

And "git" is actually fairly anal in this respect: it not only stats all
files, but the index file contains a lot more of the stat info than you'd
expect. So for example, it checks both ctime and mtime to the nanosecond
(did I mention that I didn't worry too much about portability?) exactly so
that it can catch any changes except for actively malicious things.

And if you do actively malicious things in your own directory, you get
what you deserve. It's actually _hard_ to try to fool git into believing a
file hasn't changed: you need to not only replace it with the exact same
file length and ctime/mtime, you need to reuse the same inode/dev numbers
(again - I didn't worry about portability, and filesystems where those
aren't stable are a "don't do that then") and keep the mode the same. Oh,
and uid/gid, but that was mostly me being silly.
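
As a sketch of the kind of comparison described above, the following
Python fragment records the same stat fields mentioned in the mail and
reports a file as possibly changed when any of them differ. The function
names and the tuple layout are assumptions for illustration; git's real
index format is a binary structure and is not reproduced here.

```python
import os


def stat_fingerprint(path: str) -> tuple:
    """Collect the stat fields that a cached index entry would remember."""
    st = os.stat(path)
    return (
        st.st_mtime_ns,  # modification time, nanosecond resolution
        st.st_ctime_ns,  # inode change time, nanosecond resolution
        st.st_ino,       # inode number
        st.st_dev,       # device number
        st.st_mode,      # file type and permission bits
        st.st_uid,
        st.st_gid,
        st.st_size,      # file length
    )


def maybe_changed(path: str, cached: tuple) -> bool:
    """True if any remembered field differs, so the content needs checking."""
    return stat_fingerprint(path) != cached
```

A mismatch only says "look closer"; deciding whether the content really
changed still needs the object data or its hash.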


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git-commits mailing list feed.
Date: Sat, 23 Apr 2005 17:31:51 UTC
Message-ID: <>
Original-Message-ID: <>

On Sun, 24 Apr 2005, David Woodhouse wrote:
> Nah, asking Linus to tag his releases is the most comfortable way.
> mkdir .git/tags
> echo 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 > .git/tags/2.6.12-rc2
> echo a2755a80f40e5794ddc20e00f781af9d6320fafb > .git/tags/2.6.12-rc3

The reason I've not done tags yet is that I haven't decided how to do
them.

The git-pasky "just remember the tag name" approach certainly works, but I
was literally thinking of setting up some signing system, so that a tag
doesn't just say "commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 is
v2.6.12-rc2", but it would actually give stronger guarantees, ie it would
say "Linus says that commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 is
his 2.6.12-rc2 release".

That's something fundamentally more powerful, and it's also something that
I actually can integrate better into git.

In other words, I actually want to create "tag objects", the same way we
have "commit objects". A tag object points to a commit object, but in
addition it contains the tag name _and_ the digital signature of whoever
created the tag.

Then you just distribute these tag objects along with all the other
objects, and fsck-cache can pick them up even without any other knowledge,
but normally you'd actually point to them some other way too, ie you could
have the ".git/tags/xxx" files have the pointers, but now they are
_validated_ pointers.

That was my plan, at least. But I haven't set up any signature generation
thing, and this really isn't my area of expertise any more. But my _plan_
literally was to have the tag object look a lot like a commit object, but
instead of pointing to the tree and the commit parents, it would point to
the commit you are tagging. Something like

	commit a2755a80f40e5794ddc20e00f781af9d6320fafb
	tag v2.6.12-rc3
	signer Linus Torvalds

	This is my official original 2.6.12-rc2 release


with a few fixed headers and then a place for free-form commentary,
everything signed by the key (and then it ends up being encapsulated as an
object with the object type "tag", and SHA1-csummed and compressed, ie it
ends up being just another object as far as git is concerned, but now it's
an object that tells you about _trust_)

(The "signer" field is just a way to easily figure out which public key to
check the signature against, so that you don't have to try them all. Or
something. My point being that I know what I want, but because I normally
don't actually ever _use_ PGP etc, I don't know the scripts to create
these, so I've been punting on it all).

If somebody writes a script to generate the above kind of thing (and tells
me how to validate it), I'll do the rest, and start tagging things
properly. Oh, and make sure the above sounds sane (ie if somebody has a
better idea for how to more easily identify how to find the public key to
check against, please speak up).


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git-commits mailing list feed.
Date: Sat, 23 Apr 2005 17:45:45 UTC
Message-ID: <>
Original-Message-ID: <>

On Sat, 23 Apr 2005, Linus Torvalds wrote:
> 	commit a2755a80f40e5794ddc20e00f781af9d6320fafb
> 	tag v2.6.12-rc3
> 	signer Linus Torvalds
> 	This is my official original 2.6.12-rc2 release
> 	....
> 	-----END PGP SIGNATURE-----

Btw, in case it wasn't clear, one of the advantages of this is that these
objects are really _not_ versioned themselves, and that they are totally
independent of the objects that they actually tag.

They spread together with all the other objects, so they fit very well
into the whole git infrastructure, but the real commit objects don't have
any linkages to the tag and the tag objects themselves don't have any
history amongst themselves, so you can create a tag at any (later) time,
and it doesn't actually change the commit in any way or affect other tags
in any way.

In particular, many different people can tag the same commit, and they
don't even need to tag their _own_ commit - you can use these tag objects
to show that you trust somebody else's commit. You can also throw the tag
objects away, since nothing else depends on them and they have nothing
linking to them - so you can make a "one-time" tag object that you can
pass off to somebody else, and then delete it, and now it's just a
"temporary tag"  that tells the recipient _something_ about the commit you
tagged, but that doesn't stay around in the archive.

That's important, because I actually want to have the ability for people
who want me to pull from their archive to send me a message that says
"pull from this archive, and btw, here's the tag that not only tells you
which head to merge, but also proves that it was me who created it".

Will we use this? Maybe not. Quite frankly, I think human trust is much
more important than automated trust through some technical means, but I
think it's good to have the _support_ for this kind of trust mechanism
built into the system. And I think it's a good way for distributors etc to
say: "this is the source code we used to build the kernel that we
released, and we tagged it 'v2.6.11-mm6-crazy-fixes-3.96'".

And if my key gets stolen, I can re-generate all the tags (from my archive
of tags that I trust), and sign them with a new key, and revoke the trust
of my old key. This is why it's important that tags don't have
interdependencies, they are just a one-way "this key trusts that release
and calls it xyzzy".


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git-commits mailing list feed.
Date: Sat, 23 Apr 2005 18:30:29 UTC
Message-ID: <>
Original-Message-ID: <>

On Sat, 23 Apr 2005, Thomas Glanzmann wrote:
> # This creates the signature.
> gpg --clearsign < sign_this > signature

This really doesn't work for me - I do not want to have the gpg header
above it, only the signature below. Since I want git to actually
understand the tags, but do _not_ want git to have to know about whatever
signing method was used, I really want the resulting file to look like

	commit ....
	tag ...

	here goes comment
	here goes signature

and no headers.

Whether that can be faked by always forcing SHA1 as the hash, and then
just removing the top lines, and re-inserting them when verifying, or
whether there is some mode to make gpg not do the header crud at all, I
don't know. Which is exactly why I never even got started.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git-commits mailing list feed.
Date: Sat, 23 Apr 2005 19:38:14 UTC
Message-ID: <>
Original-Message-ID: <>

On Sat, 23 Apr 2005, Sean wrote:
> A script that knows how to validate signed tags, can easily strip off all
> the signing overhead for display.   Users of scripts that don't understand
> will see the cruft, but at least it will still be usable.


Guys, I will say this once more: git will not look at the signature.

That means that we don't "strip them off", because dammit, they DO NOT
EXIST as far as git is concerned. This is why a tag-file will _always_
start with

	commit <commit-sha1>
	tag <tag-name>

because that way we can use fsck and validate reachability and have things
that want trees (or commits) take tag-files instead, and git will
automatically look up the associated tree/commit. And it will do so
_without_ having to understand about signing, since signing is for trust
between _people_ not for git.

And that is why I from the very beginning tried to make it very clear that
the signature goes at the end. Not at the beginning, not in the middle,
and not in a different file. IT GOES AT THE END.
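
The rule above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: parse_tag is a hypothetical helper,
and the 40-hex-digit check reflects SHA1 names. It reads the two fixed
header lines, rejects anything that does not match them exactly, and
hands back the rest (comment and trailing signature) as opaque text that
it never inspects.

```python
import re

_COMMIT = re.compile(r"commit ([0-9a-f]{40})")
_TAG = re.compile(r"tag (\S+)")


def parse_tag(text: str) -> tuple[str, str, str]:
    """Return (commit sha1, tag name, uninterpreted remainder)."""
    lines = text.split("\n", 2)
    if len(lines) < 2:
        raise ValueError("malformed tag object")
    m_commit = _COMMIT.fullmatch(lines[0])
    m_tag = _TAG.fullmatch(lines[1])
    if not m_commit or not m_tag:
        # No guessing: a tag that does not start with the fixed headers
        # is rejected outright.
        raise ValueError("malformed tag object")
    rest = lines[2] if len(lines) > 2 else ""
    return m_commit.group(1), m_tag.group(1), rest
```

A clearsigned file that begins with a PGP header line fails the first
match and is refused, while a signature appended after the comment simply
ends up in the remainder.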


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git-commits mailing list feed.
Date: Sat, 23 Apr 2005 19:59:14 UTC
Message-ID: <>
Original-Message-ID: <>

On Sat, 23 Apr 2005, Sean wrote:
> Okay now you're just being difficult <g>   You're acting like it's
> impossible for git to grab the SHA1 out of the clear text message if there
> is signing overhead above the tag reference.   That is nonsense.

No. It's not "impossible" for git to parse crap. But git won't.

There are two ways you can write programs:
 - reliably
 - unreliably

and I do the first one. That means that a program I write does something
_repeatable_. It does the same thing, regardless of whether a human
happened to write "REF:" in the comment section, or anything else.

The thing is, great programs come not out of great coding, but out of
great data structures. The whole git philosophy bases itself on getting
the data structure right.

And what you are asking for is doing it _wrong_. So in git I don't just
parse random free-form text and guess that a line that starts with REF: is
a reference to a commit. It has very rigid and well-specified data
structures, and that's how you make reliable programs.

I don't care what anybody else does on top of git, but dammit, I'll make
sure that the core infrastructure is designed the right way.

And that means that we don't guess, and that we don't parse random ASCII
blobs. It means that we have very very fixed formats so that programs can
either do the right thing or unambiguously say "that's crap".

I've said it before, and I'll say it again: we have enough crap that calls
itself SCM's out there already. I want git to be reliable and _simple_,
not a collection of crap that just happens to work.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git-commits mailing list feed.
Date: Sat, 23 Apr 2005 20:22:52 UTC
Message-ID: <>
Original-Message-ID: <>

On Sat, 23 Apr 2005, Junio C Hamano wrote:
> If that is the case, can't you do it without introducing this
> new tag object, like this?

No, because I also want to sign the _name_ I gave it.

Otherwise somebody can take my "signed commit", and claim that I called it
something else.

Just signing the commit is indeed sufficient to just say "I trust this
commit". But I essentially want to also say what I trust it _for_ as well.

And sure, I could make a totally bogus "commit" object that just points to
the original commit, uses the same "tree" from that original commit, and
write what I want to trust into that commit. I then sign that, and create
yet _another_ commit that has the signature (and the pointer to the just
signed commit) in its commit message, and then I point to _that_ commit.

So yes, we can certainly do this with playing games with commits. That
sounds singularly ugly, though, since just doing a "tag" object is a lot
more straightforward, and really tells the world what's going on (and
makes it easy for automated tools to just browse the object database and
see "that's a tag").


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git-commits mailing list feed.
Date: Sat, 23 Apr 2005 23:29:45 UTC
Message-ID: <>
Original-Message-ID: <>

On Sat, 23 Apr 2005, Jan Harkes wrote:
> I respectfully disagree,
> rsync works fine for now, but people are already looking at implementing
> smarter (more efficient) ways to synchronize git repositories by
> grabbing missing commits, and from there fetching any missing tree and
> file blobs.

But this is a _feature_.

Other people normally shouldn't be interested in your tags. I think it's a
mistake to make everybody care.

So you normally would fetch only tags you _know_ about. For example, one
of the reasons we've been _avoiding_ personal tags in the BK trees is that
it just gets really ugly really quickly because they get percolated up to
everybody else. That means that in a BK tree, you can't sanely use tags
for "private" stuff, like telling somebody else "please sync with this
tag".

So having the tag in the object database means that fsck etc will notice
these things, and can build up a list of tags you know about. It also
means that you can have tag-aware synchronization tools, ie exactly the
kind of tools that only grab missing commits can also then be used to
select missing tags according to some _private_ understanding of what tags
you might want to find..


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Updated git HOWTO for kernel hackers
Date: Thu, 23 Jun 2005 03:27:36 UTC
Message-ID: <>
Original-Message-ID: <>

On Wed, 22 Jun 2005, Jeff Garzik wrote:
> Concrete example:  I have a git tree on local disk.  I need to find out
> where, between 2.6.12-rc1 and 2.6.12, a driver broke.  This requires
> that I have -ALL- linux-2.6.git/refs/tags on disk already, so that I can
> bounce quickly and easily between tags.

Absolutely not.

I might have my private tags in my kernel, and you might have your private
tags ("tested") in your kernel, so there is no such thing as "ALL".

The fact that BK had it was a BK deficiency, and just meant that you
basically couldn't use tags at all with BK, the "official ones" excepted.
It basically meant that nobody else than me could ever tag a tree. Do you
not see how that violates the very notion of "distributed"?

This is _exactly_ the same thing as if you said "I want to merge with ALL
BRANCHES".  That notion doesn't exist. You can rsync the whole repository,
and you'll get all branches from that repository, but that's really by
virtue of doing a filesystem operation, not because you asked git to get
you all branches.

A tag is even _implemented_ exactly like a branch, except it allows (but
does not require) that extra step of signing an object. The only
difference is literally whether it is in refs/branches or refs/tags.

> It is valuable to have a local copy of -all- tags, -before- you need
> them.

You seem to not realize that "all tags" is a nonsensical statement in a
distributed system.

If you want to have a list of official tags, why not just do exactly that?
What's so hard with saying "ok, that place has a list of 'official' tags,
let's fetch them".

How would you fetch them? You might use rsync, for example. Or maybe wget.
Or whatever. The point is that this works already. You're asking for
something nonsensical, outside of just a script that does

	rsync -r --ignore-existing repo/refs/tags/ .git/refs/tags/

See? What's your complaint with just doing that?
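
For a repository that is reachable as a local directory, the same "copy
only the tags I don't already have" behaviour can be sketched in Python.
The function name and the directory arguments are hypothetical; this
mirrors the rsync --ignore-existing semantics for plain files and makes
no attempt to talk to a remote host.

```python
import os
import shutil


def fetch_missing_tags(remote_tags: str, local_tags: str) -> list[str]:
    """Copy tag files that do not exist locally; never overwrite."""
    copied = []
    for root, _dirs, files in os.walk(remote_tags):
        rel = os.path.relpath(root, remote_tags)
        target_dir = os.path.normpath(os.path.join(local_tags, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(files):
            dst = os.path.join(target_dir, name)
            if os.path.exists(dst):
                continue  # a local tag of the same name wins
            shutil.copyfile(os.path.join(root, name), dst)
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return copied
```

Because existing files are skipped, a private local tag is left alone
even when the other side has a tag with the same name.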


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Updated git HOWTO for kernel hackers
Date: Thu, 23 Jun 2005 05:57:37 UTC
Message-ID: <>
Original-Message-ID: <>

On Thu, 23 Jun 2005, Jeff Garzik wrote:
> No complaint with that operation.  The complaint is that it's an
> additional operation.  Re-read what Greg said:

Please re-read what I said.

Pulling a regular head _cannot_ and _must_not_ update tags. Tags are not
associated with the tree, and they _cannot_ and _must_not_ be so, exactly
because that would make them global instead of private, and it would
fundamentally make them not be distributed, and would mean that they'd be
pointless as anything but "Linus' official tags".

That's what we had in BK _AND IT DOES NOT WORK_!

Does it help when I scream?

> > Is there some reason why git doesn't pull the
> > tags in properly when doing a merge?  Chris and I just hit this when I
> > pulled his tree and was wondering where the tag went.

And I suggested that if you want that, then you pull on the TAG. You take
my modification, you test it, and you see if

	git fetch tag ..repo.. tagname


That solves exactly the case that Greg is complaining about, and it solves
it in a _sane_ manner: you tell git that you want a tag, and git fetches
it for you. It's that simple, and it does not introduce the _BROKEN_
notion that tags are associated directly with the commit itself and
somehow visible to all.

> Multiple users -- not just me -- would prefer that git-pull-script
> pulled the tags, too.

And multiple users -- clearly including you -- aren't listening to me.
Tags are separate from the source they tag, and they HAVE TO BE. There is
no "you automatically get the tags when you get the tree", because the two
don't have a 1:1 relationship.

And not making them separate breaks a lot of things. As mentioned, it
fundamentally breaks the distributed nature, but that also means that it
breaks whenever two people use the same name for a tag, for example. You
can't "merge" tags. BK had a very strange form of merging, which was (I
think) to pick the one last in the BK ChangeSet file, but that didn't make
it "right". You just never noticed, because Linux could never use tags at
all due to the lack of privacy, except for big releases..

> Suggested solution:  add '--tags' to git-pull-script
> (git-fetch-script?), which calls
> 	rsync -r --ignore-existing repo/refs/tags/ .git/refs/tags/

How is this AT ALL different from just having a separate script that does
this? You've introduced nothing but syntactic fluff, and you've made it
less flexible at the same time. First off, you might want to get new tags
_without_ fetching anything else, and you might indeed want to get the
tags _first_ in order to decide what you want to fetch. In fact, in many
cases that's exactly what you want, namely you want to fetch the data
based on the tag.

Secondly, if your worry is that you forget, then hell, write a small shell
function, and be done with it.


When I fetch somebody else's head, I had better not fetch his tags. His
tags may not even make _sense_ in what I have - he may tag things in other
branches that I'm not fetching at all. In fact, his tag-namespace might be
_different_ from mine, ie he might have tagged something "broken" in his
tree, and I tagged something _else_ "broken" in mine, just because it
happens to be a very useful tag for when you want to mark "ok, that was a
broken tree".

It is wrong, wrong, _wrong_ to think that fetching somebody else's tree
means that you should fetch his tags. The _only_ reason you think it's
right is because you've only ever seen centralized tags: tags were the one
thing that BK kept centralized.

But once people realize that they can use tags in their own trees, and
nobody else will ever notice, they'll slowly start using them. Maybe it
takes a few months or even longer. But it will happen. And I refuse to
make stupid decisions that make it not work.

And thinking that "fetching a tree fetches all the tags from that tree"
really _is_ a stupid decision. It's missing the big picture. It's missing
the fact that tags _should_ be normal every-day things that you just use
as "book-marks", and that the kind of big "synchronization point for many
people" tag should actually be the _rare_ case.

The fact that global tags make that private "bookmark" usage impossible
should be a big red blinking sign saying "don't do global tags".

> Let the kernel hacker say "yes, I really do want to download the tags
> Linus publicly posted in linux-2.6.git/refs/tags" because this was a
> common operation in the previous workflow, a common operation that we
> -made use of-.

And I already suggested a trivial script. Send me the script patch,
instead of arguing for stupid things.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Updated git HOWTO for kernel hackers
Date: Thu, 23 Jun 2005 04:54:46 UTC
Message-ID: <>
Original-Message-ID: <>

On Wed, 22 Jun 2005, Adam Kropelin wrote:
> I know I shouldn't invoke this particular acronym, but I rather like
> CVS's approach.

The problem I have with "git commit" committing everything dirty by
default is that it encourages exactly the wrong kind of behaviour, ie the
"commit it all in one go without thinking about it".

Also, CVS really doesn't have much choice, since CVS doesn't _have_ the
notion of marking files for commits. In contrast, in git the index file
really does end up being a good way to say which files are ready to be
committed.

And "git status" really isn't that hard to type, and it will tell you
exactly what you've already marked for commit, and what you have dirty in
the tree but isn't marked for commit yet.

So I think the "git commit <file-list>" thing is very convenient, but it's
convenient exactly because it's concise yet still precise and doesn't
encourage the "just commit whatever random dirty state I have right now"

And if you have more than a few files dirty in your tree, I really think
it's much better to do "git status" and think about it a bit and select
the files you do want to commit than it is to just do "git commit" and let
it rip.

Now, I could well imagine adding an "--all" flag (and not even allow the
shorthand version) to both git-update-cache and "git commit". So that you
could say "commit all the dirty state", but you'd at least have to think
about it before you did so.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Mercurial vs Updated git HOWTO for kernel hackers
Date: Fri, 24 Jun 2005 19:08:36 UTC
Message-ID: <>
Original-Message-ID: <>

On Fri, 24 Jun 2005, Matthias Urlichs wrote:
> Well, I don't. Main reason: It's simply a lot faster to create+switch to a
> branch locally for doing independent work, than to hardlink the whole
> Linux directory tree into a clone tree.
> Having one tree also simplifies the "what do I have that's not merged yet"
> question -- just call "gitk $(cat .git/refs/heads/*)". ;-)

Actually, I think that gets close to another real advantage of branches:
that is also what allows you to edit things that failed a merge.

For example, let's say that a merge fails. You've got the HEAD and the
MERGE_HEAD, but a file type conflict (like a symlink that has turned into
a directory) or something like that means that you can't resolve them
sanely at all.

So this is a merge problem where you can't just do a three-way merge and fix
up the result and commit: you have to fix things up before you can even
really do the merge. This is when switching to the MERGE_HEAD thing and
fixing it up there, committing it, and then doing the merge with the
original HEAD and the new MERGE_HEAD is really convenient.

(No, the scripts don't help you in cases like this, and we don't do the
MERGE_HEAD as a real branch right now, but the point is that we _can_, and
that this is more than an efficiency issue, it's a fundamental issue of
working with multiple end-points together. You _could_ clone the other
head into a totally new repository, fix it there, and then try the merge
anew, but now you're working around a limitation, not just doing
something slower).

I still think you can go a bit too far on your branch usage (ie Jeff), but
hey, what's the difference between three branches and fifty, really?

(I'm kidding. The difference between three and fifty is how well you can
keep track of them in your head, but maybe Jeff just has a bigger head
than most people do. Jeff, do people go "Boy, you've got a big head" the
first time they meet you?)


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Finding what change broke ARM
Date: Fri, 24 Jun 2005 16:10:44 UTC
Message-ID: <>
Original-Message-ID: <>

On Fri, 24 Jun 2005, Russell King wrote:
> When building current git for ARM, I see:
>   CC      arch/arm/mm/consistent.o
> arch/arm/mm/consistent.c: In function `dma_free_coherent':
> arch/arm/mm/consistent.c:357: error: `mem_map' undeclared (first use in this function)
> arch/arm/mm/consistent.c:357: error: (Each undeclared identifier is reported only once
> arch/arm/mm/consistent.c:357: error: for each function it appears in.)
> make[2]: *** [arch/arm/mm/consistent.o] Error 1
> How can I find what change elsewhere in the kernel tree caused this
> breakage?

Ahhah! A real-world example of what cool things git can do.

Anyway, the first starting point is _exactly_ the same as under BK, except
the syntax is very different, and git does it better, in fact:

	git-whatchanged -p arch/arm/mm/consistent.c

However, in this case nothing has changed in that file over the whole
git history, so you get an empty answer. Let's go to phase two, but first
a comment:

> With bk, you could ask for a per-file revision history of the likely
> candidates, and then find the changeset to view the other related
> changes.
> With git... ?  We don't have per-file revision history so...

We don't _store_ changes as per-file revision histories, but we do store
it in a way where finding out what happened is efficient even per-file.
While a line-by-line "annotate" is not efficient, the "what changed"
certainly is.

And git actually does better than BK (or _any_ per-file history thing),
because "git-whatchanged" actually works over directories or multiple
independent files too, and it works purely on pathnames, so you can say
"git-whatchanged" for a file that has gone away to see _why_ it went away.
In most other systems it's really hard to see what happened to something
that isn't there any more..

Anyway, the problem clearly didn't happen because of any changes to that
file at all, so here per-file history simply doesn't help. But never fear,
we're not screwed yet. In particular, you will now obviously suspect that
since it wasn't that _file_ that changed, and since you know what changed
in the ARM code, it's going to be a generic linux header file change that
screwed you over.

So phase #2 is to do

	git-whatchanged -p include/linux

(which shows every commit that touches include/linux, and shows that part
as a patch, thus the "-p"). That starts up a pager on the results by
default, so we can just be stupid about it and do a "/mem_map" to look for
changes that mention mem_map. Maybe we'll be lucky.

Even that doesn't show a whole lot: but it does point a very suspicious
finger to the recently merged sparse-mem stuff from Andy Whitcroft,

And now you have a commit to look at, namely the "sparsemem memory model"
one, commit ID d41dee369bff3b9dcb6328d4d822926c28cc2594.

In fact, looking at it, I think it's simply config option changes, and
probably the SPARSEMEM config option that has preempted your lack of
DISCONTIGMEM support.  But now you have somebody to blame and to ask for
help from: Andy Whitcroft and Dave Hansen, whom I've cc'd.

I might start phase #3 with

	git-whatchanged -p mm/Kconfig arch/arm/Kconfig

but at this point you may already have enough of a clue that you don't
even care any more.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: RE: git pull on Linux/ACPI release tree
Date: Sun, 08 Jan 2006 19:11:12 UTC
Message-ID: <>
Original-Message-ID: <>

On Sun, 8 Jan 2006, Brown, Len wrote:
> Is it possible for git, like bk, to simply ignore merge commits in its summary output?

That's not the point. It does: "git log --no-merges" does exactly that.

But fire up "gitk" to watch the history, and see the difference.

> Note that "Auto-update from upstream" is just the place-holder comment
> embedded in the wrapper script in git/Documentation/howto/using-topic-branches.txt

That has absolutely nothing to do with anything. It's not the comment
(which admittedly gives absolutely no information - but why should it,
since the _commit_ itself has no information in it?)

It's like you have empty commits that don't do anything at all, except
that they are worse, because they have two parents.

> I think that Tony's howto above captures two key requirements
> from all kernel maintainers -- with the exception of you --

No. Your commits make it harder for _everybody_ to track the history.

A merge by definition "couples" the history of two branches. That's what a
merge very fundamentally is. It ties two things together. But two things
that don't have any connection to each other _shouldn't_ be tied together.

Just as an example: because of the extra merges, you've made all your
commits dependent on what happened in my tree, with no real reason. So
let's say that somebody reports that something broke in ACPI. Now you
can't just go to the top of the ACPI history and work backwards - you'll
have tied up the two histories so that they are intertwined.
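
As an illustration only (a toy model in Python, not how git is implemented, with every commit name made up), the coupling described above can be sketched by comparing what is reachable from the top of the ACPI work with and without an automated merge in the middle:

```python
# Toy commit graph: each commit maps to the list of its parents.
# All commit names are hypothetical.

def history(parents, tip):
    """Return every commit reachable from `tip` by walking parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

# Upstream line: base <- up1 <- up2.  ACPI work: base <- acpi1 <- acpi2.
clean = {
    "base": [], "up1": ["base"], "up2": ["up1"],
    "acpi1": ["base"], "acpi2": ["acpi1"],
}

# Same work, but with an automated merge of upstream between the ACPI commits.
tangled = {
    "base": [], "up1": ["base"], "up2": ["up1"],
    "acpi1": ["base"],
    "auto_merge": ["acpi1", "up2"],
    "acpi2": ["auto_merge"],
}

clean_history = history(clean, "acpi2")
tangled_history = history(tangled, "acpi2")

print(sorted(clean_history))    # the base plus the ACPI commits only
print(sorted(tangled_history))  # upstream commits now sit inside ACPI history
```

Walking backwards from "acpi2" in the second graph necessarily passes through the upstream commits, which is the sense in which the two histories become intertwined.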

And yes, you can always work around it, but there's just no point. And
none of the other developers seem to need to do it. They do their
development, and then they say "please pull". At that point the two
histories are tied together, but now they are tied together for a
_reason_. It was an intentional synchronization point.

An "automated pull" by definition has no reason. If it works automated,
then the merge has zero semantic meaning.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: RE: git pull on Linux/ACPI release tree
Date: Sun, 08 Jan 2006 19:42:30 UTC
Message-ID: <>
Original-Message-ID: <>

On Sun, 8 Jan 2006, Brown, Len wrote:
> Perhaps the tools should try to support what "a lot of people"
> expect, rather than making "a lot of people" do extra work
> because of the tools?
> Call me old fashioned, but I believe that tools are supposed to
> make work easier, not harder.

They DO.

Len, you're doing EXTRA WORK that is pointless.

Just stop doing the automated merges. Problems solved. It really is that
easy. Don't do what David suggests - he does it because he's apparently
_so_ comfortable with things that he prefers to do extra work just to keep
his trees extra clean (I actually would disagree - but git makes that
fairly easy to do, so if you prefer to have as linear a history as
possible, you can do it with git pretty easily).

Now, I'm only complaining about _automated_ merges. If you have a reason
to worry about my tree having clashes with your tree, do a real merge. For
example, in your latest pull, you had a

	"pull linus into release branch"

merge, where you merged my v2.6.15 tree. That makes perfect sense.

What I object to is that there were _also_ two automated merges within ten
hours of each other, with absolutely _zero_ development in your tree in
between. Why did you do that in your development tree? By _definition_ you
had done zero development. You just tracked the development in _my_ tree.

In case you wonder, the two commits I'm talking about are:


and neither of them have any reason to be in a development tree. You
didn't develop them.

They are real merges, because you had a trivial patch in your tree
(changing the acpi-devel mailing list address) that I didn't have, so when
you pulled, your end result was thus always different from something I had
(so you did a real "merge", even though it was totally trivial), but the
point is that there is a difference between "the ACPI development tree"
and "the tree that has random ACPI patches and then tracks Linus' tree as
closely as possible".


That's the most egregious example. There's two unnecessary pulls on
December 28 and 29th too (commits 0a5296dc and c1a959d8).

You can do

	gitk 0aec63e..f9a204e1

to see exactly what I see when I pulled from you. 11 commits, 5 of which
are just trivial merges that are no development, just tracking _my_ tree.
Of those, one makes sense (tracking a release).

(NOTE NOTE NOTE! It does make sense to track my tree in case you do big
changes and you worry about clashes. Then you would want to synchronize
those big changes with my changes, so that you can resolve any clashes
early. So I'm not saying that tracking trees is always bad: I'm saying
that doing so _unnecessarily_ is bad, because it adds no value, and it
just makes the history harder to read).

Now, most people don't read the history. It gets messy enough quickly
enough that it's hard to read anyway over time. My tree has tons of _real_
merges anyway, since it's by definition the one that is used for most
synchronization, so my tree is always pretty hard to follow.

But my guess is that this probably makes it harder for _you_ to see what
you've done too. If you didn't merge with me, then "git log" would show
just your own changes at the top, and that's likely what you care most
about anyway, no?

Also, if you didn't pull from me, and you decided that you needed to re-do
your tree (let's say that you notice that one of your commits was bad
_before_ you ask me to pull from your tree), then you'd also have an
easier time re-creating your own development without that buggy change,
exactly because _your_ tree wouldn't have my changes mixed up in it.

So your merges likely make git harder to use for you, not easier.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH updated]: How to be a kernel driver maintainer
Date: Tue, 10 Jan 2006 00:10:34 UTC
Message-ID: <>
Original-Message-ID: <>

On Mon, 9 Jan 2006, Arjan van de Ven wrote:
> > Obviously, they have to do their work, and their development on
> > something that isn't in Linus tree. If they are doing this work, they
> > need to make sure that when they diff for patches, that they merge
> > changes before diffing. The only way this is close to automatic is with
> > git. Any other method requires manually merging.
> not correct. quilt is a very excellent counter example of that.

Yes. Anything that keeps a nice patch series that you can merge as a nice
patch series works fine.

For example, Al Viro used to use (still uses?) RCS to maintain his
work-in-progress. That worked fine, because he had a process where he
would just extract them as patches.

The reason CVS doesn't work well is partly because CVS just sucks at so
many levels, and because people start using it as more than a "series of
patches" repository. People might cherry-pick one or two changes from a
CVS, but it quickly becomes totally impossible to do anything sane at all,
or even to cherry-pick more than a few patches, because after a while
you've lost the ability to pick out individual changes.

Something like quilt works fine, because individual patches never get lost
in other patches (they might get merged with another patch on purpose, but
that's a separate issue). Anything that understands the notion of
changesets and can be taught to re-order them should be able to work the
same way.

So the important thing is to have _some_ proper linear changeset history,
preferably one where you can re-order them (so that if you cherry-pick a
set of changesets, you can mark them as having been merged, and keep the
_rest_ as a linear changeset history).

CVS just sucks. End of story. It works badly at so many levels that it's
just not even funny.


From: Theodore Ts'o <>
Newsgroups: fa.linux.kernel
Subject: Re: git for dummies, anyone? (was: Re: How in tarnation do I git 
	v2.6.16-rc2?  hg died and I still don't git git)
Date: Fri, 10 Feb 2006 14:09:10 UTC
Message-ID: <fa.u/1VUqHOQt1x/zVTWG/>
Original-Message-ID: <>

On Fri, Feb 10, 2006 at 07:23:09AM +0100, Willy Tarreau wrote:
> I did the opposite : I tried hg as soon as Matt announced it, but I
> got lost in the "python dumps" which appeared at the slightest error,
> because I did not understand what the problem was. Then I tried git,
> at least to be able to keep in sync with Marcelo. With git, I had
> some opportunities to catch some understandable error messages spit
> out of some shell scripts even when not caught by the script itself.
> But using it less than twice a week requires me to read the manual
> again before doing anything useful :-(

Mercurial has gotten a lot better about "python dumps" as a signal
that you had typed something wrong, or had permissions problems, etc.
I got annoyed about that too, but I just learned to ignore them and
assume that 90% of the time, it was due to a mistake on my end.
Things are a lot better there; claiming mercurial sucks for its early
rough edges is about as fair as claiming that git sucks because its
performance sucked rocks on laptop drives in its early days before
Linus added support for packs.

						- Ted

From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT PATCH] USB patches for 2.6.17
Date: Thu, 22 Jun 2006 19:51:12 UTC
Message-ID: <>
Original-Message-ID: <>

On Thu, 22 Jun 2006, Greg KH wrote:
> I take that back.  I just used -M for the W1 patch series and I think it
> is very helpful as it shows only the lines that change in a rename,
> which can easily get lost in the noise of a longer patch.

Yes. The main reason to do rename detection is not that the patch shrinks
(although it does), but simply because in many cases it makes the patch a
hell of a lot more readable. It's often much more obvious what is actually
going on, when you don't see it as a large "one file got deleted, another
one added" thing.

> Very nice stuff, have I mentioned lately how much I love git?

It does seem to be working out, doesn't it?

I'm just constantly surprised by how people don't even seem to realize
what it can do sometimes. Part of it is that development has been pretty
active (and some of the things it can do simply weren't there three months
ago), but part of it must be because people don't even expect it to be
able to do something like that.

The "git log -p" thing is wonderful, but it's even more wonderful when you
realize that you can ask it to just tell you what changed in a specific
set of subdirectories. Or when you realize that you can actually have two
branches, and ask it to show only the commits that are in one and not the
other (even if the branches are _not_ subsets of each other).

For example, if you track my branch separately (not merging, and not
rebasing - just doing something like "git fetch linus" to track where I
am), you can always just do things like

	git log -p ..linus drivers/usb/

to see what _I_ have merged that might touch drivers/usb/, but that isn't
in your branch because I ended up taking a patch from somebody else (for
example, you might see some USB change that the MIPS merges brought in).
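
A rough way to picture what "..linus drivers/usb/" selects is a set difference over the commit graph followed by a path filter. The sketch below is a toy model under that assumption, not git's implementation; the commit names and file paths are invented for illustration:

```python
def reachable(parents, tip):
    """All commits reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def log_range(parents, touched, exclude, include, path_prefix):
    """Commits in `include` but not in `exclude` that touch `path_prefix`."""
    only_in_include = reachable(parents, include) - reachable(parents, exclude)
    return sorted(
        commit for commit in only_in_include
        if any(path.startswith(path_prefix) for path in touched[commit])
    )

# Hypothetical history: one local USB commit, two upstream commits.
parents = {
    "base": [],
    "usb_fix": ["base"],           # tip of the local branch
    "net_change": ["base"],        # upstream
    "mips_merge": ["net_change"],  # upstream tip, tracked as "linus"
}
touched = {
    "base": [],
    "usb_fix": ["drivers/usb/example_core.c"],
    "net_change": ["net/example.c"],
    "mips_merge": ["arch/mips/example.c", "drivers/usb/example_host.c"],
}

result = log_range(parents, touched, "usb_fix", "mips_merge", "drivers/usb/")
print(result)
```

Only the upstream commit that touches drivers/usb/ is listed; the local commit is excluded by the range and the networking commit by the path filter. (sorted() is used here just to make the output deterministic; it says nothing about the order git would use.)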

That kind of thing has, I believe, been the biggest success of the whole
git model (the "track whole _trees_ rather than individual files").

I realize that some people are used to individual file tracking, and that
the git model makes "git annotate individual_file.c" pretty inefficient,
but at least for me, the whole sub-tree tracking is a _lot_ more
important, and I don't think anybody else can do it as effortlessly and
efficiently as git does.

The above kind of command line is just _so_ powerful. Maybe it's because
I'm a top-level maintainer, but I basically _never_ care about a single
file, but often care about one (or a few) subdirectory.

And Junio has done a stellar job at maintaining git.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT PATCH] USB patches for 2.6.17
Date: Thu, 22 Jun 2006 21:07:58 UTC
Message-ID: <>
Original-Message-ID: <>

On Thu, 22 Jun 2006, Linus Torvalds wrote:
> After that, I'm not quite sure what you mean by "--dry-run". Do you mean
> to know about file-level conflicts? You do need to do the merge in order
> to know whether the conflicts can be resolved, but even without doing the
> merge you can look for _file_level_ conflicts by other means.

Btw, what you can always do is just

	git pull <other-end>
	.. look at the result ..
	git reset --hard ORIG_HEAD

and you should be ok. It's obviously not a dry-run, it's more of a "do it
and then undo it", but hey, it should _work_.

(Look out for dirty state, though. "git pull" will happily pull into a
dirty tree if the changes don't actually affect any dirty files. The "git
reset --hard" thing will undo all dirty state, though).


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.19 -mm merge plans
Date: Mon, 25 Sep 2006 00:49:41 UTC
Message-ID: <>

On Sun, 24 Sep 2006, Sean wrote:
> When Thomas makes a sweeping statement about the applicability of one
> tool over another people will respond to him.  But if _you_ make such
> a statement yourself (even if it's based on his conclusions) then
> you better accept that people who disagree will respond to your statement.

I think it's unquestionable that sometimes it's better to work with
patches. The thing is, git by its very design is about tracking things
where history is firmly "set in stone", and that's not always what you
want.

We've done a number of things to help people relax that a bit (notably
"git rebase" and "git cherry-pick"), and there are projects like stgit
that are more of a patch-management system on top of git, but even during
the early design phases I said that it's likely that git will be used in
_conjunction_ with tools like quilt etc, that are less strict than git is.

So I don't think we need to attack the notion of using non-git tools at
all. Especially if you don't know where you're going, git's very strict
immobile history can be a bit daunting.

(Of course, once you get really used to git, you use git _anyway_, and
then you use cherry-pick and other tools to re-write a cleaned-up version
of the thing that you originally screwed up because you didn't know what
you were doing. So you _can_ do this too with git, but that doesn't mean
that git would necessarily be the best way to do it).

That said, maybe we could help the "fixup" phases even more using git. For
example, right now you can do "git cherry-pick" to transfer individual
patches, but if you want to combine two commits while cherry-picking, it
immediately gets more involved (still quite doable: use cherry-pick
multiple times with the "-n" flag, but it's not as obvious any more).


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: x86/x86-64 merge for 2.6.19
Date: Tue, 26 Sep 2006 20:45:37 UTC
Message-ID: <fa.tWeF1uvLX6Nps3/>

On Tue, 26 Sep 2006, Andi Kleen wrote:
> Yes that is why I did it. I still use quilt for my tree because it works
> best for me, but together with all the i386 stuff I was over 230 patches
> and email clearly didn't scale well to that much.

Right. I'm actually surprised not more people use git that way. It was
literally one of the _design_goals_ of git to have people use quilt as a
more "willy-nilly" front-end process, with git giving the true distributed
nature that you can't get from that kind of softer patch-queue like
system.

We discussed some quilt integration stuff, but nobody actually ended up
ever using both (except indirectly, as with the whole Andrew->Linus
stage). StGit kind of comes closest.

So I don't think your usage should be considered to be even strange. I
think it makes sense. I just think that everybody agrees that if we can do
it in chunks of a few tens of patches most of the time instead of chunks
of 225, everybody will have an easier time, if only because the latency
goes down, and it's just easier to react.

That said, the merges with Andrew are also sometimes in the 150+ patch
range, and merging with other git trees can sometimes bring in even more.
So I'm not claiming any hard limits or anything like that, just that in
general it's nicer to get updates trickle in over time rather than all at
once.

I suspect this was mostly a one-time startup-event.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: x86/x86-64 merge for 2.6.19
Date: Tue, 26 Sep 2006 21:44:30 UTC
Message-ID: <>

On Tue, 26 Sep 2006, Paolo Ciarrocchi wrote:
> out of curiosity, wouldn't be better to sync with Andrew via git?
> Why via plain patches?
> What am I missing?

I think you're just missing that we've become so used to it that it's just
easier than all the alternatives.

Also, the way we do things with Andrew actually has a few advantages over
a straight git-to-git merge. In particular, when Andrew sends me his
current stable quilt queue, every email is also Cc'd to the people who
sent it to him originally or were otherwise involved.

So the very act of transferring the patches from one tree to another
sometimes produces an extra acknowledgement cycle, and we've had patches
that got NACK'ed at that point because it was an older version of the
patch etc.

Now, I suspect this is more of an advantage with Andrew's tree than with
most other trees (most other trees tend to have a much stricter focus),
and perhaps equally importantly, it also wouldn't really work very well if
_everybody_ did it, so I personally believe this is one of those
situations where what's good for _one_ case may not actually be wonderful
for _all_ cases.

I think it's worked out pretty well, no?


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: x86/x86-64 merge for 2.6.19
Date: Tue, 26 Sep 2006 22:21:54 UTC
Message-ID: <>

On Wed, 27 Sep 2006, Sam Ravnborg wrote:

> On Tue, Sep 26, 2006 at 01:44:42PM -0700, Linus Torvalds wrote:
> >
> > Right. I'm actually surprised not more people use git that way. It was
> > literally one of the _design_goals_ of git to have people use quilt as a
> > more "willy-nilly" front-end process, with git giving the true distributed
> > nature that you can't get from that kind of softer patch-queue like
> > system.
> One of the major benefits from git is that whenever I decide to
> do some changes git allows me to start modifying as I like and when
> done I just do "git diff | less" to review it. And when it turns out
> to be a piece of crap I just do "git reset --hard" and be done with it.
> This makes my life so much easier than having to copy symlinked trees
> around all the time - and I then may not touch the base for the symlinks.

Yeah, I won't argue against that too much. I'm a pure git user myself,
although my patterns tend to be different from most other people (only
fairly small code changes, and mostly merging other people's code: I end up
often touching source code more when I do some trivial manual conflict
resolution than at most other times..)

And yes, making "git diff" as efficient as possible was definitely one of
the things that I worked on, exactly because it is how I work when I do
end up working on something: continually reminding myself what the other
changes I did were..

So "git" should work fine for pretty much any normal development, but a
patch maintenance system it ain't.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git training wheels for the pimple faced maintainer
Date: Thu, 19 Oct 2006 23:45:42 UTC
Message-ID: <>

On Thu, 19 Oct 2006, Pierre Ossman wrote:
> Stuff that need a bit more testing will be put in a public "for-andrew"
> branch. From what I gather, Andrew does a pull and a diff of these kinds
> of branches before putting together a -mm set. So this should be
> sufficient for your needs? Do you also prefer getting "[GIT PULL]"
> requests, or do you do the pull periodically anyway?
> Patches that are considered stable, either directly or by virtue of
> being in -mm for a while, will be moved into a "for-linus" tree and a
> "[GIT PULL]" sent to herr Torvalds.

That all sounds fine. Please just check the format for the "[GIT PULL]"
message: Andrew pulls people's trees on his own and largely automatically,
so he doesn't much care _what_ is in the tree, but I care deeply. So I
want the diffstat and shortlog listings, and preferably a few sentences at
the top of the email describing what's going on and why things are

I think people have seen the messages that other people send out (eg at
least Greg KH tends to Cc: those messages to linux-kernel, so others can
see what's going on too - although not all other maintainers do that).

Basically, I want to know that the thing I pull makes sense for the stage
I'm pulling in (ie if it's after -rc1, think about trying to explain why
the fixes are all important bug-fixes for example), and what it affects.
The diffstat is part of that, but please include any other explanations
that seem meaningful.

> Now, the patch in "for-linus" will be a duplicate of one or several
> commits in "for-andrew". Will I get any problems from git once I do a
> new pull from Linus' tree into "for-andrew"?

No, git will take care of it, unless, of course, your extra patches
conflict with the ones you sent me.

> Another concern is all the merges. As I have modifications in my tree,
> every merge should generate at least one commit and one tree object. Is
> this kind of noise in the git history something that needs concern?

Yes and no.

An occasional merge by you is fine, and if the merge is about _you_
merging your own branches into one branch for me or Andrew to pull, then
the merge actually adds valid information.

HOWEVER. Please don't just pull from my tree just to keep your
development tree "up-to-date". Your development tree is YOUR development
tree, it should be about the stuff _you_ did - not about merging stuff
that I merged from others. See?

So there's simply no point in merging from me, unless you know that there
are clashes due to other development, and you actually want to fix them
up. You will just cause unnecessary criss-cross merges if you pull from my
tree after you've started development, and the history gets really really

There's several ways to handle this:

 - just base your work on a certain release, and ignore all the other
   changes. Then, when you're ready, just ask me to pull your changes. git
   will just merge it up to my current version, and everything will be

   (Of course, once I _have_ merged your changes, if you pull at that
   point, you'll just fast-forward, and there will be no unnecessary
   back-and-forth merging)

 - If you actually want your development tree to "track" my tree, I'd
   suggest you have your "for-linus" branch that you put the work you want
   to track into, and then a plain "linus" branch which tracks _my_ tree.
   Then you can just fetch my tree (to keep your "linus" branch
   up-to-date), and if you want your development branch to track those
   changes, you can just do a "git rebase linus" in your "for-linus"
   branch.

 - If you actually notice that the stuff in my tree conflicts with the
   stuff you develop, _then_ you obviously want to merge (you can try to
   "rebase" things too and fix it up during the rebase, but merging is
   often easier, and at this point the merge is no longer "unnecessary
   noise", it's actually a real action of you doing a real merge to handle
   the conflict).
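
The difference between a pull that merely fast-forwards and one that has to create a merge commit can be sketched with a small ancestry check. This is a toy model with hypothetical commit names, not git's own code:

```python
def is_ancestor(parents, old, new):
    """True if `old` is reachable from `new` (or is `new` itself)."""
    seen, stack = set(), [new]
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def pull_kind(parents, mine, theirs):
    """Classify what pulling `theirs` into `mine` would have to do."""
    if is_ancestor(parents, theirs, mine):
        return "already up to date"
    if is_ancestor(parents, mine, theirs):
        return "fast-forward"
    return "merge commit"

# Work based on a release (v1), plus unrelated upstream development.
parents = {
    "v1": [],
    "dev1": ["v1"], "dev2": ["dev1"],      # your development
    "other1": ["v1"],                      # upstream work by others
    "upstream_merge": ["other1", "dev2"],  # upstream after pulling from you
}

before = pull_kind(parents, "dev2", "other1")
after = pull_kind(parents, "dev2", "upstream_merge")
print(before, "/", after)
```

Pulling before your work has been merged upstream needs a real merge commit, while pulling afterwards only moves your branch forward, because everything you have is already contained in the upstream tip.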

And hey, if there is occasionally an unnecessary merge, nobody really
cares. So don't be _too_ worried about it. But a clean history makes
things simpler to track, so I'm asking people to not generate noise that
simply doesn't help.

Other git maintainers may have other hints about how they work. Anybody?


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git training wheels for the pimple faced maintainer
Date: Fri, 20 Oct 2006 15:26:49 UTC
Message-ID: <>

On Fri, 20 Oct 2006, Pierre Ossman wrote:
> I'm still learning the more fancy parts of git, but I think that would be:
> git diff master..for-linus | diffstat

Use "git diff -M --stat master..for-linus" instead.

The "-M" enables rename detection, and the "--stat" does the diffstat for
you (and better than plain diffstat, since it knows about renames, copies
and deletes).

HOWEVER! The above obviously only really works correctly if "master" is a
strict subset of "for-linus".

> git log master..for-linus | git shortlog


> And in order to test for conflicts, I assume I should have a "test tree"
> that I merge all my local stuff in, together with your current HEAD?

Exactly. It can be either just a random temporary branch (it's cheap), or
it can just be your "work tree" that you can keep as messy as you want,
and then the "for-linus" branch is the cleaned-up version.

And quite frankly, most of the time you don't even really need one. It
depends on what you work on, but a _lot_ of the kernel is so independent
of anything else, that you know that the only thing that will ever really
conflict is trivial things, and hey, one of the things I do is to fix up
those conflicts.

In fact, quite often the _right_ thing to do for most developers is to
just entirely ignore what everybody else is doing, because if there are
trivial conflicts, I'll take care of them, and if there are more serious
conflicts, I'll just let you know myself - and you may not even be in a
position to _know_ about it, because the conflicts could come from
somebody else's git tree that I just happened to pull before.

So don't worry too much about it. As already mentioned, the _worst_ thing
you can do is probably to continually pull from my tree to "stay on the
edge". The way we keep the kernel maintainable is not by having everybody
try to keep up with everybody else, but by trying to keep things so
independent that people don't _need_ to keep up with everybody else.

A lot of people seem to just synchronize up at major releases, and then
rebase their work (which they may even have kept in quilt or something
else: you don't even have to use "git" for this) on that, and ask me to
merge the result.

So don't worry too much.

Also - different people work in different ways, and it's _ok_.

> If I've understood git correctly, a rebase is a big no-no once I've
> published those changes as it reverts some history. Right?

That is mostly correct. It's a big no-no if somebody has already pulled
from you, and you want them to pull again. Because at that point, you're
essentially asking them to pull two totally different versions of the same
thing - git will do the right thing (since all the duplicates will usually
merge perfectly), but it will look like two different histories, and
you'll see every commit twice. That's just ugly.
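
Why a rebased series shows up as a second copy can be sketched with a simplified commit ID. The assumption here is that an ID is a hash over the parent ID and the change; real git hashes more than that (tree, parents, author, committer, message), and the base and patch names below are made up:

```python
import hashlib

def commit_id(parent_id, patch):
    """Toy commit ID: a hash of the parent ID plus the change itself."""
    data = (parent_id + "\n" + patch).encode()
    return hashlib.sha1(data).hexdigest()[:12]

def build(base_id, patches):
    """Apply `patches` in order on top of `base_id`, returning the new IDs."""
    ids, parent = [], base_id
    for patch in patches:
        parent = commit_id(parent, patch)
        ids.append(parent)
    return ids

patches = ["patch A", "patch B"]

published = build("old-base", patches)  # what somebody already pulled
rebased = build("new-base", patches)    # same patches on a newer base

print(published)
print(rebased)
```

The patches are identical, but none of the IDs match, so anybody who already has the first series and then pulls the second ends up with both.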

On the other hand, things like the -mm tree are "throw-away" anyway:
Andrew re-creates the tree every time he pulls. So you can rebase the
branch you send to Andrew as much as you want. So it's not _entirely_
about whether something is "published" or not, it's literally more about
how something is actively _used_.

But yes - in general, the rule of thumb should be: rebase as much as you
want in your own _private_ sandbox to make things look nice, but once
you've exposed it to anybody else, it's set in stone.

> Big thanks for all the pointers. I have my account at, so it
> won't be long until my first [GIT PULL]. Be gentle.

Now, I may not be "gentle", because if there is something wrong with the
end result I'll tell you so and I'm not exactly known for always being
excessively polite ;)

But don't worry, it can be fixed up. At worst, you'll just get an email
back saying "I'm not going to pull this one, because you've been a
complete clutz, and did something really stupid wrt XYZ", and I'll ask you
to fix it up. Or I might say "I'll pull it this time, but I don't want to
see XYZ again in the future".

Or I might not say anything at all, and you'll just notice that I've
pulled from you.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git training wheels for the pimple faced maintainer
Date: Fri, 20 Oct 2006 15:35:53 UTC
Message-ID: <>

On Fri, 20 Oct 2006, Linus Torvalds wrote:
> Use "git diff -M --stat master..for-linus" instead.

Actually, use "git diff -M --stat --summary master..for-linus".

The "--summary" thing generates an additional summary at the end of the
diffstat that lists deleted/created/moved/copied files, which is nice to
see. There's a difference between a

   drivers/char/myserial.c  | 50 ++++++++
   1 file changed, 50 insertions(+), 0 deletions(-)

and

   drivers/char/myserial.c  | 50 ++++++++
   1 file changed, 50 insertions(+), 0 deletions(-)
   create mode 100644 drivers/char/myserial.c

because the latter tells that the new lines are actually in a new file,
while the previous says that you just added lines to an old one.

(Without "--summary", you can't tell the difference between these two
cases).

And the "-M" flag obviously means the difference between:

 drivers/pci/hotplug/pci_hotplug.h          |  236 ----------------------
 include/linux/pci_hotplug.h                |  236 ++++++++++++++++++++++
 2 files changed, 236 insertions(+), 236 deletions(-)
 delete mode 100644 drivers/pci/hotplug/pci_hotplug.h
 create mode 100644 include/linux/pci_hotplug.h

and

 .../pci/hotplug => include/linux}/pci_hotplug.h    |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
 rename drivers/pci/hotplug/pci_hotplug.h => include/linux/pci_hotplug.h (99%)

where the latter version clearly tells you a whole lot more about the
patch than the non-renaming one.
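
The idea behind rename detection can be sketched as pairing each deleted path with the most similar created path. The block below is only a toy heuristic built on Python's difflib; it is not git's algorithm, and the file contents and the 0.5 threshold are assumptions made for the example:

```python
from difflib import SequenceMatcher

def detect_renames(old_tree, new_tree, threshold=0.5):
    """Pair deleted and created paths whose contents are similar enough."""
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    created = {p: c for p, c in new_tree.items() if p not in old_tree}
    renames = []
    for old_path, old_text in sorted(deleted.items()):
        best_path, best_score = None, threshold
        for new_path, new_text in sorted(created.items()):
            score = SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines()
            ).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames.append((old_path, best_path, round(best_score * 100)))
            del created[best_path]  # each created file is matched at most once
    return renames

# Hypothetical header with ten lines; one line changes during the move.
old_lines = ["#define EXAMPLE_%d %d" % (i, i) for i in range(10)]
new_lines = list(old_lines)
new_lines[5] = "#define EXAMPLE_5 500"

old_tree = {"drivers/pci/hotplug/pci_hotplug.h": "\n".join(old_lines)}
new_tree = {"include/linux/pci_hotplug.h": "\n".join(new_lines)}

renames = detect_renames(old_tree, new_tree)
for old_path, new_path, similarity in renames:
    print("rename %s => %s (%d%%)" % (old_path, new_path, similarity))
```

Without this pairing step the same change can only be reported as one whole file deleted and another whole file created.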

The reason rename detection isn't on by default is that non-git tools
don't understand the rename diffs. But if anybody sends me patches, please
feel free to use "git diff -M" to make them smaller and more readable in
the face of renames.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git training wheels for the pimple faced maintainer
Date: Sat, 21 Oct 2006 16:11:00 UTC
Message-ID: <fa./8ooeq220i8NhEhhUktmC/>

On Sat, 21 Oct 2006, Pierre Ossman wrote:
> >
> > HOWEVER! The above obviously only really works correctly if "master" is a
> > strict subset of "for-linus".
> Ah, that's a bit of a gotcha. Any nice tricks to keep track of where you
> were in sync with upstream last? Create a dummy branch/tag perhaps?

You don't need to. Git keeps track of the fork-point, and you can always
get it with

	git merge-base a b

where "a" and "b" are the two branches.

HOWEVER. If you have _merged_ since (ie your branch contains merges _from_
the branch that you are tracking), this will give you the last
merge-point (since that's the last common base), and as such a "diff" from
that point will _ignore_ your changes from before the merge. See?
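
How the merge base moves once you merge from the tracked branch can be sketched on a toy graph. This is a simplified model with hypothetical commit names and says nothing about how git computes it internally:

```python
def ancestors(parents, tip):
    """All commits reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def merge_bases(parents, a, b):
    """Common ancestors of `a` and `b` not contained in another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    )

# Upstream: fork <- l1 <- l2.  Your branch: fork <- m1.
no_merge = {"fork": [], "l1": ["fork"], "l2": ["l1"], "m1": ["fork"]}

# Same, but you merged upstream (at l1) and then kept developing.
with_merge = dict(no_merge)
with_merge["sync"] = ["m1", "l1"]
with_merge["m2"] = ["sync"]

before = merge_bases(no_merge, "m1", "l2")
after = merge_bases(with_merge, "m2", "l2")
print(before, after)
```

Before the merge the base is the original fork point; afterwards it is the upstream commit you last merged, so the original fork point is no longer what "git merge-base" reports.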

But holding a tag to the "original fork point" is equally useless in that
case, since if you have merged from my tree since that tag, and you do a
"git diff tag..for-linus", then the diff will contain all the new stuff
that came from _me_ through your merge as well. See?

In other words: in both cases you really really shouldn't merge from me
after you started developing. And the reason in both cases is really
fundamentally the same: because merging from me obviously brings in commits
_from_me_, so any single diff thus obviously turns pointless: it will
_not_ talk about all your new work.

Anyway, notice the "single diff" caveat above. Git obviously does actually
keep track of individual commits, so the individual commits that are
unique to your repository are _still_ unique to your repository even after
you've merged with me - since I haven't merged with you. So you _can_ get
the information, but now you have to do something fundamentally
different.

So if you've done merges with me since you started development, you cannot
now just say "what's the difference between <this> point and <that> point
in the development tree", because clearly there is no _single_ line of
development that shows just _your_ changes. But that doesn't mean that
your development isn't separatable, and you can do one of two things:

 (a) work on a "individual commit" level:

	git log -p linus..for-linus

     will show each commit that is in your "for-linus" branch but is _not_
     in your "linus" tracker branch. This does the right thing even in the
     presence of merges: it will show the merge commit you did (since that
     individual commit is _yours_), but it will not show the commits
     merged (since those came from _my_ line of development)

     So now a

	git log -p linus..for-linus | diffstat

     will give something that _approximates_ the diffstat I will see when
     merging. I say _approximates_, because it only really gives the right
     answer if all the commits are entirely independent, and you never
     have one commit that changes a line one way, and then a subsequent
     commit that changes the same lines another way.

     If you have commits that are inter-dependent, the diffstat above will
     show the "sum" of the diffs, which is not what I will see when I
     actually merge. I will see just the end result, which is more like
     the "union" of the diffs. And the two are the same only for
     independent diffs, of course.
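
The gap between the "sum" of per-commit diffs and the end result can be shown with two commits that touch the same line. This is a small Python sketch using difflib as a stand-in for diffstat; the file contents are invented:

```python
from difflib import SequenceMatcher

def diffstat(old, new):
    """Return (insertions, deletions) between two lists of lines."""
    insertions = deletions = 0
    matcher = SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deletions += i2 - i1
        if tag in ("replace", "insert"):
            insertions += j2 - j1
    return insertions, deletions

v0 = ["int timeout = 10;", "int retries = 3;"]
v1 = ["int timeout = 20;", "int retries = 3;"]  # first commit edits line 1
v2 = ["int timeout = 30;", "int retries = 3;"]  # second commit edits it again

per_commit = [diffstat(v0, v1), diffstat(v1, v2)]
summed = tuple(sum(column) for column in zip(*per_commit))
net = diffstat(v0, v2)

print("sum of per-commit diffs:", summed)
print("end result:", net)
```

The per-commit view counts the line twice, while the start-to-end diff counts it once, so the two only agree when no commit reworks lines that another commit in the range also changed.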

So the above is simple, and gives _almost_ the right answer. The other
alternative is slightly smarter, and more involved, and gives the exact
right answer:

 (b) create a temporary new merge, and see what the difference of the
     merge is, as seen by me (eg as seen from "linus"). So this

	git checkout -b test-branch for-linus
	git pull . linus
	git diff -M --stat --summary linus..

     will create a new branch ("checkout -b") based on your current
     "for-linus" state, and within that branch, do a merge of the "linus"
     branch (or you could have done it the other way around and made the
     merge as if you were me: check out the state of "linus" and then
     pull the "for-linus" branch instead).

     And then, the final step is to just diff the result of the merge
     against the "linus" branch. This obviously gives the same diffstat
     as the one _I_ should see when I merge, because you basically
     "pre-tested" my merge for me.
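
The gap between the "sum" of approach (a) and the end result of approach
(b) can be illustrated with a small Python sketch. This is a toy model, not
how git computes a diffstat: the "diffstat" helper and the three file
versions are made up for the illustration.

```python
import difflib

def diffstat(old, new):
    """Count (added, removed) lines between two versions of a file."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return added, removed

# Hypothetical file contents: two commits that both touch the same line.
base = ["int x;", "int timeout = 10;", "int y;"]
commit1 = ["int x;", "int timeout = 20;", "int y;"]
commit2 = ["int x;", "int timeout = 30;", "int y;"]

# Approach (a): add up the per-commit diffs, like "git log -p | diffstat".
per_commit = [diffstat(base, commit1), diffstat(commit1, commit2)]
summed = (sum(a for a, _ in per_commit), sum(r for _, r in per_commit))

# Approach (b): diff only the end points, like diffing the test merge.
net = diffstat(base, commit2)

print("sum of per-commit diffs:", summed)
print("end-to-end diff:        ", net)
```

Because both commits rewrite the same line, the summed count is twice the
end-to-end count. With commits that touch unrelated lines the two numbers
would agree.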

See? git does give you the tools, but if you merge from me and don't have
a branch that is a nice clear superset of what you started off with, but
have mixed in changes from _my_ tree since you started developing, you end
up having to do some extra work to separate out all the new changes.

So that's why I suggest not doing a lot of criss-crossing merges. It
generates an uglier history that is much harder to follow visually in
"gitk", but it also generates some extra work for you. Not a lot, but
considering that there are seldom any real upsides, this hopefully
explains why I suggest against it.

And again, as a final note: none of this is "set in stone". These are all
_suggestions_. Notice the "seldom any real upsides". I say "seldom" on
purpose, because quite frankly, sometimes it's just easier for you to
merge (especially if you know there are likely to be clashes), so that you
can fix up any issues that the merge brings.

Anyway, I hope this clarified the issue. I don't think we've actually had
a lot of problems with these things in practice. None of this is really
"hard", and a lot of it is just getting used to the model. Once you are
comfortable with how git works (and using "gitk" to show history tends to
be a very visual way to see what is going on in the presence of merges),
and get used to working with me, you'll do all of this without even
thinking about it.

It really just _sounds_ more complicated than it really is.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: Git training wheels for the pimple faced maintainer
Date: Sat, 21 Oct 2006 19:08:18 UTC
Message-ID: <>

On Sat, 21 Oct 2006, Pierre Ossman wrote:
> If I read your response above and the man page for git-merge-base, it
> will do the right thing even if "linus" now is further in the future
> than the point I forked it.

Yes. You can continue to track my state in the "linus" branch as much as
you want, and "git merge-base" will show where your branch and mine
diverged, so you don't need to remember it explicitly.

Only if you start _mixing_ the branches (ie you merge "linus" into your
branch) do you end up in the situation where there now is no longer a
single-threaded line of development, so you can no longer expect to be
able to just use a direct "git diff".
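
As a rough illustration of what a merge base is, here is a Python sketch
over a hand-written commit graph. The commit names and the "merge_bases"
helper are hypothetical, and the brute-force search is only meant to show
the idea, not git's actual algorithm.

```python
def ancestors(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def merge_bases(parents, a, b):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }

# Hypothetical history: "linus" moved on (L1, L2) after "for-linus" forked at B.
parents = {
    "A": [], "B": ["A"],
    "L1": ["B"], "L2": ["L1"],   # upstream work
    "F1": ["B"], "F2": ["F1"],   # your work
}
fork = merge_bases(parents, "L2", "F2")
print("unmixed branches:", fork)

# Now mix the branches: merge upstream commit L1 into your branch as M.
parents["M"] = ["F2", "L1"]
mixed = merge_bases(parents, "L2", "M")
print("after merging upstream:", mixed)
```

In the unmixed case the merge base is the original fork point. Once the
upstream commit has been merged in, the merge base moves forward, which is
why a plain diff from it no longer isolates just your own changes.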

> >  (a) work on an "individual commit" level:
> >
> > 	git log -p linus..for-linus
> >
> >      will show each commit that is in your "for-linus" branch but is _not_
> >      in your "linus" tracker branch. This does the right thing even in the
> >      presence of merges: it will show the merge commit you did (since that
> >      individual commit is _yours_), but it will not show the commits
> >      merged (since those came from _my_ line of development)
> Ah, so "git log" will not show the commits that have popped up on
> "linus" after "for-linus" branched off? Neat. :)

That is what the git "a..b" syntax means for everything _but_ "diff".
Doing a "git diff" really is actually the special case: to create a diff,
you need two end-points. For all other git commands, "a..b" really means
"all commits that are in 'b' but not in 'a'", ie it's _not_ really about
two end-points, it's about a _set_ operation.

You should think of "a..b" as the "set difference" operation, or "b-a".

There's also a "symmetric difference", which is called "a...b" (three
dots). That's the "union of the differences both ways", in other words,
"a...b" is the set of commits that exist in a _or_ b, but not in both.

You can do some even more complex operations, and one that I find
reasonably useful at times is for example

	gitk --all --not HEAD

which basically means: "show all commits in all branches, but subtract
everything that is reachable from the current HEAD". In other words, it
shows what commits exist in all the other branches that have not been
merged into the current one.

(The "--not HEAD" thing is mostly written as "^HEAD", but I wrote it out
in long-hand here because it is perhaps a bit more readable that way.)
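
The three notations above ("a..b", "a...b" and "--all --not HEAD") can be
modelled as plain set operations. The following Python sketch assumes a
tiny made-up commit graph; the commit and branch names are hypothetical.

```python
def reachable(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

# Hypothetical history: both branches share A and B, then diverge.
parents = {
    "A": [], "B": ["A"],
    "L1": ["B"],                 # only in "linus"
    "F1": ["B"], "F2": ["F1"],   # only in "for-linus"
}
branches = {"linus": "L1", "for-linus": "F2"}

in_linus = reachable(parents, branches["linus"])
in_for_linus = reachable(parents, branches["for-linus"])

# linus..for-linus: set difference, "for-linus minus linus".
two_dot = in_for_linus - in_linus

# linus...for-linus: symmetric difference, in one or the other but not both.
three_dot = in_for_linus ^ in_linus

# --all --not HEAD, with HEAD on "for-linus": everything in any branch,
# minus whatever HEAD can already reach.
head = branches["for-linus"]
everything = set().union(*(reachable(parents, tip) for tip in branches.values()))
not_in_head = everything - reachable(parents, head)

print(sorted(two_dot), sorted(three_dot), sorted(not_in_head))
```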

> One concern I had was how to find stuff to cherry-pick when doing a
> stable review.

So looking at the above, what you can do is literally

	gitk --all ^linus

which shows all your branches _except_ stuff that is already merged into
the "linus" branch that tracks what I have merged.

Git really is very clever.

HOWEVER! A word of warning: especially when you start doing
cherry-picking, git will consider a commit that has been cherry-picked to
be totally _separate_ from the original one. So when you do things like
the above, and you have commits that have "identical patches" as the ones
I have already applied, they will show up as "not being in linus' branch".

That's because the identity of a commit is really not the patch it
describes at all: the commit is defined by the exact spot in the history,
and by the exact contents of that commit (which include date, time,
committer info, parents, exact tree state etc). So when you do a
"cherry-pick", you are very much creating a totally new commit - it just
happens to generate the same (or similar) _diff_.

There are tools to help you filter out cherry-picked commits too, by
literally looking at the diff and saying "oh, that same diff already
exists upstream", but that's different. If you really care, you can look
at what "git cherry" does (and it's not very efficient).
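
To make the distinction concrete, here is a toy Python sketch of "commit
identity" versus "patch identity". It does not reproduce git's real object
format or what "git cherry" does internally; the field layout, the names
and the two helper functions are assumptions made for the illustration.

```python
import hashlib

def commit_id(tree, parents, author, date, message):
    """Toy commit name: a hash over everything the commit records."""
    payload = "\n".join([tree, ",".join(parents), author, date, message])
    return hashlib.sha1(payload.encode()).hexdigest()

def patch_id(diff_text):
    """Toy patch name: a hash over the diff alone, ignoring history."""
    return hashlib.sha1(diff_text.encode()).hexdigest()

# The same change, applied in two different places in history.
diff = "-int timeout = 10;\n+int timeout = 30;\n"

original = commit_id("tree-on-devel", ["devel-parent"],
                     "A Hacker", "2006-10-20", "fix timeout")
picked = commit_id("tree-on-stable", ["stable-parent"],
                   "A Hacker", "2006-10-21", "fix timeout")

original_diff = diff
picked_diff = diff

print("same commit?", original == picked)
print("same patch? ", patch_id(original_diff) == patch_id(picked_diff))
```

The two commits get different names because their parents, trees and dates
differ, even though the diff they introduce is byte-for-byte the same.
Matching cherry-picked commits therefore means comparing the diffs
themselves, which is extra work on top of the normal history walk.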

> git has a lot of these hidden features and ways of doing
> less-than-obvious things, so I'm just trying to broaden my repertoire by
> consulting those who have been using it on a more daily basis.

You really can do a _lot_ with git. Part of what seems to scare some
people is that git really allows for a lot of power and flexibility, and
you can do some very fancy stuff.

At the same time, you can mostly also use it as if it were a lot dumber
than it really is. There are ways to limit your usage so that you'll never
even need to worry about things like multiple branches or cherry-picking
or merging or anything else, and try to just see your work as a linear
progression on top of a particular release version.

I'll happily explain all the grotty details, but keep in mind that you
don't _need_ to use the features if you don't want to.

> I am just thankful git has a reset command ;)

You can undo almost any mess you get yourself into (you _can_ really screw
that up too, if you do a combination of "reset" and "git prune", but you
have to work at it).

The bigger problem may be that if you get yourself into a real mess, you
need to understand how you got there: you can always get back to a
previous state, sometimes you just need to know what that state _was_, and
if you get confused enough, even that can be a problem.

"gitk" really does tend to help clarify what happened.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT PULL] i2c updates for 2.6.20
Date: Tue, 12 Dec 2006 18:07:58 UTC
Message-ID: <>

On Tue, 12 Dec 2006, Jean Delvare wrote:
> Please pull the i2c subsystem updates for Linux 2.6.20 from branch
> i2c-for-linus of repository git://
> There are 3 new i2c bus drivers, one old broken bus driver deleted, and a
> few cleanups and fixes in the i2c core and individual drivers.
> I'm not yet comfortable with git so please let me know if I did anything
> wrong.

Looks fine. Your "please pull" message has some slight stylistic
problems, but the pull looks good, and matches what you claimed for it.

The stylistic problems are:

 - please write the git repo address and branch name alone on the same
   line so that I can't even by mistake pull from the wrong branch, and so
   that a triple-click just selects the whole thing.

   So the proper format is something along the lines of:

	"Please pull from

		git:// i2c-for-linus

	 to get these changes:"

   so that I don't have to hunt-and-peck for the address and inevitably
   get it wrong (actually, I've only gotten it wrong a few times, and
   checking against the diffstat tells me when I get it wrong, but I'm
   just a lot more comfortable when I don't have to "look for" the right
   thing to pull, and double-check that I have the right branch-name)

 - your diffstat was fine, but was line-wrapped for some reason, which
   just makes it harder for me to line up and compare against what I
   actually got when pulling (ie I just have two xterms open, one with
   the mail-reader, one with my shell command line, and I visually compare
   what I get with what I _should_ get, and then something as silly as
   incorrectly wrapped lines just makes the thing look visually different,
   which again just throws me for all the wrong reasons).

But everything looks fine apart from those trivial details. Pulled and
pushed out,


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [ANNOUNCE] GIT 1.5.0
Date: Wed, 14 Feb 2007 16:47:32 UTC
Message-ID: <>

On Wed, 14 Feb 2007, Bill Lear wrote:
> This is enabled by passing the --enable=receive-pack to the
> git-daemon (usually in the [x]?inetd configuration).
> This has the benefit of:

Before you list the benefits, you should always talk about the lack of
security! Let nobody enable it without realizing the dangers! Tell people
to _only_ do this inside a company firewall, and even then, only if you
trust everybody.

>   2) A less ugly URL to use: git://server/repo, instead of, say,
>      ssh+git://server/path/to/repos/repo.

Why do people use that silly "ssh+git://" format?

It's a cogito thing. Native git has never done it, and only supports it
because cogito thought it must make sense.

The native git ssh URL is exactly the normal ssh URL:


and if you really want to use the "xxx://" format, you might as well just


which should also work fine.


PS. This is the commit message that added "git+ssh://":

	Author: Linus Torvalds <>
	Date:   Fri Oct 14 17:14:56 2005 -0700

	    Support git+ssh:// and ssh+git:// URL

	    It seemed to be such a stupid syntax. It's both what "ssh://" means,
	    and it's what not specifying a protocol at _all_ means.

	    But hey, since we already have two ways of saying "use ssh with
	    pack-files", here's two more.

so it was deemed stupid from the get-go, and isn't even some "legacy"
thing. It's purely a "cogito people thought it makes sense to point out
that it's _both_ native git _and_ ssh protocol".

From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT PULL] kvm oops fix
Date: Thu, 19 Apr 2007 20:32:40 UTC
Message-ID: <>

On Thu, 19 Apr 2007, Jeff Garzik wrote:
> What is the easiest way to completely undo a pull, reverting the branch to the
> HEAD present before the pull?

You can either do

	git reset --hard ORIG_HEAD

(git will set ORIG_HEAD before things like pulls or resets, so you can
always go back), or, if you have reflogs enabled (and if you set up your
repository with a modern git version it probably will be enabled by
default), you can just do

	git reset --hard @{1}

where "@{1}" just means "HEAD ref state one change ago" (the same way you
can say "@{2.hours.ago}" to mean HEAD state two hours ago).

In either case, double-check that that is indeed the version you want to
revert to with

	git log ORIG_HEAD
	git log @{1}

first, since obviously if you give "git reset --hard" the wrong version,
it will reset to the wrong state. Although especially with reflogs, your
previous state will always be logged, so you can always re-do what you
undid by (again) doing "git reset --hard @{1}" to get back the previous
state ;)

ALSO! Make sure that you don't have any dirty state in your working tree
that you don't want to lose! "git reset --hard" will do what it implies:
it will reset your tree. Very much including throwing away all your dirty
state (and that you can't get back by going to a previous commit, since
it was never committed!)


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [GIT PULL] kvm oops fix
Date: Thu, 19 Apr 2007 23:01:45 UTC
Message-ID: <fa.v9OXvPH9dBQvi/>

On Thu, 19 Apr 2007, Linus Torvalds wrote:
> You can either do
> 	git reset --hard ORIG_HEAD
> 	git reset --hard @{1}

Btw, on the same kind of subject: the whole "what was my previous HEAD"
issue is obviously also how you'd generally want to see what those
new patches were, regardless of whether you want to undo them or not.

So it might be worth repeating for people what I do after any pull that I
feel I want to give a quick look-over.. A simple

	gitk ORIG_HEAD..
	gitk HEAD@{1}..
	gitk @{1}..
	gitk @{12.hours.ago}..

are all variations of the same theme: show what is new since either "last
update" or "what I had in my tree 12 hours ago".

Btw, the

	gitk @{12.hours.ago}..

thing is very different from

	gitk --since=12.hours.ago

even if they involve the same date.

The "@{12.hours.ago}" syntax pinpoints a particular *commit*, namely what
your HEAD was pointing at 12 hours ago. So it's literally about your
particular repository history (give a branch name if you want to specify
one: so "for-linus@{2.hours.ago}" specifies the *commit* that was the head
of the "for-linus" branch in your repository 2 hours ago).

In contrast, the "--since=12.hours.ago" means something totally different:
it means that you want to ignore all commits that are older than 12 hours,
regardless of whether they were actually in your tree at that point or
not. Which is often a very different issue indeed.
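
The difference can be sketched in Python with a toy repository. All of the
data below (commit names, dates, the reflog entries) is hypothetical, and
the "--since" filter is modelled as a simple check on the commit date.

```python
from datetime import datetime, timedelta

now = datetime(2007, 4, 19, 23, 0)

# commit -> (parents, commit date)
commits = {
    "A": ([], now - timedelta(days=5)),
    "B": (["A"], now - timedelta(days=3)),    # written 3 days ago, pulled 1 hour ago
    "C": (["B"], now - timedelta(hours=2)),
}

# Reflog of the current branch, oldest first:
# (when the ref was updated, commit it pointed to afterwards).
reflog = [
    (now - timedelta(days=4), "A"),
    (now - timedelta(hours=1), "C"),          # one pull brought in B and C
]

def reachable(tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(commits[commit][0])
    return seen

def ref_at(when):
    """Commit the branch pointed to at time 'when', like "@{date}"."""
    state = None
    for stamp, commit in reflog:
        if stamp <= when:
            state = commit
    return state

cutoff = now - timedelta(hours=12)
head = reflog[-1][1]

# Like "@{12.hours.ago}..": what is new in *this repository* since then.
new_in_repo = reachable(head) - reachable(ref_at(cutoff))

# Like "--since=12.hours.ago": commits whose own date is recent.
recent_by_date = {c for c in reachable(head) if commits[c][1] >= cutoff}

print(sorted(new_in_repo), sorted(recent_by_date))
```

Commit B is three days old by date, so the date filter drops it, but it
only arrived in this repository an hour ago, so the reflog-based range
still shows it.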

So another reasonably common thing you can do:

	git fetch linus
	gitk linus@{1}..linus

this assumes that you've set up a separate tracking branch "linus", and
that you've taught it to fetch my current tree into it. So in the above
sequence, the "git fetch linus" will fetch everything new from my tree
into your "linus" tracking branch, and the "gitk" will then show all the
new commits on that branch that you got.

NOTE! The above is very much designed to work whether you are on that
branch or not, and in fact, the normal reason to do something like the
above is explicitly that you want to see what is going on in somebody
else's tree without actually necessarily merging it into your own branch
(perhaps in order to decide whether you _want_ to merge it or not).

And that "linus@{1}" really just means "what is the previous commit I had
on my 'linus' branch". You can obviously dig deeper down, and "linus@{10}"
is something less commonly used, but basically means "what was on that
branch ten revision updates ago".

Note that this is *very*different* from "linus~10", which means "what is
the tenth _parent_ of the 'linus' branch". They *can* be the same thing (if
each operation adds exactly one commit), but if you do things like "git
fetch", then the "linus" branch ten operations ago may be hundreds of
commits ago, because some of those ten operations may have added lots of
commits thanks to synching up with some other tree!
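
A minimal Python sketch of the two notions, assuming a purely linear
history and a hand-written reflog (the commit names and both helper
functions are made up for the illustration):

```python
# First-parent chain of a hypothetical branch: c0 <- c1 <- ... <- c250
history = [f"c{i}" for i in range(251)]

# Reflog: the branch tip after each update, oldest first. Two updates
# added one commit each, then a single fetch brought in 200 commits.
reflog = ["c48", "c49", "c50", "c250"]

def nth_parent(tip, n):
    """Like "tip~n": walk n parent links back from tip."""
    return history[history.index(tip) - n]

def nth_previous(n):
    """Like "branch@{n}": where the branch pointed n updates ago."""
    return reflog[-1 - n]

tip = reflog[-1]
print("branch~1   ->", nth_parent(tip, 1))
print("branch@{1} ->", nth_previous(1))
```

Here the parent of the tip is one commit back, while the previous reflog
entry is 200 commits back, because the last update was a single fetch.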

And as already noted, the "branch@{xyzzy}" format also allows "xyzzy" to
be a date, not just a numeral. In fact, that was the original reflog
tracking behaviour, and the numeric thing, while simpler, is actually a
newer feature (as is the "don't specify a branch name at all", which just
means "current branch")

So

	gitk @{24.hours.ago}..

is a nice way to see what has happened in *your* repository, on the current
branch, in the last 24 hours.

(NOTE: You can also say "HEAD@{2.hours.ago}" and that actually doesn't use
the current branch at all, it actually says what HEAD was 2 hours ago: you
may have been on some totally _different_ branch back then, and if you
wonder what the heck of a branch you are running and you look at the time
of the binary, but you don't remember what branch you had checked out when
you built it, that may be what you want. Of course, you may also want a
better attention span ;)

Some of this is pretty recent, and generally, if some of this doesn't work
for you, it means that you are using some ancient version of git. If it's
not git-1.5.x, upgrade. It's worth it.


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [git pull] core/softirq for v2.6.27
Date: Mon, 14 Jul 2008 17:14:38 UTC
Message-ID: <>

On Mon, 14 Jul 2008, Ingo Molnar wrote:
> /me suggests git-log-addendum feature to amend commit logs after the
>     fact, without changing any of the code :-)

It's called "git commit --amend".

But it obviously also changes the SHA1 of the commit - it generates a
totally new one. You cannot do it after-the-fact or deep in history (well,
you can, with the normal "rebase" op, but it literally is no different
from changing the patch itself too).
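
Why a reworded log message deep in history is "no different from changing
the patch" can be seen in a toy Python model where each commit name is a
hash over its parent's name, its tree and its message. This is a
simplified stand-in for git's real commit format; the field layout and the
sample history are assumptions.

```python
import hashlib

def commit_id(parent_id, tree, message):
    """Toy commit name: depends on the parent's name, the tree and the log."""
    payload = f"parent {parent_id}\ntree {tree}\n\n{message}"
    return hashlib.sha1(payload.encode()).hexdigest()

def build_chain(commits):
    """Return the names of a linear history, oldest first."""
    ids = []
    parent = ""
    for message, tree in commits:
        parent = commit_id(parent, tree, message)
        ids.append(parent)
    return ids

history = [("add driver", "tree1"), ("fix tpyo", "tree2"), ("add docs", "tree3")]
before = build_chain(history)

# Reword the middle commit only. No tree (ie no code) changes at all.
history[1] = ("fix typo", "tree2")
after = build_chain(history)

for old, new in zip(before, after):
    print(old[:8], new[:8], "same" if old == new else "changed")
```

The first commit keeps its name, but the reworded commit and everything
built on top of it get new names, exactly as if the code had changed.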

BK used to be able to change the logs after-the-fact, but that meant that
the logs were not trustworthy, and any changes also didn't distribute
right (they'd only be changed locally and in any subsequent distributions)


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH RFC] ext3 data=guarded v3
Date: Thu, 16 Apr 2009 18:41:48 UTC
Message-ID: <>

On Thu, 16 Apr 2009, Chris Mason wrote:
> Ah ok, it is just a missed i_size update.  Basically because file_write
> doesn't wait for page writeback to finish, someone can be updating
> i_size at the same time the end_io handler for the last page is running.
> Git triggers this when it does the sha1flush just before closing the
> file.

Can you say exactly what the IO pattern is?

One of the original git design issues was to actually never _ever_ do
anything even half-way strange in the filesystem patterns, exactly because
I've seen so many filesystem bugs over the years.

Now, it turns out that "original design intent" and "actual code" then
don't always match, and git did some things that are unusual and triggered
bugs.

Example: in order to be extra safe, git does "fchmod()" after doing all
the writes to the file descriptor just before closing it. I wanted git to make
it hard to corrupt things by mistake, and marking all the files that only
get written once (which is most of them) read-only as soon as possible
seemed to be a great safety feature.

Except, in the process it triggered a network filesystem bug where earlier
writes that were still in the writeback cache hadn't made it to the server
yet, and then the client would do the whole "mark it read-only" before the
writes had even been done. Oops.

We had a few other issues with just renaming files around (basic rule:
only rename files _within_ one directory if you want to avoid filesystem
bugs) and with using "pread/pwrite" (basic rule: pread/pwrite is unusual,
and is apparently buggy on some operating systems. So avoid them).
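
The "boring" write pattern described above can be sketched in Python as
follows. This is not git's actual code: the "write_once" helper is
hypothetical, and the fsync() before the mode change is an added
assumption (one way to make sure the data has been pushed out before the
file is marked read-only), not something the text above says git does.

```python
import os
import tempfile

def write_once(directory, name, data):
    """Write data to directory/name, mark it read-only, return the path."""
    # Temporary file in the SAME directory, so the rename never crosses dirs.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # Plain sequential write(), no pread/pwrite.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)            # assumption: flush data before the chmod
        os.fchmod(fd, 0o444)    # written once, so mark it read-only
    finally:
        os.close(fd)
    final_path = os.path.join(directory, name)
    os.rename(tmp_path, final_path)
    return final_path

with tempfile.TemporaryDirectory() as objdir:
    path = write_once(objdir, "object", b"hello\n")
    print(oct(os.stat(path).st_mode & 0o777))
```

The sketch is Unix-only, since os.fchmod() is not available everywhere.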

Anyway, what was the exact pattern that caused this to show, and maybe I
can find yet another place where git could just be even more anally safe
by not doing anything half-way odd?

