Newsgroups: fa.linux.kernel
From: Linus Torvalds <>
Subject: Re: Intel vs AMD x86-64
Original-Message-ID: <>
Date: Sun, 22 Feb 2004 03:08:12 GMT
Message-ID: <>

On Sun, 22 Feb 2004, Herbert Poetzl wrote:
> hmm, so the current x86_64 will be changed to x86-64 or
> will there be x86_64 and x86-64?

No. The filesystem policy _tends_ to be that dashes and spaces are turned
into underscores when used as filenames. Don't ask me why (well, the space
part is obvious, since real spaces tend to be a pain to use on the command
line, but don't ask me why people tend to convert a dash to an underscore).

So the real name is (and has always been, as far as I can tell) x86-64.

Actually, I'm a bit disgusted at Intel for not even _mentioning_ AMD in
their documentation or their releases, so I'd almost be inclined to rename
the thing as "AMD64" just to give credit where credit is due. However,
it's just not worth the pain and confusion.

Any Intel people on this list: tell your managers to be f*cking ashamed of
themselves. Just because Intel didn't care about their customers and has
been playing with some other 64-bit architecture that nobody wanted to use
is no excuse for not giving credit to AMD for what they did with x86-64.

(I'm really happy Intel finally got with the program, but it's pretty
petty to not even mention AMD in the documentation and try to make it
look like it was all their idea).


From: Linus Torvalds <>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.32-rc3: low mem - only 378MB on x86_32 with 64GB. Why?
Date: Sat, 10 Oct 2009 18:38:58 UTC
Message-ID: <>

On Sat, 10 Oct 2009, wrote:
> When the x86 went 64-bit, the register pressure relief from the
> additional registers usually more than outweighs the additional memory
> bandwidth (basically, if you're spending twice as much time on each
> load/store, but only doing it 40% as often, you come out ahead...)
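
The arithmetic in the quoted text can be checked with a minimal sketch. The
function name is hypothetical, and the two figures are the ones the poster
gives (twice the time per access, 40% as many accesses); they are not
measurements.

```python
def relative_memory_time(cost_per_access: float, access_ratio: float) -> float:
    """Total load/store time relative to the 32-bit baseline (baseline = 1.0)."""
    return cost_per_access * access_ratio


# Figures from the quoted text: each access takes twice as long,
# but only 40% as many accesses are made.
ratio = relative_memory_time(cost_per_access=2.0, access_ratio=0.4)
print(f"relative load/store time: {ratio:.2f}")
```

Any result below 1.0 means the 64-bit build spends less total time on loads
and stores under those assumptions.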

That's mainly stack traffic, and x86 has always been good at it. More
registers makes for simpler (and fewer) instructions due to fewer reloads,
but for kernel loads, it's not the biggest advantage.

If you have 8GB of RAM or more, the biggest advantage _by_far_ for the
kernel is that you don't spend 25% of your system time playing with
k[un]map() and the TLB flushing that goes along with it. You also have
much more freedom to allocate (and thus cache) inodes, dentries and
various other fundamental kernel data structures.
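
A rough sketch of why lowmem gets tight on a 32-bit kernel with lots of RAM.
All three constants are assumptions, not taken from the message: about 896MB
of directly mapped lowmem (the usual 3G/1G split), 4KB pages, and 32 bytes of
per-page bookkeeping that itself has to live in lowmem. Under those
assumptions the 64GB case lands close to the 378MB in the subject line.

```python
PAGE_SIZE = 4096         # bytes; the usual x86 page size
LOWMEM_MB = 896          # assumed: directly mapped lowmem with a 3G/1G split
STRUCT_PAGE_BYTES = 32   # assumed: per-page bookkeeping on a 32-bit kernel


def lowmem_left_mb(ram_gb: int) -> float:
    """Lowmem (MB) left once the per-page array for all of RAM sits in lowmem."""
    pages = ram_gb * 1024**3 // PAGE_SIZE
    page_array_mb = pages * STRUCT_PAGE_BYTES / 1024**2
    return LOWMEM_MB - page_array_mb


for ram_gb in (2, 8, 64):
    left = lowmem_left_mb(ram_gb)
    print(f"{ram_gb:>2} GB RAM -> about {left:.0f} MB of lowmem left")
```

Whatever is left has to hold inodes, dentries and every other structure the
kernel needs to reach directly, which is the squeeze described above.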

Also, the reason MIPS and Sparc had a slowdown for 64-bit code was only
partially the bigger cache footprint (and that depends a lot on the app
anyway: many applications aren't that pointer-intensive. The kernel is
_very_ pointer-intensive, but even for something like that, most data
structures tend to blow up by 50%, not 100%).
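
To illustrate the 50%-not-100% point, here is a small sketch using Python's
ctypes to lay out a made-up structure twice: once with 32-bit fields standing
in for pointers, once with 64-bit ones. The structure and its field names are
hypothetical, not real kernel types.

```python
import ctypes


class Node32(ctypes.Structure):
    # Two 32-bit "pointers" (modelled as integers) plus two 32-bit counters.
    _fields_ = [("next", ctypes.c_uint32), ("prev", ctypes.c_uint32),
                ("refcount", ctypes.c_uint32), ("flags", ctypes.c_uint32)]


class Node64(ctypes.Structure):
    # Same layout, but the two "pointers" are now 64 bits wide.
    _fields_ = [("next", ctypes.c_uint64), ("prev", ctypes.c_uint64),
                ("refcount", ctypes.c_uint32), ("flags", ctypes.c_uint32)]


def growth_percent(small, large) -> float:
    """Size increase of `large` over `small`, in percent."""
    return 100.0 * (ctypes.sizeof(large) - ctypes.sizeof(small)) / ctypes.sizeof(small)


print(ctypes.sizeof(Node32), ctypes.sizeof(Node64), growth_percent(Node32, Node64))
```

Only the pointer fields double, so a structure that is half pointers and half
plain integers grows by half rather than doubling.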

The other reason for slowdown is that generating those pointers (for
function calls in particular) is more complex, and x86-64 is better at
that than MIPS and Sparc. The complex x86 instruction encoding with
variable-size instructions means that you don't have to either squeeze
every constant into a fixed-size instruction or load it indirectly from
memory through a GP register.

So x86-64 not only had the register expansion advantage, it had less of a
code generation downside to 64-bit mode to begin with. Want to have large
constants in the code? No problem. Sure, it makes your code bigger, but
you can still have them predecoded in the instruction stream rather than
have to load them from memory. Much nicer for everybody.
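
As a concrete sketch of a large constant sitting in the instruction stream,
the snippet below builds the bytes for x86-64 `mov rax, imm64` (REX.W prefix
0x48, opcode 0xB8, then the 8-byte little-endian immediate). The helper names
are hypothetical. The second function is only a lower bound for a fixed-width
encoding, assuming a 16-bit immediate field as in MIPS I-type instructions;
real sequences also need shifts.

```python
import struct


def encode_mov_rax_imm64(value: int) -> bytes:
    """Encode `mov rax, imm64`: REX.W, opcode B8, 8-byte little-endian immediate."""
    return bytes([0x48, 0xB8]) + struct.pack("<Q", value)


def min_fixed_width_instructions(const_bits: int = 64, imm_bits: int = 16) -> int:
    """Lower bound on instructions needed just to carry the constant's bits."""
    return -(-const_bits // imm_bits)  # ceiling division


code = encode_mov_rax_imm64(0x1122334455667788)
print(code.hex(" "), f"({len(code)} bytes, one instruction)")
print("fixed-width lower bound:", min_fixed_width_instructions(), "instructions")
```

The single variable-length instruction is bigger than any one fixed-width
instruction, but the whole constant arrives with the instruction fetch
instead of through a separate data load.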

And for the kernel, the bigger virtual address space really is a _huge_
deal. HIGHMEM accesses really are very slow.  You don't see that in user
space, but I really have seen 25% performance differences between
non-highmem builds and CONFIG_HIGHMEM4G enabled for things that try to put
a lot of data in highmem (and the 64G one is even more expensive). And
that was just with 2GB of RAM.

And when it makes the difference between doing IO or not doing IO (ie
caching or not caching - when the dentry cache can't grow any more because
it _needs_ to be in lowmem), you can literally see an order-of-magnitude
difference.

With 8GB+ of RAM, I guarantee you that the kernel spent tons of time on
just mapping high pages, _and_ it couldn't grow inodes and dentry caches
nearly as big as it would have wanted to. Going to x86-64 makes all those
issues just go away entirely.

So it's not "you can save a few instructions by not spilling to stack as
much". It's a much bigger deal than that. There's a reason I personally
refuse to even care about >2GB 32-bit machines. There's just no excuse
these days to do that.

