From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: comp.arch
Subject: Re: Code density and performance?
Message-ID: <dcu7sn$49b$1@build.pdx.osdl.net>
Date: Thu, 04 Aug 2005 23:30:01 GMT

Terje Mathisen wrote:
>
> When I explain this stuff to people, I like to use a simple rule of
> thumb: "The time spent waiting for the transfer to start should be
> comparable to the time spent actually moving data."

Terje, you often make a lot of sense, but this ain't one of those times.

> So if you have a 10 ms (effective/measured average) seek time, and 50
> MB/s disk transfer rate, then you're looking at about 512 KB transfer
> size.

That may be a good rule in general, but it has nothing to do with page size
issues.
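
For reference, the 512 KB figure in the quoted rule of thumb is just seek
time multiplied by transfer rate, rounded to a power of two. A minimal
sketch of that arithmetic follows; the function names are made up for
illustration, and "MB" is assumed to mean 10^6 bytes.

```python
import math


def break_even_transfer_bytes(seek_ms, bytes_per_second):
    """Bytes the disk could move in the time one seek takes."""
    # Integer arithmetic on milliseconds avoids floating-point rounding.
    return seek_ms * bytes_per_second // 1000


def nearest_power_of_two(n):
    """Power of two closest to n on a log scale."""
    return 1 << round(math.log2(n))


# Figures from the quoted text: 10 ms seek, 50 MB/s (assumed decimal MB).
size = break_even_transfer_bytes(10, 50_000_000)
print(size, nearest_power_of_two(size))
```

That gives 500,000 bytes, which rounds to 524,288 bytes, i.e. 512 KB.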

The thing about a page size is that it's obviously the smallest memory
mapping entity, and in combination with the fact that you want your memory
maps to be coherent (and let's face it, you _do_ want coherent mmap, the
alternatives just suck too badly), it means that the page size is also the
minimum granularity for any OS IO caching.

Now, maybe you'd like to live in a world with big files, but the fact is,
one very common pattern is actually a lot of _small_ files. For example,
let's pick a totally hypothetical software engineering project, which has
17,000+ files, at an average size of just under 12kB.

Now, maybe you'd like to have a 512kB page size, because that is the
ideal transfer size from disk, but the fact is, it's just not acceptable if
it has the implication that your caching granularity is also 512kB for
memory mapping purposes.

In fact, even 32kB is _waay_ too big a page size. Even 16kB is too big. A
page size of 8kB may be acceptable, but 4kB is actually a big win. Do the
numbers on memory fragmentation from the above hypothetical example, and
you'll realize why.
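
One way to "do the numbers" is sketched below. The post only gives a file
count and an average size, so this assumes, purely for illustration, that
all 17,000 files are exactly 12,000 bytes; the function name and constants
are hypothetical. Each cached file is rounded up to a whole number of
pages, since the page is the minimum caching granularity.

```python
def cached_bytes(file_sizes, page_size):
    """Memory needed to cache every file when each file occupies whole pages."""
    total = 0
    for size in file_sizes:
        pages = -(-size // page_size)  # ceiling division
        total += pages * page_size
    return total


# Assumed stand-in for the project in the post: uniform file sizes.
FILE_COUNT = 17_000
FILE_SIZE = 12_000  # bytes, "just under 12kB"
sizes = [FILE_SIZE] * FILE_COUNT
data = sum(sizes)

for kib in (4, 8, 16, 32, 512):
    used = cached_bytes(sizes, kib * 1024)
    print(f"{kib:>3} KiB pages: {used / 2**20:8.1f} MiB cached, "
          f"{used / data:5.2f}x the data")
```

Under this uniform-size assumption, 4 KiB pages cost about 2% more memory
than the raw data, 8 KiB and 16 KiB pages about 37% more, 32 KiB pages
about 2.7 times the data, and 512 KiB pages over 40 times. A real source
tree has a spread of sizes, so the exact figures will differ; passing the
actual list of file sizes to cached_bytes() gives the real answer.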

The thing is, for 99% of all loads, caching is a lot more important than
disk utilization. Small page sizes are worth it, even if it means that you
end up having worse disk access patterns - because your ability to cache
things goes up quite a lot.

Yeah, yeah, you can play games, and move caches around when you mmap things,
but that will just make sure that your base OS is flaky and unreliable. Or
you can say that you don't care about mmap cache coherency: people have done
that too.

So forget about big page sizes. They are horrible for general-purpose stuff,
and only useful for databases etc. And even if that means that you get only
half the disk throughput, small pages make your caches effectively twice the
size (and they do - do the numbers), so you tend to win on a lot of loads
anyway.

So code density is good. So is data density. And large page sizes really are
very very bad for data density.

              Linus
