VMAs (Linus Torvalds)


Date: 	Fri, 10 Aug 2001 14:55:11 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: /proc/<n>/maps getting _VERY_ long
Newsgroups: fa.linux.kernel

On Mon, 6 Aug 2001, Jamie Lokier wrote:
>
> There are garbage collectors that use mprotect() and SEGV trapping per
> page.  It would be nice if there was a way to change the protections per
> page without requiring a VMA for each one.

This is actually how Linux used to work a long long time ago - all
protection information was in the page tables, and you could do per-page
things without having to worry about piddling details like vma's.
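
The effect behind the subject line can be observed from user space. What follows is a minimal sketch, assuming a Linux system with /proc mounted; the helper names are made up for illustration. It write-protects every other page of an anonymous mapping and reports how many lines /proc/self/maps grew by, each line describing one vma.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the lines in /proc/self/maps; each line describes one vma. */
static long count_maps_lines(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long lines = 0;
    int c;

    if (!f)
        return -1;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

/*
 * Map npages anonymous pages, make every other page read-only, and
 * return how many lines /proc/self/maps grew by (-1 on any error).
 */
static long vma_growth_after_mprotect(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    long before, after;
    size_t len, i;
    char *base;

    assert(page > 0);
    len = npages * (size_t)page;

    (void)count_maps_lines();      /* warm-up: let stdio's allocations settle */
    before = count_maps_lines();
    if (before < 0)
        return -1;

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Neighbouring pages now differ in protection, so they cannot share a vma. */
    for (i = 0; i < npages; i += 2) {
        if (mprotect(base + i * (size_t)page, (size_t)page, PROT_READ) != 0) {
            munmap(base, len);
            return -1;
        }
    }

    after = count_maps_lines();
    munmap(base, len);
    return after < 0 ? -1 : after - before;
}
```

On a typical system, vma_growth_after_mprotect(64) would be expected to report growth close to 64. The exact number can vary slightly, because the first or last page may merge with a neighbouring mapping.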

It does work, but it had major downsides: even trivial things, like
re-creating the permission after throwing a page out or swapping it out,
became a problem.

We used to have these "this is a COW page" and "this is shared writable"
bits in the page table etc - there are two sw bits on x86, and I think we
used them both.
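
The downside mentioned above can be illustrated with a toy model. This is a sketch under stated assumptions, not a reconstruction of the old kernel code: the two software flag names and their bit positions are hypothetical, chosen from the bits x86 leaves available to software in a 32-bit page table entry. It shows that if the entry is simply cleared when a page is thrown out, the per-page permission is gone, so something has to carry it across the eviction.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PTE_PRESENT   (1u << 0)   /* x86: page is present */
#define TOY_PTE_WRITE     (1u << 1)   /* x86: page is writable */
#define TOY_PTE_SW_COW    (1u << 9)   /* hypothetical: copy-on-write page */
#define TOY_PTE_SW_SHARED (1u << 10)  /* hypothetical: shared writable page */
#define TOY_PTE_SW_MASK   (TOY_PTE_SW_COW | TOY_PTE_SW_SHARED)

/* Naive eviction: clear the entry. All per-page information is lost. */
static uint32_t toy_evict_naive(uint32_t pte)
{
    (void)pte;
    return 0;
}

/* Eviction that keeps only the software bits in the non-present entry. */
static uint32_t toy_evict_keep_sw(uint32_t pte)
{
    return pte & TOY_PTE_SW_MASK;
}

/* Fault the page back in at a new frame, rebuilding the hardware bits. */
static uint32_t toy_fault_in(uint32_t evicted, uint32_t frame)
{
    uint32_t pte;

    assert(frame < (1u << 20));        /* frame number must fit above bit 12 */
    pte = (frame << 12) | TOY_PTE_PRESENT | (evicted & TOY_PTE_SW_MASK);
    if (evicted & TOY_PTE_SW_SHARED)
        pte |= TOY_PTE_WRITE;          /* shared writable: writable at once */
    /* A COW page stays read-only until a write fault copies it. */
    return pte;
}
```

A real non-present entry also has to hold the swap location, which competes with the software bits for space, and other architectures may offer fewer spare bits or none. Keeping the information in the vma avoids that dependency.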

These days, the vma's just have too much information, and the page tables
can't be counted on to have enough bits.

So on one level I basically agree with you, but at the same time it's just
not feasible any more. The VM got a lot better, and got ported to other
architectures. And it started needing more information - it used to be
enough to know whether a page was shared writable or privately writable or
not writable at all, but back then we didn't really support the full
semantics of shared memory or mprotect, so we didn't need all the
information we have to have now.

They were "the good old days", but trust me, you really don't want them
back. The vma's have some overhead, but it is not excessive, and they
really make things like a portable VM layer possible.

It's very hard to actually see any performance impact of the VMA handling.
It's a small structure, with reasonable lookup algorithms, and the common
case is still to not have all that many of them.
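
To give a feel for why the lookup is cheap, here is a minimal sketch, assuming the areas are kept sorted by address and do not overlap. The struct and function names are hypothetical stand-ins; the kernel's real structure carries more fields and uses its own lookup structure rather than a flat array. The search returns the first area that ends above the address, which may lie entirely above it, so the caller still checks the start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for the kernel's vm_area_struct. */
struct toy_vma {
    unsigned long start;   /* first address inside the area */
    unsigned long end;     /* first address past the area */
    unsigned int prot;     /* one protection value for the whole range */
};

/*
 * Return the first area whose end lies above addr, or NULL if there is
 * none. The array must be sorted by address and non-overlapping.
 */
static const struct toy_vma *toy_find_vma(const struct toy_vma *areas,
                                          size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;

    assert(areas != NULL || n == 0);
    while (lo < hi) {                       /* binary search: O(log n) */
        size_t mid = lo + (hi - lo) / 2;

        if (areas[mid].end > addr)
            hi = mid;                       /* mid could be the answer */
        else
            lo = mid + 1;                   /* mid ends at or below addr */
    }
    return lo < n ? &areas[lo] : NULL;
}

/* An address is mapped only if the area found also starts at or below it. */
static int toy_addr_mapped(const struct toy_vma *areas, size_t n,
                           unsigned long addr)
{
    const struct toy_vma *vma = toy_find_vma(areas, n, addr);

    return vma != NULL && vma->start <= addr;
}
```

Even with thousands of areas, a lookup like this touches only a handful of entries, and the common case has far fewer areas than that.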

		Linus

Date: 	Fri, 10 Aug 2001 16:26:00 -0700 (PDT)
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: /proc/<n>/maps getting _VERY_ long
Newsgroups: fa.linux.kernel

On Fri, 10 Aug 2001, H. Peter Anvin wrote:
>
> Note that it isn't very hard to deal with *that* problem, *if you want
> to*... you just need to maintain a shadow data structure in the same
> format as the page tables and stuff your software bits in there.

Actually, this is what Linux already does.

The Linux page tables _are_ a "shadow data structure", and are
conceptually independent from the hardware page tables (or hash table, or
whatever the actual hardware uses to actually fill in the TLB).

This is most clearly seen on CPU's that don't have traditional page table
trees, but use software fill TLB's, hashes, or other things in hardware.

> Whether or not that is a good idea is another issue entirely, however,
> on some level it would make sense to separate protection from all the
> other VM things...

I think that the current Linux approach is much superior - the page tables
are conceptually a separate shadow data structure, but the way things are
set up, you can choose to make the mapping from the shadow data structure
to the actual hardware data structures be a 1:1 mapping.

This does mean that we do NOT want to make the Linux shadow page tables
contain stuff that is not easy to translate to hardware page tables.
Tough. It's a trade-off: either you overspecify the kernel page tables
(and take the hit of having to keep two separate page tables), or you say
"the kernel page tables are weaker than we could make them", and you get
the optimization of being able to "fold" them on top of the hardware page
tables.
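
One way a generic layout can stay 1:1 with simpler hardware is to let a level collapse into the one above it. The following is a toy sketch of that idea, assuming generic code that always walks three levels and a two-level, 32-bit layout underneath; every name here is hypothetical and the code is a user-space model, not kernel code. The middle-level lookup does no work and returns the top-level entry itself, so the generic walk and the two-level tables are the same memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_ENTRIES  1024u             /* 10 index bits per level */
#define TOY_PRESENT  0x1u

typedef struct { uint32_t val; } toy_pte;
typedef struct { toy_pte *table; } toy_pgd;   /* top-level entry */
typedef toy_pgd toy_pmd;                      /* folded: same object as the pgd entry */

static toy_pgd *toy_pgd_offset(toy_pgd *dir, uint32_t addr)
{
    return &dir[addr >> 22];
}

/* Folded middle level: there is no table to index, so hand back the pgd entry. */
static toy_pmd *toy_pmd_offset(toy_pgd *pgd, uint32_t addr)
{
    (void)addr;
    return pgd;
}

static toy_pte *toy_pte_offset(toy_pmd *pmd, uint32_t addr)
{
    return &pmd->table[(addr >> 12) & (TOY_ENTRIES - 1)];
}

/* Generic walk: always three steps, whatever the layout underneath. */
static toy_pte *toy_walk(toy_pgd *dir, uint32_t addr, int create)
{
    toy_pgd *pgd = toy_pgd_offset(dir, addr);
    toy_pmd *pmd = toy_pmd_offset(pgd, addr);

    if (!pmd->table) {
        if (!create)
            return NULL;
        pmd->table = calloc(TOY_ENTRIES, sizeof(toy_pte));
        if (!pmd->table)
            return NULL;
    }
    return toy_pte_offset(pmd, addr);
}

/* Map the page containing vaddr to the given frame number. */
static int toy_map(toy_pgd *dir, uint32_t vaddr, uint32_t frame)
{
    toy_pte *pte = toy_walk(dir, vaddr, 1);

    assert(frame < (1u << 20));        /* frame number must fit above bit 12 */
    if (!pte)
        return -1;
    pte->val = (frame << 12) | TOY_PRESENT;
    return 0;
}

/* Translate vaddr; returns 0 and fills *paddr, or -1 if unmapped. */
static int toy_translate(toy_pgd *dir, uint32_t vaddr, uint32_t *paddr)
{
    toy_pte *pte = toy_walk(dir, vaddr, 0);

    if (!pte || !(pte->val & TOY_PRESENT))
        return -1;
    *paddr = (pte->val & ~0xfffu) | (vaddr & 0xfffu);
    return 0;
}
```

The price of this arrangement is the one described above: the entries can only hold what the hardware format has room for, so anything richer has to live in a separate structure.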

I'm 100% convinced that the Linux VM makes the right choice - we optimize
for the important case, and I will claim that it is _really_ hard for
anybody to make a VM that is as efficient and as fast as the Linux one.

Proof: show me a full-fledged VM setup that even comes _close_ in
performance, and gives the protection and the flexibility that the Linux
one does.

And yes, we do have _another_ shadow data structure too. It's called the
vm_area_struct, aka "vma", and we do not artificially limit ourselves to
trying to look like hardware on that one.

Which brings us back to the original question, and answers it: we already
do all of this, and we do it RIGHT. We optimize for the right things.

		Linus
