Initramfs (Al Viro; Linus Torvalds)

Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [BK PATCHES] initramfs merge, part 1 of N
Original-Message-ID: <Pine.LNX.4.44.0211021049480.2413-100000@home.transmeta.com>
Date: Sat, 2 Nov 2002 19:03:09 GMT
Message-ID: <fa.m8e4dav.16ga80g@ifi.uio.no>

On Sat, 2 Nov 2002, Aaron Lehmann wrote:
>
> Won't the initial userspace be linked into the kernel? If so, why will
> the kernel image get smaller?

Note that the reason I personally really want initramfs is not to make the
kernel boot image smaller, or the kernel sources smaller. That won't
happen for a long time, since I suspect that we'll be carrying the
initramfs user space with us for quite a while (eventually it will
probably split up into a project of its own, but certainly for the
foreseeable future it would be very closely tied to the kernel).

The real advantage to me is two-fold:

 - make it easier for people to customize their initial system without
   having to muck with kernel code or even use a different boot sequence.
   One example of this is the difference between vendor install kernels
   (using initrd) and a normal install kernel (which doesn't).

   So I'd much rather see us _always_ using initrd, and the difference
   between an install kernel and a regular kernel is really just the size
   of the initrd thing.

 - Many things are much more easily done in user space, because user space
   has protections, "infinite stack", and in general a lot better
   infrastructure (ie easier to debug etc). At the same time, many things
   need to be done _before_ the kernel is fully ready to hand over control
   to a normal user space: do ACPI parsing so that we can initialize the
   devices so that we can use the "real" user space that is installed on
   disk etc.

   Sometimes there is overlap between these two things (ie the "easier to
   do in user space" and "needs to be done before normal user space can be
   loaded"). ACPI is one potential example. Mounting the root filesystem
   over NFS after having done DHCP or other auto-discovery is another.

So "shrinking the kernel" is not on my list here. It's really a matter of
"some initialization is better done in user space", and not primarily "we
want to make the kernel smaller". I'm not a big believer in microkernels
and trying to get everything out of the kernel itself, but I _do_ believe
that sometimes it's easier to just let the user make his own choices (while
still giving him all the protection implied by running in user space).

		Linus

Newsgroups: fa.linux.kernel
From: Alexander Viro <viro@math.psu.edu>
Subject: Re: [BK PATCHES] initramfs merge, part 1 of N
Original-Message-ID: <Pine.GSO.4.21.0211021436200.25010-100000@steklov.math.psu.edu>
Date: Sat, 2 Nov 2002 20:26:06 GMT
Message-ID: <fa.mjlrj8v.1f1o0a0@ifi.uio.no>

On Sat, 2 Nov 2002, Linus Torvalds wrote:

> Note that the reason I personally really want initramfs is not to make the
> kernel boot image smaller, or the kernel sources smaller. That won't
> happen for a long time, since I suspect that we'll be carrying the
> initramfs user space with us for quite a while (eventually it will
> probably split up into a project of its own, but certainly for the
> foreseeable future it would be very closely tied to the kernel).
>
> The real advantage to me is two-fold:
[snip]

Let me add the third one: userland is more limited.  And no, that's not
a typo - and it's a good thing.  Userland has to use normal, regular
syscalls instead of poking its fingers into hell knows what parts of
kernel data structures.

Which means that it's more robust and that it doesn't stand in the way
of work on the kernel.  90% of the PITA with super.c used to be of that
kind - mounting the root filesystem had been done with very ugly kludges
and what's more, these kludges got filtered down into the normal codepath.
Getting
rid of that took a _lot_ of very careful manipulations with the guts
of the thing.  And guess what?  There was no reason why all that black
magic would be necessary - current code uses normal, garden-variety
system calls.
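
To make "normal, garden-variety system calls" concrete, here is a minimal
sketch of how early code could mount the real root and switch to it using
only mount(2), chdir(2) and chroot(2).  The function name
switch_to_real_root() and its arguments are hypothetical, and this is one
plausible sequence under those assumptions, not a copy of the kernel's
actual code.

```c
#define _DEFAULT_SOURCE         /* for chroot() under strict -std= modes */
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/*
 * Hypothetical helper: mount `dev` (filesystem type `fstype`) on `dir`,
 * then make it the new root.  Returns 0 on success, -1 on the first
 * failing system call.  Every step is an ordinary syscall.
 */
int switch_to_real_root(const char *dev, const char *dir, const char *fstype)
{
    /* 1. Mount the real root read-only somewhere under the early root. */
    if (mount(dev, dir, fstype, MS_RDONLY, NULL) < 0) {
        perror("mount");
        return -1;
    }
    /* 2. Step into it. */
    if (chdir(dir) < 0) {
        perror("chdir");
        return -1;
    }
    /* 3. Move the mount on top of "/". */
    if (mount(".", "/", NULL, MS_MOVE, NULL) < 0) {
        perror("mount(MS_MOVE)");
        return -1;
    }
    /* 4. Make the moved mount our root directory. */
    if (chroot(".") < 0) {
        perror("chroot");
        return -1;
    }
    if (chdir("/") < 0) {
        perror("chdir(/)");
        return -1;
    }
    return 0;
}
```

A caller would need the appropriate privileges (it is meant for the very
first process), and a failure at any step is reported the same way as for
any other program: a -1 return with errno set.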

In effect, we used to have special cases of mount(2), etc., with very
kludgy semantics.  They were not exposed to userland, but that didn't
make them less nasty or less painful to work with.  They still cluttered
the code, they still stood in the way of work on the thing and they still
were butt-ugly.

And that's what moving code to userland should prevent - it's much easier
to catch somebody bringing a patch with a magical extension of a system call
than to catch an attempt to sneak in special-case code used only by the kernel.

BTW, that's a thing we need to watch for - there obviously will be a lot
of patches moving stuff to userland and there will be a strong temptation
to add magic interfaces just for that.  _That_ should be prevented - it's
better to leave ugly crap as is than export the same crap to userland.
The point is to get the things cleaned up and make sure that they stay
clean, not to cement them in place by adding a magic ioctl/syscall/flag/whatnot.
We may very well end up extending existing interfaces, but we'd damn better
make sure that such additions make sense for generic use.

We have a lot of ugly crap that would be unnecessary if we had early
access to a writable fs.  Basically, we got magic methods, magic codepaths,
etc. simply because the normal access to the functionality in question
required opened file descriptors.  Now we _do_ have a writable filesystem
mounted very early, so that cruft can be killed off.  And moving code
to userland acts as a filter - there we don't have access to magic, so
all such magic immediately shows up.  It could be done in the kernel
(and quite a few things had been done already), but the move to userland
acts as a safeguard against reintroduction of magic crap.
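
As a small illustration of the point about file descriptors: once a
writable filesystem exists, early code can get a descriptor the ordinary
way, through open(2), with no special codepath.  The helper name
open_in_early_fs() and the paths used with it are hypothetical; this is
only a sketch of the idea.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (if needed) and open `name` inside the
 * writable directory `dir`.  Returns a normal file descriptor, or -1 on
 * error.  Nothing here is special to early boot.
 */
int open_in_early_fs(const char *dir, const char *name)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);

    /* Refuse paths that did not fit in the buffer. */
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    return open(path, O_RDWR | O_CREAT, 0600);
}
```

The returned descriptor can then be handed to any interface that expects
one, which is exactly the "normal access" that the magic methods existed
to work around.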