Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: silent semantic changes with reiser4
Original-Message-ID: <Pine.LNX.4.58.0408260936550.2304@ppc970.osdl.org>
Date: Thu, 26 Aug 2004 17:02:48 GMT
Message-ID: <fa.guehu6c.82oq1i@ifi.uio.no>

On Thu, 26 Aug 2004, Jan Harkes wrote:
>
> (btw. the same could be implemented completely in userspace, for
> instance in glibc. Whenever an open call gets an EISDIR error, simply
> retry the open, but this time with /_contents appended).

Yes, and no - just to make it obvious before people jump on this, a lot of
things can be prototyped in user space with things like this, but once you
have to deal with races and mixed tool environments, user space suddenly
doesn't work so well any more.

I think Jan understands this distinction, I just wanted to make sure
everybody else is aware of the _one_ thing that kernel land does well:

 - safely synchronize globally visible data structures

That's quite fundamental. 99% of what a kernel does is exactly that. TCP
would be in user space too, if it wasn't for _exactly_ this issue. A lot
of people think that kernels are about hardware access, and yes, that's
the other 99% of the picture (I see the _big_ picture, remember?), but the
"safe access to common data" is really very fundamental.

The kernel is literally the thing that makes sure that you don't have -
and _cannot_ have - user programs that confuse each other by modifying
data unsynchronized.

For example, a filesystem is really nothing but a way to access a disk in
a controlled manner - it's not so much about hardware access, as it is
about maintaining a coherent view of how some shared data (disk or
whatever) is maintained.

Same goes for caches. We could cache things in user space, but if you want
to _share_ your caches (so that you don't have to re-load them for every
new application), you need some entity that manages those shared data
structures in a secure manner. In other words, you need the kernel.

The same goes for something like a "container file". Whether you see it as
"dir-as-file" or "file-as-dir" (and I agree with Jan that the two are
totally equivalent), the point of having the capability in the kernel is
not that the operations cannot be done in user space - the point is that
they cannot be done in user space _safely_. The kernel is kind of the
thing that guarantees that everybody follows the rules.

Imagine the security problems if a set-uid program were to (unwittingly)
depend on a user-space library that implements what Jan's prototype
library would do? Races galore, since a user-space implementation wouldn't
have _any_ way to do tests like the above atomically.
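
A minimal sketch of the race, using Python as a stand-in for the glibc-level
idea quoted above. The function name `open_container` and the `demo` helper
are hypothetical and exist only for illustration; the sketch assumes a POSIX
system where opening a directory for writing fails with EISDIR.

```python
import os
import tempfile


def open_container(path, flags=os.O_RDWR):
    """Hypothetical user-space fallback: on EISDIR, retry with /_contents."""
    try:
        return os.open(path, flags)
    except IsADirectoryError:
        # RACE WINDOW: between the failed open above and the retry below,
        # another process can rename, replace, or symlink `path` or its
        # `_contents` entry. Nothing in user space makes the two steps atomic.
        return os.open(os.path.join(path, "_contents"), flags)


def demo():
    """Build a directory holding a _contents file and read it via the fallback."""
    with tempfile.TemporaryDirectory() as root:
        container = os.path.join(root, "doc")
        os.mkdir(container)
        with open(os.path.join(container, "_contents"), "wb") as f:
            f.write(b"payload")
        fd = open_container(container)
        try:
            return os.read(fd, 100)
        finally:
            os.close(fd)
```

The fallback works when nothing else touches the path, which is exactly why
it is tempting as a prototype. The check and the retry are two separate
system calls, so a set-uid caller has no guarantee that both saw the same
object.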

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: silent semantic changes with reiser4
Original-Message-ID: <Pine.LNX.4.58.0408302055270.2295@ppc970.osdl.org>
Date: Tue, 31 Aug 2004 04:09:09 GMT
Message-ID: <fa.gtu2166.aicphm@ifi.uio.no>

On Mon, 30 Aug 2004, Tom Vier wrote:

> On Thu, Aug 26, 2004 at 09:48:04AM -0700, Linus Torvalds wrote:
> >  - safely synchronize globally visible data structures
> > That's quite fundamental. 99% of what a kernel does is exactly that. TCP
> > would be in user space too, if it wasn't for _exactly_ this issue. A lot
>
> What about microkernels? They do tcp in userspace.

No they don't. They do TCP in a separate address space from user space,
that just also happens to be separate from the "microkernel address
space".

So a microkernel will have _more_ address spaces, and they won't be "user
space". They'll be "server daemon space" or something. Now, that's also
why they tend to have performance problems - because you need to copy the
data between different address spaces, and switch the CPU context etc
around.

Not user space. They may be "ring 3" from a CPU standpoint, but they
aren't user space from a _user_ standpoint - it's still very much a
separate address space, with domain protection.

> So did winsock, iirc.

Now that is a different case. Things like the PalmOS TCP stack (and, I
believe, Winsock) are true "user space" TCP stacks, in that they really
do live as libraries in the same address space as the user app.

It sucks. Exactly because now the data structures are _not_ protected, and
they are _not_ shared. So your library basically ends up being a "single
client" library, with no protection between clients (or no sharing: you
can have "protected" single-stream TCP, but then you won't share the TCP
state that needs to be shared like listen queues etc).

This works in an environment like Palm or Win-3.0, which are really just
single-client _anyway_, without any protection. But notice how Windows
started doing TCP in the kernel, and notice how you can actually use it as
a server these days?

In short: you _need_ to have a separate address space (either kernel, or
"TCP server" or whatever) if you want to have reliable, secure and
generally usable TCP.


> As long as a trusted process keeps data such as free ports, what's the
> problem?

None - because it's not user space any more.

Well, performance might still suck, of course. And it does.
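
The "free ports" bookkeeping can be observed from user space. The sketch
below assumes a POSIX-like system with a working loopback interface and no
SO_REUSEADDR or SO_REUSEPORT set; the helper name `second_bind_errno` is
hypothetical. Two sockets try to bind the same TCP port, and the protected
domain (here the kernel) refuses the second one.

```python
import errno
import socket


def second_bind_errno():
    """Bind two TCP sockets to the same port; return the errno of the second bind."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel to pick any free port from its own table.
        first.bind(("127.0.0.1", 0))
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            # The port table lives outside both callers, so the conflict
            # is detected no matter which program asks.
            return exc.errno
        return None
    finally:
        first.close()
        second.close()
```

On such a system the second bind is expected to fail with `errno.EADDRINUSE`.
Neither caller has to cooperate or even know about the other for the
conflict to be caught.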

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: silent semantic changes with reiser4
Original-Message-ID: <Pine.LNX.4.58.0408311006340.2295@ppc970.osdl.org>
Date: Tue, 31 Aug 2004 17:37:56 GMT
Message-ID: <fa.gttrvu8.biaopk@ifi.uio.no>

On Tue, 31 Aug 2004, Alan Cox wrote:
>
> Several do TCP in user space. The only thing you need in kernel for
> TCP/IP is enough decode to decide who gets the packet.

Only thing? I don't think so.

You also want to make sure that regular users cannot send "impossible"
packets. Think about the old "ping of death" kind of thing, where a normal
mis-behaving (and I'm not saying intentionally so: it might be a small bug
that just overwrites some data) program should _not_ be able to cause
problems on the network.

Admins absolutely _hate_ that. They will ban an OS if it sends out packets
that cause trouble. You should remember that - we used to do strange
things on the net (long long time ago), and we brought down servers by
mistake, and nobody ever considered it a server bug: it was a Linux bug
that it wouldn't do the right thing.

Things like not sending FIN packets when a program suddenly goes away are
NOT acceptable behaviour! Neither is it acceptable behaviour to allow user
programs to make up their own packets.

> Even some non microkernel embedded OS's do this in order to keep kernel
> size down.

...and I'm not disputing that it happens. I explicitly mentioned
PalmOS, I bet it happens in other cases too. But I'd strongly argue that
it's a bug, not a feature.

It's a bug that people tend to accept in a "single-client" environment.

NOTE! This is totally ignoring the fact that you can't be called "UNIX"
any more. You _need_ to have sequence numbers etc be shared between
multiple programs that all write to the stream. Again, that _does_ mean
that you have another protection domain (aka "kernel" or "TCP daemon")
that keeps track of the sequence number.
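
A rough model of that shared sequence number, under loud assumptions: the
class `StreamState` and the helper `run_writers` are hypothetical, and the
threads stand in for separate programs. Threads share one address space, so
the lock here only models the guarantee; in a real system the state would
live in the kernel or a daemon where the writers cannot touch it directly.

```python
import threading


class StreamState:
    """Hypothetical stand-in for the protected domain that owns the sequence number."""

    def __init__(self, initial_seq=0):
        self._lock = threading.Lock()
        self._next_seq = initial_seq
        self.segments = []  # (sequence number, length), in send order

    def write(self, payload):
        # Assigning the sequence number and queuing the segment happen as
        # one atomic step, so concurrent writers can never overlap.
        with self._lock:
            seq = self._next_seq
            self._next_seq += len(payload)
            self.segments.append((seq, len(payload)))
            return seq


def run_writers(n_writers=4, writes_each=200, size=10):
    """Have several writers append fixed-size payloads to the same stream."""
    state = StreamState()

    def worker():
        for _ in range(writes_each):
            state.write(b"x" * size)

    threads = [threading.Thread(target=worker) for _ in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

Every writer goes through `write`, so the recorded segments form one
contiguous, gap-free sequence regardless of how the writers interleave.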

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: silent semantic changes with reiser4
Original-Message-ID: <Pine.LNX.4.58.0408311423280.2295@ppc970.osdl.org>
Date: Tue, 31 Aug 2004 21:33:56 GMT
Message-ID: <fa.gtec0e5.b2qo9h@ifi.uio.no>

On Tue, 31 Aug 2004, Frank van Maarseveen wrote:
>
> There is nothing in the networking or UNIX standards that prescribe another
> protection domain for this. Would be insane to leave that out in a hosted
> environment but it _can_ be done without.

My point is that TCP _does_ have a lot of state that needs to be handled
in a safe manner by a proper operating system.

The fact that there are OS's out there that are crap doesn't change that
matter. There are lots of embedded OS's out there that still do
multitasking in a purely cooperative way. I don't think it's a valid model
for anything but toys. Same goes for putting TCP in user space. It's
doable, but it's not an "OS". It's a program loader.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: silent semantic changes with reiser4
Original-Message-ID: <Pine.LNX.4.58.0409021549430.2295@ppc970.osdl.org>
Date: Thu, 2 Sep 2004 23:03:00 GMT
Message-ID: <fa.gue606c.82cp1o@ifi.uio.no>

On Thu, 2 Sep 2004, Tom Vier wrote:
>
> > Not user space. They may be "ring 3" from a CPU standpoint, but they
> > aren't user space from a _user_ standpoint - it's still very much a
> > separate address space, with domain protection.
>
> How are they different from regular user procs, other than being trusted to
> manage certain resources?

Ehh, they are separate the same way "inetd" is separate. It's not a _user_
proc, it's a system proc. The user can't actually do anything about it.

In many ways UNIX _is_ a microkernel. It does nonessential stuff in "user
space".  Anything that is critical for performance or the working of the
machine is in kernel space.

The big difference between UNIX and what people _call_ "microkernels" is
that UNIX has a very functional and sane partitioning of what is a
critical thing.

But from a kernel _protection_ angle, the only part that is important is
that the services be in some protected domain. That was what started this
discussion: 99% of what the kernel does is protecting shared data. Whether
it does so by passing it on to some trusted third party or not is an
implementation issue, and is totally pointless from a user standpoint,
since the user won't see it anyway.

		Linus
