Index Home About Blog
Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: clone() <-> getpid() bug in 2.6?
Original-Message-ID: <Pine.LNX.4.58.0406051405110.7010@ppc970.osdl.org>
Date: Sat, 5 Jun 2004 21:17:15 GMT
Message-ID: <fa.gvdlt66.82gq1m@ifi.uio.no>

On Sat, 5 Jun 2004, Arjan van de Ven wrote:
>
> ... or glibc internally caches the getpid() result and doesn't notice the
> app calls clone() internally... strace seems to show 1 getpid() call total
> not 2.

Why the hell does glibc do that totally useless optimization?

It's a classic "optimize for benchmarks" thing - evil. I don't see any
real app caring, and clearly apps _do_ fail when you get it wrong, as
shown by Russell.

And it's really easy to get it wrong. You can't just invalidate the cache
when you see a "clone()" - you have to make sure that you never ever cache
it ever after either.

Uli, if Arjan is right, then please fix this. It's a buggy and pointless
optimization. Anybody who optimizes purely for benchmarks should be
ashamed of themselves.

		Linus




Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: clone() <-> getpid() bug in 2.6?
Original-Message-ID: <Pine.LNX.4.58.0406051540550.7010@ppc970.osdl.org>
Date: Sat, 5 Jun 2004 22:45:20 GMT
Message-ID: <fa.h1dvu61.a2qr1j@ifi.uio.no>

On Sat, 5 Jun 2004, Robert Love wrote:
>
> Eh, it definitely does, in nptl/sysdeps/unix/sysv/linux/getpid.c:

What a piece of shit.

> A few places, including the fork code, fix it:
>
> 	/* Adjust the PID field for the new process.  */
> 	THREAD_SETMEM (self, pid, THREAD_GETMEM (self, tid));
>
> But not direct calls to clone(2).

As mentioned, you can't even "adjust" it in a clone(). It's just _wrong_
to cache _ever_ after a clone(), whether it was cached before or not. You
can't just re-set the pid information like in a fork().

The whole notion of caching "pid" is sick and idiotic. It has no redeeming
values.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: clone() <-> getpid() bug in 2.6?
Original-Message-ID: <Pine.LNX.4.58.0406051550580.7010@ppc970.osdl.org>
Date: Sat, 5 Jun 2004 23:03:15 GMT
Message-ID: <fa.h1e5ue1.a20r9j@ifi.uio.no>

On Sat, 5 Jun 2004, Robert Love wrote:
>
> It is almost certainly done to improve the speed of some stupid
> microbenchmark - say, one that just calls getpid() repeatedly (simple
> because it is NOT slow) to measure system call overhead.

getpid() is only used in bad benchmarks these days, exactly because
caching of "pid" has been done for a long time. Every single microkernel
OS ever built did it, I think.

So yes, you'll find some broken benchmarks using getpid(), but nobody sane
would take those benchmarks seriously anyway. If you want _real_ system
call performance measurements, use something like lmbench, which does:

 - getppid() - same cost as getpid(), except you can't just cache the
   results on any POSIX OS (well, you could, but it's more complex:
   you end up having to map the uarea-equvalent into user space or have
   some notifier for the parent dying).
 - read/write: a hell of a lot more important than the simple system calls
   anyway.
 - open/close/stat/fstat: very common system calls with interesting
   performance issues.

That gives you some _real_ data.

> Or maybe libc uses the PID a lot internally.  I don't know.

I think some threading libraries historically used hashes of the
thread-specific pid to look up the thread-specific data. But since glibc
shows the same pid for all regular threads, and gets the pid wrong for
other threads, that's clearly _not_ a valid reason for caching it either.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: clone() <-> getpid() bug in 2.6?
Original-Message-ID: <Pine.LNX.4.58.0406051610430.7010@ppc970.osdl.org>
Date: Sat, 5 Jun 2004 23:19:55 GMT
Message-ID: <fa.h0ttte1.aigq9j@ifi.uio.no>

On Sat, 5 Jun 2004, Davide Libenzi wrote:
>
> It is likely used by pthread_self(), that is pretty much performance
> sensitive. I'd agree with Ulrich here.

It _can't_ be used for pthread_self(), since the pid is the _same_ across
all threads in a pthread environment.

See?

I re-iterate:

 - getpid() was used in some historical threading packages (maybe even
   LinuxThreads) to get the thread ID thing.

   But this particular usage requires the old Linux behaviour of returning
   separate threads ID's for separate threads. CLONE_THREAD does not do
   that, and in particular the glibc caching _breaks_ any such attempt
   even when you don't use CLONE_THREAD.

   pthreads() in modern libc sets CLONE_THREAD and then uses the
   per-thread segment thing to identify threads. No getpid() anywhere.

   Ergo: getpid() caching may result in faster execution, but in this case
   it's actively _wrong_, and breaks the app.  It's easy to make things go
   fast if you don't actually care about whether the end result is correct
   or not.

 - getpid() is used in some silly benchmarks.

   But in this case caching getpid() just lies to the benchmark, and is
   pointless.

   Ergo: getpid() caching results in higher scores, but the scores are now
   meaningless, since they have nothing to do with system call speed any
   more.

So in both of these cases, the caching literally results in WRONG
behaviour.

Can somebody actually find an application that doesn't fall into either of
the above two categories, and calls getpid() more than a handful of times?

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: clone() <-> getpid() bug in 2.6?
Original-Message-ID: <Pine.LNX.4.58.0406052244290.7010@ppc970.osdl.org>
Date: Sun, 6 Jun 2004 06:09:31 GMT
Message-ID: <fa.h0e1u65.82cr1n@ifi.uio.no>

On Sun, 6 Jun 2004, Kalin KOZHUHAROV wrote:
>
> Well, not exactly sure about my reply, but let me try.
>
> The other day I was debugging some config problems with my qmail instalation and I ended up doing:
> # strace -p 4563 -f -F
> ...
> [pid 13097] read(3, "\347\374\375TBH~\342\233\337\220\302l\220\317\237\37\25"..., 32) = 32
> [pid 13097] close(3)                    = 0
> [pid 13097] getpid()                    = 13097
> [pid 13097] getpid()                    = 13097
> [pid 13097] getuid32()                  = 89
> [pid 13097] getpid()                    = 13097
> [pid 13097] time(NULL)                  = 1086497450
> [pid 13097] getpid()                    = 13097
> [pid 13097] getpid()                    = 13097
> [pid 13097] getpid()                    = 13097

qmail is a piece of crap. The source code is completely unreadable, and it
seems to think that "getpid()" is a good source of random data. Don't ask
me why.

It literally does things like

	random = now() + (getpid() << 16);

and since there isn't a single comment in the whole source tree, it's
pointless to wonder why. (In case you wonder, "now()" just does a
"time(NULL)" call - whee.).

I don't understand why people bother with it. It's not like Dan Bernstein
is so charming that it makes up for the deficiencies of his programs.

But no, even despite the strange usage, this isn't a performance issue.
qmail will call "getpid()" a few tens of times per connection because of
the wonderful quality of randomness it provides, or something.

This is another gem you find when grepping for "getpid()" in qmail, and
apparently the source of most of them:

	if (now() - when < ((60 + (getpid() & 31)) << 6))

Don't you love it how timeouts etc seem to be based on random values that
are calculated off the lower 5 bits of the process ID? And don't you find
the above (totally uncommented) line just a thing of beauty and clarity?

Yeah.

Anyway, you did find something that used more than a handful of getpid()
calls, but no, it doesn't qualify as performance-critical, and even
despite it's peyote-induced (or hey, some people are just crazy on their
own) getpid() usage, it's not a reason to have a buggy glibc.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: clone() <-> getpid() bug in 2.6?
Original-Message-ID: <Pine.LNX.4.58.0406060937330.7010@ppc970.osdl.org>
Date: Sun, 6 Jun 2004 16:58:47 GMT
Message-ID: <fa.gvu3tu9.9ieqpn@ifi.uio.no>

On Sun, 6 Jun 2004, Erik Andersen wrote:
>
> http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/libc/sysdeps/posix/tempname.c?rev=1.36&content-type=text/plain&cvsroot=glibc
>
>     /* Get some more or less random data.  */
>     random_time_bits = ((uint64_t) tv.tv_usec << 16) ^ tv.tv_sec;
>     value += random_time_bits ^ __getpid ();

This one is historically at least a bit understandable: it's quite common
for various programs to create their lockfiles with the "unique"
identifier that is the pid of the locker. So the algorithm used to be
roughly (used by various things, ranging from tty subsystem locking to
whatever):

	int fd, len;
	int pid = getpid();

	len = sprintf(name, "lockfile.%d", pid);
	fd = open(name, O_EXCL | O_CREAT | O_WRONLY, 0644);
	if (fd < 0) {
		if (errno == EEXIST) {
			fprintf(stderr, "Stale lockfile %s. Remove me\n", name);
			exit(1); /* Or just "unlink + repeat" */
		}
		perror("lockfile");
		exit(1);
	}

	/* Write the pid into the lockfile, fsync it */
	write(fd, name + 9, len - 9);
	fsync(fd);

	/* Try to move it atomically */
	while (rename("lockfile", name) < 0) {
		if (errno != EEXIST) {
			perror("lockfile");
			exit(1);
		}
		/* Try again later */
		sleep(1);
	}

	... We now have an exclusive lock ..

and here "pid" is in fact a good system-wide unique identifier. It's not
random, and it works badly with networked filesystems (which is why you'll
find versions of this algorithm that use the hostname in addition to the
pid too), but it's "obvious".

I think the original "mktemp()" also used to start off with "pid" as a
"unique" identifier. Because while it's not random, it _is_ system-wide
unique, so getpid() actually makes _sense_ in that respect.

In other words: getpid() makes little sense as a random value, but it does
make some sense as a unique value. Maybe less now than it did 20 years ago
due to the prevalence of networking, but history is powerful.

			Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: clone() <-> getpid() bug in 2.6?
Original-Message-ID: <Pine.LNX.4.58.0406061155290.7010@ppc970.osdl.org>
Date: Sun, 6 Jun 2004 19:02:07 GMT
Message-ID: <fa.gvtvue7.9iar9l@ifi.uio.no>

On Sun, 6 Jun 2004, Simon Kirby wrote:
>
> Unrelated to this discussion -- and there is a close() missing -- but is
> there any reason for fsync() to be there?

It only matters if the lock is meaningful over an involuntary reboot (aka
crash). Usually it isn't. So yes, the fsync() is usually not necessary.

Sometimes the lock file may contain real data that is meaningful for
recovery, though. The pid generally isn't, but it could point to a log
that describes what the thing was about to do. Or it could just be helpful
on next reboot when you have a stale lockfile, and you want to match that
lockfile to any syslog messages or similar.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> 
	getpid()
Original-Message-ID: <Pine.LNX.4.58.0406061013250.7010@ppc970.osdl.org>
Date: Sun, 6 Jun 2004 17:20:21 GMT
Message-ID: <fa.gvtlte5.9igq9j@ifi.uio.no>

On Sun, 6 Jun 2004, Russell Leighton wrote:
>
> Linus said he could not imagine a program using getpid() more than a
> handful of times...

No, I said that I could not imagine using it more than a handful of times
_except_ for the cases of:

 - thread identification without a native thread area
 - benchmarking.

(and in both of these cases it is _wrong_ to cache the pid value)

> well, (I am ashamed to admit it) I found this getpid() bug with just
> such a program.
>
> Could someone can suggest an alternative to using getpid() for me?
> Here's the problem...
>
> I have a library that creates 2 threads using clone().

Your problem falls under the thread ID thing. It's fine and understandable
to use getpid() for that, although you could probably do it faster if you
are willing to use the support that modern kernels give you and that glibc
uses: the "TLS" aka Thread Local Storage support.

Thread-local storage involved having a user-mode register that points some
way to a special part of the address space. On x86, where the general
register set is very limited and stealing a general reg would thus be bad,
it uses a segment and loads the TLS pointer into a segment register
(segment registers are registers too - and since nobody sane uses them for
anything else these days, both %gs and %fs are freely usable).

>  Would gettid() be any better?

You'd avoid this particular glibc bug with gettid.

		Linus


Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: clone() <-> getpid() bug in 2.6?
Original-Message-ID: <Pine.LNX.4.58.0406061028050.7010@ppc970.osdl.org>
Date: Sun, 6 Jun 2004 17:58:36 GMT
Message-ID: <fa.gutltma.8igqho@ifi.uio.no>

On Sun, 6 Jun 2004, Denis Vlasenko wrote:
> >
> > 	if (now() - when < ((60 + (getpid() & 31)) << 6))
>
> timeout randomization across several qmail-send processes, to prevent
> stampede effect?
>
> > And don't you find
> > the above (totally uncommented) line just a thing of beauty and clarity?
>
> Yes, a comment would be in place. :(

The problem is not so much what it does, but how it's coded.

For example, if it really wanted to do this, then how about having a nice
variable that is initialized at run-time (preferably just once at
startup), and has a nice name and a comment on what it's all about? Or
even a macro, to make it more readable? Or even _just_ the comment?

You could write the above incomprehensible one-liner as

	/* Timeout in seconds: 64 - 97 minutes, depending on pid */
	#define ERROR_FAILURE_TIMEOUT ((60 + (getpid() & 31)) << 6)

	...

	/*
	 * If this IP address had a SMTP connection timeout last
	 * time, don't let it try to connect again immediately
	 */
	if (now() - when < ERROR_FAILURE_TIMEOUT)

and it would at least have made some sense.

As it is, you CANNOT read "tcpto.c" and make any sense of it. It's not
sensible code. You have to know what the hell the whole thing is all
about, and you have to actually try to figure out what the expression is
to even understand the code.

Trust me. Try sometime. The code is horrible.

It wouldn't irritate me so much, but then the person who wrote that
abomination has the galls to complain about sendmail, NFS, System V etc.

And it's not like qmail is so big and has such a long history that it
would be a huge deal to comment it and write it more readably. But since
the license doesn't allow for anything but patches, and patches would be
totally unmaintainable if they tried to fix these kinds of things, it will
never be done.

Which is a pity, since I actually _agree_ with you that qmail has some
good sides: it's small, it's efficient, and it's pretty modular. It has
the _potential_ to be great code.

			Linus

Index Home About Blog