Inlining functions (Linus Torvalds)

Index Home About Blog

Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH-2.6.0-tiny] "uninline" {lock,release}_sock
Original-Message-ID: <Pine.LNX.4.58.0312280017060.2274@home.osdl.org>
Date: Sun, 28 Dec 2003 08:24:45 GMT
Message-ID: <fa.j2s4qtg.1cickog@ifi.uio.no>

On Sun, 28 Dec 2003, Arnaldo Carvalho de Melo wrote:
>
> 	Please apply on top of your 2.6.0-tiny1 tree, CC to netdev for
> eventual comments.

Please don't do it this way.

This is quite possibly faster even _normally_, so it might make more sense
to just do it globally instead of having a CONFIG_NEX_SMALL.

Basically, inline functions tend to win only when inlining them is smaller
and simpler than actually calling a function. The most common cause of
that is that some argument is commonly constant (and thus gets simplified
away by inlining), or the function itself literally expands to just a few
instructions (the list functions, the inline asms for things like "cli"
etc).

We use a lot too many inline functions for other reasons: one reason to
use them is that they are sometimes more convenient than it is to find a
good place for the non-inline version. Another common reason is that the
thing started out smaller than it eventually became - and the inline just
stuck around.

But if you do things like this for a CONFIG_SMALL, then the convenience
argument obviously isn't true any more, and you'd be a lot better off just
unconditionally making it a real function call.

Function calls aren't all that expensive, especially with FASTCALL() etc
to show that you don't have to follow the common calling conventions.
Right now I think FASTCALL() only matters on x86, but some other
architectures could make it mean "smaller call clobbered list" or similar.

Have you benchmarked with the smaller kernel?

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
Date: Mon, 02 Jan 2006 19:17:49 UTC
Message-ID: <fa.fu9f3jb.g0q6ir@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0601021105000.3668@g5.osdl.org>

On Mon, 2 Jan 2006, Krzysztof Halasa wrote:
>
> For example, I add "inline" for static functions which are only called
> from one place.

That's actually not a good practice. Two reasons:

 - debuggability goes way down. Oops reports give a much nicer call-chain
   and better locality for uninlined code.

 - Gcc can suck at big functions with lots of local variables. A
   function call can be _cheaper_ than trying to inline a function,
   regardless of whether it's called once or many times. I've seen
   functions that had several silly (and unnecessary) spills suddenly
   become quite readable when they were separate functions.

   More importantly, the "inline" sticks around. Later on, the function is
   used for some other place too, and the inline doesn't get removed.

The second "the inline sticks around" case is something that happens to
helper functions too. They started out as trivial macros in a header file,
but then they get converted to an inline function because they get more
complex, and eventually it grows a new hook. Suddenly what used to
generate two instructions generates ten instructions and a call, and would
have been much better off being uninlined in a .c file.

So inlines that make sense at one point have a tendency to _not_ make
sense a year or two later.

I suspect we'd be best off removing almost all inlines, both from C files
and headers. There are a few cases where inlining really helps: the
function really ends up being just a few instructions, and it's really
just a wrapper around a simpler calling convention or an inline assembly,
or it's called with constant arguments that are better left off simplified
at compile-time. Those are the cases where inlining really helps.

(Of course, then there's ia64. Which wants inlining just because..)

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
Date: Mon, 02 Jan 2006 22:47:54 UTC
Message-ID: <fa.g0pp4bf.igk6an@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0601021439520.3668@g5.osdl.org>

On Mon, 2 Jan 2006, Jan Engelhardt wrote:

> >* Linus Torvalds <torvalds@osdl.org> wrote:
> >
> >> > For example, I add "inline" for static functions which are only called
> >> > from one place.
> >>
> >> That's actually not a good practice. Two reasons:
> >>
> >>  - debuggability goes way down. Oops reports give a much nicer call-chain
> >>    and better locality for uninlined code.
>
> When I want to debug, I use
> CFLAGS="-O0 -ggdb3 -fno-inline -fno-omit-frame-pointer"
> for that particular file(s). That sure gets good results. Not sure about
> who wins in the kernel case: always_inline or -fno-inline.

This is totally not relevant.

99% of all bug-reports happen for non-developers. What developers can and
can not do from a debuggability standpoint is almost totally
uninteresting: quite often the developers won't even be able to recreate
the bug, but have to go on the bug report that comes in from the outside.

And yes, some users are willing to recompile the kernel, and try ten
different versions, and in general are just worth their weight in gold.
But many people have trouble even reporting the (short) oops details, much
less follow up on it.

So it's actually important that the default config is reasonably easy to
debug from the oops report. Because it may be the only thing you ever get.

So -O0 and -fno-inline simply isn't practical, because they are not an
option for a normal kernel.

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
Date: Mon, 02 Jan 2006 19:46:06 UTC
Message-ID: <fa.fvpf4b6.hgq6au@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.64.0601021130300.3668@g5.osdl.org>

On Mon, 2 Jan 2006, Jakub Jelinek wrote:
>
> Where does this certainity come from?  gcc-3.x (as well as 2.x) each had
> its own heuristics which functions should be inlined and which should not.
> inline keyword has always been a hint.

NO IT HAS NOT.

This is total revisionist history by gcc people. It did not use to be a
hint. If you asked for inlining, you got it unless there was some
technical reason why gcc couldn't inline it (ie recursive inlining, and
trampolines and some other issues). End of story.

So don't fall for the "hint" argument. It's simply not true.

At some point during gcc-3.1, gcc people changed it to be a hint, and did
so very badly indeed, where functions that really needed inlining (because
constant propagation made them go away) were not inlined any more. As a
result, we do

	#define inline   inline __attribute__((always_inline))

in <linux/compiler-gcc3.h> exactly because gcc-3.1 broke so badly.

And nobody sane will argue that we would _ever_ not do that. gcc-3 was
just too broken. Some architectures (sparc64, MIPS, s390) ended up trying
to tune the inlining limits by hand (usually making them effectively
infinitely large), but the basic rule was that gcc-3 inlining was just not
working.

It may have improved in later gcc-3 versions, and apparently it's getting
better in gcc-4. And that's the only thing we're discussing: whether to
let gcc _4_ finally make some inlining decisions.

And people are nervous about it, exactly because the gcc people have
historically just changed what "inline" means, with little regard for
real-life code that uses it. And then they have this revisionist agenda,
trying to change history and claiming that it's always been "just a hint".
Despite the fact that the gcc manual itself very much said otherwise (I'm
sure the manuals have been changed too).

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops
Date: Mon, 19 Mar 2007 18:40:47 UTC
Message-ID: <fa.lIdVAovo8b4r7+GbAGlbTaC/dhc@ifi.uio.no>

On Mon, 19 Mar 2007, Eric W. Biederman wrote:
>
> True.  You can use all of the call clobbered registers.

Quite often, the biggest single win of inlining is not so much the code
size (although if done right, that will be smaller too), but the fact that
inlining DOES NOT CLOBBER AS MANY REGISTERS!

The lack of register clobbering, and the freedom for the compiler to
choose registers around an inlined function is usually the biggest win! If
you can't do that, then inlining generally doesn't actually even help: a
call/return to a single instruction isn't all that much slower than just
doing the "cli" in the first place.

If we end up with a setup where any inlined instruction needs to act as if
it was a function call (just with the "call" instruction papered over with
the inlined instruction sequence), then there is no point to this at all.

In short: people here seem to think that inlining is about avoiding the
call/ret sequence. Not so. The real advantages of inlining are elsewhere.

So *please* don't believe that you can make it "as cheap" to have some
automatic fixup of two sequences, one inlined and one as a "call".  It may
look so when you look at the single instruction generated, but you're
ignoring all the instructions *around* the site.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops
Date: Mon, 19 Mar 2007 18:46:30 UTC
Message-ID: <fa.ITcdqtldpwetBzyuvpZ/7KRwXhY@ifi.uio.no>

On Mon, 19 Mar 2007, Linus Torvalds wrote:
>
> So *please* don't believe that you can make it "as cheap" to have some
> automatic fixup of two sequences, one inlined and one as a "call".  It may
> look so when you look at the single instruction generated, but you're
> ignoring all the instructions *around* the site.

Side note, you can certainly fix things like this at least in theory, but
it requires:

 - the "call" instruction that is used instead of the inlining should
   basically have no callee-clobbers. Any paravirt_ops called this way
   should have a special calling sequence where they simple save and
   restore all the registers they use.

   This is usually not that bad. Just create a per-architecture wrapper
   function that saves/restores anything that the C calling convention on
   that architecture says is clobbered by calls.

 - if the function has arguments, and the inlined sequence can take the
   arguments in arbitrary registers, you are going to penalize the inlined
   sequence anyway (by forcing some fixed arbitrary register allocation
   policy).This thing is largely unfixable without some really extreme
   compiler games (like post-processing the assembler output and having
   different entry-points depending on where the arguments are..)

.. it will obviously depend on how things are done whether these things are
useful or not. But it does mean that it's always a good idea to just have
a config option of "turn off all the paravirt crap, because it *does* add
overhead, and replacing instructions on the fly doesn't make that
overhead go away".

		Linus

Subject: Re: [git patches 1/2] warnings: attack valid cases spotted by
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 18 Jul 2007 18:04:10 UTC
Message-ID: <fa.CRw7ZYVe1RTRuguIxXHxE7W38Xw@ifi.uio.no>
Newsgroups: fa.linux.kernel

On Wed, 18 Jul 2007, Roland Dreier wrote:
>
> BTW, I noticed one interesting thing while starting on this cleanup.
> I wanted to make sure that the generated code didn't change with the
> first step, and I actually discovered that the patch below seems to
> make the generated code *better*, maybe because some gcc alias
> analysis doesn't get as paranoid without the void *:

Absolutely.

The way to get pretty much any compiler to generate better code is:
 - code it simply
 - don't have tons of variables with overlapping lifetime
 - use limited-scope variables (ie don't have the variables at the
   outermost scope, declare them in the smallest scope you can)

and trying to split things up into functions helps all of these.

In fact, you can often get better code even when the functions aren't even
inlined, because of the "simpler code" issue. Gcc sometimes tries to be
too clever with its CSE etc, and generates really nasty complex code with
lots of register spills, just because it keeps stuff live rather than just
regenerating them.

So inlining a function doesn't even make it faster, all the time. You want
to inline when

 - the function is so small that the call is literally a big part of it,
   and it doesn't even need more than a couple of registers, so the
   calling convention has more register pressure than inlining the
   function itself would have.

OR

 - the callers tend to have constant arguments that can benefit from
   constant folding inside the function

but inlining in many other circumstances actually generates *worse* code
and just makes debugging harder and the I$ footprint bigger.

> And here's the patch itself -- I think this is a reasonable size step
> to break things up into.  I assume that this is the basic form of the
> cleanup that you're proposing?

Yes, this looks good. Doing these kinds of things for the various other
things is likely to make the code more readable, and as you already found
out, the generated code doesn't tend to be worse either. It might not
_always_ be a win size and performance-wise, but it can be, and so
readability should generally always be the #1 goal, because quite often it
actually does help the compiler too.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
Date: Tue, 24 Jul 2007 19:17:33 UTC
Message-ID: <fa.kHGwjdMHAItHjJPEabfryabxTaQ@ifi.uio.no>

On Tue, 24 Jul 2007, Andrew Morton wrote:
>
> fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig
> vmlinux .text from 928360 up to 955362 bytes (27k larger).
>
> A surprisingly large increase - I wonder if it did something dumb.  It
> appears to still correctly inline those things which we've manually marked
> inline.  hm.

I think inlining small enough functions is worth it, and the thing is, the
kernel is actually pretty damn good at having lots of small functions.
It's one of the few things I really care about from a coding style
standpoint.

So I'm not surprised that "-fno-inline-functions-called-once" makes things
larger, because I think it's generally a good idea to inline things that
are just called once. But it does make things harder to debug, and the
performance advantages become increasingly small for bigger functions.

And that's a balancing act. Do we care about performance? Yes. But do we
care so much that it's worth inlining something like buffered_rmqueue()?

So I would not be surprised if "-fno-inline-functions-called-once" will
disable *all* the inlining heuristics, and say "oh, it's not an inline
function, and it's only called once, so we won't inline it at all".

So "called once" should probably make the inlining weight bigger (ie
inline *larger* functions than you would otherwise), it just shouldn't
make it "infinite". It's not worth it.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
Date: Tue, 24 Jul 2007 19:46:35 UTC
Message-ID: <fa.BVIX07zJkFBpdga/pgquVAWrd3s@ifi.uio.no>

On Tue, 24 Jul 2007, Andi Kleen wrote:
>
> There's probably a --param where it can be tweaked exactly. The
> problem is that --params tend to be very gcc version specific
> and might do something completely different on a newer or
> older version. So it's better not to use them.

I agree wholeheartedly with that sentiment. We've tried at times (because
some gcc snapshots made some *truly* insane choices for a while), and
maybe we still have some around. Not worth the pain.

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
Date: Tue, 24 Jul 2007 19:49:29 UTC
Message-ID: <fa.Qyx7gfbe5u/9pM7b3F9HKyOlGuk@ifi.uio.no>

On Tue, 24 Jul 2007, Adrian Bunk wrote:
>
> > But do we
> > care so much that it's worth inlining something like buffered_rmqueue()?
> >...
>
> Where is the problem with having buffered_rmqueue() inlined?

In this case, it was a pain to just even try to find the call chain, or
read the asm.

I would encourage lots of kernel hackers to read the assembler code gcc
generates. I suspect people being aware of code generation issues (and
writing their code with that in mind) is a *much* bigger performance
impact than gcc inlining random functions.

So maybe I'm old-fashioned and crazy, but "readability of the asm result"
actually is a worthwhile goal. Not because we care directly, but because
I'd like to encourage people to do it, due to the *indirect* benefits.

		Linus

Index Home About Blog