IRQ routing (Linus Torvalds)


From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot
Date: Fri, 06 Oct 2006 16:03:50 UTC
Message-ID: <fa.dDjXUQDDw2zhDg5dqy91asSong4@ifi.uio.no>

On Fri, 6 Oct 2006, Eric W. Biederman wrote:
>
> The change the patch introduced was that we are now always
> pointing irqs towards individual cpus, and not accepting an irq
> if it comes into the wrong cpu.

I think we should just revert that thing. I don't think there is any real
reason to force irq's to specific cpu's: the vectors haven't been _that_
problematic a resource, and being limited to just 200+ possible vectors
globally really hasn't been a real problem once we started giving out the
vectors more sanely.

And the new code clearly causes problems, and it seems to limit our use of
irq's in fairly arbitrary ways. It also would seem to depend on the irq
routing being both sane and reliable, something I'm not sure we should
rely on.

Also, I suspect the whole notion of only accepting an irq on one
particular CPU is fundamentally fragile. The irq delivery tends to be a
multi-phase process that we can't even _control_ from software (ie the irq
may be pending inside an APIC or a bridge chip or other system logic, so
things may be happening _while_ we possibly try to change the cpu
delivery).

So how about just reverting that change?

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot
Date: Fri, 06 Oct 2006 18:13:08 UTC
Message-ID: <fa.G3yUhXzPEWqY8taWwgYg6JAx8/4@ifi.uio.no>

On Fri, 6 Oct 2006, Eric W. Biederman wrote:
>
> Forcing irqs to specific cpus is not something this patch adds.  That
> is the way the ioapic routes irqs.

What that patch adds is to make it an ERROR if some irq goes to an
unexpected cpu.

And that very much is wrong.

> Yes.  A single problem over several months of testing has been found.

Umm. It got found the moment it became part of the standard tree.

The fact is, "months of testing" is not actually very much, if it's the
-mm tree. That's at best a "good vetting", but it really doesn't prove
anything.

> So this is fairly fundamentally an irq migration problem.  If you
> never change which cpu an irq is pointed at you don't have problems,
> as there are no races.

So? Does that change the issue that this new model seems inherently racy?

> The current irq migration logic does everything in the irq handler
> after an irq has been received so we can avoid various kinds of races.

No. You don't understand, or you refuse to face the issue.

The races are in _hardware_, outside the CPU. The fact that we do things
in an irq handler doesn't seem to change a lot.

And what do you intend to do if it turns out that the reason it doesn't
work on x366 is that the _hardware_ just is incompatible with your model?

I'm not saying that's the case, and maybe there's some stupid bug that has
been overlooked, and maybe it can all work fine. But the new model _does_
seem to be at least _potentially_ fundamentally broken.

			Linus
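
To make the point about "an ERROR if some irq goes to an unexpected cpu" concrete, here is a toy model in C. It is not kernel code: the table names, sizes, and helper functions are all hypothetical, chosen only to contrast a strict per-CPU vector lookup with a tolerant shared one.

```c
#include <assert.h>

#define NCPUS    4
#define NVECTORS 256
#define NO_IRQ   (-1)

/* Strict scheme (hypothetical model): each CPU has its own vector->irq table,
 * so a vector is only meaningful on the CPU it was routed to. */
static int percpu_vector_irq[NCPUS][NVECTORS];

/* Tolerant scheme (hypothetical model): one table shared by every CPU,
 * so a vector resolves to the same irq wherever it arrives. */
static int global_vector_irq[NVECTORS];

/* Mark every slot in both tables as unassigned. */
static void tables_init(void)
{
    for (int v = 0; v < NVECTORS; v++) {
        global_vector_irq[v] = NO_IRQ;
        for (int cpu = 0; cpu < NCPUS; cpu++)
            percpu_vector_irq[cpu][v] = NO_IRQ;
    }
}

/* Record that `irq` uses `vector` and is expected to arrive on `cpu`. */
static void route_irq(int irq, int vector, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    assert(vector >= 0 && vector < NVECTORS);
    percpu_vector_irq[cpu][vector] = irq;
    global_vector_irq[vector] = irq;
}

/* Strict lookup: returns NO_IRQ if the vector shows up on an unexpected CPU. */
static int lookup_strict(int cpu, int vector)
{
    return percpu_vector_irq[cpu][vector];
}

/* Tolerant lookup: ignores which CPU actually received the interrupt. */
static int lookup_tolerant(int cpu, int vector)
{
    (void)cpu;
    return global_vector_irq[vector];
}
```

In this model, if an interrupt routed to CPU 2 is delivered to CPU 0 because hardware had it in flight while the routing changed, `lookup_strict(0, vector)` fails while `lookup_tolerant(0, vector)` still finds the handler. The cost of the tolerant scheme is that vectors become a global resource rather than a per-CPU one.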

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot
Date: Sat, 07 Oct 2006 19:04:04 UTC
Message-ID: <fa.tkr0J2r3sxsHWaWOxUW8CkDjbAk@ifi.uio.no>

On Sat, 7 Oct 2006, Eric W. Biederman wrote:
>
> I am hoping that by running the apics in a different delivery mode
> that explicitly says just deliver this interrupt to this cpu we
> will avoid the problem you are seeing.

Note that having too strict delivery modes could be a major pain in the
future, with things like multicore CPU's a lot more actively doing power
management on their own, and effectively going into sleep-states with
reasonably long latencies.

Especially with schedulers that are aware of things like that (and we
_try_, at least to some degree, and people are interested in more of it),
you can easily be in the situation that one of the cores is being fairly
actively kept in a low-power state, and can have millisecond latencies
(not to mention no L1 cache contents etc).

So I really do think that the belief that we should force irqs to a
particular core is fundamentally flawed.

We used to do lowest-priority stuff in hw, and then Intel broke it, but I
always told them that they were _stupid_ to break it. The fact is,
especially with multi-core, it actually makes a lot of sense to have
hardware decide which core to interrupt, because hardware simply
potentially knows better.

This is one of those age-old questions: in _theory_ you can do a better
job in software, but in _practice_ it's just too damn expensive and
complicated to do a perfect job especially with dynamic decisions, so in
_practice_ it tends to be better to let hardware make some of the
decisions.

We can see the same thing in instruction scheduling: in _theory_ a
compiler can do a better job of scheduling, since it can spend inordinate
amounts of resources on doing things once, and then the hardware can be
simpler and faster and never worry about it. In _practice_, however, the
biggest scheduling decisions are all dynamic at run-time, and depend on
things like cache misses etc, and only total idiots (or embedded people)
will do static scheduling these days.

I think it's a huge mistake to do static interrupt routing for the same
reason.

			Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot
Date: Sat, 07 Oct 2006 19:58:48 UTC
Message-ID: <fa.ZjklsoMxsR/XNSof+GGMmtuKZ1s@ifi.uio.no>

On Sat, 7 Oct 2006, Arjan van de Ven wrote:
>
> it seems the right mix at this time is to have the software select the
> package, and the hardware pick the core within the package.

I think that sounds like a fairly good approach.

Software obviously can make the "rough" selections, it's the fine-grained
ones that are harder (and might need to be done at a frequency that just
makes it impractical).

So yes, having software say "We want to steer this particular interrupt to
this L3 cache domain" sounds eminently sane.

Having software specify which L1 cache domain it wants to pollute is
likely just crazy micro-management.

		Linus
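
The "software selects the package, hardware picks the core" split can be sketched as building an affinity mask that covers every CPU in one package and leaving the final choice among those CPUs to the interrupt hardware. The following is a minimal C sketch under stated assumptions: the topology array `pkg_of_cpu` and the function `package_affinity_mask` are hypothetical names, and the 64-CPU limit is an artifact of using a single `uint64_t` as the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pkg_of_cpu[i] holds the package (socket) id of CPU i.
 * Returns a bitmask with one bit set per CPU that belongs to `package`.
 * Software makes only this coarse choice; which core inside the mask takes
 * a given interrupt is left to the hardware. */
static uint64_t package_affinity_mask(const int *pkg_of_cpu, size_t ncpus,
                                      int package)
{
    uint64_t mask = 0;

    assert(pkg_of_cpu != NULL);

    /* Only the first 64 CPUs fit in this toy mask. */
    for (size_t cpu = 0; cpu < ncpus && cpu < 64; cpu++) {
        if (pkg_of_cpu[cpu] == package)
            mask |= (uint64_t)1 << cpu;
    }
    return mask;
}
```

For a four-CPU system with topology `{0, 0, 1, 1}`, asking for package 1 yields the mask `0xC` (CPUs 2 and 3). On a running Linux system a mask of this shape is what one would write to `/proc/irq/<n>/smp_affinity`; whether the hardware then distributes interrupts among the cores in the mask depends on the interrupt controller and its delivery mode.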
