Suspend (Linus Torvalds; Theodore Tso)

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [linux-usb-devel] Re: 2.6.13-mm2
Date: Thu, 29 Sep 2005 15:37:58 UTC
Message-ID: <fa.g0ap3bd.m007an@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0509290832190.3308@g5.osdl.org>

On Wed, 28 Sep 2005, David Brownell wrote:
>
> You could try adding
>
> 	ohci_writel(ohci, OHCI_INTR_MIE, &ohci->regs->intrdisable);
>
> near the end of ohci_pci_suspend().

Give it up.

The right thing is to not free and re-acquire the damn interrupt in the
first place. It was a MISTAKE. We undid the ACPI braindamage that made it
be required a month ago, because sane people REALIZED it was a mistake.

It's not just "random luck" that not releasing the interrupt over suspend
fixes the problem. The problem is _due_ to drivers releasing the
interrupt in the first place.

IT DOESN'T MATTER what we do before the suspend, because we don't control
the wakeup sequence. If the BIOS wakeup enables the devices again, the
fact that we disabled them on suspend makes zero difference.

And yes, we can always "fix" things by selecting the right order to
re-acquire the interrupts, but the thing is, the "right order" will be
machine-dependent and in general depend on the phase of the moon and BIOS
version, and ACPI quirks.

The _only_ sane thing to do is to not drop the interrupts in the first
place. So that if you start getting interrupts before you expect them, you
can still handle them.
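
A minimal user-space sketch of that idea, assuming a made-up `toy_dev` model (none of these names come from the OHCI driver or from any kernel API): the handler stays installed across suspend, so an interrupt that arrives before resume has run is acknowledged and dropped instead of being left unhandled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model, for illustration only. */
struct toy_dev {
	bool suspended;        /* set by toy_suspend(), cleared by toy_resume() */
	uint32_t pending;      /* simulated interrupt status bits */
	unsigned int handled;  /* events processed while fully awake */
	unsigned int quieted;  /* early interrupts acknowledged and dropped */
};

/* The handler is never unregistered, so these only flip a flag. */
static void toy_suspend(struct toy_dev *dev) { dev->suspended = true; }
static void toy_resume(struct toy_dev *dev)  { dev->suspended = false; }

/* Stand-in for an interrupt handler that stays installed over suspend. */
static bool toy_irq(struct toy_dev *dev)
{
	if (!dev->pending)
		return false;          /* nothing of ours is pending */

	if (dev->suspended) {
		/* Firmware re-enabled the device early: ack it and move on. */
		dev->pending = 0;
		dev->quieted++;
		return true;
	}

	dev->pending = 0;
	dev->handled++;
	return true;
}

void toy_irq_demo(void)
{
	struct toy_dev d = { 0 };

	toy_suspend(&d);
	d.pending = 0x1;               /* interrupt arrives before resume */
	assert(toy_irq(&d));
	assert(d.quieted == 1 && d.handled == 0);

	toy_resume(&d);
	d.pending = 0x4;               /* normal interrupt after resume */
	assert(toy_irq(&d));
	assert(d.handled == 1);
	assert(!toy_irq(&d));          /* line is quiet again */
}
```

The point of the sketch is only that there is always a handler to call; what a real driver does with an early interrupt depends on the hardware.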

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] PM: suspend/resume debugging should depend on
Date: Sun, 19 Nov 2006 17:37:01 UTC
Message-ID: <fa.YMVQ6sabKF/IkEHUCoiQoxoHWZA@ifi.uio.no>

On Sun, 19 Nov 2006, Chuck Ebbert wrote:
>
> When doing 'make oldconfig' we should not ask about suspend/resume
> debug features when SOFTWARE_SUSPEND is not enabled.

That's wrong.

I never use SOFTWARE_SUSPEND, and I think the whole concept is totally
broken.

Sane people use suspend-to-ram, and that's when you need the suspend and
resume debugging.

Software-suspend is silly. I want my machine back in three seconds, not
waiting for minutes..

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] PM: suspend/resume debugging should depend on
Date: Sun, 19 Nov 2006 18:24:27 UTC
Message-ID: <fa.I2W5LoVPlRbd+ggMofH4+G1Vo58@ifi.uio.no>

On Sun, 19 Nov 2006, Rafael J. Wysocki wrote:
>
> In fact that's up to 30 seconds on a modern box, usually less than that.

Right. If the machine boots quickly, it's fast. Of course, if the machine
boots quickly, you might as well often just shut down and reboot.

> And suspend-to-ram doesn't work on quite a lot of boxes right now.  Also, you
> can use the software suspend on boxes that don't support the suspend-to-ram
> at all.

One large reason STR often doesn't work is that people don't even test it,
because people point to the suspend-to-disk instead.  suspend-to-disk is
the problem, not the solution.

I've been working at making the machines I have able to STR, and almost
always it's a driver that is buggy. Thank God for the suspend/resume
debugging - the thing that Chuck tried to disable. That's often the _only_
way to debug these things, and it's actually pretty powerful (but
time-consuming - having to insert TRACE_RESUME() markers into the device
driver that doesn't resume and recompile and reboot).

Anyway, the way to debug this for people who are interested (have a
machine that doesn't boot) is:

 - enable PM_DEBUG, and PM_TRACE

 - use a script like this:

	#!/bin/sh
	sync
	echo 1 > /sys/power/pm_trace
	echo mem > /sys/power/state

   to suspend

 - if it doesn't come back up (which is usually the problem), reboot by
   holding the power button down, and look at the dmesg output for things
   like

	Magic number: 4:156:725
	hash matches drivers/base/power/resume.c:28
	hash matches device 0000:01:00.0

   which means that the last trace event was just before trying to resume
   device 0000:01:00.0. Then figure out what driver is controlling that
   device (lspci and /sys/devices/pci* is your friend), and see if you can
   fix it, disable it, or trace into its resume function.
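
The "figure out what driver" step can be sketched roughly as follows, assuming the reported device is a PCI device. The helper names are hypothetical; the only system detail relied on is that `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the device name out of a "hash matches device ..." dmesg line.
 * Returns 0 on success, -1 if the line does not match or dev is too small.
 */
static int pm_trace_device(const char *line, char *dev, size_t devlen)
{
	static const char tag[] = "hash matches device ";
	const char *p = strstr(line, tag);
	size_t n;

	if (!p)
		return -1;
	p += sizeof(tag) - 1;
	n = strcspn(p, " \t\r\n");     /* the name ends at the first whitespace */
	if (n == 0 || n >= devlen)
		return -1;
	memcpy(dev, p, n);
	dev[n] = '\0';
	return 0;
}

/* Build the sysfs path whose symlink target names the bound driver. */
static int pm_trace_driver_link(const char *dev, char *path, size_t pathlen)
{
	int n = snprintf(path, pathlen, "/sys/bus/pci/devices/%s/driver", dev);

	return (n < 0 || (size_t)n >= pathlen) ? -1 : 0;
}

void pm_trace_demo(void)
{
	char dev[32], path[128];

	assert(pm_trace_device("hash matches device 0000:01:00.0\n",
			       dev, sizeof(dev)) == 0);
	assert(strcmp(dev, "0000:01:00.0") == 0);
	assert(pm_trace_driver_link(dev, path, sizeof(path)) == 0);
	assert(strcmp(path, "/sys/bus/pci/devices/0000:01:00.0/driver") == 0);
}
```

Calling readlink(2) on the resulting path (or just `ls -l` on it) then shows the driver name as the last component of the link target.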

For example, the above happens to be the VGA device on my EVO, which I
used to run with "radeonfb" (it's an ATI Radeon mobility). It turns out
that "radeonfb" simply cannot resume that device - it tries to set the
PLL's, and it just _hangs_. Using the regular VGA console and letting X
resume it instead works fine.

The point being that PM_TRACE is wonderful, and it's wonderful exactly for
NOT using suspend-to-disk. The other point being that people have gotten
lazy, and accept half a minute (minimum - usually it's longer) boot times,
when STR is a lot more pleasant, but it does require some detective work
when it doesn't boot.

I wish more people tried STR, instead of having the STD people tell them
not to!

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] PM: suspend/resume debugging should depend on
Date: Sun, 19 Nov 2006 19:55:43 UTC
Message-ID: <fa.gDZyHBEpP/fioyAWyQaiU+nbLvw@ifi.uio.no>

On Sun, 19 Nov 2006, Rafael J. Wysocki wrote:
>
> > because people point to the suspend-to-disk instead.
>
> Who they?

Like you _just_ did.

> >  - enable PM_DEBUG, and PM_TRACE
>
> This only works on i386, no?

Right now the trivial functions are only available on i386, yes. The
concept works anywhere that has a CMOS chip, so if somebody were to spend
a few minutes testing it on x86-64 and others, it would work elsewhere
too..

> I don't know of anyone who's doing that.

I know. I'm probably the only one. Frustrating.

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] PM: suspend/resume debugging should depend on
Date: Sun, 19 Nov 2006 23:25:57 UTC
Message-ID: <fa.AV868hkuSPCR4uyX+BICEcZ0h4U@ifi.uio.no>

On Sun, 19 Nov 2006, Rafael J. Wysocki wrote:
> > concept works anywhere that has a CMOS chip, so if somebody were to spend
> > a few minutes testing it on x86-64 and others, it would work elsewhere
> > too..
>
> I can do that if someone gives me the code.

Well, I actually _think_ it works almost as-is on x86-64 too, but the
magic is all in that small inline asm thing in <linux/resume-trace.h>.

The ".long" in that inline asm probably needs to be a ".quad" - see how
"generate_resume_trace()" uses it.

Also the "show_file_hash()" right now has a "tracedata += 6", and it
should probably be "tracedata += 2 + sizeof(void *)" instead.
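
To see why the stride matters, here is a small user-space sketch of such a table, assuming each record is a 16-bit line number followed by a pointer to the file name (which is what the suggested "2 + sizeof(void *)" implies). The function names are made up for the example and this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One record: a 16-bit line number, then a pointer-sized file reference. */
#define TRACE_REC_SIZE (2 + sizeof(void *))

/* Append one (line, file) record at p; returns the next write position. */
static unsigned char *trace_put(unsigned char *p, uint16_t line,
				const char *file)
{
	memcpy(p, &line, sizeof(line));
	memcpy(p + 2, &file, sizeof(file));
	return p + TRACE_REC_SIZE;
}

/* Walk the table; return the file of the n-th record, or NULL if absent. */
static const char *trace_file_at(const unsigned char *start,
				 const unsigned char *end,
				 size_t n, uint16_t *line)
{
	const unsigned char *p;
	size_t i = 0;

	/* The stride follows the pointer size instead of a hard-coded 6. */
	for (p = start; p < end; p += TRACE_REC_SIZE, i++) {
		if (i == n) {
			const char *file;

			memcpy(line, p, sizeof(*line));
			memcpy(&file, p + 2, sizeof(file));
			return file;
		}
	}
	return NULL;
}

void trace_demo(void)
{
	unsigned char table[2 * TRACE_REC_SIZE];
	unsigned char *end = table;
	uint16_t line = 0;

	end = trace_put(end, 28, "resume.c");
	end = trace_put(end, 99, "main.c");
	assert(end == table + sizeof(table));
	assert(strcmp(trace_file_at(table, end, 1, &line), "main.c") == 0);
	assert(line == 99);
}
```

With 4-byte pointers the stride comes out as 6, matching the existing code; with 8-byte pointers it becomes 10, and a hard-coded 6 would land in the middle of the previous record's pointer.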

IOW, it really should be very easy to get to work on x86-64, I just
haven't had the energy or inclination, since the only devices I've
personally used it on have been regular 32-bit ones..

		Linus

From: Theodore Tso <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: [patch] PM: suspend/resume debugging should depend on 
	SOFTWARE_SUSPEND
Date: Mon, 20 Nov 2006 22:19:24 UTC
Message-ID: <fa.m2GGcUvRLIHCs9OKwyM/bJTbkYY@ifi.uio.no>

One reason why I've generally avoided using suspend-to-ram is that
after my laptop (IBM Thinkpad T60p) comes out of suspend-to-ram, it is
consuming 30W of power --- as opposed to before I hibernated, when my
laptop was consuming only 17W or 18W of power.  (And even without
doing things like unloading all of my USB modules, normally my laptop
will consume about 24W after a fresh reboot --- which makes the 30W
power consumption after a suspend-to-ram especially troubling.)

If I unload all of my USB modules, and shut down the parallel port, the
wired ethernet port, etc., I get power savings down to 17W --- and
once I was able to push it all the way down to 15W, although in
practice it's much more common that I can get the power consumption
down to 17W or 18W.  Unfortunately, after the laptop wakes up from a
suspend-to-ram, the laptop is apparently powering up all of the
devices in high power mode.  While I can get some of the power savings
back by manually loading and then unloading a whole bunch of device
drivers, in practice that only gets me from 30W to 24W or so.  This is
probably much more of a hardware bug than an OS bug, but because of
this fact, if I'm going to be running the laptop for any amount of
time after resume, I'm better off using suspend-to-disk.

If someone has a suggestion for how I can save the power state of all
of the various components in my laptop so that the laptop can be
brought back to the 18W state after a suspend-to-ram, I'm all ears....

						- Ted

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for
Date: Sat, 28 Jul 2007 16:56:43 UTC
Message-ID: <fa.gFvYhxiEgHrL5ICg+Q2NJLr1iMU@ifi.uio.no>

On Sat, 28 Jul 2007, Linus Torvalds wrote:
>
> And it's the *top*level* code that selects HOTPLUG_CPU. Through
> SUSPEND_SMP (which will select HOTPLUG_CPU) and SOFTWARE_SUSPEND.

In other words, the problem seems to be that

	kernel/power/main.c:
		suspend_devices_and_enter()

does the proper "disable/enable_nonboot_cpus()", but it does so without
having enabled CPU hotplug.

And you seem to think that it's ACPI that should enable the hotplug, even
though the code that actually needs it is _outside_ ACPI. And I think
that's wrong, and that this is a bug.

So I think the real issue is that we allow that
"suspend_devices_and_enter()" code to be compiled without HOTPLUG_CPU in
the first place. It's not supposed to work that way.

Of course, it may well be that other architectures can happily suspend
even with multiple CPU's active, which may be the cause of this mess. But
I really think it shouldn't be ACPI that has to select the CPU hotplug,
since it's not ACPI that _uses_ it in the first place.

Rafael: making a config option for STR (the same way we have a config
option for hibernate), and just not allowing it on SMP without HOTPLUG_CPU
seems to be the right thing. Len is right in that we do insane things
right now (trying to STR with multiple CPU's still active), and I just
don't think he's the one that should work around it!

		Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [Suspend-devel] 2.6.25-rc2 System no longer powers off
Date: Fri, 22 Feb 2008 01:29:44 UTC
Message-ID: <fa.G5UEXCeephlSKYE22ejmWvJJIJI@ifi.uio.no>

On Thu, 21 Feb 2008, Jesse Barnes wrote:
>
> So the advantage of the kernel suspend/resume hooks for the DRM layer is that
> the kernel video drivers can do full state save/restore (which X usually
> doesn't do, and isn't really designed to do), so that if your platform
> *doesn't* do it all, you'll still end up with a usable machine in the end.

Well, I'm also hoping that eventually we could even just not do the VT
switch at all, and the kernel can treat X as "just another user process"
that it freezes.

At least from a mode setting standpoint.

We'd still want to make sure that X repaints the screen if the contents
were lost, of course. And this is going to depend very intimately on the
type of graphics card and whether the video RAM is saved by STR or not -
for the Intel integrated graphics kind of situation, the video RAM will be
refreshed along with all the other memory, but for other cards we may end
up having to do the VT switch not so much for modesetting reasons as just
a way to get X to save and restore all the *other* state.

How close is the i915 driver to not having to even signal X? Or is that
just a pipedream of mine?

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts
Date: Sun, 08 Mar 2009 17:21:26 UTC
Message-ID: <fa.1Xcv8+J+QHwlkzGLPBvgA0i3kiI@ifi.uio.no>

On Sat, 7 Mar 2009, Alan Stern wrote:
>
> You didn't answer my question.  Why bother to distinguish between
> "wake-up" interrupts and non-"wake-up" interrupts?
>
> In other words, why not simply abort the suspend if IRQ_PENDING is set
> for _any_ interrupt during sysdev_suspend()?

.. because some drivers might not actually shut down the hardware until
they get to "suspend_late"? If even then, for that matter - a driver may
simply not care, knowing that the hardware will be powered off, and will
be re-initialized at resume.

The thinking that you have to shut your hardware down at "->suspend()"
time is a _disease_. There are literally classes of hardware out there
where that would be an outright _bug_, like for a PCI bridge device. For
many devices, "suspend()" has to be the phase where you shut down the
_external_ stuff (eg for a disk controller, it's when you'd flush and stop
your disks), but the controller itself may well be alive until later.

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts
Date: Mon, 09 Mar 2009 15:01:22 UTC
Message-ID: <fa.YRgIH7AEJQ9SUW6lxbQZiRE+Umo@ifi.uio.no>

On Sun, 8 Mar 2009, Alan Stern wrote:
>
> There have been examples in the past of devices that, for one reason or
> another, _did_ generate IRQs at inconvenient times.  The hardware or
> the BIOS may have done improper initialization, for example.  On a
> shared IRQ this led to interrupt storms.  IIRC, the solution was to add
> a PCI quirk routine to disable IRQ generation at an early stage.
> Didn't e100 have this problem?

.. and this is exactly the reason why we've done all these changes.

There are tons of drivers that are unable to cope with interrupts that
happen after they've done their "pci_set_power_state(PCI_D3hot)".

With shared interrupts (and _another_ device still live), they do stupid
things like read the interrupt status register, getting all-ones (because
the device is dead), and then deciding that that means that they need to
handle the interrupt. And that goes on in a loop. Forever.
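
As a rough illustration of that failure mode, here is a user-space sketch of a shared-line handler that treats an all-ones status read as "device is gone" rather than "every event is pending". The enum and function names are hypothetical, only loosely modelled on the IRQ_NONE / IRQ_HANDLED convention.

```c
#include <assert.h>
#include <stdint.h>

/* What a register read from a powered-down device looks like. */
#define TOY_STATUS_DEAD 0xffffffffu

/* Hypothetical handler result, for illustration only. */
enum toy_irqreturn { TOY_IRQ_NONE, TOY_IRQ_HANDLED };

/* Decide whether a status word read on a shared line is really ours. */
static enum toy_irqreturn toy_shared_irq(uint32_t status, uint32_t *events)
{
	if (status == TOY_STATUS_DEAD)
		return TOY_IRQ_NONE;   /* device is dead, not "all bits set" */
	if (status == 0)
		return TOY_IRQ_NONE;   /* the other device on the line fired */

	*events |= status;             /* a real driver would ack here */
	return TOY_IRQ_HANDLED;
}

void toy_shared_irq_demo(void)
{
	uint32_t events = 0;

	/* Another device interrupts while ours is powered down. */
	assert(toy_shared_irq(TOY_STATUS_DEAD, &events) == TOY_IRQ_NONE);
	assert(events == 0);

	/* Our device is awake and reports one event. */
	assert(toy_shared_irq(0x4, &events) == TOY_IRQ_HANDLED);
	assert(events == 0x4);
}
```

A handler without the first check would see every status bit set, claim the interrupt, and keep doing so each time the other device raises the line.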

Or they do _that_ part right, but their suspend also free'd some data
structure, so now the interrupt handler will follow a NULL pointer and/or
scribble to freed memory. The source of bugs is infinite, and not fixable
(because, quite frankly, most device driver writers are very focused on
the hardware, and have a hard time thinking about it as part of the bigger
system - and even if they happen to test suspend/resume, they probably won't
be testing it with shared interrupts, so it will work _for_them_ even if
it's totally broken).

So what all the PCI changes try to do is to basically not have the driver
do the "pci_set_power_state(PCI_D3)" at _all_, and do it in the PCI layer.
But more importantly, it needs to be done _after_ interrupts have been
disabled for this all to work. And, for exactly the same reason, the PCI
layer needs to wake the device up and restore its config space _before_
enabling interrupts again, and _before_ doing any ->resume calls.
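
The ordering rule can be sketched as a tiny state checker. This is a toy model with hypothetical names, not the PCI core: "irqs_on" stands for normal device interrupts only, and running the driver's resume callback after interrupts are re-enabled is an assumption of the sketch (the text only says both come after the device has been woken up).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one device during a suspend/resume cycle. */
struct toy_pm {
	bool irqs_on;   /* device interrupts can be delivered */
	bool powered;   /* device is in D0 with valid config space */
};

/* Each helper returns false when it is called out of order. */
static bool toy_irqs_off(struct toy_pm *pm)
{
	pm->irqs_on = false;
	return true;
}

static bool toy_power_down(struct toy_pm *pm)
{
	if (pm->irqs_on)
		return false;  /* a handler could still poke the dead device */
	pm->powered = false;
	return true;
}

static bool toy_power_up(struct toy_pm *pm)
{
	if (pm->irqs_on)
		return false;  /* wake-up must precede enabling interrupts */
	pm->powered = true;
	return true;
}

static bool toy_irqs_on(struct toy_pm *pm)
{
	if (!pm->powered)
		return false;  /* handlers would read all-ones */
	pm->irqs_on = true;
	return true;
}

static bool toy_driver_resume(const struct toy_pm *pm)
{
	return pm->powered;    /* ->resume only makes sense on a live device */
}

void toy_pm_demo(void)
{
	struct toy_pm pm = { .irqs_on = true, .powered = true };

	/* Suspend side: quiesce interrupts first, then drop power. */
	assert(!toy_power_down(&pm));
	assert(toy_irqs_off(&pm));
	assert(toy_power_down(&pm));

	/* Resume side: power and config space first, interrupts second. */
	assert(!toy_irqs_on(&pm));
	assert(!toy_driver_resume(&pm));
	assert(toy_power_up(&pm));
	assert(toy_irqs_on(&pm));
	assert(toy_driver_resume(&pm));
}
```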

And that, in turn, means that since we have all these ACPI ordering
things, and many cases want to use ACPI to wake things up, and/or have
delays etc, we end up actually wanting things like timer interrupts
working at that time - but not normal "device" interrupts. Because many
delays do need them, even as simple delays as the (fairly short, but not
"busy loop" short) one for turning the device back into PCI_D0 again.

So this literally explains all the re-ordering, and all the interrupt
games we now play in Rafael's patch-set. The _whole_ (and only) point is
to make it easier for device drivers, while also changing the environment
so that we can call ACPI and we can sleep even before the devices have
really resumed (or even early_resume'd).

			Linus

From: Linus Torvalds <torvalds@linux-foundation.org>
Newsgroups: fa.linux.kernel
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts
Date: Mon, 09 Mar 2009 15:42:11 UTC
Message-ID: <fa.QvM0o3EjhNo2k0l1LVNN3CM087c@ifi.uio.no>

On Mon, 9 Mar 2009, Alan Stern wrote:
>
> I see.  The unstated key point is this:
>
> 	Unsophisticated drivers [...]

Another key point is:

 - _un_sophisticated is the norm, and anybody who expects otherwise is
   living in some odd la-la-land together with his or her pink unicorn and
   endless supplies of quaaludes.

The thing is, we have about a metric sh*tload of drivers, and many of them
are effectively written by people who don't really do kernel work, and are
basically unmaintained in the long run (ie they may be maintained while
written, but two years down the line they have a couple of hundred users
and nobody who really cares about it, because the original author long
since moved on to fancier hardware).

			Linus