From: Paul E. McKenney on
On Fri, Aug 06, 2010 at 06:33:25PM +0100, Mark Brown wrote:
> On Fri, Aug 06, 2010 at 10:22:26AM -0700, Paul E. McKenney wrote:
> > On Fri, Aug 06, 2010 at 01:30:48PM +0100, Mark Brown wrote:
>
> > > this (the one following the rename to suspend blockers). Essentially
> > > what happens in a mainline context is that some subsystems can with
> > > varying degrees of optionality ignore some or all of the instruction to
> > > suspend and keep bits of the system alive during suspend.
>
> > This underscores a basic difference between servers and these embedded
> > devices. When you suspend a server, it is doing nothing, because servers
> > rely very heavily on the CPUs. In contrast, many embedded devices can
> > perform useful work even when the CPUs are completely powered down.
>
> Well, not really from the Linux point of view. It's not massively
> different to something like keeping an ethernet controller sufficiently
> alive to allow it to provide wake on LAN functionality while the system
> is suspended in terms of what Linux has to do, and quite a few servers
> have lights out management systems which aren't a million miles away
> from the modem on a phone in terms of their relationship with the host
> computer.

The wake-on-LAN and the lights-out management systems are indeed
interesting examples, and actually pretty good ones. The reason I
excluded them is that they don't do any application processing -- their
only purpose is the care and feeding of the system itself. In contrast,
the embedded processors are able to do significant application processing
(e.g., play back music) while the CPUs are completely shut down and most
of the memory is powered down as well.

Thanx, Paul
From: Alan Stern on
On Thu, 5 Aug 2010, Arve Hjønnevåg wrote:

> count, tells you how many times the wakelock was activated. If a
> wakelock prevented suspend for a long time, a large count tells you it
> handled a lot of events, while a small count tells you it took a long
> time to process the events, or the wakelock was not released properly.

As noted, we already have this.

> expire_count, tells you how many times the timeout expired. For the
> input event wakelock in the android kernel (which has a timeout) an
> expire count that matches the count tells you that someone opened an
> input device but is not reading from it (this has happened several
> times).

This is a little tricky. Rafael's model currently does not allow
wakeup events started by pm_wakeup_event() to be cancelled in any way
other than by having their timer expire. This essentially means that
for some devices, expire_count will always be the same as count and for
others it will always be 0. To change this would require adding an
extra timer struct, which could be done (in fact, an earlier version of
the code included it). It would be nice if we could avoid the need.
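Very roughly, the extra bookkeeping would look something like this (a
sketch only, with made-up names -- not Rafael's actual code):

#include <linux/spinlock.h>
#include <linux/timer.h>

/*
 * Hypothetical sketch: a wakeup source that can be released either
 * explicitly or by timeout needs its own timer on top of the active
 * flag, which is the cost being discussed here.
 */
struct wakeup_source_sketch {
        spinlock_t lock;
        struct timer_list timer;        /* the "extra timer struct" */
        bool active;
        unsigned long count;
        unsigned long expire_count;
};

static void ws_timeout(unsigned long data)
{
        struct wakeup_source_sketch *ws = (void *)data;
        unsigned long flags;

        spin_lock_irqsave(&ws->lock, flags);
        if (ws->active) {
                ws->active = false;
                ws->expire_count++;     /* released by timeout */
        }
        spin_unlock_irqrestore(&ws->lock, flags);
}

static void ws_relax(struct wakeup_source_sketch *ws)
{
        unsigned long flags;

        spin_lock_irqsave(&ws->lock, flags);
        if (ws->active) {
                ws->active = false;
                del_timer(&ws->timer);  /* explicit release cancels the timeout */
        }
        spin_unlock_irqrestore(&ws->lock, flags);
}

static void ws_init(struct wakeup_source_sketch *ws)
{
        spin_lock_init(&ws->lock);
        setup_timer(&ws->timer, ws_timeout, (unsigned long)ws);
}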

Does Android use any kernel-internal wakelocks both with a timer and
with active cancellation?

> wake_count, tells you how many times this was the first wakelock
> acquired in the resume path. This is currently less useful than I
> would like on the Nexus One since it is usually "SMD_RPCCALL" which
> does not tell me a lot.

This could be done easily enough, but if it's not very useful then
there's no point.

> active_since, tells you how long a still-active wakelock has been
> active. If someone activated a wakelock and never released it, it will
> be obvious here.

Easily added. But you didn't mention any field saying whether the
wakelock is currently active. That could be added too (although it
would be racy -- but for detecting unreleased wakelocks you wouldn't
care).

> total_time, total time the wake lock has been active. This one should
> be obvious.

Also easily added.

> sleep_time, total time the wake lock has been active when the screen was off.

Not applicable to general systems. Is there anything like it that
_would_ apply in general?

> max_time, longest time the wakelock was active uninterrupted. This is
> used less often, but if the battery on a device was draining fast and
> the problem went away before you could look at the stats, this will
> show whether a wakelock was active for a long time.

Again, easily added. The only drawback is that all these additions
will bloat the size of struct device. Of course, that's why you used
separately-allocated structures for your wakelocks. Maybe we can
change ours to do the same; it seems likely that the majority of device
structures won't ever be used for wakeup events.
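Concretely, the separately-allocated structure could look something
like the sketch below (illustrative names only, not an existing kernel
structure); dev->power would then carry just a pointer to it, left NULL
for devices that never signal wakeup events:

#include <linux/ktime.h>

/* Hypothetical sketch -- field names are illustrative. */
struct dev_wakeup_stats {
        unsigned long   count;          /* times activated */
        unsigned long   expire_count;   /* times released by timeout */
        unsigned long   wake_count;     /* times it was the first source
                                           active in the resume path */
        ktime_t         total_time;     /* total time active */
        ktime_t         max_time;       /* longest single activation */
        ktime_t         last_change;    /* for computing active_since */
        bool            active;
};

/*
 * struct dev_pm_info (dev->power) would then only need a pointer
 * (hypothetical field), allocated when device_init_wakeup() is called:
 *
 *      struct dev_wakeup_stats *wakeup_stats;
 */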

> >> and I would prefer that the kernel interfaces would
> >> encourage drivers to block suspend until user space has consumed the
> >> event, which works for the android user space, instead of just long
> >> enough to work with a hypothetical user space power manager.

Rafael doesn't _discourage_ drivers from doing this. However, you have
to keep in mind that many kernel developers are accustomed to working
on systems (mostly PCs) with a different range of hardware devices from
embedded systems like your phones. With PCI devices(*), for example,
there's no clear point where a wakeup event gets handed off to
userspace.

On the other hand, there's no reason the input layer shouldn't use
pm_stay_awake and pm_relax. It simply hasn't been implemented yet.
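The bracketing itself would be straightforward; a rough sketch (made-up
types and fields, not an actual patch) might look like:

#include <linux/input.h>
#include <linux/pm_wakeup.h>
#include <linux/wait.h>

#define BUF_SIZE 64                     /* made-up queue size */

struct evdev_client_sketch {            /* made up, for illustration */
        struct device *dev;             /* device signalling the wakeup event */
        struct input_event buffer[BUF_SIZE];
        unsigned int head, tail;
        wait_queue_head_t wait;
};

/* Locking omitted for brevity. */
static void sketch_queue_event(struct evdev_client_sketch *client,
                               const struct input_event *ev)
{
        pm_stay_awake(client->dev);     /* block suspend until user space reads it */
        client->buffer[client->head++ % BUF_SIZE] = *ev;
        wake_up_interruptible(&client->wait);
}

static void sketch_dequeue_event(struct evdev_client_sketch *client,
                                 struct input_event *ev)
{
        *ev = client->buffer[client->tail++ % BUF_SIZE];
        if (client->head == client->tail)
                pm_relax(client->dev);  /* queue drained: suspend may proceed */
}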

Alan Stern

(*) Speaking of PCI devices, I'm not convinced that the way Rafael is
using the pm_wakeup_event interface in the PCI core is entirely
correct. The idea is to resolve the race between wakeup events and
suspend. The code assumes that a wakeup event will be consumed in 100
ms or less, which is a reasonable assumption.

But what sorts of things qualify as wakeup events? Right now, the code
handles only events coming by way of the PME# signal (or its platform
equivalent). But that signal usually gets activated only when a PCI
device is in a low-power mode; if the device is at full power then it
simply generates an IRQ. It's the same event, but reported to the
kernel in a different way. So consider...

Case 1: The system is suspending and the PCI device has already been
placed in D3hot when an event occurs. PME# is activated,
the wakeup event is reported, the suspend is aborted, and the
system won't try to suspend again for at least 100 ms. Good.

Case 2: The system is running normally and the PCI device is at full
power when an event occurs. PME# isn't activated and
pm_wakeup_event doesn't get called. Then when the system
tries to suspend 25 ms later, there's nothing to prevent it
even though the event is still being processed. Bad.

In case 2 the race has not been resolved. It seems to me that the
only proper solution is to call pm_wakeup_event for _every_ PCI
interrupt. This may be too much to add to a hot path, but what's the
alternative?

From: Rafael J. Wysocki on
On Friday, August 06, 2010, Alan Stern wrote:
> On Thu, 5 Aug 2010, Arve Hjønnevåg wrote:
....
> But what sorts of things qualify as wakeup events? Right now, the code
> handles only events coming by way of the PME# signal (or its platform
> equivalent). But that signal usually gets activated only when a PCI
> device is in a low-power mode; if the device is at full power then it
> simply generates an IRQ. It's the same event, but reported to the
> kernel in a different way. So consider...
>
> Case 1: The system is suspending and the PCI device has already been
> placed in D3hot when an event occurs. PME# is activated,
> the wakeup event is reported, the suspend is aborted, and the
> system won't try to suspend again for at least 100 ms. Good.
>
> Case 2: The system is running normally and the PCI device is at full
> power when an event occurs. PME# isn't activated and
> pm_wakeup_event doesn't get called. Then when the system
> tries to suspend 25 ms later, there's nothing to prevent it
> even though the event is still being processed. Bad.
>
> In case 2 the race has not been resolved. It seems to me that the
> only proper solution is to call pm_wakeup_event for _every_ PCI
> interrupt. This may be too much to add to a hot path, but what's the
> alternative?

Arguably not every PCI interrupt should be regarded as a wakeup event, so
I think we can simply say that, in the cases where it's necessary, the driver
should be responsible for using pm_wakeup_event() or pm_stay_awake() /
pm_relax() as appropriate.

My patch only added it to the bus-level code which covered the PME-based
wakeup events that _cannot_ be handled by device drivers.
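As a rough illustration of that division of labour (hypothetical driver
code, not part of my patch), a driver whose full-power interrupt
corresponds to a wakeup event might do something like:

#include <linux/interrupt.h>
#include <linux/pm_wakeup.h>
#include <linux/workqueue.h>

struct foo_device {                     /* made up, for illustration */
        struct device *dev;
        struct work_struct work;
};

static bool foo_event_pending(struct foo_device *foo);  /* assumed helper */

static irqreturn_t foo_interrupt(int irq, void *dev_id)
{
        struct foo_device *foo = dev_id;

        if (!foo_event_pending(foo))
                return IRQ_NONE;

        /*
         * Tell the PM core about the event so that an in-progress or
         * imminent suspend is held off; 100 ms matches the grace period
         * used by the bus-level PME# path.  A driver that knows when
         * processing finishes could instead use pm_stay_awake() here
         * and pm_relax() when done.
         */
        pm_wakeup_event(foo->dev, 100);

        schedule_work(&foo->work);
        return IRQ_HANDLED;
}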

Thanks,
Rafael
From: david on
On Fri, 6 Aug 2010, Paul E. McKenney wrote:

> On Fri, Aug 06, 2010 at 01:29:57AM -0700, david(a)lang.hm wrote:
>> On Thu, 5 Aug 2010, Paul E. McKenney wrote:
>>
>>> On Thu, Aug 05, 2010 at 01:26:18PM -0700, david(a)lang.hm wrote:
>>>> On Thu, 5 Aug 2010, kevin granade wrote:
>>>>
>>>>> On Thu, Aug 5, 2010 at 10:46 AM, <david(a)lang.hm> wrote:
>>>>>> On Thu, 5 Aug 2010, Paul E. McKenney wrote:
>>>>>>
>>>>>>> On Wed, Aug 04, 2010 at 10:18:40PM -0700, david(a)lang.hm wrote:
>>>>>>>>
>>>>>>>> On Wed, 4 Aug 2010, Paul E. McKenney wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Aug 04, 2010 at 05:25:53PM -0700, david(a)lang.hm wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, 4 Aug 2010, Paul E. McKenney wrote:
>>>>>>>
>>>>>>> [ . . . ]
>>>>>>>
>>
>> it would be nice to get network traffic/connection stats.
>>
>> so two questions.
>>
>> first, what else would you need to get accumulated for the cgroup
>>
>> second, is there a fairly easy way to have these stats available?
>>
>> for the 'last time it ran' stat, this seems like you could have a
>> per-cpu variable per cgroup that's fairly cheap to update, but you
>> would need to take a global lock to read accurately (the lock may be
>> expensive enough that it's worth trying to read the variables from
>> the other cpu without a lock, just to see if it's remotely possible
>> to sleep/suspend)
>>
>> with timers, is it possible to have multiple timer wheels (one per
>> cgroup)?
>
> I apologize in advance for what I am about to write, but...
>
> If you continue in this vein, you are likely to make suspend blockers
> look very simple and natural. ;-)

if that's the case then they should be implemented :-)

on the other hand, this may be something that's desirable for
idle-low-power as well.
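for reference, the per-cpu "last time it ran" idea quoted above could be
roughly as cheap as this sketch (made-up names; the read side is
deliberately racy and only good enough to decide whether attempting
suspend is plausible):

#include <linux/jiffies.h>
#include <linux/percpu.h>

/* Hypothetical sketch of per-cgroup activity tracking;
 * last_ran would be allocated with alloc_percpu(unsigned long). */
struct cgroup_activity {
        unsigned long __percpu *last_ran;       /* jiffies, one slot per CPU */
};

/* Hot path: no locking, just a per-CPU store (a stray write to another
 * CPU's slot after migration only makes the result approximate). */
static inline void cgroup_note_activity(struct cgroup_activity *ca)
{
        *this_cpu_ptr(ca->last_ran) = jiffies;
}

/* Racy scan: good enough to tell "recently active" from "long idle". */
static unsigned long cgroup_last_activity(struct cgroup_activity *ca)
{
        unsigned long latest = jiffies - MAX_JIFFY_OFFSET;
        int cpu;

        for_each_possible_cpu(cpu) {
                unsigned long t = *per_cpu_ptr(ca->last_ran, cpu);

                if (time_after(t, latest))
                        latest = t;
        }
        return latest;
}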

David Lang
From: david on
On Fri, 6 Aug 2010, Paul E. McKenney wrote:

> On Fri, Aug 06, 2010 at 06:33:25PM +0100, Mark Brown wrote:
>> On Fri, Aug 06, 2010 at 10:22:26AM -0700, Paul E. McKenney wrote:
>>> On Fri, Aug 06, 2010 at 01:30:48PM +0100, Mark Brown wrote:
>>
>>>> this (the one following the rename to suspend blockers). Essentially
>>>> what happens in a mainline context is that some subsystems can with
>>>> varying degrees of optionality ignore some or all of the instruction to
>>>> suspend and keep bits of the system alive during suspend.
>>
>>> This underscores a basic difference between servers and these embedded
>>> devices. When you suspend a server, it is doing nothing, because servers
>>> rely very heavily on the CPUs. In contrast, many embedded devices can
>>> perform useful work even when the CPUs are completely powered down.
>>
>> Well, not really from the Linux point of view. It's not massively
>> different to something like keeping an ethernet controller sufficiently
>> alive to allow it to provide wake on LAN functionality while the system
>> is suspended in terms of what Linux has to do, and quite a few servers
>> have lights out management systems which aren't a million miles away
>> from the modem on a phone in terms of their relationship with the host
>> computer.
>
> The wake-on-LAN and the lights-out management systems are indeed
> interesting examples, and actually pretty good ones. The reason I
> excluded them is that they don't do any application processing -- their
> only purpose is the care and feeding of the system itself. In contrast,
> the embedded processors are able to do significant application processing
> (e.g., play back music) while the CPUs are completely shut down and most
> of the memory is powered down as well.

One other significant issue is that on the PC, things like wake-on-LAN,
lights-out management cards, etc. require nothing from the main system
other than power. If they do something, they send a signal to the
chipset, which then wakes the system up. They don't interact with the main
processor/memory/etc. at all.

So as I see it, we need to do one of two things.

1. Change the suspend definition to allow for some things to not be
suspended.

or

2. Change the sleep/low-power mode definition to have a more standardized
way of turning things off, and extend it to allow clocks to be turned off
as well (today some things can already be turned off, drive spin-down for
example, but per the comments in this thread it's all one-off methods).

To me, #2 seems the better thing to do from a design/concept point of view.

David Lang