From: Rafael J. Wysocki on
On Thursday 17 December 2009, Alan Stern wrote:
> On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
>
> > I've just put the first set of data, for the HP nx6325 at:
> > http://www.sisk.pl/kernel/data/nx6325/
> >
> > The *-dmesg.log files contain full dmesg outputs starting from a cold boot and
> > including one suspend-resume cycle in each case, with debug_initcall enabled.
> >
> > The *-suspend.log files are excerpts from the *-dmesg.log files containing
> > the suspend messages only, and analogously for *-resume.log.
>
> I've just started looking at the sync-suspend.log file. What are all
> the '+' characters and " @ 3368" strings after the device names?

I think the + is necessary for the Arjan's graph-generating script and the
@ number is the value of current (ie. the PID of the calling task).

> You didn't print out the parent name for each device, so the tree
> structure has been lost.

That's because the original Arjan's patch doesn't do that, I'm adding it
right now.

> Why do those "sd 0:0:0:0 [sda]" messages appear in between two
> callbacks? The cache-synchronization and the spin-down commands are
> not executed asynchronously.

Because the data are incomplete. :-(

I've just realized that the Arjan's patch only covers bus types and classes
that have been converted to dev_pm_ops already, so I'm extending it to the
"legacy" ones at the moment.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Thursday 17 December 2009, Rafael J. Wysocki wrote:
> On Thursday 17 December 2009, Alan Stern wrote:
> > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> >
> > > I've just put the first set of data, for the HP nx6325 at:
> > > http://www.sisk.pl/kernel/data/nx6325/
> > >
> > > The *-dmesg.log files contain full dmesg outputs starting from a cold boot and
> > > including one suspend-resume cycle in each case, with debug_initcall enabled.
> > >
> > > The *-suspend.log files are excerpts from the *-dmesg.log files containing
> > > the suspend messages only, and analogously for *-resume.log.
> >
> > I've just started looking at the sync-suspend.log file. What are all
> > the '+' characters and " @ 3368" strings after the device names?
>
> I think the + is necessary for the Arjan's graph-generating script and the
> @ number is the value of current (ie. the PID of the calling task).
>
> > You didn't print out the parent name for each device, so the tree
> > structure has been lost.
>
> That's because the original Arjan's patch doesn't do that, I'm adding it
> right now.
>
> > Why do those "sd 0:0:0:0 [sda]" messages appear in between two
> > callbacks? The cache-synchronization and the spin-down commands are
> > not executed asynchronously.
>
> Because the data are incomplete. :-(
>
> I've just realized that the Arjan's patch only covers bus types and classes
> that have been converted to dev_pm_ops already, so I'm extending it to the
> "legacy" ones at the moment.

New data files have been uploaded to:

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

Please let me know if you need more information.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Wednesday 16 December 2009, Rafael J. Wysocki wrote:
> On Wednesday 16 December 2009, Linus Torvalds wrote:
> >
> > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> > > >
> > > > Do you have any sample timing output with devices listed?
> > >
> > > I'm going to generate one shortly.
>
> I've just put the first set of data, for the HP nx6325 at:
> http://www.sisk.pl/kernel/data/nx6325/

As I said in a message to Alan, the data were incomplete, because the original
Arjan's patch only covers bus types and device classes converted to
dev_pm_ops, which I only noticed earlier today. So I added the appended patch
on top of the async tree and I applied a one-liner adding the name of the
parent to each device line during (regular) suspend and resume.

The new data sets are at:

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

and the format is the same as described below.

> The *-dmesg.log files contain full dmesg outputs starting from a cold boot and
> including one suspend-resume cycle in each case, with debug_initcall enabled.
>
> The *-suspend.log files are excerpts from the *-dmesg.log files containing
> the suspend messages only, and analogously for *-resume.log.
>
> The *-times.txt files contain suspend/resume time for every device sorted
> in the decreasing order.
>
> > From my bootup timings, I have this memory of SATA link bringup being
> > noticeable. I wonder if that is the case on resume too...

That actually is correct. On the nx6325 suspend is totally dominated by disk
spindown, almost everything else is negligible compared to it (well, except for
the audio), so we can't go down below 1 s during suspend on this box.

On the Wind, disk spindown time is comparable with serio suspend time,
so at least in principle we should be able to get .5 s suspend on this box -
if the disk spindown in async.

In turn, the resume on the Wind is dominated by disk spinup, so we can't
go below 1.5 s on this box during resume (notice that the "async+extra"
approach brings us close to this limit, although we could save .5 s more in
principle by making more devices async).

Resume on the nx6325 is a different story, though, as it is dominated by USB
and PCI devices, so marking those as async would probably bring us close to
the limit.

[Surprisingly enough to me some ACPI devices appear to take quite noticeable
amounts of time to resume on both boxes.]

Tomorrow I'll try to mark as many devices as reasonably possible as async
and see how the total suspend-resume times change.

Rafael

---
drivers/base/power/main.c | 97 ++++++++++++++++++++++++++++++++++++----------
1 file changed, 77 insertions(+), 20 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -165,6 +165,32 @@ void device_pm_move_last(struct device *
list_move_tail(&dev->power.entry, &dpm_list);
}

+static ktime_t initcall_debug_start(struct device *dev)
+{
+ ktime_t calltime = ktime_set(0, 0);
+
+ if (initcall_debug) {
+ pr_info("calling %s_i+ @ %i\n",
+ dev_name(dev), task_pid_nr(current));
+ calltime = ktime_get();
+ }
+
+ return calltime;
+}
+
+static void initcall_debug_report(struct device *dev, ktime_t calltime,
+ int error)
+{
+ ktime_t delta, rettime;
+
+ if (initcall_debug) {
+ rettime = ktime_get();
+ delta = ktime_sub(rettime, calltime);
+ pr_info("call %s+ returned %d after %Ld usecs\n", dev_name(dev),
+ error, (unsigned long long)ktime_to_ns(delta) >> 10);
+ }
+}
+
/**
* dpm_wait - Wait for a PM operation to complete.
* @dev: Device to wait for.
@@ -201,13 +227,9 @@ static int pm_op(struct device *dev,
pm_message_t state)
{
int error = 0;
- ktime_t calltime, delta, rettime;
+ ktime_t calltime;

- if (initcall_debug) {
- pr_info("calling %s+ @ %i\n",
- dev_name(dev), task_pid_nr(current));
- calltime = ktime_get();
- }
+ calltime = initcall_debug_start(dev);

switch (state.event) {
#ifdef CONFIG_SUSPEND
@@ -256,12 +278,7 @@ static int pm_op(struct device *dev,
error = -EINVAL;
}

- if (initcall_debug) {
- rettime = ktime_get();
- delta = ktime_sub(rettime, calltime);
- pr_info("call %s+ returned %d after %Ld usecs\n", dev_name(dev),
- error, (unsigned long long)ktime_to_ns(delta) >> 10);
- }
+ initcall_debug_report(dev, calltime, error);

return error;
}
@@ -338,8 +355,9 @@ static int pm_noirq_op(struct device *de
if (initcall_debug) {
rettime = ktime_get();
delta = ktime_sub(rettime, calltime);
- printk("initcall %s_i+ returned %d after %Ld usecs\n", dev_name(dev),
- error, (unsigned long long)ktime_to_ns(delta) >> 10);
+ printk("initcall %s_i+ returned %d after %Ld usecs\n",
+ dev_name(dev), error,
+ (unsigned long long)ktime_to_ns(delta) >> 10);
}

return error;
@@ -456,6 +474,26 @@ void dpm_resume_noirq(pm_message_t state
EXPORT_SYMBOL_GPL(dpm_resume_noirq);

/**
+ * legacy_resume - Execute a legacy (bus or class) resume callback for device.
+ * dev: Device to resume.
+ * cb: Resume callback to execute.
+ */
+static int legacy_resume(struct device *dev, int (*cb)(struct device *dev))
+{
+ int error;
+ ktime_t calltime;
+
+ calltime = initcall_debug_start(dev);
+
+ error = cb(dev);
+ suspend_report_result(cb, error);
+
+ initcall_debug_report(dev, calltime, error);
+
+ return error;
+}
+
+/**
* __device_resume - Execute "resume" callbacks for given device.
* @dev: Device to handle.
* @state: PM transition of the system being carried out.
@@ -477,7 +515,7 @@ static int __device_resume(struct device
error = pm_op(dev, dev->bus->pm, state);
} else if (dev->bus->resume) {
pm_dev_dbg(dev, state, "legacy ");
- error = dev->bus->resume(dev);
+ error = legacy_resume(dev, dev->bus->resume);
}
if (error)
goto End;
@@ -498,7 +536,7 @@ static int __device_resume(struct device
error = pm_op(dev, dev->class->pm, state);
} else if (dev->class->resume) {
pm_dev_dbg(dev, state, "legacy class ");
- error = dev->class->resume(dev);
+ error = legacy_resume(dev, dev->class->resume);
}
}
End:
@@ -734,6 +772,27 @@ EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
static int async_error;

/**
+ * legacy_suspend - Execute a legacy (bus or class) suspend callback for device.
+ * dev: Device to suspend.
+ * cb: Suspend callback to execute.
+ */
+static int legacy_suspend(struct device *dev, pm_message_t state,
+ int (*cb)(struct device *dev, pm_message_t state))
+{
+ int error;
+ ktime_t calltime;
+
+ calltime = initcall_debug_start(dev);
+
+ error = cb(dev, state);
+ suspend_report_result(cb, error);
+
+ initcall_debug_report(dev, calltime, error);
+
+ return error;
+}
+
+/**
* device_suspend - Execute "suspend" callbacks for given device.
* @dev: Device to handle.
* @state: PM transition of the system being carried out.
@@ -755,8 +814,7 @@ static int __device_suspend(struct devic
error = pm_op(dev, dev->class->pm, state);
} else if (dev->class->suspend) {
pm_dev_dbg(dev, state, "legacy class ");
- error = dev->class->suspend(dev, state);
- suspend_report_result(dev->class->suspend, error);
+ error = legacy_suspend(dev, state, dev->class->suspend);
}
if (error)
goto End;
@@ -777,8 +835,7 @@ static int __device_suspend(struct devic
error = pm_op(dev, dev->bus->pm, state);
} else if (dev->bus->suspend) {
pm_dev_dbg(dev, state, "legacy ");
- error = dev->bus->suspend(dev, state);
- suspend_report_result(dev->bus->suspend, error);
+ error = legacy_suspend(dev, state, dev->bus->suspend);
}
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Thursday 17 December 2009, Alan Stern wrote:
> On Thu, 17 Dec 2009, Rafael J. Wysocki wrote:
>
> > That actually is correct. On the nx6325 suspend is totally dominated by disk
> > spindown, almost everything else is negligible compared to it (well, except for
> > the audio), so we can't go down below 1 s during suspend on this box.
> >
> > On the Wind, disk spindown time is comparable with serio suspend time,
> > so at least in principle we should be able to get .5 s suspend on this box -
> > if the disk spindown in async.
> >
> > In turn, the resume on the Wind is dominated by disk spinup, so we can't
> > go below 1.5 s on this box during resume (notice that the "async+extra"
> > approach brings us close to this limit, although we could save .5 s more in
> > principle by making more devices async).
> >
> > Resume on the nx6325 is a different story, though, as it is dominated by USB
> > and PCI devices, so marking those as async would probably bring us close to
> > the limit.
>
> The implications seem pretty clear. If the following sorts of devices
> were async:
>
> USB (devices and interfaces), PCI, serio, SCSI (hosts, targets,
> devices)

Plus ACPI battery.

> then we would reap close to the maximum benefit -- providing:
>
> async threads are started in a first pass without waiting
> for synchronous devices, and

Agreed.

> It's not clear that making all these types of devices async will really
> work, but it's worth testing.

I'm working on it.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafael J. Wysocki on
On Thursday 17 December 2009, Rafael J. Wysocki wrote:
....
> Tomorrow I'll try to mark as many devices as reasonably possible as async
> and see how the total suspend-resume times change.

I didn't manage to do that, but I was able to mark sd and i8042 as async and
see the impact of this.

The raw data are in the usual place:

http://www.sisk.pl/kernel/data/async-suspend-resume.pdf

and the individual device timings and logs are in:

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

This is the summary (previous results are inculded for easier reference):

HP nx6325 MSI Wind U100

sync suspend 1482 (+/- 40) 1180 (+/- 24)
sync resume 2955 (+/- 2) 3597 (+/- 25)

async suspend 1553 (+/- 49) 1177 (+/- 32)
async resume 2692 (+/- 326) 3556 (+/- 33)

async+one-liner suspend 1600 (+/- 39) 1212 (+/- 41)
async+one-liner resume 2692 (+/- 324) 3579 (+/- 24)

async+extra suspend 1496 (+/- 37) 1217 (+/- 38)
async+extra resume 1859 (+/- 114) 1923 (+/- 35)

with "async" i8042 and sd:

async suspend 1319 (+/- 51) 1045 (+/- 41)
async resume 2929 (+/- 3) 3546 (+/- 27)

async+extra suspend 1327 (+/- 36) (didn't work)
async+extra resume 1742 (+/- 164) 1896 (+/- 28)

(the summary is also available at: http://www.sisk.pl/kernel/data/results.txt).

So, it actually makes the case for async suspend! Although it's not very
strong, with these two additional devices marked as "async" we get noticeable
suspend time improvement.

Still, the "extra" patch doesn't help on suspend at all and on the Wind the
suspend part of it didn't even work (I'm yet to figure out which of the two
devices crashed the suspend). Nevertheless the resume part of the "extra"
patch worked in both cases and worked better than without the two additional
"async" devices.

To me, this means that the suspend part of the "extra" patch is not really
useful. However, the resume part of it is _very_ useful, so I'd like to add
that part only to the async patchset. The explanation why it helps so much
is also straightforward to me. Namely, if slow async devices are last to
resume, then without the "extra" patch they need to wait for all of the
preceding sync devices and the speedup from executing their resume routines
asynchronously is very limited. Now, with the "extra" patch their resume
routines start as soon as their parents complete resuming and that may be
early enough for the speedup to be significant.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/