From: Alex Deucher on
On Mon, Jun 14, 2010 at 10:53 AM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
> Alex, Dave,
>
> I'm afraid hibernation is broken on all machines using radeon/KMS with r300
> after commit ce8f53709bf440100cb9d31b1303291551cf517f
> (drm/radeon/kms/pm: rework power management). �At least, I'm able to reproduce
> the symptom, which is that the machine hangs hard around the point where an
> image is created (probably during the device thaw phase), on two different
> boxes with r300 (the output of lspci from one of them is attached for
> reference, the other one is HP nx6325).
>
> Suspend to RAM appears to work fine at least on one of the affected boxes.
>
> Unfortunately, the commit above changes a lot of code and it's not too easy to
> figure out what's wrong with it and I didn't have the time to look more into
> details of this failure. �However, it looks like you use .suspend() and
> .resume() callbacks as .freeze() and .thaw() which may not be 100% correct
> (in fact it looks like the "legacy" PCI suspend/resume is used, which is not
> recommended any more).
>

Does it work any better after Dave's last drm pull request? With the
latest changes, pm should not be a factor unless it's explicitly
enabled via sysfs.

Alex

> Thanks,
> Rafael
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alex Deucher on
On Mon, Jun 14, 2010 at 3:03 PM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
> On Monday, June 14, 2010, Alex Deucher wrote:
>> On Mon, Jun 14, 2010 at 10:53 AM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
>> > Alex, Dave,
>> >
>> > I'm afraid hibernation is broken on all machines using radeon/KMS with r300
>> > after commit ce8f53709bf440100cb9d31b1303291551cf517f
>> > (drm/radeon/kms/pm: rework power management). �At least, I'm able to reproduce
>> > the symptom, which is that the machine hangs hard around the point where an
>> > image is created (probably during the device thaw phase), on two different
>> > boxes with r300 (the output of lspci from one of them is attached for
>> > reference, the other one is HP nx6325).
>> >
>> > Suspend to RAM appears to work fine at least on one of the affected boxes.
>> >
>> > Unfortunately, the commit above changes a lot of code and it's not too easy to
>> > figure out what's wrong with it and I didn't have the time to look more into
>> > details of this failure. �However, it looks like you use .suspend() and
>> > .resume() callbacks as .freeze() and .thaw() which may not be 100% correct
>> > (in fact it looks like the "legacy" PCI suspend/resume is used, which is not
>> > recommended any more).
>> >
>>
>> Does it work any better after Dave's last drm pull request?
>
> Nope. �The symptom is slightly different, though, because now it hangs after
> turning off the screen.
>
>> With the latest changes, pm should not be a factor unless it's explicitly
>> enabled via sysfs.
>
> Well, I guess the first pm patch changed more than just pm, then.

Does this patch help?
http://lists.freedesktop.org/archives/dri-devel/2010-June/001314.html

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Rafał Miłecki on
2010/6/14 Rafael J. Wysocki <rjw(a)sisk.pl>:
> On Monday, June 14, 2010, Alex Deucher wrote:
>> On Mon, Jun 14, 2010 at 10:53 AM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
>> > Alex, Dave,
>> >
>> > I'm afraid hibernation is broken on all machines using radeon/KMS with r300
>> > after commit ce8f53709bf440100cb9d31b1303291551cf517f
>> > (drm/radeon/kms/pm: rework power management).  At least, I'm able to reproduce
>> > the symptom, which is that the machine hangs hard around the point where an
>> > image is created (probably during the device thaw phase), on two different
>> > boxes with r300 (the output of lspci from one of them is attached for
>> > reference, the other one is HP nx6325).
>> >
>> > Suspend to RAM appears to work fine at least on one of the affected boxes.
>> >
>> > Unfortunately, the commit above changes a lot of code and it's not too easy to
>> > figure out what's wrong with it and I didn't have the time to look more into
>> > details of this failure.  However, it looks like you use .suspend() and
>> > .resume() callbacks as .freeze() and .thaw() which may not be 100% correct
>> > (in fact it looks like the "legacy" PCI suspend/resume is used, which is not
>> > recommended any more).
>> >
>>
>> Does it work any better after Dave's last drm pull request?
>
> Nope.  The symptom is slightly different, though, because now it hangs after
> turning off the screen.

"Just" turning the screen off (like dpms force off) or turning it off
while suspending?

Maybe it could be worth enabling drm.debug option and logging using
netconsole or similar...

--
Rafał
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alex Deucher on
On Wed, Jun 16, 2010 at 4:16 PM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
> On Wednesday, June 16, 2010, Ondrej Zary wrote:
>> On Wednesday 16 June 2010, Rafael J. Wysocki wrote:
>> > On Tuesday, June 15, 2010, Rafael J. Wysocki wrote:
>> > > On Monday, June 14, 2010, Alex Deucher wrote:
>> > > > On Mon, Jun 14, 2010 at 3:03 PM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
>> > > > > On Monday, June 14, 2010, Alex Deucher wrote:
>> > > > >> On Mon, Jun 14, 2010 at 10:53 AM, Rafael J. Wysocki <rjw(a)sisk.pl>
>> wrote:
>> > > > >> > Alex, Dave,
>> > > > >> >
>> > > > >> > I'm afraid hibernation is broken on all machines using radeon/KMS
>> > > > >> > with r300 after commit ce8f53709bf440100cb9d31b1303291551cf517f
>> > > > >> > (drm/radeon/kms/pm: rework power management).  At least, I'm able
>> > > > >> > to reproduce the symptom, which is that the machine hangs hard
>> > > > >> > around the point where an image is created (probably during the
>> > > > >> > device thaw phase), on two different boxes with r300 (the output
>> > > > >> > of lspci from one of them is attached for reference, the other one
>> > > > >> > is HP nx6325).
>> > > > >> >
>> > > > >> > Suspend to RAM appears to work fine at least on one of the
>> > > > >> > affected boxes.
>> > > > >> >
>> > > > >> > Unfortunately, the commit above changes a lot of code and it's not
>> > > > >> > too easy to figure out what's wrong with it and I didn't have the
>> > > > >> > time to look more into details of this failure.  However, it looks
>> > > > >> > like you use .suspend() and .resume() callbacks as .freeze() and
>> > > > >> > .thaw() which may not be 100% correct (in fact it looks like the
>> > > > >> > "legacy" PCI suspend/resume is used, which is not recommended any
>> > > > >> > more).
>> > > > >>
>> > > > >> Does it work any better after Dave's last drm pull request?
>> > > > >
>> > > > > Nope.  The symptom is slightly different, though, because now it
>> > > > > hangs after turning off the screen.
>> > > > >
>> > > > >> With the latest changes, pm should not be a factor unless it's
>> > > > >> explicitly enabled via sysfs.
>> > > > >
>> > > > > Well, I guess the first pm patch changed more than just pm, then..
>> > > >
>> > > > Does this patch help?
>> > > > http://lists.freedesktop.org/archives/dri-devel/2010-June/001314.html
>> > >
>> > > No, it doesn't.  I try to hibernate, everything works to the point where
>> > > the screen goes off and the box hangs (solid).  Normally, it would turn
>> > > the screen back on and continue with saving the image.
>> > >
>> > > But, since that happens with the patch above applied, I think it doesn't
>> > > really pass the suspend phase (IOW, it probably hangs somewhere in the
>> > > radeon's suspend routine).
>> >
>> > I've just verified that in fact hibernation works on HP nx6325 with
>> > 2.6.35-rc3, but it takes about 55 sec. to suspend the graphics adapter in
>> > the "freeze" phase.  Surprisingly enough, during suspend to RAM it works
>> > normally (as well as in the "poweroff" phase of hibernation).
>>
>> It takes 2 minutes on RV530:
>> https://bugzilla.redhat.com/show_bug.cgi?id=586522
>
> Well, my second affected box appears to hang somewhere in the radeon's suspend
> routine.

Does the attached patch help?

Alex
From: Alex Deucher on
2010/6/17 Rafael J. Wysocki <rjw(a)sisk.pl>:
> On Wednesday, June 16, 2010, Alex Deucher wrote:
>> On Wed, Jun 16, 2010 at 4:16 PM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
>> > On Wednesday, June 16, 2010, Ondrej Zary wrote:
>> >> On Wednesday 16 June 2010, Rafael J. Wysocki wrote:
>> >> > On Tuesday, June 15, 2010, Rafael J. Wysocki wrote:
>> >> > > On Monday, June 14, 2010, Alex Deucher wrote:
>> >> > > > On Mon, Jun 14, 2010 at 3:03 PM, Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
>> >> > > > > On Monday, June 14, 2010, Alex Deucher wrote:
>> >> > > > >> On Mon, Jun 14, 2010 at 10:53 AM, Rafael J. Wysocki <rjw(a)sisk.pl>
>> >> wrote:
>> >> > > > >> > Alex, Dave,
>> >> > > > >> >
>> >> > > > >> > I'm afraid hibernation is broken on all machines using radeon/KMS
>> >> > > > >> > with r300 after commit ce8f53709bf440100cb9d31b1303291551cf517f
>> >> > > > >> > (drm/radeon/kms/pm: rework power management). �At least, I'm able
>> >> > > > >> > to reproduce the symptom, which is that the machine hangs hard
>> >> > > > >> > around the point where an image is created (probably during the
>> >> > > > >> > device thaw phase), on two different boxes with r300 (the output
>> >> > > > >> > of lspci from one of them is attached for reference, the other one
>> >> > > > >> > is HP nx6325).
>> >> > > > >> >
>> >> > > > >> > Suspend to RAM appears to work fine at least on one of the
>> >> > > > >> > affected boxes.
>> >> > > > >> >
>> >> > > > >> > Unfortunately, the commit above changes a lot of code and it's not
>> >> > > > >> > too easy to figure out what's wrong with it and I didn't have the
>> >> > > > >> > time to look more into details of this failure. �However, it looks
>> >> > > > >> > like you use .suspend() and .resume() callbacks as .freeze() and
>> >> > > > >> > .thaw() which may not be 100% correct (in fact it looks like the
>> >> > > > >> > "legacy" PCI suspend/resume is used, which is not recommended any
>> >> > > > >> > more).
>> >> > > > >>
>> >> > > > >> Does it work any better after Dave's last drm pull request?
>> >> > > > >
>> >> > > > > Nope. �The symptom is slightly different, though, because now it
>> >> > > > > hangs after turning off the screen.
>> >> > > > >
>> >> > > > >> With the latest changes, pm should not be a factor unless it's
>> >> > > > >> explicitly enabled via sysfs.
>> >> > > > >
>> >> > > > > Well, I guess the first pm patch changed more than just pm, then.
>> >> > > >
>> >> > > > Does this patch help?
>> >> > > > http://lists.freedesktop.org/archives/dri-devel/2010-June/001314.html
>> >> > >
>> >> > > No, it doesn't. �I try to hibernate, everything works to the point where
>> >> > > the screen goes off and the box hangs (solid). �Normally, it would turn
>> >> > > the screen back on and continue with saving the image.
>> >> > >
>> >> > > But, since that happens with the patch above applied, I think it doesn't
>> >> > > really pass the suspend phase (IOW, it probably hangs somewhere in the
>> >> > > radeon's suspend routine).
>> >> >
>> >> > I've just verified that in fact hibernation works on HP nx6325 with
>> >> > 2.6.35-rc3, but it takes about 55 sec. to suspend the graphics adapter in
>> >> > the "freeze" phase. �Surprisingly enough, during suspend to RAM it works
>> >> > normally (as well as in the "poweroff" phase of hibernation).
>> >>
>> >> It takes 2 minutes on RV530:
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=586522
>> >
>> > Well, my second affected box appears to hang somewhere in the radeon's suspend
>> > routine.
>>
>> Does the attached patch help?
>
> It helps, but from what I can see in the code, it still has a few problems.
>
> First, the mutex around cancel_delayed_work() in radeon_pm_suspend()
> doesn't really serve any purpose, because rdev->pm.pm_method cannot change
> at this point and cancel_delayed_work() only tries to delete the work's timer.
> Moreover, it doesn't prevent the work handler from running, in which case the
> handler can do some wrong things and will rearm itself to do some more wrong
> things going forward. �So, I think it's better to wait for the handler to run in case it's
> already been queued up and it should also be prevented from rearming itself in
> that case.
>
> Second, in radeon_set_pm_method() the cancel_delayed_work() is not sufficient
> to prevent the work handler from running and queing up itself for the next run
> (the failure scenario is that cancel_delayed_work_sync() returns 0, so the
> handler is run, it waits on the mutex and then rearms itself after the mutex
> has been released), so it looks like cancel_delayed_work_sync()
> should be used to make sure it's not going to run again, but calling
> that cancel_delayed_work_sync() from under the mutex is not a good idea.
>
> Finally, there's a potential deadlock in radeon_pm_fini(), where
> cancel_delayed_work_sync() is called under rdev->pm.mutex, but the
> work handler tries to acquire the same mutex (if it wins the race).
>
> So, I think something like the appended patch is needed.
>

Looks reasonable. Does it fix the suspend issue?

Alex

> Rafael
>
>
> Signed-off-by: Rafael J. Wysocki <rjw(a)sisk.pl>
> ---
> �drivers/gpu/drm/radeon/radeon.h � �| � �3 +-
> �drivers/gpu/drm/radeon/radeon_pm.c | � 41 ++++++++++++++++++++++++++++++-------
> �2 files changed, 36 insertions(+), 8 deletions(-)
>
> Index: linux-2.6/drivers/gpu/drm/radeon/radeon_pm.c
> ===================================================================
> --- linux-2.6.orig/drivers/gpu/drm/radeon/radeon_pm.c
> +++ linux-2.6/drivers/gpu/drm/radeon/radeon_pm.c
> @@ -397,13 +397,20 @@ static ssize_t radeon_set_pm_method(stru
> � � � � � � � �rdev->pm.dynpm_planned_action = DYNPM_ACTION_DEFAULT;
> � � � � � � � �mutex_unlock(&rdev->pm.mutex);
> � � � �} else if (strncmp("profile", buf, strlen("profile")) == 0) {
> + � � � � � � � bool flush_wq = false;
> +
> � � � � � � � �mutex_lock(&rdev->pm.mutex);
> - � � � � � � � rdev->pm.pm_method = PM_METHOD_PROFILE;
> + � � � � � � � if (rdev->pm.pm_method == PM_METHOD_DYNPM) {
> + � � � � � � � � � � � cancel_delayed_work(&rdev->pm.dynpm_idle_work);
> + � � � � � � � � � � � flush_wq = true;
> + � � � � � � � }
> � � � � � � � �/* disable dynpm */
> � � � � � � � �rdev->pm.dynpm_state = DYNPM_STATE_DISABLED;
> � � � � � � � �rdev->pm.dynpm_planned_action = DYNPM_ACTION_NONE;
> - � � � � � � � cancel_delayed_work(&rdev->pm.dynpm_idle_work);
> + � � � � � � � rdev->pm.pm_method = PM_METHOD_PROFILE;
> � � � � � � � �mutex_unlock(&rdev->pm.mutex);
> + � � � � � � � if (flush_wq)
> + � � � � � � � � � � � flush_workqueue(rdev->wq);
> � � � �} else {
> � � � � � � � �DRM_ERROR("invalid power method!\n");
> � � � � � � � �goto fail;
> @@ -418,9 +425,18 @@ static DEVICE_ATTR(power_method, S_IRUGO
>
> �void radeon_pm_suspend(struct radeon_device *rdev)
> �{
> + � � � bool flush_wq = false;
> +
> � � � �mutex_lock(&rdev->pm.mutex);
> - � � � cancel_delayed_work(&rdev->pm.dynpm_idle_work);
> + � � � if (rdev->pm.pm_method == PM_METHOD_DYNPM) {
> + � � � � � � � cancel_delayed_work(&rdev->pm.dynpm_idle_work);
> + � � � � � � � if (rdev->pm.dynpm_state == DYNPM_STATE_ACTIVE)
> + � � � � � � � � � � � rdev->pm.dynpm_state = DYNPM_STATE_SUSPENDED;
> + � � � � � � � flush_wq = true;
> + � � � }
> � � � �mutex_unlock(&rdev->pm.mutex);
> + � � � if (flush_wq)
> + � � � � � � � flush_workqueue(rdev->wq);
> �}
>
> �void radeon_pm_resume(struct radeon_device *rdev)
> @@ -432,6 +448,12 @@ void radeon_pm_resume(struct radeon_devi
> � � � �rdev->pm.current_sclk = rdev->clock.default_sclk;
> � � � �rdev->pm.current_mclk = rdev->clock.default_mclk;
> � � � �rdev->pm.current_vddc = rdev->pm.power_state[rdev->pm.default_power_state_index].clock_info[0].voltage.voltage;
> + � � � if (rdev->pm.pm_method == PM_METHOD_DYNPM
> + � � � � � && rdev->pm.dynpm_state == DYNPM_STATE_SUSPENDED) {
> + � � � � � � � rdev->pm.dynpm_state = DYNPM_STATE_ACTIVE;
> + � � � � � � � queue_delayed_work(rdev->wq, &rdev->pm.dynpm_idle_work,
> + � � � � � � � � � � � � � � � � � � � msecs_to_jiffies(RADEON_IDLE_LOOP_MS));
> + � � � }
> � � � �mutex_unlock(&rdev->pm.mutex);
> � � � �radeon_pm_compute_clocks(rdev);
> �}
> @@ -486,6 +508,8 @@ int radeon_pm_init(struct radeon_device
> �void radeon_pm_fini(struct radeon_device *rdev)
> �{
> � � � �if (rdev->pm.num_power_states > 1) {
> + � � � � � � � bool flush_wq = false;
> +
> � � � � � � � �mutex_lock(&rdev->pm.mutex);
> � � � � � � � �if (rdev->pm.pm_method == PM_METHOD_PROFILE) {
> � � � � � � � � � � � �rdev->pm.profile = PM_PROFILE_DEFAULT;
> @@ -493,13 +517,16 @@ void radeon_pm_fini(struct radeon_device
> � � � � � � � � � � � �radeon_pm_set_clocks(rdev);
> � � � � � � � �} else if (rdev->pm.pm_method == PM_METHOD_DYNPM) {
> � � � � � � � � � � � �/* cancel work */
> - � � � � � � � � � � � cancel_delayed_work_sync(&rdev->pm.dynpm_idle_work);
> + � � � � � � � � � � � cancel_delayed_work(&rdev->pm.dynpm_idle_work);
> + � � � � � � � � � � � flush_wq = true;
> � � � � � � � � � � � �/* reset default clocks */
> � � � � � � � � � � � �rdev->pm.dynpm_state = DYNPM_STATE_DISABLED;
> � � � � � � � � � � � �rdev->pm.dynpm_planned_action = DYNPM_ACTION_DEFAULT;
> � � � � � � � � � � � �radeon_pm_set_clocks(rdev);
> � � � � � � � �}
> � � � � � � � �mutex_unlock(&rdev->pm.mutex);
> + � � � � � � � if (flush_wq)
> + � � � � � � � � � � � flush_workqueue(rdev->wq);
>
> � � � � � � � �device_remove_file(rdev->dev, &dev_attr_power_profile);
> � � � � � � � �device_remove_file(rdev->dev, &dev_attr_power_method);
> @@ -720,12 +747,12 @@ static void radeon_dynpm_idle_work_handl
> � � � � � � � � � � � �radeon_pm_get_dynpm_state(rdev);
> � � � � � � � � � � � �radeon_pm_set_clocks(rdev);
> � � � � � � � �}
> +
> + � � � � � � � queue_delayed_work(rdev->wq, &rdev->pm.dynpm_idle_work,
> + � � � � � � � � � � � � � � � � � � � msecs_to_jiffies(RADEON_IDLE_LOOP_MS));
> � � � �}
> � � � �mutex_unlock(&rdev->pm.mutex);
> � � � �ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
> -
> - � � � queue_delayed_work(rdev->wq, &rdev->pm.dynpm_idle_work,
> - � � � � � � � � � � � � � � � � � � � msecs_to_jiffies(RADEON_IDLE_LOOP_MS));
> �}
>
> �/*
> Index: linux-2.6/drivers/gpu/drm/radeon/radeon.h
> ===================================================================
> --- linux-2.6.orig/drivers/gpu/drm/radeon/radeon.h
> +++ linux-2.6/drivers/gpu/drm/radeon/radeon.h
> @@ -619,7 +619,8 @@ enum radeon_dynpm_state {
> � � � �DYNPM_STATE_DISABLED,
> � � � �DYNPM_STATE_MINIMUM,
> � � � �DYNPM_STATE_PAUSED,
> - � � � DYNPM_STATE_ACTIVE
> + � � � DYNPM_STATE_ACTIVE,
> + � � � DYNPM_STATE_SUSPENDED,
> �};
> �enum radeon_dynpm_action {
> � � � �DYNPM_ACTION_NONE,
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/