From: Johannes Berg on
On Fri, 2010-01-29 at 15:41 +1100, Benjamin Herrenschmidt wrote:
> Hi Ingo !
>
> Johannes and I see this on our quad G5s... it -could- be similar to
> one reported a short while ago by Xiaotian Feng <xtfeng(a)gmail.com>
> under the subject [2.6.33-rc4] sysfs lockdep warnings on cpu hotplug.
>
> Basically, the machine deadlocks right after printing the following
> when doing a shutdown:
>
> halt/4071 is trying to acquire lock:
> (s_active){++++.+}, at: [<c0000000001ef868>]
> .sysfs_addrm_finish+0x58/0xc0
>
> but task is already holding lock:
> (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>]
> .lock_policy_rwsem_write+0x84/0xf4
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:

This is still happening with -rc8. Any news?

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Linus Torvalds on


On Thu, 18 Feb 2010, Johannes Berg wrote:
> >
> > Basically, the machine deadlocks right after printing the following
> > when doing a shutdown:
> >
> > halt/4071 is trying to acquire lock:
> > (s_active){++++.+}, at: [<c0000000001ef868>]
> > .sysfs_addrm_finish+0x58/0xc0
> >
> > but task is already holding lock:
> > (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>]
> > .lock_policy_rwsem_write+0x84/0xf4
> >
> > which lock already depends on the new lock.

You don't have a full backtrace for these things?

We've had lots of trouble with the cpu governors, and I suspect the
problem isn't new, but the lockdep warning is likely new (see commit
846f99749ab68bbc7f75c74fec305de675b1a1bf: "sysfs: Add lockdep annotations
for the sysfs active reference").

So it is likely to be an old issue that (a) now gets warned about and (b)
might have had timing changes enough to trigger it.

I suspect it is G5-specific (or specific to whatever CPU frequency code
that gets used there), since I think we'd have had lots of reports if this
happened on x86.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Johannes Berg on
On Thu, 2010-02-18 at 08:31 -0800, Linus Torvalds wrote:

> > > halt/4071 is trying to acquire lock:
> > > (s_active){++++.+}, at: [<c0000000001ef868>]
> > > .sysfs_addrm_finish+0x58/0xc0
> > >
> > > but task is already holding lock:
> > > (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at:
> [<c0000000004cd6ac>]
> > > .lock_policy_rwsem_write+0x84/0xf4
> > >
> > > which lock already depends on the new lock.
>
> You don't have a full backtrace for these things?

No, it deadlocks right there, unfortunately.

> We've had lots of trouble with the cpu governors, and I suspect the
> problem isn't new, but the lockdep warning is likely new (see commit
> 846f99749ab68bbc7f75c74fec305de675b1a1bf: "sysfs: Add lockdep
> annotations
> for the sysfs active reference").
>
> So it is likely to be an old issue that (a) now gets warned about and
> (b) might have had timing changes enough to trigger it.

Well, it used to not deadlock and actually shut down the machine :) So
in that sense it's definitely new. It might have printed a lockdep
warning before, which you wouldn't normally see since the machine turns
off right after this.

> I suspect it is G5-specific (or specific to whatever CPU frequency
> code
> that gets used there), since I think we'd have had lots of reports if
> this
> happened on x86.

Yeah, that's puzzling me as well.

johannes
From: Américo Wang on
On Fri, Feb 19, 2010 at 12:31 AM, Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
>
>
> On Thu, 18 Feb 2010, Johannes Berg wrote:
>> >
>> > Basically, the machine deadlocks right after printing the following
>> > when doing a shutdown:
>> >
>> > halt/4071 is trying to acquire lock:
>> >  (s_active){++++.+}, at: [<c0000000001ef868>]
>> > .sysfs_addrm_finish+0x58/0xc0
>> >
>> > but task is already holding lock:
>> >  (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>]
>> > .lock_policy_rwsem_write+0x84/0xf4
>> >
>> > which lock already depends on the new lock.
>
> You don't have a full backtrace for these things?
>
> We've had lots of trouble with the cpu governors, and I suspect the
> problem isn't new, but the lockdep warning is likely new (see commit
> 846f99749ab68bbc7f75c74fec305de675b1a1bf: "sysfs: Add lockdep annotations
> for the sysfs active reference").
>
> So it is likely to be an old issue that (a) now gets warned about and (b)
> might have had timing changes enough to trigger it.
>

Right.

This is a real deadlock case found by lockdep added to s_active.

The problem is that we did kobject_put(&data->kobj) while holding policy_rwsem
which is used to protect 'data'. It is not so easy to fix this,
probably we need to
do more work on cpufreq code.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Américo Wang on
On Fri, Feb 19, 2010 at 2:45 AM, Johannes Berg
<johannes(a)sipsolutions.net> wrote:
> On Thu, 2010-02-18 at 08:31 -0800, Linus Torvalds wrote:
>
>> > > halt/4071 is trying to acquire lock:
>> > >  (s_active){++++.+}, at: [<c0000000001ef868>]
>> > > .sysfs_addrm_finish+0x58/0xc0
>> > >
>> > > but task is already holding lock:
>> > >  (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at:
>> [<c0000000004cd6ac>]
>> > > .lock_policy_rwsem_write+0x84/0xf4
>> > >
>> > > which lock already depends on the new lock.
>>
>> You don't have a full backtrace for these things?
>
> No, it deadlocks right there, unfortunately.
>
>> We've had lots of trouble with the cpu governors, and I suspect the
>> problem isn't new, but the lockdep warning is likely new (see commit
>> 846f99749ab68bbc7f75c74fec305de675b1a1bf: "sysfs: Add lockdep
>> annotations
>> for the sysfs active reference").
>>
>> So it is likely to be an old issue that (a) now gets warned about and
>> (b) might have had timing changes enough to trigger it.
>
> Well, it used to not deadlock and actually shut down the machine :) So
> in that sense it's definitely new. It might have printed a lockdep
> warning before, which you wouldn't normally see since the machine turns
> off right after this.
>
>> I suspect it is G5-specific (or specific to whatever CPU frequency
>> code
>> that gets used there), since I think we'd have had lots of reports if
>> this
>> happened on x86.
>
> Yeah, that's puzzling me as well.
>

Does my following untested patch help?

Signed-off-by: WANG Cong <xiyou.wangcong(a)gmail.com>

---------