From: Tejun Heo on
Hello,

On 04/01/2010 01:28 PM, Cong Wang wrote:
>> Hmmm... can you please try to see whether this circular locking
>> warning involving wq->lockdep_map is reproducible w/ the bonding
>> locking fixed? I still can't see where wq -> cpu_add_remove_lock
>> dependency is created.
>>
>
> I thought this is obvious.
>
> Here it is:
>
> void destroy_workqueue(struct workqueue_struct *wq)
> {
>         const struct cpumask *cpu_map = wq_cpu_map(wq);
>         int cpu;
>
>         cpu_maps_update_begin();        <--- Hold cpu_add_remove_lock here
>         spin_lock(&workqueue_lock);
>         list_del(&wq->list);
>         spin_unlock(&workqueue_lock);
>
>         for_each_cpu(cpu, cpu_map)
>                 cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));        <--- See below
>         cpu_maps_update_done();         <--- Release cpu_add_remove_lock here
>
> ...
> static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
> {
>         /*
>          * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
>          * cpu_add_remove_lock protects cwq->thread.
>          */
>         if (cwq->thread == NULL)
>                 return;
>
>         lock_map_acquire(&cwq->wq->lockdep_map);        <--- Lockdep complains here.
>         lock_map_release(&cwq->wq->lockdep_map);
> ...

Yeap, the above is the cpu_add_remove_lock -> wq->lockdep_map dependency.
I can see that, but I'm failing to see where the dependency in the other
direction is created.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
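
[Editorial sketch] The point above, that acquiring and immediately releasing
wq->lockdep_map under cpu_add_remove_lock records only one direction of the
dependency, can be illustrated with a toy user-space tracker. This is not the
kernel's lockdep: struct tracker, toy_acquire(), toy_release() and
toy_destroy_workqueue() are hypothetical names invented for this sketch, and
the toy records an edge from every held lock, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HELD 8

/* Hypothetical lock classes for this sketch only. */
enum toy_class { TOY_CPU_ADD_REMOVE_LOCK, TOY_WQ_LOCKDEP_MAP, TOY_NR };

struct tracker {
    enum toy_class held[MAX_HELD];   /* stack of currently held classes */
    int depth;
    bool edge[TOY_NR][TOY_NR];       /* edge[a][b]: b acquired while a held */
};

/* Record "held -> c" for every class held right now, then push c. */
static void toy_acquire(struct tracker *t, enum toy_class c)
{
    assert(t->depth < MAX_HELD);
    for (int i = 0; i < t->depth; i++)
        t->edge[t->held[i]][c] = true;
    t->held[t->depth++] = c;
}

/* Pop c; releasing does not erase the edges recorded at acquire time. */
static void toy_release(struct tracker *t, enum toy_class c)
{
    assert(t->depth > 0 && t->held[t->depth - 1] == c);
    t->depth--;
}

/* Mirrors the locking order of the quoted destroy_workqueue() path. */
static void toy_destroy_workqueue(struct tracker *t)
{
    toy_acquire(t, TOY_CPU_ADD_REMOVE_LOCK);  /* cpu_maps_update_begin() */
    toy_acquire(t, TOY_WQ_LOCKDEP_MAP);       /* lock_map_acquire() */
    toy_release(t, TOY_WQ_LOCKDEP_MAP);       /* lock_map_release() */
    toy_release(t, TOY_CPU_ADD_REMOVE_LOCK);  /* cpu_maps_update_done() */
}
```

Calling toy_destroy_workqueue() on a zero-initialized tracker leaves only the
edge from TOY_CPU_ADD_REMOVE_LOCK to TOY_WQ_LOCKDEP_MAP set. The reverse edge
stays clear, so this path alone cannot close a cycle.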
From: Cong Wang on
Tejun Heo wrote:
> Yeap, the above is the cpu_add_remove_lock -> wq->lockdep_map dependency.
> I can see that, but I'm failing to see where the dependency in the other
> direction is created.
>

Hmm, it looks like I misunderstood lock_map_acquire()? From the changelog,
I thought it was added to complain when its caller holds a lock while invoking
it, so cpu_add_remove_lock would be no exception.

Thanks!
From: Cong Wang on
Cong Wang wrote:
> Tejun Heo wrote:
>> Yeap, the above is the cpu_add_remove_lock -> wq->lockdep_map dependency.
>> I can see that, but I'm failing to see where the dependency in the other
>> direction is created.
>>
>
> Hmm, it looks like I misunderstood lock_map_acquire()? From the changelog,
> I thought it was added to complain when its caller holds a lock while
> invoking it, so cpu_add_remove_lock would be no exception.
>

Oh, I see, wq->lockdep_map is acquired again in run_workqueue(), so I was wrong. :)
I think you and Oleg are right, the lockdep warning is relevant after all.

Sorry for the noise, ignore this patch please.

Thanks.
From: Tejun Heo on
Hello,

On 04/01/2010 03:05 PM, Cong Wang wrote:
>> Hmm, it looks like I misunderstood lock_map_acquire()? From the
>> changelog, I thought it was added to complain when its caller holds
>> a lock while invoking it, so cpu_add_remove_lock would be no
>> exception.

Oh, that just tells lockdep that the code is trying to grab a pseudo
lock. It's not really a lock, but to lockdep it looks like one, and
lockdep can use it to detect problem cases.

> Oh, I see, wq->lockdep_map is acquired again in run_workqueue(), so
> I was wrong. :) I think you and Oleg are right, the lockdep warning
> is relevant after all.

Yeah, I think the circular dependency you reported on wq->lockdep_map
is completed only by the dependency through rtnl_mutex. If you fix the
rtnl_mutex locking, it should go away too.

Thanks.

--
tejun
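
[Editorial sketch] To make the "completed only through rtnl_mutex" remark
concrete, here is a toy cycle check over lock-class dependencies. It is a
user-space illustration, not how lockdep is implemented: struct dep_graph,
record_dependency() and has_circular_dependency() are hypothetical names for
this sketch. The three edges it is meant to be fed are read off the thread:
cpu_add_remove_lock -> wq->lockdep_map (destroy_workqueue), wq->lockdep_map ->
rtnl_mutex (bond_mii_monitor running from worker_thread), and rtnl_mutex ->
cpu_add_remove_lock (rtnl_link_unregister reaching destroy_workqueue).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this sketch only. */
enum lock_class {
    CLS_CPU_ADD_REMOVE_LOCK,
    CLS_WQ_LOCKDEP_MAP,     /* the workqueue pseudo lock */
    CLS_RTNL_MUTEX,
    CLS_NR
};

struct dep_graph {
    bool edge[CLS_NR][CLS_NR];  /* edge[a][b]: b was taken while a was held */
};

static void dep_graph_init(struct dep_graph *g)
{
    memset(g, 0, sizeof(*g));
}

static void record_dependency(struct dep_graph *g,
                              enum lock_class held, enum lock_class taken)
{
    assert(held < CLS_NR && taken < CLS_NR);
    g->edge[held][taken] = true;
}

/* Depth-first search: is 'target' reachable from 'from' along edges? */
static bool reachable(const struct dep_graph *g, int from, int target,
                      bool *seen)
{
    if (from == target)
        return true;
    seen[from] = true;
    for (int next = 0; next < CLS_NR; next++)
        if (g->edge[from][next] && !seen[next] &&
            reachable(g, next, target, seen))
            return true;
    return false;
}

/* A cycle exists if some edge a -> b also has a path leading from b to a. */
static bool has_circular_dependency(const struct dep_graph *g)
{
    for (int a = 0; a < CLS_NR; a++)
        for (int b = 0; b < CLS_NR; b++) {
            bool seen[CLS_NR] = { false };
            if (g->edge[a][b] && reachable(g, b, a, seen))
                return true;
        }
    return false;
}
```

With only the first two edges recorded, has_circular_dependency() returns
false. Adding the rtnl_mutex -> cpu_add_remove_lock edge closes the loop, which
matches the observation that fixing the rtnl_mutex locking should make the
warning go away.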
From: Cong Wang on
Oleg Nesterov wrote:
> On 04/01, Cong Wang wrote:
>>> I must have missed something, but it seems to me this patch tries to
>>> suppress the valid warning.
>>>
>>> Could you please clarify?
>> Sure, below is the whole warning. Please teach me how this is valid.
>
> Oh, I can never understand the output from lockdep, it is much more
> clever than me ;)
>
> But at first glance,
>
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: -> #2 (rtnl_mutex){+.+.+.}:
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810a6bc1>] validate_chain+0x1019/0x1540
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810a7e75>] __lock_acquire+0xd8d/0xe55
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff810aa3a4>] lock_acquire+0x160/0x1af
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff815523f8>] mutex_lock_nested+0x64/0x4e9
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff8147af16>] rtnl_lock+0x1e/0x27
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffffa0836779>] bond_mii_monitor+0x39f/0x74b [bonding]
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff8108654f>] worker_thread+0x2da/0x46c
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff8108b1ea>] kthread+0xdd/0xec
>> Mar 31 16:15:02 dhcp-66-70-5 kernel: [<ffffffff81004894>] kernel_thread_helper+0x4/0x10
>
> OK, so work->func() takes rtnl_mutex.
>
> This means it is not safe to do flush_workqueue() or destroy_workqueue()
> under rtnl_lock(). This is a known fact.
>
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: -> #0 ((bond_dev->name)){+.+...}:
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff810a6696>] validate_chain+0xaee/0x1540
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff810a7e75>] __lock_acquire+0xd8d/0xe55
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff810aa3a4>] lock_acquire+0x160/0x1af
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff81085278>] cleanup_workqueue_thread+0x59/0x10b
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff81085428>] destroy_workqueue+0x9c/0x107
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffffa0839d32>] bond_uninit+0x524/0x58a [bonding]
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff8146967b>] rollback_registered_many+0x205/0x2e3
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff81469783>] unregister_netdevice_many+0x2a/0x75
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff8147ada3>] __rtnl_kill_links+0x8b/0x9d
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff8147adea>] __rtnl_link_unregister+0x35/0x72
>> Mar 31 16:15:03 dhcp-66-70-5 kernel: [<ffffffff8147b293>] rtnl_link_unregister+0x2c/0x43
>
> However, rtnl_link_unregister() takes rtnl_mutex and then bond_uninit()
> does cleanup_workqueue_thread().
>
> So, looks like this warning is valid, this path can deadlock if
> destroy_workqueue() is called when bond->mii_work is queued.


Yeah, this is right.

>
>
> Lockdep decided to blame cpu_add_remove_lock in this chain.
>

Yes, this is what makes me confused. ;)

Thanks!
