From: David Miller on
From: Shreyas Bhatewara <sbhatewara(a)vmware.com>
Date: Tue, 13 Jul 2010 17:49:52 -0700 (PDT)

>
> Do not reset when the device is not opened
>
> If a reset is scheduled, and the device goes thru close and open, it
> may happen that reset and open may run in parallel.
> The reset code now bails out if the device is not opened.
>
> Signed-off-by: Ronghua Zang <ronghua(a)vmware.com>
> Signed-off-by: Matthieu Bucchianeri <matthieu(a)vmware.com>
> Signed-off-by: Shreyas Bhatewara <sbhatewara(a)vmware.com>

Testing IFF_UP is just making your race hole a little smaller but
it isn't fixing the problem.

In fact, there really isn't any legitimate reason for a driver
to test the IFF_UP state bit. netif_running() is the correct
test, and asynchronous work queue things like resets should
synchronize with open/close using the RTNL lock so that your
test of netif_running() is a stable one.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Shreyas Bhatewara on


On Wed, 14 Jul 2010, David Miller wrote:

> From: Shreyas Bhatewara <sbhatewara(a)vmware.com>
> Date: Tue, 13 Jul 2010 17:49:52 -0700 (PDT)
>
> >
> > Do not reset when the device is not opened
> >
> > If a reset is scheduled, and the device goes thru close and open, it
> > may happen that reset and open may run in parallel.
> > The reset code now bails out if the device is not opened.
> >
> > Signed-off-by: Ronghua Zang <ronghua(a)vmware.com>
> > Signed-off-by: Matthieu Bucchianeri <matthieu(a)vmware.com>
> > Signed-off-by: Shreyas Bhatewara <sbhatewara(a)vmware.com>
>
> Testing IFF_UP is just making your race hole a little smaller but
> it isn't fixing the problem.
>
> In fact, there really isn't any legitimate reason for a driver
> to test the IFF_UP state bit. netif_running() is the correct
> test, and asynchronous work queue things like resets should
> synchronize with open/close using the RTNL lock so that your
> test of netif_running() is a stable one.
>

Is this what you suggest :

---

Hold rtnl_lock to get the right link state.

While asynchronously resetting the device, hold rtnl_lock to get the
right value from netif_running. If a reset is scheduled, and the device
goes thru close and open, it may happen that reset and open may run in
parallel. Holding rtnl_lock will avoid this.

---

drivers/net/vmxnet3/vmxnet3_drv.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 1b0ce8c..c4d7e42 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2420,6 +2420,7 @@ vmxnet3_reset_work(struct work_struct *data)
return;

/* if the device is closed, we must leave it alone */
+ rtnl_lock();
if (netif_running(adapter->netdev)) {
printk(KERN_INFO "%s: resetting\n", adapter->netdev->name);
vmxnet3_quiesce_dev(adapter);
@@ -2428,6 +2429,7 @@ vmxnet3_reset_work(struct work_struct *data)
} else {
printk(KERN_INFO "%s: already closed\n", adapter->netdev->name);
}
+ rtnl_unlock();

clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Miller on
From: Shreyas Bhatewara <sbhatewara(a)vmware.com>
Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)

> Is this what you suggest :
>
> ---
>
> Hold rtnl_lock to get the right link state.

It ought to work, but make sure that it is legal to take the
RTNL semaphore in all contexts in which this code block
might be called.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Shreyas Bhatewara on


On Thu, 15 Jul 2010, David Miller wrote:

> From: Shreyas Bhatewara <sbhatewara(a)vmware.com>
> Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)
>
> > Is this what you suggest :
> >
> > ---
> >
> > Hold rtnl_lock to get the right link state.
>
> It ought to work, but make sure that it is legal to take the
> RTNL semaphore in all contexts in which this code block
> might be called.
>

This code block is called only from the workqueue handler, which runs in
process context, so it is legal to take rtnl semaphore.
Tested this code by simulating event interrupts (which schedule this
code) at considerable frequency while the interface was brought up and
down in a loop. Similar stress testing had revealed the bug originally.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Miller on
From: Shreyas Bhatewara <sbhatewara(a)vmware.com>
Date: Fri, 16 Jul 2010 01:17:29 -0700 (PDT)

>
>
> On Thu, 15 Jul 2010, David Miller wrote:
>
>> From: Shreyas Bhatewara <sbhatewara(a)vmware.com>
>> Date: Thu, 15 Jul 2010 18:20:52 -0700 (PDT)
>>
>> > Is this what you suggest :
>> >
>> > ---
>> >
>> > Hold rtnl_lock to get the right link state.
>>
>> It ought to work, but make sure that it is legal to take the
>> RTNL semaphore in all contexts in which this code block
>> might be called.
>>
>
> This code block is called only from the workqueue handler, which runs in
> process context, so it is legal to take rtnl semaphore.
> Tested this code by simulating event interrupts (which schedule this
> code) at considerable frequency while the interface was brought up and
> down in a loop. Similar stress testing had revealed the bug originally.

Awesome, please submit this formally. The copy you sent lacked a commit
message and signoff.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/