From: Paul Gortmaker on
On Thu, Feb 25, 2010 at 12:49 PM, Anton Vorontsov
<avorontsov(a)ru.mvista.com> wrote:
> On Thu, Feb 25, 2010 at 07:51:41PM +0300, Anton Vorontsov wrote:
>> On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote:
>> [...]
>> > > nfs: server 192.168.0.1 not responding, still trying
>> > >
>> >
>> > Further testing has shown that this isn't restricted to warm reboots, it
>> > happens from cold as well. In addition, the exact timing of the failure
>> > seems to vary, some boots have got further before failing.
>>
>> Unfortunately I don't have any 8641 boards near me, so I can't
>> debug this myself. Though, I tested gianfar on MPC8568E-MDS with
>> 2.6.33 kernel, and it seems to work just fine.
>>
>> I see you use SMP. Can you try to turn it off? If that will fix
>> the issue, then it'll be a good data point.
>>
>> Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
>> know the results.
>
> Nope, no luck. Can't trigger the issue. :-/
> Tested with NFS boot, TCP and UDP netperf tests.

I was able to reproduce it on an 8641D and bisected it down to this:

-----------
commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
Author: Anton Vorontsov <avorontsov(a)ru.mvista.com>
Date: Tue Nov 10 14:11:10 2009 +0000

gianfar: Revive SKB recycling

Before calling gfar_clean_tx_ring() the driver grabs an irqsave
spinlock, and then tries to recycle skbs. But since
skb_recycle_check() returns 0 with IRQs disabled, we'll never
recycle any skbs.

It appears that gfar_clean_tx_ring() and gfar_start_xmit() are
mostly idependent and can work in parallel, except when they
modify num_txbdfree.

So we can drop the lock from most sections and thus fix the skb
recycling.
-----------

....which probably explains why you weren't seeing it on non-SMP.
I'd imagine it would show up on any of the e500mc boards too.

I'd done a rev-list on gianfar.[ch] from 32 to 33-rc1, and then
cherry-picked those onto a 32 baseline to reduce the scale of
the bisection, but I don't think that should impact the final
result I got in any meaningful way.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Anton Vorontsov on
On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
[...]
> I was able to reproduce it on an 8641D and bisected it down to this:
>
> -----------
> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
> Author: Anton Vorontsov <avorontsov(a)ru.mvista.com>
> Date: Tue Nov 10 14:11:10 2009 +0000
>
> gianfar: Revive SKB recycling

Thanks for the bisect. I have a guess why tx hangs in
SMP case. Could anyone try the patch down below?

[...]
> ...which probably explains why you weren't seeing it on non-SMP.
> I'd imagine it would show up on any of the e500mc boards too.

Yeah.. Pity, I don't have SMP boards anymore. I'll try
to get one though.


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8bd3c9f..3ff3bd0 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
tx_queue = priv->tx_queue[rx_queue->qindex];

tx_cleaned += gfar_clean_tx_ring(tx_queue);
+ if (!tx_cleaned && !tx_queue->num_txbdfree)
+ tx_cleaned += 1; /* don't complete napi */
rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
budget_per_queue);
rx_cleaned += rx_cleaned_per_queue;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Kumar Gopalpet-B05799 on


>-----Original Message-----
>From: Anton Vorontsov [mailto:avorontsov(a)ru.mvista.com]
>Sent: Friday, February 26, 2010 8:45 AM
>To: Paul Gortmaker
>Cc: Martyn Welch; linuxppc-dev list; netdev(a)vger.kernel.org;
>linux-kernel(a)vger.kernel.org; Kumar Gopalpet-B05799;
>davem(a)davemloft.net; Kumar Gala
>Subject: Re: Gianfar driver failing on MPC8641D based board
>
>On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
>[...]
>> I was able to reproduce it on an 8641D and bisected it down to this:
>>
>> -----------
>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>> Author: Anton Vorontsov <avorontsov(a)ru.mvista.com>
>> Date: Tue Nov 10 14:11:10 2009 +0000
>>
>> gianfar: Revive SKB recycling
>
>Thanks for the bisect. I have a guess why tx hangs in SMP
>case. Could anyone try the patch down below?
>
>[...]
>> ...which probably explains why you weren't seeing it on non-SMP.
>> I'd imagine it would show up on any of the e500mc boards too.
>
>Yeah.. Pity, I don't have SMP boards anymore. I'll try to get
>one though.
>
>
>diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
>index 8bd3c9f..3ff3bd0 100644
>--- a/drivers/net/gianfar.c
>+++ b/drivers/net/gianfar.c
>@@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct
>*napi, int budget)
> tx_queue = priv->tx_queue[rx_queue->qindex];
>
> tx_cleaned += gfar_clean_tx_ring(tx_queue);
>+ if (!tx_cleaned && !tx_queue->num_txbdfree)
>+ tx_cleaned += 1; /* don't
>complete napi */
> rx_cleaned_per_queue =
>gfar_clean_rx_ring(rx_queue,
>
>budget_per_queue);
> rx_cleaned += rx_cleaned_per_queue;
>

Anton,

There is also one more issue that I have been observing with the patch
"gianfar: Revive SKB recycling".
The issue is when I do a IPV4 forwarding test scenario with
bidirectional flows (SMP environment). I am using Spirent smart bits
(smartflow) for automation testing and I frequently observe smart flow
reporting "Rx packet counte greater than Tx packet count. Duplicate
packets might have been received".

To just get over the issue I have removed this patch and I didn't see
the issue.

To a certain extent I could get over the problem by using atomic_t for
num_txbdfree (atomic_add and atomic_dec instructions for updating the
num_txbdfree) and completely removing the spin_locks in the tx routines.

Also, I feel we might want to make some more changes to the
gfar_clean_tx_ring( ) and gfar_start_xmit() routines so that they can
operate parallely.

I am really sorry for not posting it a bit earlier as I am caught up
with some urgent issues.

--

Thanks
Sandeep
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Martyn Welch on
Anton Vorontsov wrote:
> On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote:
> [...]
>
>>> nfs: server 192.168.0.1 not responding, still trying
>>>
>>>
>> Further testing has shown that this isn't restricted to warm reboots, it
>> happens from cold as well. In addition, the exact timing of the failure
>> seems to vary, some boots have got further before failing.
>>
>
> Unfortunately I don't have any 8641 boards near me, so I can't
> debug this myself. Though, I tested gianfar on MPC8568E-MDS with
> 2.6.33 kernel, and it seems to work just fine.
>
> I see you use SMP. Can you try to turn it off? If that will fix
> the issue, then it'll be a good data point.
>
> Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
> know the results.
>
> Thanks

I removed the second core from the dts file rather than truly disabling
SMP in the kernel config. Doing this allowed the board to boot reliably.

Martyn

--
Martyn Welch (Principal Software Engineer) | Registered in England and
GE Intelligent Platforms | Wales (3828642) at 100
T +44(0)127322748 | Barbirolli Square, Manchester,
E martyn.welch(a)ge.com | M2 3AB VAT:GB 927559189

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Martyn Welch on
Anton Vorontsov wrote:
> On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
> [...]
>
>> I was able to reproduce it on an 8641D and bisected it down to this:
>>
>> -----------
>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>> Author: Anton Vorontsov <avorontsov(a)ru.mvista.com>
>> Date: Tue Nov 10 14:11:10 2009 +0000
>>
>> gianfar: Revive SKB recycling
>>
>
> Thanks for the bisect. I have a guess why tx hangs in
> SMP case. Could anyone try the patch down below?
>

Yup, no problem. I'm afraid it doesn't resolve the problem for me.

> [...]
>
>> ...which probably explains why you weren't seeing it on non-SMP.
>> I'd imagine it would show up on any of the e500mc boards too.
>>
>
> Yeah.. Pity, I don't have SMP boards anymore. I'll try
> to get one though.
>
>
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 8bd3c9f..3ff3bd0 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
> tx_queue = priv->tx_queue[rx_queue->qindex];
>
> tx_cleaned += gfar_clean_tx_ring(tx_queue);
> + if (!tx_cleaned && !tx_queue->num_txbdfree)
> + tx_cleaned += 1; /* don't complete napi */
> rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
> budget_per_queue);
> rx_cleaned += rx_cleaned_per_queue;
>


--
Martyn Welch (Principal Software Engineer) | Registered in England and
GE Intelligent Platforms | Wales (3828642) at 100
T +44(0)127322748 | Barbirolli Square, Manchester,
E martyn.welch(a)ge.com | M2 3AB VAT:GB 927559189

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/