From: Michael Breuer on
On 1/7/2010 12:54 AM, Michael Breuer wrote:
> On 1/7/2010 12:32 AM, Michael Breuer wrote:
>> On 1/6/2010 11:53 PM, Stephen Hemminger wrote:
>>> On Wed, 06 Jan 2010 23:00:34 -0500
>>> Michael Breuer<mbreuer(a)majjas.com> wrote:
>>>
>>>> Changing MTU to 9000, everything basically breaks - Can't use X11 (local
>>>> or remote - get X11 screen after gdm login locally, but then goes back
>>>> to greeter; remote gets no greeter); ssh sessions hang; etc. This time I
>>>> was able to reset the MTU back to 1500 without a reboot - but I did have
>>>> to ifconfig eth0 down and then up. Looking at the sk98lin code, it looks
>>>> to me like they do a bit more work with existing buffers before
>>>> completing the MTU switch. Note that even doing this, X11 did not work
>>>> (it did with the old mtu change code). Tried changing to mtu 4500 - same
>>>> effect as 9000... but when I switched back to 1500, ksoftirqd started
>>>> spinning using 100% of one core.
>>> The problem is that patch was enabling scatter-gather and checksum offload
>>> that won't work on EC_U hardware with 9K MTU. At least, it never worked
>>> for me when I tested it. So because of that it really doesn't change
>>> anything for the better on that chip version.
>>>
>>> What version chip is on that motherboard? Mine is:
>>> Yukon-2 EC Ultra chip revision 3
>>> which corresponds to B0 step.
>>>
>>> Another possibility is the PHY register which controls number of ticks
>>> of buffering. The default is zero, which gives the most buffering (good),
>>> but the firmware could be reprogramming it (bad). In general, the driver
>>> doesn't fiddle with bits that are already set correctly, because sometimes
>>> vendors need to tweak PCI timing in firmware/BIOS. It seems the firmware
>>> on this chip is just a bunch of register setups done on power on.
>> Also - I'm seeing a huge number of dropped packets (RX)
>> 200-300/second. Probably why this is so slow.
>>
>> Current ifconfig:
>> eth0      Link encap:Ethernet  HWaddr 00:26:18:00:1C:3B
>>           inet addr:10.0.0.1  Bcast:10.0.0.255  Mask:255.255.255.0
>>           inet6 addr: fe80::226:18ff:fe00:1c3b/64 Scope:Link
>>           UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
>>           RX packets:26647536 errors:0 dropped:517884 overruns:0 frame:0
>>           TX packets:12112780 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:38960063319 (36.2 GiB)  TX bytes:1889879762 (1.7 GiB)
>>           Interrupt:18
>>
>>
>>
>
> Never mind... spoke too soon. Crashed again. Just took longer:
....
Reapplied a couple of earlier patches - still can't do jumbo frames, but
the rx errors are gone and speed has improved. Too early to assure that
it's stable.

Patches that seem to fix the rx drops (all from Stephen):
1) Patch change to tx_init
2) Patch to lock netif_device_detach
3) Patch to sky2_tx_complete to add netif_device_present test
Also in the mix: Jarek's alternative 2

With this set and mtu=1500 all seems good - decent if not stellar
throughput; no logged errors; no reported packet loss. As before, will
leave running and see if anything falls apart.
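
For comparison with later runs, the RX drop ratio implied by the ifconfig
counters quoted earlier can be worked out with a short script. This is only a
sketch: the helper name rx_drop_ratio is made up for illustration, and the
regular expression assumes the classic net-tools ifconfig line format seen in
that output.

```python
import re

# RX counter line copied from the ifconfig output quoted earlier in this mail.
RX_LINE = "RX packets:26647536 errors:0 dropped:517884 overruns:0 frame:0"


def rx_drop_ratio(line):
    """Return the dropped counter divided by the packets counter.

    Hypothetical helper; it only understands the old net-tools
    'RX packets:N ... dropped:M' layout.
    """
    match = re.search(r"RX packets:(\d+).*?dropped:(\d+)", line)
    if match is None:
        raise ValueError("not an ifconfig RX counter line")
    packets, dropped = (int(g) for g in match.groups())
    return dropped / packets


if __name__ == "__main__":
    # Print the ratio as a percentage of the received-packet counter.
    print("RX dropped: {:.2%} of RX packets".format(rx_drop_ratio(RX_LINE)))
```

Running the same helper on a fresh ifconfig line after the patches have been
in place for a while should show whether the dropped counter is still growing
relative to the packet counter.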
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jarek Poplawski on
On Thu, Jan 07, 2010 at 02:20:22AM -0500, Michael Breuer wrote:
> ...
> Reapplied a couple of earlier patches - still can't do jumbo frames, but
> the rx errors are gone and speed has improved. Too early to assure that
> it's stable.
>
> Patches that seem to fix the rx drops (all from Stephen):
> 1) Patch change to tx_init
> 2) Patch to lock netif_device_detach
> 3) Patch to sky2_tx_complete to add netif_device_present test
> Also in the mix: Jarek's alternative 2

BTW, the main difference between alt. 1 and 2 is error notification:
alternative 2 doesn't hide some (most) of the drops, so, depending on the
app, there might be more and faster retransmits. (I don't know which of the
apps you use (other than dhcp) could depend so much on this.)

>
> With this set and mtu=1500 all seems good - decent if not stellar
> throughput; no logged errors; no reported packet loss. As before, will
> leave running and see if anything falls apart.

Good news!

Jarek P.
From: Michael Breuer on
On 1/7/2010 2:47 AM, Jarek Poplawski wrote:
> On Thu, Jan 07, 2010 at 02:20:22AM -0500, Michael Breuer wrote:
>
>> ...
>> Reapplied a couple of earlier patches - still can't do jumbo frames, but
>> the rx errors are gone and speed has improved. Too early to assure that
>> it's stable.
>>
>> Patches that seem to fix the rx drops (all from Stephen):
>> 1) Patch change to tx_init
>> 2) Patch to lock netif_device_detach
>> 3) Patch to sky2_tx_complete to add netif_device_present test
>> Also in the mix: Jarek's alternative 2
>>
> BTW, the main difference between alt. 1 and 2 is error notification:
> alternative 2 doesn't hide some (most) of the drops, so, depending on the
> app, there might be more and faster retransmits. (I don't know which of the
> apps you use (other than dhcp) could depend so much on this.)
>
>
Unless I misread the code, I think that in some cases the skb is actually
freed if the cfq (among others perhaps) scheduler returns an error on
enqueue (flow control perhaps). Thus with alternative 1, it is possible
that the skb is acted upon after being freed - this would be consistent
with the DMAR errors I saw.

>> With this set and mtu=1500 all seems good - decent if not stellar
>> throughput; no logged errors; no reported packet loss. As before, will
>> leave running and see if anything falls apart.
>>
> Good news!
>
> Jarek P.
From: Jarek Poplawski on
On Thu, Jan 07, 2010 at 02:55:20AM -0500, Michael Breuer wrote:
> On 1/7/2010 2:47 AM, Jarek Poplawski wrote:
>> On Thu, Jan 07, 2010 at 02:20:22AM -0500, Michael Breuer wrote:
>>
>>> ...
>>> Reapplied a couple of earlier patches - still can't do jumbo frames, but
>>> the rx errors are gone and speed has improved. Too early to assure that
>>> it's stable.
>>>
>>> Patches that seem to fix the rx drops (all from Stephen):
>>> 1) Patch change to tx_init
>>> 2) Patch to lock netif_device_detach
>>> 3) Patch to sky2_tx_complete to add netif_device_present test
>>> Also in the mix: Jarek's alternative 2
>>>
>> BTW, the main difference between alt. 1 and 2 is error notification:
>> alternative 2 doesn't hide some (most) of the drops, so, depending on the
>> app, there might be more and faster retransmits. (I don't know which of the
>> apps you use (other than dhcp) could depend so much on this.)
>>
>>
> Unless I misread the code, I think that in some cases the skb is actually
> freed if the cfq (among others perhaps) scheduler returns an error on
> enqueue (flow control perhaps). Thus with alternative 1, it is possible
> that the skb is acted upon after being freed - this would be consistent
> with the DMAR errors I saw.

I can't see your point: could you give some scenario?

Jarek P.
From: Michael Breuer on
On 1/7/2010 3:21 AM, Jarek Poplawski wrote:
> On Thu, Jan 07, 2010 at 02:55:20AM -0500, Michael Breuer wrote:
>
>> On 1/7/2010 2:47 AM, Jarek Poplawski wrote:
>>
>>> On Thu, Jan 07, 2010 at 02:20:22AM -0500, Michael Breuer wrote:
>>>
>>>
>>>> ...
>>>> Reapplied a couple of earlier patches - still can't do jumbo frames, but
>>>> the rx errors are gone and speed has improved. Too early to assure that
>>>> it's stable.
>>>>
>>>> Patches that seem to fix the rx drops (all from Stephen):
>>>> 1) Patch change to tx_init
>>>> 2) Patch to lock netif_device_detach
>>>> 3) Patch to sky2_tx_complete to add netif_device_present test
>>>> Also in the mix: Jarek's alternative 2
>>>>
>>>>
>>> BTW, the main difference between alt. 1 and 2 is error notification:
>>> alternative 2 doesn't hide some (most) of the drops, so, depending on the
>>> app, there might be more and faster retransmits. (I don't know which of the
>>> apps you use (other than dhcp) could depend so much on this.)
>>>
>>>
>>>
>> Unless I misread the code, I think that in some cases the skb is actually
>> freed if the cfq (among others perhaps) scheduler returns an error on
>> enqueue (flow control perhaps). Thus with alternative 1, it is possible
>> that the skb is acted upon after being freed - this would be consistent
>> with the DMAR errors I saw.
>>
> I can't see your point: could you give some scenario?
>
> Jarek P.
>
With NET_CLS_ACT set, net_dev_enqueue can return an error after freeing
the skb. Alternative 1 disregards the error and assumes the skb is still
valid. The original code and alternative 2 exit the loop assuming the
skb has been freed.
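
To make the scenario concrete, here is a toy model of the two behaviours. It
is a Python simulation, not kernel code: ToySkb, toy_enqueue and the two
requeue_* functions are hypothetical names invented for this sketch, and the
only thing taken from the description above is that enqueue may free the
buffer and then return an error.

```python
class ToySkb:
    """Stand-in for a packet buffer; 'freed' marks it as released."""

    def __init__(self, data):
        self.data = data
        self.freed = False

    def touch(self):
        # Any access after the buffer was released is the bug in question.
        if self.freed:
            raise RuntimeError("use after free")
        return self.data


def toy_enqueue(skb, queue, limit):
    """Queue skb; when the queue is full, free skb and return an error."""
    if len(queue) >= limit:
        skb.freed = True
        return -1
    queue.append(skb)
    return 0


def requeue_ignoring_errors(skbs, queue, limit):
    """Assumed shape of alternative 1: the return value is disregarded."""
    touched = 0
    for skb in skbs:
        toy_enqueue(skb, queue, limit)
        skb.touch()  # may act on a buffer that enqueue already freed
        touched += 1
    return touched


def requeue_stopping_on_error(skbs, queue, limit):
    """Assumed shape of the original code and alternative 2."""
    touched = 0
    for skb in skbs:
        if toy_enqueue(skb, queue, limit) != 0:
            break  # skb is gone, so leave the loop without touching it
        skb.touch()
        touched += 1
    return touched
```

With a queue limit of 2 and three buffers, requeue_stopping_on_error leaves
the loop at the third buffer, while requeue_ignoring_errors goes on to touch
a buffer that has already been freed. In the real driver that access would be
a stale pointer rather than a Python exception, which is the kind of thing
that could show up as the DMAR errors.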