From: Jarek Poplawski on
On Wed, Jan 06, 2010 at 03:33:05PM -0500, Michael Breuer wrote:
> On 1/6/2010 3:22 PM, Jarek Poplawski wrote:
> >On Wed, Jan 06, 2010 at 02:49:38PM -0500, Michael Breuer wrote:
> >>On 1/6/2010 2:22 AM, Jarek Poplawski wrote:
> >>>On Tue, Jan 05, 2010 at 09:36:28PM -0500, Michael Breuer wrote:
> >>>>On 1/5/2010 6:07 PM, Jarek Poplawski wrote:
> >>>>>----------------->
> >>>>>
> >>>>>Changing an skb after dev_queue_xmit() is illegal. And since it's
> >>>>>inconsistent to treat specially net_xmit_errno() non-zero return,
> >>>>>while ignoring other dev_queue_xmit() errors, there is no reason
> >>>>>to break the loop in tpacket_snd() in this case.
> >>>>>
> >>>>>With debugging by: Stephen Hemminger <shemminger(a)linux-foundation.org>
> >>>>>
> >>>>>Reported-by: Michael Breuer <mbreuer(a)majjas.com>
> >>>>>Signed-off-by: Jarek Poplawski <jarkao2(a)gmail.com>
> >>>>>---
> >>>>>
> >>>>> net/packet/af_packet.c | 8 +++-----
> >>>>> 1 files changed, 3 insertions(+), 5 deletions(-)
> >>>>>
> >>>>>diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> >>>>>index e0516a2..984a1fa 100644
> >>>>>--- a/net/packet/af_packet.c
> >>>>>+++ b/net/packet/af_packet.c
> >>>>>@@ -1021,8 +1021,9 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> >>>>>
> >>>>> status = TP_STATUS_SEND_REQUEST;
> >>>>> err = dev_queue_xmit(skb);
> >>>>>- if (unlikely(err > 0 && (err = net_xmit_errno(err)) != 0))
> >>>>>- goto out_xmit;
> >>>>>+ if (unlikely(err > 0))
> >>>>>+ err = net_xmit_errno(err);
> >>>>>+
> >>>>> packet_increment_head(&po->tx_ring);
> >>>>> len_sum += tp_len;
> >>>>> } while (likely((ph != NULL) ||
> >>>>>@@ -1033,9 +1034,6 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> >>>>> err = len_sum;
> >>>>> goto out_put;
> >>>>>
> >>>>>-out_xmit:
> >>>>>- skb->destructor = sock_wfree;
> >>>>>- atomic_dec(&po->tx_ring.pending);
> >>>>> out_status:
> >>>>> __packet_set_status(po, ph, status);
> >>>>> kfree_skb(skb);
> >>>>>--
....
> >>This patch at first behaved similarly to the previous one - seemed
> >>to be running a bit better... until the adapter went down :(
> >I'm not sure: do you mean this patch above vs previous one by Stephen,
> >or did you manage to try my "alternative #2" patch already?
> >
> >BTW, I forgot to mention, and maybe it doesn't matter here, but it
> >would be better to (always) use my sky2 patch from Berck Nash's
> >thread.
> >
> >Jarek P.
> This was using "alternative #2" patch. I didn't get the hang with
> alternative #1. Your sky2 patch from Berck Nash's thread was
> included in both cases; Stephen's was not.

OK, so I guess "alternative #1" (above) seems safer to recommend for
now (as I assumed earlier).

On the other hand, we really don't know whether that's only because
it's nicer for your hardware (or whether some other bug is still around),
so as before: let David choose ;-)

BTW, I think you could still use Stephen's patch too (there might
still be something more like this). The network manager was also
mentioned again. I might be wrong, but IMHO there could be some
interaction even if it doesn't use this device; so could/did you try
to disable it entirely?

Thanks for testing!
Jarek P.
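
For readers following the "alternative #1" patch, here is a minimal userspace sketch of the loop behaviour it describes: the buffer is handed to the transmit call, is never touched again, and a positive return code is translated instead of breaking the loop. Everything here is a stand-in, not kernel code: `queue_xmit()`, `send_all()`, `struct frame` and the `XMIT_*` values are hypothetical names modeled on `dev_queue_xmit()`, `tpacket_snd()` and the `NET_XMIT_*` codes, and the translation in `xmit_errno()` is an assumption about what `net_xmit_errno()` does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-ins modeled on the kernel's NET_XMIT_* codes (values assumed). */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1, XMIT_CN = 2 };

/* Assumed behaviour of net_xmit_errno(): congestion is not an error. */
static int xmit_errno(int code)
{
	return code != XMIT_CN ? -ENOBUFS : 0;
}

struct frame { int len; };

/* Hypothetical transmit stub: it consumes (frees) the frame whatever
 * it returns, which is the ownership rule the patch relies on. */
static int queue_xmit(struct frame *f, int code)
{
	free(f);
	return code;
}

/* Sends one frame per entry in codes[]. Returns the number of bytes
 * handed to the stub and stores the last translated error in *last_err. */
static int send_all(const int *codes, int n, int frame_len, int *last_err)
{
	int len_sum = 0;
	int err = 0;

	assert(codes != NULL && last_err != NULL && n >= 0);

	for (int i = 0; i < n; i++) {
		struct frame *f = malloc(sizeof(*f));

		if (f == NULL) {
			err = -ENOMEM;
			break;
		}
		f->len = frame_len;

		err = queue_xmit(f, codes[i]);
		/* f is gone here: do not read, modify or free it. */
		if (err > 0)
			err = xmit_errno(err);

		/* Keep going regardless of err, as the patched loop does. */
		len_sum += frame_len;
	}

	*last_err = err;
	return len_sum;
}
```

With this shape there is no error label that could touch the frame after the transmit call, which is the point of dropping `out_xmit` in the patch.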



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael Breuer on
On 1/6/2010 4:10 PM, Stephen Hemminger wrote:
> On Wed, 06 Jan 2010 14:49:38 -0500
> Michael Breuer <mbreuer(a)majjas.com> wrote:
>
>
>> This patch at first behaved similarly to the previous one - seemed to be
>> running a bit better... until the adapter went down :(
>>
>> This is the syslog output at the time the network failed:
>> Jan 6 14:11:01 mail kernel: sky2 0000:06:00.0: error interrupt
>> status=0x40000008
>> Jan 6 14:11:01 mail kernel: sky2 software interrupt status 0x40000008
>>
> Could you go back to the baseline sky2 driver? The display code might be buggy.
> These bits indicate an error in the MAC. The interrupt source enabled
> is Transmit FIFO underrun.
>
> I'm looking at how the vendor driver handles this.
> It looks like the Yukon EC_U chip doesn't really do jumbo frames correctly.
> Maybe there is not enough internal buffering to ensure that the whole packet
> is in the chip. Of course, none of this is in the chip manual.
>
> Does this help?
> --------------
> --- a/drivers/net/sky2.c 2010-01-06 12:48:43.012318966 -0800
> +++ b/drivers/net/sky2.c 2010-01-06 13:05:31.273987255 -0800
> @@ -792,33 +792,21 @@ static void sky2_set_tx_stfwd(struct sky
> {
> struct net_device *dev = hw->dev[port];
>
> - if ( (hw->chip_id == CHIP_ID_YUKON_EX &&
> - hw->chip_rev != CHIP_REV_YU_EX_A0) ||
> - hw->chip_id >= CHIP_ID_YUKON_FE_P) {
> - /* Yukon-Extreme B0 and further Extreme devices */
> - /* enable Store & Forward mode for TX */
> -
> - if (dev->mtu <= ETH_DATA_LEN)
> - sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
> - TX_JUMBO_DIS | TX_STFW_ENA);
> -
> - else
> - sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
> - TX_JUMBO_ENA | TX_STFW_ENA);
> - } else {
> - if (dev->mtu <= ETH_DATA_LEN)
> - sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_ENA);
> - else {
> - /* set Tx GMAC FIFO Almost Empty Threshold */
> - sky2_write32(hw, SK_REG(port, TX_GMF_AE_THR),
> - (ECU_JUMBO_WM << 16) | ECU_AE_THR);
> -
> - sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_DIS);
> -
> - /* Can't do offload because of lack of store/forward */
> - dev->features &= ~(NETIF_F_TSO | NETIF_F_SG | NETIF_F_ALL_CSUM);
> - }
> - }
> + if ( (hw->chip_id == CHIP_ID_YUKON_EX && hw->chip_rev != CHIP_REV_YU_EX_A0) ||
> + hw->chip_id >= CHIP_ID_YUKON_FE_P) {
> + /* Yukon-Extreme B0 and further Extreme devices */
> + /* enable Store & Forward mode for TX */
> + sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_ENA);
> + } else if (dev->mtu > ETH_DATA_LEN) {
> + /* set Tx GMAC FIFO Almost Empty Threshold */
> + sky2_write32(hw, SK_REG(port, TX_GMF_AE_THR),
> + (ECU_JUMBO_WM << 16) | ECU_AE_THR);
> + /* disable Store & Forward mode for TX */
> + sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_DIS);
> + } else {
> + /* enable Store & Forward mode for TX */
> + sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_ENA);
> + }
> }
>
> static void sky2_mac_init(struct sky2_hw *hw, unsigned port)
> @@ -2185,11 +2173,16 @@ static int sky2_change_mtu(struct net_de
> if (new_mtu < ETH_ZLEN || new_mtu > ETH_JUMBO_MTU)
> return -EINVAL;
>
> + /* MTU > 1500 on yukon FE and FE+ not allowed */
> if (new_mtu > ETH_DATA_LEN &&
> (hw->chip_id == CHIP_ID_YUKON_FE ||
> hw->chip_id == CHIP_ID_YUKON_FE_P))
> return -EINVAL;
>
> + /* TSO on Yukon Ultra and MTU > 1500 not supported */
> + if (new_mtu > ETH_DATA_LEN && hw->chip_id == CHIP_ID_YUKON_EC_U)
> + dev->features &= ~NETIF_F_TSO;
> +
> if (!netif_running(dev)) {
> dev->mtu = new_mtu;
> return 0;
> @@ -2233,6 +2226,15 @@ static int sky2_change_mtu(struct net_de
> if (err)
> dev_close(dev);
> else {
> + /* WA for dev. #4.209 */
> + if (hw->chip_id == CHIP_ID_YUKON_EC_U &&
> + hw->chip_rev == CHIP_REV_YU_EC_U_A1) {
> + /* enable/disable Store & Forward mode for TX */
> + sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
> + sky2->speed != SPEED_1000
> + ? TX_STFW_ENA : TX_STFW_DIS);
> + }
> +
> gma_write16(hw, port, GM_GP_CTRL, ctl);
>
> netif_wake_queue(dev);
> --- a/drivers/net/sky2.h 2010-01-06 12:48:48.632247424 -0800
> +++ b/drivers/net/sky2.h 2010-01-06 12:59:57.322078964 -0800
> @@ -1901,8 +1901,8 @@ enum {
> TX_VLAN_TAG_ON = 1<<25,/* enable VLAN tagging */
> TX_VLAN_TAG_OFF = 1<<24,/* disable VLAN tagging */
>
> - TX_JUMBO_ENA = 1<<23,/* PCI Jumbo Mode enable (Yukon-EC Ultra) */
> - TX_JUMBO_DIS = 1<<22,/* PCI Jumbo Mode enable (Yukon-EC Ultra) */
> + TX_PCI_JUM_ENA = 1<<23,/* Enable PCI Jumbo Mode (Yukon-EC Ultra) */
> + TX_PCI_JUM_DIS = 1<<22,/* Disable PCI Jumbo Mode (Yukon-EC Ultra) */
>
> GMF_WSP_TST_ON = 1<<18,/* Write Shadow Pointer Test On */
> GMF_WSP_TST_OFF = 1<<17,/* Write Shadow Pointer Test Off */
>
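
The decision made by the reworked `sky2_set_tx_stfwd()` in the quoted patch can be read as a small pure function, sketched below for clarity. This is only a model: `choose_tx_stfwd()` and the enum names are hypothetical, and the numeric chip IDs are placeholders chosen so that their relative order matches what the `>=` comparison in the patch appears to rely on (the real values are in sky2.h and are not quoted in this thread).

```c
#include <assert.h>

/* Placeholder chip IDs: only their relative order is meant to matter. */
enum chip_id {
	CHIP_EC_U = 1,
	CHIP_EX   = 2,
	CHIP_EC   = 3,
	CHIP_FE   = 4,
	CHIP_FE_P = 5
};

/* Placeholder revision numbers for the Extreme chip. */
enum { REV_EX_A0 = 1, REV_EX_B0 = 2 };

enum stfwd { STFWD_DISABLE = 0, STFWD_ENABLE = 1 };

#define DATA_LEN 1500 /* standard Ethernet payload size (ETH_DATA_LEN) */

/* Mirrors the three branches of the patched function:
 *  1. Extreme (except rev A0) and anything from FE+ upward: enable.
 *  2. Older chips with a jumbo MTU: disable (the patch also programs
 *     the almost-empty threshold in this branch).
 *  3. Older chips with a standard MTU: enable. */
static enum stfwd choose_tx_stfwd(int id, int rev, int mtu)
{
	assert(mtu > 0);

	if ((id == CHIP_EX && rev != REV_EX_A0) || id >= CHIP_FE_P)
		return STFWD_ENABLE;
	if (mtu > DATA_LEN)
		return STFWD_DISABLE;
	return STFWD_ENABLE;
}
```

For an EC Ultra chip, the model gives store-and-forward enabled at MTU 1500 and disabled at MTU 9000, which matches the second and third branches of the patch.
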
I'll try this a bit later today. However, early on, I saw the same
issues with MTU=1500. Also, maybe I'm missing something, but I can only
recreate the issue with a high receive rate. Given the interaction with
DHCP, for example, I'm thinking that there is some precondition that is
as yet unknown. It may be buggy hardware, or perhaps a race condition
resulting in a corrupt I/O buffer somewhere. I'm wondering whether
there's some useful place to insert diagnostics on the RX side, so that
at least we can see whether any consistent events on the RX side
precede the TX error.
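
One possible shape for such RX-side diagnostics is a small ring of recent receive events that is dumped when the TX error fires. The sketch below is plain userspace C and purely illustrative: `struct rx_trace`, `rx_trace_record()` and `rx_trace_snapshot()` are hypothetical names, not part of sky2, and a real driver version would also have to deal with locking and interrupt context.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RX_TRACE_SLOTS 8 /* power of two, so the index can be masked */

struct rx_event {
	uint32_t seq;    /* running event number */
	uint16_t len;    /* received frame length */
	uint16_t status; /* whatever status word is of interest */
};

struct rx_trace {
	struct rx_event ev[RX_TRACE_SLOTS];
	uint32_t next; /* total number of events recorded so far */
};

static void rx_trace_init(struct rx_trace *t)
{
	memset(t, 0, sizeof(*t));
}

/* Records one event, overwriting the oldest slot once the ring is full. */
static void rx_trace_record(struct rx_trace *t, uint16_t len, uint16_t status)
{
	struct rx_event *e = &t->ev[t->next & (RX_TRACE_SLOTS - 1)];

	e->seq = t->next++;
	e->len = len;
	e->status = status;
}

/* Copies up to max of the most recent events into out[], oldest first,
 * and returns how many were copied. */
static int rx_trace_snapshot(const struct rx_trace *t,
			     struct rx_event *out, int max)
{
	uint32_t have, n, first;

	assert(max >= 0);
	have = t->next < RX_TRACE_SLOTS ? t->next : RX_TRACE_SLOTS;
	n = have < (uint32_t)max ? have : (uint32_t)max;
	first = t->next - n;

	for (uint32_t i = 0; i < n; i++)
		out[i] = t->ev[(first + i) & (RX_TRACE_SLOTS - 1)];
	return (int)n;
}
```

The idea would be to call the record function from the receive path and take a snapshot from the error interrupt handler, so the last few RX events before each TX failure can be compared across runs.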
From: Michael Breuer on
On 1/6/2010 4:09 PM, Jarek Poplawski wrote:
> On Wed, Jan 06, 2010 at 03:33:05PM -0500, Michael Breuer wrote:
>
>> On 1/6/2010 3:22 PM, Jarek Poplawski wrote:
>>
>>> On Wed, Jan 06, 2010 at 02:49:38PM -0500, Michael Breuer wrote:
>>>
>>>> On 1/6/2010 2:22 AM, Jarek Poplawski wrote:
>>>>
>>>>> On Tue, Jan 05, 2010 at 09:36:28PM -0500, Michael Breuer wrote:
>>>>>
>>>>>> On 1/5/2010 6:07 PM, Jarek Poplawski wrote:
>>>>>>
> ...
>
>>>> This patch at first behaved similarly to the previous one - seemed
>>>> to be running a bit better... until the adapter went down :(
>>>>
>>> I'm not sure: do you mean this patch above vs previous one by Stephen,
>>> or did you manage to try my "alternative #2" patch already?
>>>
>>> BTW, I forgot to mention, and maybe it doesn't matter here, but it
>>> would be better to (always) use my sky2 patch from Berck Nash's
>>> thread.
>>>
>>> Jarek P.
>>>
>> This was using "alternative #2" patch. I didn't get the hang with
>> alternative #1. Your sky2 patch from Berck Nash's thread was
>> included in both cases; Stephen's was not.
>>
> OK, so I guess "alternative #1" (above) seems safer to recommend for
> now (as I assumed earlier).
>
> On the other hand, we really don't know whether that's only because
> it's nicer for your hardware (or whether some other bug is still around),
> so as before: let David choose ;-)
>
> BTW, I think you could still use Stephen's patch too (there might
> still be something more like this). The network manager was also
> mentioned again. I might be wrong, but IMHO there could be some
> interaction even if it doesn't use this device; so could/did you try
> to disable it entirely?
>
> Thanks for testing!
> Jarek P.
>
>
>
Just reran without the network manager - no change. Going to rerun with
Stephen's new patch, alternative #1, and the patch from Berck Nash's thread.
From: Michael Breuer on
On 1/6/2010 4:10 PM, Stephen Hemminger wrote:
> On Wed, 06 Jan 2010 14:49:38 -0500
> Michael Breuer <mbreuer(a)majjas.com> wrote:
>
>
>> This patch at first behaved similarly to the previous one - seemed to be
>> running a bit better... until the adapter went down :(
>>
>> This is the syslog output at the time the network failed:
>> Jan 6 14:11:01 mail kernel: sky2 0000:06:00.0: error interrupt
>> status=0x40000008
>> Jan 6 14:11:01 mail kernel: sky2 software interrupt status 0x40000008
>>
> Could you go back to the baseline sky2 driver? The display code might be buggy.
> These bits indicate an error in the MAC. The interrupt source enabled
> is Transmit FIFO underrun.
>
> I'm looking at how the vendor driver handles this.
> It looks like the Yukon EC_U chip doesn't really do jumbo frames correctly.
> Maybe there is not enough internal buffering to ensure that the whole packet
> is in the chip. Of course, none of this is in the chip manual.
>
> Does this help?
> --------------
Ok ... results - and maybe some more clues...

Running with this patch, Jarek's "alternative 1", and the patch from the
other thread. Not so good.

No reported errors (sky2, etc.) - however, with mtu=9000, lots of stuff
broke: XDMCP, http via MASQ/netfilter, ssh connections intermittently
(perhaps when large frames are involved), etc. Tried to change mtu to
1500 on the fly, got a bunch of errors - and the network watchdog kicked
in. Have now rebooted with the same patches and mtu=1500.

With mtu=1500, everything is working again (i.e., XDMCP, netfilter,
etc.). The load test with mtu=1500 went well for a while - high
throughput sustained for a few minutes - then a similar crash as before,
but no interrupt error messages this time until after the oops:
<nothing of note before this>
Jan 6 18:17:54 mail kernel: DRHD: handling fault status reg 2
Jan 6 18:17:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
fault addr 1bbfe000
Jan 6 18:17:54 mail kernel: DMAR:[fault reason 06] PTE Read access is
not set
Jan 6 18:17:54 mail kernel: sky2 0000:06:00.0: error interrupt
status=0x80000000
Jan 6 18:17:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 6 18:18:04 mail kernel: ------------[ cut here ]------------
Jan 6 18:18:04 mail kernel: WARNING: at net/sched/sch_generic.c:261
dev_watchdog+0xf3/0x164()
Jan 6 18:18:04 mail kernel: Hardware name: System Product Name
Jan 6 18:18:04 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
queue 0 timed out
Jan 6 18:18:04 mail kernel: Modules linked in: ip6table_filter
ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat nf_nat
iptable_mangle iptable_raw bridge stp appletalk psnap llc nfsd lockd
nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc acpi_cpufreq sit
tunnel4 ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp xt_DSCP xt_dscp
xt_MARK nf_conntrack_ipv6 xt_multiport ipv6 dm_multipath kvm_intel kvm
snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec
snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device
gspca_spca505 gspca_main videodev v4l1_compat snd_pcm
v4l2_compat_ioctl32 pcspkr asus_atk0110 hwmon i2c_i801 iTCO_wdt
firewire_ohci iTCO_vendor_support firewire_core crc_itu_t snd_timer snd
sky2 soundcore wmi snd_page_alloc fbcon tileblit font bitblit softcursor
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
async_tx raid1 ata_generic pata_acpi pata_marvell nouveau ttm
drm_kms_helper drm agpgart fb i2c_algo_bit cfbcopyarea i2c_core
cfbimgblt cfbfil
Jan 6 18:18:04 mail kernel: lrect [last unloaded: microcode]
Jan 6 18:18:04 mail kernel: Pid: 0, comm: swapper Tainted: G W
2.6.32-00840-gec8257c-dirty #41
Jan 6 18:18:04 mail kernel: Call Trace:
Jan 6 18:18:04 mail kernel: <IRQ> [<ffffffff8105365a>]
warn_slowpath_common+0x7c/0x94
Jan 6 18:18:04 mail kernel: [<ffffffff810536c9>]
warn_slowpath_fmt+0x41/0x43
Jan 6 18:18:04 mail kernel: [<ffffffff813e12bf>] ? netif_tx_lock+0x44/0x6c
Jan 6 18:18:04 mail kernel: [<ffffffff813e1427>] dev_watchdog+0xf3/0x164
Jan 6 18:18:04 mail kernel: [<ffffffff81077696>] ?
sched_clock_cpu+0x47/0xd1
Jan 6 18:18:04 mail kernel: [<ffffffff8106316b>]
run_timer_softirq+0x1c8/0x270
Jan 6 18:18:04 mail kernel: [<ffffffff8105ae3b>] __do_softirq+0xf8/0x1cd
Jan 6 18:18:04 mail kernel: [<ffffffff8107ef33>] ?
tick_program_event+0x2a/0x2c
Jan 6 18:18:04 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 6 18:18:04 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 6 18:18:04 mail kernel: [<ffffffff8105aa1b>] irq_exit+0x4a/0x8c
Jan 6 18:18:04 mail kernel: [<ffffffff8146dd32>]
smp_apic_timer_interrupt+0x86/0x94
Jan 6 18:18:04 mail kernel: [<ffffffff810127e3>]
apic_timer_interrupt+0x13/0x20
Jan 6 18:18:04 mail kernel: <EOI> [<ffffffff812c4a06>] ?
acpi_idle_enter_c1+0xb2/0xd0
Jan 6 18:18:04 mail kernel: [<ffffffff812c49ff>] ?
acpi_idle_enter_c1+0xab/0xd0
Jan 6 18:18:04 mail kernel: [<ffffffff813a43b8>] ?
cpuidle_idle_call+0x9e/0xfa
Jan 6 18:18:04 mail kernel: [<ffffffff81010c90>] ? cpu_idle+0xb4/0xf6
Jan 6 18:18:04 mail kernel: [<ffffffff81463312>] ?
start_secondary+0x201/0x242
Jan 6 18:18:04 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 6 18:18:04 mail kernel: sky2 eth0: tx timeout
Jan 6 18:18:04 mail kernel: sky2 eth0: transmit ring 21 .. 108
report=21 done=21
Jan 6 18:18:04 mail kernel: sky2 eth0: disabling interface
Jan 6 18:18:04 mail kernel: sky2 eth0: enabling interface
<eth0 dead after this>
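
When comparing the status words from the two failures (0x40000008 in the earlier report, 0x80000000 and the 0x2010 PCI value here), it can help to split them into bit positions before looking them up in sky2.h. The helper below is a generic sketch with hypothetical names (`set_bits()`, `print_status()`); it deliberately does not name the bits, since the bit definitions are not quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the positions of the set bits of word into bits[], highest
 * first, and returns how many bits were set. */
static int set_bits(uint32_t word, int bits[32])
{
	int n = 0;

	for (int b = 31; b >= 0; b--)
		if (word & (UINT32_C(1) << b))
			bits[n++] = b;
	assert(n <= 32);
	return n;
}

/* Prints a status word followed by the positions of its set bits. */
static void print_status(uint32_t word)
{
	int bits[32];
	int n = set_bits(word, bits);

	printf("0x%08x:", (unsigned)word);
	for (int i = 0; i < n; i++)
		printf(" bit%d", bits[i]);
	printf("\n");
}
```

For the values in this thread, 0x40000008 has bits 30 and 3 set, 0x80000000 has only bit 31 set, and 0x2010 has bits 13 and 4 set, so the two error interrupts share no status bit.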
From: Michael Breuer on
On 1/6/2010 6:26 PM, Michael Breuer wrote:
> On 1/6/2010 4:10 PM, Stephen Hemminger wrote:
>> On Wed, 06 Jan 2010 14:49:38 -0500
>> Michael Breuer <mbreuer(a)majjas.com> wrote:
>>
>>> This patch at first behaved similarly to the previous one - seemed
>>> to be
>>> running a bit better... until the adapter went down :(
>>>
>>> This is the syslog output at the time the network failed:
>>> Jan 6 14:11:01 mail kernel: sky2 0000:06:00.0: error interrupt
>>> status=0x40000008
>>> Jan 6 14:11:01 mail kernel: sky2 software interrupt status 0x40000008
>> Could you go back to the baseline sky2 driver? The display code might
>> be buggy.
>> These bits indicate an error in the MAC. The interrupt source enabled
>> is Transmit FIFO underrun.
>>
>> I'm looking at how the vendor driver handles this.
>> It looks like the Yukon EC_U chip doesn't really do jumbo frames
>> correctly.
>> Maybe there is not enough internal buffering to ensure that the whole
>> packet is in the chip. Of course, none of this is in the chip manual.
>>
>> Does this help?
>> --------------
> Ok ... results - and maybe some more clues...
>
> Running with this patch, Jarek's "alternative 1", and the patch from
> the other thread. Not so good.
>
> No reported errors (sky2, etc.) - however, with mtu=9000, lots of stuff
> broke: XDMCP, http via MASQ/netfilter, ssh connections intermittently
> (perhaps when large frames are involved), etc. Tried to change mtu to
> 1500 on the fly, got a bunch of errors - and the network watchdog
> kicked in. Have now rebooted with the same patches and mtu=1500.
>
> With mtu=1500, everything is working again (i.e., XDMCP, netfilter,
> etc.). The load test with mtu=1500 went well for a while - high
> throughput sustained for a few minutes - then a similar crash as
> before, but no interrupt error messages this time until after the oops:
> <nothing of note before this>
> <eth0 dead after this>
Walked through the code based on Jarek's patches and came upon
NET_CLS_ACT. At least in some cases (sch_cbq.c, for example), the net
transmit error could be returned from there after the skb has been
released. A quick scan of the various files in net/sched suggests that
with NET_CLS_ACT the skb may or may not have been freed in the event of
an error. If I have time later, I'll see if I can bypass NET_CLS_ACT and
check whether this is even relevant.
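
To make the concern concrete, here is a small userspace model of what goes wrong if a caller releases a buffer on error when the callee has already released it. Nothing here is kernel code: the slot pool, `enqueue()`, `send_buggy()` and `send_safe()` are hypothetical, and the model simply assumes the callee always consumes the buffer, which is the safe reading when the caller cannot tell whether it was freed.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SLOTS 4

struct pool {
	bool in_use[POOL_SLOTS];
	int double_release; /* releases of a slot that was already free */
};

/* Returns a free slot index, or -1 if the pool is exhausted. */
static int pool_get(struct pool *p)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Releases a slot and counts the release if the slot was already free. */
static void pool_put(struct pool *p, int slot)
{
	assert(slot >= 0 && slot < POOL_SLOTS);
	if (!p->in_use[slot])
		p->double_release++;
	p->in_use[slot] = false;
}

/* Hypothetical enqueue: it consumes the buffer on success and on failure. */
static int enqueue(struct pool *p, int slot, bool fail)
{
	pool_put(p, slot);
	return fail ? -1 : 0;
}

/* Caller that releases the buffer again when enqueue reports an error. */
static int send_buggy(struct pool *p, bool fail)
{
	int slot = pool_get(p);
	int err;

	if (slot < 0)
		return -1;
	err = enqueue(p, slot, fail);
	if (err)
		pool_put(p, slot); /* second release of the same slot */
	return err;
}

/* Caller that treats the buffer as gone once it has been handed over. */
static int send_safe(struct pool *p, bool fail)
{
	int slot = pool_get(p);

	if (slot < 0)
		return -1;
	return enqueue(p, slot, fail);
}
```

In the model the buggy caller is harmless while every send succeeds and only shows up as a double release on the error path, which would fit a failure that appears only under load.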