From: Stephen Hemminger on
On Wed, 27 Jan 2010 10:34:51 -0500
Michael Breuer <mbreuer(a)majjas.com> wrote:

> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> > On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> >
> >> When the packets were dropped, there was a different sequence in the
> >> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>
> > Anyway, I'd be interested if the switch matters here.
> >
> > Plus one more test: could you try to load sky2 with the parameter:
> > "copybreak=1" (the rest as in any recent test, which gave you dmar
> > errors; any switch).
> >
> > Thanks,
> > Jarek P.
> >
> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
> to confirm that I haven't inadvertently fixed something. However, given
> that it might be copybreak-related, I looked at sky2.c again and I'm
> wondering about the copybreak max size in sky2_rx_start:
>
> size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>
> /* Stopping point for hardware truncation */
> thresh = (size - 8) / sizeof(u32);
>
> sky2->rx_nfrags = size >> PAGE_SHIFT;
> BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
>
> /* Compute residue after pages */
> size -= sky2->rx_nfrags << PAGE_SHIFT;
>
> /* Optimize to handle small packets and headers */
> if (size < copybreak)
> 	size = copybreak;
> if (size < ETH_HLEN)
> 	size = ETH_HLEN;
>
>
> Why would increasing size to copybreak be valid here?
>
> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
> correctly, if size is ever less than copybreak it's because there isn't
> enough space left for anything larger. If so, wouldn't increasing size
> potentially corrupt something? I'd further guess that the resulting
> condition manifests sooner (or at least with a more visible effect) when
> using DMAR.
>
> In any event, why "copybreak" as the minimum buffer size? I'd suggest
> that if it isn't possible to allocate at least MTU + overhead that
> sky2_rx_start ought to be delayed until there is room.

This code is where the driver decides how much data will be received into
the skb data area; the remaining data spills over into skb frags.
Copybreak is the threshold below which packets are copied
to a new skb. The code doing the copying assumes the data is
totally contained in the skb (not in frags). The size increase there
is to make sure that assumption is always true. I suppose you
could do something perverse like setting copybreak really huge
and confuse the driver, but that is a user error.
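
To make the arithmetic in the quoted sky2_rx_start fragment concrete, here is
a minimal user-space sketch of the same computation. It is not driver code:
the helper names (rx_layout, roundup8) are hypothetical, the constants are
redefined locally, and PAGE_SHIFT = 12 (4 KiB pages) is an assumption.

```c
#include <assert.h>

#define ETH_HLEN   14	/* Ethernet header length in bytes */
#define VLAN_HLEN   4	/* 802.1Q tag length in bytes */
#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical result type: how one receive buffer is split up. */
struct rx_layout {
	unsigned nfrags;	/* whole pages placed in skb frags */
	unsigned size;		/* bytes received into the linear skb data area */
};

/* Round x up to the next multiple of 8. */
static unsigned roundup8(unsigned x)
{
	return (x + 7u) & ~7u;
}

/* Mirrors the order of operations in the quoted fragment. */
static struct rx_layout rx_layout(unsigned mtu, unsigned copybreak)
{
	struct rx_layout l;
	unsigned size = roundup8(mtu + ETH_HLEN + VLAN_HLEN);

	l.nfrags = size >> PAGE_SHIFT;
	size -= l.nfrags << PAGE_SHIFT;		/* residue after whole pages */

	if (size < copybreak)			/* keep small packets linear */
		size = copybreak;
	if (size < ETH_HLEN)
		size = ETH_HLEN;

	assert(size >= ETH_HLEN);
	l.size = size;
	return l;
}
```

Under these assumptions an MTU of 1500 gives no frags and a 1520-byte linear
area, and an MTU of 9000 gives two frags plus an 832-byte residue. The residue
only drops below a copybreak of 128 when the rounded frame size lands just
past a page multiple, for example an MTU of 8192, where the 24-byte residue is
raised to 128.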


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael Breuer on
On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> This code is where the driver decides how much data will be received into
> the skb data area; the remaining data spills over into skb frags.
> Copybreak is the threshold below which packets are copied
> to a new skb. The code doing the copying assumes the data is
> totally contained in the skb (not in frags). The size increase there
> is to make sure that assumption is always true. I suppose you
> could do something perverse like setting copybreak really huge
> and confuse the driver, but that is a user error.
>
>
Ok - but I'm wondering under what circumstances size would be <
copybreak in the first place after computing the residue. If size ends
up being unreasonably small, is simply increasing the number to whatever
copybreak is correct? Assuming my testing is correct, then the crash
I've been experiencing when using dmar (only) seems related to the value
of copybreak. I don't think the other use (skb reuse) is the issue (but
hey, I could have missed something). The crash occurs when copybreak is
the default of 128, didn't happen when I set copybreak to 1.

From: Stephen Hemminger on
On Wed, 27 Jan 2010 11:57:35 -0500
Michael Breuer <mbreuer(a)majjas.com> wrote:

> Ok - but I'm wondering under what circumstances size would be <
> copybreak in the first place after computing the residue. If size ends
> up being unreasonably small, is simply increasing the number to whatever
> copybreak is correct? Assuming my testing is correct, then the crash
> I've been experiencing when using dmar (only) seems related to the value
> of copybreak. I don't think the other use (skb reuse) is the issue (but
> hey, I could have missed something). The crash occurs when copybreak is
> the default of 128, didn't happen when I set copybreak to 1.
>

Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
path. Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu
properly, or the value passed to sync_single_for_cpu doesn't account for
all the overhead of padding and/or ether header.
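
As a rough illustration of why copybreak=1 avoids that path, the sketch below
models the receive-side choice as a plain comparison. The names (rx_path,
rx_path_for) are hypothetical, and the exact comparison is an assumption based
on the description of copybreak as a "less than" threshold, not a quote from
the driver.

```c
/* Hypothetical labels for the two ways a received frame can be handled. */
enum rx_path {
	RX_COPY,	/* sync the DMA buffer, memcpy into a fresh small skb */
	RX_HANDOFF	/* pass the original skb up and allocate a replacement */
};

/* Assumed rule: frames strictly shorter than copybreak are copied. */
static enum rx_path rx_path_for(unsigned length, unsigned copybreak)
{
	return length < copybreak ? RX_COPY : RX_HANDOFF;
}
```

With the default of 128, a 60-byte frame takes the copy path; with copybreak=1
only a zero-length frame would, so in practice every frame is handed off.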

From: Stephen Hemminger on
On Wed, 27 Jan 2010 11:57:35 -0500
Michael Breuer <mbreuer(a)majjas.com> wrote:

> Ok - but I'm wondering under what circumstances size would be <
> copybreak in the first place after computing the residue. If size ends
> up being unreasonably small, is simply increasing the number to whatever
> copybreak is correct? Assuming my testing is correct, then the crash
> I've been experiencing when using dmar (only) seems related to the value
> of copybreak. I don't think the other use (skb reuse) is the issue (but
> hey, I could have missed something). The crash occurs when copybreak is
> the default of 128, didn't happen when I set copybreak to 1.

Does this change it? If so, the DMA code (not sky2) is buggy and not
rounding up properly.

--- a/drivers/net/sky2.c 2010-01-27 09:46:10.940005248 -0800
+++ b/drivers/net/sky2.c 2010-01-27 09:53:47.141267850 -0800
@@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru

 	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
 	if (likely(skb)) {
+		unsigned dma_align = dma_get_cache_alignment();
+		unsigned dma_size = ALIGN(length+1, dma_align);
+
 		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
-					    length, PCI_DMA_FROMDEVICE);
+					    dma_size, PCI_DMA_FROMDEVICE);
 		skb_copy_from_linear_data(re->skb, skb->data, length);
 		skb->ip_summed = re->skb->ip_summed;
 		skb->csum = re->skb->csum;
 		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
-					       length, PCI_DMA_FROMDEVICE);
+					       dma_size, PCI_DMA_FROMDEVICE);
 		re->skb->ip_summed = CHECKSUM_NONE;
 		skb_put(skb, length);
 	}
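
For anyone testing this, the effect of ALIGN(length+1, dma_align) on the sync
length can be checked in user space. The sketch below uses hypothetical helper
names (align_up, sync_len) and assumes the alignment is a power of two; the
value 64 used in the note that follows is an example, not something reported
in this thread.

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static unsigned align_up(unsigned x, unsigned a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Length the test patch would pass to the sync calls for a given frame. */
static unsigned sync_len(unsigned length, unsigned dma_align)
{
	return align_up(length + 1, dma_align);
}
```

With an alignment of 64, a 60-byte frame is synced as 64 bytes and a 100-byte
frame as 128. Because of the +1, a frame of exactly 64 bytes is also synced as
128 rather than 64.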
From: Michael Breuer on
On 1/27/2010 12:45 PM, Stephen Hemminger wrote:
> Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
> path. Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu
> properly, or the value passed to sync_single_for_cpu doesn't account for
> all the overhead of padding and/or ether header.
>
>
Ah - ok... will poke around there... if you have any suggestions,
diagnostics, whatever, let me know. Also, just an FYI - before rebooting
with copybreak back to defaults, I tried mtu=9000 again. That hung the
server immediately - no diagnostic output - system froze until watchdog
rebooted. Don't know right now if the copybreak had anything to do with
this, but when I've tried in the past I've had errors on sky2, but never
crashed the system like this. Only two things different were copybreak
and the length of time the system had been up. I'll try later with
copybreak default and copybreak=1 to see if that affects mtu behavior.