From: Robert Hancock
On Fri, Feb 26, 2010 at 3:36 AM, David Miller <davem(a)davemloft.net> wrote:
> From: Robert Hancock <hancockrwd(a)gmail.com>
> Date: Mon, 22 Feb 2010 20:45:45 -0600
>
>> Many networking drivers have issues with the use of the NETIF_F_HIGHDMA flag.
>> This flag actually indicates whether or not the device/driver can handle
>> skbs located in high memory (as opposed to lowmem). However, many drivers
>> incorrectly treat this flag as indicating that 64-bit DMA is supported, which
>> has nothing to do with its actual function. It makes no sense to make setting
>> NETIF_F_HIGHDMA conditional on whether a 64-bit DMA mask has been set, as many
>> drivers do, since if highmem DMA is supported at all, it should work regardless
>> of whether 64-bit DMA is supported. Failing to set NETIF_F_HIGHDMA when it
>> should be can hurt performance on architectures which use highmem since it
>> results in needless data copying.
>>
>> This fixes up the networking drivers which currently use NETIF_F_HIGHDMA to
>> not do so conditionally on DMA mask settings.
>>
>> For the USB kaweth and usbnet drivers, this patch also uncomments and corrects
>> some code to set NETIF_F_HIGHDMA based on the USB host controller's DMA mask.
>> These drivers should be able to access highmem unless the host controller is
>> non-DMA-capable, which is indicated by the DMA mask being null.
>>
>> Signed-off-by: Robert Hancock <hancockrwd(a)gmail.com>
>
> Well, if the device isn't using 64-bit DMA addressing and the platform
> uses direct (no-iommu) mapping of physical to DMA addresses, won't
> your change break things? The device will get a >4GB DMA address or
> the DMA mapping layer will signal an error.
>
> That's really part of what the issue is, I think.
>
> So, this would trigger the check in check_addr() in
> arch/x86/kernel/pci-nommu.c when such packets try to get mapped by the
> driver, right?
>
> That will make the DMA mapping call fail, and the packet will be
> dropped permanently. And hey, on top of it, many of these drivers you
> remove the setting from don't even check the mapping call return
> values for errors.
>
> So even bigger breakage. One example is drivers/net/8139cp.c,
> it just does dma_map_single() and uses the result.
>
> It really depends upon that NETIF_F_HIGHDMA setting for correct
> operation.
>
> And even if something like swiotlb is available, now we're going
> to do bounce buffering which is largely equivalent to what
> a lack of NETIF_F_HIGHDMA will do. Except that once NETIF_F_HIGHDMA
> copies the packet to lowmem it will only do that once, whereas if
> the packet goes to multiple devices swiotlb might copy the packet
> to a bounce buffer multiple times.
>
> We definitely can't apply your patch as-is.

Hmm.. Yeah, there is a bit of a mess there. I'm thinking of the
particular example of i386 where you have 32-bit DMA devices with more
than 4GB of RAM. If you then allow the device to access highmem then
the DMA mapping API can get presented with addresses above 4GB and
AFAIK it can't cope with that situation on that platform.

Problem is that the NETIF_F_HIGHDMA check is generally too restrictive
in that situation, and it's really conflating two things into one (the
genuine can't-access-highmem part, and the "oh by the way, if highmem
can be >4GB then we can't access that"). If you have 3GB of RAM on
i386 with one of these drivers, you'll have packets being bounced
through lowmem without any real reason. I'll have a look into things a
bit further..
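
For reference, the gate that forces the copy is illegal_highdma() in
net/core/dev.c, which (from memory, so treat this as a sketch rather than
the exact source) looks roughly like:

static inline int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
{
#ifdef CONFIG_HIGHMEM
        int i;

        /* The one flag answers the one question: may this device be
         * handed highmem pages at all? */
        if (dev->features & NETIF_F_HIGHDMA)
                return 0;

        /* Any highmem fragment forces a bounce through lowmem, even if
         * that fragment actually sits below 4GB. */
        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
                if (PageHighMem(skb_shinfo(skb)->frags[i].page))
                        return 1;
#endif
        return 0;
}

In other words, it only ever asks "is this page in highmem", never "is this
page above 4GB", which is where the two meanings get tangled together.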
From: Robert Hancock
On Fri, Feb 26, 2010 at 9:25 AM, Bartlomiej Zolnierkiewicz
<bzolnier(a)gmail.com> wrote:
> On Friday 26 February 2010 03:46:45 pm Robert Hancock wrote:
>> On Fri, Feb 26, 2010 at 3:36 AM, David Miller <davem(a)davemloft.net> wrote:
>> > From: Robert Hancock <hancockrwd(a)gmail.com>
>> > Date: Mon, 22 Feb 2010 20:45:45 -0600
>> >
>> >> Many networking drivers have issues with the use of the NETIF_F_HIGHDMA flag.
>> >> This flag actually indicates whether or not the device/driver can handle
>> >> skbs located in high memory (as opposed to lowmem). However, many drivers
>> >> incorrectly treat this flag as indicating that 64-bit DMA is supported, which
>> >> has nothing to do with its actual function. It makes no sense to make setting
>> >> NETIF_F_HIGHDMA conditional on whether a 64-bit DMA mask has been set, as many
>> >> drivers do, since if highmem DMA is supported at all, it should work regardless
>> >> of whether 64-bit DMA is supported. Failing to set NETIF_F_HIGHDMA when it
>> >> should be can hurt performance on architectures which use highmem since it
>> >> results in needless data copying.
>> >>
>> >> This fixes up the networking drivers which currently use NETIF_F_HIGHDMA to
>> >> not do so conditionally on DMA mask settings.
>> >>
>> >> For the USB kaweth and usbnet drivers, this patch also uncomments and corrects
>> >> some code to set NETIF_F_HIGHDMA based on the USB host controller's DMA mask.
>> >> These drivers should be able to access highmem unless the host controller is
>> >> non-DMA-capable, which is indicated by the DMA mask being null.
>> >>
>> >> Signed-off-by: Robert Hancock <hancockrwd(a)gmail.com>
>> >
>> > Well, if the device isn't using 64-bit DMA addressing and the platform
>> > uses direct (no-iommu) mapping of physical to DMA addresses, won't
>> > your change break things? The device will get a >4GB DMA address or
>> > the DMA mapping layer will signal an error.
>> >
>> > That's really part of what the issue is, I think.
>> >
>> > So, this would trigger the check in check_addr() in
>> > arch/x86/kernel/pci-nommu.c when such packets try to get mapped by the
>> > driver, right?
>> >
>> > That will make the DMA mapping call fail, and the packet will be
>> > dropped permanently. �And hey, on top of it, many of these drivers you
>> > remove the setting from don't even check the mapping call return
>> > values for errors.
>> >
>> > So even bigger breakage. One example is drivers/net/8139cp.c,
>> > it just does dma_map_single() and uses the result.
>> >
>> > It really depends upon that NETIF_F_HIGHDMA setting for correct
>> > operation.
>> >
>> > And even if something like swiotlb is available, now we're going
>> > to do bounce buffering which is largely equivalent to what
>> > a lack of NETIF_F_HIGHDMA will do. Except that once NETIF_F_HIGHDMA
>> > copies the packet to lowmem it will only do that once, whereas if
>> > the packet goes to multiple devices swiotlb might copy the packet
>> > to a bounce buffer multiple times.
>> >
>> > We definitely can't apply your patch as-is.
>>
>> Hmm.. Yeah, there is a bit of a mess there. I'm thinking of the
>> particular example of i386 where you have 32-bit DMA devices with more
>> than 4GB of RAM. If you then allow the device to access highmem then
>> the DMA mapping API can get presented with addresses above 4GB and
>> AFAIK it can't cope with that situation on that platform.
>>
>> Problem is that the NETIF_F_HIGHDMA check is generally too restrictive
>> in that situation, and it's really conflating two things into one (the
>> genuine can't-access-highmem part, and the "oh by the way, if highmem
>> can be >4GB then we can't access that"). If you have 3GB of RAM on
>> i386 with one of these drivers, you'll have packets being bounced
>> through lowmem without any real reason. I'll have a look into things a
>> bit further..
>
> Maybe it would be useful to start with splitting NETIF_F_HIGHDMA on two
> independent flags, i.e.:
>
>        #define NETIF_F_DMA_HIGH        (1 << 27)
>        #define NETIF_F_DMA_64BIT       (1 << 28)
>
> and keeping NETIF_F_HIGHDMA as
>
>        #define NETIF_F_HIGHDMA         (NETIF_F_DMA_HIGH | NETIF_F_DMA_64BIT)
>
> for now..?
>
> [ Next step would involve fixing illegal_highdma() check in net/core/dev.c
>   to distinguish between those new flags, which in turn should allow sorting
>   out code in the device drivers on a *per-driver* basis. ]

That seems like a reasonable approach to me. Only question is how to
implement the check for DMA_64BIT. Can we just check page_to_phys on
each of the pages in the skb to see if it's > 0xffffffff? Are there
any architectures where it's more complicated than that?
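
Something like the following is what I had in mind (purely a hypothetical
sketch, helper name made up; whether page_to_phys() is even the right thing
to look at on every arch is exactly the question):

/* Hypothetical: does any page backing this skb sit above the 32-bit
 * physical boundary?  Assumes physical address == DMA address, which
 * is the assumption in doubt here. */
static bool skb_has_pages_above_4gb(struct sk_buff *skb)
{
        int i;

        /* Linear data first... */
        if (page_to_phys(virt_to_page(skb->data)) > DMA_BIT_MASK(32))
                return true;

        /* ...then each paged fragment. */
        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
                if (page_to_phys(skb_shinfo(skb)->frags[i].page) >
                    DMA_BIT_MASK(32))
                        return true;

        return false;
}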

Also, presumably these flags still wouldn't have any effect on
non-highmem architectures, same as NETIF_F_HIGHDMA now, since the best
we can do is copy the data to lowmem, which could still be >4GB in
this case. Therefore, the IOMMU (either HW or SWIOTLB, one of which
had better exist) just has to deal with it. (I suppose you could copy
to a buffer allocated with GFP_DMA32, but that likely wouldn't be a
win if you had a hardware IOMMU.)
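
For what it's worth, the driver-side pattern I'd picture with the split
flags is something along these lines (hypothetical, using Bartlomiej's
proposed names):

        /* Any DMA-capable device that can reach highmem pages: */
        dev->features |= NETIF_F_DMA_HIGH;

        /* Only devices doing real 64-bit addressing also get the second flag: */
        if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)))
                dev->features |= NETIF_F_DMA_64BIT;

That way a 32-bit-only NIC on a 3GB i386 box never bounces, and only the
above-4GB decision hangs off NETIF_F_DMA_64BIT.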
From: Robert Hancock
On Sat, Feb 27, 2010 at 3:53 AM, David Miller <davem(a)davemloft.net> wrote:
> From: Robert Hancock <hancockrwd(a)gmail.com>
> Date: Fri, 26 Feb 2010 21:08:04 -0600
>
>> That seems like a reasonable approach to me. Only question is how to
>> implement the check for DMA_64BIT. Can we just check page_to_phys on
>> each of the pages in the skb to see if it's > 0xffffffff? Are there
>> any architectures where it's more complicated than that?
>
> On almost every platform it's "more complicated than that".
>
> This is the whole issue. What matters is the final DMA address and
> since we have IOMMUs and the like, it is absolutely not tenable to
> solve this by checking physical address attributes.

Yeah, physical address isn't quite right. There is a precedent for
such a check in the block layer though - look at
blk_queue_bounce_limit in block/blk-settings.c:

void blk_queue_bounce_limit(struct request_queue *q, u64 dma_mask)
{
        unsigned long b_pfn = dma_mask >> PAGE_SHIFT;
        int dma = 0;

        q->bounce_gfp = GFP_NOIO;
#if BITS_PER_LONG == 64
        /*
         * Assume anything <= 4GB can be handled by IOMMU.  Actually
         * some IOMMUs can handle everything, but I don't know of a
         * way to test this here.
         */
        if (b_pfn < (min_t(u64, 0xffffffffUL, BLK_BOUNCE_HIGH) >> PAGE_SHIFT))
                dma = 1;
        q->limits.bounce_pfn = max_low_pfn;
#else
        if (b_pfn < blk_max_low_pfn)
                dma = 1;
        q->limits.bounce_pfn = b_pfn;
#endif
        if (dma) {
                init_emergency_isa_pool();
                q->bounce_gfp = GFP_NOIO | GFP_DMA;
                q->limits.bounce_pfn = b_pfn;
        }
}

and then in mm/bounce.c:

static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
                               mempool_t *pool)
{
        struct page *page;
        struct bio *bio = NULL;
        int i, rw = bio_data_dir(*bio_orig);
        struct bio_vec *to, *from;

        bio_for_each_segment(from, *bio_orig, i) {
                page = from->bv_page;

                /*
                 * is destination page below bounce pfn?
                 */
                if (page_to_pfn(page) <= queue_bounce_pfn(q))
                        continue;

Following that logic then, it appears that page_to_pfn(page) >
(0xffffffff >> PAGE_SHIFT) should tell us what we want to know for the
DMA_64BIT flag.. or am I missing something?
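
Concretely, the networking-side equivalent of that bounce test would
presumably be something like this (sketch only, helper name invented):

/* pfn-based variant of the block layer's bounce check:
 * 0xffffffff >> PAGE_SHIFT is the last pfn a 32-bit-only device can
 * reach with a direct (no-IOMMU) mapping. */
static inline bool page_above_4gb(struct page *page)
{
        return page_to_pfn(page) > (0xffffffffUL >> PAGE_SHIFT);
}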
From: Robert Hancock
On Sat, Feb 27, 2010 at 6:05 AM, David Miller <davem(a)davemloft.net> wrote:
> From: Bartlomiej Zolnierkiewicz <bzolnier(a)gmail.com>
> Date: Sat, 27 Feb 2010 12:59:31 +0100
>
>> Having an IOMMU (even if it is only a software one, i.e. this would
>> mean swiotlb for x86-32/highmem) always in place would simplify
>> things greatly..
>
> I agree, things would be a lot simpler.

Yeah, the situation kind of sucks on the platforms that don't have any
IOMMU support, since it means that the DMA API can't handle anything
over 4GB properly and you need all these hacks in the block layer,
networking layer, etc. It would be nice if some kind of IOMMU support
could be relied upon always.
From: FUJITA Tomonori
On Sat, 27 Feb 2010 12:15:19 -0600
Robert Hancock <hancockrwd(a)gmail.com> wrote:

> On Sat, Feb 27, 2010 at 6:05 AM, David Miller <davem(a)davemloft.net> wrote:
> > From: Bartlomiej Zolnierkiewicz <bzolnier(a)gmail.com>
> > Date: Sat, 27 Feb 2010 12:59:31 +0100
> >
> >> Having an IOMMU (even if it is only a software one, i.e. this would
> >> mean swiotlb for x86-32/highmem) always in place would simplify
> >> things greatly..
> >
> > I agree, things would be a lot simpler.
>
> Yeah, the situation kind of sucks on the platforms that don't have any
> IOMMU support, since it means that the DMA API can't handle anything
> over 4GB properly and you need all these hacks in the block layer,
> networking layer, etc. It would be nice if some kind of IOMMU support
> could be relied upon always.

When I proposed such an approach (always use swiotlb) before, IIRC,
the objections were:

- better to make allocations respect dma_mask. (I don't think this
approach is possible, since we don't know which device will handle the
data at the time we allocate the memory.)

- swiotlb is not good for small systems since it allocates too much
memory (we can fix this though).

There might be more.