From: Herbert Xu on
On Sun, Jun 13, 2010 at 04:58:36PM +0800, Xin, Xiaohui wrote:
>
> Herbert,
> In this way, I think we should create 3 functions at least in drivers to allocate rx buffer, to receive the rx buffers, and to clean the rx buffers.
>
> We can also have another way here. We can provide a function to only substitute
> alloc_page(), and a function to release the pages when cleaning the rx buffers.

Yes that's exactly what I had in mind.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Herbert Xu on
On Sat, Jun 12, 2010 at 05:31:10PM +0800, Xin, Xiaohui wrote:
>
> 1) Modify driver from netdev_alloc_skb() to alloc user pages if dev is zero-copyed.
> If the driver support PS mode, then modify alloc_page() too.

Well if you were doing this then the driver won't be generating
skbs at all. So you don't need to change netdev_alloc_skb.

The driver would currently do alloc_page, so you would replace
that with netdev_alloc_page, which can call your new function
to allocate an external page where appropriate.

IOW you just need one change in the driver if it already uses
the skbless interface, to replace with alloc_page.

If the driver doesn't use the skbless interface then you need
to make a few more changes but it isn't too hard either, it'll
also mean that the driver will have less overhead even for normal
use which is a win-win situation.

> 2) Add napi_gro_frags() in driver to receive the user pages instead of driver's receiving
> function.
>
> 3) napi_gro_frags() will allocate small skb and pull the header data from
> the first page to skb->data.
>
> Is above the way what you have suggested?

Yes.

> I have thought something in detail about the way.
>
> 1) The first page will have an offset after the header is copied into allocated kernel skb.
> The offset should be recalculated when the user page data is transferred to guest. This
> may modify some of the gro code.

We could keep track whether the stack has modified the header,
since you can simply ignore it if it doesn't modify it, which
should be the common case for virt.

> 2) napi_gro_frags() may remove a page when it's data is totally be pulled, but we cannot
> put a user page as normally. This may modify the gro code too.

If it does anything like that, then we're not in the fast-path
case so you can just fall back to copying.

> 3) When the user buffer returned to guest, some of them need to be appended a vnet header.
> That means for some pages, the vnet header room should be reserved when allocated.
> But we cannot know which one will be used as the first page when allocated. If we reserved vnet header for each page, since the set_skb_frag() in guest driver only use the offset 0 for second pages, then page data will be wrong.

I don't see why this would be a problem, since as far as what
the driver is putting onto the physical RX ring nothing has
changed. IOW if you want to designate a certain page as special,
or the first page, you can still do so.

So can you explain which bits of your patches would be affected
by this?

> 4) Since the user buffer pages should be released, so we still need a dtor callback to do that, and then I still need a place to hold it. How do you think about to put it in skb_shinfo?

While I don't like that very much I guess I can live with that
if nobody else objects.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Xin, Xiaohui on
>-----Original Message-----
>From: Herbert Xu [mailto:herbert(a)gondor.apana.org.au]
>Sent: Thursday, June 17, 2010 7:21 PM
>To: Xin, Xiaohui
>Cc: Stephen Hemminger; netdev(a)vger.kernel.org; kvm(a)vger.kernel.org;
>linux-kernel(a)vger.kernel.org; mst(a)redhat.com; mingo(a)elte.hu; davem(a)davemloft.net;
>jdike(a)linux.intel.com
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>On Sun, Jun 13, 2010 at 04:58:36PM +0800, Xin, Xiaohui wrote:
>>
>> Herbert,
>> In this way, I think we should create 3 functions at least in drivers to allocate rx buffer, to
>receive the rx buffers, and to clean the rx buffers.
>>
>> We can also have another way here. We can provide a function to only substitute
>> alloc_page(), and a function to release the pages when cleaning the rx buffers.
>
>Yes that's exactly what I had in mind.
>
Herbert,
I have questions about the idea above:
1) Since netdev_alloc_skb() is still there, and we only modify alloc_page(),
then we don't need napi_gro_frags() any more, the driver's original receiving
function is ok. Right?

2) Is napi_gro_frags() only suitable for TCP protocol packet?
I have done a small test for ixgbe driver to let it only allocate paged buffers
and found kernel hangs when napi_gro_frags() receives an ARP packet.

3) As I have mentioned above, with this idea, netdev_alloc_skb() will allocate
as usual, the data pointed by skb->data will be copied into the first guest buffer.
That means we should reserve sufficient room in guest buffer. For PS mode
supported driver (for example ixgbe), the room will be more than 128. After 128bytes,
we will put the first frag data. Look into virtio-net.c the function page_to_skb()
and receive_mergeable(), that means we should modify guest virtio-net driver to
compute the offset as the parameter for skb_set_frag().

How do you think about this? Attached is a patch to how to modify the guest driver.
I reserve 512 bytes as an example, and transfer the header len of the skb in hdr->hdr_len.

Thanks
Xiaohui
>Cheers,
>--
>Visit Openswan at http://www.openswan.org/
>Email: Herbert Xu ~{PmV>HI~} <herbert(a)gondor.apana.org.au>
>Home Page: http://gondor.apana.org.au/~herbert/
>PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
From: Herbert Xu on
On Fri, Jun 18, 2010 at 01:26:49PM +0800, Xin, Xiaohui wrote:
>
> Herbert,
> I have questions about the idea above:
> 1) Since netdev_alloc_skb() is still there, and we only modify alloc_page(),
> then we don't need napi_gro_frags() any more, the driver's original receiving
> function is ok. Right?

Well I was actually thinking about converting all drivers that
need this to napi_gro_frags. But now that you mention it, yes
we could still keep the old interface to minimise the work.

> 2) Is napi_gro_frags() only suitable for TCP protocol packet?
> I have done a small test for ixgbe driver to let it only allocate paged buffers
> and found kernel hangs when napi_gro_frags() receives an ARP packet.

It should work with any packet. In fact, I'm pretty sure the
other drivers (e.g., cxgb3) use that interface for all packets.

> 3) As I have mentioned above, with this idea, netdev_alloc_skb() will allocate
> as usual, the data pointed by skb->data will be copied into the first guest buffer.
> That means we should reserve sufficient room in guest buffer. For PS mode
> supported driver (for example ixgbe), the room will be more than 128. After 128bytes,
> we will put the first frag data. Look into virtio-net.c the function page_to_skb()
> and receive_mergeable(), that means we should modify guest virtio-net driver to
> compute the offset as the parameter for skb_set_frag().
>
> How do you think about this? Attached is a patch to how to modify the guest driver.
> I reserve 512 bytes as an example, and transfer the header len of the skb in hdr->hdr_len.

Expanding the buffer size to 512 bytes to accomodate PS mode
looks reasonable to me.

However, I don't think we should increase the copy threshold to
512 bytes at the same time. I don't have any figures myself but
I think if we are to make such a change it should be a separate
one and come with supporting numbers.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Xin, Xiaohui on
>-----Original Message-----
>From: Herbert Xu [mailto:herbert(a)gondor.apana.org.au]
>Sent: Friday, June 18, 2010 1:59 PM
>To: Xin, Xiaohui
>Cc: Stephen Hemminger; netdev(a)vger.kernel.org; kvm(a)vger.kernel.org;
>linux-kernel(a)vger.kernel.org; mst(a)redhat.com; mingo(a)elte.hu; davem(a)davemloft.net;
>jdike(a)linux.intel.com; Rusty Russell
>Subject: Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
>
>On Fri, Jun 18, 2010 at 01:26:49PM +0800, Xin, Xiaohui wrote:
>>
>> Herbert,
>> I have questions about the idea above:
>> 1) Since netdev_alloc_skb() is still there, and we only modify alloc_page(),
>> then we don't need napi_gro_frags() any more, the driver's original receiving
>> function is ok. Right?
>
>Well I was actually thinking about converting all drivers that
>need this to napi_gro_frags. But now that you mention it, yes
>we could still keep the old interface to minimise the work.
>
>> 2) Is napi_gro_frags() only suitable for TCP protocol packet?
>> I have done a small test for ixgbe driver to let it only allocate paged buffers
>> and found kernel hangs when napi_gro_frags() receives an ARP packet.
>
>It should work with any packet. In fact, I'm pretty sure the
>other drivers (e.g., cxgb3) use that interface for all packets.
>
Thanks for the verification. By the way, does that mean that nearly all drivers can use the
same napi_gro_frags() to receive buffers though currently each driver has it's own receiving
function?

>> 3) As I have mentioned above, with this idea, netdev_alloc_skb() will allocate
>> as usual, the data pointed by skb->data will be copied into the first guest buffer.
>> That means we should reserve sufficient room in guest buffer. For PS mode
>> supported driver (for example ixgbe), the room will be more than 128. After 128bytes,
>> we will put the first frag data. Look into virtio-net.c the function page_to_skb()
>> and receive_mergeable(), that means we should modify guest virtio-net driver to
>> compute the offset as the parameter for skb_set_frag().
>>
>> How do you think about this? Attached is a patch to how to modify the guest driver.
>> I reserve 512 bytes as an example, and transfer the header len of the skb in hdr->hdr_len.
>
>Expanding the buffer size to 512 bytes to accomodate PS mode
>looks reasonable to me.
>
>However, I don't think we should increase the copy threshold to
>512 bytes at the same time. I don't have any figures myself but
>I think if we are to make such a change it should be a separate
>one and come with supporting numbers.
>
Let me have a look to see if I can retain the copy threshold as 128 bytes
and copy the header data safely.

>Cheers,
>--
>Visit Openswan at http://www.openswan.org/
>Email: Herbert Xu ~{PmV>HI~} <herbert(a)gondor.apana.org.au>
>Home Page: http://gondor.apana.org.au/~herbert/
>PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/