From: Xin, Xiaohui on
Michael,

>>>>>> What we have not done yet:
>>>>>> packet split support
>>>>>>
>>>>>What does this mean, exactly?
>>>> We can support a 1500 MTU, but not jumbo frames: since the vhost driver did not
>>>> support mergeable buffers before, we could not try it with multiple sg entries.
>>>>
>>>I do not see why, vhost currently supports 64K buffers with indirect
>>>descriptors.
>>>
>> The receive_skb() in the guest virtio-net driver merges the multiple sg entries into
>> skb frags; how can indirect descriptors do that?

>See add_recvbuf_big.

I don't mean that; add_recvbuf_big() is for buffer submission. I mean that when a packet is received, in receive_buf(), the mergeable-buffer path knows which of the received pages can be hooked into the skb frags; it is receive_mergeable() which does this.

When a NIC driver supports packet split mode, each rx ring descriptor contains an skb and a page. When a packet is received, if the descriptor status is not EOP, the page of the next descriptor is hooked onto the previous skb. We don't know in advance how many frags belong to one skb. So when the guest submits buffers, it should submit multiple pages, and on receive the guest should know which pages belong to one skb and hook them together. I think receive_mergeable() can do this, but I don't see how the big-packets path handles it. Am I missing something here?
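
Just to make the frag-hooking idea concrete, here is a rough, simplified sketch of what a mergeable-buffer style receive path could do with the extra pages; this is not the actual virtio-net code, and get_next_rx_page() is only an invented placeholder:

#include <linux/skbuff.h>

/*
 * Hedged sketch only: attach the follow-on pages of one packet to the
 * head skb as paged fragments, roughly what receive_mergeable() does.
 */
static void merge_rx_pages(struct sk_buff *head_skb, int num_buf)
{
	while (--num_buf > 0) {
		unsigned int len;
		struct page *page = get_next_rx_page(&len); /* invented helper */
		int i = skb_shinfo(head_skb)->nr_frags;

		/* Hook the page into the head skb as a fragment. */
		skb_fill_page_desc(head_skb, i, page, 0, len);
		head_skb->data_len += len;
		head_skb->len      += len;
		head_skb->truesize += PAGE_SIZE;
	}
}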

Thanks
Xiaohui

From: Xin, Xiaohui on
Michael,

>Yes, I think this packet split mode probably maps well to mergeable buffer
>support. Note that
>1. Not all devices support large packets in this way, others might map
> to indirect buffers better

Are indirect buffers meant to deal with skb->frag_list?

> So we have to figure out how migration is going to work
Yes, different guest virtio-net drivers may support different features.
Does qemu migration currently work between virtio-net drivers with different
feature sets?

>2. It's up to guest driver whether to enable features such as
> mergeable buffers and indirect buffers
> So we have to figure out how to notify guest which mode
> is optimal for a given device
Yes. When a device is bound, the mp device can query its capabilities from the driver.
Actually, there is already a structure in the mp device that can do this; we can add
some fields to support more.
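
For illustration only, the capability structure could look roughly like the sketch below; all of these names are invented here and are not taken from the actual mp device patches:

#include <linux/netdevice.h>

/* Hypothetical capability record the mp device could fill at bind time. */
struct mp_dev_caps {
	unsigned int	max_frags;	/* max sg entries per packet */
	unsigned int	max_packet_len;	/* largest receive we can post */
	unsigned int	flags;
#define MP_CAP_PACKET_SPLIT	(1 << 0) /* device splits packets across pages */
#define MP_CAP_GSO		(1 << 1)
};

static int mp_query_caps(struct net_device *dev, struct mp_dev_caps *caps)
{
	caps->max_frags      = MAX_SKB_FRAGS;
	caps->max_packet_len = 65536;
	caps->flags          = MP_CAP_PACKET_SPLIT;
	return 0;
}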

>3. We don't want to depend on jumbo frames for decent performance
> So we probably should support GSO/GRO
GSO is for the tx side, right? I think the driver can handle it itself.
For GRO, I'm not sure whether it is easy or not. Basically, the mp device we support
now is doing what a raw socket does: the packets do not go through the host stack.
--
MST
From: Michael S. Tsirkin on
On Thu, Apr 22, 2010 at 04:57:56PM +0800, Xin, Xiaohui wrote:
> Michael,
>
> >Yes, I think this packet split mode probably maps well to mergeable buffer
> >support. Note that
> >1. Not all devices support large packets in this way, others might map
> > to indirect buffers better
>
> Are indirect buffers meant to deal with skb->frag_list?

We currently use skb->frags.
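
To spell that out: skb->frags (really skb_shinfo(skb)->frags) keeps extra data as page fragments inside a single skb, while frag_list chains whole skbs together. A minimal, simplified sketch of the difference (truesize accounting omitted for brevity):

#include <linux/skbuff.h>

/* Paged fragments: extra data lives in pages attached to this one skb. */
static void add_paged_frag(struct sk_buff *skb, struct page *page,
			   unsigned int off, unsigned int len)
{
	skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, off, len);
	skb->data_len += len;
	skb->len      += len;
}

/* frag_list: a chain of separate skbs hanging off the head skb. */
static void chain_via_frag_list(struct sk_buff *head, struct sk_buff *next)
{
	skb_shinfo(head)->frag_list = next;
	head->data_len += next->len;
	head->len      += next->len;
}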

> > So we have to figure out how migration is going to work
> Yes, different guest virtio-net drivers may support different features.
> Does qemu migration currently work between virtio-net drivers with different
> feature sets?

For now, you must have identical feature sets for migration to work.
As long as we manage the buffers in software, we can always make
the features match.

> >2. It's up to guest driver whether to enable features such as
> > mergeable buffers and indirect buffers
> > So we have to figure out how to notify guest which mode
> > is optimal for a given device
> Yes. When a device is bound, the mp device can query its capabilities from the driver.
> Actually, there is already a structure in the mp device that can do this; we can add
> some fields to support more.
>
> >3. We don't want to depend on jumbo frames for decent performance
> > So we probably should support GSO/GRO
> GSO is for the tx side, right? I think the driver can handle it itself.
> For GRO, I'm not sure whether it is easy or not. Basically, the mp device we support
> now is doing what a raw socket does: the packets do not go through the host stack.

See commit bfd5f4a3d605e0f6054df0b59fe0907ff7e696d3
(it doesn't currently work with vhost net, but that's
a separate story).
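
That commit adds GSO/csum offload support to packet sockets via a PACKET_VNET_HDR option, so a userspace backend can exchange the offload metadata as a struct virtio_net_hdr in front of each frame. A rough sketch of how it could be used (error handling omitted, not the vhost/mp code itself):

#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <linux/virtio_net.h>

int open_vnet_hdr_socket(void)
{
	int one = 1;
	int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	/* Ask the kernel to prepend a virtio_net_hdr to every frame. */
	setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one));
	return fd;
}

void read_one_frame(int fd, char *buf, size_t len)
{
	read(fd, buf, len);
	/* gso_type, csum_start, etc. describe the frame that follows. */
	struct virtio_net_hdr *vh = (struct virtio_net_hdr *)buf;
	(void)vh;
}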

> --
> MST
From: Michael S. Tsirkin on
On Tue, Apr 20, 2010 at 10:21:55AM +0800, Xin, Xiaohui wrote:
> Michael,
>
> >>>>>> What we have not done yet:
> >>>>>> packet split support
> >>>>>>
> >>>>>What does this mean, exactly?
> >>>> We can support a 1500 MTU, but not jumbo frames: since the vhost driver did not
> >>>> support mergeable buffers before, we could not try it with multiple sg entries.
> >>>>
> >>>I do not see why, vhost currently supports 64K buffers with indirect
> >>>descriptors.
> >>>
> >> The receive_skb() in the guest virtio-net driver merges the multiple sg entries into
> >> skb frags; how can indirect descriptors do that?
>
> >See add_recvbuf_big.
>
> I don't mean that; add_recvbuf_big() is for buffer submission. I mean that when a packet is received, in receive_buf(), the mergeable-buffer path knows which of the received pages can be hooked into the skb frags; it is receive_mergeable() which does this.
>
> When a NIC driver supports packet split mode, each rx ring descriptor contains an skb and a page. When a packet is received, if the descriptor status is not EOP, the page of the next descriptor is hooked onto the previous skb. We don't know in advance how many frags belong to one skb. So when the guest submits buffers, it should submit multiple pages, and on receive the guest should know which pages belong to one skb and hook them together. I think receive_mergeable() can do this, but I don't see how the big-packets path handles it. Am I missing something here?
>
> Thanks
> Xiaohui


Yes, I think this packet split mode probably maps well to mergeable buffer
support. Note that
1. Not all devices support large packets in this way, others might map
to indirect buffers better
So we have to figure out how migration is going to work
2. It's up to guest driver whether to enable features such as
mergeable buffers and indirect buffers
So we have to figure out how to notify guest which mode
is optimal for a given device
3. We don't want to depend on jumbo frames for decent performance
So we probably should support GSO/GRO
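
To illustrate point 2: in the current guest virtio-net driver the receive mode is picked from the negotiated feature bits at probe time, roughly as in the sketch below, so the backend can steer the guest by which features it offers:

#include <linux/virtio.h>
#include <linux/virtio_net.h>

/* Rough sketch of the probe-time checks in the guest virtio-net driver. */
static void pick_rx_mode(struct virtio_device *vdev,
			 bool *mergeable, bool *big_packets)
{
	*mergeable   = virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF);
	/* Large packets without mergeable buffers go through the "big" path. */
	*big_packets = virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
		       virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6);
}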

--
MST
From: Michael S. Tsirkin on
On Sun, Apr 25, 2010 at 02:55:29AM -0700, David Miller wrote:
> From: xiaohui.xin@intel.com
> Date: Sun, 25 Apr 2010 17:20:06 +0800
>
> > The idea is simple: just pin the guest VM user-space pages and then let
> > the host NIC driver DMA directly into them.
>
> Isn't it much easier to map the RX ring of the network device into the
> guest's address space, have DMA map calls translate guest addresses to
> physical/DMA addresses as well as do all of this crazy page pinning
> stuff, and provide the translations and protections via the IOMMU?

This means the guest needs to know how the specific network device works,
so we won't be able to, for example, move a guest between different hosts.
There are other problems: many physical systems do not have an IOMMU,
some guest OSes do not support DMA map calls, and doing a VM exit
on each DMA map call might turn out to be very slow. And so on.
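
For context, the page-pinning approach quoted above boils down to something like the sketch below on the host side; this is only a rough illustration with invented names, not the actual mp-device code:

#include <linux/mm.h>
#include <linux/sched.h>

/* Hedged sketch: pin guest pages (host user-space memory) for device DMA. */
static int guest_buf_pin(unsigned long uaddr, int npages, struct page **pages)
{
	int pinned;

	down_read(&current->mm->mmap_sem);
	pinned = get_user_pages(current, current->mm, uaddr, npages,
				1 /* write */, 0 /* force */, pages, NULL);
	up_read(&current->mm->mmap_sem);

	/* The pinned pages can then be mapped for DMA, e.g. dma_map_page(). */
	return pinned;
}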

> What's being proposed here looks a bit over-engineered.

This is an attempt to reduce overhead for virtio (paravirtualization).
'Don't use PV' is kind of an alternative, but I do not
think it's a simpler one.

--
MST