From: Shirley Ma
Hello Xiaohui,

On Thu, 2010-07-29 at 19:14 +0800, xiaohui.xin@intel.com wrote:
> The idea is simple: pin the guest VM user-space buffers and then
> let the host NIC driver DMA directly to them.
> The patches are based on the vhost-net backend driver. We add a
> device which provides proto_ops such as sendmsg/recvmsg to vhost-net
> to send/recv directly to/from the NIC driver. A KVM guest that uses
> the vhost-net backend may bind any ethX interface on the host side
> to get copyless data transfer through the guest virtio-net frontend.
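
For reference, a minimal sketch of the shape the quoted design could
take (the mp_* names and stub bodies are hypothetical, not taken from
the patches; the signatures follow the 2.6.3x proto_ops layout):

/* Hypothetical sketch, not the actual patch: a socket-like device
 * whose sendmsg/recvmsg let vhost-net talk directly to the NIC
 * driver instead of copying through an intermediate buffer. */
#include <linux/net.h>
#include <linux/socket.h>

static int mp_sendmsg(struct kiocb *iocb, struct socket *sock,
		      struct msghdr *m, size_t total_len)
{
	/* Pin the guest pages behind m->msg_iov and post them to the
	 * NIC driver for DMA instead of copying into an skb. */
	return total_len;
}

static int mp_recvmsg(struct kiocb *iocb, struct socket *sock,
		      struct msghdr *m, size_t total_len, int flags)
{
	/* Complete a receive into guest pages pinned earlier. */
	return 0;
}

static const struct proto_ops mp_proto_ops = {
	.family  = AF_UNSPEC,
	.sendmsg = mp_sendmsg,
	.recvmsg = mp_recvmsg,
};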

Since vhost-net already supports macvtap/tun backends, do you think
it would be better to implement zero copy in macvtap/tun rather than
introducing a new media passthrough device here?

> Our goal is to improve the bandwidth and reduce the CPU usage.
> Exact performance data will be provided later.

I did some vhost performance measurements over 10Gb ixgbe and found
that, in order to get consistent BW results, SMP affinity is required
for the netperf/netserver, qemu, and vhost threads.
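
For what it's worth, a minimal sketch of pinning one of those tasks
(netperf, qemu, or a vhost thread by its TID) to a CPU; this does the
same thing as taskset -pc <cpu> <pid> from the shell:

/* pin_task.c: set the CPU affinity of an existing task. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	cpu_set_t set;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <cpu>\n", argv[0]);
		return 1;
	}

	CPU_ZERO(&set);
	CPU_SET(atoi(argv[2]), &set);	/* single target CPU */
	if (sched_setaffinity(atoi(argv[1]), sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}
	return 0;
}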

Looking forward to these results for a small-message-size comparison.
For large message sizes, the 10Gb ixgbe BW ceiling is already reached
by setting vhost SMP affinity together with offloading support, so
what remains to be seen is how much CPU utilization can be reduced.

Please provide latency results as well. I did some experiments on
macvtap zero-copy sendmsg, and what I found is that get_user_pages
latency is pretty high.

Thanks
Shirley

From: Xin, Xiaohui
>Hello Xiaohui,
>
>On Thu, 2010-07-29 at 19:14 +0800, xiaohui.xin@intel.com wrote:
>> The idea is simple: pin the guest VM user-space buffers and then
>> let the host NIC driver DMA directly to them.
>> The patches are based on the vhost-net backend driver. We add a
>> device which provides proto_ops such as sendmsg/recvmsg to vhost-net
>> to send/recv directly to/from the NIC driver. A KVM guest that uses
>> the vhost-net backend may bind any ethX interface on the host side
>> to get copyless data transfer through the guest virtio-net frontend.
>
>Since vhost-net already supports macvtap/tun backends, do you think
>it would be better to implement zero copy in macvtap/tun rather than
>introducing a new media passthrough device here?
>

I'm not sure whether that would lead to more duplicated code in the
kernel.

>> Our goal is to improve the bandwidth and reduce the CPU usage.
>> Exact performance data will be provided later.
>
>I did some vhost performance measurements over 10Gb ixgbe and found
>that, in order to get consistent BW results, SMP affinity is required
>for the netperf/netserver, qemu, and vhost threads.
>
>Looking forward to these results for a small-message-size comparison.
>For large message sizes, the 10Gb ixgbe BW ceiling is already reached
>by setting vhost SMP affinity together with offloading support, so
>what remains to be seen is how much CPU utilization can be reduced.
>
>Please provide latency results as well. I did some experiments on
>macvtap zero-copy sendmsg, and what I found is that get_user_pages
>latency is pretty high.
>
Ok, I will try that.

>Thanks
>Shirley

From: Shirley Ma
Hello Avi,

On Fri, 2010-07-30 at 08:02 +0300, Avi Kivity wrote:
> get_user_pages() is indeed slow. But what about
> get_user_pages_fast()?
>
> Note that when the page is first touched, get_user_pages_fast() falls
> back to get_user_pages(), so the latency needs to be measured after
> quite a bit of warm-up.

Yes, I used get_user_pages_fast(); however, it falls back to
get_user_pages() when the app doesn't allocate the buffer on the same
page. If I run a single ping, the RTT is extremely high, but when
running multiple pings the RTT drops significantly; still, it is not
as fast as the copy path in my initial test. I am thinking that we
might need to pre-pin a memory pool.
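
To make the warm-up effect concrete, this is roughly how the zero-copy
send path pins the user buffer (a sketch under my assumptions, not the
macvtap patch itself):

/* Pin the user pages backing [uaddr, uaddr + len).
 * get_user_pages_fast() walks the page tables locklessly; if a page
 * was never touched, or was unpinned and its mapping went away, it
 * falls back to the slow get_user_pages() path, and that fallback is
 * where the single-ping RTT goes. */
#include <linux/kernel.h>
#include <linux/mm.h>

static int pin_user_buffer(unsigned long uaddr, size_t len,
			   struct page **pages, int max_pages)
{
	unsigned long first = uaddr >> PAGE_SHIFT;
	unsigned long last  = (uaddr + len - 1) >> PAGE_SHIFT;
	int npages = min_t(int, (int)(last - first + 1), max_pages);

	return get_user_pages_fast(uaddr, npages, 1 /* write */, pages);
}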

Shirley

From: Michael S. Tsirkin
On Thu, Jul 29, 2010 at 03:31:22PM -0700, Shirley Ma wrote:
> I did some vhost performance measurements over 10Gb ixgbe and found
> that, in order to get consistent BW results, SMP affinity is required
> for the netperf/netserver, qemu, and vhost threads.

Could you provide an example of a good setup?
Specifically, is it a good idea for the vhost thread
to inherit CPU affinities from qemu?

> Looking forward to these results for a small-message-size comparison.

I think we should explore having the driver fall back on data copy
for small message sizes.
The benefit of zero copy would then be lower CPU utilization on large
messages.
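
As a sketch of what that fallback could look like (the threshold is a
placeholder; the real value would have to come from measurement):

/* Sketch of the suggested fallback: copy small messages and pin
 * pages only when the message is large enough that the avoided copy
 * pays for the get_user_pages_fast() cost. */
#include <linux/types.h>

#define ZCOPY_THRESHOLD	256	/* bytes; illustrative, needs tuning */

static bool msg_wants_zerocopy(size_t total_len)
{
	return total_len >= ZCOPY_THRESHOLD;
}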

> For large message sizes, the 10Gb ixgbe BW ceiling is already
> reached by setting vhost SMP affinity together with offloading
> support, so what remains to be seen is how much CPU utilization can
> be reduced.
>
> Please provide latency results as well. I did some experiments on
> macvtap zero-copy sendmsg, and what I found is that get_user_pages
> latency is pretty high.
>
> Thanks
> Shirley
From: Shirley Ma
Hello Avi,

On Sun, 2010-08-01 at 11:18 +0300, Avi Kivity wrote:
> I don't understand. Under what conditions do you use
> get_user_pages()
> instead of get_user_pages_fast()? Why?

The code always calls get_user_pages_fast(); however, the pages are
unpinned in the skb free path if the same page is not used again for
a new buffer. The reason for unpinning is that we don't want to pin
all of the guest kernel memory (to keep memory overcommit working).
So get_user_pages_fast() ends up taking the slow get_user_pages()
path.

Was your previous comment suggesting that we keep the pages pinned so
the get_user_pages_fast() fast path is taken?
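
For concreteness, the unpin-on-free policy I described looks roughly
like this (how the page array travels with the skb is my assumption;
skb->cb[] is just one possible place to stash it):

/* Drop the page pins from the skb destructor so guest memory is not
 * left pinned forever, preserving memory overcommit. */
#include <linux/mm.h>
#include <linux/skbuff.h>
#include <linux/slab.h>

struct zc_ctx {
	struct page	**pages;
	int		npages;
};

static void zc_skb_destructor(struct sk_buff *skb)
{
	/* Assumes the send path stored a zc_ctx pointer in skb->cb. */
	struct zc_ctx *ctx = *(struct zc_ctx **)skb->cb;
	int i;

	for (i = 0; i < ctx->npages; i++)
		put_page(ctx->pages[i]);  /* undo get_user_pages_fast pin */
	kfree(ctx->pages);
	kfree(ctx);
}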

Thanks
Shirley
