use unfair spinlock when running on hypervisor. [Kernel]


From: Avi Kivity on 1 Jun 2010 23:00

On 06/01/2010 08:27 PM, Andi Kleen wrote:
> On Tue, Jun 01, 2010 at 07:52:28PM +0300, Avi Kivity wrote:
>
>> We are running everything on NUMA (since all modern machines are now NUMA).
>> At what scale do the issues become observable?
>>
> On Intel platforms it's visible starting with 4 sockets.
>

Can you recommend a benchmark that shows bad behaviour? I'll run it
with ticket spinlocks and Gleb's patch. I have a 4-way Nehalem-EX,
presumably the huge number of threads will magnify the problem even more
there.

>>>> I understand that reason and do not propose to get back to old spinlock
>>>> on physical HW! But with virtualization performance hit is unbearable.
>>>>
>>>>
>>> Extreme unfairness can be unbearable too.
>>>
>>>
>> Well, the question is what happens first. In our experience, vcpu
>> overcommit is a lot more painful. People will never see the NUMA
>> unfairness issue if they can't use kvm due to the vcpu overcommit problem.
>>
> You really have to address both, if you don't fix them both
> users will eventually run into one of them and be unhappy.
>

That's definitely the long term plan. I consider Gleb's patch the first
step.

Do you have any idea how we can tackle both problems?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Srivatsa Vaddagiri on 2 Jun 2010 01:30

On Wed, Jun 02, 2010 at 05:51:14AM +0300, Avi Kivity wrote:
> That's definitely the long term plan. I consider Gleb's patch the
> first step.
>
> Do you have any idea how we can tackle both problems?

I recall Xen posting some solution for a similar problem:

http://lkml.org/lkml/2010/1/29/45

Wouldn't a similar approach help KVM as well?

- vatsa

From: H. Peter Anvin on 2 Jun 2010 03:50

On 06/01/2010 10:39 AM, Valdis.Kletnieks(a)vt.edu wrote:
> On Tue, 01 Jun 2010 19:52:28 +0300, Avi Kivity said:
>> On 06/01/2010 07:38 PM, Andi Kleen wrote:
>>>>> Your new code would starve again, right?
>>> Try it on a NUMA system with unfair memory.
>
>> We are running everything on NUMA (since all modern machines are now
>> NUMA). At what scale do the issues become observable?
>
> My 6-month-old laptop is NUMA? Comes as a surprise to me, and to the
> perfectly-running NUMA=n kernel I'm running.
>
> Or did you mean a less broad phrase than "all modern machines"?
>

All modern multisocket machines, unless configured in interleaved memory
mode.

-hpa


From: Andi Kleen on 2 Jun 2010 05:00

On Wed, Jun 02, 2010 at 05:51:14AM +0300, Avi Kivity wrote:
> On 06/01/2010 08:27 PM, Andi Kleen wrote:
>> On Tue, Jun 01, 2010 at 07:52:28PM +0300, Avi Kivity wrote:
>>
>>> We are running everything on NUMA (since all modern machines are now NUMA).
>>> At what scale do the issues become observable?
>>>
>> On Intel platforms it's visible starting with 4 sockets.
>>
>
> Can you recommend a benchmark that shows bad behaviour? I'll run it with

Pretty much anything with high lock contention.

> ticket spinlocks and Gleb's patch. I have a 4-way Nehalem-EX, presumably
> the huge number of threads will magnify the problem even more there.

Yes more threads cause more lock contention too.

> Do you have any idea how we can tackle both problems?

Apparently Xen has something, perhaps that can be leveraged
(but I haven't looked at their solution in detail)

Otherwise I would probably try to start with an adaptive
spinlock that at some point calls into the HV (or updates
shared memory?), like john cooper suggested. The tricky part here would
be to find the thresholds and fit that state into
paravirt ops and the standard spinlock_t.
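
The spin-then-call-out idea above might look roughly like the following. This is only a sketch under stated assumptions: `adaptive_lock`, `yield_fn`, `SPIN_THRESHOLD` and the helper names are hypothetical and are not kernel or paravirt-ops interfaces. It uses a plain test-and-set lock in user-space C11 rather than the kernel's spinlock_t, and `sched_yield()` stands in for whatever hypercall or shared-memory update a real hypervisor interface would provide.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical threshold; finding a good value is the open question. */
#define SPIN_THRESHOLD 1024u

/* Hypothetical lock type, not the kernel's spinlock_t. */
struct adaptive_lock {
    atomic_flag held;      /* set while some CPU owns the lock */
    unsigned long yields;  /* how often a waiter gave up spinning */
};

/* Stand-in for a paravirt hook that hands the CPU back to the host. */
typedef void (*yield_fn)(struct adaptive_lock *);

static inline void default_yield(struct adaptive_lock *l)
{
    (void)l;
    sched_yield();  /* assumption: a real guest would issue a hypercall */
}

static inline void adaptive_lock_release(struct adaptive_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Spin for a bounded number of attempts, then call out and start over. */
static inline void adaptive_lock_acquire(struct adaptive_lock *l, yield_fn yield)
{
    unsigned spins = 0;

    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (++spins >= SPIN_THRESHOLD) {
            l->yields++;
            yield(l);
            spins = 0;
        }
    }
}

/* Test helper: pretend the preempted holder ran and dropped the lock. */
static inline void fake_holder_runs(struct adaptive_lock *l)
{
    adaptive_lock_release(l);
}

/* Single-threaded demo: the second acquire has to yield exactly once. */
static inline unsigned long demo_yield_count(void)
{
    struct adaptive_lock l = { ATOMIC_FLAG_INIT, 0 };

    adaptive_lock_acquire(&l, fake_holder_runs);  /* uncontended */
    assert(l.yields == 0);
    adaptive_lock_acquire(&l, fake_holder_runs);  /* held: spin, then yield */
    return l.yields;
}
```

The threshold is a compile-time constant here only to keep the sketch short; a fixed spin count says nothing about where that state would live in paravirt ops or spinlock_t.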

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.

From: Avi Kivity on 2 Jun 2010 05:10

On 06/02/2010 11:50 AM, Andi Kleen wrote:
> On Wed, Jun 02, 2010 at 05:51:14AM +0300, Avi Kivity wrote:
>
>> On 06/01/2010 08:27 PM, Andi Kleen wrote:
>>
>>> On Tue, Jun 01, 2010 at 07:52:28PM +0300, Avi Kivity wrote:
>>>
>>>
>>>> We are running everything on NUMA (since all modern machines are now NUMA).
>>>> At what scale do the issues become observable?
>>>>
>>>>
>>> On Intel platforms it's visible starting with 4 sockets.
>>>
>>>
>> Can you recommend a benchmark that shows bad behaviour? I'll run it with
>>
> Pretty much anything with high lock contention.
>

Okay, we'll try to measure it here as soon as we can switch the machine
into NUMA mode.

>> Do you have any idea how we can tackle both problems?
>>
> Apparently Xen has something, perhaps that can be leveraged
> (but I haven't looked at their solution in detail)
>
> Otherwise I would probably try to start with an adaptive
> spinlock that at some point calls into the HV (or updates
> shared memory?), like john cooper suggested. The tricky part here would
> be to find the thresholds and fit that state into
> paravirt ops and the standard spinlock_t.
>
>

There are two separate problems: the more general problem is that the
hypervisor can put a vcpu to sleep while holding a lock, causing other
vcpus to spin until the end of their time slice. This can only be
addressed with hypervisor help. The second problem is that the extreme
fairness of ticket locks causes lots of context switches if the
hypervisor helps, and aggravates the first problem horribly if it
doesn't (since now a vcpu will spin waiting for its ticket even if the
lock is unlocked).

So yes, we'll need hypervisor assistance, but even with that we'll need
to reduce ticket lock fairness (retaining global fairness but
sacrificing some local fairness). I imagine that will be helpful for
non-virt as well as local unfairness reduces bounciness.
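
To illustrate the second problem, here is a minimal ticket lock sketch in user-space C11. It is an assumption-laden illustration, not the kernel's implementation: `ticket_lock` and all helper names are hypothetical. The demo function walks through the case where the lock has been released but a later waiter still cannot take it, because the ticket now being served belongs to a waiter whose vcpu is not running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal ticket lock; not the kernel's arch_spinlock_t. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

/* Join the queue and return our position in it. */
static inline unsigned ticket_take(struct ticket_lock *l)
{
    return atomic_fetch_add(&l->next, 1);
}

/* True once it is this ticket's turn. */
static inline bool ticket_is_served(struct ticket_lock *l, unsigned t)
{
    return atomic_load(&l->owner) == t;
}

/* Normal acquire: strictly FIFO, so we spin until our own ticket comes up. */
static inline void ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned t = ticket_take(l);

    while (!ticket_is_served(l, t))
        ;  /* spin */
}

/* Release: pass the lock to the next ticket in line. */
static inline void ticket_unlock(struct ticket_lock *l)
{
    atomic_fetch_add(&l->owner, 1);
}

/*
 * Single-threaded walk-through: A holds the lock, B and C queue up.
 * A unlocks while B's vcpu is preempted (B never runs here). Nobody
 * holds the lock, yet C still cannot get it.
 */
static inline bool lock_free_but_waiter_blocked(void)
{
    struct ticket_lock l;

    atomic_init(&l.next, 0);
    atomic_init(&l.owner, 0);

    unsigned a = ticket_take(&l);  /* ticket 0: A owns the lock */
    unsigned b = ticket_take(&l);  /* ticket 1: B waits */
    unsigned c = ticket_take(&l);  /* ticket 2: C waits */

    assert(ticket_is_served(&l, a));
    ticket_unlock(&l);             /* A releases; ticket 1 is now served */
    assert(ticket_is_served(&l, b));

    return !ticket_is_served(&l, c);  /* C keeps spinning behind B */
}
```

With an unfair test-and-set lock, C could simply have taken the lock at that point, which is the trade-off being discussed in this thread.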

--
error compiling committee.c: too many arguments to function

