From: Andi Kleen
On Thu, Jun 03, 2010 at 10:38:32PM +1000, Nick Piggin wrote:
> And they aren't even using ticket spinlocks!!

I suppose they simply don't have unfair memory. Makes things easier.

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Nick Piggin
On Thu, Jun 03, 2010 at 05:17:30PM +0200, Andi Kleen wrote:
> On Thu, Jun 03, 2010 at 10:38:32PM +1000, Nick Piggin wrote:
> > And they aren't even using ticket spinlocks!!
>
> I suppose they simply don't have unfair memory. Makes things easier.

That would certainly be part of it; I'm sure they provide stronger
fairness guarantees at the expense of some performance. We first saw the
spinlock starvation on 8-16 core Opterons, I think, whereas Altix has
been over 1024 cores and POWER7 is at 1024 threads now, apparently
without reported problems.

However I think more is needed than simply "fair" memory at the
cache-coherency level, considering that, for example, s390 implements
its spinlock simply by retrying cas until it succeeds. The interconnect
could round-robin all cacheline requests for the lock word perfectly,
and yet one core could still find that every time it is granted the
cacheline, the lock is already taken.
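To make the failure mode concrete, here is a minimal sketch (in C11 atomics, not the actual s390 code) of a cas-retry lock of the kind described above. Even if the fabric grants the cacheline to waiters in perfect round-robin order, a waiter can lose every grant because the lock happens to be held at that moment, so fair memory alone does not guarantee lock fairness. All names here are illustrative.

```c
#include <stdatomic.h>

/* Hypothetical sketch of a cas-retry spinlock: acquisition simply
 * retries compare-and-swap until it succeeds.  Nothing here orders
 * the waiters, so a core can be granted the cacheline repeatedly
 * while the lock is held and make no progress. */

typedef struct {
    atomic_int owner;            /* 0 = free, otherwise CPU id + 1 */
} cas_lock_t;

static void cas_lock(cas_lock_t *l, int cpu)
{
    int expected;
    do {
        expected = 0;            /* only succeeds if the lock is free */
    } while (!atomic_compare_exchange_weak(&l->owner, &expected, cpu + 1));
}

static void cas_unlock(cas_lock_t *l)
{
    atomic_store(&l->owner, 0);
}
```

A waiter that repeatedly observes `owner != 0` spins indefinitely with no ticket or queue position to fall back on, which is exactly the starvation window.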

So I think actively enforcing fairness at the lock level would be
required: something like, if a core is detected making no progress in a
tight cas loop, it enters a queue of cores in which the head of the
queue is always granted the cacheline first after it has been dirtied.
Interrupts would have to be excluded from this logic. Even that doesn't
solve the problem of an owner unfairly releasing the lock and
immediately grabbing it again; more detection would be needed to handle
that.
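The software analog of such a queue is an MCS-style lock: each waiter enqueues a node and spins on its own flag, and release hands the lock to the next node in FIFO order, so no waiter starves regardless of how the cachelines are arbitrated. This is a hypothetical sketch in C11 atomics, not a kernel API; the names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* MCS-style queue lock sketch: waiters form an explicit FIFO and
 * each spins only on its own node's flag. */

typedef struct mcs_node {
    struct mcs_node *_Atomic next;
    atomic_bool locked;
} mcs_node_t;

typedef struct {
    mcs_node_t *_Atomic tail;    /* last waiter in the queue, or NULL */
} mcs_lock_t;

static void mcs_lock(mcs_lock_t *l, mcs_node_t *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    mcs_node_t *prev = atomic_exchange(&l->tail, me);
    if (prev) {                           /* queue non-empty: link and wait */
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))
            ;                             /* spin on our own flag only */
    }
}

static void mcs_unlock(mcs_lock_t *l, mcs_node_t *me)
{
    mcs_node_t *succ = atomic_load(&me->next);
    if (!succ) {
        mcs_node_t *expect = me;
        if (atomic_compare_exchange_strong(&l->tail, &expect, NULL))
            return;                       /* no successor: lock is free */
        while (!(succ = atomic_load(&me->next)))
            ;                             /* successor is mid-enqueue */
    }
    atomic_store(&succ->locked, false);   /* hand off in FIFO order */
}
```

Because each waiter spins on its own node, the hot lock word is not bounced between all contenders either, which is why queue locks scale better under heavy contention.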

I don't know how far hardware goes. Maybe it is enough to avoid
starvation statistically if memory is reasonably fair. But it does seem
a lot easier to enforce fairness in software.
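The ticket spinlock mentioned at the top of the thread is the simplest software enforcement: each acquirer takes the next ticket and spins until the "now serving" counter reaches it, giving strict FIFO order no matter how the hardware arbitrates the cacheline. A minimal sketch in C11 atomics (illustrative, not the kernel's implementation):

```c
#include <stdatomic.h>

/* Ticket-lock sketch: FIFO fairness enforced entirely in software. */

typedef struct {
    atomic_uint next;     /* next ticket to hand out */
    atomic_uint serving;  /* ticket currently allowed to hold the lock */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *l)
{
    unsigned me = atomic_fetch_add(&l->next, 1);  /* take a ticket */
    while (atomic_load(&l->serving) != me)
        ;                                         /* spin until our turn */
}

static void ticket_unlock(ticket_lock_t *l)
{
    atomic_fetch_add(&l->serving, 1);             /* admit next waiter */
}
```

The trade-off is that every waiter spins on the same `serving` word, so under heavy contention the cacheline still bounces; fairness is gained, raw throughput can drop.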

From: Andi Kleen
> That would certainly be part of it; I'm sure they provide stronger
> fairness guarantees at the expense of some performance. We first saw
> the spinlock starvation on 8-16 core Opterons, I think, whereas Altix
> has been over 1024 cores and POWER7 is at 1024 threads now, apparently
> without reported problems.

I suppose P7 handles that in the HV through the pvcall.

Altix AFAIK has special hardware for this in the interconnect,
but as individual nodes get larger and have more cores you'll start
seeing it there too.

In general we now have the problem that, with increasing core counts
per socket, each NUMA node can be a fairly large SMP by itself, and
several of the old SMP scalability problems that were fixed by having
per-node data structures are back now.

For example, this is already a serious problem with the zone locks in
some workloads on 8-core + HT systems.

> So I think actively enforcing fairness at the lock level would be
> required. Something like if it is detected that a core is not making

I suppose exactly how that works is IBM's secret sauce. Anyway, as long
as there are no reports I wouldn't worry about it.

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.