From: Srivatsa Vaddagiri on
On Thu, Jun 03, 2010 at 10:38:32PM +1000, Nick Piggin wrote:
> Holding a ticket in the queue is effectively the same as holding the
> lock, from the pov of processes waiting behind.
>
> The difference of course is that CPU cycles do not directly reduce
> latency of ticket holders (only the owner). Spinlock critical sections
> should tend to be several orders of magnitude shorter than context
> switch times. So if you preempt the guy waiting at the head of the
> queue, then it's almost as bad as preempting the lock holder.

Ok, got it - although that approach is not advisable in some cases, for
example when the lock holder vcpu and the lock-acquiring vcpu are scheduled
on the same pcpu by the hypervisor (which was experimented with in [1],
where they found a huge hit in perf).

I agree that in general we should look at deferring preemption of the lock
acquirer, especially when it's at the "head" of the queue as you suggest -
I will consider that approach as the next step (I want to make incremental
progress, basically!).
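
Something like the following is the rough shape I have in mind for the
hold-path half (a sketch only - the shared-page layout and the pv_* names
are invented for illustration, not an actual interface):

/*
 * Guest raises a "please defer my preemption" hint around a lock hold
 * (and possibly head-of-queue) window; the hypervisor, on seeing the
 * hint when it wants to preempt, grants a short grace period instead
 * of descheduling the vcpu immediately.
 */
struct pv_sched_hint {
	int defer_preempt;		/* set by guest, read by hypervisor */
};

static struct pv_sched_hint pv_hint[NR_CPUS];	/* hypervisor-shared memory */

static inline void pv_defer_start(void)
{
	pv_hint[smp_processor_id()].defer_preempt = 1;
	barrier();	/* hint must be visible before we enter the section */
}

static inline void pv_defer_end(void)
{
	barrier();	/* section fully done before the hint is dropped */
	pv_hint[smp_processor_id()].defer_preempt = 0;
}

i.e. the pv lock path would bracket arch_spin_lock()/arch_spin_unlock()
with pv_defer_start()/pv_defer_end().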

> > > Have you also looked at how s390 checks if the owning vcpu is running
> > > and if so it spins, if not yields to the hypervisor. Something like
> > > turning it into an adaptive lock. This could be applicable as well.
> >
> > I don't think even s390 does adaptive spinlocks. Also afaik s390 zVM does gang
> > scheduling of vcpus, which reduces the severity of this problem very much -
> > essentially lock acquirer/holder are run simultaneously on different cpus all
> > the time. Gang scheduling is on my list of things to look at much later
> > (although I have been warned that it's a scalability nightmare!).
>
> It effectively is pretty well an adaptive lock. The spinlock itself
> doesn't sleep of course, but it yields to the hypervisor if the owner
> has been preempted. This is closely analogous to Linux adaptive mutexes.

Oops, you are right - sorry, I should have checked more closely earlier.
Given that we may not always be able to guarantee that locked critical
sections will not be preempted (for example, when a real-time task takes
over), we will need a combination of both approaches (i.e. request a
preemption deferral on the lock hold path + yield on the lock acquire path
if the owner is !scheduled). The advantage of the former approach is that
it could reduce job turnaround times in most cases (as the lock is
available when we want it, or we don't have to wait too long for it).
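
To make the acquire-side half concrete, something along these lines is
what I mean (pure sketch - vcpu_is_preempted()/hypervisor_yield() and
friends are made-up names for whatever pv interface this ends up using):

static void pv_ticket_spin_lock(struct ticket_spinlock *lock)
{
	/* take our place in the queue */
	unsigned int ticket = ticket_get(lock);

	while (!ticket_is_ours(lock, ticket)) {
		/*
		 * If the current owner vcpu has been preempted by the
		 * hypervisor, spinning only burns the rest of our slice;
		 * yield it instead (ideally directed at the owner).
		 */
		if (vcpu_is_preempted(lock_owner_vcpu(lock)))
			hypervisor_yield();
		else
			cpu_relax();
	}
}

(Identifying the owner at all is of course the hard part with ticket
locks, given how little state fits in the lock word.)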

> s390 also has the diag9c instruction which I suppose somehow boosts
> priority of a preempted contended lock holder. In spite of any other
> possible optimizations in their hypervisor like gang scheduling,
> diag9c apparently provides quite a large improvement in some cases.

Ok - thanks for that pointer - I will have a look at diag9c.
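
For reference, my (unverified) understanding of its shape, paraphrased
from arch/s390/lib/spinlock.c - the helper name here is mine, so please
check the tree for the exact form:

static inline void yield_to(unsigned int cpu)
{
	if (MACHINE_HAS_DIAG9C)
		/* directed yield: hint to run this (preempted) cpu */
		asm volatile("diag %0,0,0x9c" : : "d" (cpu));
}

so the spin loop can nudge a preempted lock owner back onto a pcpu.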

> So I think these things are fairly important to look at.

I agree.

- vatsa
From: Srivatsa Vaddagiri on
On Thu, Jun 03, 2010 at 06:28:21PM +0530, Srivatsa Vaddagiri wrote:
> Ok, got it - although that approach is not advisable in some cases, for
> example when the lock holder vcpu and the lock-acquiring vcpu are scheduled
> on the same pcpu by the hypervisor (which was experimented with in [1],
> where they found a huge hit in perf).

1. http://lkml.org/lkml/2010/4/13/464

- vatsa
From: Nick Piggin on
On Thu, Jun 03, 2010 at 06:28:21PM +0530, Srivatsa Vaddagiri wrote:
> On Thu, Jun 03, 2010 at 10:38:32PM +1000, Nick Piggin wrote:
> > Holding a ticket in the queue is effectively the same as holding the
> > lock, from the pov of processes waiting behind.
> >
> > The difference of course is that CPU cycles do not directly reduce
> > latency of ticket holders (only the owner). Spinlock critical sections
> > should tend to be several orders of magnitude shorter than context
> > switch times. So if you preempt the guy waiting at the head of the
> > queue, then it's almost as bad as preempting the lock holder.
>
> Ok, got it - although that approach is not advisable in some cases, for
> example when the lock holder vcpu and the lock-acquiring vcpu are scheduled
> on the same pcpu by the hypervisor (which was experimented with in [1],
> where they found a huge hit in perf).

Sure, but if you had adaptive yielding, that would solve the problem.


> I agree that in general we should look at deferring preemption of the lock
> acquirer, especially when it's at the "head" of the queue as you suggest -
> I will consider that approach as the next step (I want to make incremental
> progress, basically!).
>
> > > > Have you also looked at how s390 checks if the owning vcpu is running
> > > > and if so it spins, if not yields to the hypervisor. Something like
> > > > turning it into an adaptive lock. This could be applicable as well.
> > >
> > > I don't think even s390 does adaptive spinlocks. Also afaik s390 zVM does gang
> > > scheduling of vcpus, which reduces the severity of this problem very much -
> > > essentially lock acquirer/holder are run simultaneously on different cpus all
> > > the time. Gang scheduling is on my list of things to look at much later
> > > (although I have been warned that it's a scalability nightmare!).
> >
> > It effectively is pretty well an adaptive lock. The spinlock itself
> > doesn't sleep of course, but it yields to the hypervisor if the owner
> > has been preempted. This is closely analogous to Linux adaptive mutexes.
>
> Oops, you are right - sorry, I should have checked more closely earlier.
> Given that we may not always be able to guarantee that locked critical
> sections will not be preempted (for example, when a real-time task takes
> over), we will need a combination of both approaches (i.e. request a
> preemption deferral on the lock hold path + yield on the lock acquire path
> if the owner is !scheduled). The advantage of the former approach is that
> it could reduce job turnaround times in most cases (as the lock is
> available when we want it, or we don't have to wait too long for it).

I think both would be good. It might be interesting to talk with the
s390 guys and see if they can look at ticket locks and preempt-defer
techniques too (considering they already do the other half of the
equation well).


> > s390 also has the diag9c instruction which I suppose somehow boosts
> > priority of a preempted contended lock holder. In spite of any other
> > possible optimizations in their hypervisor like gang scheduling,
> > diag9c apparently provides quite a large improvement in some cases.
>
> Ok - thanks for that pointer - I will have a look at diag9c.
>
> > So I think these things are fairly important to look at.
>
> I agree.
>
> - vatsa
From: Srivatsa Vaddagiri on
On Thu, Jun 03, 2010 at 11:45:00PM +1000, Nick Piggin wrote:
> > Ok, got it - although that approach is not advisable in some cases, for
> > example when the lock holder vcpu and the lock-acquiring vcpu are
> > scheduled on the same pcpu by the hypervisor (which was experimented
> > with in [1], where they found a huge hit in perf).
>
> Sure, but if you had adaptive yielding, that would solve the problem.

I guess so.

> > Oops, you are right - sorry, I should have checked more closely earlier.
> > Given that we may not always be able to guarantee that locked critical
> > sections will not be preempted (for example, when a real-time task takes
> > over), we will need a combination of both approaches (i.e. request a
> > preemption deferral on the lock hold path + yield on the lock acquire
> > path if the owner is !scheduled). The advantage of the former approach
> > is that it could reduce job turnaround times in most cases (as the lock
> > is available when we want it, or we don't have to wait too long for it).
>
> I think both would be good. It might be interesting to talk with the
> s390 guys and see if they can look at ticket locks and preempt-defer
> techniques too (considering they already do the other half of the
> equation well).

Martin/Heiko,
Do you want to comment on this?

- vatsa
From: Andi Kleen on
On Thu, Jun 03, 2010 at 12:06:39PM +0100, David Woodhouse wrote:
> On Tue, 2010-06-01 at 21:36 +0200, Andi Kleen wrote:
> > > Collecting the contention/usage statistics on a per spinlock
> > > basis seems complex. I believe a practical approximation
> > > to this are adaptive mutexes where upon hitting a spin
> > > time threshold, punt and let the scheduler reconcile fairness.
> >
> > That would probably work, except: how do you get the
> > adaptive spinlock into a paravirt op without slowing
> > down a standard kernel?
>
> It only ever comes into play in the case where the spinlock is contended
> anyway -- surely it shouldn't be _that_ much of a performance issue?

The problem is fitting the extra state into the u32 spinlock word.

Also, "lightly contended" is not that uncommon.

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.