From: Avi Kivity on
On 07/26/2010 09:15 AM, Srivatsa Vaddagiri wrote:
> Paravirtual spinlock implementation for KVM guests, based heavily on Xen guest's
> spinlock implementation.
>
>
> +
> +static struct spinlock_stats
> +{
> +        u64 taken;
> +        u32 taken_slow;
> +
> +        u64 released;
> +
> +#define HISTO_BUCKETS 30
> +        u32 histo_spin_total[HISTO_BUCKETS+1];
> +        u32 histo_spin_spinning[HISTO_BUCKETS+1];
> +        u32 histo_spin_blocked[HISTO_BUCKETS+1];
> +
> +        u64 time_total;
> +        u64 time_spinning;
> +        u64 time_blocked;
> +} spinlock_stats;

Could these be replaced by tracepoints when starting to spin/stopping
spinning etc? Then userspace can reconstruct the histogram as well as
see which locks are involved and what call paths.
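
Something along these lines, say - purely a sketch; the event name and
fields are made up here, and the trace_kvm_lock_spinning_done() call
sites aren't shown:

#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvm_lock

#if !defined(_TRACE_KVM_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_KVM_LOCK_H

#include <linux/tracepoint.h>

/*
 * Hypothetical event, for illustration only - not part of the posted
 * patch.  Fired when a cpu gives up spinning and enters the slow path,
 * recording which lock it waited on and how long it spun.
 */
TRACE_EVENT(kvm_lock_spinning_done,

        TP_PROTO(struct arch_spinlock *lock, u64 spin_ns),
        TP_ARGS(lock, spin_ns),

        TP_STRUCT__entry(
                __field(void *, lock)
                __field(u64, spin_ns)
        ),

        TP_fast_assign(
                __entry->lock = lock;
                __entry->spin_ns = spin_ns;
        ),

        TP_printk("lock=%p spun %llu ns",
                  __entry->lock, (unsigned long long)__entry->spin_ns)
);

#endif /* _TRACE_KVM_LOCK_H */

#include <trace/define_trace.h>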

> +struct kvm_spinlock {
> +        unsigned char lock;             /* 0 -> free; 1 -> locked */
> +        unsigned short spinners;        /* count of waiting cpus */
> +};
> +
> +/*
> + * Mark a cpu as interested in a lock by bumping its waiter count.
> + */
> +static inline void spinning_lock(struct kvm_spinlock *pl)
> +{
> +        asm(LOCK_PREFIX " incw %0"
> +            : "+m" (pl->spinners) : : "memory");
> +}
> +
> +/*
> + * Mark a cpu as no longer interested in a lock.
> + */
> +static inline void unspinning_lock(struct kvm_spinlock *pl)
> +{
> +        asm(LOCK_PREFIX " decw %0"
> +            : "+m" (pl->spinners) : : "memory");
> +}
> +
> +static int kvm_spin_is_locked(struct arch_spinlock *lock)
> +{
> +        struct kvm_spinlock *sl = (struct kvm_spinlock *)lock;
> +
> +        return sl->lock != 0;
> +}
> +
> +static int kvm_spin_is_contended(struct arch_spinlock *lock)
> +{
> +        struct kvm_spinlock *sl = (struct kvm_spinlock *)lock;
> +
> +        /* Not strictly true; this is only the count of contended
> +           lock-takers entering the slow path. */
> +        return sl->spinners != 0;
> +}
> +
> +static int kvm_spin_trylock(struct arch_spinlock *lock)
> +{
> +        struct kvm_spinlock *sl = (struct kvm_spinlock *)lock;
> +        u8 old = 1;
> +
> +        asm("xchgb %b0,%1"
> +            : "+q" (old), "+m" (sl->lock) : : "memory");
> +
> +        return old == 0;
> +}
> +
> +static noinline int kvm_spin_lock_slow(struct arch_spinlock *lock)
> +{
> +        struct kvm_spinlock *sl = (struct kvm_spinlock *)lock;
> +        u64 start;
> +
> +        ADD_STATS(taken_slow, 1);
> +
> +        /* announce we're spinning */
> +        spinning_lock(sl);
> +
> +        start = spin_time_start();
> +        kvm_hypercall0(KVM_HC_YIELD);

Oh. This isn't really a yield since we expect to be woken up? It's
more of a sleep.

We already have a sleep hypercall; it's called HLT. If we can use it,
the thing can work on older hosts as well. It's tricky, though:

- if interrupts were enabled before we started spinning, sleep with
interrupts enabled. This also allows the spinner to switch to another
process if some completion comes along, so it's a good idea anyway. The
wakeup is sent as an IPI.
- if not, we need to use an NMI to wake up. This is somewhat icky since
there's no atomic "enable NMI and sleep" instruction, so we have to
handle the case of the wakeup arriving before HLT (this can be done by
examining RIP and seeing whether it's in the critical section); see the
sketch below.
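
Roughly, the guest side might end up looking like the sketch below.
Illustration only - it reuses the helpers from the patch,
halt()/safe_halt() are the existing x86 primitives, and the unlocker
side that actually sends the IPI (or NMI) isn't shown:

static noinline void kvm_spin_lock_slow_hlt(struct arch_spinlock *lock)
{
        struct kvm_spinlock *sl = (struct kvm_spinlock *)lock;
        unsigned long flags;

        /* advertise that we're waiting so the unlocker knows to wake someone */
        spinning_lock(sl);

        for (;;) {
                /*
                 * Recheck with interrupts off so a wakeup IPI can't slip
                 * in between the failed trylock and the hlt; "sti; hlt"
                 * then re-enables and halts atomically.
                 */
                local_irq_save(flags);
                if (kvm_spin_trylock(lock)) {
                        local_irq_restore(flags);
                        break;
                }
                if (!irqs_disabled_flags(flags))
                        safe_halt();            /* woken by the unlocker's IPI */
                else
                        halt();                 /* needs the NMI scheme above */
                local_irq_restore(flags);
        }

        unspinning_lock(sl);
}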

--
error compiling committee.c: too many arguments to function

From: Avi Kivity on
On 07/26/2010 09:15 AM, Srivatsa Vaddagiri wrote:
> Paravirtual spinlock implementation for KVM guests, based heavily on Xen guest's
> spinlock implementation.
>
> +static void kvm_spin_unlock(struct arch_spinlock *lock)
> +{
> +        struct kvm_spinlock *sl = (struct kvm_spinlock *)lock;
> +
> +        ADD_STATS(released, 1);
> +
> +        smp_wmb();      /* make sure no writes get moved after unlock */
> +        sl->lock = 0;   /* release lock */
> +}

Wait, no wakeup?

So it is a yield, not a sleep. I'm worried it could seriously impact
fairness when a non-contending (or non-pv) guest is overcommitted
together with a spin-yield guest.
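
For an actual sleep, the unlock path would also have to kick a waiter,
roughly along Xen's lines. A sketch only - kvm_unlock_kick() stands in
for some hypothetical wake-up hypercall that doesn't exist in this
patch:

static void kvm_spin_unlock_wake(struct arch_spinlock *lock)
{
        struct kvm_spinlock *sl = (struct kvm_spinlock *)lock;

        smp_wmb();              /* keep critical-section writes before the release */
        sl->lock = 0;           /* release the lock */

        smp_mb();               /* make the release visible before checking for sleepers */
        if (unlikely(sl->spinners))
                kvm_unlock_kick(sl);    /* hypothetical: ask the host to wake a waiter */
}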

--
error compiling committee.c: too many arguments to function

From: Jeremy Fitzhardinge on
On 08/02/2010 01:48 AM, Avi Kivity wrote:
> On 07/26/2010 09:15 AM, Srivatsa Vaddagiri wrote:
>> Paravirtual spinlock implementation for KVM guests, based heavily on
>> Xen guest's
>> spinlock implementation.
>>
>>
>> +
>> +static struct spinlock_stats
>> +{
>> +        u64 taken;
>> +        u32 taken_slow;
>> +
>> +        u64 released;
>> +
>> +#define HISTO_BUCKETS 30
>> +        u32 histo_spin_total[HISTO_BUCKETS+1];
>> +        u32 histo_spin_spinning[HISTO_BUCKETS+1];
>> +        u32 histo_spin_blocked[HISTO_BUCKETS+1];
>> +
>> +        u64 time_total;
>> +        u64 time_spinning;
>> +        u64 time_blocked;
>> +} spinlock_stats;
>
> Could these be replaced by tracepoints when starting to spin/stopping
> spinning etc? Then userspace can reconstruct the histogram as well as
> see which locks are involved and what call paths.

Unfortunately not; the tracing code uses spinlocks.

(TBH I haven't actually tried, but I did give the code an eyeball to
this end.)

J

From: Avi Kivity on
On 08/02/2010 06:20 PM, Jeremy Fitzhardinge wrote:
> On 08/02/2010 01:48 AM, Avi Kivity wrote:
>> On 07/26/2010 09:15 AM, Srivatsa Vaddagiri wrote:
>>> Paravirtual spinlock implementation for KVM guests, based heavily on
>>> Xen guest's
>>> spinlock implementation.
>>>
>>>
>>> +
>>> +static struct spinlock_stats
>>> +{
>>> +        u64 taken;
>>> +        u32 taken_slow;
>>> +
>>> +        u64 released;
>>> +
>>> +#define HISTO_BUCKETS 30
>>> +        u32 histo_spin_total[HISTO_BUCKETS+1];
>>> +        u32 histo_spin_spinning[HISTO_BUCKETS+1];
>>> +        u32 histo_spin_blocked[HISTO_BUCKETS+1];
>>> +
>>> +        u64 time_total;
>>> +        u64 time_spinning;
>>> +        u64 time_blocked;
>>> +} spinlock_stats;
>>
>> Could these be replaced by tracepoints when starting to spin/stopping
>> spinning etc? Then userspace can reconstruct the histogram as well
>> as see which locks are involved and what call paths.
>
> Unfortunately not; the tracing code uses spinlocks.
>
> (TBH I haven't actually tried, but I did give the code an eyeball to
> this end.)

Hm. The tracing code already uses a specialized lock (arch_spinlock_t);
perhaps we can make this lock avoid the tracing?

It's really sad, btw: there are all those nice lockless ring buffers,
and then a spinlock for ftrace_vbprintk() instead of a per-cpu buffer.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

From: Jeremy Fitzhardinge on
On 08/02/2010 11:59 PM, Avi Kivity wrote:
> On 08/02/2010 06:20 PM, Jeremy Fitzhardinge wrote:
>> On 08/02/2010 01:48 AM, Avi Kivity wrote:
>>> On 07/26/2010 09:15 AM, Srivatsa Vaddagiri wrote:
>>>> Paravirtual spinlock implementation for KVM guests, based heavily
>>>> on Xen guest's
>>>> spinlock implementation.
>>>>
>>>>
>>>> +
>>>> +static struct spinlock_stats
>>>> +{
>>>> +        u64 taken;
>>>> +        u32 taken_slow;
>>>> +
>>>> +        u64 released;
>>>> +
>>>> +#define HISTO_BUCKETS 30
>>>> +        u32 histo_spin_total[HISTO_BUCKETS+1];
>>>> +        u32 histo_spin_spinning[HISTO_BUCKETS+1];
>>>> +        u32 histo_spin_blocked[HISTO_BUCKETS+1];
>>>> +
>>>> +        u64 time_total;
>>>> +        u64 time_spinning;
>>>> +        u64 time_blocked;
>>>> +} spinlock_stats;
>>>
>>> Could these be replaced by tracepoints when starting to
>>> spin/stopping spinning etc? Then userspace can reconstruct the
>>> histogram as well as see which locks are involved and what call paths.
>>
>> Unfortunately not; the tracing code uses spinlocks.
>>
>> (TBH I haven't actually tried, but I did give the code an eyeball to
>> this end.)
>
> Hm. The tracing code already uses a specialized lock
> (arch_spinlock_t); perhaps we can make this lock avoid the tracing?

That's not really a specialized lock; it's just the naked
architecture-provided spinlock implementation, without all the lockdep
and so on layered on top. All these changes are at a lower level, so
giving tracing its own type of spinlock amounts to making the
architectures provide two complete spinlock implementations. We could
make tracing use, for example, an rwlock, so long as we promise not to
put tracing in the rwlock implementation - but that's hardly elegant.

> It's really sad, btw: there are all those nice lockless ring buffers,
> and then a spinlock for ftrace_vbprintk() instead of a per-cpu buffer.

Sad indeed.

J
