x86: enlightenment for ticket spin locks - Xen implementation [Kernel]


From: Jeremy Fitzhardinge on 30 Jun 2010 09:30

On 06/30/2010 01:31 PM, Jan Beulich wrote:
>>>> On 30.06.10 at 12:07, Jeremy Fitzhardinge <jeremy(a)goop.org> wrote:
>>>>
>> On 06/29/2010 04:32 PM, Jan Beulich wrote:
>>
>>> Use the (alternative instructions based) callout hooks to the ticket
>>> spinlock code to enlighten ticket locks when running fully virtualized
>>> on Xen. Ultimately, this code might also be a candidate to be used
>>> when running para-virtualized.
>>>
>>>
>> I'm not sure what the gain is by making this independent of all the rest
>> of the Xen support in the kernel. Stefano is working on a series
>> (posted a few times now) to add more paravirtual features for Xen HVM
>> guests, and this work is conceptually very similar.
>>
> The intention really is for PARAVIRT_SPINLOCKS to go away as soon
> as pv-ops Xen can be switched over to this mechanism.
>

I don't see the point of having two distinct implementations of
paravirtualization, especially since they have similar mechanisms
(patching, etc).

>> Also, I'm not very keen on adding yet another kind of patching mechanism
>> to the kernel. While they're easy enough to get working in the first
>> place, they do tend to be fragile when other changes get introduced
>> (like changes to how the kernel is mapped RO/NX, etc), and this one will
>> be exercised relatively rarely. I don't see why the pvops mechanism
>> couldn't be reused, especially now that each set of ops can be
>> individually configured on/off.
>>
> Wasn't the main complaint with using pvops patching that it
> introduced extra calls into the native execution path? The point
> of this "new" (it's not really new, it's using existing infrastructure)
> mechanism is just to avoid such overhead for native.
>

When a particular class of pv calls is enabled in the config file, their
baseline overhead amounts to a 6-byte nop. When in use, they are a direct
call (or <= 6 bytes of inlined instructions). It's possible to add more
padding space if it's important to have larger inlined sequences.

For spinlocks, the pvop calls should only be in the slow case: when a
spinlock has been spinning for long enough, and on unlock when there's
someone waiting for the lock. The fastpath (no contention lock and
unlock) should have no extra calls.
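
To make the fast/slow split concrete, here is a minimal user-space sketch in
C11. It is not the kernel's ticket lock code: the lock_ops table, the
SPIN_THRESHOLD value, the waiters counter and the counting hooks are all
hypothetical names invented for illustration. The only point it shows is that
the hooks are reached after a long spin or on a contended unlock, and never on
the uncontended path.

```c
#include <assert.h>
#include <stdatomic.h>

#define SPIN_THRESHOLD 1024u  /* hypothetical: spins before taking the slow path */

struct ticket_lock {
    atomic_uint next;     /* next ticket to hand out */
    atomic_uint owner;    /* ticket currently being served */
    atomic_uint waiters;  /* number of CPUs currently spinning */
};

/* Hypothetical hook table standing in for the pvop calls. */
struct lock_ops {
    void (*spin_slow)(struct ticket_lock *lock, unsigned ticket);
    void (*unlock_kick)(struct ticket_lock *lock);
};

/* Stand-in hooks that only count invocations; a real backend
 * would block the spinning vcpu or kick a waiting one. */
static unsigned slow_calls, kick_calls;

static void count_slow(struct ticket_lock *lock, unsigned ticket)
{
    (void)lock;
    (void)ticket;
    slow_calls++;
}

static void count_kick(struct ticket_lock *lock)
{
    (void)lock;
    kick_calls++;
}

static struct lock_ops lock_ops = { count_slow, count_kick };

static void ticket_lock(struct ticket_lock *lock)
{
    unsigned me = atomic_fetch_add(&lock->next, 1);
    unsigned spins = 0;

    if (atomic_load(&lock->owner) == me)
        return;                             /* fast path: no hook call */

    atomic_fetch_add(&lock->waiters, 1);
    while (atomic_load(&lock->owner) != me) {
        if (++spins == SPIN_THRESHOLD) {    /* spun for long enough */
            lock_ops.spin_slow(lock, me);
            spins = 0;
        }
    }
    atomic_fetch_sub(&lock->waiters, 1);
}

static void ticket_unlock(struct ticket_lock *lock)
{
    /* The lock must be held: at least one ticket is outstanding. */
    assert(atomic_load(&lock->next) != atomic_load(&lock->owner));

    atomic_fetch_add(&lock->owner, 1);
    if (atomic_load(&lock->waiters) != 0)   /* only when someone is waiting */
        lock_ops.unlock_kick(lock);
}
```

An uncontended ticket_lock()/ticket_unlock() pair in this sketch touches
neither hook, which is the property being argued for.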

So I don't think pvops overhead is really an issue here. Certainly I
don't think it's worth prematurely optimising for.

>> This is especially acute in the case where you are using a full
>> pvops-using PV kernel, where it ends up using two mechanisms to put
>> paravirtualizations in place.
>>
> And I see nothing wrong with this - if the individual pieces are
> separate anyway, why shouldn't each of them use the most
> efficient technique?

Pluralitas non est ponenda sine necessitate.

Not every one of them needs the most efficient technique, as that just
multiplies the number of different mechanisms which need to be
maintained. New mechanisms should only be introduced if one of the
existing ones is really, clearly, deficient.

> Or if a single mechanism is desirable, shouldn't
> one rather ask to convert the newer pvops patching mechanism
> to the alternative instruction patching one, as that had been in
> place long before?

pvops patching is a superset of alternative instruction patching, but the
two are really designed to serve different purposes. There are some areas
in which there's some overlap, but otherwise they are distinct. In
particular, alternative instructions are really only useful if you can
express the patch in terms of the presence or absence of a particular cpu
feature. They can't do multi-way choice, and they can't do anything other
than insert literal instructions. pvops patching can do multi-way, and has
a higher-level view of each patch site which allows it to do things like
generate appropriate save/restores, make inline vs call decisions, nop
out nop callsites, etc.
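
A rough sketch of what "multi-way" means here, in plain C rather than the
kernel's actual pvops structures: one ops vector is bound at boot to one of
several backends, whereas an alternatives-style patch picks between exactly
two encodings based on a single cpu feature bit. The enum, the irq_ops struct,
bind_ops() and the backend functions below are hypothetical names, and the
backends only count calls; they do not reflect how any real hypervisor
interface behaves.

```c
#include <assert.h>

/* Hypothetical set of platforms; more than two, hence "multi-way". */
enum platform { PLATFORM_NATIVE, PLATFORM_XEN, PLATFORM_KVM, PLATFORM_COUNT };

/* Ops vector placed in front of an OS-level interface. */
struct irq_ops {
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

static int irq_masked;                          /* simulated interrupt state */
static unsigned backend_calls[PLATFORM_COUNT];  /* calls seen per backend */

static void native_disable(void) { backend_calls[PLATFORM_NATIVE]++; irq_masked = 1; }
static void native_enable(void)  { backend_calls[PLATFORM_NATIVE]++; irq_masked = 0; }
static void xen_disable(void)    { backend_calls[PLATFORM_XEN]++;    irq_masked = 1; }
static void xen_enable(void)     { backend_calls[PLATFORM_XEN]++;    irq_masked = 0; }
static void kvm_disable(void)    { backend_calls[PLATFORM_KVM]++;    irq_masked = 1; }
static void kvm_enable(void)     { backend_calls[PLATFORM_KVM]++;    irq_masked = 0; }

static const struct irq_ops backends[PLATFORM_COUNT] = {
    [PLATFORM_NATIVE] = { native_disable, native_enable },
    [PLATFORM_XEN]    = { xen_disable,    xen_enable    },
    [PLATFORM_KVM]    = { kvm_disable,    kvm_enable    },
};

/* Callers always go through active_ops; it defaults to native. */
static struct irq_ops active_ops = { native_disable, native_enable };

/* Bind the ops vector once, early, to whichever backend was detected. */
static void bind_ops(enum platform p)
{
    assert(p < PLATFORM_COUNT);
    active_ops = backends[p];
}
```

In this sketch every call stays indirect through active_ops. The patching
step discussed in this thread is what would later turn such a call site into
a direct call, a short inlined sequence, or a nop.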

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: H. Peter Anvin on 30 Jun 2010 18:20

On 06/30/2010 06:23 AM, Jeremy Fitzhardinge wrote:
>
> pvops is a superset of alternative instruction patching, and are really
> designed to serve different purposes. There are some areas in which
> there's some overlap, but otherwise they are distinct. In particular,
> alternative instructions are really only useful if you can express the
> patch in terms of the presence or absence of a particular cpu feature.
> It can't do multi-way choice, and it can't do anything other than insert
> literal instructions. pvops patching can do multi-way, and has a
> higher-level view of each patch site which allows it to do things like
> generate appropriate save/restores, make inline vs call decisions, nop
> out nop callsites, etc.
>

A lot of this -- in particular the multiway -- is a defect in the
alternatives implementation and should have been addressed as such. One
of the biggest problems with pvops as it currently stands is that it is
monolithic; in general we have this class of problems (static selection)
in a *lot* more places than we're dealing with right now, and as such,
generalizing *something* -- be it pvops or alternatives -- would be useful.

gcc 4.5 also includes a very powerful facility called "asm goto", which
I have already used to implement static_cpu_has(). Again, that
particular construct (unlike "asm goto" itself) doesn't support multiway.

-hpa

From: Jeremy Fitzhardinge on 5 Jul 2010 19:20

On 06/30/2010 03:14 PM, H. Peter Anvin wrote:
> On 06/30/2010 06:23 AM, Jeremy Fitzhardinge wrote:
>
>> pvops is a superset of alternative instruction patching, and are really
>> designed to serve different purposes. There are some areas in which
>> there's some overlap, but otherwise they are distinct. In particular,
>> alternative instructions are really only useful if you can express the
>> patch in terms of the presence or absence of a particular cpu feature.
>> It can't do multi-way choice, and it can't do anything other than insert
>> literal instructions. pvops patching can do multi-way, and has a
>> higher-level view of each patch site which allows it to do things like
>> generate appropriate save/restores, make inline vs call decisions, nop
>> out nop callsites, etc.
>>
>>
> A lot of this -- in particular the multiway -- is a defect in the
> alternatives implementation and should have been addressed as such.

pvops and asm alternatives have different design goals. Asm
alternatives are all about replacing particular instructions with
different ones depending on the properties of the CPU. pvops is about
inserting an ops vector in front of various OS-level interfaces to allow
alternate implementations; the patching part was a later optimisation
sprinkled on top to reduce the abstraction overhead. The fact that
there's some similarity in mechanism is the result of convergence rather
than a desire to reinvent.

> One
> of the biggest problems with pvops as it currently stands is that it is
> monolithic; in general we have this class of problems (static selection)
> in a *lot* more places than we're dealing with right now, and as such,
> generalizing *something* -- be it pvops or alternatives -- would be useful.
>

Yes. The module system might also be a candidate for making more
general (by allowing the kernel to have unbound references which the
module system then binds at runtime). Modules are already arch-neutral
(arch-specific bits aside), which is an improvement over alternatives or
pvops, but the module system has no notion of patching anything beyond
linker relocs. That said, aside from very hot-path pvops such as
interrupt enable/disable, instruction patching isn't used all that much.

> gcc 4.5 also includes a very powerful facility called "asm goto", which
> I have already used to implement static_cpu_has(). Again, that
> particular construct (unlike "asm goto" itself) doesn't support multiway.
>

What does the fallback for older compilers look like?

J