From: H. Peter Anvin on
On 11/10/2009 09:24 AM, Alan Cox wrote:
>>
>> In the short term, yes, of course. However, if we're going to do
>> emulation, we might as well do it right.
>
> Why is using KVM doing it right? It sounds like it's doing it slowly,
> and hideously memory inefficiently. You are solving an uninteresting
> general case problem when you just need two tiny fixups (or perhaps 3 if
> you want to fix up early x86-64 prefetch)

Why do we only need "two tiny fixups"? Where do we draw the line in
terms of ISA compatibility? One could easily argue that the Right
Thing[TM] is to be able to process any optional instruction -- otherwise
one has a very difficult place to draw a line.

Consider SSE3, for example. Why should the same concept not apply to
SSE3 instructions as to CMOV?

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on
On 11/10/2009 08:49 PM, H. Peter Anvin wrote:
>
>> Why is using KVM doing it right? It sounds like it's doing it slowly,
>> and hideously memory inefficiently. You are solving an uninteresting
>> general case problem when you just need two tiny fixups (or perhaps 3 if
>> you want to fix up early x86-64 prefetch)
>>
> Why do we only need "two tiny fixups"? Where do we draw the line in
> terms of ISA compatibility? One could easily argue that the Right
> Thing[TM] is to be able to process any optional instruction -- otherwise
> one has a very difficult place to draw a line.
>
> Consider SSE3, for example. Why should the same concept not apply to
> SSE3 instructions as to CMOV?
>

Because then user programs would run 20x or more slower than the user
expects. Better to terminate early (and teach userspace how to choose
the instruction subset correctly).

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

From: H. Peter Anvin on
On 11/10/2009 11:50 AM, Avi Kivity wrote:
>>
>> Consider SSE3, for example. Why should the same concept not apply to
>> SSE3 instructions as to CMOV?
>
> Because then user programs would run 20x or more slower than the user
> expects. Better to terminate early (and teach userspace how to choose
> the instruction subset correctly).
>

I picked the example carefully: SSE3 is a small set of instructions
which probably aren't used very heavily. In that sense, it has
*exactly* the same properties as CMOV - if you have the source, you're
better off recompiling, but it *might* help you if you happen to only
have a binary.

What I want people to understand is that this is a *huge* rathole, and
it doesn't have any obvious bottom that I can see.

-hpa
From: Willy Tarreau on
On Tue, Nov 10, 2009 at 12:01:47PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 11:50 AM, Avi Kivity wrote:
> >>
> >> Consider SSE3, for example. Why should the same concept not apply to
> >> SSE3 instructions as to CMOV?
> >
> > Because then user programs would run 20x or more slower than the user
> > expects. Better to terminate early (and teach userspace how to choose
> > the instruction subset correctly).
> >
>
> I picked the example carefully: SSE3 is a small set of instructions
> which probably aren't used very heavily. In that sense, it has
> *exactly* the same properties as CMOV - if you have the source, you're
> better off recompiling, but it *might* help you if you happen to only
> have a binary.
>
> What I want people to understand is that this is a *huge* rathole, and
> it doesn't have any obvious bottom that I can see.

Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
on one side and [sse*] on the other: distros are built assuming the
former are always available, while they are not always. And the distros
which make the distinction have to provide a dedicated build for earlier
systems just for compatibility. SSE*, 3dnow* etc. are only used by a
handful of media players/converters/encoders which are able to detect
by themselves what to use and already have the necessary fallbacks, because
these instruction sets vary too much between processors and vendors.
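
As a rough illustration of the kind of self-detection described above,
here is a minimal sketch in C. It only decodes the architecturally
defined feature bits of CPUID leaf 1 (EDX bit 15 = CMOV, EDX bit 26 =
SSE2, ECX bit 0 = SSE3). The enum `code_path` and the functions
`pick_code_path()` and `has_cmov()` are hypothetical names made up for
this example; they are not taken from any real player or library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bits reported by CPUID leaf 1. */
#define CPUID1_EDX_CMOV (1u << 15)
#define CPUID1_EDX_SSE2 (1u << 26)
#define CPUID1_ECX_SSE3 (1u << 0)

/* Hypothetical set of code paths a program might ship (assumption). */
enum code_path { PATH_GENERIC, PATH_SSE2, PATH_SSE3 };

/*
 * Choose a code path from the raw EDX/ECX values of CPUID leaf 1.
 * The SSE3 path is only chosen when SSE2 is reported as well; anything
 * else falls back to the generic path.
 */
enum code_path pick_code_path(uint32_t edx, uint32_t ecx)
{
    bool sse2 = (edx & CPUID1_EDX_SSE2) != 0;
    bool sse3 = (ecx & CPUID1_ECX_SSE3) != 0;

    if (sse2 && sse3)
        return PATH_SSE3;
    if (sse2)
        return PATH_SSE2;
    return PATH_GENERIC;
}

/* True when the CPU reports CMOV support in CPUID leaf 1 EDX. */
bool has_cmov(uint32_t edx)
{
    return (edx & CPUID1_EDX_CMOV) != 0;
}
```

With GCC or Clang on x86, the register values could be obtained with
`__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. Keeping the
decoding separate from the CPUID call makes the selection logic easy to
test with made-up register values.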

One could argue that cmpxchg/bswap/xadd are supported by the 486 and that
implementing them for the 386 is almost useless now (though it costs almost
nothing to provide them; I did so a few years ago).

CMOV/NOPL are rarely used, thus have no reason to cause a massive
performance drop, but are frequent enough (at least cmov) for almost
any program to have at least one or two inside, making it incompatible
with a given processor, and are almost obvious to implement too.
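
To give an idea of why CMOV is considered simple to emulate, here is a
minimal sketch in C of the core of such an emulation: deciding from
EFLAGS whether the condition encoded in the low four bits of the opcode
(CMOVcc is 0F 40 through 0F 4F) holds. The function names
`cmov_condition_holds()` and `emulate_cmov32()` are hypothetical and do
not come from any patch in this thread. A real handler would also have
to decode the ModRM byte, fetch the source operand and advance the
instruction pointer, none of which is shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS bits consulted by the x86 condition codes. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/*
 * Evaluate x86 condition code cc (0..15) against eflags.
 * Bits 3..1 of cc select the base predicate, bit 0 negates it.
 */
bool cmov_condition_holds(unsigned cc, uint32_t eflags)
{
    bool cf = (eflags & FLAG_CF) != 0;
    bool pf = (eflags & FLAG_PF) != 0;
    bool zf = (eflags & FLAG_ZF) != 0;
    bool sf = (eflags & FLAG_SF) != 0;
    bool of = (eflags & FLAG_OF) != 0;
    bool result;

    assert(cc < 16);

    switch (cc >> 1) {
    case 0:  result = of;                break; /* O  / NO  */
    case 1:  result = cf;                break; /* B  / AE  */
    case 2:  result = zf;                break; /* E  / NE  */
    case 3:  result = cf || zf;          break; /* BE / A   */
    case 4:  result = sf;                break; /* S  / NS  */
    case 5:  result = pf;                break; /* P  / NP  */
    case 6:  result = sf != of;          break; /* L  / GE  */
    default: result = zf || (sf != of);  break; /* LE / G   */
    }
    return (cc & 1) ? !result : result;
}

/*
 * Data movement of a 32-bit CMOVcc: the returned value is the new
 * destination, which equals src only when the condition holds.
 */
uint32_t emulate_cmov32(unsigned cc, uint32_t eflags,
                        uint32_t dst, uint32_t src)
{
    return cmov_condition_holds(cc, eflags) ? src : dst;
}
```

For example, `emulate_cmov32(0x4, FLAG_ZF, dst, src)` models CMOVE with
ZF set and therefore yields `src`.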

SSE*/3dnow* would be much, much harder and would only serve very few
programs, and serve them badly, because when these instructions are used,
they are used intensively.

I personally am not against being able to emulate every optional
instruction, quite the opposite. It's just that if, in order
to do this, we add cost to the other obvious ones, we lose what we
expected to win (simplicity and efficiency).

Regards,
Willy

From: Willy Tarreau on
On Tue, Nov 10, 2009 at 12:25:02PM -0800, H. Peter Anvin wrote:
> On 11/10/2009 12:16 PM, Willy Tarreau wrote:
> >
> > Indeed, but there is a difference between [cmpxchg, bswap, cmov, nopl]
> > on one side and [sse*] on the other: distros are built assuming the
> > former are always available, while they are not always. And the distros
> > which make the distinction have to provide a dedicated build for earlier
> > systems just for compatibility. SSE*, 3dnow* etc. are only used by a
> > handful of media players/converters/encoders which are able to detect
> > by themselves what to use and already have the necessary fallbacks, because
> > these instruction sets vary too much between processors and vendors.
> >
>
> That is increasingly not true since gcc is now doing autovectorization.

But programs have to be built to use that specific platform anyway; this
is different from all the programs built with CMOV support enabled by
default, which will work on 95% of the platforms.

(...)
> I count 970 cmovs in libc out of 322660 instructions. That is one in
> 333 instructions.

Not bad, I agree! But on the C3, CMOV from/to register is implemented.
It's only CMOV from/to memory which has to be emulated, which makes it
a lot less common. Anyway, that's why we need counters, so that the user
knows when he really ought to recompile.
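
The counters mentioned above could look roughly like the following
minimal sketch in C. Everything in it (the enum `emu_insn`, the
functions `emu_account()`, `emu_total()` and `emu_report()`) is a
hypothetical illustration, not the interface of any actual patch. A real
kernel implementation would presumably need per-CPU or atomic counters
and some way to export them to userspace; this sketch does neither.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical list of emulated instruction classes (assumption). */
enum emu_insn { EMU_CMOV, EMU_NOPL, EMU_CMPXCHG, EMU_BSWAP, EMU_XADD, EMU_NR };

static const char *const emu_names[EMU_NR] = {
    "cmov", "nopl", "cmpxchg", "bswap", "xadd",
};

/* One counter per instruction class, zero-initialized. */
static uint64_t emu_count[EMU_NR];

/* Called by the (not shown) emulation path each time it handles insn. */
void emu_account(enum emu_insn insn)
{
    assert((unsigned)insn < EMU_NR);
    emu_count[insn]++;
}

/* Total number of emulated instructions across all classes. */
uint64_t emu_total(void)
{
    uint64_t sum = 0;

    for (int i = 0; i < EMU_NR; i++)
        sum += emu_count[i];
    return sum;
}

/* Print one "name count" line per class so the user can judge the cost. */
void emu_report(FILE *out)
{
    for (int i = 0; i < EMU_NR; i++)
        fprintf(out, "%-8s %llu\n", emu_names[i],
                (unsigned long long)emu_count[i]);
}
```

A large count for one class would be the hint that the affected
binaries are worth recompiling for the actual processor.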

(...)
> I don't see any particular subset as being more obvious than the other,
> with the *possible* exception of NOPL, simply because NOPL was
> undocumented for so long.

Well, simply the availability of binaries making use of them. I'm not
sure you would find SSE* instructions in your libc where you found the
970 cmovs. For NOPL, that's different: I first heard about it in this
thread, and my C3 running with the CMOV patch has never complained about
missing it :-)

Regards,
Willy
