From: MitchAlsup on
On Aug 15, 2:44 pm, timcaff...(a)aol.com (Tim McCaffrey) wrote:
> I notice it breaks a lot of the PPro rules as well: Uses a GPR, one XMM,
> one (unrestricted alignment) memory access (or another XMM), and a GPR
> and a Flags result all in one instruction. Must cause all kinds of
> havoc synchronizing the execution pipes.

Once the FCMP*I instructions went in (FP comparison, int flags result),
the pipeline guys built a dedicated bus from the FPU to EFLAGS to make
these fast. The FPUs do both 80-bit and SSE, so having an EFLAGS result
is straightforward (in the sense of interlocks and pipelining).

Mitch


From: Patrick de Zeester on
Piotr Wyderski wrote:
> John Mashey wrote:
>
>> BUT, the very FIRST thing to do is to profile a wide range of
>> programs, see how much time is consumed by str* functions, and decide
>> whether this is even worth arguing about [it might, or might not be;
>> the last time I did this was a long time ago, but at the time, bcopy/memcpy
>> was the only high-runner.]
>
> In my case SIMD does help a lot, therefore I don't care about
> "a wide range of programs". ;-) My programs should provide
> the highest performance available, the remaining ones
> can be slow (in most cases) or even should be slow
> (our competitors)...

If performance is a concern why scan for the zero terminator to
determine the length of a string? You could just store the length with
the string itself.
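
A minimal sketch of what storing the length with the string could look like in C. The type and function names (counted_str, counted_str_init, and so on) are hypothetical and chosen only for illustration; the data is kept NUL-terminated on the assumption that it still has to be passed to ordinary C APIs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string type: the length travels with the data. */
struct counted_str {
    size_t len;   /* number of bytes, not counting the terminator */
    char  *data;  /* still NUL-terminated for interop with C APIs */
};

/* Copy a C string; the terminator is scanned for once, at creation. */
int counted_str_init(struct counted_str *cs, const char *src)
{
    size_t n = strlen(src);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, n + 1);   /* copies the terminator as well */
    cs->len = n;
    cs->data = copy;
    return 0;
}

/* O(1): no scan for the zero terminator. */
size_t counted_str_len(const struct counted_str *cs)
{
    return cs->len;
}

void counted_str_free(struct counted_str *cs)
{
    free(cs->data);
    cs->data = NULL;
    cs->len = 0;
}
```

After initialization, every later length query is a single field read instead of a pass over the bytes.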


From: Piotr Wyderski on
Patrick de Zeester wrote:

> If performance is a concern why scan for the zero terminator to
> determine the length of a string? You could just store the length with
> the string itself.

strlen() is just a simple and clean example of what can be
done the SIMD way, hence it is a toy function to play with.
But the field of possible applications is much, much wider.
Anyway, even in the case of memcmp()/strcmp() the cached
length field doesn't help.
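
As a rough illustration of strlen() done "the SIMD way", here is a minimal sketch. It assumes an x86 target with SSE2 and a GCC- or Clang-compatible compiler (for __builtin_ctz). The name simd_strlen is made up for this example. It examines 16 bytes per step using aligned loads, which is the usual trick for staying within a mapped page, although reading beyond the end of the string object is outside what ISO C guarantees.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch, not a drop-in replacement for strlen(). */
size_t simd_strlen(const char *s)
{
    const __m128i zero = _mm_setzero_si128();

    /* Round down to a 16-byte boundary so that every load is aligned
       and therefore cannot straddle a page boundary. */
    const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
    unsigned skip = (unsigned)(s - p);  /* bytes in front of the string */

    /* Compare 16 bytes against zero at once; bit i of the mask is set
       when byte i of the chunk is zero. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));

    /* Ignore zero bytes that lie before the start of the string. */
    mask &= ~0u << skip;

    while (mask == 0) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    }

    /* The lowest set bit marks the first zero byte in this chunk. */
    return (size_t)((p + __builtin_ctz(mask)) - s);
}
```

The same compare-and-movemask pattern carries over to functions such as memchr(), which is one sense in which strlen() is only the toy case.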

Best regards
Piotr Wyderski
