From: Terje Mathisen on
Gubbi wrote:
> On Aug 10, 2:15 pm, Terje Mathisen<"terje.mathisen at tmsw.no">
> wrote:
>> Brett Davis wrote:
>
>>> So is CMOVE still implemented internally as a branch?
>>> (I know this is crazy sounding, but that is what both did...)
>>
>> Not a real branch, but it did hold up the pipeline for a short while afair?
>
> You have three data dependencies, the two source registers and the
> condition. With something like cmovc you end up waiting for the last
> ALU op to complete and set the carry before the CMOV can be executed.

Afaik the 2-cycle CMOVc is due to the three sources and not the latency
for the carry flag to propagate from the producer, i.e. the latency is
the same if I can generate the flag one cycle early or not.
>
> Compared to a correctly predicted branch where you end up with just
> one data dependency.

A correctly predicted branch has _zero_ dependencies, that's how it can
even pair with the instruction that produces the flags to branch on.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Terje Mathisen on
MitchAlsup wrote:
> On Aug 11, 12:25 am, Terje Mathisen<"terje.mathisen at tmsw.no">
> wrote:
>> (Personally I've never really understood what was so hard about x86,
>> except for register pressure, mapping algorithms onto the
>> register/instruction set have felt quite natural.)
>
> Once you (the programmer/debugger) are hidden behind a high level
> language (like <ahem> C): does it really matter what the
> characteristics of the underlying instruction set are? I posit No.

Of course not.

I was talking from the viewpoint of someone who has authored more
hand-optimized x86 asm code than most. :-)

Terje

From: Gubbi on
On Aug 13, 7:14 am, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:
> Gubbi wrote:

> > You have three data dependencies, the two source registers and the
> > condition. With something like cmovc you end up waiting for the last
> > ALU op to complete and set the carry before the CMOV can be executed.
>
> Afaik the 2-cycle CMOVc is due to the three sources and not the latency
> for the carry flag to propagate from the producer, i.e. the latency is
> the same if I can generate the flag one cycle early or not.


That makes sense.

> > Compared to a correctly predicted branch where you end up with just
> > one data dependency.
>
> A correctly predicted branch has _zero_ dependencies, that's how it can
> even pair with the instruction that produces the flags to branch on.

Sorry, I wasn't clear. If the branch is correctly predicted, you only
have a data dependency on the register you are moving to your
destination. In the CMOV case you need both registers ready, and the
condition as well.
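
To put rough numbers on that, here is a toy model (a sketch only, not a
description of any real CPU). It assumes every value has a cycle at which
it becomes available and that an instruction can start once all of its
inputs are ready. The function names and the 1-cycle MOV latency are made
up for illustration; the 2-cycle CMOV latency is the figure quoted above.

```python
# Toy dependency model. All names and latencies here are illustrative
# assumptions, not measurements of any particular processor.

CMOV_LATENCY = 2  # assumption: the "2-cycle CMOVc" figure quoted in the thread
MOV_LATENCY = 1   # assumption: a plain register-to-register move takes 1 cycle


def cmov_ready(src_ready, dst_ready, flags_ready):
    """Cycle at which a CMOV result becomes available.

    CMOV reads the source, the old destination value and the flags,
    so it has to wait for the latest of the three.
    """
    return max(src_ready, dst_ready, flags_ready) + CMOV_LATENCY


def predicted_branch_ready(src_ready, dst_ready, move_executes):
    """Cycle at which the result is available behind a correctly
    predicted branch around a plain MOV.

    Only the value on the predicted path matters. The flags are needed
    later to confirm the prediction, but they do not delay the result.
    """
    if move_executes:
        return src_ready + MOV_LATENCY
    # The move is skipped, so the result is just the old destination value.
    return dst_ready


if __name__ == "__main__":
    # Source ready at cycle 3, old destination at cycle 10 (say, a slow
    # load), flags at cycle 6.
    print("cmov:           ", cmov_ready(3, 10, 6))
    print("branch, move:   ", predicted_branch_ready(3, 10, True))
    print("branch, no move:", predicted_branch_ready(3, 10, False))
```

In this model the CMOV result always trails the slowest of its three
inputs, while the predicted branch only ever waits for the one register
on the path that is actually taken. A mispredicted branch is deliberately
left out of the sketch.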

Cheers