From: Terje Mathisen "terje.mathisen at on 13 Aug 2010 01:14
Gubbi wrote:
> On Aug 10, 2:15 pm, Terje Mathisen<"terje.mathisen at tmsw.no">
> wrote:
>> Brett Davis wrote:
>
>>> So is CMOVE still implemented internally as a branch?
>>> (I know this is crazy sounding, but that is what both did...)
>>
>> Not a real branch, but it did hold up the pipeline for a short while afair?
>
> You have three data dependencies, the two source registers and the
> condition. With something like cmovc you end up waiting for the last
> ALU op to complete and set the carry before the CMOV can be executed.

Afaik the 2-cycle CMOVc is due to the three sources and not the latency
for the carry flag to propagate from the producer, i.e. the latency is
the same if I can generate the flag one cycle early or not.

>
> Compared to a correctly predicted branch where you end up with just
> one data dependency.

A correctly predicted branch has _zero_ dependencies, that's how it can
even pair with the instruction that produces the flags to branch on.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
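
To make the "three sources" point concrete, here is a minimal C sketch of the same select written two ways. The function names are hypothetical, and whether a given compiler turns either form into a CMOVcc or into a branch is an assumption that depends on the compiler, flags and target; the sketch only shows which inputs each form consumes.

```c
#include <stdint.h>

/* Branchy form: on the executed path only the chosen value is read.
 * With a correctly predicted branch the other input is never waited on. */
uint32_t select_branch(int cond, uint32_t a, uint32_t b)
{
    if (cond)
        return a;
    return b;
}

/* Branch-free form: the result is a function of cond, a AND b,
 * i.e. three inputs, which is the same shape as a CMOVcc. */
uint32_t select_branchless(int cond, uint32_t a, uint32_t b)
{
    /* mask is all ones when cond is nonzero, otherwise zero */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for every input; the difference is only in what the result has to wait for.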
From: Terje Mathisen "terje.mathisen at on 13 Aug 2010 01:16
MitchAlsup wrote:
> On Aug 11, 12:25 am, Terje Mathisen<"terje.mathisen at tmsw.no">
> wrote:
>> (Personally I've never really understood what was so hard about x86,
>> except for register pressure, mapping algorithms onto the
>> register/instruction set have felt quite natural.)
>
> Once you (the programmer/debugger) are hidden behind a high level
> language (like<ahem> C):: does it really matter what the
> characteristics of the underlying instruction set? I posit No.

Of course not. I was talking from the viewpoint of someone who has
authored more hand-optimized x86 asm code than most. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Gubbi on 13 Aug 2010 03:13
On Aug 13, 7:14 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> Gubbi wrote:
> > You have three data dependencies, the two source registers and the
> > condition. With something like cmovc you end up waiting for the last
> > ALU op to complete and set the carry before the CMOV can be executed.
>
> Afaik the 2-cycle CMOVc is due to the three sources and not the latency
> for the carry flag to propagate from the producer, i.e. the latency is
> the same if I can generate the flag one cycle early or not.

That makes sense.

> > Compared to a correctly predicted branch where you end up with just
> > one data dependency.
>
> A correctly predicted branch has _zero_ dependencies, that's how it can
> even pair with the instruction that produces the flags to branch on.

Sorry, I wasn't clear. If the branch is correctly predicted you only have a
data dependency on the register you are moving to your destination. In
the CMOV case you need both registers ready, and the condition.

Cheers
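
The difference can be put into a toy timing model. This is only a sketch: the function names are hypothetical, the cycle numbers are made up for illustration (the 2-cycle CMOV figure comes from the discussion above, the 1-cycle MOV is an assumption), and it ignores the cost of a mispredicted branch entirely.

```c
/* Toy model: each argument t_* is the cycle at which that input
 * becomes available. All numbers are illustrative assumptions. */
static int max2(int x, int y) { return x > y ? x : y; }

/* CMOV-style select: cannot start until both source registers
 * and the flags are ready. */
int cmov_ready(int t_a, int t_b, int t_flags, int cmov_latency)
{
    return max2(max2(t_a, t_b), t_flags) + cmov_latency;
}

/* Correctly predicted branch followed by a plain MOV: the result
 * waits only for the one register that is actually moved. */
int predicted_branch_ready(int t_src, int mov_latency)
{
    return t_src + mov_latency;
}
```

With inputs ready at cycles 1 (a), 10 (b) and 3 (flags), `cmov_ready(1, 10, 3, 2)` gives 12, while `predicted_branch_ready(1, 1)` gives 2 when the predicted path moves `a`: the slow, unused source `b` holds up the CMOV but not the predicted branch.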