From: Skybuck Flying on
Hello,

Suppose I write x86 code like so:

bt eax, 0
adc dword ptr [edx], 0

bt eax, 1
adc dword ptr [edx + 4], 0

bt eax, 2
adc dword ptr [edx + 8], 0

The bt (bit test) instruction sets the carry flag if the bit position is
set, otherwise the carry flag is cleared.
The adc instruction adds the carry flag.

Then this instruction pair is repeated multiple times as shown above.
(Slightly altered by an offset: +4, +8, etc)
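
In C terms, the pairs above amount to adding bit i of EAX to the i-th 32-bit counter at EDX. A minimal sketch, assuming the counters are dwords (as the +4, +8 offsets suggest); the function name `add_bits_to_counters` is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models "bt eax, i" followed by "adc [edx + 4*i], 0"
   for i = 0..n-1. value stands in for EAX, counters[] for the dwords at [edx]. */
void add_bits_to_counters(uint32_t value, uint32_t *counters, size_t n)
{
    assert(counters != NULL || n == 0);
    for (size_t i = 0; i < n && i < 32; i++) {
        uint32_t carry = (value >> i) & 1u;  /* bt: CF = bit i of value */
        counters[i] += carry;                /* adc mem, 0: mem = mem + 0 + CF */
    }
}
```

Each loop iteration is independent of the others, which is what makes the question about parallel execution interesting.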

I read about how CPUs can execute multiple integer instructions at the same
time, which makes me wonder.

Do CPUs nowadays have multiple carry flags underneath? I would think so...
otherwise how can they possibly execute multiple integer instructions?

So my question is:

Can the instructions above be executed in parallel/at the same time ?

Bye,
Skybuck.


From: MitchAlsup on
On May 4, 8:54 am, "Skybuck Flying" <BloodySh...(a)hotmail.com> wrote:
> I read about how CPUs can execute multiple integer instructions at the same
> time, which makes me wonder.
>
> Do CPUs nowadays have multiple carry flags underneath? I would think so...
> otherwise how can they possibly execute multiple integer instructions?

Today's CPUs obey restricted data-flow semantics. Thus, the ADC
instruction is dependent upon the BT instruction, and is scheduled
after that instruction executes. EFLAG fields are forwarded just like
any other register.

Athlon and Opteron manage EFLAGs as 3 independent fields, C, O, and
ZAPS (zero, auxiliary carry, parity, sign), to give maximum flexibility
to avoid condition code dependencies that are not actually necessary.
I believe that Intel CPUs manage EFLAGs as 2 independent registers, but
this is not a firm belief, and they have more implementations to consider.

Secondarily, Athlon and Opteron can have several EFLAGs manipulations
in flight simultaneously, just like several manipulations of EAX can
be in flight simultaneously. The write-back logic at the end of the
pipe puts all this stuff back into the EFLAGs register we know and
love.

From: Robert Redelmeier on
In alt.lang.asm Skybuck Flying <BloodyShame(a)hotmail.com> wrote in part:
> Do CPUs nowadays have multiple carry flags underneath? I
> would think so... otherwise how can they possibly execute
> multiple integer instructions?

Yes, AFAIK the modern CPUs do register renaming on flags.
Otherwise, as you point out, parallelism stalls.

The problems come with instructions that only update some of
the flags (like INC), or where you create dependency chains
(like your BT/ADC) without independent filler.

Your BT/ADC X, BT/ADC Y, BT/ADC Z will be reordered and
internally executed as:

BT X [flag0]
BT Y [flag1]
BT Z [flag2]

ADC X [flag0]
ADC Y [flag1]
ADC Z [flag2]


This allows multiple instructions to run per clock.
Actually it is more complex than this, because your
ADCs are actually load, add, and store micro-ops.
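
The effect of the reordering can be illustrated with a toy model in C. This is only a sketch under loud assumptions: unlimited execution units, one cycle per instruction, and the load/store micro-ops of the ADC ignored. The names `toy_insn` and `toy_schedule_cycles` are made up for illustration and do not describe any real CPU's scheduler. The model compares a machine that renames the carry flag (only true read-after-write dependencies count) with one that has a single physical carry flag (every instruction touching it waits for the previous one that touched it).

```c
#include <assert.h>
#include <stddef.h>

/* One instruction in the toy model: does it read and/or write the carry flag? */
struct toy_insn {
    int reads_cf;
    int writes_cf;
};

/* Returns the number of cycles the sequence needs in the toy model.
   rename_flags != 0: a CF reader waits only for the producer of its CF value.
   rename_flags == 0: one physical CF, so every instruction that touches CF
   waits for the previous instruction that touched it. */
int toy_schedule_cycles(const struct toy_insn *code, size_t n, int rename_flags)
{
    int last_writer_cycle = 0;  /* cycle of the most recent CF producer */
    int last_touch_cycle = 0;   /* cycle of the most recent CF read or write */
    int total = 0;

    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int touches = code[i].reads_cf || code[i].writes_cf;
        int cycle = 1;  /* earliest possible cycle */

        if (rename_flags) {
            if (code[i].reads_cf)
                cycle = last_writer_cycle + 1;  /* true dependency only */
        } else if (touches) {
            cycle = last_touch_cycle + 1;       /* serialised on the single CF */
        }

        if (code[i].writes_cf)
            last_writer_cycle = cycle;
        if (touches)
            last_touch_cycle = cycle;
        if (cycle > total)
            total = cycle;
    }
    return total;
}
```

Encoding each BT as {0, 1} (writes CF) and each ADC as {1, 1} (reads and writes CF), the three pairs from the original post take 2 cycles in this model with renaming and 6 without.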

-- Robert

From: Rod Pemberton on

"Skybuck Flying" <BloodyShame(a)hotmail.com> wrote in message
news:4f5a4$481dccc4$541983fa$24136(a)cache3.tilbu1.nb.home.nl...
> Hello,
>
> Suppose I write x86 code like so:
>
> bt eax, 0
> adc dword ptr [edx], 0
>
> bt eax, 1
> adc dword ptr [edx + 4], 0
>
> bt eax, 2
> adc dword ptr [edx + 8], 0
>

Uh oh, he found the BT instruction... Yet, I'm not sure you found an
assembly solution for your bit-planes problem. You're warmer. Did you? Do
you see one? It's similar to what you just posted...

Skybuck Crashing, why haven't you learned assembly yet? or, C for that
matter? With all the Delphi code you (or mostly others for free for you)
have converted to assembly, haven't you proven to yourself that Delphi is
totally worthless? What _are_ you doing with all that code - writing new
libraries for Delphi? Writing a Delphi compatible compiler?!?

Anyway, think about repeated BT, RCR. It's likely to be slower
(non-pairable) than a solution using shift, and, or, etc. But, it should be
easier to code since it has far less complexity to extract and reorder bits.
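
For reference, here is roughly what a repeated "bt src, k" / "rcr dest, 1" pair computes, written with shift/and/or in C. The bit-planes problem itself is not spelled out in this thread, so this is only a sketch of the instruction pair's semantics, not a solution to it; the function name `gather_bits_rcr_style` and the `positions` array are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for each requested bit position, test that bit of src
   (as BT would) and rotate it into the top of dest (as RCR dest, 1 would). */
uint32_t gather_bits_rcr_style(uint32_t src, const unsigned *positions, size_t n)
{
    uint32_t dest = 0;

    assert(positions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* bt src, pos: CF = selected bit (offset taken modulo 32 for a register) */
        uint32_t carry = (src >> (positions[i] & 31u)) & 1u;
        /* rcr dest, 1: shift right by one, old CF enters at bit 31 */
        dest = (dest >> 1) | (carry << 31);
    }
    return dest;
}
```

After n steps the gathered bits sit in the top n bits of the result, with the last tested bit in bit 31. The bit that RCR shifts out into the carry flag is ignored here because the next BT overwrites it.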

I was hoping someone like Terje would respond on that one, since I wanted to
see solutions other than what I could come up with. But, it seems your
persistent insanity got a response to another one...


Rod Pemberton

From: Skybuck Flying on
Hmm,

You did give me an interesting idea.

I ported my WriteLongwordBits SimInt64 Delphi 2007 version to Visual Studio
C/C++ 2008, to compare assembler outputs.

See other thread about that ;) :)

Bye,
Skybuck.