From: jukka@liimatta.org on
Skybuck wrote:

> Can you "unraffle" it into multiple lines (without all the () ) ?

Let's look at one line, sure we can refactor it:

x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));

->

y = (x & 0xcccccccc) >> 2;
z = (x & 0x33333333) << 2;
x = y | z;

But that just introduces two new variables and doesn't solve the
primary problem of this fundamentally *iterative* algorithm:
dependency.
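
To make the dependency concrete, here is the whole reversal written out with
one temporary per stage (a C sketch for illustration, not the exact code
discussed above); each stage consumes the result of the previous one, so the
five steps form a serial chain even though the two mask/shift halves inside a
stage are independent:

#include <stdint.h>

uint32_t reverse_bits(uint32_t v)
{
    uint32_t a = ((v >> 1) & 0x55555555) | ((v << 1) & 0xaaaaaaaa); /* swap adjacent bits      */
    uint32_t b = ((a >> 2) & 0x33333333) | ((a << 2) & 0xcccccccc); /* swap adjacent bit pairs */
    uint32_t c = ((b >> 4) & 0x0f0f0f0f) | ((b << 4) & 0xf0f0f0f0); /* swap adjacent nibbles   */
    uint32_t d = ((c >> 8) & 0x00ff00ff) | ((c << 8) & 0xff00ff00); /* swap adjacent bytes     */
    return (d >> 16) | (d << 16);                                   /* swap 16-bit halves      */
}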

From: Robert Redelmeier on
In alt.lang.asm jukka(a)liimatta.org <jukka(a)liimatta.org> wrote in part:
> y = (x & 0xcccccccc) >> 2;
> z = (x & 0x33333333) << 2;
> x = y | z;
>
> But that just introduces two new variables and doesn't
> solve the primary problem of this fundamentally *iterative*
> algorithm: dependency.

Actually, programming in x86 asm or a decent compiler will introduce
the two variables so the masks & shifts can proceed in parallel
(using the hardware scheduler and perhaps the register renamer):

mov ebx, eax ; copy x from eax to ebx
and eax, 0CCCCCCCCh
and ebx, 033333333h
shr eax, 2
shl ebx, 2
or eax, ebx ; new x now in eax


The 5 combining ORs do introduce dependencies, but that is far fewer
than the 32 dependencies a bit-by-bit SHR/SHL loop would introduce.

You could try a byte-wise lookup table (a bigger one would likely not
stay in cache), but partial-register operations (byte-wise) can introduce
stalls on some processors, so you'd best stick with shifts. I doubt
you'll beat the 10-15 clocks the swapping algo takes (untested).
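
For reference, a byte-wise table version might look roughly like this (a
sketch for illustration, untested, with names of my own choosing); the
256-entry table fits easily in L1, but you still pay for four loads plus the
shifts that reassemble the word:

#include <stdint.h>

static uint8_t rev8[256];          /* bit-reversed value of each byte */

static void init_rev8(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t b = (uint8_t)i, r = 0;
        for (int bit = 0; bit < 8; bit++) {   /* reverse the 8 bits of i */
            r = (uint8_t)((r << 1) | (b & 1));
            b >>= 1;
        }
        rev8[i] = r;
    }
}

/* Reverse a 32-bit word: reverse each byte via the table, then swap byte order. */
static uint32_t reverse_bits_table(uint32_t v)
{
    return ((uint32_t)rev8[ v        & 0xff] << 24) |
           ((uint32_t)rev8[(v >>  8) & 0xff] << 16) |
           ((uint32_t)rev8[(v >> 16) & 0xff] <<  8) |
            (uint32_t)rev8[(v >> 24) & 0xff];
}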

Given that TCP & IP datagrams are big-endian while x86 is little-endian,
the problem would appear to be important enough to warrant a much quicker
hardware instruction. However, most datagram fields are used as
signatures that need no mathematical manipulation, so leaving them as
unconverted, backwards tokens is sufficient if speed is the over-riding
concern.
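
For comparison, plain byte-order conversion (what ntohl()/htonl() amount to
on little-endian x86) is a much simpler shuffle than the bit reversal above;
a rough C sketch, name mine:

#include <stdint.h>

/* Swap only the byte order of a 32-bit word; the bits inside each byte
   stay where they are.  This is what datagram fields would need, as
   opposed to the full bit reversal discussed above. */
static uint32_t byteswap32(uint32_t v)
{
    return ((v & 0x000000ff) << 24) |
           ((v & 0x0000ff00) <<  8) |
           ((v & 0x00ff0000) >>  8) |
           ((v & 0xff000000) >> 24);
}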

I really miss a fast endian conversion instruction for lexical
sorts (for the diminishing cases where ASCII order suffices).
Little endian hexdumps are easy to read when printed right-to-left.

-- Robert


From: jukka on

Robert Redelmeier wrote:
> Actually, programming in x86 asm or a decent compiler will introduce
> the two variables so the masks & shifts can proceed in parallel
> (using the hardware scheduler and perhaps the register renamer):
>
> mov ebx, eax ; copy x from eax to ebx
> and eax, 0CCCCCCCCh
> and ebx, 033333333h
> shr eax, 2
> shl ebx, 2
> or eax, ebx ; new x now in eax

I didn't have that in mind, but rather that writing out the
subexpressions as two discrete expressions wouldn't change anything.

Here's one of the versions compiled...

mov ecx, DWORD PTR _x$[esp-4]
mov eax, ecx
shr eax, 2
lea edx, DWORD PTR [ecx*4]
xor eax, edx
and eax, 858993459 ; 33333333H
shl ecx, 2
xor eax, ecx

And here is the other one...

mov ecx, DWORD PTR _x$[esp-4]
mov eax, ecx
shr eax, 2
lea edx, DWORD PTR [ecx*4]
xor eax, edx
and eax, 858993459 ; 33333333H
shl ecx, 2
xor eax, ecx

At a quick glance, the output seems *identical*, so the only benefit of
writing the subexpressions separately was to introduce two new variables.
Whether that is better depends on the reader. I prefer the construct as a
single expression because, for my taste, it communicates the intent
better than the alternative.
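
As an aside, both listings replace the OR with an XOR trick: since the two
masked halves occupy disjoint bit positions, ((a ^ b) & M) ^ b gives the same
result as (a & M) | (b & ~M). A small C check of that identity (my own sketch,
not code from the thread):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t x = 0xdeadbeef;                  /* arbitrary test value        */
    uint32_t a = x >> 2, b = x << 2;
    uint32_t M = 0x33333333;

    uint32_t or_form  = (a & M) | (b & ~M);   /* the source expression       */
    uint32_t xor_form = ((a ^ b) & M) ^ b;    /* what the listings compute   */

    assert(or_form == xor_form);              /* holds for any a and b       */
    return 0;
}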

Regarding the dependency, I meant that the next line cannot be evaluated
until the previous expression has been assigned. I'm well aware that a
compiler can blow the scope open, build an operation tree and optimize it
with dynamic programming or whatever else it is programmed to do. Of
course, if we are interested in what kind of binary today's compilers
produce for the function in question, we can compile it, inspect the
assembly output and draw our own conclusions.

I checked the output with g++ 4.2 and Visual C++ 2005, and it was as
predicted (the snips above are a good reference for what to expect).

From: Skybuck on
> you guys are nuts, i've been coding assembler on various processors over
> the years and all i had to worry about was byte order! (big endian/little
> endian).
> the only time bit ordering had to be dealt with is when you receive data
> from some sort of serial device or encryption code and need to get it in
> the correct order. this has nothing to do with the platform you're on...
>
> i have copied over piles of source code that originated on big endian
> platforms over to little endian, containing all kinds of bit operations
> etc.. never had a problem and never had to change anything other than the
> byte order coding ..

Ok, I just can't resist making a nasty joke, here it comes, brace
yourself lol:

"I am glad I never hired Jamie to code my aeroplanes :P ;)"

Bye,
Skybuck.

From: Skybuck on

jukka(a)liimatta.org wrote:

> > Also the code does not seem to make much sense, at first glance anyway,
> > and even after further inspection it's unclear what it does, how it
> > works, or how it can be extended to more than 8 bits.
>
> I don't understand what is so unclear about how it works.
>
>
> > Obfuscating is not allowed in my competitions, so this algorithm/code
> > is disqualified ! ;) =D
>
> It is not obfuscated; the method name clearly states what the intent
> is. If the name doesn't work out for you, rename it to reverseBits() and
> it should make more sense. That's why we NAME functions: so that you can
> determine what they are supposed to do.
>
>
> > Thank you for participating, you are free to try again and this time in
> > clearity, and then you can come again =D lol, have a nice day ! =D
>
>
> What do you mean? The code is quite clear, let's rehash:
>
> > > v = ((v >> 1) & 0x55555555) | ((v << 1) & 0xaaaaaaaa);
> > > v = ((v >> 2) & 0x33333333) | ((v << 2) & 0xcccccccc);
> > > v = ((v >> 4) & 0x0f0f0f0f) | ((v << 4) & 0xf0f0f0f0);
> > > v = ((v >> 8) & 0x00ff00ff) | ((v << 8) & 0xff00ff00);
> > > v = (v >> 16) | (v << 16);
>
> Look this from bottom-up, what does the LAST statement do?

Do you always read code from the bottom up?

I-DON'T-THINK-SO!

>
> v = (v >> 16) | (v << 16);
>
> It swaps the lowest and highest 16 bits. Then what does the statement
> BEFORE that do? Yes, it swaps the 8 bit quantities within each 16 bit
> quantity. And so on, until single bits are changing position inside 2
> bit groups.

Ok, this explanation is clear.

The code, however, is not.

You actually had to read it from the bottom up.

Structured programming languages are read from top to bottom.

Switching the order of the lines would have made it a bit more understandable.

But still.

Each single line is still unclear.

How can the variable 'v' be modified two times before being assigned to
itself?

In theory this is impossible. There is only one variable v in the code.

This can mean two things:

1. Temporary variables are used (see the sketch below).
2. The variable is modified in the first part of the line and then
modified again in the second part of the line.

Or, looking at the assembler output, a third possibility could exist:
3. The code is invalid and something else is generated to resolve it.
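
Here is a sketch of possibility 1, with the hidden temporary written out by
hand (an illustration of my own, not anything posted in this thread): the
whole right-hand side is computed from the old value of v, and v itself is
written exactly once.

#include <stdint.h>

/* Sketch: the last statement of the algorithm with the implicit temporary
   made explicit.  Both shifts read the OLD value of v; v is overwritten
   exactly once, after the right-hand side is fully evaluated. */
static uint32_t swap_halves(uint32_t v)
{
    uint32_t old = v;                         /* v before the statement      */
    uint32_t tmp = (old >> 16) | (old << 16); /* built entirely from old v   */
    v = tmp;                                  /* the single write back to v  */
    return v;
}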

> In the end, well, the bits are reversed. It's so painstakingly obvious
> when you read the code that I was taken completely by surprise that
> such basic bit operations could be considered obfuscated!
>
> I'm baffled, I really am. G'day to you as well, sir.

Lol, do you hang upside down from the ceiling or something lol?

Seriously.

I shall now try to re-implement this idea, to see if it really works,
to see what crystal-clear code could look like, and to see if it's still
as fast! (My first guess would be: probably not ;))

Thanks for your clarification/post.

Bye,
Skybuck.
