From: J French on
On 20 Aug 2006 03:17:02 -0700, "Skybuck" <skybuck2000(a)hotmail.com>
wrote:

>
>Skybuck schreef:
>
>> Woops, sorry, I did not know comp.lang.asm.x86 is a moderated
>> newsgroup, as I don't like moderated newsgroups I post it again and
>> exclude comp.lang.asm.x86:
>>
>> Reversing bit order in delphi ?
>>
>> Hello,
>>
>> I think intel uses most significant bit first ?

>Nope, sorry, I was wrong.

>The byte order and bit order of data in RAM is both Little Endian for
>intel. (LSB)
>
>The byte order and bit order of CPU registers is both Little Endian for
>intel. (LSB)
>
>The drawing below was correct, but the comments are wrong.
>
>When transmitting bits and bytes intels reads from right to left.
>
>So intel starts reading at the lowest bit order/number.
>
>So that means least significant data bit/byte first.
>
>> [intel system, most significant bit first]

>Should be:

>[intel system, lest significant bit (and byte) first]

It depends on how you look at the bits in a byte
- physically the bits go from right to left inside a byte
- Intel does not really read bits - it grabs bytes

Bytes within a number go from left to right

A pure bit stream, as you read from a credit card, is not necessarily
the same as a series of bits in a number of bytes

I ran into that problem with bitmap graphics in the DOS days
From: f0dder on
Skybuck wrote:
> f0dder schreef:
>
>> A 256-byte LUT can easily fit in any CPU L1 cache. Typically a
>> cacheline is 32 bytes long, so you'd maximally get (256/32) 8 cache
>> misses before the entire LUT is cached. I'm pretty sure that this is
>> hard to beat no matter which fancy instruction set you're going to
>> use...
>
> Hello,
>
> Thank you for trying to explain it.
>
> I still do not quite understand it completely maybe you can clearify
> something:
>
> Does the above theory only apply to the xlat instruction or any
> instruction/memory access that uses the lookup table technique ?

Any instruction that references memory will cause an entire cache line to be
read (or written?). Iirc it's been like this since at least the 486. Things
are a bit more complex than this, though, especially on recent
architectures... check the intel and AMD docs, and Agner Fog's optimization
manuals.


From: Skybuck on

J French schreef:

> On 20 Aug 2006 03:17:02 -0700, "Skybuck" <skybuck2000(a)hotmail.com>
> wrote:
>
> >
> >Skybuck schreef:
> >
> >> Woops, sorry, I did not know comp.lang.asm.x86 is a moderated
> >> newsgroup, as I don't like moderated newsgroups I post it again and
> >> exclude comp.lang.asm.x86:
> >>
> >> Reversing bit order in delphi ?
> >>
> >> Hello,
> >>
> >> I think intel uses most significant bit first ?
>
> >Nope, sorry, I was wrong.
>
> >The byte order and bit order of data in RAM is both Little Endian for
> >intel. (LSB)
> >
> >The byte order and bit order of CPU registers is both Little Endian for
> >intel. (LSB)
> >
> >The drawing below was correct, but the comments are wrong.
> >
> >When transmitting bits and bytes intels reads from right to left.
> >
> >So intel starts reading at the lowest bit order/number.
> >
> >So that means least significant data bit/byte first.
> >
> >> [intel system, most significant bit first]
>
> >Should be:
>
> >[intel system, lest significant bit (and byte) first]
>
> It depends on how you look at the bits in a byte
> - physically the bits go from right to left inside a byte
> - Intel does not really read bits - it grabs bytes
>
> Bytes within a number go from left to right

Not on an intel system ;)

On an intel system the bytes go from right to left.

We humans however like to look at the big numbers and then at the
little numbers most of the time.

For example 1.324.234
|
We look at the first digits ;)

Intel stores this from right to left
ends reading here <------ starts reading here

The arror does not indicate where the reading starts.

The arrow does indicate HOW intel reads.

;)

Bye,
Skybuck.

From: J French on
On 20 Aug 2006 05:27:10 -0700, "Skybuck" <skybuck2000(a)hotmail.com>
wrote:

<snip>

>> Bytes within a number go from left to right
>
>Not on an intel system ;)
>
>On an intel system the bytes go from right to left.
>
>We humans however like to look at the big numbers and then at the
>little numbers most of the time.

>For example 1.324.234

>We look at the first digits ;)

>Intel stores this from right to left
>ends reading here <------ starts reading here

>The arror does not indicate where the reading starts.

>The arrow does indicate HOW intel reads.

Yes, there is a strange logic in what you say

- the actual bit layout is different from the 'sort of' order of
access

But it is only numeric access

I had problems with 2 colour bitmaps under DOS, and I am very wary of
the distinction between physical and logical layout.


From: Skybuck on
Day two of the competition.

Alan provided new code, but it's simply the same, none the less it was
included.

A new contender has entered the competition, Jukka/Steve Baker/Skybuck
joined effort.
It's fast, it's correct, but it's obfuscated, so it's disqualified,
none the less it's also included.

A new contender has entered the competition, Skybuck's XLAT
implementation =D
( with a little bit of help from Martin ;) :) Actually this was
somebody elses idea but he didn't provide an implementation... to bad
for him lol =D )

The performance chart on day two:

1st place, 107.180.332 calls, Herbert's implementation/idea wins (but
disqualified for cheating =D)
2st place, 90.123.101 calls, (Jukka + Steve Baker + Skybuck)
implementation. (but disqualified for obfuscation =D)
3st place, 84.320.368 calls, Skybuck's XLAT implementation. (but
diqualified for cheating =D)
4st place, 81.954.163 calls, Skybuck's second and third
implementation. (shared place)
5st place, 77.748.964 calls, Jukka and Steve (with a little help of
skybuck to fix bugs ;))
6st place, 38.171.975 calls, Frank Kotler implementation.
7st place, 29.151.010 calls, Skybuck's first implementation.
8st place, 21.531.280 calls, Alan Lloyd's basm (wrapped)
implementation. (Had to use a wrapper to fix it)
9st place, 18.937.822 calls, Alan Lloyd's pure delphi implementation.
(shared place) (Bummer dude, you end up last, good try though ;) )

Conclusion on day two:

Herbert's Table LookUp + Skybuck's Reverse Bit Order third
implementation wins for speed and clearity and purity (only delphi
needed).

Here is test program version 0.02 with some new implementations added:

*** Begin of Part 1 of Code ***

program TestReversingBitOrder;

{$APPTYPE CONSOLE}

{

Reversing Bit Order

Test program

version 0.01 created on 19 august 2006 by Skybuck Flying.

First I will create my own version, let's see how would I have done
it...

probably with some and's and shifts or so ;) Or maybe a bitset or some
of that code ;)

First I should build in some checking code to verify that the method
is correct/sound ;)

Writing this code was a lot of fun.

Benchmark method is implemented
Verification method is implemented.

Different implementations are implemented.

The fastest algorithm/idea is from Herbert.

But he cheating lol. He did not supply a method/algorithm to generate
the look up table,
nor did he supply a method to calculate the reversed bit order value
etc.

The most amazing thing is that my third implementation is the
fastest... well not that many contenders anyway.
My second implementation is the runner up.

Only Alan Lloyd and Frank Kotler dared to enter this speed competition,
maybe we'll see more contenders ? :)

Here are the result/output of the program:

<snipped away, see new result/output and conclusion in version 0.02>

}

{

version 0.02 created on 20 august 2006 by Skybuck Flying

+ New implementations are to be added ;) :)

Intel memory and register layout is as follows:

Memory layout:

Bit 31...Bit 0
Byte 3...Byte 0
<--------------
Longword 10
/|\
|
|
|
Longword 0

CPU Register layout:

Bit 31...Bit 0
<--------------

The arrow indicates HOW intel reads.

It starts reading at the right and proceeds to the left.
It starts reading at the bottom and proceeds to the top.

Intel reads from right to left, and down to top.

So intel's world is up side down ;)

However the most significant byte and bit is still on the left.

Final conclusion:

Intel's bit and byte order for both memory and registers is LSB.

Same changes to the code have been made to reflect these facts.

New conclusion and report:

A new contender has entered the arena =D

It's a combination of Jukka, Steve Bakker, and Skybuck ;)

It's the fastest performing method to flip bits so far, not counting
the memory access look up ;) which remains the fastest.

However it appears to be obfuscated, so it's disqualified as well ;)

The new results/output is:

0 00000000 00000000 0
1 00000001 10000000 128
2 00000010 01000000 64
3 00000011 11000000 192
4 00000100 00100000 32
5 00000101 10100000 160
6 00000110 01100000 96
7 00000111 11100000 224
8 00001000 00010000 16
9 00001001 10010000 144
10 00001010 01010000 80
11 00001011 11010000 208
12 00001100 00110000 48
13 00001101 10110000 176
14 00001110 01110000 112
15 00001111 11110000 240
16 00010000 00001000 8
17 00010001 10001000 136
18 00010010 01001000 72
19 00010011 11001000 200
20 00010100 00101000 40
21 00010101 10101000 168
22 00010110 01101000 104
23 00010111 11101000 232
24 00011000 00011000 24
25 00011001 10011000 152
26 00011010 01011000 88
27 00011011 11011000 216
28 00011100 00111000 56
29 00011101 10111000 184
30 00011110 01111000 120
31 00011111 11111000 248
32 00100000 00000100 4
33 00100001 10000100 132
34 00100010 01000100 68
35 00100011 11000100 196
36 00100100 00100100 36
37 00100101 10100100 164
38 00100110 01100100 100
39 00100111 11100100 228
40 00101000 00010100 20
41 00101001 10010100 148
42 00101010 01010100 84
43 00101011 11010100 212
44 00101100 00110100 52
45 00101101 10110100 180
46 00101110 01110100 116
47 00101111 11110100 244
48 00110000 00001100 12
49 00110001 10001100 140
50 00110010 01001100 76
51 00110011 11001100 204
52 00110100 00101100 44
53 00110101 10101100 172
54 00110110 01101100 108
55 00110111 11101100 236
56 00111000 00011100 28
57 00111001 10011100 156
58 00111010 01011100 92
59 00111011 11011100 220
60 00111100 00111100 60
61 00111101 10111100 188
62 00111110 01111100 124
63 00111111 11111100 252
64 01000000 00000010 2
65 01000001 10000010 130
66 01000010 01000010 66
67 01000011 11000010 194
68 01000100 00100010 34
69 01000101 10100010 162
70 01000110 01100010 98
71 01000111 11100010 226
72 01001000 00010010 18
73 01001001 10010010 146
74 01001010 01010010 82
75 01001011 11010010 210
76 01001100 00110010 50
77 01001101 10110010 178
78 01001110 01110010 114
79 01001111 11110010 242
80 01010000 00001010 10
81 01010001 10001010 138
82 01010010 01001010 74
83 01010011 11001010 202
84 01010100 00101010 42
85 01010101 10101010 170
86 01010110 01101010 106
87 01010111 11101010 234
88 01011000 00011010 26
89 01011001 10011010 154
90 01011010
First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Prev: Apple II Debugger
Next: TMA Assembler?