From: Skybuck Flying on
Ok,

I wasn't satisfied with all the slow routines.

I probably need something faster.

Maybe if I clear the buffer up front and then simply place everything
sequentially it will go faster and fast enough for my purposes.

Actually this is all I really need at the moment. If I ever need to
overwrite something later on without disturbing the surrounding bits, I can
always fall back on the previous routines.

So here is the lightning fast version. It simply adds bits to the end, at
least that's how it's supposed to be used.
DestBitIndex should "point" just past the last bit already in the buffer,
i.e. at the first free bit.

The bits parameter is removed, since it is not relevant anymore. This
routine is dirty... it always ORs into the next longword in the buffer as
well, so it touches 8 bytes starting at the destination byte... but that
shouldn't be a problem if the buffer is large enough, which is within
specifications :)

Only 19 instructions, nice and fast ! ;) :)

If it's inlined it might be even faster/less instructions =D HAHA ;)

// Skybuck's Lightning Fast WriteLongwordBits version.
// Classification A1B1.
// A1 means: routine assumes the buffer is cleared.
// B1 means: routine assumes writing trailing bits is not a problem.
// Bits parameter is no longer relevant.

// just 19 instructions
// ok I like this routine much better... I can get away with it...
procedure WriteLongwordBitsA1B1( Value : longword; DestAddress : pointer; DestBitIndex : longword );
begin
  longword(DestAddress) := longword(DestAddress) + (DestBitIndex shr 3); // div 8

  // DestBitIndex will now function as the shift.
  DestBitIndex := DestBitIndex and 7; // mod 8

  Plongword(DestAddress)^ := Plongword(DestAddress)^ or (Value shl DestBitIndex);
  // Caveat: when DestBitIndex is a multiple of 8 the shift count below is 32.
  // The generated "shr eax,cl" only uses the low 5 bits of cl, so the count
  // acts as 0 and Value itself gets ORed into the next longword.
  Plongword(longword(DestAddress) + 4)^ := Plongword(longword(DestAddress) + 4)^ or (Value shr (32-DestBitIndex));
end;

// Generated Assembler:

{

Project1.dpr.1962: begin
00409058 53 push ebx
00409059 56 push esi
0040905A 8BD9 mov ebx,ecx
Project1.dpr.1963: longword(DestAddress) := longword(DestAddress) + (DestBitIndex shr 3); // div 8
0040905C 8BCB mov ecx,ebx
0040905E C1E903 shr ecx,$03
00409061 01CA add edx,ecx
Project1.dpr.1966: DestBitIndex := DestBitIndex and 7; // mod 8
00409063 83E307 and ebx,$07
Project1.dpr.1968: Plongword(DestAddress)^ := Plongword(DestAddress)^ or (Value shl DestBitIndex);
00409066 8BCB mov ecx,ebx
00409068 8BF0 mov esi,eax
0040906A D3E6 shl esi,cl
0040906C 0932 or [edx],esi
Project1.dpr.1969: Plongword(longword(DestAddress) + 4)^ := Plongword(longword(DestAddress) + 4)^ or (Value shr (32-DestBitIndex));
0040906E B920000000 mov ecx,$00000020
00409073 2BCB sub ecx,ebx
00409075 D3E8 shr eax,cl
00409077 83C204 add edx,$04
0040907A 0902 or [edx],eax
Project1.dpr.1970: end;
0040907C 5E pop esi
0040907D 5B pop ebx
0040907E C3 ret

}
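
A note on the second OR in the listing above: the count for "shr eax,cl" is
32 - DestBitIndex, and for 32-bit operands x86 only looks at the low 5 bits
of cl. When DestBitIndex is a multiple of 8 the count is 32, which acts as 0,
so Value itself is ORed into the next longword. Below is a minimal sketch of
one way around this, assuming Free Pascal in Delphi mode or a recent Delphi.
SpillBits is a hypothetical helper name, not part of the original routine. It
splits the shift in two so the count never exceeds 31, at the cost of one
extra shift:

```pascal
program SpillBitsSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: the part of Value that spills into the next longword
// when Value is written at bit offset Shift (0..7) within a byte.
function SpillBits(Value, Shift: longword): longword;
begin
  // Same result as Value shr (32 - Shift) for Shift = 1..7, and 0 for
  // Shift = 0, because the second shift count stays in the range 24..31.
  Result := (Value shr 1) shr (31 - Shift);
end;

begin
  WriteLn(SpillBits($FFFFFFFF, 0)); // byte-aligned: nothing spills over
  WriteLn(SpillBits($FFFFFFFF, 7)); // the top 7 bits spill over
  WriteLn(SpillBits($80000000, 1)); // only the top bit spills over
end.
```

Swapping this expression in for "Value shr (32-DestBitIndex)" keeps the
routine branch-free; the instruction counts quoted in this post refer to the
original single-shift version.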

You see, if I scrap some requirements I can write lightning fast code which
is even more correct than the competition at the moment ! ;) :)
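
Since the Bits parameter is gone, the caller has to keep Value free of stray
high bits (everything is ORed in) and has to advance the bit index itself.
Here is a minimal sketch of that calling pattern, assuming Free Pascal in
Delphi mode or a recent Delphi on a little-endian x86 target. AppendField and
WriteBitsSketch are hypothetical names. WriteBitsSketch is a variant of the
routine above that uses PByte arithmetic for the address and a two-step right
shift, so that the byte-aligned case ORs in 0:

```pascal
program AppendBitsSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

var
  Buffer     : array[0..15] of byte;
  BitPointer : longword;

// Hypothetical variant of WriteLongwordBitsA1B1 used only for this sketch.
// Assumes a cleared buffer with at least 8 accessible bytes from the
// destination byte onwards.
procedure WriteBitsSketch(Value: longword; DestAddress: pointer; DestBitIndex: longword);
var
  P     : PByte;
  Shift : longword;
begin
  P := PByte(DestAddress) + (DestBitIndex shr 3); // byte holding the first bit
  Shift := DestBitIndex and 7;                    // bit offset inside that byte
  PLongword(P)^ := PLongword(P)^ or longword(Value shl Shift);
  // Two-step shift: yields 0 when Shift = 0 instead of relying on "shr 32".
  PLongword(P + 4)^ := PLongword(P + 4)^ or longword((Value shr 1) shr (31 - Shift));
end;

// Hypothetical caller-side helper: masks Value to Width bits, appends it,
// then moves the bit pointer to the next free bit.
procedure AppendField(Value, Width: longword);
begin
  if Width < 32 then
    Value := Value and longword((longword(1) shl Width) - 1);
  WriteBitsSketch(Value, @Buffer, BitPointer);
  Inc(BitPointer, Width);
end;

begin
  FillChar(Buffer, SizeOf(Buffer), 0); // A1: buffer must be cleared up front
  BitPointer := 0;

  AppendField($AB, 8);       // 8 bits
  AppendField($FFFFFFF3, 5); // only the low 5 bits (19) are kept
  AppendField(5, 3);         // 3 bits, lands in the same byte as the 19

  WriteLn(Buffer[0], ' ', Buffer[1], ' ', BitPointer); // prints 171 179 16
end.
```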

Bye,
Skybuck.


From: Skybuck Flying on
Yup, here is the assembly generated when the routine is inlined:

// Generated Assembler:

// only 14 instructions !
Project1.dpr.2073: WriteLongwordBitsA1B1( Value, @Buffer, BitPointer );
0040A207 8BD7 mov edx,edi
0040A209 8BC8 mov ecx,eax
0040A20B C1E903 shr ecx,$03
0040A20E 01CA add edx,ecx
0040A210 83E007 and eax,$07
0040A213 8BC8 mov ecx,eax
0040A215 8BF3 mov esi,ebx
0040A217 D3E6 shl esi,cl
0040A219 0932 or [edx],esi
0040A21B B920000000 mov ecx,$00000020
0040A220 2BC8 sub ecx,eax
0040A222 D3EB shr ebx,cl
0040A224 83C204 add edx,$04
0040A227 091A or [edx],ebx

Bye,
Skybuck ;) =D