From: Skybuck Flying on
I suspect there is a bug in this assembler routine; you'd be a hero, and big
news, if you can find it! ;) :)

I suspect the problem is with copying large blocks of memory that have an
odd size, like a multiple of 8 plus 1, 2, 3, 4, 5, 6 or 7 bytes, so that the
end of the block falls partway into a memory cell of say 4 or 8 bytes...
either that or something else is going on...
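
The odd-size hypothesis can be checked in isolation. The following is a minimal sketch, not part of the original post: the program name, FirstMismatch and CheckSize are made up for illustration, and the 800x600x3 base size is borrowed from the quoted message further down. It copies the same data once with Move and once byte by byte, for sizes with 0 to 7 extra bytes, and reports the first differing byte, if any.

```pascal
program MoveOddSizeCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical test harness: compares the RTL Move against a plain
// byte-by-byte copy for sizes that are a multiple of 8 plus 0..7 bytes.

type
  TByteBuf = array of Byte;

// Returns the index of the first differing byte among the first Count
// bytes, or -1 if they all match.
function FirstMismatch(const A, B: TByteBuf; Count: Integer): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := 0 to Count - 1 do
    if A[I] <> B[I] then
    begin
      Result := I;
      Exit;
    end;
end;

// Fills a source buffer with a fixed byte pattern, copies it with Move and
// with a byte loop, and compares the two copies. Count must be > 0.
function CheckSize(Count: Integer): Integer;
var
  Src, ViaMove, ViaLoop: TByteBuf;
  I: Integer;
begin
  SetLength(Src, Count);
  SetLength(ViaMove, Count);
  SetLength(ViaLoop, Count);
  for I := 0 to Count - 1 do
    Src[I] := (I * 31 + 7) and $FF;
  Move(Src[0], ViaMove[0], Count);
  for I := 0 to Count - 1 do
    ViaLoop[I] := Src[I];
  Result := FirstMismatch(ViaMove, ViaLoop, Count);
end;

var
  Extra, Bad: Integer;
begin
  for Extra := 0 to 7 do
  begin
    Bad := CheckSize(800 * 600 * 3 + Extra);
    if Bad < 0 then
      WriteLn('size +', Extra, ': OK')
    else
      WriteLn('size +', Extra, ': first mismatch at byte ', Bad);
  end;
end.
```

If every size reports OK here but the codec still misbehaves, the cause is more likely in the surrounding code (buffer sizes, pointers, FPU state) than in the size handling of Move itself.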

It uses the floating-point registers (fild/fistp) to move 8 bytes at a
time... maybe those somehow get messed up ?!?

Below is the Pascal version, followed by the assembler version, which is
probably the one that is being used...

Good luck finding the bug since that's some shitty complex assembler code ?!
;) :)

(* ***** BEGIN LICENSE BLOCK *****
*
* The assembly function Move is licensed under the CodeGear license terms.
*
* The initial developer of the original code is Fastcode
*
* Portions created by the initial developer are Copyright (C) 2002-2004
* the initial developer. All Rights Reserved.
*
* Contributor(s): John O'Harrow
*
* ***** END LICENSE BLOCK ***** *)
procedure Move(const Source; var Dest; count : Integer);
{$IFDEF PUREPASCAL}
var
  S, D: PChar;
  I: Integer;
begin
  S := PChar(@Source);
  D := PChar(@Dest);
  if S = D then Exit;
  if Cardinal(D) > Cardinal(S) then
    for I := count-1 downto 0 do
      D[I] := S[I]
  else
    for I := 0 to count-1 do
      D[I] := S[I];
end;
{$ELSE}
asm
  cmp     eax, edx
  je      @@Exit {Source = Dest}
  cmp     ecx, 32
  ja      @@LargeMove {Count > 32 or Count < 0}
  sub     ecx, 8
  jg      @@SmallMove
@@TinyMove: {0..8 Byte Move}
  jmp     dword ptr [@@JumpTable+32+ecx*4]
@@SmallMove: {9..32 Byte Move}
  fild    qword ptr [eax+ecx] {Load Last 8}
  fild    qword ptr [eax] {Load First 8}
  cmp     ecx, 8
  jle     @@Small16
  fild    qword ptr [eax+8] {Load Second 8}
  cmp     ecx, 16
  jle     @@Small24
  fild    qword ptr [eax+16] {Load Third 8}
  fistp   qword ptr [edx+16] {Save Third 8}
@@Small24:
  fistp   qword ptr [edx+8] {Save Second 8}
@@Small16:
  fistp   qword ptr [edx] {Save First 8}
  fistp   qword ptr [edx+ecx] {Save Last 8}
@@Exit:
  ret
  nop {4-Byte Align JumpTable}
  nop
@@JumpTable: {4-Byte Aligned}
  dd      @@Exit, @@M01, @@M02, @@M03, @@M04, @@M05, @@M06, @@M07, @@M08
@@LargeForwardMove: {4-Byte Aligned}
  push    edx
  fild    qword ptr [eax] {First 8}
  lea     eax, [eax+ecx-8]
  lea     ecx, [ecx+edx-8]
  fild    qword ptr [eax] {Last 8}
  push    ecx
  neg     ecx
  and     edx, -8 {8-Byte Align Writes}
  lea     ecx, [ecx+edx+8]
  pop     edx
@FwdLoop:
  fild    qword ptr [eax+ecx]
  fistp   qword ptr [edx+ecx]
  add     ecx, 8
  jl      @FwdLoop
  fistp   qword ptr [edx] {Last 8}
  pop     edx
  fistp   qword ptr [edx] {First 8}
  ret
@@LargeMove:
  jng     @@LargeDone {Count < 0}
  cmp     eax, edx
  ja      @@LargeForwardMove
  sub     edx, ecx
  cmp     eax, edx
  lea     edx, [edx+ecx]
  jna     @@LargeForwardMove
  sub     ecx, 8 {Backward Move}
  push    ecx
  fild    qword ptr [eax+ecx] {Last 8}
  fild    qword ptr [eax] {First 8}
  add     ecx, edx
  and     ecx, -8 {8-Byte Align Writes}
  sub     ecx, edx
@BwdLoop:
  fild    qword ptr [eax+ecx]
  fistp   qword ptr [edx+ecx]
  sub     ecx, 8
  jg      @BwdLoop
  pop     ecx
  fistp   qword ptr [edx] {First 8}
  fistp   qword ptr [edx+ecx] {Last 8}
@@LargeDone:
  ret
@@M01:
  movzx   ecx, [eax]
  mov     [edx], cl
  ret
@@M02:
  movzx   ecx, word ptr [eax]
  mov     [edx], cx
  ret
@@M03:
  mov     cx, [eax]
  mov     al, [eax+2]
  mov     [edx], cx
  mov     [edx+2], al
  ret
@@M04:
  mov     ecx, [eax]
  mov     [edx], ecx
  ret
@@M05:
  mov     ecx, [eax]
  mov     al, [eax+4]
  mov     [edx], ecx
  mov     [edx+4], al
  ret
@@M06:
  mov     ecx, [eax]
  mov     ax, [eax+4]
  mov     [edx], ecx
  mov     [edx+4], ax
  ret
@@M07:
  mov     ecx, [eax]
  mov     eax, [eax+3]
  mov     [edx], ecx
  mov     [edx+3], eax
  ret
@@M08:
  fild    qword ptr [eax]
  fistp   qword ptr [edx]
end;
{$ENDIF}



"Skybuck Flying" <IntoTheFuture(a)hotmail.com> wrote in message
news:5ee6b$4c14c230$54190f09$19681(a)cache4.tilbu1.nb.home.nl...
> Hello, my video codec has detected a strange problem with the "move"
> routine of Delphi 2007.
>
> The bug seems to go away when I do a manual copy of a frame instead of
> using the move routine like so:
> (Not only that... but it becomes faster too ?!)
>
> It only happens for some frames and not all frames, so it seems to be
> input dependent ?!?
>
> I am guessing that the move routine fails if the bytes end up halfway
> through a 32-bit cell...
>
> I am guessing it does not copy the last 2 or 3 bytes of the last cell...
>
> For example:
>
> (800x600x3+1) bytes might fail because the size is not a multiple of 4
> bytes ?!?
>
> So either the move routine is bugged, or something else is going on, which
> seems unlikely
> since the bug goes away when I stop using it ?!?!??
>
> // bugged:
> (*
> // remember the original input
> procedure TFastVideoCompressor.Remember;
> begin
>   // remember current/input frame
>   move( mInput^, mPreviousFrame^, mSize );
> end;
> *)
>
> // correct:
> procedure TFastVideoCompressor.Remember;
> var
>   vIndex : integer;
>   vInput : Prgb;
>   vPreviousFramePixel : Prgb;
> begin
>   // remember current/output frame
>   // move( mInput^, mPreviousFrame^, mSize );
>
>   vInput := mInput;
>   vPreviousFramePixel := mPreviousFrame;
>
>   if mArea > 0 then
>     for vIndex := 0 to mArea-1 do
>     begin
>       vPreviousFramePixel.mBlue := vInput.mBlue;
>       vPreviousFramePixel.mGreen := vInput.mGreen;
>       vPreviousFramePixel.mRed := vInput.mRed;
>
>       longword(vInput) := longword(vInput) + SizeOf(Trgb);
>       longword(vPreviousFramePixel) := longword(vPreviousFramePixel) +
>         SizeOf(Trgb);
>     end;
>
> end;
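
One difference between the two quoted versions is worth checking: the Move call copies mSize bytes, while the loop copies mArea pixels of SizeOf(Trgb) bytes each. The two only touch the same memory if mSize equals mArea * SizeOf(Trgb). The sketch below illustrates that check; the Trgb layout (three byte fields, packed) is an assumption, since the post shows only the field names, and LoopByteCount and SizesAgree are hypothetical helper names.

```pascal
program FrameSizeCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Assumed layout: the post only shows the field names, not the declaration.
  Trgb = packed record
    mBlue, mGreen, mRed: Byte;
  end;

// Number of bytes the pixel loop touches for a frame of Area pixels.
function LoopByteCount(Area: Integer): Integer;
begin
  Result := Area * SizeOf(Trgb);
end;

// True when Move(..., Size) and the pixel loop would copy the same byte count.
function SizesAgree(Size, Area: Integer): Boolean;
begin
  Result := Size = LoopByteCount(Area);
end;

begin
  // Example values only: an 800x600 frame, once with a matching size and
  // once with the "+1" size mentioned in the post.
  WriteLn('matching size agrees: ', SizesAgree(800 * 600 * 3, 800 * 600));
  WriteLn('+1 size agrees:       ', SizesAgree(800 * 600 * 3 + 1, 800 * 600));
end.
```

If the sizes disagree in the real codec, Move would read or write past what the loop touches, which could explain why the two versions behave differently without Move itself being wrong.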

Bye,
Skybuck.


From: Skybuck Flying on
Actually now that I think about it... the frame is perfectly aligned on
memory cells it seems...

So there must be some big bug in this code... or there is some weird
floating point bug in my processor ?!?

(See code previous posting in this thread)

Bye,
Skybuck.


From: Skybuck Flying on
New theory:

Maybe the routine only fails if other floating point routines/calculations
are done around it...

Like before/after the move... and then trying to do the move again and then
some more calculations and so forth...

Bye,
Skybuck.


From: Skybuck Flying on
Maybe it's not flawed after all... the bug seemed to disappear but now it's
back again...

Weird...

Something else must be the problem :(

Bye,
Skybuck.


From: Skybuck Flying on
Well shitty... I ran a little memory test on the computer and no errors were
found, so that can't be it...

So it must be a bug somewhere... it's kinda nasty with all the pointers
moving by one and then the sizes needing to be reduced by one... it's a bit
messy :)
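
One way to make this kind of bookkeeping less error-prone is to keep the pointer and the remaining size in a single record and only ever change them together. This is just a sketch of the idea; TCursor, MakeCursor and Advance are hypothetical names and nothing like them appears in the codec.

```pascal
program CursorSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical helper: a position in a buffer plus the bytes left from there.
  TCursor = record
    Ptr: PAnsiChar;
    Remaining: Integer;
  end;

function MakeCursor(Buffer: Pointer; Size: Integer): TCursor;
begin
  Result.Ptr := PAnsiChar(Buffer);
  Result.Remaining := Size;
end;

// Advances by Count bytes and shrinks Remaining by the same amount.
// Returns False, and changes nothing, if fewer than Count bytes are left.
function Advance(var C: TCursor; Count: Integer): Boolean;
begin
  Result := (Count >= 0) and (Count <= C.Remaining);
  if Result then
  begin
    Inc(C.Ptr, Count);
    Dec(C.Remaining, Count);
  end;
end;

var
  Buf: array[0..9] of Byte;
  C: TCursor;
  Ok: Boolean;
begin
  C := MakeCursor(@Buf[0], SizeOf(Buf));
  Ok := Advance(C, 1);   // skip one byte: pointer and size move together
  WriteLn('skip 1 byte:   ', Ok, ', remaining ', C.Remaining);
  Ok := Advance(C, 20);  // more than is left: rejected, cursor unchanged
  WriteLn('skip 20 bytes: ', Ok, ', remaining ', C.Remaining);
end.
```

Passing C.Ptr and C.Remaining to Move then always gives a pointer and a size that belong together.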

Gotta find a way to fix that and make it better... still no clue though...
could be an algorithm bug too, I don't know...

However one thing is for sure... CPU is probably not fast enough to do
lossless video decoding with multiple compression/transformation methods...

So I might have to deviate to gpu decoding at least ;) :)

Bye,
Skybuck :)