From: Randall Hyde on
Hi All,
By processing four characters at a time and taking advantage of the
HLA string format, I've managed to double the speed of the
case insensitive string comparison. I don't know if this speed boost
will apply to typical (short) strings as this version works well for
long strings that mostly match. Further testing is going to be needed
to see if this works well with short strings.

Note a couple of features of HLA strings that this algorithm
takes advantage of:

1. The length of the string (four bytes) precedes the first character.
(the operand "(type str.strRec [esi]).length" fetches this value
from memory).

2. HLA string data is dword aligned.

3. HLA strings have a zero terminating byte

4. HLA string data is always a multiple of four bytes long (including
up to three bytes of padding at the end of the string, if necessary;
note that there is no guarantee about the values found in the padding
bytes).

Here's my latest attempt based on suggestions around here:

program t;
#include( "stdlib.hhf" )

static
// Str1 is the 255-char sequence #1..#255:

str1 :string := "" +
#for( chr:= 1 to 255 )
char( chr ) +
#endfor
"";

// As above, but substitute uppercase for lowercase:

str2 :string := "" +
#for( chr := 1 to 255 )
@uppercase( char( chr ), 0 ) +
#endfor
"";

// As above, but subsitute lowercase for uppercase:

str3 :string := "" +
#for( chr := 1 to 255 )
@lowercase( char( chr ), 0 ) +
#endfor
"";

// A string that won't match the above
// ( #1..'@' + '.'):

str4 :string := "" +
#for( chr := 1 to uns8('a')-1 )
char( chr ) +
#endfor
".";


procedure stricmp; @nodisplay; @noframe; align(16);
begin stricmp;

push( ebp );
push( ebx );
push( ecx );
push( edx );
mov( (type str.strRec [esi]).length, ebp );
mov( -4, ecx );


// Compare the two strings until we encounter a zero byte
// or until the corresonding characters are different.

align(4);
cmpLoop:
add( 4, ecx ); ;Process four chars at once
cmp( ecx, ebp ); ;Exceeded the string length?
jae theyMatched; ;If so, the strings match.
mov( [esi+ecx], eax ); ;Compare the current four chars
cmp( eax, [edi+ecx] ); ; and repeat the loop if they
je cmpLoop; ; all match.

// One or more of the four characters did not match.
// Let's compare them one at a time to see where they
// didn't match

mov( [edi+ecx], edx );
cmp( al, dl );
je Try2;
and( $df, al );
and( $df, dl );
cmp( al, 'A' );
jb dontMatch;
cmp( al, 'Z' );
ja dontMatch;
cmp( al, dl );
jne dontMatch;

Try2:
shr( 8, eax );
shr( 8, edx );

cmp( al, dl );
je Try3;
and( $df, al );
and( $df, dl );
cmp( al, 'A' );
jb dontMatch;
cmp( al, 'Z' );
ja dontMatch;
cmp( al, dl );
jne dontMatch;

Try3:
shr( 8, eax );
shr( 8, edx );

cmp( al, dl );
je Try4;
and( $df, al );
and( $df, dl );
cmp( al, 'A' );
jb dontMatch;
cmp( al, 'Z' );
ja dontMatch;
cmp( al, dl );
jne dontMatch;

Try4:
cmp( ah, dh );
je cmpLoop;
and( $df, ah );
and( $df, dh );
cmp( ah, 'A' );
jb dontMatch2;
cmp( ah, 'Z' );
ja dontMatch2;
cmp( ah, dh );
je cmpLoop;

// At this point, we've either encountered a zero byte in the source
// string or we've encountered two bytes that are not the same in
// the two strings. In either case, return the result of the
// comparison in the flags.
// Special case to compare ah & dh:

dontMatch2:
pop( edx );
pop( ecx );
pop( ebx );
pop( ebp );
cmp( ah, dh );
ret();


// At this point, we've either encountered a zero byte in the source
// string or we've encountered two bytes that are not the same in
// the two strings. In either case, return the result of the
// comparison in the flags.

dontMatch:
pop( edx );
pop( ecx );
pop( ebx );
pop( ebp );
cmp( al, dl );
ret();


theyMatched:
pop( edx );
pop( ecx );
pop( ebx );
pop( ebp );
cmp( al, al ); //Force Z-flag to be set.
ret();

end stricmp;


static
timeStart :int64;
timeEnd :int64;

begin t;

stdout.put( nl "Test 1:" nl );

nop;
nop;
nop;
nop;
rdtsc();
mov( eax, (type dword timeStart));
mov( edx, (type dword timeStart[4]));

mov( str1, esi );
mov( str1, edi );
stricmp();
jne Failure;

mov( str1, esi );
mov( str2, edi );
stricmp();
jne Failure;

mov( str1, esi );
mov( str3, edi );
stricmp();
jne Failure;

mov( str2, esi );
mov( str3, edi );
stricmp();
jne Failure;

mov( str3, esi );
mov( str2, edi );
stricmp();
jne Failure;

mov( str1, esi );
mov( str4, edi );
stricmp();
je Failure;

rdtsc();
sub( (type dword timeStart), eax );
mov( eax, (type dword timeEnd) );
sbb( (type dword timeStart[4]), edx );
mov( edx, (type dword timeEnd[4]));
stdout.put( "Sequence 1: ", timeEnd, nl );
exit t;

Failure:
stdout.put( "Stricmp failed" nl );

end t;


It would be nice to come up with a clever way to handle
the case where one of the bytes doesn't match (rather than
the four separate sequences I've got here). I'm getting tired,
so I'll have to think about this one later.

Cheers,
Randy Hyde



From: Sevag Krikorian on
Randall Hyde wrote:

>
> It would be nice to come up with a clever way to handle
> the case where one of the bytes doesn't match (rather than
> the four separate sequences I've got here). I'm getting tired,
> so I'll have to think about this one later.
>
> Cheers,
> Randy Hyde
>
>
>


It's a work in progress, it may be that XOR (or SUB)
will do the trick.

// force both registers to case neutrality
and ($dfdfdfdf, eax);
and ($dfdfdfdf, edx);

// zero out any characters that are equal
xor (eax, edx); // or sub (eax, edx);

// if they are equal, z flag will be set
jz cmpLoop;

There is one problem I see with this, if there
is no guarantee about the bytes after the zero-
termination, there might be a premature
no-match while matching the last dword of data.

Needs some more thought. Any quick way of checking
for an 8-bit lenght of zero in all of the 4 bytes
of a 32 bit register?

Also, a snipplet:

add( 4, ecx ); //Process four chars at once
cmp( ecx, ebp ); //Exceeded the string length?
jae theyMatched; //If so, the strings match.

Is this correct? The routine doesn't seem to be checking
for the equality of the last dword. If this is true, then
it could be a remedy for the above problem:


ja theyMatched;
je checkLastDword;


mov( [esi+ecx], eax ); ;Compare the current four chars
cmp( eax, [edi+ecx] ); ; and repeat the loop if they
je cmpLoop; ; all match.

and ($dfdfdfdf, eax);
and ($dfdfdfdf, edx);
xor (eax, edx);
jz cmpLoop;

// at this point, they don't match

...

checkLastDword:
// insert the Try routines here to test 1 byte
// at a time, don't forget to load eax/edx

--
[kain]
http://www.geocities.com/kahlinor

PS. Mind what comment character(s) you use :)
From: Sevag Krikorian on
Sevag Krikorian wrote:
> Randall Hyde wrote:
>
>>
>> It would be nice to come up with a clever way to handle
>> the case where one of the bytes doesn't match (rather than
>> the four separate sequences I've got here). I'm getting tired,
>> so I'll have to think about this one later.
>>
>> Cheers,
>> Randy Hyde
>>
>>
>>
>
>
> It's a work in progress, it may be that XOR (or SUB)
> will do the trick.
>
> // force both registers to case neutrality
> and ($dfdfdfdf, eax);
> and ($dfdfdfdf, edx);
>
> // zero out any characters that are equal
> xor (eax, edx); // or sub (eax, edx);
>
> // if they are equal, z flag will be set
> jz cmpLoop;
>
> There is one problem I see with this, if there
> is no guarantee about the bytes after the zero-
> termination, there might be a premature
> no-match while matching the last dword of data.
>
> Needs some more thought. Any quick way of checking
> for an 8-bit lenght of zero in all of the 4 bytes
> of a 32 bit register?
>
> Also, a snipplet:
>
> add( 4, ecx ); //Process four chars at once
> cmp( ecx, ebp ); //Exceeded the string length?
> jae theyMatched; //If so, the strings match.
>
> Is this correct? The routine doesn't seem to be checking
> for the equality of the last dword. If this is true, then
> it could be a remedy for the above problem:
>
>
> ja theyMatched;
> je checkLastDword;
>
>
> mov( [esi+ecx], eax ); ;Compare the current four chars
> cmp( eax, [edi+ecx] ); ; and repeat the loop if they
> je cmpLoop; ; all match.
>
> and ($dfdfdfdf, eax);
> and ($dfdfdfdf, edx);
> xor (eax, edx);
> jz cmpLoop;
>
> // at this point, they don't match
>
> ...
>
> checkLastDword:
> // insert the Try routines here to test 1 byte
> // at a time, don't forget to load eax/edx
>


I tried a test run and it improved speed by ~700 - 800


Full modified program below:


program t;
#include( "stdlib.hhf" )

static
// Str1 is the 255-char sequence #1..#255:

str1 :string := "" +
#for( chr:= 1 to 255 )
char( chr ) +
#endfor
"";

// As above, but substitute uppercase for lowercase:

str2 :string := "" +
#for( chr := 1 to 255 )
@uppercase( char( chr ), 0 ) +
#endfor
"";

// As above, but subsitute lowercase for uppercase:

str3 :string := "" +
#for( chr := 1 to 255 )
@lowercase( char( chr ), 0 ) +
#endfor
"";

// A string that won't match the above
// ( #1..'@' + '.'):

str4 :string := "" +
#for( chr := 1 to uns8('a')-1 )
char( chr ) +
#endfor
".";


procedure stricmp; @nodisplay; @noframe; align(16);
begin stricmp;

push( ebp );
push( ebx );
push( ecx );
push( edx );
mov( (type str.strRec [esi]).length, ebp );
mov( -4, ecx );


// Compare the two strings until we encounter a zero byte
// or until the corresonding characters are different.

align(4);
cmpLoop:
add( 4, ecx ); //Process four chars at once
cmp( ecx, ebp ); //Exceeded the string length?
ja theyMatched; //If so, the strings match.
je checkLast;
mov( [esi+ecx], eax ); //Compare the current four chars
cmp( eax, [edi+ecx] ); // and repeat the loop if they
je cmpLoop; // all match.

mov ([edi+ecx], edx);
and ($dfdfdfdf, eax);
and ($dfdfdfdf, edx);
xor (eax, edx);
jz cmpLoop;

mov ( [edi+ecx], edx);
cmp (eax, edx);
pop( edx );
pop( ecx );
pop( ebx );
pop( ebp );

ret();

// One or more of the four characters did not match.
// Let's compare them one at a time to see where they
// didn't match
checkLast:
mov( [esi+ecx], eax );
mov( [edi+ecx], edx );
cmp( al, dl );
je Try2;
and( $df, al );
and( $df, dl );
cmp( al, 'A' );
jb dontMatch;
cmp( al, 'Z' );
ja dontMatch;
cmp( al, dl );
jne dontMatch;

Try2:
shr( 8, eax );
shr( 8, edx );

cmp( al, dl );
je Try3;
and( $df, al );
and( $df, dl );
cmp( al, 'A' );
jb dontMatch;
cmp( al, 'Z' );
ja dontMatch;
cmp( al, dl );
jne dontMatch;

Try3:
shr( 8, eax );
shr( 8, edx );

cmp( al, dl );
je Try4;
and( $df, al );
and( $df, dl );
cmp( al, 'A' );
jb dontMatch;
cmp( al, 'Z' );
ja dontMatch;
cmp( al, dl );
jne dontMatch;

Try4:
cmp( ah, dh );
je cmpLoop;
and( $df, ah );
and( $df, dh );
cmp( ah, 'A' );
jb dontMatch2;
cmp( ah, 'Z' );
ja dontMatch2;

// they matched
cmp( ah, dh );
pop( edx );
pop( ecx );
pop( ebx );
pop( ebp );
ret();

// At this point, we've either encountered a zero byte in the source
// string or we've encountered two bytes that are not the same in
// the two strings. In either case, return the result of the
// comparison in the flags.
// Special case to compare ah & dh:

dontMatch2:
pop( edx );
pop( ecx );
pop( ebx );
pop( ebp );
cmp( ah, dh );
ret();


// At this point, we've either encountered a zero byte in the source
// string or we've encountered two bytes that are not the same in
// the two strings. In either case, return the result of the
// comparison in the flags.

dontMatch:
pop( edx );
pop( ecx );
pop( ebx );
pop( ebp );
cmp( al, dl );
ret();


theyMatched:
pop( edx );
pop( ecx );
pop( ebx );
pop( ebp );
cmp( al, al ); //Force Z-flag to be set.
ret();

end stricmp;


static
timeStart :int64;
timeEnd :int64;

begin t;

stdout.put( nl "Test 1:" nl );

nop;
nop;
nop;
nop;
rdtsc();
mov( eax, (type dword timeStart));
mov( edx, (type dword timeStart[4]));

mov( str1, esi );
mov( str1, edi );
stricmp();
jne Failure;

mov( str1, esi );
mov( str2, edi );
stricmp();
jne Failure;

mov( str1, esi );
mov( str3, edi );
stricmp();
jne Failure;

mov( str2, esi );
mov( str3, edi );
stricmp();
jne Failure;

mov( str3, esi );
mov( str2, edi );
stricmp();
jne Failure;

mov( str1, esi );
mov( str4, edi );
stricmp();
je Failure;

rdtsc();
sub( (type dword timeStart), eax );
mov( eax, (type dword timeEnd) );
sbb( (type dword timeStart[4]), edx );
mov( edx, (type dword timeEnd[4]));
stdout.put( "Sequence 1: ", timeEnd, nl );
exit t;

Failure:
stdout.put( "Stricmp failed" nl );

end t;



--
[kain]
http://www.geocities.com/kahlinor
From: Randall Hyde on

"Sevag Krikorian" <kahlinor(a)nop.yahoo.com> wrote in message
news:397kmeF5tt5n7U1(a)individual.net...
>
> It's a work in progress, it may be that XOR (or SUB)
> will do the trick.
>
> // force both registers to case neutrality
> and ($dfdfdfdf, eax);
> and ($dfdfdfdf, edx);
>
> // zero out any characters that are equal
> xor (eax, edx); // or sub (eax, edx);

Don't forget that you *must* check the individual
characters (at least one in each string) to ensure
that they are alphabetic before accepting them
as "equal". You do *not* want to compare
grave accent "`" and find that it is equal to "@"
(for example).


>
> // if they are equal, z flag will be set
> jz cmpLoop;
>
> There is one problem I see with this, if there
> is no guarantee about the bytes after the zero-
> termination, there might be a premature
> no-match while matching the last dword of data.

No, there is no guarantee about the characters
following the zero byte. You must stop after finding
a zero byte (or after processing "length" characters).

>
> Needs some more thought. Any quick way of checking
> for an 8-bit lenght of zero in all of the 4 bytes
> of a 32 bit register?

Sure. The strlen optimizations that process four
characters at a time do this, but that algorithm is
only effective for long strings.

>
> Also, a snipplet:
>
> add( 4, ecx ); //Process four chars at once
> cmp( ecx, ebp ); //Exceeded the string length?
> jae theyMatched; //If so, the strings match.
>
> Is this correct? The routine doesn't seem to be checking
> for the equality of the last dword. If this is true, then
> it could be a remedy for the above problem:

Note that the running length is initialized to -4.

Cheers,
Randy Hyde


From: Evenbit on

Randall Hyde wrote:
> Hi All,
>
> I'm currently in the process of correcting a defect in the
> HLA Standard Library case insensitive string comparison
> routines. While I'm at it, I'm trying to speed it up a bit.
> I've written the following code:
>
> procedure stricmp; @nodisplay; @noframe; align(4);
> begin stricmp;
>
> push( ebx );
>
>
> // Compare the two strings until we encounter a zero byte
> // or until the corresonding characters are different.
>
> cmpLoop:
>
> mov( [esi], al );
> mov( [edi], ah );
> add( 1, esi );
> add( 1, edi );
> cmp( al, 0 );
> je notAlpha;
> cmp( al, ah );
> je cmpLoop;

OOoops!! I was under the impression there would be more register
preservation code up in here. Now I know why my proggies are still
buggy even after HLA stops complainning at me. Back to the drawing
board...

Nathan.