|
From: Randall Hyde on 9 Mar 2005 00:29 Hi All, By processing four characters at a time and taking advantage of the HLA string format, I've managed to double the speed of the case insensitive string comparison. I don't know if this speed boost will apply to typical (short) strings as this version works well for long strings that mostly match. Further testing is going to be needed to see if this works well with short strings. Note a couple of features of HLA strings that this algorithm takes advantage of: 1. The length of the string (four bytes) precedes the first character. (the operand "(type str.strRec [esi]).length" fetches this value from memory). 2. HLA string data is dword aligned. 3. HLA strings have a zero terminating byte 4. HLA string data is always a multiple of four bytes long (including up to three bytes of padding at the end of the string, if necessary; note that there is no guarantee about the values found in the padding bytes). Here's my latest attempt based on suggestions around here: program t; #include( "stdlib.hhf" ) static // Str1 is the 255-char sequence #1..#255: str1 :string := "" + #for( chr:= 1 to 255 ) char( chr ) + #endfor ""; // As above, but substitute uppercase for lowercase: str2 :string := "" + #for( chr := 1 to 255 ) @uppercase( char( chr ), 0 ) + #endfor ""; // As above, but subsitute lowercase for uppercase: str3 :string := "" + #for( chr := 1 to 255 ) @lowercase( char( chr ), 0 ) + #endfor ""; // A string that won't match the above // ( #1..'@' + '.'): str4 :string := "" + #for( chr := 1 to uns8('a')-1 ) char( chr ) + #endfor "."; procedure stricmp; @nodisplay; @noframe; align(16); begin stricmp; push( ebp ); push( ebx ); push( ecx ); push( edx ); mov( (type str.strRec [esi]).length, ebp ); mov( -4, ecx ); // Compare the two strings until we encounter a zero byte // or until the corresonding characters are different. align(4); cmpLoop: add( 4, ecx ); ;Process four chars at once cmp( ecx, ebp ); ;Exceeded the string length? jae theyMatched; ;If so, the strings match. mov( [esi+ecx], eax ); ;Compare the current four chars cmp( eax, [edi+ecx] ); ; and repeat the loop if they je cmpLoop; ; all match. // One or more of the four characters did not match. // Let's compare them one at a time to see where they // didn't match mov( [edi+ecx], edx ); cmp( al, dl ); je Try2; and( $df, al ); and( $df, dl ); cmp( al, 'A' ); jb dontMatch; cmp( al, 'Z' ); ja dontMatch; cmp( al, dl ); jne dontMatch; Try2: shr( 8, eax ); shr( 8, edx ); cmp( al, dl ); je Try3; and( $df, al ); and( $df, dl ); cmp( al, 'A' ); jb dontMatch; cmp( al, 'Z' ); ja dontMatch; cmp( al, dl ); jne dontMatch; Try3: shr( 8, eax ); shr( 8, edx ); cmp( al, dl ); je Try4; and( $df, al ); and( $df, dl ); cmp( al, 'A' ); jb dontMatch; cmp( al, 'Z' ); ja dontMatch; cmp( al, dl ); jne dontMatch; Try4: cmp( ah, dh ); je cmpLoop; and( $df, ah ); and( $df, dh ); cmp( ah, 'A' ); jb dontMatch2; cmp( ah, 'Z' ); ja dontMatch2; cmp( ah, dh ); je cmpLoop; // At this point, we've either encountered a zero byte in the source // string or we've encountered two bytes that are not the same in // the two strings. In either case, return the result of the // comparison in the flags. // Special case to compare ah & dh: dontMatch2: pop( edx ); pop( ecx ); pop( ebx ); pop( ebp ); cmp( ah, dh ); ret(); // At this point, we've either encountered a zero byte in the source // string or we've encountered two bytes that are not the same in // the two strings. In either case, return the result of the // comparison in the flags. dontMatch: pop( edx ); pop( ecx ); pop( ebx ); pop( ebp ); cmp( al, dl ); ret(); theyMatched: pop( edx ); pop( ecx ); pop( ebx ); pop( ebp ); cmp( al, al ); //Force Z-flag to be set. ret(); end stricmp; static timeStart :int64; timeEnd :int64; begin t; stdout.put( nl "Test 1:" nl ); nop; nop; nop; nop; rdtsc(); mov( eax, (type dword timeStart)); mov( edx, (type dword timeStart[4])); mov( str1, esi ); mov( str1, edi ); stricmp(); jne Failure; mov( str1, esi ); mov( str2, edi ); stricmp(); jne Failure; mov( str1, esi ); mov( str3, edi ); stricmp(); jne Failure; mov( str2, esi ); mov( str3, edi ); stricmp(); jne Failure; mov( str3, esi ); mov( str2, edi ); stricmp(); jne Failure; mov( str1, esi ); mov( str4, edi ); stricmp(); je Failure; rdtsc(); sub( (type dword timeStart), eax ); mov( eax, (type dword timeEnd) ); sbb( (type dword timeStart[4]), edx ); mov( edx, (type dword timeEnd[4])); stdout.put( "Sequence 1: ", timeEnd, nl ); exit t; Failure: stdout.put( "Stricmp failed" nl ); end t; It would be nice to come up with a clever way to handle the case where one of the bytes doesn't match (rather than the four separate sequences I've got here). I'm getting tired, so I'll have to think about this one later. Cheers, Randy Hyde
From: Sevag Krikorian on 9 Mar 2005 01:50 Randall Hyde wrote: > > It would be nice to come up with a clever way to handle > the case where one of the bytes doesn't match (rather than > the four separate sequences I've got here). I'm getting tired, > so I'll have to think about this one later. > > Cheers, > Randy Hyde > > > It's a work in progress, it may be that XOR (or SUB) will do the trick. // force both registers to case neutrality and ($dfdfdfdf, eax); and ($dfdfdfdf, edx); // zero out any characters that are equal xor (eax, edx); // or sub (eax, edx); // if they are equal, z flag will be set jz cmpLoop; There is one problem I see with this, if there is no guarantee about the bytes after the zero- termination, there might be a premature no-match while matching the last dword of data. Needs some more thought. Any quick way of checking for an 8-bit lenght of zero in all of the 4 bytes of a 32 bit register? Also, a snipplet: add( 4, ecx ); //Process four chars at once cmp( ecx, ebp ); //Exceeded the string length? jae theyMatched; //If so, the strings match. Is this correct? The routine doesn't seem to be checking for the equality of the last dword. If this is true, then it could be a remedy for the above problem: ja theyMatched; je checkLastDword; mov( [esi+ecx], eax ); ;Compare the current four chars cmp( eax, [edi+ecx] ); ; and repeat the loop if they je cmpLoop; ; all match. and ($dfdfdfdf, eax); and ($dfdfdfdf, edx); xor (eax, edx); jz cmpLoop; // at this point, they don't match ... checkLastDword: // insert the Try routines here to test 1 byte // at a time, don't forget to load eax/edx -- [kain] http://www.geocities.com/kahlinor PS. Mind what comment character(s) you use :)
From: Sevag Krikorian on 9 Mar 2005 02:07 Sevag Krikorian wrote: > Randall Hyde wrote: > >> >> It would be nice to come up with a clever way to handle >> the case where one of the bytes doesn't match (rather than >> the four separate sequences I've got here). I'm getting tired, >> so I'll have to think about this one later. >> >> Cheers, >> Randy Hyde >> >> >> > > > It's a work in progress, it may be that XOR (or SUB) > will do the trick. > > // force both registers to case neutrality > and ($dfdfdfdf, eax); > and ($dfdfdfdf, edx); > > // zero out any characters that are equal > xor (eax, edx); // or sub (eax, edx); > > // if they are equal, z flag will be set > jz cmpLoop; > > There is one problem I see with this, if there > is no guarantee about the bytes after the zero- > termination, there might be a premature > no-match while matching the last dword of data. > > Needs some more thought. Any quick way of checking > for an 8-bit lenght of zero in all of the 4 bytes > of a 32 bit register? > > Also, a snipplet: > > add( 4, ecx ); //Process four chars at once > cmp( ecx, ebp ); //Exceeded the string length? > jae theyMatched; //If so, the strings match. > > Is this correct? The routine doesn't seem to be checking > for the equality of the last dword. If this is true, then > it could be a remedy for the above problem: > > > ja theyMatched; > je checkLastDword; > > > mov( [esi+ecx], eax ); ;Compare the current four chars > cmp( eax, [edi+ecx] ); ; and repeat the loop if they > je cmpLoop; ; all match. > > and ($dfdfdfdf, eax); > and ($dfdfdfdf, edx); > xor (eax, edx); > jz cmpLoop; > > // at this point, they don't match > > ... > > checkLastDword: > // insert the Try routines here to test 1 byte > // at a time, don't forget to load eax/edx > I tried a test run and it improved speed by ~700 - 800 Full modified program below: program t; #include( "stdlib.hhf" ) static // Str1 is the 255-char sequence #1..#255: str1 :string := "" + #for( chr:= 1 to 255 ) char( chr ) + #endfor ""; // As above, but substitute uppercase for lowercase: str2 :string := "" + #for( chr := 1 to 255 ) @uppercase( char( chr ), 0 ) + #endfor ""; // As above, but subsitute lowercase for uppercase: str3 :string := "" + #for( chr := 1 to 255 ) @lowercase( char( chr ), 0 ) + #endfor ""; // A string that won't match the above // ( #1..'@' + '.'): str4 :string := "" + #for( chr := 1 to uns8('a')-1 ) char( chr ) + #endfor "."; procedure stricmp; @nodisplay; @noframe; align(16); begin stricmp; push( ebp ); push( ebx ); push( ecx ); push( edx ); mov( (type str.strRec [esi]).length, ebp ); mov( -4, ecx ); // Compare the two strings until we encounter a zero byte // or until the corresonding characters are different. align(4); cmpLoop: add( 4, ecx ); //Process four chars at once cmp( ecx, ebp ); //Exceeded the string length? ja theyMatched; //If so, the strings match. je checkLast; mov( [esi+ecx], eax ); //Compare the current four chars cmp( eax, [edi+ecx] ); // and repeat the loop if they je cmpLoop; // all match. mov ([edi+ecx], edx); and ($dfdfdfdf, eax); and ($dfdfdfdf, edx); xor (eax, edx); jz cmpLoop; mov ( [edi+ecx], edx); cmp (eax, edx); pop( edx ); pop( ecx ); pop( ebx ); pop( ebp ); ret(); // One or more of the four characters did not match. // Let's compare them one at a time to see where they // didn't match checkLast: mov( [esi+ecx], eax ); mov( [edi+ecx], edx ); cmp( al, dl ); je Try2; and( $df, al ); and( $df, dl ); cmp( al, 'A' ); jb dontMatch; cmp( al, 'Z' ); ja dontMatch; cmp( al, dl ); jne dontMatch; Try2: shr( 8, eax ); shr( 8, edx ); cmp( al, dl ); je Try3; and( $df, al ); and( $df, dl ); cmp( al, 'A' ); jb dontMatch; cmp( al, 'Z' ); ja dontMatch; cmp( al, dl ); jne dontMatch; Try3: shr( 8, eax ); shr( 8, edx ); cmp( al, dl ); je Try4; and( $df, al ); and( $df, dl ); cmp( al, 'A' ); jb dontMatch; cmp( al, 'Z' ); ja dontMatch; cmp( al, dl ); jne dontMatch; Try4: cmp( ah, dh ); je cmpLoop; and( $df, ah ); and( $df, dh ); cmp( ah, 'A' ); jb dontMatch2; cmp( ah, 'Z' ); ja dontMatch2; // they matched cmp( ah, dh ); pop( edx ); pop( ecx ); pop( ebx ); pop( ebp ); ret(); // At this point, we've either encountered a zero byte in the source // string or we've encountered two bytes that are not the same in // the two strings. In either case, return the result of the // comparison in the flags. // Special case to compare ah & dh: dontMatch2: pop( edx ); pop( ecx ); pop( ebx ); pop( ebp ); cmp( ah, dh ); ret(); // At this point, we've either encountered a zero byte in the source // string or we've encountered two bytes that are not the same in // the two strings. In either case, return the result of the // comparison in the flags. dontMatch: pop( edx ); pop( ecx ); pop( ebx ); pop( ebp ); cmp( al, dl ); ret(); theyMatched: pop( edx ); pop( ecx ); pop( ebx ); pop( ebp ); cmp( al, al ); //Force Z-flag to be set. ret(); end stricmp; static timeStart :int64; timeEnd :int64; begin t; stdout.put( nl "Test 1:" nl ); nop; nop; nop; nop; rdtsc(); mov( eax, (type dword timeStart)); mov( edx, (type dword timeStart[4])); mov( str1, esi ); mov( str1, edi ); stricmp(); jne Failure; mov( str1, esi ); mov( str2, edi ); stricmp(); jne Failure; mov( str1, esi ); mov( str3, edi ); stricmp(); jne Failure; mov( str2, esi ); mov( str3, edi ); stricmp(); jne Failure; mov( str3, esi ); mov( str2, edi ); stricmp(); jne Failure; mov( str1, esi ); mov( str4, edi ); stricmp(); je Failure; rdtsc(); sub( (type dword timeStart), eax ); mov( eax, (type dword timeEnd) ); sbb( (type dword timeStart[4]), edx ); mov( edx, (type dword timeEnd[4])); stdout.put( "Sequence 1: ", timeEnd, nl ); exit t; Failure: stdout.put( "Stricmp failed" nl ); end t; -- [kain] http://www.geocities.com/kahlinor
From: Randall Hyde on 9 Mar 2005 09:02 "Sevag Krikorian" <kahlinor(a)nop.yahoo.com> wrote in message news:397kmeF5tt5n7U1(a)individual.net... > > It's a work in progress, it may be that XOR (or SUB) > will do the trick. > > // force both registers to case neutrality > and ($dfdfdfdf, eax); > and ($dfdfdfdf, edx); > > // zero out any characters that are equal > xor (eax, edx); // or sub (eax, edx); Don't forget that you *must* check the individual characters (at least one in each string) to ensure that they are alphabetic before accepting them as "equal". You do *not* want to compare grave accent "`" and find that it is equal to "@" (for example). > > // if they are equal, z flag will be set > jz cmpLoop; > > There is one problem I see with this, if there > is no guarantee about the bytes after the zero- > termination, there might be a premature > no-match while matching the last dword of data. No, there is no guarantee about the characters following the zero byte. You must stop after finding a zero byte (or after processing "length" characters). > > Needs some more thought. Any quick way of checking > for an 8-bit lenght of zero in all of the 4 bytes > of a 32 bit register? Sure. The strlen optimizations that process four characters at a time do this, but that algorithm is only effective for long strings. > > Also, a snipplet: > > add( 4, ecx ); //Process four chars at once > cmp( ecx, ebp ); //Exceeded the string length? > jae theyMatched; //If so, the strings match. > > Is this correct? The routine doesn't seem to be checking > for the equality of the last dword. If this is true, then > it could be a remedy for the above problem: Note that the running length is initialized to -4. Cheers, Randy Hyde
From: Evenbit on 9 Mar 2005 20:58
Randall Hyde wrote: > Hi All, > > I'm currently in the process of correcting a defect in the > HLA Standard Library case insensitive string comparison > routines. While I'm at it, I'm trying to speed it up a bit. > I've written the following code: > > procedure stricmp; @nodisplay; @noframe; align(4); > begin stricmp; > > push( ebx ); > > > // Compare the two strings until we encounter a zero byte > // or until the corresonding characters are different. > > cmpLoop: > > mov( [esi], al ); > mov( [edi], ah ); > add( 1, esi ); > add( 1, edi ); > cmp( al, 0 ); > je notAlpha; > cmp( al, ah ); > je cmpLoop; OOoops!! I was under the impression there would be more register preservation code up in here. Now I know why my proggies are still buggy even after HLA stops complainning at me. Back to the drawing board... Nathan. |