From: Frank Kotler on
arargh502NOSPAM(a)NOW.AT.arargh.com wrote:

....
> > mov( ax, bx );
>
> Why bother to move it?
> No matter what, ax is not needed anymore that I can see.

I thought that was a really good idea, but I ran into subtle
problems - if "c1" and "c2" are non-alpha, but differ by
20h, the final cmp can give a "false positive" if the
"converted" comparison is returned. If we delay the
"conversion" until after the "alpha" comparison, it works,
but we have to test for both "a - z" and "A - Z", or check
both al and ah. Haven't tested this, but I doubt if it's
faster.

> Also if use32 save a byte by "mov( eax, ebx );"

This definitely helps (and HLA is *always* use32).

> > and( $5f5f, bx );
> Could preload $5f5f into a reg

To my surprise, this seems slower... Using the 32-bit reg
helps, though.

>
> > cmp( bl, 'A' );
> > jb notAlpha;
> > cmp( bl, 'Z' );
> > ja notAlpha;
> What about bh?

I don't think we need to care about bh - if it isn't the
same "alphaness" as bl, we haven't got a match anyway.

> Could combine these two jmps to one using setcc and some more regs
> If you process bh that would be 4 jmps to one.

I haven't tried the code you posted using this method...

> > cmp( bl, bh );
> > je cmpLoop;
>
> If they are not equal is it possible that the cmp below can get an
> incorrect result in some cases?

I think it's okay... Returns the "unconverted" flags - that
is, 'a' > 'B'. Since we're doing insensitive, and
considering 'a' to be equivalent to 'A', it might make sense
to return the "converted" comparison, so 'B' > 'a'... Just
"converting" eax at this point should do that safely... I
think...

OTOH, the caller usually cares about match or not. If the
caller cares which is greater, it's probably for a "binary
search", and the return ought to match the way the data is
sorted... which might favor the "unconverted" comparison...

> > notAlpha:
> >
> > // At this point, we've either encountered a zero byte in the source
> > // string or we've encountered two bytes that are not the same in
> > // the two strings. In either case, return the result of the
> > // comparison in the flags.
> >
> > pop( ebx );
> > cmp( al, ah );
> > ret();
> >
> >end stricmp;

Best,
Frank
From: Randall Hyde on

"Frank Kotler" <fbkotler(a)comcast.net> wrote in message
news:422CFD3F.5AA1F13D(a)comcast.net...
>
> I'm not surprised that it's faster, since you're processing
> two bytes at once. I'm not sure it's really "safe" to do.
> What if the terminating zero butts right up against the end
> of memory you "own"? Attempting to read beyond the end of
> the string *might* cause a segfault or access violation.

Actually, HLA strings are guaranteed to be a multiple of four bytes
long. They almost always begin on a dword boundary, too (not
guaranteed, but for all HLA-compiler-generated strings this is
always true). Of course, the stricmp routine I'm trying to write
doesn't have to be fed HLA strings, but the truth is that end-users
aren't going to call stricmp, they call routines like str.ieq or
str.igt which, in turn, call stricmp, e.g.,

procedure str.igt( src1:string; src2:string ); @nodisplay; @noalignstack;
begin igt;

push( esi );
push( edi );

mov( src1, esi );
mov( src2, edi );
stricmp();
mov( 0, eax );
seta( al );

pop( edi );
pop( esi );

end igt;



One of the main reasons I've not gone crazy trying out more
than one byte at a time is because research (about 20 years
old, now, so it may be slightly out of date) suggests that the
average string found in a program is 10-15 characters long.
With such small strings, tricks that process strings in bulk
often wind up costing more than they save in the long run.
That's one reason why, for example, using MMX or SSE
for the operations is a complete waste of time (of course,
cleaning up after an MMX operation is *expensive* too,
not something you want in a general-purpose string
comparison.

I really should modify my tests to process a series of short
strings, to provide a better cost comparison against strings
you'd find in a typical program. It's rare that you get to
run through 256 bytes, as my examples have done.
Cheers,
Randy Hyde


From: Randall Hyde on

"Percival" <dragontamer5788(a)yahoo.com> wrote in message
news:pan.2005.03.07.05.35.17.948947(a)yahoo.com...
> On Mon, 07 Mar 2005 06:21:34 +0100, '\\o//'annabee wrote:
>
> > Pý Sun, 06 Mar 2005 23:39:02 GMT, skrev Randall Hyde
> > <randyhyde(a)earthlink.net>:
> >
> >> Hi All,
> >
> >
> > How does this compare ? Hope it is correct, as I was a bit tired when
> > writing it, and havent really tested it fully. On my machine, it seems
to
> > run about twice as fast, in theese test, under saturated circumstances.
> > (From Menu) Also, running on mediumlength normal strings, with a
> > "outsticking" char here and there, its seems to run about twice as fast.
> >
> > But anyway.... I am far to tired to confirm if its correctly running.
> > Later perhaps.

.....
> > FasterCmp:
> > Align 4
> > L0: mov dx wýeax | cmp dh 0 | je L8> | cmp dl 0 | je L9> | add ebx 2
|
> > add eax 2 | cmp dx wýebx-2 | je L0<
> > mov cx wýebx-2
> > and dx 0_5f5f
> > and cx 0_5f5f
> > add dx 02020
> > add cx 02020
> > cmp dx cx | je L0<
> > ret
> > L8:
> > cmp dl býebx | jne L9>
> > cmp býebx+1 0
> > L9:
> > ret
> >
>
> *bets Wannabee is kill filed* --- Forwarding message.

Yes, indeed.

Unfortunately, AFAICT, there is a minor problem with the "FasterCmp"
routine.
It doesn't explicitly check the characters to see if they are in the range
"a".."z" or "A".."Z". Therefore, it thinks that characters like ` and @
are equivalent, '[' and '{' are equivalent, and ctrl-A maps to '!' (and so
on).
Nice try, though.
I've tried to process two or four characters at a time, thus far I've not
been successful. Particularly for shorter strings the overhead eats you
alive.
Cheers,
Randy Hyde




From: Randall Hyde on

"Octavio" <933191278(a)terra.es> wrote in message
news:a3b1e869.0503071643.3653e1a4(a)posting.google.com...
> "Randall Hyde" <randyhyde(a)earthlink.net> wrote in message
news:<qwMWd.2584$oO4.1429(a)newsread3.news.pas.earthlink.net>...
> > I'm also wondering about the results to return when the strings
> > are not equal. In particular, the algorithm I've provided compares
> > the original characters and sets the flags if those characters do
> > not match. Therefore, 'a' > 'B'. I'm wondering if this is reasonable,
> > or if I should return the comparisions of the "converted" characters
> > (if they were uppercase)? I don't really care what *other* languages
> > (e.g., C/C++) do, I'm interested in a discussion of what is reasonable
> > to do here.

> I think is reasonable to compare the converted characters, and also a
> table lookup must be used to support chars like 'ýý'.

This is just an ASCII string comparison routine, it doesn't need to support
characters outside the range 0..$7F.

> About the speedup, i think it must be done elsewere, the table lookup
> can not be optimized very much.

No doubt about that. If the table is not in cache, accessing the table
kills you (and pollutes your cache, too). Granted, if you need to compare
eight-bit character codes and they're not just ASCII values, a lookup
table is probably the best way to go.

Your comment did point out a defect, though. The AND instruction should
be "and( $BFBF, ebx);" rather than $5F5F. No sense matching characters
with their H.O. bits set if a non-ASCII character happens to come along.
Cheers,
Randy Hyde


From: Randall Hyde on

"Frank Kotler" <fbkotler(a)comcast.net> wrote in message
news:422CFD3F.5AA1F13D(a)comcast.net...
>
> You seem to be forcing characters to uppercase whether
> they're alpha or not - this could result in a "false
> positive" match.

Exactly.
You *must* compare the characters to see if they are in the range 'a'..'z'
or 'A'..'Z'.

I'm thinking there *might* be a way to remove a branch by subtracting
'A' and comparing against 26, but we'll see how that turns out.
Cheers,
Randy Hyde