Generic compare function [C++]

Prev: We Wait For Thee: char16_t, char32_t.
Next: Use of facet codecvt for indenting output text

From: Le Chaud Lapin on 7 Jan 2010 14:51

On Jan 7, 12:06 pm, Francis Glassborow
<francis.glassbo...(a)btinternet.com> wrote:
> Adam Badura wrote:
> > Why C++ does not contain a generic compare function?
[snip]

I agree with Adam. This is the method I use exclusively in all my code
for any situation that requires comparison.

> I wonder if you have thought this through.

I have, quite a bit. :)

I had been using the typical operator < and operator > inside my set-
like containers and many other situations such as comparison in
Microsoft's Namesspace Extension IShellFolder::CompareIDs function...

http://msdn.microsoft.com/en-us/library/bb775062(VS.85).aspx

...., which is implemented by the programmer such that, given two
"things", conceptually, two icons in the shell, it determines the
relative ordering of those things as defined by the programmer, so
that the shell can sort, find, etc. the items (icons).

It turns out that having a compare function of the kind proposed by
Adam which returns -, 0, +, for <, ==, >; respectively; fits nicely
into the model prescribed by Microsoft's API. And there is a good
reason: it is optimal, both mechanically, and conceptually, IMHO.

I call my comparison function "signum()".

This is not the name of one particular function, but a family of
functions. It is also a principle, scattered throughout my code. It is
defined for primitive types as well as friend function of every class
for which comparison makes sense.

> How would you actually use such a function?

Like this:

friend signed int signum (const String &, const String &);

bool operator == (const String &that) const {return signum (*this,
that) == 0;}
bool operator != (const String &that) const {return signum (*this,
that) != 0;}
bool operator > (const String &that) const {return signum (*this,
that) > 0;}
bool operator < (const String &that) const {return signum (*this,
that) < 0;}
bool operator >= (const String &that) const {return signum (*this,
that) >= 0;}
bool operator <= (const String &that) const {return signum (*this,
that) <= 0;}

> How would it be implemented more efficiently than we
> can do for ourselves?

It would eliminate a superfluous comparison in STL set-like
containers, which, in extreme cases, causes execution time to double,
because signum() checks once, and STL checks twice.

You can see in the following STL code from VS2008 that the _Pred()
function is being invoked twice for a comparison:

template<class _Pr, class _Ty1, class _Ty2> inline
bool __CLRCALL_OR_CDECL _Debug_lt_pred(_Pr _Pred, const _Ty1& _Left,
const _Ty2& _Right,
const wchar_t *_Where, unsigned int _Line)
{ // test if _Pred(_Left, _Right) and _Pred is strict weak ordering
if (!_Pred(_Left, _Right))
return (false);
else if (_Pred(_Right, _Left))
_DEBUG_ERROR2("invalid operator<", _Where, _Line);
return (true);
}

With signum(), using my own equivalents of set<>, map<>, etc., the
comparison is invoked only once.

> How would you determine which ordering was to be
> used (many types have more then one logical ordering, and often include
> equivalence sets)? Even std::string has multiple orderings depending
> on the collation rules. How would you deal with types that have no
> ordering? What about types that have equivalence sets but no ordering?

Interesting question. :)

1. A few months ago, I asked a String expert how I should approach the
problem of comparing strings while redesigning my own international
UNICODE String class. My gut said to follow a strict rule whereby two
objects, of any kind, including String, would always be able to
determine their relative ordering, based on their mutual state,
without any external context. There would be no opportunity to supply
a comparison function. But this person had extensive experience with
UNICODE and C++, and a much broader context in general, so I was
practically committed to taking his advice before the conversation
even started.

He said that context-free comparison was a bad idea because I would
create a mess from which I would never be able to extract myself.

I mentioned that I had heavy nested template code that did not take a
comparison function as argument, that the complexity of which would
manifest even more drastically by the addition of a "helper" functions
for needy classes like String, etc.

He said it was what it was, and so I agreed to follow his advice, as I
had no interest in tackling the "String" problem, but hours of post-
conversation fidgeting, just to be sure, became days and days became
weeks.

Today, I am convinced that having a self-contained "fat" string class
whose objects know how to compare themselves without help from an
external helper function is one of the best design decisions that I
have ever made in C++.

> It may seem a very simple thing to do but in fact it is very complicated
> and far from trivial to provide generically.

True, it was not trivial, given all the rules of UNICODE, plus the
performance hit when those rules are actually followed.

And to see the size of the code base of ICU...

http://site.icu-project.org/

....does not instill confidence.

In fact, I intended to use ICU directly en lieu of my own String class
a few months ago. But it had several issues, mostly having to do with
"keeping one foot on the dock, the other on the boat", with regard to
transitioning from C to C++, like no exceptions and a member function
whose job it was to test if its object was in a bad state, so I decide
to write my own, based on the principles of UNICODE collation:

http://unicode.org/reports/tr10/

....using ICU as a crutch.

In the end, I concluded that our fixation with skinny string classes
is over-rated.

If the goal is mental relief and true universal applicability, it is
best to let the String class be what it is: inherently fat.

If the goal is to have a sequence of bytes that can be compared using
a helper function and otherwise fiddled-with, then certainly we
already know how to do that.

-Le Chaud Lapin-

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Goran on 7 Jan 2010 20:44

On Jan 8, 8:51 am, Le Chaud Lapin <jaibudu...(a)gmail.com> wrote:
> You can see in the following STL code from VS2008 that the _Pred()
> function is being invoked twice for a comparison:
>
> template<class _Pr, class _Ty1, class _Ty2> inline
> bool __CLRCALL_OR_CDECL _Debug_lt_pred(_Pr _Pred, const _Ty1& _Left,
> const _Ty2& _Right,
> const wchar_t *_Where, unsigned int _Line)
> { // test if _Pred(_Left, _Right) and _Pred is strict weak ordering
> if (!_Pred(_Left, _Right))
> return (false);
> else if (_Pred(_Right, _Left))
> _DEBUG_ERROR2("invalid operator<", _Where, _Line);
> return (true);
> }

Hmmm... I think MS people should check this. That second comparison
should be out of the picture in optimized builds, because it only has
debugging purposes. Quick look at sources tells me that ain't the
case. Implementation quality issue?

Goran.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Seungbeom Kim on 7 Jan 2010 20:44

Tony Delroy wrote:
> As Adam wrote, it's useful when comparing two containers (e.g.
> strings), where you scan over some long common part, then find a
> distinguishing character. There's a fair chance that you may know
> exactly which of <, == and > apply at that point, so it would be nice
> to have a good way to return that information. If you only indicate
> say <, then a second comparison may waste time skipping the early
> common elements before returning to the distinguishing ones....

The standard library provides std::less as the entry point for a
general comparison, and uses it as the default in many places.
Your point then, as I understand it, is why is it a two-way comparison
instead of a three-way one, and why does std::sort take a 'less'
function returning bool (true/false) instead of a 'compare' function
returning int (<0/=0/>0) that the good old qsort uses; is this correct?

>
> Many existing "int compare(... a, ... b)" functions only promise to
> return a negative, 0 or positive value, as subtracting elements may
> produce that in one step more efficiently than guaranteeing -1, 0, or
> 1. The latter scheme does make it easy to switch on the resultant
> value though...

A comparison primitive had better be more efficient, IMO. When you
need a 'stricter' version returning one of the three values, you can
always use a separate sign function.

--
Seungbeom Kim

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Tony Delroy on 7 Jan 2010 20:42

(Sorry... hit "send" by accident on the post to which this replies.
Please ignore everything from "f it helps the compiler a single"
onwards....)

Anyway, I demonstrated an efficiency issue with current C++, in that
sometimes compilers repeat a comparison when the only difference from
an earlier one was a test for == instead of <...

return argc < 7 ? 42 : argc == 7 ? 53 : 64;

g++ -S -O3 compare.cc # ver. 4.4.1

cmpl $6, %edi
movl $42, %eax
jle .L3
movb $64, %al
cmpl $7, %edi
movl $53, %edx
cmove %edx, %eax
..L3:
rep
ret

Here, cmpl $6 is followed by a cmpl $7. Perhaps a builtin less-than/
equal-to-/greater-than comparison would make it easier for compilers
to perform multiple tests on the flags set by a single comparison, but
perhaps it's an overlooked optimisation that could be easily done
already, some other compilation flags would have enabled it, or the
resultant code just happens to be slower on the target CPU here (i7
920 / Ubuntu Karmic Koala).

For ASCIIZ strings, the case is more clear cut, as a common substring
may need to be rescanned for the second comparison.

<francis.glassbo...(a)btinternet.com> wrote:
> ... many types have more then one logical ordering, and often include
> equivalence sets)? Even std::string has multiple orderings depending
> on the collation rules. How would you deal with types that have no
> ordering? What about types that have equivalence sets but no ordering?
> It may seem a very simple thing to do but in fact it is very complicated
> and far from trivial to provide generically.

That's true, but no more true than for the operators that do enjoy
language support. An overloadable operator allows customisation to
the exact data stored.

Cheers,
Tony

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Piotr Wyderski on 8 Jan 2010 01:39

Francis Glassborow wrote:

> I wonder if you have thought this through. How would you actually use
> such a function?

Yes, it should be a default base for std::less and the rest of the family.
To be more specific, there should be two functions:

std::equal_to<T>
std::compare<T>

The defaut implementation of std::less, greater, greater_equal and
less_equal
should reduce to std::compare. std::equal_to also, if not explicitly
provided,
and not_equal_to should be reduced to equal_to. That is because determining
equality is often much faster than a full three-way comparison (and does not
imply a partial order std::less requires). If one can implement std::less,
then
he is also able to implement std::compare using two less-es. On the
contrary,
std::less, greater etc. always reduce to a single invocation of compare, so
there is no need to restore the lost information.

> It may seem a very simple thing to do but in fact it is very complicated
> and far from trivial to provide generically.

No, I use it in my program and it turns out to be very useful.

Best regards
Piotr Wyderski

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10
Prev: We Wait For Thee: char16_t, char32_t.
Next: Use of facet codecvt for indenting output text