comparing objects [Ruby]

From: Rein Henrichs on 11 Jun 2010 02:10

On 2010-06-10 22:52:14 -0700, Robert Dober said:

> On Thu, Jun 10, 2010 at 6:48 PM, Mark Abramov <markizko(a)gmail.com> wrote:
>> Robert Dober wrote:
>>> On Thu, Jun 10, 2010 at 5:30 PM, Rein Henrichs <reinh(a)reinh.com> wrote:
>>> overriding #hash and #eql? breaks Hash!
>> That's not true, I think.
> Judge for yourself
>
> require "forwardable"
>
> def count klass
>   ObjectSpace.each_object( klass ).to_a.size
> end
> class N
>   extend Forwardable
>   attr_reader :n
>   def_delegators :n, :hash
>   def eql? otha
>     n == otha.n
>   end
>   private
>   def initialize n
>     @n = n
>   end
> end # class N
>
>
> h = { N.new( 42 ) => true }
> h[ N.new( 42 ) ] = 42
> p h
> GC.start
> p count(N)
>
> Cheers
> R.

This breaks Hash? Quite the opposite!

This is precisely what is meant by "defining the semantics" of a class
for use by hashes and the very behavior you want when you define #eql?
and #hash in the first place!

You wouldn't say that defining #<=> breaks Array#sort, so why would you
say that this "breaks Hash"? This doesn't break Hash. If anything, it
fixes it when using N objects as keys!
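A minimal sketch of the behaviour Rein describes, using a class shaped like Robert's N (the `is_a?` guard and the use of `n.eql?` instead of `n ==` are additions, so that #eql? stays consistent with `n.hash`, since `42 == 42.0` is true but `42.eql?(42.0)` is false). Once #eql? and #hash agree, two separately built N objects with the same n act as one Hash key, and Hash#[]= keeps the key object that was stored first, replacing only the value:

```ruby
# Sketch modelled on the N class quoted above.
class N
  attr_reader :n

  def initialize(n)
    @n = n
  end

  # Equivalent when the wrapped values are eql?; non-N objects never match.
  def eql?(other)
    other.is_a?(N) && n.eql?(other.n)
  end

  # Objects that are eql? must return the same hash value.
  def hash
    n.hash
  end
end

first = N.new(42)
h = { first => true }
h[N.new(42)] = 42   # treated as the same key: only the value changes

p h.size                       # => 1
p h[N.new(42)]                 # => 42
p h.keys.first.equal?(first)   # => true, the original key object is kept
```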
--
Rein Henrichs
http://puppetlabs.com
http://reinh.com

From: Robert Klemme on 11 Jun 2010 09:25

2010/6/11 Shot (Piotr Szotkowski) <shot(a)hot.pl>:
> Rein Henrichs:
>
>> #hash makes sense for Hash#[] and etc. #eql? makes more
>> sense for Array#&. I too find it odd that both are necessary.
>
> Both are necessary because #eql? says whether two objects are surely
> the same, while #hash says whether they're surely different, which,
> perhaps counterintuitively, is not the same problem.
>
> The difference is that in many, many cases it's much faster to check
> whether two objects are surely different (via a fast #hash function)
> than whether they're surely the same (#eql? can be quite slow).

This is not necessarily true. Any reasonable implementation of #eql?
will bail out as soon as it sees a difference. By contrast, you
always need to look at the complete state of an instance to calculate
#hash. I can easily construct an example where #eql? beats #hash:

14:40:54 Temp$ ruby19 eql-test.rb
same
0.110000 0.000000 0.110000 ( 0.098000)
0.093000 0.000000 0.093000 ( 0.099000)
0.157000 0.000000 0.157000 ( 0.151000)
different early
0.093000 0.000000 0.093000 ( 0.101000)
0.094000 0.000000 0.094000 ( 0.096000)
0.000000 0.000000 0.000000 ( 0.000000)
different late
0.109000 0.000000 0.109000 ( 0.105000)
0.094000 0.000000 0.094000 ( 0.098000)
0.156000 0.000000 0.156000 ( 0.149000)
14:40:56 Temp$ cat eql-test.rb
require 'benchmark'
a1 = Array.new 1_000_000
a2 = Array.new 1_000_000
puts "same"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a1[0] = 1
a2[0] = 2
puts "different early"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
a2[0] = a1[0]
a2[999_999] = 1
puts "different late"
puts Benchmark.measure { a1.hash }
puts Benchmark.measure { a2.hash }
puts Benchmark.measure { a1.eql? a2 }
14:40:58 Temp$

Notice also how #eql? with equal arrays is not much slower than #hash.

> The main difference between #eql? and #hash is that #hash can return the
> same value for objects that are not #eql? (but if two objects are #eql?
> then #hash must return the same value).
>
> An untested, and definitely not optimal
> (but hopefully simple) example follows. :)
>
> Imagine that you want to implement a new immutable string class, one
> which caches the string length (for performance reasons). Imagine also
> that the vast majority of such strings you use are of different lengths,
> and that you want to use them as Hash keys.
>
>
> class ImmutableString
>
>   def initialize string
>     @string = string.dup.freeze
>     @length = string.length
>   end
>
> end
>
>
>
> Given the above assumptions, it might make sense for #hash to
> return the @length, while #eql? makes the proper comparison:
>
>
>
> class ImmutableString
>
>   def hash
>     @length

Bad hash implementation. Why don't you use String#hash?

>   end
>
>   alias eql? ==
>
> end
>
>
>
> This way, in the vast majority of cases, when your ImmutableStrings
> are considered as Hash keys, the check whether a given key exists will
> be very quick; only when two objects #hash to the same value (i.e.,
> when they're not surely different) is #eql? called to tell whether
> they're surely the same.
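A sketch of the ImmutableString idea under one stated assumption: #eql? compares the wrapped strings. As quoted, `alias eql? ==` would alias Object#==, which is identity unless ImmutableString defines its own #==, so the sketch defines #eql? explicitly and aliases #== to it. Keeping the length-based #hash shows the collision case: strings of equal length share a hash value and Hash falls back to #eql? to keep them apart:

```ruby
class ImmutableString
  attr_reader :string

  def initialize(string)
    @string = string.dup.freeze
    @length = string.length
  end

  # Coarse hash: strings of different lengths are surely different.
  def hash
    @length
  end

  # Full comparison, consulted when two hash values match.
  def eql?(other)
    other.is_a?(ImmutableString) && string.eql?(other.string)
  end
  alias_method :==, :eql?
end

h = {}
h[ImmutableString.new("ruby")] = 1
h[ImmutableString.new("perl")] = 2   # same length, hence same hash value
h[ImmutableString.new("ruby")] = 3   # eql? to the first key: value replaced

p h.size                            # => 2
p h[ImmutableString.new("perl")]    # => 2
```

Replacing `@length` with `@string.hash`, as Robert suggests in his reply, keeps the contract intact and avoids funnelling every same-length string into one colliding hash value.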

If the set of attributes used for the specific comparison needed in
this thread is not the same as the set we identify as "keyish" for
class User in general, one cannot use User#eql? and User#hash for
quick set intersection. That's why I suggested using a Struct for the
key fields (Struct has proper #hash and #eql? built in).
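A sketch of the Struct approach, assuming a hypothetical User class and assuming that name and email are the key fields for this particular comparison. User itself keeps its default identity semantics; only the throwaway key structs take part in the intersection:

```ruby
require "set"

# Struct generates #==, #eql? and #hash from its members.
UserKey = Struct.new(:name, :email)

# Hypothetical stand-in for the User class discussed in the thread.
class User
  attr_reader :id, :name, :email

  def initialize(id, name, email)
    @id = id
    @name = name
    @email = email
  end

  # Assumption: only name and email matter for this comparison.
  def key
    UserKey.new(name, email)
  end
end

a = [User.new(1, "ann", "ann@example.com"), User.new(2, "bob", "bob@example.com")]
b = [User.new(7, "ann", "ann@example.com"), User.new(8, "cid", "cid@example.com")]

# Intersection of the key structs themselves.
common_keys = a.map(&:key) & b.map(&:key)

# Users from `a` whose key also occurs in `b`; Set lookups use #hash/#eql?.
keys_in_b = b.map(&:key).to_set
shared = a.select { |u| keys_in_b.include?(u.key) }

p common_keys.size   # => 1
p shared.map(&:id)   # => [1]
```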

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

From: Daniel Berger on 11 Jun 2010 12:29

On Jun 3, 4:29 pm, Anderson Leite <anderson...(a)gmail.com> wrote:
> How can I compare two objects and get true if some of their attributes
> are equal?

include Comparable?
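Dan's one-line suggestion, sketched under assumptions: the Person class and the choice of name and email as the compared attributes are hypothetical. Comparable derives #== (true when #<=> returns 0) along with #<, #>, etc., but it does not touch #eql? or #hash, so Array#& and Hash keys still see two distinct objects:

```ruby
class Person
  include Comparable
  attr_reader :id, :name, :email

  def initialize(id, name, email)
    @id = id
    @name = name
    @email = email
  end

  # Only name and email take part; id is deliberately ignored.
  def <=>(other)
    return nil unless other.is_a?(Person)
    [name, email] <=> [other.name, other.email]
  end
end

x = Person.new(1, "Ann", "ann@example.com")
y = Person.new(2, "Ann", "ann@example.com")

p x == y      # => true, via Comparable#==
p x.eql?(y)   # => false, Object#eql? is still identity
p [x] & [y]   # => [], Array#& relies on #hash/#eql?, not #==
```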

Regards,

Dan

From: Robert Klemme on 11 Jun 2010 12:42

On 10.06.2010 18:27, Robert Dober wrote:
> On Thu, Jun 10, 2010 at 6:10 PM, Robert Klemme
> <shortcutter(a)googlemail.com> wrote:
>
>> http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html
>> http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html
> You define #eql? and #hash for your convenience. So good, so bad. My
> question simply was: Show me why *not* redefining #hash and #eql? will
> cause problems, because that was Wilson's statement. I am still
> waiting :(.

The advice to implement #eql? and #hash really only makes sense if
equivalence can reasonably be defined for a class and if instances of
that class should be used as Hash keys or in a Set. If equivalence
cannot be defined other than via identity (which is the default), then
it is perfectly reasonable not to override either method and to go with
the default implementation.
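To make the default concrete, a small sketch with a hypothetical Session class that overrides neither method: Object's #eql? and #hash are identity-based, so an instance is found in a Hash only by itself, never by another instance with the same state:

```ruby
# Hypothetical class with no #eql?/#hash overrides.
class Session
  def initialize(token)
    @token = token
  end
end

s = Session.new("abc")
cache = { s => :data }

p cache[s]                    # => :data, same object
p cache[Session.new("abc")]   # => nil, equal state but a different object
```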

Kind regards

robert

From: Robert Dober on 11 Jun 2010 14:15

On Fri, Jun 11, 2010 at 6:47 PM, Robert Klemme
<shortcutter(a)googlemail.com> wrote:
> The advice to implement #eql? and #hash really only makes sense if
> equivalence can reasonably be defined for a class and if instances of that
> class should be used as Hash keys or in a Set. If equivalence cannot be
> defined other than via identity (which is the default), then it is
> perfectly reasonable not to override either method and to go with the
> default implementation.
But that was *exactly* my point.

OP wanted to use Array#&, and Array#&, for a reason not too clear to
me, uses Object#eql? instead of Object#==. I discouraged overriding
Object#eql? and Object#hash for *that purpose*.
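A sketch of the point about Array#&, using a hypothetical Item class that overrides only #==. Array#& ignores that #== (it goes through #hash/#eql?), while an explicit select with #== gives the intended intersection without overriding #eql? or #hash, at quadratic cost, which is usually acceptable for small arrays:

```ruby
# Hypothetical class that overrides only ==.
class Item
  attr_reader :code

  def initialize(code)
    @code = code
  end

  def ==(other)
    other.is_a?(Item) && code == other.code
  end
end

a = [Item.new(1), Item.new(2)]
b = [Item.new(2), Item.new(3)]

p((a & b).size)   # => 0, Array#& does not use the custom ==

# Intersection by == without touching #eql? or #hash.
common = a.select { |x| b.any? { |y| x == y } }
p common.map(&:code)   # => [2]
```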

If you want to change how Hash treats your objects, then it is the right
thing to do. Now, I might strongly disagree about whether one should do
that, but that is rather off-topic, and I would never have made such
strong statements about that issue. However, the technique you suggest
should not be put into non-expert hands, as I tried to show with the
memory-leaking code above.

Cheers
Robert

--
The best way to predict the future is to invent it.
-- Alan Kay
