From: Peter Duniho on
Tony Johansson wrote:
> I still mean that if the hash code is the same then you must have identical
> keys so I don't see any point why the
> framework must call the equals. ?
> Can somebody exaplin that ?

Three people have already explained that your assertion is wrong.

Think about it. If the hash code was really unique � that is, if only
one string ever has a given hash code � then we'd have a fabulous way to
compress _all_ strings down to 32 bits, no matter how long they are.

Hash codes necessarily can results in "collisions". That is, two
different objects having the same hash code. The fact that two objects
you check have the same hash code most certainly DOES NOT mean that you
have identical keys.

Pete
From: Arne Vajhøj on
On 05-06-2010 11:42, Tony Johansson wrote:
> "Arne Vajh�j"<arne(a)vajhoej.dk> skrev i meddelandet
> news:4c0a54ce$0$284$14726298(a)news.sunsite.dk...
>> On 05-06-2010 07:55, Tony Johansson wrote:
>>> First some background information before I ask the question.The text is
>>> just
>>> copied from a book that I'm reading
>>> The .Net Framework doesn't always understand equality as we do, however.
>>> For
>>> example, imagine you created class Fish
>>> that holds the name of the fish like below.
>>>
>>> public class Fish
>>> {
>>> string name;
>>>
>>> public Fish(string theName)
>>> {
>>> name = theName;
>>> }
>>> }
>>>
>>> Now if we create two instances of the Fish class with the same name, the
>>> Hashtable treats them as different objects, as shown in the following
>>> code
>>> example.
>>>
>>> Hashtable duplicates = new Hashtable();
>>> Fish key1 = new Fish("Herring");
>>> Fish key2 = new Fish("Herring");
>>> duplicates[key1] = "Hello";
>>> duplicates[key2] = "Hello";
>>> Console.WriteLine(duplicates.Count); // give 2
>>>
>>> The reason we have two items in the collection is because the Object
>>> class's
>>> implementation of GetHashCode creates a hash that is likely to be unique
>>> for
>>> each instance of a class. I can override the GetHashCode in the Fish
>>> class
>>> to try and let the Hashtable known they are equal.
>>>
>>> Here is the code that override the GetHashCode in class Fish
>>> public override int GetHashCode()
>>> {
>>> return name.GetHashCode();
>>> }
>>>
>>> If I return the hash of the fish's name, the two instances of the fish
>>> will
>>> have the same hash code.
>>> But is that enough for the Hashtable to determine they are identical
>>> objects. Unfortunately, no. If the Hashtable find two objects with the
>>> same
>>> hash, it calls their Equals method to see whether the two objects are in
>>> fact equal. Again, the default implementation of Objects.Equals will
>>> return
>>> false if the two objects are two different instances of the same class.
>>> So
>>> we need to also add an override of the Equals method to our Fish class.
>>>
>>> Now to my question: I mean that if the hash code is the same as in my
>>> example then the object are equal so I don't understand why the framework
>>> must call the equals.
>>>
>>> Can somebody give me an example when you have the same hashcode but the
>>> object is not equal.
>>
>> The logic is:
>> - If Equals return true it is the "same" object
>> - GetHashCode returns an int for all objects, if Equals return true then
>> GetHashCode must return the same value, but returning "good" values
>> from GetHashCode is only for good performance
>>
>> public override int GetHashCode()
>> {
>> return 1;
>> }
>>
>> is valid implementation.
>>
>> But it will make Hashtable lookup O(n) instead of O(1), so
>> it would be very bad for performance.
>>
>> You want a GetHashCode that returns many different values.

> I still mean that if the hash code is the same then you must have identical
> keys so I don't see any point why the
> framework must call the equals. ?
> Can somebody exaplin that ?

identical key means always identical hash code
identical hash code does not mean always identical keys
for a good hash function identical hash code means usually identical keys

But even with usually it is necessary to test with Equals to be sure.

Arne

From: Harlan Messinger on
Tony Johansson wrote:
> "Arne Vajh�j" <arne(a)vajhoej.dk> skrev i meddelandet
> news:4c0a54ce$0$284$14726298(a)news.sunsite.dk...
>> On 05-06-2010 07:55, Tony Johansson wrote:
>>> First some background information before I ask the question.The text is
>>> just
>>> copied from a book that I'm reading
>>> The .Net Framework doesn't always understand equality as we do, however.
>>> For
>>> example, imagine you created class Fish
>>> that holds the name of the fish like below.
>>>
>>> public class Fish
>>> {
>>> string name;
>>>
>>> public Fish(string theName)
>>> {
>>> name = theName;
>>> }
>>> }
>>>
>>> Now if we create two instances of the Fish class with the same name, the
>>> Hashtable treats them as different objects, as shown in the following
>>> code
>>> example.
>>>
>>> Hashtable duplicates = new Hashtable();
>>> Fish key1 = new Fish("Herring");
>>> Fish key2 = new Fish("Herring");
>>> duplicates[key1] = "Hello";
>>> duplicates[key2] = "Hello";
>>> Console.WriteLine(duplicates.Count); // give 2
>>>
>>> The reason we have two items in the collection is because the Object
>>> class's
>>> implementation of GetHashCode creates a hash that is likely to be unique
>>> for
>>> each instance of a class. I can override the GetHashCode in the Fish
>>> class
>>> to try and let the Hashtable known they are equal.
>>>
>>> Here is the code that override the GetHashCode in class Fish
>>> public override int GetHashCode()
>>> {
>>> return name.GetHashCode();
>>> }
>>>
>>> If I return the hash of the fish's name, the two instances of the fish
>>> will
>>> have the same hash code.
>>> But is that enough for the Hashtable to determine they are identical
>>> objects. Unfortunately, no. If the Hashtable find two objects with the
>>> same
>>> hash, it calls their Equals method to see whether the two objects are in
>>> fact equal. Again, the default implementation of Objects.Equals will
>>> return
>>> false if the two objects are two different instances of the same class.
>>> So
>>> we need to also add an override of the Equals method to our Fish class.
>>>
>>> Now to my question: I mean that if the hash code is the same as in my
>>> example then the object are equal so I don't understand why the framework
>>> must call the equals.
>>>
>>> Can somebody give me an example when you have the same hashcode but the
>>> object is not equal.
>> The logic is:
>> - If Equals return true it is the "same" object
>> - GetHashCode returns an int for all objects, if Equals return true then
>> GetHashCode must return the same value, but returning "good" values
>> from GetHashCode is only for good performance
>>
>> public override int GetHashCode()
>> {
>> return 1;
>> }
>>
>> is valid implementation.
>>
>> But it will make Hashtable lookup O(n) instead of O(1), so
>> it would be very bad for performance.
>>
>> You want a GetHashCode that returns many different values.
>>
>> Arne
>>
>
> Hi!
>
> I still mean that if the hash code is the same then you must have identical
> keys so I don't see any point why the
> framework must call the equals. ?
> Can somebody exaplin that ?

Then you aren't understanding the nature of a hash code. The number of
items in a hash table is going to be smaller, by orders of magnitude,
then the range of possible hash codes, and as long as the hash codes are
well distributed, collisions are going to be rare anyway. In addition,
the consequence of a collision when it does occur is very small.

Besides that, it isn't POSSIBLE to have what you're asking for because
the number of possible distinct object values is infinite, and the range
of available hash codes is finite.

Also, think of a dictionary with a separate tab (basically, a hash code)
for each letter of the alphabet. That helps you find a word faster,
right? Now imagine a dictionary with a separate tab for every word.
Would that be useful?
From: Harlan Messinger on
Peter Duniho wrote:
> Harlan Messinger wrote:
>> [...]
>> Besides that, it isn't POSSIBLE to have what you're asking for because
>> the number of possible distinct object values is infinite, and the
>> range of available hash codes is finite. [...]
>
> While it may seem as though modern PCs are capable of infinitely large
> data, it's not actually the case. :)

The formality of "subject to the physical limitations of the hardware"
is a little tedious sometimes. :-) *Conceptually*, of course, there is
no limit.

> It's true that the number of possible distinct object values is vastly
> larger than the number of available hash codes, it's still a finite number.
>
> Pete
From: Arne Vajhøj on
On 06-06-2010 16:46, Peter Duniho wrote:
> Harlan Messinger wrote:
>> [...]
>> Besides that, it isn't POSSIBLE to have what you're asking for because
>> the number of possible distinct object values is infinite, and the
>> range of available hash codes is finite. [...]
>
> While it may seem as though modern PCs are capable of infinitely large
> data, it's not actually the case. :)
>
> It's true that the number of possible distinct object values is vastly
> larger than the number of available hash codes, it's still a finite number.

..NET limits object size to 2 GB.

So there can definitely not be more than 256^2147483648
different objects.

And that is a finite number.

But also a very big number.

Arne