From: Rex Mottram on
Austin Appleby wrote:
> No joke. I found a mix function that's much faster and mixes even
> better - no differentials, no weaknesses with sparse keys.

I'm not a hash expert or a mathematician, just a programmer looking for
a fast, non-cryptographic hash with good distribution for use as a
"signature" for files. MurmurHash sounds ideal for that. I've previously
tested with CRC32 and found it to be slow, and since MurmurHash is
reputed to be ~5 times faster and to provide a better distribution, I
downloaded the latest version and tried it out. Unfortunately, for every
test case I run it measures around half as fast as CRC32, which is about
an order of magnitude slower than the comparison table shows. I would
really love to solve this mystery.

There's nothing complicated here - I wrote up a little test case which
maps a large file into memory, calls a hash function on it, and reports
how long it took to finish. I've downloaded and plugged into it all
three versions of MurmurHash (regular, aligned, endian-neutral). I also
threw in SuperFastHash for comparison's sake. The CRC32 routine I'm
using is the one from zlib.

All three versions of MH, as well as SFH, measure as slower than CRC32.
This has been tried on Sun/sparc, Windows/Intel, and Mac/Intel and the
results do not differ appreciably. It's almost as if the crc32()
function in zlib is 10 times faster than the version MH was benchmarked
against, but that seems highly unlikely.

So if anyone has ideas I'd like to hear them. The test case is attached.
It's 400-500 lines but most of that is just copies of MH and SFH, plus a
main program that maps, hashes, and unmaps. It should build easily on
Unix or Windows. I run it on a file of ~1GB and get timings in the 1-2
minute range.

Thanks,
RM
From: CBFalconer on
Rex Mottram wrote:
> Austin Appleby wrote:
>
>> No joke. I found a mix function that's much faster and mixes even
>> better - no differentials, no weaknesses with sparse keys.
>
.... snip ...
>
> All three versions of MH, as well as SFH, measure as slower than
> CRC32. This has been tried on Sun/sparc, Windows/Intel, and
> Mac/Intel and the results do not differ appreciably. It's almost
> as if the crc32() function in zlib is 10 times faster than the
> version MH was benchmarked against, but that seems highly unlikely.
>
> So if anyone has ideas I'd like to hear them. The test case is
> attached. It's 400-500 lines but most of that is just copies of MH
> and SFH, plus a main program that maps, hashes, and unmaps. It
> should build easily on Unix or Windows. I run it on a file of ~1GB
> and get timings in the 1-2 minute range.

The following generic hash functions are supplied with the hashlib
package for optional use, and are usually quite satisfactory. The
complete package is available at:

<http://cbfalconer.home.att.net/download/>

/* ============= Useful generic functions ============= */

/* NOTE: hash is called once per operation, while rehash is
called _no more_ than once per operation. Thus it
is preferable that hash be the more efficient.
*/

/* Hash a string quantity */
unsigned long hshstrhash(const char * string)
{
unsigned long h;

h = 0;
while (*string) {
h = h * 31UL + (unsigned char) *string++;
}
return h;
} /* hshstrhash */

/* 1------------------1 */

/* ReHash a string quantity */
unsigned long hshstrehash(const char * string)
{
unsigned long h;

h = 0;
while (*string) {
h = h * 37UL + (unsigned char) *string++;
}
return h;
} /* hshstrehash */


--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

** Posted from http://www.teranews.com **