|
Prev: free directory submission
Next: How to write an server application with high performance and high availability.
From: Rex Mottram on 13 Apr 2008 07:54 Austin Appleby wrote: > No joke. I found a mix function that's much faster and mixes even > better - no differentials, no weaknesses with sparse keys. I'm not a hash expert or a mathematician, just a programmer looking for a fast, non-cryptographic hash with good distribution for use as a "signature" for files. MurmurHash sounds ideal for that. I've previously tested with CRC32 and found it to be slow, and since MurmurHash is reputed to be ~5 times faster and to provide a better distribution, I downloaded the latest version and tried it out. Unfortunately, for every test case I run it measures around half as fast as CRC32, which is about an order of magnitude slower than the comparison table shows. I would really love to solve this mystery. There's nothing complicated here - I wrote up a little test case which maps a large file into memory, calls a hash function on it, and reports how long it took to finish. I've downloaded and plugged into it all three versions of MurmurHash (regular, aligned, endian-neutral). I also threw in SuperFastHash for comparison's sake. The CRC32 routine I'm using is the one from zlib. All three versions of MH, as well as SFH, measure as slower than CRC32. This has been tried on Sun/sparc, Windows/Intel, and Mac/Intel and the results do not differ appreciably. It's almost as if the crc32() function in zlib is 10 times faster than the version MH was benchmarked against, but that seems highly unlikely. So if anyone has ideas I'd like to hear them. The test case is attached. It's 400-500 lines but most of that is just copies of MH and SFH, plus a main program that maps, hashes, and unmaps. It should build easily on Unix or Windows. I run it on a file of ~1GB and get timings in the 1-2 minute range. Thanks, RM
From: CBFalconer on 13 Apr 2008 11:44
Rex Mottram wrote: > Austin Appleby wrote: > >> No joke. I found a mix function that's much faster and mixes even >> better - no differentials, no weaknesses with sparse keys. > .... snip ... > > All three versions of MH, as well as SFH, measure as slower than > CRC32. This has been tried on Sun/sparc, Windows/Intel, and > Mac/Intel and the results do not differ appreciably. It's almost > as if the crc32() function in zlib is 10 times faster than the > version MH was benchmarked against, but that seems highly unlikely. > > So if anyone has ideas I'd like to hear them. The test case is > attached. It's 400-500 lines but most of that is just copies of MH > and SFH, plus a main program that maps, hashes, and unmaps. It > should build easily on Unix or Windows. I run it on a file of ~1GB > and get timings in the 1-2 minute range. The following generic hash functions are supplied with the hashlib package for optional use, and are usually quite satisfactory. The complete package is available at: <http://cbfalconer.home.att.net/download/> /* ============= Useful generic functions ============= */ /* NOTE: hash is called once per operation, while rehash is called _no more_ than once per operation. Thus it is preferable that hash be the more efficient. */ /* Hash a string quantity */ unsigned long hshstrhash(const char * string) { unsigned long h; h = 0; while (*string) { h = h * 31UL + (unsigned char) *string++; } return h; } /* hshstrhash */ /* 1------------------1 */ /* ReHash a string quantity */ unsigned long hshstrehash(const char * string) { unsigned long h; h = 0; while (*string) { h = h * 37UL + (unsigned char) *string++; } return h; } /* hshstrehash */ -- [mail]: Chuck F (cbfalconer at maineline dot net) [page]: <http://cbfalconer.home.att.net> Try the download section. ** Posted from http://www.teranews.com ** |