From: gast128 on
On 2 apr, 13:45, Goran <goran.pu...(a)gmail.com> wrote:
> On Mar 31, 6:43 pm, "gast...(a)hotmail.com" <gast...(a)hotmail.com> wrote:
>
> > Hello all,
>
> > I am not sure if this is the correct newsgroup, but the information
> > might be helpful.
>
> > The (stdext::) hash_map in combination with strings (used on VStudio
> > 2003 / vc7.1) does not have a good performance. It's actual slower
> > than std::map<string, ...>. I noticed however that changing the hash
> > function (with boost::hash_value for example) does make it fast again.
> > Perhaps there is an issue with the hash function? A small test with
> > random strings (up to the a length of 2000) gave the following
> > results:
>
> > stdext::hash_value with string:
> > Mean: 3.73593e+009
> > Sum: 373592873859453
> > Min: 3735928607
> > Max: 3735928779
> > Var: 919.648
>
> > boost::hash_value with string:
> > Mean: 2.67717e+009
> > Sum: 267716741970601
> > Min: 1230214564
> > Max: 4217158627
> > Var: 2.11744e+018
>
> > It seems that stdext::hash_value has a very narrow distribution, which
> > may the root of the bad performance.
>
> > note: I am aware of the tr1 / boost unordered_map which will superceed
> > it. However this still may be an important issue for developpers who
> > dropped hash containers due to bad experience with above combination
> > (like myself).
>
> Well, what is your use-case? If it's slower than std::map as it is,
> why are you even bothering using a hash map (that is, what, if
> anything, do you gain over a lowly map for your use-case with the
> boost's hash)?
>
> Also, a reminder for others who might be reading this:
>
> 1. any algo is fast for a small value of n (related empirical fact:
> optimization is the root of all evil)
> 2. hardware likes linear searching due to better locality of
> references
> 2. think about your data first. Do you want/will want it ordered? If
> so, use an ordered container (set or map).
> 3. did you verify that hashing really is significantly more performant
> than an ordered container?

{ edits: quoted blank line block, signature, banner and Dutch (?) line removed.
please keep readers in mind when you quote. thank you. -mod }

I think this is somewhat besides the case I am rying to make. In many
books can be found that hash maps often out perform balanced trees.
With that knowledge I used the hash_map with strings years ago (in
some kind of q&d profiler which needed fast lookup). However I noticed
that it had a bad performance, so I dropped the hash_map and used the
standard map. Also I was wondering if this orginal claim was correct.

Until recently I started to use the boost::unordered_map and noticed
its fast perfromance. It even outperformed a sorted vector (with
contiguous memory which should be fast). So I was wondering why this
hash_map was slow. According to my obsevation now it isn't, only it is
negatively affected by its default hasher.

Maybe this is also something from Mr. Plauger to react on, but he may
have already dropped the hash_map (with its default hasher) in favor
of the tr1::unordered_map.

And yes everything is dependent on the size of the containers. However
bad performance is often caused by large containers and O(n^2)
behavior. In typically small container contexts it doesn't make much
difference what u use. Often the linear std::find with std::vector has
then the best performance. That will change as soon as that container
gets heavily loaded.


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]