From: Johannes Baagoe on
In the process of refining my PNRGs, I stumbled on a problem: the
lack of good hash functions. One does not need them that often in
javascript, since in most of their uses in C or other lower-level
languages, Objects provide the required association between arbitrary
strings and whatever one wants to look up, or count, or whatever.
But sometimes, they are useful, in my case, to seed the PRNGs.

I first thought I could use well-established methods, like the
Fowler–Noll–Vo family of hashes, which seemed to meet the needs.

But that does not work: 32-bit FNV supposes that multiplications of
two 32-bit integers will retain the 32 lowest bits of the result.

This is not the case with ECMAScript Numbers, which are always
"double-precision 64-bit binary format IEEE 754 values". That implies
1. that in the case of overflow, the *lower* bits get discarded,

and 2. that one never retains more than 53 significant bits - 54 if you
count the sign.

(For the same reason, one cannot implement the classic LCG PRNG in
a naive way. Marsiglia's "easy to remember" multiplier 69069 is all
right, since the result of its multiplication by a 32-bit integer
will always fit in 53 bits. But 1103515245, which is widely used,
leads to overflow, which ruins the modulo 2^32 assumption.)

As a further, minor complication, FNV and the like still assume that
characters are bytes. ECMAScript String.prototype.charCodeAt returns
"a Number (a nonnegative integer less than 2^16)".

I have therefore attempted to write a general purpose hash function
for ECMAScript. It is called with `hash(data, n)`, where data is
whatever you want to hash, and the optional n the intended number
of buckets. (If you do not provide n, the result is a Number in
[0, 1[, just like Math.random(), but of course with a deterministic
relationship to the data, and with 53 bits precision.) The results
should be well distributed: changing just one bit in the data should
usually result in the change of about half the 32 first bits of the
result, and somewhat less in the 21 lowest.

The theory will be explained on my Random Musings pages Really
Soon Now. In the meantime, you are welcome to try to figure it out,
and to prove me wrong when I claim that it distributes *any* data
over *any* number of buckets in a suitably pseudo-random way :)
(I haven't tested it very extensively yet, so you might succeed.)

Here is the code:

function hash(data, n) {
var norm = Math.pow(2, -32);
var a = 2095533;
var s = 0, c = 1;
var t, t0 = 0;

data = data.toString();
for (var i = 0; i < data.length; i++) {
s -= data.charCodeAt(i) * 65537 * norm;
if (s < 0) {
s += 1;
}
t = a * s + c * norm;
t0 = s = t - (c = t | 0);
t = a * s + c * norm;
s = t - (c = t | 0);
}

if (n) {
return Math.floor(n * (s + t0 * norm));
} else {
return s + t0 * norm;
}
}

--
Johannes
From: Scott Sauyet on
Johannes Baagoe wrote:
> In the process of refining my PNRGs, I stumbled on a problem: the
> lack of good hash functions. [ ... ]
>
> I first thought I could use well-established methods, like the
> Fowler–Noll–Vo family of hashes, which seemed to meet the needs.
>
> But that does not work [ ... ]

Have you had an experience of unexpected collisions with FNV?

-- Scott

From: Johannes Baagoe on
Scott Sauyet :

> Have you had an experience of unexpected collisions with FNV?

No, what I experienced was totally different outputs from the javascript
and the C versions of my PRNGs. Which, retrospectively, is hardly
surprising - things happen very differently in the two languages when
you multiply a 32-bit non negative integer by 16777619 !

That being said, it should not be too difficult to construct a horrible
test case for a naive port of 32-bit FNV to javascript. It is not among my
priorities just now, but I may give it a try some day if nobody else does
it before.

NB: FNV works just fine on languages that have proper 32-bit integers, and
it could, obviously, be ported properly to javascript by the means of a
multi-precision multiplication routine. But one cannot simply cut and
paste, and change all the "Fnv32_t"s to "var"s... Which is essentially
what I did, and I would be surprised if I were the only one ever to do so.

--
Johannes
From: nick on
On Apr 14, 4:40 pm, Johannes Baagoe <baa...(a)baagoe.com> wrote:
> [...]
> It is called with `hash(data, n)`, where data is
> whatever you want to hash, and the optional n the intended number
> of buckets.
> [...]

Thanks Johannes! This looks incredibly useful; I was just looking for
something like this that doesn't require a thousand lines of code and
a crazy lookup table. I'll definitely be making use of this.

A question: I thought it would be cool to see this thing generate
string hashes. Here's what I came up with... please let me know if I'm
inadvertently breaking the rest of your code with these additions? I
don't fully understand your algorithm but I think this should be ok.

function hash(data, n, asString) {
var norm = Math.pow(2, -32);
var a = 2095533;
var s = 0, c = 1;
var t, t0 = 0;
var r;
data = data.toString();
for (var i = 0; i < data.length; i++) {
s -= data.charCodeAt(i) * 65537 * norm;
if (s < 0) {
s += 1;
}
t = a * s + c * norm;
t0 = s = t - (c = t | 0);
t = a * s + c * norm;
s = t - (c = t | 0);
}
if (n) {
r = Math.floor(n * (s + t0 * norm));
} else {
r = s + t0 * norm;
}
// return as string
if (asString) {
if (n) r = 1 / (r + 1.123);
r = r.toString(36).substr(2);
}
return r;
}

Thanks again!

-- Nick
From: Johannes Baagoe on
nick :

> A question: I thought it would be cool to see this thing generate string
> hashes. Here's what I came up with... please let me know if I'm
> inadvertently breaking the rest of your code with these additions? I
> don't fully understand your algorithm but I think this should be ok.

> function hash(data, n, asString) {

[...]

> // return as string
> if (asString) {
> if (n) r = 1 / (r + 1.123);
> r = r.toString(36).substr(2);
> }
> return r;
> }

It doesn't break anything, obviously, but just out of curiosity: why do
you need that ?

--
Johannes