Unhashing int64 to string [Theory]

Prev: Coins on chess-board puzzle
Next: Math/CompSci Interview Question - Thoughts?

From: Isliguezze on 19 Nov 2009 13:44

A classic string hashing algorithm can be simply described by the
following code (C++ like pseudo code)

int64 hash(string s)
{
int64 hash = 5381;

for (i = 0; i < s.length; i++)
hash = (hash * 33) XOR s[i];

return hash;
}

Is there any known approach to "unhash" a string (consisting no more
than of 20 elements) if its hash code ran out of bounds of 2<sup>64</
sup> and was "cut".

Thanks in advance.

From: Tim Little on 19 Nov 2009 18:34

On 2009-11-19, Isliguezze <isliguezze(a)gmail.com> wrote:
> Is there any known approach to "unhash" a string (consisting no more
> than of 20 elements) if its hash code ran out of bounds of 2<sup>64</
> sup> and was "cut".

There are only 2^64 possible hash values, but about 2^160 possible
original strings. The hash value can only tell you which set of about
2^96 strings the original string came from.

That said, you may have some information about the original string
that would allow you to distinguish it from the other 2^96 or so
theoretical possibilities. For example, it may be much more likely to
be natural-language text, or usually shorter than 20 characters.

This hash algorithm is cryptographically very weak. For example,
multiplication by 33 does nothing at all to the bits 0-4 of the hash,
and so these bits are all just the XOR of the corresponding bits in
the input (and the initial hash value). The upper bits depend only on
the earlier characters. There are also plenty of more subtle
weaknesses.

If the string were known to be a fragment of natural language text, it
might be totally reconstructed. Otherwise it would probably not be
possible, but you could quite rapidly construct strings with the same
hash.

- Tim

|
Pages: 1
Prev: Coins on chess-board puzzle
Next: Math/CompSci Interview Question - Thoughts?