From: Volker Hetzer on
Sergei schrieb:
> Volker Hetzer wrote:
>> Sergei schrieb:
>>> But if SHA-1 is a uniformly distributed function, how could the Chinese
>>> guys claim that it is possible to find a collision in 2^63 hash
>>> operations?
>> Because it's pseudorandom only. This has nothing to do with the collision
>> probability of two randomly chosen sets of data.
> Sure. But what does random data have to do with "a few dozen __terabytes__
> of data", which should be partitioned into blocks and hashed?
So far I have assumed that they don't contain data generated for the
specific purpose of defeating MD5 or SHA1. If this assumption is true,
they are random enough when seen from the perspective of the hash
function.

Btw, you are curiously insistent about that terabyte stuff. That's
totally irrelevant for this discussion. What /is/ relevant is the
number of blocks. Which you haven't told us yet.
Whether you hash two 64-byte blocks or two 512-petabyte blocks
doesn't change the collision probability of the generated hashes.
SHA-1 is specified for messages of up to 2^64 - 1 bits, i.e. about two exabytes.

By the way, you still haven't said what kind of data and what you
want to do with the hashes.

Also, it doesn't matter how "random" the data or the hashes are.
You will have to deal with collisions anyway.
For statistical purposes the only difference between SHA-whatever
and MD5 is the hash length.
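
If it helps, here is the kind of back-of-the-envelope estimate I mean (a rough
Python sketch; the block count 2^35 is only an illustrative guess, since you
haven't told us yours): with n blocks and a b-bit hash, the expected number of
colliding pairs is about n*(n-1)/2 / 2^b, so only b matters, never the block size.

  def expected_collisions(n_blocks, hash_bits):
      # birthday estimate: number of block pairs divided by number of possible digests
      return n_blocks * (n_blocks - 1) / 2.0 / 2.0 ** hash_bits

  n = 2 ** 35                          # e.g. a few dozen TB split into 1 KB blocks
  print(expected_collisions(n, 128))   # MD5:   about 2^-59
  print(expected_collisions(n, 160))   # SHA-1: about 2^-91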

Lots of Greetings!
Volker
--
For email replies, please substitute the obvious.
From: Mike Amling on
Carlos Moreno wrote:
> Mike Amling wrote:
>
>> You can expect approximately one collision among 2^64 (resp. 2^80)
>> different inputs, because that many messages generate 2^128 (resp.
>> 2^160) pairs.
>
> Now my turn to nitpick!! :-)
>
> They generate 2^127 (resp. 2^159) pairs (actually, approximately;
> the exact number is 2^N * (2^N - 1) / 2, because we're
> talking unordered pairs --- the pair (M2, M1) does not count if we
> already counted the pair (M1, M2), given that the order does not
> make any difference to whether or not there's a collision).

D'oh! Of course! (Note to all true Finns: You may read this as "Of
cause!" if you wish.)

--Mike Amling
From: Unruh on
"Joseph Ashwood" <ashwood(a)msn.com> writes:

>"Sergei" <silentser(a)gmail.com> wrote in message
>news:1165441615.135737.121220(a)f1g2000cwa.googlegroups.com...

>> But if SHA-1 is a uniformly distributed function, how could the Chinese
>> guys claim that it is possible to find a collision in 2^63 hash
>> operations?



>I'll show how by using something we can prove is uniformly distributed, but
>is cryptographically weak enough for the process to be seen: reduction
>modulo 2^64, let's call it F(x). It is trivial to prove that the output
>of F is uniformly distributed across the integers [0,2^64-1]. However, we

Its output is uniformly distributed for any uniformly distributed
input with length greater than 64 bits.
>can also trivially compute x's that collide by simply adding 2^64 to the
>previous x. F is perfectly uniform across all integers x, but is trivially
>weak.

>Uniform distribution among the output set is not enough for cryptographic
>strength.
> Joe
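
To see it in running code (a throwaway Python sketch of that F): the output is
uniform over [0, 2^64-1] whenever the input is uniform over a longer range, yet
collisions cost nothing to produce:

  M = 2 ** 64

  def F(x):
      # reduction modulo 2^64: uniform output for uniform input, but trivially weak
      return x % M

  x = 123456789
  assert F(x) == F(x + M) == F(x + 2 * M)   # collisions by construction, no search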


From: Unruh on
"Sergei" <silentser(a)gmail.com> writes:

>Joseph Ashwood wrote:
>> "Sergei" <silentser(a)gmail.com> wrote in message
>> news:1165441615.135737.121220(a)f1g2000cwa.googlegroups.com...
>>
>> > But if SHA-1 is a uniformly distributed function, how could the Chinese
>> > guys claim that it is possible to find a collision in 2^63 hash
>> > operations?
>>
>> I'll show how by using something we can prove is uniformly distributed, but
>> is cryptographically weak enough for the process to be seen: reduction
>> modulo 2^64, let's call it F(x). It is trivial to prove that the output
>> of F is uniformly distributed across the integers [0,2^64-1]. However, we
>> can also trivially compute x's that collide by simply adding 2^64 to the
>> previous x. F is perfectly uniform across all integers x, but is trivially
>> weak.
>>
>> Uniform distribution among the output set is not enough for cryptographic
>> strength.
>> Joe

>Sorry for another dilettantish question, but what is "a uniformly
>distributed function"? I thought this was simply another name for a
>random oracle function. If this is a function that simply gives
>uniformly distributed outputs when given uniformly distributed inputs
>then, for example, F(x)=x is also uniformly distributed. But what does
>it have to do with the collision probabilities in the case described
>here?

It is precisely relevant. He wants to define collisions as collisions
between different inputs producing the same output. Obviously, if the inputs
are not uniform then neither are the outputs, since the function is
deterministic (the same input always gives the same output).

Then, assuming that the input is uniform, so is the output (or at least that
is the assumption). If that is true, then given 2^x different inputs, the
probability of having a collision is roughly 1-exp(-2^(2x-N-1)) for x<N,
where N is the hash length in bits.
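
In case anyone wants to plug in numbers, a quick Python sketch of that bound
(the inputs 2^35 and 2^80 are only illustrative):

  import math

  def p_collision(x, N):
      # 2^x distinct inputs, N-bit hash: P is about 1 - exp(-2^(2x - N - 1))
      return -math.expm1(-(2.0 ** (2 * x - N - 1)))

  print(p_collision(35, 160))   # about 2^-91, negligible
  print(p_collision(80, 160))   # about 0.39, the birthday threshold for SHA-1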

>Sergei

From: Unruh on
"Sergei" <silentser(a)gmail.com> writes:


>Volker Hetzer wrote:
>> Sergei schrieb:
>> > But if SHA-1 is a uniformly distributed function, how could the Chinese
>> > guys claim that it is possible to find a collision in 2^63 hash
>> > operations?
>> Because it's pseudorandom only. This has nothing to do with the collision
>> probability of two randomly chosen sets of data.
>>
>> Lots of Greetings!
>> Volker
>> --
>> For email replies, please substitute the obvious.

>Sure. But what does random data have to do with "a few dozen __terabytes__
>of data", which should be partitioned into blocks and hashed? As I
>understood it, all the estimations here were made by assuming that the
>data was random. But why do you think this "few dozen __terabytes__
>of data" contains random data?

I didn't at first. I.e., if two of the input blocks are identical, then any
hash gives the same output for both, and you end up with two blocks of input
sharing the same hash. If that counts as a collision, then the distribution
of the input clearly affects the probability of collision.
However, if the question is "given that the input blocks are all different,
what is the probability of having two blocks with the same hash?", then the
answers are reasonable.
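
In code terms the distinction is just this (Python; hashlib's sha1 for
concreteness):

  import hashlib

  a = b"block 1"
  dup = b"block 1"     # duplicate content
  c = b"block 2"       # different content

  # identical blocks always share a digest; that says nothing about SHA-1
  assert hashlib.sha1(a).digest() == hashlib.sha1(dup).digest()
  # distinct blocks sharing a digest would be a genuine collision (here they don't)
  assert hashlib.sha1(a).digest() != hashlib.sha1(c).digest()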

>Sergei