From: J.D. on
On Apr 7, 3:12 pm, "Scott Fluhrer" <sfluh...(a)ix.netcom.com> wrote:
>
> Well, I believe you're getting closer, but you're not quite there yet.
> You're actually answering a different question that just happens to have the
> same answer.

Ah, I love it when that happens.

>
> For what 'expected value' is, seehttp://en.wikipedia.org/wiki/Expected_Value(which explains it better than I
> can hope to, and especially look in the Examples section).
>
> In this case, well, if we call the probability of a single query succeeding
> p=2^-512, and q=1-p, then the probability of us finding a hit (that is,
> something hashed to the value we're looking for) on the k-th query is:
>
>    q^(k-1) * p
>
> We have a factor of q^(k-1) there because if we got a hit before the k'th
> query, we would have stopped then, and so the k'th query would never have
> happened; hence what is neccessary and sufficient is that the first k-1
> queries failed, and the k'th query succeeded
>
> The result we get if the k'th query hit is 'k' (because, in this case, we
> did exactly k queries).
>
> So, by the definition of expected value, this expected value is the infinite
> sum over integer k > 0 (> 0 because we always do at least one query) of:
>
>    k * q^(k-1) * p
>
> This is precisely the infinite sum that unruh gave, and if you evaluate it,
> turns out to be 1/p, which (given our definition of p above) is the value
> 2^512.
>
> --
> poncho

OK, that's clears things up considerably, thanks!
From: unruh on
On 2010-04-07, J.D. <degolyer181(a)yahoo.com> wrote:
> On Apr 7, 2:49?am, unruh <un...(a)wormhole.physics.ubc.ca> wrote:
>> On 2010-04-07, Jonathan Degolyer <degolyer...(a)gmail.com> wrote:
>>
>>
>>
>> > On Apr 6, 10:48?pm, "Scott Fluhrer" <sfluh...(a)ix.netcom.com> wrote:
>> >> "David T. Ashley" <dash...(a)gmail.com> wrote in messagenews:98WdnZxwQIpmOibWnZ2dnUVZ_jSdnZ2d(a)giganews.com...
>>
>> >> > Assume that I have an SHA-512 hash of a file (512 bits). ?Assume that I
>> >> > want to find a second file with the same hash and can't attack the hash
>> >> > function (i.e. I can only repeatedly try new files).
>>
>> >> > This is harder than a birthday attack, right? ?The mathematics of a
>> >> > birthday attack
>>
>> >> >http://en.wikipedia.org/wiki/Birthday_attack
>>
>> >> > are results for finding a collision in a set of attempts. ?Having a single
>> >> > hash already and trying to match it is harder, right (you are looking for
>> >> > a SPECIFIC collision rather than any collision)?
>>
>> >> > And what is the expected value of the number of attempts until a match is
>> >> > found? ?I want to say approximately 2^511, but that would be too simple
>> >> > ....
>>
>> >> Actually, if we model SHA-512 as a random oracle, the expected number of
>> >> attempts is actually 2^512. ?This is because SHA-512 is a function, not a
>> >> permutation, and so finding one value that doesn't hash to the right value
>> >> doesn't make any other value apriori more likely to succeed (as opposed to,
>> >> say, searching for a plaintext block that AES encrypts with an unknown key
>> >> to a specific ciphertext block; we know that we'll come across the right one
>> >> after a maximum of 2^128 attempts, and so any failed attempt makes the next
>> >> attempt more likely to happen to be value you're searching for).
>>
>> >> --
>> >> poncho
>> > @Scott Fluhrer
>>
>> > Well, my calculator doesn't have the sufficient degree of precision
>> > for finding the odds of collisions using 512 bit hash functions, but
>> > for 256 bit hash functions I get a slightly different result from what
>> > you describe:
>>
>> > Assumption: For any randomly selected message, mx, the probability
>> > that h(mx) will collide with a specific 256 bit digest, h(ma), is 1 in
>> > 2^256. ?Additional assumption: the number of possible messages that
>> > can be hashed is infinite, or so gargantuan as to be effectively
>> > infinite, such that failing to produce a collision with one random
>> > message does not increase the collision odds of any other randomly
>> > selected message.
>>
>> > Under these assumptions, the odds that a randomly chosen message will
>> > NOT collide with a specific digest is thus 1-(2^-256). ?And the odds
>> > that _not even one_ out of y many randomly chosen messages will
>> > collide with the target digest is {1-(2^-256)}^y. ?So, if my math is
>> > right (and it's one o'clock in the morning, so that's a pretty big
>> > if), then there is a 63% chance with 2^256 randomly chosen messages
>> > that at least one of them will collide with the target digest. ?With
>> > just 2^255 randomly chosen messages there is only a 39% chance of a
>> > collision. ?So to have a 50% chance of finding a collision with a
>> > specific digest you need to hash slightly less than 2^256 random
>> > messages (approximately 2^255.5). ?Presumably there is a similar
>> > result for 512-bit hash functions, though I cannot confirm that with
>> > my crappy equipment. ?Also I may be completely off in my assumptions
>> > and/or calculations -- it wouldn't be the first time, and certainly
>> > won't be the last.
>>
>> The question was, what is the mean number of attempts you would have to
>> do to find a collision.
>>
>> While it is true that the probability of finding a collision by 2^512
>> tries is 2/e=.7, that was not the question.
>>
>
> OK, I guess I misunderstood the question...
>
>> As you say the probability of not having the first N not collide is
>> (1-1/2^512)^N, but the probability that one will then get a collision is
>> 1/2^512, so the probability of not colliding in N-1 attempts and then
>> colliding on the Nth is [(1-1/2^512)^(N-1)]/2^512. Thus the mean number is
>> SUM_N N (1-1/2^512)^(N-1) 1/2^512 =SUM_N -(d/dx) [(1-x/2^512)]^N|_(x=1) =
>> -d/dx (1/(1-(1-x/2^512)))|_(x=1) =-d/dx(2^512/x)|_(x=1) ?=2^512
>
> I am having a hard time following your calculations (I am not used to
> mathematical notation expressed in ASCII). However, are you
> essentially saying that if we think of a brute force preimage attack
> on a hash function as conducting a series of independent Bernoulli
> trials, then in order to have an expected value of 1 (i.e. np = 1,
> where n = no. of trials and p = probability of success for each trial)
> then if p = 2^-512 then n must = 2^512. It that correct? Or am I off
> on the wrong track again?

The expected number of trials is the average number of trials which is
The sum of the number of trials to success times the probability of that
that many trials will lead to success, summed over all possiblities.
It is not a question of "how many trials befor you have 50% prob of
success or any other question like that. Ie, on average, if you run a
HUGE number of attempts, how many trials do you have to carry out before
you find the answer.

The probability of getting on the nth trial is the probability of not
getting it on the pervious n-1 trials times the probability of
succeeding on the nth trial. Since the probability of success is assumed
constant (2^(-512)) on each trial, the probability of failure is
1-2^(-512) in each trial, and the probability of failure on all n-1
trials is
(1-2^(-512))^(n-1) The probability of success on the nth trial is
2^(-512). Thus the probability of failure on the first n-1 and success
on the nth is (1-2^(-512))^(n-1) 2^(-512).
etc.

From: Noob on
Scott Fluhrer wrote:
> "J.D."<degolyer181(a)yahoo.com> wrote in message
> news:3a2ed776-bd41-4cbd-b010-2d3affe00f3e(a)20g2000vbr.googlegroups.com...
> On Apr 7, 2:49 am, unruh<un...(a)wormhole.physics.ubc.ca> wrote:
>
>>> As you say the probability of not having the first N not collide is
>>> (1-1/2^512)^N, but the probability that one will then get a collision is
>>> 1/2^512, so the probability of not colliding in N-1 attempts and then
>>> colliding on the Nth is [(1-1/2^512)^(N-1)]/2^512. Thus the mean number
>>> is
>>> SUM_N N (1-1/2^512)^(N-1) 1/2^512 =SUM_N -(d/dx) [(1-x/2^512)]^N|_(x=1) =
>>> -d/dx (1/(1-(1-x/2^512)))|_(x=1) =-d/dx(2^512/x)|_(x=1) =2^512
>>
>> I am having a hard time following your calculations (I am not used to
>> mathematical notation expressed in ASCII). However, are you
>> essentially saying that if we think of a brute force preimage attack
>> on a hash function as conducting a series of independent Bernoulli
>> trials, then in order to have an expected value of 1 (i.e. np = 1,
>> where n = no. of trials and p = probability of success for each trial)
>> then if p = 2^-512 then n must = 2^512. It that correct? Or am I off
>> on the wrong track again?
>
> Well, I believe you're getting closer, but you're not quite there yet.
> You're actually answering a different question that just happens to have the
> same answer.
>
> For what 'expected value' is, see
> http://en.wikipedia.org/wiki/Expected_Value (which explains it better than I
> can hope to, and especially look in the Examples section).
>
> In this case, well, if we call the probability of a single query succeeding
> p=2^-512, and q=1-p, then the probability of us finding a hit (that is,
> something hashed to the value we're looking for) on the k-th query is:
>
> q^(k-1) * p
>
> We have a factor of q^(k-1) there because if we got a hit before the k'th
> query, we would have stopped then, and so the k'th query would never have
> happened; hence what is neccessary and sufficient is that the first k-1
> queries failed, and the k'th query succeeded
>
> The result we get if the k'th query hit is 'k' (because, in this case, we
> did exactly k queries).
>
> So, by the definition of expected value, this expected value is the infinite
> sum over integer k> 0 (> 0 because we always do at least one query) of:
>
> k * q^(k-1) * p
>
> This is precisely the infinite sum that unruh gave, and if you evaluate it,
> turns out to be 1/p, which (given our definition of p above) is the value
> 2^512.

Assuming the probability that a given string will hash to the
candidate hash is p=1/2^512

If one considers N strings, the probability that none will hash to the
candidate hash is (1-p)^N
(Independence and all that good stuff)

Therefore, the probability that AT LEAST ONE string hashes to the
candidate hash is PP = 1 - (1-p)^N

As far as I can tell, the question asked is: How many strings must one
consider before it is likely one finds at least one collision.

Taking "likely" to mean proba > 50%

EQ <=> 1 - (1-p)^N > 1/2 <=> (1-p)^N < 1/2

when p is small (which is the case) (1-p)^N ~ 1/e^Np

EQ <=> e^Np > 2 <=> Np > ln(2) <=> N > ln(2)/p

Thus, it would be likely to find at least one collision
after 0.693 x 2^512 strings.

Or did I miss something?

Regards.
From: Scott Fluhrer on

"Noob" <root(a)127.0.0.1> wrote in message
news:hpkl7e$tg2$1(a)speranza.aioe.org...
> Scott Fluhrer wrote:
>> "J.D."<degolyer181(a)yahoo.com> wrote in message
>> news:3a2ed776-bd41-4cbd-b010-2d3affe00f3e(a)20g2000vbr.googlegroups.com...
>> On Apr 7, 2:49 am, unruh<un...(a)wormhole.physics.ubc.ca> wrote:
>>
>>>> As you say the probability of not having the first N not collide is
>>>> (1-1/2^512)^N, but the probability that one will then get a collision
>>>> is
>>>> 1/2^512, so the probability of not colliding in N-1 attempts and then
>>>> colliding on the Nth is [(1-1/2^512)^(N-1)]/2^512. Thus the mean number
>>>> is
>>>> SUM_N N (1-1/2^512)^(N-1) 1/2^512 =SUM_N -(d/dx) [(1-x/2^512)]^N|_(x=1)
>>>> =
>>>> -d/dx (1/(1-(1-x/2^512)))|_(x=1) =-d/dx(2^512/x)|_(x=1) =2^512
>>>
>>> I am having a hard time following your calculations (I am not used to
>>> mathematical notation expressed in ASCII). However, are you
>>> essentially saying that if we think of a brute force preimage attack
>>> on a hash function as conducting a series of independent Bernoulli
>>> trials, then in order to have an expected value of 1 (i.e. np = 1,
>>> where n = no. of trials and p = probability of success for each trial)
>>> then if p = 2^-512 then n must = 2^512. It that correct? Or am I off
>>> on the wrong track again?
>>
>> Well, I believe you're getting closer, but you're not quite there yet.
>> You're actually answering a different question that just happens to have
>> the
>> same answer.
>>
>> For what 'expected value' is, see
>> http://en.wikipedia.org/wiki/Expected_Value (which explains it better
>> than I
>> can hope to, and especially look in the Examples section).
>>
>> In this case, well, if we call the probability of a single query
>> succeeding
>> p=2^-512, and q=1-p, then the probability of us finding a hit (that is,
>> something hashed to the value we're looking for) on the k-th query is:
>>
>> q^(k-1) * p
>>
>> We have a factor of q^(k-1) there because if we got a hit before the k'th
>> query, we would have stopped then, and so the k'th query would never have
>> happened; hence what is neccessary and sufficient is that the first k-1
>> queries failed, and the k'th query succeeded
>>
>> The result we get if the k'th query hit is 'k' (because, in this case, we
>> did exactly k queries).
>>
>> So, by the definition of expected value, this expected value is the
>> infinite
>> sum over integer k> 0 (> 0 because we always do at least one query) of:
>>
>> k * q^(k-1) * p
>>
>> This is precisely the infinite sum that unruh gave, and if you evaluate
>> it,
>> turns out to be 1/p, which (given our definition of p above) is the value
>> 2^512.
>
> Assuming the probability that a given string will hash to the
> candidate hash is p=1/2^512
>
> If one considers N strings, the probability that none will hash to the
> candidate hash is (1-p)^N
> (Independence and all that good stuff)

Yup, and that's why I invoked the random oracle model; in that model, all
distinct queries are defined to be independent (which, strictly speaking, is
something we don't know about SHA-512, but is plausible).

>
> Therefore, the probability that AT LEAST ONE string hashes to the
> candidate hash is PP = 1 - (1-p)^N
>
> As far as I can tell, the question asked is: How many strings must one
> consider before it is likely one finds at least one collision.

Well, no, the OP asked "what is the expected value of the number of attempts
until a match is found". "Expected number" is not "how many you have to do
before you get to 50%". To put it in statistics terms, the expected value
is the mean number of attempts, not the median.

--
poncho


From: Sal on
On Apr 8, 9:23 am, "Scott Fluhrer" <sfluh...(a)ix.netcom.com> wrote:
> "Noob" <r...(a)127.0.0.1> wrote in message
>
> news:hpkl7e$tg2$1(a)speranza.aioe.org...
>
>
>
> > Scott Fluhrer wrote:
> >> "J.D."<degolyer...(a)yahoo.com>  wrote in message
> >>news:3a2ed776-bd41-4cbd-b010-2d3affe00f3e(a)20g2000vbr.googlegroups.com....
> >> On Apr 7, 2:49 am, unruh<un...(a)wormhole.physics.ubc.ca>  wrote:
>
> >>>> As you say the probability of not having the first N not collide is
> >>>> (1-1/2^512)^N, but the probability that one will then get a collision
> >>>> is
> >>>> 1/2^512, so the probability of not colliding in N-1 attempts and then
> >>>> colliding on the Nth is [(1-1/2^512)^(N-1)]/2^512. Thus the mean number
> >>>> is
> >>>> SUM_N N (1-1/2^512)^(N-1) 1/2^512 =SUM_N -(d/dx) [(1-x/2^512)]^N|_(x=1)
> >>>> =
> >>>> -d/dx (1/(1-(1-x/2^512)))|_(x=1) =-d/dx(2^512/x)|_(x=1) =2^512
>
> >>> I am having a hard time following your calculations (I am not used to
> >>> mathematical notation expressed in ASCII).  However, are you
> >>> essentially saying that if we think of a brute force preimage attack
> >>> on a hash function as conducting a series of independent Bernoulli
> >>> trials, then in order to have an expected value of 1 (i.e. np = 1,
> >>> where n = no. of trials and p = probability of success for each trial)
> >>> then if p = 2^-512 then n must = 2^512.  It that correct?  Or am I off
> >>> on the wrong track again?
>
> >> Well, I believe you're getting closer, but you're not quite there yet.
> >> You're actually answering a different question that just happens to have
> >> the
> >> same answer.
>
> >> For what 'expected value' is, see
> >>http://en.wikipedia.org/wiki/Expected_Value(which explains it better
> >> than I
> >> can hope to, and especially look in the Examples section).
>
> >> In this case, well, if we call the probability of a single query
> >> succeeding
> >> p=2^-512, and q=1-p, then the probability of us finding a hit (that is,
> >> something hashed to the value we're looking for) on the k-th query is:
>
> >>     q^(k-1) * p
>
> >> We have a factor of q^(k-1) there because if we got a hit before the k'th
> >> query, we would have stopped then, and so the k'th query would never have
> >> happened; hence what is neccessary and sufficient is that the first k-1
> >> queries failed, and the k'th query succeeded
>
> >> The result we get if the k'th query hit is 'k' (because, in this case, we
> >> did exactly k queries).
>
> >> So, by the definition of expected value, this expected value is the
> >> infinite
> >> sum over integer k>  0 (>  0 because we always do at least one query) of:
>
> >>     k * q^(k-1) * p
>
> >> This is precisely the infinite sum that unruh gave, and if you evaluate
> >> it,
> >> turns out to be 1/p, which (given our definition of p above) is the value
> >> 2^512.
>
> > Assuming the probability that a given string will hash to the
> > candidate hash is p=1/2^512
>
> > If one considers N strings, the probability that none will hash to the
> > candidate hash is (1-p)^N
> > (Independence and all that good stuff)
>
> Yup, and that's why I invoked the random oracle model; in that model, all
> distinct queries are defined to be independent (which, strictly speaking, is
> something we don't know about SHA-512, but is plausible).
>
>
>
> > Therefore, the probability that AT LEAST ONE string hashes to the
> > candidate hash is PP = 1 - (1-p)^N
>
> > As far as I can tell, the question asked is: How many strings must one
> > consider before it is likely one finds at least one collision.
>
> Well, no, the OP asked "what is the expected value of the number of attempts
> until a match is found".  "Expected number" is not "how many you have to do
> before you get to 50%".  To put it in statistics terms, the expected value
> is the mean number of attempts, not the median.
>
> --
> poncho

Can't compute it, but for accuracy isn't this the REAL solution? Find
N where:

((2^512)-1)/2^512 * ((2^512)-2)/2^512 * ((2^512)-3)/2^512 * ... *
((2^512)-N)/2^512 ~ 50%