From: Francois Grieu on
Make my last post end in:

I fear that the security argument vanishes if instead of an RSA modulus
of unknown factorization we use a public randomly-seeded prime p with
(p-1)/2 prime.

This has the advantage that we can work on the finite set:
E = [0..p-2] x [1..p-1] with function F: E x E -> E
F( (j,x), (k,y) ) = ( (j+k)%(p-1), x*(y^^((3^^j)%(p-1)))%p )
which has much lower asymptotic cost O(Log(p)^^s) with s<3.

But on the other hand, disaster is likely to creep in. Something along
these lines: if we consider messages formed by pasting two different
blocks (or, equivalently, blocks of one bit), messages of the same length
will have the same hash if they lead to integers equal mod (p-1) when
considering the message as a base-3 radix representation with digit 0
for one block and 1 for the other, least significant digit first.
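For concreteness, here is a toy sketch of F in Python; the prime is a tiny illustrative choice, far below any secure size. It checks that F is associative, as long as the block counts in the first component stay well below p-1 so that no wraparound occurs:

```python
# Toy parameters only: p is a tiny safe prime, (p-1)/2 = 53 is prime.
p = 107

def F(a, b):
    j, x = a
    k, y = b
    # reducing y's exponent mod (p-1) is valid since p is prime (Fermat)
    return ((j + k) % (p - 1), x * pow(y, pow(3, j, p - 1), p) % p)

# associativity check on arbitrary elements with small first components
a, b, c = (1, 5), (1, 17), (1, 42)
assert F(F(a, b), c) == F(a, F(b, c))
```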



In relation to the original post: unless I err, what we are
discussing is not an associative one-way function in (at least the
wording of) the definition of Rabi and Sherman. It is a one-way function
with the property that its result for a message can be computed by
combining its results for sub-messages using another function that is
associative.

Francois Grieu
From: Maaartin on
On Apr 17, 12:50 pm, Francois Grieu <fgr...(a)gmail.com> wrote:
> Make my last post end in:
>
> I fear that the security argument vanishes if instead of an RSA modulus
> of unknown factorization we use a public randomly-seeded prime p with
> (p-1)/2 prime.
>
> This has the advantage that we can work on the finite set:
> E = [0..p-2] x [1..p-1]

This is IMHO not very important, as every message you'll ever
encounter is surely much shorter than 2^^128 blocks,
and n must be much larger, so you can always limit yourself to
[0..2^^128-1] x [1..n-1].

> with function F: E x E -> E
> F( (j,x), (k,y) ) = ( (j+k)%(p-1), x*(y^^((3^^j)%(p-1)))%p )
> which has much lower asymptotic cost O(Log(p)^^s) with s<3.

When I swap the arguments of F I get
H(a1, a2, ..., ak) = (k, a1^^(3^^(k-1)) * a2^^(3^^(k-2)) * ... *
ak^^(3^^0) % n)
which can be computed quite efficiently, recursively, starting from the
left
(the original expression can be computed starting from the right,
which is not so convenient).
Sure, the general computation is more time consuming, but this may or
may not be a problem.
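A toy sketch of this evaluation (tiny prime standing in for n, arbitrary block values; nothing here is a secure parameter). Folding the original, unswapped F from the left reproduces the closed-form product, with the exponents 3^0, 3^1, ... attached from the first block on:

```python
# Toy modulus: a tiny safe prime standing in for n, (n-1)/2 = 53 is prime.
n = 107

def F(a, b):
    j, x = a
    k, y = b
    return ((j + k) % (n - 1), x * pow(y, pow(3, j, n - 1), n) % n)

def H(blocks):
    """Fold F over the per-block hashes (1, a_i), left to right."""
    acc = (0, 1)  # (0, 1) acts as the identity for F
    for a in blocks:
        acc = F(acc, (1, a))
    return acc

# Closed form for the original (unswapped) F:
#   H(a1..ak) = (k, a1^(3^0) * a2^(3^1) * ... * ak^(3^(k-1)) mod n)
blocks = [5, 17, 42, 9]
direct = 1
for i, a in enumerate(blocks):
    direct = direct * pow(a, pow(3, i, n - 1), n) % n
assert H(blocks) == (len(blocks), direct)
```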

What is a problem (at least in my original use case which I didn't
describe yet) is the large size of the hash.

> But on the other hand, disaster is likely to creep in. Like something in
> the tune of: if we consider messages formed by pasting two different
> blocks (or equivalently blocks of one bit), messages of the same length
> will have the same hash if they lead to integers equal mod (p-1) when
> considering the message as a base-3 radix representation with digit 0
> for one block and 1 for the other, and least significant digit first.

I can't follow this. The first component is the length of the message
and thus can never reach p, since p needs to be huge.
Even when ignoring other reasons, p must have at least 512 bits in
order for a SHA-512 output to fit in.
I don't see why you use base 3 here.

> In relation to the original post: unless I err, what we are
> discussing is not an associative one-way function in (at least the
> wording of) the definition of Rabi and Sherman.
> It is a one-way function
> with the property that its result for a message can be computed by
> combining its results for sub-messages using another function that is
> associative.

You mean that my associativity is quite strange, right? But it is not:
the function H is associative in the normal sense
H(a, H(b, c)) = H(H(a, b), c)
for every a, b, c where it is defined, i.e., messages whose length is a
multiple of the blocksize.
The definition of H can't be extended to arbitrary-length messages
without losing associativity.
From: Francois Grieu on
On 17/04/2010 14:23, Maaartin wrote:
> On Apr 17, 12:50 pm, Francois Grieu<fgr...(a)gmail.com> wrote:
>> Make my last post end in:
>>
>> I fear that the security argument vanishes if instead of an RSA modulus
>> of unknown factorization we use a public randomly-seeded prime p with
>> (p-1)/2 prime.
>>
>> This has the advantage that we can work on the finite set:
>> E = [0..p-2] x [1..p-1]
>
> This is IMHO not very important, as every message you'll ever
> encounter is surely much shorter than 2^^128
> and n must be much larger, so you can always limit yourself to
> [0..2^^128-1] x [1..n-1].
>
>> with function F: E x E -> E
>> F( (j,x), (k,y) ) = ( (j+k)%(p-1), x*(y^^((3^^j)%(p-1)))%p )
>> which has much lower asymptotic cost O(Log(p)^^s) with s<3.
>
> When I swap the arguments of F I get
> H(a1, a2, ..., ak) = (k, a1^^(3^^(k-1)) * a2^^(3^^(k-2)) * ... *
> ak^^(3^^0) % n)
> which can be computed quite efficiently, recursively, starting from the
> left
> (the original expression can be computed starting from the right,
> which is not so convenient).

Yes. What I was aiming at is a way of making the expression computable
more efficiently, in any order, for long messages. In the RSA variant
with unknown factorization of n, for message size z blocks and the worst
(or even random) evaluation order, the overall computational cost of the
hash grows as O(z^^2); my variant with public modulus p instead of an
RSA modulus n changes that to O(z).

Notice this same speedup is also achievable in the RSA case but only for
someone knowing the factorization of n, since
(y^^(3^^k))%n = (y^^((3^^k)%LCM(p-1,q-1)))%n.
This might be usable if the final hash is computed by a trusted party.
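This shortcut is easy to check numerically. A toy sketch with tiny, hypothetical factors p and q (far too small for real use): the tower exponent 3^k is reduced mod LCM(p-1,q-1), i.e. mod Carmichael's lambda(n), which is valid whenever y is coprime to n:

```python
from math import lcm

# Hypothetical toy factors; n = p*q is far too small for any real use.
p, q = 1019, 1187
n = p * q
lam = lcm(p - 1, q - 1)      # Carmichael's lambda(n)
y, k = 123456, 1000          # arbitrary base (coprime to n) and tower height

slow = pow(y, 3 ** k, n)             # exponent 3^k computed in full
fast = pow(y, pow(3, k, lam), n)     # exponent reduced mod lam first
assert slow == fast
```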

[In the rest of this post I use n for the modulus used for the "main"
computation, regardless of whether it is composite (original scheme) or
prime (variant).]

> Sure, the general computation is more time consuming, but this may or
> may not be a problem.
>
> What is a problem (at least in my original use case which I didn't
> describe yet) is the large size of the hash.
>
>> But on the other hand, disaster is likely to creep in. Like something in
>> the tune of: if we consider messages formed by pasting two different
>> blocks (or equivalently blocks of one bit), messages of the same length
>> will have the same hash if they lead to integers equal mod (p-1) when
>> considering the message as a base-3 radix representation with digit 0
>> for one block and 1 for the other, and least significant digit first.
>
> I can't follow this. The first component is the length of the message
> and thus can never reach p since p needs to be huge.

Indeed.

> Even when ignoring other reasons, p must have at least 512 bits in
> order for the SHA-512 to fit in.

Yes. But my idea for collisions (in the case where the factorization of
the public modulus n is known, including but not limited to the modulus
being a public prime p) has nothing to do with the number of blocks
becoming anywhere close to n (or p).

> I don't see why you use base 3 here.

3 comes from the public exponent used.

Consider the two message blocks consisting of all zero and all one bits.
Say their hashes are (1,h0) and (1,h1). We'll be looking for collisions
in the set of messages of exactly z blocks chosen from these two,
for some plausible z (say ten thousand).

This set Mz has exactly 2^^z messages. The all-zero message of z blocks
has hash (z, h0z) with
h0z = h0^^(1+3+3^^2+...+3^^(z-1)) % n

Most probably h0 is co-prime with n and we can find g such that
h1 = h0*g % n.

If a message M of Mz reduces to some bit-string (of length z) when each
of its blocks is replaced by 0 or 1, then this bit-string is also the
base-3 representation (in the appropriate order) of an integer f such that
H(M) = (z, h0z*(g^^(3^^f))%n)
= (z, h0z*(g^^(3^^(f%LCM(p-1,q-1))))%n)

Now the problem of finding a non-zero message in Mz colliding with the
all-zero message is reduced to that of finding a string of z bits that
is also the base-3 representation (in the appropriate direction) of a
positive multiple of LCM(p-1,q-1). This is not a hard problem when
LCM(p-1,q-1) is known and z can be a few times the bit size of
LCM(p-1,q-1). The problem of finding colliding messages is even easier.
Note: just change LCM(p-1,q-1) to (p-1) if n is a prime p.
Note: I am not optimistic that replacing 3 with a bigger value would be
enough to get rid of the requirement that the factorization of n is unknown.
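The whole construction can be played out at toy scale. The sketch below (tiny prime p, two arbitrary stand-in block values) brute-forces two distinct 0/1 strings of length z whose base-3 values, least significant digit first, are congruent mod (p-1); the two corresponding messages then collide under H:

```python
# Toy demonstration only: p is a tiny safe prime, (p-1)/2 = 53 is prime.
p = 107
B0, B1 = 5, 12               # arbitrary stand-ins for the two message blocks

def H(blocks):               # H(a1..ak) = (k, prod ai^(3^(i-1)) mod p)
    acc = 1
    for i, a in enumerate(blocks):
        acc = acc * pow(a, pow(3, i, p - 1), p) % p
    return (len(blocks), acc)

z = 8                        # 2^z = 256 > p-1 = 106, so a collision must exist
buckets = {}
collision = None
for bits in range(2 ** z):
    digits = [(bits >> i) & 1 for i in range(z)]
    f = sum(d * 3 ** i for i, d in enumerate(digits))  # base 3, LSB first
    key = f % (p - 1)
    if key in buckets:
        collision = (buckets[key], digits)
        break
    buckets[key] = digits

d1, d2 = collision
m1 = [B1 if d else B0 for d in d1]
m2 = [B1 if d else B0 for d in d2]
assert m1 != m2 and H(m1) == H(m2)
```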


>> In relation to the original post: unless I err, what we are
>> discussing is not an associative one-way function in (at least the
>> wording of) the definition of Rabi and Sherman.
>> It is a one-way function
>> with the property that its result for a message can be computed by
>> combining its results for sub-messages using another function that is
>> associative.
>
> You mean that my associativity is quite strange, right?

Yes that's what I mean.

> But it is not,
> the function H is associative in the normal sense
> H(a, H(b, c)) = H(H(a, b), c)
> for every a, b, c where it is defined, i.e., messages of length being
> multiple of the blocksize.

I maintain that your hash is not associative in the standard
mathematical sense H(a,H(b,c)) = H(H(a,b),c) for all a b c [restricted
to whole blocks].
Or with the notation H(a,b) = a*b, in the sense a*(b*c) = (a*b)*c.

Notice that in H(a,H(b,c)) the right argument of the left function H is
a hash, rather than a message. In the other notation, * is a hash
function, not concatenation of messages (which I'll write ## in the
following).

In other words, the problem is that in H(a,H(b,c)), block c goes through
two hashes, when it goes through only one in H(H(a,b),c).

Your definition of associativity of H seems to be: there exists an
associative F such that
H(a##b) = F(H(a),H(b)) for all messages a, b restricted to whole blocks.
If this holds, it follows that
H(a##b##c) = F(H(a##b),H(c)) = F(H(a),H(b##c)) for all a b c restricted
to whole blocks,
but not that H(a##H(b##c)) = H(H(a##b)##c).
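A toy check of the distinction (tiny prime, arbitrary block values): H(a##b) = F(H(a),H(b)) holds and F is associative, while H itself maps messages to hashes, so H(a,H(b,c)) does not even typecheck:

```python
# Toy parameters only: p is a tiny safe prime, (p-1)/2 = 53 is prime.
p = 107

def F(a, b):  # the associative combiner; its arguments are hashes (length, value)
    j, x = a
    k, y = b
    return ((j + k) % (p - 1), x * pow(y, pow(3, j, p - 1), p) % p)

def H(blocks):  # the hash; its argument is a message (a list of blocks)
    acc = (0, 1)
    for a in blocks:
        acc = F(acc, (1, a))
    return acc

a, b, c = [5, 17], [42], [9, 33, 2]
assert H(a + b) == F(H(a), H(b))                         # H(a##b) = F(H(a),H(b))
assert F(F(H(a), H(b)), H(c)) == F(H(a), F(H(b), H(c)))  # F is associative
assert H(a + b + c) == F(H(a + b), H(c)) == F(H(a), F(H(b), H(c)))
# but H(a, H(b, c)) mixes a message with a hash and is not defined at all
```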


> The definition of H can't be extended to arbitrary length messages
> without losing associativity.

Right, but the problem of achieving associativity in the usual
mathematical sense (and, as far as I can tell, in the sense of the
definition given by Rabi and Sherman [*]) seems worse. The issue is not
only about block size, which I think could be solved by using one-bit
blocks.

Francois Grieu


[*] Muhammad Rabi, Alan T. Sherman:
Associative one-way functions: A new paradigm for secret-key agreement
and digital signatures (1993)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.6837
http://www.csee.umbc.edu/~sherman/Papers/aowf_tr.ps
An observation on associative one-way functions in complexity theory
(Information Processing Letters, 1997)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.6323
http://www.csee.umbc.edu/~sherman/Papers/aowf_ipl.ps
From: Maaartin on
On Apr 17, 5:23 pm, Francois Grieu <fgr...(a)gmail.com> wrote:
> Yes. What I was aiming at is a way of making the expression computable
> more efficiently, in any order, for long messages. In the RSA variant
> with unknown factorization of n, for message size z blocks and the worst
> (or even random) evaluation order, the overall computational cost of the
> hash grows as O(z^^2); my variant with public modulus p instead of an
> RSA modulus n changes that to O(z).

Ok.

> Notice this same speedup is also achievable in the RSA case but only for
> someone knowing the factorization of n, since
> (y^^(3^^k))%n = (y^^((3^^k)%LCM(p-1,q-1)))%n.
> This might be usable if the final hash is computed by a trusted party.

OT: Is there really something like a trusted third party here? In the
case of PKI we can trust the CA somewhat, since certifying a bad guy
could get detected and have consequences. Here, keeping (p, q) forever
or giving it to the government or to whoever pays enough wouldn't be
detected until they get used in a stupid way. Using a third party here
needs as much trust as communicating via a third party (A wants to
send a message to B; A encrypts it for C, C decrypts it and encrypts
it for B).

> Yes. But my idea for collisions (in the case where the factorization of
> the public modulus n is known, including but not limited to the modulus
> being a public prime p) has nothing to do with the number of blocks
> becoming anywhere close to n (or p).
>
> > I don't see why you use base 3 here.
>
> 3 comes from the public exponent used.
>
> Consider the two message blocks consisting of all zero and all one bits.
> Say their hashes are (1,h0) and (1,h1). We'll be looking for collisions
> in the set of the messages of exactly z blocks chosen from these two,
> for some plausible z (say ten thousand).
>
> This set Mz has exactly 2^^z messages. The all-zero message of z blocks
> has hash (z, h0z) with
> h0z = h0^^(1+3+3^^2+...+3^^(z-1)) % n
>
> Most probably h0 is co-prime with n and we can find g such that
> h1 = h0*g % n.
>
> If a message M of Mz reduces to some bit-string (of length z) when each
> of its blocks is replaced by 0 or 1, then this bit-string is also the
> base-3 representation (in the appropriate order) of an integer f such that
> H(M) = (z, h0z*(g^^(3^^f))%n)
> = (z, h0z*(g^^(3^^(f%LCM(p-1,q-1))))%n)
>
> Now the problem of finding a non-zero message in Mz colliding with the
> all-zero message is reduced to that of finding a string of z bits that
> is also the base-3 representation (in the appropriate direction) of a
> positive multiple of LCM(p-1,q-1).
> This is not a hard problem when
> LCM(p-1,q-1) is known and z can be a few times the bit size of
> LCM(p-1,q-1). The problem of finding colliding messages is even easier.

Nice explanation, but maybe I'm still missing something. For one
collision, you only need to get f2=f1+LCM(p-1,q-1) and express it in
ternary, right? Or any multiple of the LCM for additional collisions.

> Note: just change LCM(p-1,q-1) to (p-1) if n is a prime p.
> Note: I am not optimistic that replacing 3 by a bigger value might be
> enough to get rid of the requirement that the factorization of n is unknown.

IIUYC, it can't help; you only need to express f in a different base.

> I maintain that your hash is not associative in the standard
> mathematical sense H(a,H(b,c)) = H(H(a,b),c) for all a b c [restricted
> to whole blocks].
> Or with the notation H(a,b) = a*b, in the sense a*(b*c) = (a*b)*c.
>
> Notice that in H(a,H(b,c)) the right argument of the left function H is
> a hash, rather than a message.

You're right and I was blind. Let's call it pseudoassociativity.

> Right, but the problem of achieving associativity in the usual
> mathematical sense, and as far as I can get it in the definition given
> by Rabi and Sherman [*] seems worse.

It must be worse, since the existence of such a function depends on
P != NP. But for what I'm thinking about, the pseudoassociativity
suffices.

> The issue is not only about block
> size, which I think could be solved by using one-bit blocks.

In the meantime I've found an interesting pseudoassociative hash:
Zemor-Tillich, which uses one-bit blocks and works in SL(2, GF(2**n)),
i.e., it uses 2x2 matrices with determinant 1.
The original paper "Hashing with SL2" has fallen prey to Springer, but
there are some papers here:
http://www.dice.ucl.ac.be/~petit/index2.html
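For reference, a toy sketch of a Zemor-Tillich-style hash in SL(2, GF(2^8)); real proposals use GF(2^n) with n well above 100, and the field, generators, and sizes below are illustrative assumptions only. The pseudoassociativity H(a ## b) = H(a) * H(b) already shows at this size:

```python
# Toy field GF(2^8) with the reduction polynomial x^8+x^4+x^3+x+1 (the AES field).
MOD = 0x11B

def gf_mul(a, b):
    """Carryless multiplication in GF(2^8), reduced mod MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= MOD
    return r

def mat_mul(A, B):
    """2x2 matrix product over GF(2^8); addition in the field is XOR."""
    return tuple(
        tuple(gf_mul(A[i][0], B[0][j]) ^ gf_mul(A[i][1], B[1][j]) for j in (0, 1))
        for i in (0, 1)
    )

# Two generators of determinant 1: A0 = [[x,1],[1,0]], A1 = [[x,x+1],[1,1]],
# with the field element x encoded as the integer 2.
A = (((2, 1), (1, 0)), ((2, 3), (1, 1)))
I = ((1, 0), (0, 1))

def H(bits):
    """Hash a bit string by multiplying one generator per bit."""
    acc = I
    for b in bits:
        acc = mat_mul(acc, A[b])
    return acc

a, b = [0, 1, 1, 0, 1], [1, 0, 0]
assert H(a + b) == mat_mul(H(a), H(b))  # hash of concatenation = product of hashes
```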
From: Francois Grieu on
On 17/04/2010 19:04, Maaartin wrote:
> On Apr 17, 5:23 pm, Francois Grieu<fgr...(a)gmail.com> wrote:
>> Yes. What I was aiming at is a way of making the expression computable
>> more efficiently, in any order, for long messages. In the RSA variant
>> with unknown factorization of n, for message size z blocks and the worst
>> (or even random) evaluation order, the overall computational cost of the
>> hash grows as O(z^^2); my variant with public modulus p instead of an
>> RSA modulus n changes that to O(z).
>
> Ok.
>
>> Notice this same speedup is also achievable in the RSA case but only for
>> someone knowing the factorization of n, since
>> (y^^(3^^k))%n = (y^^((3^^k)%LCM(p-1,q-1)))%n.
>> This might be usable if the final hash is computed by a trusted party.
>
> OT: Is there really something like a trusted third party here? In case
> of PKI we can trust the CA somehow, since certifying a bad guy could
> get detected and have consequences. Here, keeping (p, q) forever or
> giving it to the goverment or to whoever pays enough wouldn't be
> detected until they get used in a stupid way. Using a third party here
> needs as much trust as communicating via a third party (A wants to
> sends a message to B; A encrypts it for C, C decrypts it and encrypts
> it for B).

Yes.

>> My idea for collisions (in the case where the factorization of
>> the public modulus n is known, including but not limited to the modulus
>> being a public prime p) has nothing to do with the number of blocks
>> becoming anywhere close to n (or p).
>>
>>> I don't see why you use base 3 here.
>>
>> 3 comes from the public exponent used.
>>
>> Consider the two message blocks consisting of all zero and all one bits.
>> Say their hashes are (1,h0) and (1,h1). We'll be looking for collisions
>> in the set of the messages of exactly z blocks chosen from these two,
>> for some plausible z (say ten thousand).
>>
>> This set Mz has exactly 2^^z messages. The all-zero message of z blocks
>> has hash (z, h0z) with
>> h0z = h0^^(1+3+3^^2+...+3^^(z-1)) % n
>>
>> Most probably h0 is co-prime with n and we can find g such that
>> h1 = h0*g % n.
>>
>> If a message M of Mz reduces to some bit-string (of length z) when each
>> of its blocks is replaced by 0 or 1, then this bit-string is also the
>> base-3 representation (in the appropriate order) of an integer f such that
>> H(M) = (z, h0z*(g^^(3^^f))%n)
>> = (z, h0z*(g^^(3^^(f%LCM(p-1,q-1))))%n)
>>
>> Now the problem of finding a non-zero message in Mz colliding with the
>> all-zero message is reduced to that of finding a string of z bits that
>> is also the base-3 representation (in the appropriate direction) of a
>> positive multiple of LCM(p-1,q-1).
>> This is not a hard problem when
>> LCM(p-1,q-1) is known and z can be a few times the bit size of
>> LCM(p-1,q-1). The problem of finding colliding messages is even easier.
>
> Nice explanation, but maybe I'm still missing something. For one
> collision, you only need to get f2=f1+LCM(p-1,q-1) and express it in
> ternary, right? Or any multiple of the LCM for additional collisions.

Unless I err, finding collisions is not quite as easy, because f1 and f2
are bound to have only 0 and 1 in their ternary expansions, and it is
extremely unlikely that f2 thus constructed will meet this condition.

>> Note: just change LCM(p-1,q-1) to (p-1) if n is a prime p.
>> Note: I am not optimistic that replacing 3 by a bigger value might be
>> enough to get rid of the requirement that the factorization of n is unknown.
>
> IIUYC, it can't help; you only need to express f in a different base.

I'd say that it can at best increase slightly the minimum z for the
attack to work and/or the memory resources needed by the attacker.

>
>> I maintain that your hash is not associative in the standard
>> mathematical sense H(a,H(b,c)) = H(H(a,b),c) for all a b c [restricted
>> to whole blocks].
>> Or with the notation H(a,b) = a*b, in the sense a*(b*c) = (a*b)*c.
>>
>> Notice that in H(a,H(b,c)) the right argument of the left function H is
>> a hash, rather than a message.
>
> You're right and I was blind. Let's call it pseudoassociativity.
>
>> Right, but the problem of achieving associativity in the usual
>> mathematical sense, and as far as I can get it in the definition given
>> by Rabi and Sherman [*] seems worse.
>
> It must be worse, since the existence of such a function depends on
> P != NP. But for what I'm thinking about, the pseudoassociativity
> suffices.
>
>> The issue is not only about block
>> size, which I think could be solved by using one-bit blocks.
>
> In the meantime I've found an interesting pseudoassociative hash:
> Zemor-Tillich which uses one-bit blocks and works in SL(2, GF(2**n)),
> i.e., it uses 2x2 matrices with determinant 1.
> The original paper "Hashing with SL2" has fallen prey to Springer, but
> there are some papers here:
> http://www.dice.ucl.ac.be/~petit/index2.html

I'll try to have a look at that. Now it is time for some social
activity, and the next 10 days will be busy professionally.

Although not linked to a reference hard problem, your idea of matrices
of elements of GF(256) is interesting, and computationally practical.

Francois Grieu