From: Francois Grieu on
Le 16/06/2010 16:47, Maaartin a écrit :
>> On 16/06/2010 03:06, Maaartin wrote:
>>> Does everybody agree?
>>
>> Yes, with no practical reservation, but two remarks:
>>
>> 1) Making a hash faster makes brute force proportionally faster, and
>> less unlikely to succeed.
>
> I understand "faster", and don't care much - what is a factor of 10 (which
> is much more than I can get) when doing something impossible? I don't
> understand "less unlikely", except when you mean "less unlikely in
> given time".

Yes, that's what I meant.

I said "less unlikely" because the odds of success under any
semi-realistic conditions are so low. Assuming $2^30 invested, with each
$ buying an IC with 2^10 cores, each capable of sieving 2^35 values per
second, and free energy, the odds of success for each year of operation
(less than 2^25 seconds) are less than 1 in 2^28.
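The arithmetic behind those odds is easy to check directly (a sketch in plain Python; the cost figures are exactly the assumptions stated above, applied to a 128-bit search space):

```python
# Worked check of the brute-force odds above (pure arithmetic, no crypto).
# Assumptions, as stated: $2^30 budget, 2^10 cores per $, 2^35 values per
# second per core, one year bounded by 2^25 seconds, 128-bit search space.
budget_dollars = 2**30
cores_per_dollar = 2**10
values_per_second_per_core = 2**35
seconds_per_year = 2**25            # upper bound; a year is about 2^24.9 s

evaluations = (budget_dollars * cores_per_dollar
               * values_per_second_per_core * seconds_per_year)
assert evaluations == 2**100        # total values sieved in one year

search_space = 2**128
odds = evaluations / search_space   # probability of hitting the target
print(odds)                         # 2^-28, i.e. about 1 in 2.7e8
```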

>> 2) For an adversary with unbounded memory, finding a preimage to
>> H(x) = x ^ aes(k, x) for some known constant k
>> with given odds of success requires on average slightly fewer
>> evaluations of aes than would
>> H(x) = aes(x, c) for some known constant c
>> but that gain is by a factor of at most two, and entirely negligible for
>> an adversary with a practical amount of memory.
>> The security bounds obtainable for the two constructs under a random
>> oracle model reflect this fact. In practice, the improvement is possible
>> because computing y = aes(k, x) and verifying that x ^ y is not the
>> desired h is enough to rule out both x and h ^ y as possible solutions.
>
> Ok, this helped me understand it better.
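To make the pruning concrete: here is a toy-scale sketch, with a small public 8-bit permutation standing in for aes(k, .) (an illustration of the argument only, not AES). Each evaluation P(x) = y rules out x itself (when x ^ y != h) and, for free, the candidate h ^ y, because P(h ^ y) = y would contradict P being a permutation.

```python
# Toy preimage search on H(x) = x ^ P(x), where P is a fixed public 8-bit
# permutation standing in for aes(k, .). Each evaluation excludes up to
# two candidates, which is the factor-of-at-most-two gain discussed above.
import random

random.seed(1)
perm = list(range(256))
random.shuffle(perm)                # P: a fixed public permutation

def H(x):
    return x ^ perm[x]

h = H(123)                          # target digest; 123 is a known preimage

ruled_out = set()
evaluations = 0
for x in range(256):
    if x in ruled_out:
        continue                    # skipped without evaluating P
    y = perm[x]
    evaluations += 1
    if x ^ y == h:
        found = x
        break
    ruled_out.add(x)                # x is not a preimage
    ruled_out.add(h ^ y)            # free exclusion: P(h ^ y) = y is impossible

print(found, evaluations)           # a preimage, found in fewer than 256 calls
```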
>
> On Jun 16, 1:45 pm, Thomas Pornin <por...(a)bolet.org> wrote:
>> According to Maaartin <grajc...(a)seznam.cz>:
>>
>>> H(x) = x ^ aes(k, x)
>> [...]
>>> - It's probably faster than md5(x)
>>
>> On an Intel Core 2 (Q6600), it takes about 375 clock cycles to process
>> one MD5 block; that's the base cost of hashing any message up to 55
>> bytes. On the same machine, OpenSSL's implementation of AES/CBC
>> processes one block (i.e. one AES and one XOR) in about 230 clock
>> cycles(*). So the H() function above is indeed faster than MD5 on 16-byte
>> input, but:
>> - not by a large margin;
>> - the timings may be different on other architectures;
>> - the AES implementation uses tables (cumulative size is 4 kB) so it
>> needs some L1 cache, which MD5 does not; this may or may not change
>> things, possibly drastically, depending on the rest of the application
>> (the code which uses that H() function).
>>
>> Hence it is relatively unclear whether H() is always faster than MD5. It
>> seems plausible that H() and MD5 will offer performance of similar
>> magnitude on most systems, except those for which either AES or MD5
>> benefits from some specialized hardware.
>>
>> --Thomas Pornin
>>
>> (*) There is a paper by Bernstein and Schwabe which claims about 169
>> clock cycles for an AES block, and the XOR will account for no more than
>> 4 extra clock cycles, so that "230" could be reduced to something like
>> "173". But remember that this is in "ideal benchmark conditions",
>> something which rarely happens in actual applications, especially due to
>> cache effects: the L1 cache (both instruction and data) must be shared
>> with the rest of the code in the data processing path.
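The fixed per-call cost of MD5 on short inputs is easy to observe from Python's standard library (a rough sketch; absolute numbers depend on the machine and include interpreter overhead, so they are not comparable to the cycle counts above):

```python
# Measure the per-call cost of MD5 on short inputs. Any message up to
# 55 bytes fits in a single padded 64-byte block, so timings in that
# range should be roughly flat; 56 bytes forces a second block.
import hashlib
import timeit

def md5_ns(size, n=100_000):
    msg = b"\x00" * size
    t = timeit.timeit(lambda: hashlib.md5(msg).digest(), number=n)
    return t / n * 1e9              # nanoseconds per call

for size in (16, 55, 56):
    print(size, round(md5_ns(size), 1))
```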
>
> This is sad. The fastest thing I've found is a bit-sliced
> implementation (by Käsper and Schwabe) using SSE3 and achieving 7.08
> cycles per byte on Intel Core i7 920 when processing 4KiB. Even so,
> this means 114 cycles for an AES block and speedup factor of three
> only. It uses no tables, but it achieves the speed for large blocks
> only. I don't think I could get anywhere close even if I could rely on
> SSE3.

Bleeding-edge Intel chips with AES-NI can reportedly do about 1.28
cycles/byte per core:

<http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-aes-instructions-set/>

Francois Grieu
From: Paul Rubin on
Maaartin <grajcar1(a)seznam.cz> writes:
> This is sad. The fastest thing I've found is a bit-sliced
> implementation (by Kaesper and Schwabe) using SSE3 and achieving 7.08
> cycles per byte on Intel Core i7 920 when processing 4KiB. Even so,
> this means 114 cycles for an AES block and speedup factor of three only.

On an i7, use the AES-NI hardware instructions for under 2 cycles/byte if
I remember right.
From: Paul Rubin on
Kristian Gjøsteen <kristiag+news(a)math.ntnu.no> writes:
> The same holds for hash functions as well, so this is not an objection
> against random oracle arguments.
>
> The idea is that the adversary doesn't really care about the internals
> of the function, and the function should be a typical example of a
> "random function" (or "random permutation").
>
> Once you believe that about aes(k,.), the argument should be plausible.

But, hash functions are built for the express purpose of resisting what
amounts to a "known key attack", i.e. to substitute for random oracles.
There is some experience indicating that it's harder to do that than to
design a block cipher. Look how well even DES has held up, compared
with MD4, MD5, SHA0, SHA1, etc.

Consider a hypothetical "BES" which is simply AES(k,x) - x. Then x +
BES(k,x) is equal to AES(k,x), which is invertible if k is known. The
structure of BES actually matters, i.e., in the case of BES, the random
oracle hypothesis is invalid. Why should we expect it to be valid for
AES, which never promised to be anything but a PRP?
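The BES counterexample is easy to demonstrate at toy scale, with a keyed 8-bit permutation standing in for AES and arithmetic mod 256 (an illustration only):

```python
# Toy demonstration that H(x) = x + BES(k, x) is trivially invertible,
# where BES(k, x) = AES(k, x) - x. An 8-bit keyed permutation stands in
# for AES; addition and subtraction are mod 256.
import random

def make_cipher(k):
    rng = random.Random(k)          # the key selects the permutation
    perm = list(range(256))
    rng.shuffle(perm)
    inv = [0] * 256
    for x, y in enumerate(perm):
        inv[y] = x
    return perm, inv

k = 42
enc, dec = make_cipher(k)           # "AES" encrypt/decrypt tables under k

def bes(x):
    return (enc[x] - x) % 256       # BES(k, x) = AES(k, x) - x

def H(x):
    return (x + bes(x)) % 256       # collapses to AES(k, x)

# A "preimage" of any digest h is just a decryption with the known key:
h = H(200)
preimage = dec[h]
assert preimage == 200 and H(preimage) == h
```

The structure of BES cancels out completely, so anyone who knows k inverts H in one table lookup; no random-oracle-style argument can apply to it.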

I agree that in practice the function is probably ok, but it presumes
characteristics of AES that are not among the advertised ones.
From: Kristian Gjøsteen on
Paul Rubin <no.email(a)nospam.invalid> wrote:
>But, hash functions are built for the express purpose of resisting what
>amounts to a "known key attack", i.e. to substitute for random oracles.

I don't know if MD5 and SHA-1 were designed to look like "random
functions", but I think AES was designed to look like an ideal cipher. One
consequence is that aes(k,.) should look like a random permutation.

>I agree that in practice the function is probably ok, but it presumes
>characteristics of AES that are not among the advertised ones.

Well, I think the designers aimed for this property, among others.

--
Kristian Gjøsteen