Estimating entropy of a stream [Cryptography]

Prev: Encryption & Authentication
Next: Decimation

From: Joe Green on 20 Dec 2009 00:08

Data examination, spectrum analysis and intimate knowledge of your randomness source is what is needed. Thus if you calculate 5 bits of randomness per sample, arithmetic says output 32 bits every 32/5 (round up to 7) times. Then double or triple that. I would go with 32 bits every 16 samples.

//Do it like this
hash_work_area tbl;
for(;;) {
for(j=0;j<16;j++) {
augment tbl with randomness source
tbl = someHash(tbl);
}
output 32 bits from table
}

After program start or power up be sure to toss the first outputs equivalent to at least 3 times the hash area size / bits of randomness per sample.

I will soon have a < US$100 USB TRNG that produces > 50K true random bytes per second. My processor is a 32-bit machine that runs the "augment tbl with randomness source and tbl = someHash(tbl)" steps for two samples in just over 3 microseconds. My hash work area is over 1200 bits with excellent and efficient augmentation and mixing.

Or you could cheat and no one could tell. (My USB TRNG does not cheat.)

From: unruh on 22 Dec 2009 16:13

On 2009-12-22, Maaartin <grajcar1(a)seznam.cz> wrote:
> On Dec 22, 7:38?pm, unruh <un...(a)wormhole.physics.ubc.ca> wrote:
>> On 2009-12-22, lukeo <lukejamesocon...(a)gmail.com> wrote:
>>
>> No it does not converge to the entropy of a stream. Consider the stream
>> 1 2 3 4 5 6 7 8.... run through say MD4. The result will fool any
>> entropy test on the output into reporting a maximal entropy stream. But
>> it clearly is not. It has almost the same entropy ( a few bytes more) ?as the stream 1 2 3 4 5 6 7 8
>
> I think it will.... after seeing md4(1) the probabilty of seeing md4
> (2) is very high (it'd be 1 for a block cipher, it may be 1, 1/2, 1/3
> or anything like this for a hash, nobody knows). But for this you need
> to work with blocks of size 128 bits and await the next occurence of
> md4(1), which is with time and memory complexity of 2**128 completely
> impossible. So there's convergence, but of no use to anybody. Or am I
> wrong?

Depends on how md4 is applied to the stream of successive integers.
Remember it can hash an arbitrary length. Thus there is no "recycling"
after 2^128 elements. And md4(1) is also equal to md4(r) for an
infinite number of r, and 1 never reoccurs. Ie, in the stream, the
probability of seeing md4(2) occur after md4(1) is tiny ( someting like
2^(-128).

From: Maaartin on 22 Dec 2009 14:19

On Dec 22, 7:38 pm, unruh <un...(a)wormhole.physics.ubc.ca> wrote:
> On 2009-12-22, lukeo <lukejamesocon...(a)gmail.com> wrote:
>
> No it does not converge to the entropy of a stream. Consider the stream
> 1 2 3 4 5 6 7 8.... run through say MD4. The result will fool any
> entropy test on the output into reporting a maximal entropy stream. But
> it clearly is not. It has almost the same entropy ( a few bytes more) as the stream 1 2 3 4 5 6 7 8

I think it will.... after seeing md4(1) the probabilty of seeing md4
(2) is very high (it'd be 1 for a block cipher, it may be 1, 1/2, 1/3
or anything like this for a hash, nobody knows). But for this you need
to work with blocks of size 128 bits and await the next occurence of
md4(1), which is with time and memory complexity of 2**128 completely
impossible. So there's convergence, but of no use to anybody. Or am I
wrong?

From: unruh on 23 Dec 2009 00:48

On 2009-12-23, Maaartin <grajcar1(a)seznam.cz> wrote:
> On Dec 22, 10:13?pm, unruh <un...(a)wormhole.physics.ubc.ca> wrote:
>>
>> Depends on how md4 is applied to the stream of successive integers.
>> Remember it can hash an arbitrary length. Thus there is no "recycling"
>> after 2^128 elements. And md4(1) is also equal to md4(r) for an
>> infinite number of r, and 1 never reoccurs. Ie, in the stream, the
>> probability of seeing md4(2) occur after md4(1) is tiny ( someting like
>> 2^(-128).
>
> Yes, but...
>
> Let's assume, the 128 least significant bits come last. The internal
> state is finite, so after seeing
> md4(1), md4(2), and md4(3) in row again, you can bet the next sample
> will be md4(4).
> There's (afaik) no period, but the whole sequence consists of "only"
> 2**128 chunks of length 2**128.
> Knowing this, you'd need "only" memory of 2**128 samples and about
> 2*256 time,
> but I guess the universal statistical test would need 2**256 of both,
>
> Actually, I don't think it converges to the right value of 0 bits per
> sample,
> from the above I assume it leads to no more than 128/(2**128) bits per
> sample.
>
> Let's assume, the 128 least significant bits come first.

So what? That is not how md4 works. It depends on the all of the bits in
the input. So if the input is 100000 bits long, the output depends on
all of those input bits.

> The first 2**256 samples are the same as in the previous case, just
> reordered,
> but this would mean that the test had to look at values 2**128 samples
> apart,
> which is even more crazy. But maybe there's a better way.
>
> I ignored the padding since considering it would need a precise
> description of the representation of unlimited integers.

From: Maaartin on 23 Dec 2009 01:39

On Dec 23, 6:48 am, unruh <un...(a)wormhole.physics.ubc.ca> wrote:
>
> So what? That is not how md4 works. It depends on the all of the bits in
> the input. So if the input is 100000 bits long, the output depends on
> all of those input bits.

Sure, it does. But I don't see
- what makes you believe I don't know it
- what part of what I wrote do you consider to be wrong
- how this should be an argument against it

| Next | Last
Pages: 1 2
Prev: Encryption & Authentication
Next: Decimation