From: mk
On 2010-02-24 20:01, Robert Kern wrote:
> I will repeat my advice to just use random.SystemRandom.choice() instead
> of trying to interpret the bytes from /dev/urandom directly.

Oh I hear you -- for production use I would (will) certainly consider
this. However, now I'm interested in the problem itself: why is the damn
distribution not uniform?

Regards,
mk


From: Robert Kern
On 2010-02-24 13:09 PM, mk wrote:
> On 2010-02-24 20:01, Robert Kern wrote:
>> I will repeat my advice to just use random.SystemRandom.choice() instead
>> of trying to interpret the bytes from /dev/urandom directly.
>
> Oh I hear you -- for production use I would (will) certainly consider
> this. However, now I'm interested in the problem itself: why is the damn
> distribution not uniform?

You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's.
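
To see where the extra 'a's come from, here is a minimal sketch. It assumes the earlier code (not quoted in this message) keeps a byte b only when b < cutoff and then picks the letter with b % 26; the helper name letter_weights is made up for illustration. It counts how many accepted byte values land on each letter for both cutoffs.

```python
from collections import Counter
import string


def letter_weights(cutoff):
    """Count how many accepted byte values (0 <= b < cutoff) map to each letter.

    Hypothetical helper: assumes the letter is chosen as b % 26.
    """
    return Counter(string.ascii_lowercase[b % 26] for b in range(cutoff))


for cutoff in (235, 234):
    weights = letter_weights(cutoff)
    # A uniform mapping gives every letter the same number of byte values.
    print("cutoff %d: 'a' gets %d byte values, 'b' gets %d"
          % (cutoff, weights['a'], weights['b']))
```

With "< 235" the accepted range is 0..234, which is 9 full cycles of 26 plus one leftover value (234) that maps to 'a'. With "< 234" every letter gets exactly 9 values.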

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

From: mk
On 2010-02-24 20:01, Robert Kern wrote:
> I will repeat my advice to just use random.SystemRandom.choice() instead
> of trying to interpret the bytes from /dev/urandom directly.

Out of curiosity:

def gen_rand_string(length):
    prng = random.SystemRandom()
    chars = []
    for i in range(length):
        chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
    return ''.join(chars)

if __name__ == "__main__":
    chardict = {}
    for i in range(10000):
##        w = gen_rand_word(10)
        w = gen_rand_string(10)
        count_chars(chardict, w)
    counts = list(chardict.items())
    counts.sort(key = operator.itemgetter(1), reverse = True)
    for char, count in counts:
        print char, count


s 3966
d 3912
g 3909
h 3905
a 3901
u 3900
q 3891
m 3888
k 3884
b 3878
x 3875
v 3867
w 3864
y 3851
l 3825
z 3821
c 3819
e 3819
r 3816
n 3808
o 3797
f 3795
t 3784
p 3765
j 3730
i 3704

Better, although still not perfect.

Regards,
mk

From: Paul Rubin
Robert Kern <robert.kern(a)gmail.com> writes:
> I will repeat my advice to just use random.SystemRandom.choice()
> instead of trying to interpret the bytes from /dev/urandom directly.

SystemRandom is something pretty new so I wasn't aware of it. But
yeah, if I were thinking more clearly I would have suggested os.urandom
instead of opening /dev/urandom.
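
For comparison, a rough sketch of the two approaches mentioned here, written for Python 3. The function names are invented for this example. random.SystemRandom draws from the same OS source that os.urandom exposes; the second function reads raw bytes with os.urandom and has to do its own rejection step, using the "< 234" cutoff from earlier in the thread.

```python
import os
import random
import string

LETTERS = string.ascii_lowercase


def rand_letters_systemrandom(length):
    """Let random.SystemRandom handle the uniform choice."""
    prng = random.SystemRandom()
    return ''.join(prng.choice(LETTERS) for _ in range(length))


def rand_letters_urandom(length):
    """Interpret os.urandom bytes directly, rejecting bytes >= 234.

    234 is 9 * 26, so each letter is backed by exactly 9 byte values.
    """
    letters = []
    while len(letters) < length:
        # bytearray yields integers on both Python 2 and Python 3.
        for b in bytearray(os.urandom(length)):
            if b < 234 and len(letters) < length:
                letters.append(LETTERS[b % 26])
    return ''.join(letters)


print(rand_letters_systemrandom(10))
print(rand_letters_urandom(10))
```

The first version is shorter and leaves no cutoff to get wrong, which is the point of the advice quoted above.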

From: Robert Kern
On 2010-02-24 13:16 PM, mk wrote:
> On 2010-02-24 20:01, Robert Kern wrote:
>> I will repeat my advice to just use random.SystemRandom.choice() instead
>> of trying to interpret the bytes from /dev/urandom directly.
>
> Out of curiosity:
>
> def gen_rand_string(length):
>     prng = random.SystemRandom()
>     chars = []
>     for i in range(length):
>         chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
>     return ''.join(chars)
>
> if __name__ == "__main__":
>     chardict = {}
>     for i in range(10000):
> ##        w = gen_rand_word(10)
>         w = gen_rand_string(10)
>         count_chars(chardict, w)
>     counts = list(chardict.items())
>     counts.sort(key = operator.itemgetter(1), reverse = True)
>     for char, count in counts:
>         print char, count
>
>
> s 3966
> d 3912
> g 3909
> h 3905
> a 3901
> u 3900
> q 3891
> m 3888
> k 3884
> b 3878
> x 3875
> v 3867
> w 3864
> y 3851
> l 3825
> z 3821
> c 3819
> e 3819
> r 3816
> n 3808
> o 3797
> f 3795
> t 3784
> p 3765
> j 3730
> i 3704
>
> Better, although still not perfect.

This distribution is well within expectations.
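
One way to check that claim is a Pearson chi-square test against a uniform distribution. The sketch below (Python 3, standard library only) uses the counts exactly as posted and takes the total from their sum rather than assuming a round number. The 5% critical value for 25 degrees of freedom, about 37.65, is taken from a standard chi-square table; the function name is made up for this example.

```python
# Counts copied from the output quoted above.
counts = {
    's': 3966, 'd': 3912, 'g': 3909, 'h': 3905, 'a': 3901, 'u': 3900,
    'q': 3891, 'm': 3888, 'k': 3884, 'b': 3878, 'x': 3875, 'v': 3867,
    'w': 3864, 'y': 3851, 'l': 3825, 'z': 3821, 'c': 3819, 'e': 3819,
    'r': 3816, 'n': 3808, 'o': 3797, 'f': 3795, 't': 3784, 'p': 3765,
    'j': 3730, 'i': 3704,
}


def chi_square_uniform(observed):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(observed)
    expected = total / float(len(observed))
    return sum((o - expected) ** 2 / expected for o in observed)


stat = chi_square_uniform(list(counts.values()))
dof = len(counts) - 1

# Tabulated 5% critical value of chi-square with 25 degrees of freedom.
CRITICAL_5PCT_25DOF = 37.65

print("chi-square = %.2f on %d degrees of freedom" % (stat, dof))
print("rejects uniformity at the 5%% level: %s" % (stat > CRITICAL_5PCT_25DOF))
```

A statistic near the number of degrees of freedom is what a uniform source typically produces; a value above the critical value would be evidence of bias.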

--
Robert Kern
