Is this secure? [Python]

Prev: ANN: Leo 4.7 final released
Next: AKKA vs Python

From: Paul Rubin on 24 Feb 2010 14:31

mk <mrkafk(a)gmail.com> writes:
> So I have little in the way of limitations of password length ...>
> The main application will access the data using HTTP (probably), so
> the main point is that an attacker is not able to guess passwords
> using brute force.

If it's HTTP instead of HTTPS and you're sending the password in the
clear, then a serious attacker can simply eavesdrop the connection and
pick up the password. Again, if the application is a web forum or
something like that, the security requirements probably aren't terribly
high. If it's (say) a financial application with potentially motivated
attackers, you've got to be a lot more careful than I think you're being
right now, and you should really get security specialists involved.

> Using A-z with 10-char password seems to provide 3 orders of magnitude
> more combinations than a-z:

Yes, 2**10 = 1024 so (57/25)**10 is a little more than that.

> Even then I'm not getting completely uniform distribution for some reason:

Exact equality of the counts would be surprising and a sign that
something was wrong with the generation process. It would be like
flipping a coin 10000 times and getting exactly 5000 heads. The
binomial distribution tells you that the number should be close to 5000,
but that it's unlikely to be -exactly- 5000.

Also, as Michael Rudolf mentioned, getting a letter by taking n%26 where
n is drawn uniformly from [0..255] doesn't give a uniform distribution
because 256 is not a multiple of 26. I had thought about making an
adjustment for that when I posted, but it didn't seem worth cluttering
up the code. Uniformity for its own sake doesn't gain you anything;
what matters is entropy. If you compute the entropy difference between
the slightly nonuniform distribution and a uniform one, it's very small.

To get a more uniform distribution I usually just take a larger n,
rather than conditionalizing the draws. For example, in the
diceware-like code I posted, I read 10 random bytes (giving a uniform
random number on [0..2**80]) from urandom for each word. That is still
not perfectly uniform, but it's closer to the point where the difference
would be very hard to detect.

> Aw shucks when will I learn to do the stuff in 3 lines well instead of
> 20, poorly. :-/

Well, that's partly a matter of practice, but I'll mention one way I
simplified the code, which was by reading more bytes from /dev/urandom
than was really necessary. I read one byte for each random letter
(i.e. throwing away about 3 random bits for each letter) while you tried
to encode the urandom data cleverly and map 4 random bytes to 5
alphabetic letters. /dev/urandom uses a cryptographic PRNG and it's
pretty fast, so reading a few extra bytes from it to simplify your code
doesn't really cost you anything.

From: Michael Rudolf on 24 Feb 2010 14:30

Am 24.02.2010 19:35, schrieb mk:
> On 2010-02-24 18:56, Michael Rudolf wrote:
>
>> The reason is 256 % 26 != 0
>> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
>> approx. 10) more often than w-z.
>
> <Barbie voice>writing secure code is hard...

So true. That's why one should stick to standard libs when it comes to
crypto or security in general. It's just to easy to mess it up. Just ask
Debian about whether touching OpenSSL was a good idea ;)

>> You might want to skip the values 0-22
>> to achieve a truly uniform distribution.
> Hmm perhaps you meant to skip values over 256 - 22 ?

That's the same thing as x mod y equals x+N*y mod y for every natural N.

> def gen_rand_word(n):
> with open('/dev/urandom') as f:
> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
> > 22])

Off-by-one-error: you're skipping len(range(22))==23 hits.
OK, I just see that I wrote misleading 0-22 while I meant range(22).

> While with this:
> def gen_rand_word(n):
> with open('/dev/urandom') as f:
> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
> < 235])

Same off-by-one.

>> FYI: Electronic Cash PINs in europe (dont know about the rest of the
>> world) were computed the same way (random hexdigit and just mod it when
>> it's too large) leading to a high probability that your first digit was
>> a 1 :)
> Schadenfreude is deriving joy from others' misfortunes; what is the
> German word, if any, for deriving solace from others' misfortunes? ;-)

Well - "Schadenfreude" *is* in fact a german word :)
"Schaden" is the event or result of misfortune, "Freude" is joy.

Well, I really think that you should use repeated Random.choice on an
alphabet.
Or Random.Systemrandom.choice if you don't trust the PRNG.

Regards,
Michael

From: mk on 24 Feb 2010 15:02

On 2010-02-24 20:19, Robert Kern wrote:
> On 2010-02-24 13:09 PM, mk wrote:
>> On 2010-02-24 20:01, Robert Kern wrote:
>>> I will repeat my advice to just use random.SystemRandom.choice() instead
>>> of trying to interpret the bytes from /dev/urandom directly.
>>
>> Oh I hear you -- for production use I would (will) certainly consider
>> this. However, now I'm interested in the problem itself: why is the damn
>> distribution not uniform?
>
> You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's.

Right, this explains the 'a' outlier. Fixed. But still:

import operator
import os
import random
import math

def rand_str_custom(n):
s = os.urandom(n)
return ''.join([chr(ord('a') + ord(x) % 26) for x in s if ord(x) <
234])

def count_chars(chardict, word):
for c in word:
try:
chardict[c] += 1
except KeyError:
chardict[c] = 0

def rand_str_SystemRandom_seeding(length):
seed = os.urandom(32)
random.seed(seed)
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

def rand_str_SystemRandom_noseeding(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

def sd(x):
sd.sum += x
sd.sum2 += x*x
sd.n += 1.0
sum, sum2, n = sd.sum, sd.sum2, sd.n
return math.sqrt(sum2/n - sum*sum/n/n)

def gen_rand_with_fun(fun):
print fun.__name__
chardict = {}
for i in range(10000):
w = fun(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
nums = [c[1] for c in counts]
sd.sum = sd.sum2 = sd.n = 0
mean = (1.0*sum(nums))/len(nums)
stddev = map(sd, nums)[-1]
print 'mean', mean, 'std dev', stddev
for char, count in counts:
print char, count, '%.2f' % ((count - mean)/stddev), 'std devs
away from mean'

if __name__ == "__main__":
gen_rand_with_fun(rand_str_SystemRandom_seeding)
print
gen_rand_with_fun(rand_str_SystemRandom_noseeding)
print
gen_rand_with_fun(rand_str_custom)

rand_str_SystemRandom_seeding
mean 3845.15384615 std dev 46.2016419186
l 3926 1.75 std devs away from mean
y 3916 1.53 std devs away from mean
d 3909 1.38 std devs away from mean
a 3898 1.14 std devs away from mean
p 3898 1.14 std devs away from mean
c 3889 0.95 std devs away from mean
u 3884 0.84 std devs away from mean
j 3873 0.60 std devs away from mean
n 3873 0.60 std devs away from mean
w 3866 0.45 std devs away from mean
x 3863 0.39 std devs away from mean
r 3855 0.21 std devs away from mean
m 3852 0.15 std devs away from mean
b 3841 -0.09 std devs away from mean
t 3835 -0.22 std devs away from mean
o 3829 -0.35 std devs away from mean
k 3827 -0.39 std devs away from mean
i 3821 -0.52 std devs away from mean
s 3812 -0.72 std devs away from mean
q 3806 -0.85 std devs away from mean
v 3803 -0.91 std devs away from mean
g 3799 -1.00 std devs away from mean
h 3793 -1.13 std devs away from mean
e 3782 -1.37 std devs away from mean
f 3766 -1.71 std devs away from mean
z 3758 -1.89 std devs away from mean

rand_str_SystemRandom_noseeding
mean 3845.15384615 std dev 55.670522726
i 3961 2.08 std devs away from mean
r 3911 1.18 std devs away from mean
e 3910 1.16 std devs away from mean
m 3905 1.08 std devs away from mean
a 3901 1.00 std devs away from mean
u 3893 0.86 std devs away from mean
t 3882 0.66 std devs away from mean
w 3872 0.48 std devs away from mean
s 3870 0.45 std devs away from mean
c 3868 0.41 std devs away from mean
n 3866 0.37 std devs away from mean
q 3865 0.36 std devs away from mean
k 3863 0.32 std devs away from mean
y 3848 0.05 std devs away from mean
j 3836 -0.16 std devs away from mean
v 3830 -0.27 std devs away from mean
f 3829 -0.29 std devs away from mean
z 3829 -0.29 std devs away from mean
g 3827 -0.33 std devs away from mean
l 3818 -0.49 std devs away from mean
b 3803 -0.76 std devs away from mean
d 3803 -0.76 std devs away from mean
p 3756 -1.60 std devs away from mean
x 3755 -1.62 std devs away from mean
h 3744 -1.82 std devs away from mean
o 3729 -2.09 std devs away from mean

rand_str_custom
mean 3517.15384615 std dev 40.7541336343
i 3586 1.69 std devs away from mean
a 3578 1.49 std devs away from mean
e 3575 1.42 std devs away from mean
m 3570 1.30 std devs away from mean
q 3562 1.10 std devs away from mean
c 3555 0.93 std devs away from mean
g 3552 0.86 std devs away from mean
w 3542 0.61 std devs away from mean
p 3536 0.46 std devs away from mean
x 3533 0.39 std devs away from mean
s 3528 0.27 std devs away from mean
o 3524 0.17 std devs away from mean
d 3516 -0.03 std devs away from mean
t 3515 -0.05 std devs away from mean
h 3511 -0.15 std devs away from mean
v 3502 -0.37 std devs away from mean
z 3502 -0.37 std devs away from mean
b 3500 -0.42 std devs away from mean
f 3496 -0.52 std devs away from mean
u 3492 -0.62 std devs away from mean
l 3486 -0.76 std devs away from mean
r 3478 -0.96 std devs away from mean
n 3476 -1.01 std devs away from mean
j 3451 -1.62 std devs away from mean
k 3450 -1.65 std devs away from mean
y 3430 -2.14 std devs away from mean

It would appear that SystemRandom().choice is indeed best (in terms of
how much the counts stray from mean in std devs), but only after seeding
it with os.urandom.

Regards,
mk

From: mk on 24 Feb 2010 15:06

On 2010-02-24 20:30, Michael Rudolf wrote:
>>> The reason is 256 % 26 != 0
>>> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
>>> approx. 10) more often than w-z.
>>
>> <Barbie voice>writing secure code is hard...
>
> So true. That's why one should stick to standard libs when it comes to
> crypto or security in general. It's just to easy to mess it up. Just ask
> Debian about whether touching OpenSSL was a good idea ;)

That was brain-dead hiccup, for crying out loud how could they do smth
so stupid.

>> def gen_rand_word(n):
>> with open('/dev/urandom') as f:
>> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
>> > 22])
>
> Off-by-one-error: you're skipping len(range(22))==23 hits.

Argh, it's late here.

> Well, I really think that you should use repeated Random.choice on an
> alphabet.
> Or Random.Systemrandom.choice if you don't trust the PRNG.

I just posted a comparison with calculating std deviations for various
methods - using os.urandom, SystemRandom.choice with seeding and without
seeding.

They all seem to have slightly different distributions.

Regards,
mk

From: Robert Kern on 24 Feb 2010 15:20

On 2010-02-24 14:02 PM, mk wrote:

> It would appear that SystemRandom().choice is indeed best (in terms of
> how much the counts stray from mean in std devs), but only after seeding
> it with os.urandom.

Calling random.seed() does not affect SystemRandom() whatsoever. You are getting
perfectly acceptable distributions for all three variants.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11
Prev: ANN: Leo 4.7 final released
Next: AKKA vs Python