in [Python]

Prev: ANN: Leo 4.7 final released
Next: AKKA vs Python
From: Paul Rubin on 24 Feb 2010 14:31 mk <mrkafk (a)gmail.com> writes:> So I have little in the way of limitations of password length ...> > The main application will access the data using HTTP (probably), so > the main point is that an attacker is not able to guess passwords > using brute force. If it's HTTP instead of HTTPS and you're sending the password in the clear, then a serious attacker can simply eavesdrop the connection and pick up the password. Again, if the application is a web forum or something like that, the security requirements probably aren't terribly high. If it's (say) a financial application with potentially motivated attackers, you've got to be a lot more careful than I think you're being right now, and you should really get security specialists involved. > Using A-z with 10-char password seems to provide 3 orders of magnitude > more combinations than a-z: Yes, 2**10 = 1024 so (57/25)**10 is a little more than that. > Even then I'm not getting completely uniform distribution for some reason: Exact equality of the counts would be surprising and a sign that something was wrong with the generation process. It would be like flipping a coin 10000 times and getting exactly 5000 heads. The binomial distribution tells you that the number should be close to 5000, but that it's unlikely to be -exactly- 5000. Also, as Michael Rudolf mentioned, getting a letter by taking n%26 where n is drawn uniformly from [0..255] doesn't give a uniform distribution because 256 is not a multiple of 26. I had thought about making an adjustment for that when I posted, but it didn't seem worth cluttering up the code. Uniformity for its own sake doesn't gain you anything; what matters is entropy. If you compute the entropy difference between the slightly nonuniform distribution and a uniform one, it's very small. To get a more uniform distribution I usually just take a larger n, rather than conditionalizing the draws. For example, in the diceware-like code I posted, I read 10 random bytes (giving a uniform random number on [0..2**80]) from urandom for each word. That is still not perfectly uniform, but it's closer to the point where the difference would be very hard to detect. > Aw shucks when will I learn to do the stuff in 3 lines well instead of > 20, poorly. :-/ Well, that's partly a matter of practice, but I'll mention one way I simplified the code, which was by reading more bytes from /dev/urandom than was really necessary. I read one byte for each random letter (i.e. throwing away about 3 random bits for each letter) while you tried to encode the urandom data cleverly and map 4 random bytes to 5 alphabetic letters. /dev/urandom uses a cryptographic PRNG and it's pretty fast, so reading a few extra bytes from it to simplify your code doesn't really cost you anything.
From: Michael Rudolf on 24 Feb 2010 14:30 Am 24.02.2010 19:35, schrieb mk: > On 2010-02-24 18:56, Michael Rudolf wrote: > >> The reason is 256 % 26 != 0 >> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is >> approx. 10) more often than w-z. > > <Barbie voice>writing secure code is hard... So true. That's why one should stick to standard libs when it comes to crypto or security in general. It's just to easy to mess it up. Just ask Debian about whether touching OpenSSL was a good idea ;) >> You might want to skip the values 0-22 >> to achieve a truly uniform distribution. > Hmm perhaps you meant to skip values over 256 - 22 ? That's the same thing as x mod y equals x+N*y mod y for every natural N. > def gen_rand_word(n): > with open('/dev/urandom') as f: > return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x) > > 22]) Off-by-one-error: you're skipping len(range(22))==23 hits. OK, I just see that I wrote misleading 0-22 while I meant range(22). > While with this: > def gen_rand_word(n): > with open('/dev/urandom') as f: > return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x) > < 235]) Same off-by-one. >> FYI: Electronic Cash PINs in europe (dont know about the rest of the >> world) were computed the same way (random hexdigit and just mod it when >> it's too large) leading to a high probability that your first digit was >> a 1 :) > Schadenfreude is deriving joy from others' misfortunes; what is the > German word, if any, for deriving solace from others' misfortunes? ;-) Well - "Schadenfreude" *is* in fact a german word :) "Schaden" is the event or result of misfortune, "Freude" is joy. Well, I really think that you should use repeated Random.choice on an alphabet. Or Random.Systemrandom.choice if you don't trust the PRNG. Regards, Michael
From: mk on 24 Feb 2010 15:02 On 2010-02-24 20:19, Robert Kern wrote: > On 2010-02-24 13:09 PM, mk wrote: >> On 2010-02-24 20:01, Robert Kern wrote: >>> I will repeat my advice to just use random.SystemRandom.choice() instead >>> of trying to interpret the bytes from /dev/urandom directly. >> >> Oh I hear you -- for production use I would (will) certainly consider >> this. However, now I'm interested in the problem itself: why is the damn >> distribution not uniform? > > You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's. Right, this explains the 'a' outlier. Fixed. But still: import operator import os import random import math def rand_str_custom(n): s = os.urandom(n) return ''.join([chr(ord('a') + ord(x) % 26) for x in s if ord(x) < 234]) def count_chars(chardict, word): for c in word: try: chardict[c] += 1 except KeyError: chardict[c] = 0 def rand_str_SystemRandom_seeding(length): seed = os.urandom(32) random.seed(seed) prng = random.SystemRandom() chars = [] for i in range(length): chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz')) return ''.join(chars) def rand_str_SystemRandom_noseeding(length): prng = random.SystemRandom() chars = [] for i in range(length): chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz')) return ''.join(chars) def sd(x): sd.sum += x sd.sum2 += x*x sd.n += 1.0 sum, sum2, n = sd.sum, sd.sum2, sd.n return math.sqrt(sum2/n - sum*sum/n/n) def gen_rand_with_fun(fun): print fun.__name__ chardict = {} for i in range(10000): w = fun(10) count_chars(chardict, w) counts = list(chardict.items()) counts.sort(key = operator.itemgetter(1), reverse = True) nums = [c[1] for c in counts] sd.sum = sd.sum2 = sd.n = 0 mean = (1.0*sum(nums))/len(nums) stddev = map(sd, nums)[-1] print 'mean', mean, 'std dev', stddev for char, count in counts: print char, count, '%.2f' % ((count - mean)/stddev), 'std devs away from mean' if __name__ == "__main__": gen_rand_with_fun(rand_str_SystemRandom_seeding) gen_rand_with_fun(rand_str_SystemRandom_noseeding) gen_rand_with_fun(rand_str_custom) rand_str_SystemRandom_seeding mean 3845.15384615 std dev 46.2016419186 l 3926 1.75 std devs away from mean y 3916 1.53 std devs away from mean d 3909 1.38 std devs away from mean a 3898 1.14 std devs away from mean p 3898 1.14 std devs away from mean c 3889 0.95 std devs away from mean u 3884 0.84 std devs away from mean j 3873 0.60 std devs away from mean n 3873 0.60 std devs away from mean w 3866 0.45 std devs away from mean x 3863 0.39 std devs away from mean r 3855 0.21 std devs away from mean m 3852 0.15 std devs away from mean b 3841 -0.09 std devs away from mean t 3835 -0.22 std devs away from mean o 3829 -0.35 std devs away from mean k 3827 -0.39 std devs away from mean i 3821 -0.52 std devs away from mean s 3812 -0.72 std devs away from mean q 3806 -0.85 std devs away from mean v 3803 -0.91 std devs away from mean g 3799 -1.00 std devs away from mean h 3793 -1.13 std devs away from mean e 3782 -1.37 std devs away from mean f 3766 -1.71 std devs away from mean z 3758 -1.89 std devs away from mean rand_str_SystemRandom_noseeding mean 3845.15384615 std dev 55.670522726 i 3961 2.08 std devs away from mean r 3911 1.18 std devs away from mean e 3910 1.16 std devs away from mean m 3905 1.08 std devs away from mean a 3901 1.00 std devs away from mean u 3893 0.86 std devs away from mean t 3882 0.66 std devs away from mean w 3872 0.48 std devs away from mean s 3870 0.45 std devs away from mean c 3868 0.41 std devs away from mean n 3866 0.37 std devs away from mean q 3865 0.36 std devs away from mean k 3863 0.32 std devs away from mean y 3848 0.05 std devs away from mean j 3836 -0.16 std devs away from mean v 3830 -0.27 std devs away from mean f 3829 -0.29 std devs away from mean z 3829 -0.29 std devs away from mean g 3827 -0.33 std devs away from mean l 3818 -0.49 std devs away from mean b 3803 -0.76 std devs away from mean d 3803 -0.76 std devs away from mean p 3756 -1.60 std devs away from mean x 3755 -1.62 std devs away from mean h 3744 -1.82 std devs away from mean o 3729 -2.09 std devs away from mean rand_str_custom mean 3517.15384615 std dev 40.7541336343 i 3586 1.69 std devs away from mean a 3578 1.49 std devs away from mean e 3575 1.42 std devs away from mean m 3570 1.30 std devs away from mean q 3562 1.10 std devs away from mean c 3555 0.93 std devs away from mean g 3552 0.86 std devs away from mean w 3542 0.61 std devs away from mean p 3536 0.46 std devs away from mean x 3533 0.39 std devs away from mean s 3528 0.27 std devs away from mean o 3524 0.17 std devs away from mean d 3516 -0.03 std devs away from mean t 3515 -0.05 std devs away from mean h 3511 -0.15 std devs away from mean v 3502 -0.37 std devs away from mean z 3502 -0.37 std devs away from mean b 3500 -0.42 std devs away from mean f 3496 -0.52 std devs away from mean u 3492 -0.62 std devs away from mean l 3486 -0.76 std devs away from mean r 3478 -0.96 std devs away from mean n 3476 -1.01 std devs away from mean j 3451 -1.62 std devs away from mean k 3450 -1.65 std devs away from mean y 3430 -2.14 std devs away from mean It would appear that SystemRandom().choice is indeed best (in terms of how much the counts stray from mean in std devs), but only after seeding it with os.urandom. Regards, mk
From: mk on 24 Feb 2010 15:06 On 2010-02-24 20:30, Michael Rudolf wrote: >>> The reason is 256 % 26 != 0 >>> 256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is >>> approx. 10) more often than w-z. >> >> <Barbie voice>writing secure code is hard... > > So true. That's why one should stick to standard libs when it comes to > crypto or security in general. It's just to easy to mess it up. Just ask > Debian about whether touching OpenSSL was a good idea ;) That was brain-dead hiccup, for crying out loud how could they do smth so stupid. >> def gen_rand_word(n): >> with open('/dev/urandom') as f: >> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x) >> > 22]) > > Off-by-one-error: you're skipping len(range(22))==23 hits. Argh, it's late here. > Well, I really think that you should use repeated Random.choice on an > alphabet. > Or Random.Systemrandom.choice if you don't trust the PRNG. I just posted a comparison with calculating std deviations for various methods - using os.urandom, SystemRandom.choice with seeding and without seeding. They all seem to have slightly different distributions. Regards, mk
From: Robert Kern on 24 Feb 2010 15:20
On 2010-02-24 14:02 PM, mk wrote: > It would appear that SystemRandom().choice is indeed best (in terms of > how much the counts stray from mean in std devs), but only after seeding > it with os.urandom. Calling random.seed() does not affect SystemRandom() whatsoever. You are getting perfectly acceptable distributions for all three variants. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco |