From: kanze on
kevin.hall(a)motioneng.com wrote:
> What denotes a good random number generator?

There are a number of tests.

> No PSEUDO-random number generator is good compared to a TRUE
> random number generator.

It depends on what you are doing. For many applications, they
are better.

> But the C library rand() (which Ulrich Eckhardt and others in
> this thread have recommended) is usually implemented as a
> linear congruential PRNG. And these are notorious for having
> less random lower order bits.

The quality of a linear congruential PRNG depends on its
parameters. Historically, the original Unix implementation used
a very bad set, which resulted in a poor generator, period.
(Not just the low order bits, although they were visibly poor.)
Regretfully, this implementation ended up as the example in the
C standard. But...

This is a very poor generator, which fails just about every
standard test. Avoiding using the low order bits may mask the
problem, but the problem is still there. If your library
implementation still uses this algorithm, the only real solution
is to change it.

The problem has been known for quite some time. I think (or
hope) that most library implementations have since corrected it.

> What PRNG do you use?

Basically a linear congruential generator, based on parameters
discussed in "Random Number Generators: Good Ones Are Hard To
Find" (Park and Miller, CACM, Oct. 1988). To increase the
period, I use two of them, with slightly different periods,
merged using a shuffling algorithm described in Seciont 7.1 of
<i>Numerical Recipes in C</i> (Press, Teukolscy, Vetterling and
Flannery, 1992).

> I use the mersenne-twister for anything that really matters in
> my work.

I've heard that they are very good. I've not had any reason to
complain about my current implementation, but of course, I don't
use it for anything really serious.

> Though I believe the MT is less susceptible to having lower
> order bits be less random, I haven't yet seen a study that
> proves that.

I believe that it passes all of the standard tests, which should
be enough.

> I would like to though -- seriously I would. If you can point
> me to one or to a study that proves that the lower order bits
> for the PRNGs that you use are not less random, please post a
> link.

The behavior of linear congruential generators is discussed in
detail in the Park and Miller article cited above. For the
Mersenne twister, I would look up the papers where it is first
presented -- given its date, I presume that they document
sufficient test results to convince people that they are good.

FWIW, I usually use a very simple first test: I simulate
throwing a pair of dice, using "rand()%6 + rand()%6 + 2". The
relative probabilities for the results from 2 to 12 should be 1,
2, 3, 4, 5, 6, 5, 4, 3, 2, 1. Do ten or twenty trials of
1000 throws each. If any one number is systematically above or
below the expected probability, it's suspicious. (If all numbers
are always exactly the expected probability, it's also
suspicious.)

With the orignal Unix implementation of rand(), the probability
of 2, 4, 6, 8, 10 or 12 in this test was 0 -- the values never
came up. I've not done extensive tests of the rand() under
Solaris today, but at least this simple test doesn't show up any
obvious errors.

--
James Kanze GABI Software
Conseils en informatique orient?e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: kevin.hall on
> Which C library is "the C library"?

Ok, you're right here. What I should have said to be more clear is
"Many implementations of the C library use 'simple' PNRGs that are
notorious for this behavior. In particular, many older libraries use
linear congruential generators which are known to display this
behavior".

> Maybe. I'm defending thinking instead of basing engineering decisions
> on rumors.

Look, linear congruential generators are known to have this problem.
Modulo is thus known *in some circumstances* to be a poor choice.
Division is known not to have problems *in all circumstances*. (This
is what Carlos has pointed out too). I'm going to choose the operation
that is will work all the time. Then I don't have to worry as much
about how good the PNRG is for lower order bits.

> No, because it's not the same as above.

OK, again I didn't make myself clear. There's two parts of the
problem:

(1) reducing the range of values from the PNRG to reduce the imbalance
in results.
(2) Choosing an operation to map the many "passing" values of the PNRG
to the target range required by the user.

I agreed with Ulrich on (1). I was attempting to show a little more of
the math behind it.
What I disagree with Ulrich on is (2). Thus my list shows a different
transformation.

I hope that clears things up.


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: kanze on
Francis Glassborow wrote:
> In article <1139433651.899829.311980(a)g47g2000cwa.googlegroups.com>,
> kevin.hall(a)motioneng.com writes
> >Modulo is a poor operation in helping find a random number.
> >The reason is that the lowest bits in a psuedo-random number
> >are less random than the upper bits. It's much better to use
> >division:

> And that is completely irrelevant even if it were true (which
> it generally isn't) unless you are dividing by a power of two.

Not really. It is a known fact that a certain category of
linear congruent generators have a very low period in the low
order bits: the bit 0 has a period of 2, the bit 1 of 4, etc.
And that the example generator in the C standard is one of
these. This can be important any time you're taking a modulo
which is a multiple of 2. If you use an expression like
rand()%6 + rand()%6 + 2 to throw dice, you will never roll an
even value.

Of course, such generators have a lot of other problems as well.
A bad RGN is a bad RGN. Using the high order bits will
typically mean that a naive user won't seen the pattern as
readily, but it doesn't mean the results are any better. The
answer is to use a better generator.

I think that most implementations have fixed this; at any rate,
the Solaris generator doesn't suffer from this problem, and I
seem to remember than the Glibc generator used with Linux
doesn't either.

Still, if you have an application where you must have a certain
quality generator, you have to provide your own; the C++
standard doesn't give any guarantee.

--
James Kanze GABI Software
Conseils en informatique orient?e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: kanze on
Ulrich Eckhardt wrote:
> Carlos Moreno wrote:
> [using modulo versus using integer division to cut down an RNG's range]
> > As pointed out already, the lower bits, for certain classes
> > of generators and certain implementations tend to have lower
> > "randomness" than the higher bits.
> [snipped further explanation]
> > So, in some cases modulo is worse, and in no case division
> > is worse -- why use modulo then?

> Okay, now that's a reasoning I accept!

> However, I'd like to point out something: you implemented a
> workaround but omitted to document what exactly it was you
> were working around.

He also omitted any explination or proof that his work-around
actually worked around the problem. In fact, he mistated the
problem:

Historically, certain implementations of rand() were not
particularly random. This was immediately and visually apparent
in the low order bits, which had a very short cycle. Although
the high order bits were not really any better, statistically
speaking, the rumor soon got around that they were, somehow,
"more random".

In fact, even the best linear congruent generators suffer from
one defect: a very low number will always be followed by a high
number, since if the number is low enough, the multiplication
will not wrap. Thus, for example, with the minimum quality
random generator described in Park and Miller ("Random Number
Generators: Good Ones Are Hard To Find", CACM, Oct. 1988), any
time the generator returns a value less than 44488, you are
guaranteed that the next value is larger. Since it does this
only once in 48271 times, the statistical bias will not be
immediately visible, but in fact, you can say that the high
order bits are "slightly" less random than the low order bits.

(My own implementation merges two good linear congruent
generators with slightly different periods, so this effect is
not visible. But somehow I doubt that the typical
implementation of rand() goes that far.)

In practice, I would expect that this is true of most modern
implementations of rand(). The weaknesses of the older
generators are obvious, and Park and Miller gave a simple and
effective solution.

--
James Kanze GABI Software
Conseils en informatique orient?e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Pete Becker on
Alberto Ganesh Barbati wrote:
> quickcur(a)yahoo.com ha scritto:
>
>>Hi, I am using the int version of rand() with MS Visual Studio. I would
>>like to generate random number in the range of [0, 99]. Right now I am
>>using
>>
>>rand() * 100 / (RAN_MAX + 1)
>>
>>I found that I got too much 0. 0 appears more than any other numbers.
>>How can I fix it?
>
>
> I've seen a lot of discussion about rand(), modulo and division in this
> thread... I'm not going to delve into that, but rather I suggest you
> throw away rand() as soon as possible and consider using Boost.Random
> (http://boost.org/libs/random/) as a replacement.

Replacing something simple with something complex might be a good
solution if the replacement solves a real problem. At this point there's
nothing that indicates that such a drastic change is needed. rand() is
part of the standard library, so it's already there. No extra work needed.

> It's a solid
> peer-reviewed random generator library, which means all gory details
> have probably been settled, so that you don't need to bother about them.
>
> Boost.Random has been considered as a reference implementation for the
> upcoming TR1, so I bet it's going to become quite popular (at least I
> hope so).

There is no reference implementation for TR1, nor was there any
consideration of such a thing.

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]