KISS4691, a potentially top-ranked RNG. [Fortran]

Prev: solutions book
Next: real kind declaration

From: Ilmari Karonen on 31 Jul 2010 10:25

["Followup-To:" header set to sci.math.]
On 2010-07-30, Ron Shepard <ron-shepard(a)NOSPAM.comcast.net> wrote:
> In article <i2tvn7$ldf$1(a)smaug.linux.pwf.cam.ac.uk>, nmm1(a)cam.ac.uk
> wrote:
>> In article <ron-shepard-969C9C.22130629072010(a)news60.forteinc.com>,
>> Ron Shepard <ron-shepard(a)NOSPAM.comcast.net> wrote:
>> >
>> >The situation that needs to be avoided is to run the code once with one
>> >seed, and then run it again with another seed that results in an overlap
>> >of the sequences of values for the two runs. In some applications this
>> >is unimportant, but in other applications it would be bad for two
>> >supposedly independent runs to have a long sequence of identical values.
>>
>> No, no, a thousand times, NO! That is NOT enough, though FAR too many
>> Web pages, published papers and books claim that it is. Disjointness
>> isn't even a poor relation of randomness.
>
[snip]
>
>> Parallel random number generation is not easy, and 99% of the stuff
>> published on it is somewhere between grossly misleading and complete
>> nonsense.
>
> I think in the parallel case, one would want to be able to generate
> a seed to produce values that are guaranteed not to overlap with any
> other node. Maybe something like
>
> call RANDOM_NEW_SEED( old_seed, n, new_seed, my_node )
>
> would be a sufficient interface. new_seed(:) would depend on
> my_node in such a way that the generated sequence would not overlap
> with that produced by any other possible value of my_node (again,
> assuming the cycle length is long enough to satisfy that request).

Let me try to demonstrate, using a simplified (and admittedly somewhat
artificial) counterexample, why lack of overlap is not a sufficient
condition for independence:

Assume we have a "random oracle" R: Z -> [0,1) which takes as its
input an integer and returns a uniformly distributed random real
number between 0 and 1, such that the same input always produces the
same output, but that the outputs for different inputs are completely
independent.

Given such an oracle, we can construct a perfect PRNG P: Z -> [0,1)^N
which takes as its input an integer k, and returns the sequence <R(k),
R(k+1), R(k+2), ...>. Obviously, the sequence generated by P would be
indistinguishable from random (since it is indeed, by definition,
random) and non-repeating.

Now, what if we wanted several independent streams of random numbers?
Obviously, we can't just pass different seed values to P, since we
know that the streams would eventually overlap. We could solve the
problem by modifying P, e.g. to make it return the sequence
<R(f(k,0)), R(f(k,1)), R(f(k,2)), ...> instead, where f is an
injective function from Z^2 to Z. But what if we wanted to do it
_without_ modifying P?

One _bad_ solution would be to define a new generator Q: Z -> [0,1)^N
as Q(k)_i = frac(P(0)_i + ak), where a is some irrational number and
frac: R -> [0,1) returns the part of a real number "after the decimal
point" (i.e. frac(x) = x - floor(x)).

Clearly, the sequences returned by Q for different seed values would
never overlap, and, individually, they would each still be perfectly
random. Yet, just as clearly, all those sequences would be related by
a very trivial linear relation that would be blatantly obvious if you,
say, plotted two of them against each other.

The _right_ solution, as I suggested above, would've been to redesign
the underlying generator so that the different streams would be not
just non-overlapping but actually statistically independent. The same
general advice holds for practical PRNGs too, not just for my
idealized example.

You can't just take any arbitrary PRNG, designed for generating a
_single_ stream of random-seeming numbers, and expect the output to
still look random if you get to compare several distinct output
streams generated from related seeds. It might turn out that it does
look so, if the designer of the PRNG was careful or just lucky. But
designing a PRNG to satisfy such a requirement is a strictly harder
problem than simply designing it to generate a single random-looking
stream.

--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.

From: Ron Shepard on 1 Aug 2010 21:19

In article
<e5f36412-ddf2-4ed5-9014-9d23e2c0da88(a)u4g2000prn.googlegroups.com>,
orz <cdhorz(a)gmail.com> wrote:

> If an RNG has a large number of possible states (say, 2^192) then it's
> unlikely for any two runs to EVER reach identical states by any means
> other than starting from the same seed. This RNG has a period vastly
> in excess of "large", so for practical purposes it just won't
> happen.

If you think about it for a second, you can see that this is not true
for a PRNG with the standard fortran interface. This interface allows
the seed values to be both input and output. The purpose (presumably,
although this is not required by the standard) for this is to allow the
seed to be extracted at the end of one run, and then used to initiate
the seed for the next run. This eliminates the possibility of the two
runs having overlapping sequences (provided the PRNG cycle is long
enough, of course).

But, with the ability to read the internal state of the PNRG, this would
allow the user to generate the exact same sequence twice (useful for
debugging, for example). Or the user could generate two sequences that
have arbitrary numbers of overlapping values (the sequences offset by 1,
offset by 2, offset by 100, or whatever). I can't think of any
practical reason for doing this. However, since it can be done with
100% certainty, it is not "unlikely" for it to occur, assuming the user
wants it to happen.

In a previous post, I suggested a new (intrinsic fortran) subroutine of
the form

call RANDOM_NEW_SEED( old_seed, n, new_seed )

One way to implement this subroutine would be for new_seed(:) to be the
internal state corresponding to the (n+1)-th element of the sequence
that results (or would result) from initialization with old_seed(:). Of
course, for this to be practical, especially for large values of n, this
would require the PRNG algorithm to allow the internal state for
arbitrary elements within a sequence to be determined in this way. That
would place practical restrictions on which PRNG algorithms can be used
with this interface, especially if additional features such as
independence of the separate sequences are also required.

$.02 -Ron Shepard

From: sturlamolden on 1 Aug 2010 21:29

I did a speed comparison on my laptop of KISS4691 against Mersenne
Twister 19937. KISS4691 produced about 110 million random numbers per
second, whereas MT19937 produced 118 million. If the compiler was
allowed to inline KISS4691, the performance increased to 148 millon
per second. The function call overhead is imporant, and without taking
it into consideration, MT19937 will actually be faster than KISS4691.

Also note that a SIMD oriented version of MT19937 is twice as fast as
the one used in my tests. There is also a version of Mersenne Twister
suitable for parallel processing and GPUs. So a parallel or SIMD
version of Mersenne Twister is likely to be a faster PRNG than
KISS4691.

Then for the question of numerical quality: Is KISS4691 better than
MT19937? I don't feel qualified to answer this. Marsaglia is the
world's foremost authority on random number generators, so I trust his
advice, but MT19937 is claimed to give impeckable quality except for
crypto.

For those who need details:

The speed test was basically running the PRNGs for five seconds or
more, querying Windows' performance counter on every ten million call,
and finally subtracting an estimate of the timing overhead. The C
compiler was Microsoft C/C++ version 15.00.30729.01 (the only
optimization flag I used was /O2, i.e. maximize for speed). The
processor is AMD Phenom N930 @ 2.0 Gz.

Sturla Molden

From: glen herrmannsfeldt on 1 Aug 2010 22:29

In comp.lang.fortran Ron Shepard <ron-shepard(a)nospam.comcast.net> wrote:
(snip)

> In a previous post, I suggested a new (intrinsic fortran) subroutine of
> the form

> call RANDOM_NEW_SEED( old_seed, n, new_seed )

> One way to implement this subroutine would be for new_seed(:) to be the
> internal state corresponding to the (n+1)-th element of the sequence
> that results (or would result) from initialization with old_seed(:). Of
> course, for this to be practical, especially for large values of n, this
> would require the PRNG algorithm to allow the internal state for
> arbitrary elements within a sequence to be determined in this way. That
> would place practical restrictions on which PRNG algorithms can be used
> with this interface, especially if additional features such as
> independence of the separate sequences are also required.

The standard mostly does not specify what is, or is not, practical.

DO I=1,INT(1000000,SELECTED_INT_KIND(20))**3
ENDDO

For many generators, one should not specify too large a value
for the above RANDOM_NEW_SEED routine.

-- glen

From: Noob on 2 Aug 2010 07:06

sturlamolden wrote:

> [...] MT19937 is claimed to give impeckable quality [...]

It cannot be eaten in small bites? ;-)

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Prev: solutions book
Next: real kind declaration