From: Allan W on
> Diego Martins wrote:
> > double zero_one_rand()
> > {
> >     double r = static_cast<double>(rand()) / rand();
> >     return r <= 1 ? r : 1/r;
> > }

kanze wrote:
> Would this result in a linear distribution? (I'm too lazy to do
> the analysis, but a priori, I'm very suspicious of anything that
> involves division by rand().)

I did the analysis, expecting it to be biased.

I didn't literally run the function above to do the analysis.
Instead, I used:

#include <iostream>

int main() {
    for (double n = 0.0; n < 1.0; n += 0.01)
        for (double d = 0.0; d < 1.0; d += 0.01)
            if (d > 0.0) std::cout << (n/d) << std::endl;
            else std::cout << "Err\n";
}

Then I sorted the output, took the inverse of the values greater
than 1.0, and sorted again.
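In code, the fold-and-count amounts to something like this (my
sketch; the ten-bucket histogram is a choice I made, not part of
the procedure above):

#include <iostream>

int main() {
    int count[10] = { 0 };
    for (double n = 0.0; n < 1.0; n += 0.01)
        for (double d = 0.01; d < 1.0; d += 0.01) {
            double r = n / d;
            if (r > 1.0) r = 1.0 / r;   // fold values above 1.0 back
            if (r < 1.0)                // skip the exact-1.0 ties
                ++count[static_cast<int>(10.0 * r)];
        }
    for (int i = 0; i < 10; ++i)
        std::cout << i / 10.0 << ": " << count[i] << '\n';
}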

I was surprised. Other than the "Err" outputs, the results were
pretty evenly distributed. The line graph of the results was
practically a straight line. There was a *slight* bias towards
either zero or one.

> It also fails to meet the requirements if rand() returns the
> same value twice in a row.

Yes, we're not supposed to be able to get the value 1.0.
And the obvious "fix" of
    return r <= 1 ? r : 1/r;
wouldn't help if the value was exactly 1.0.

> Not to mention a small problem if
> the second call to rand() returns 0. (Did I mention that I'm
> suspicious of this sort of division?)

That's true too.

So the revised version would look like:

double rand_double() {
    double n, d;
    do {
        n = rand();
        d = rand();
    } while (n >= d);
    return n / d;
}

This one also has a slight bias towards zero. (Zero seems to
be the result about twice as often as any other number.) Other
than that, the results are very linear.
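A quick harness to see that (a sketch; it assumes the
rand_double above, and the ten buckets are my choice):

#include <cstdlib>
#include <iostream>

int main() {
    int count[10] = { 0 };
    for (int i = 0; i < 100000; ++i)
        ++count[static_cast<int>(10.0 * rand_double())];
    for (int i = 0; i < 10; ++i)
        std::cout << i / 10.0 << ": " << count[i] << '\n';
}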



From: kanze on
Allan W wrote:
> > Diego Martins wrote:
> > > double zero_one_rand()
> > > {
> > >     double r = static_cast<double>(rand()) / rand();
> > >     return r <= 1 ? r : 1/r;
> > > }

> kanze wrote:
> > Would this result in a linear distribution? (I'm too lazy
> > to do the analysis, but a priori, I'm very suspicious of
> > anything that involves division by rand().)

> I did the analysis, expecting it to be biased.

That's more or less my expectations as well.

> I didn't literally run the function above, to do the analysis.
> Instead, I used:

> int main() {
>     for (double n = 0.0; n < 1.0; n += 0.01)
>         for (double d = 0.0; d < 1.0; d += 0.01)
>             if (d > 0.0) std::cout << (n/d) << std::endl;
>             else std::cout << "Err\n";
> }

> Then I sorted the output, took the inverse of the values
> greater than 1.0, and sorted again.

That's not analysis, that's simulation. Still, it should reveal
the worst errors.

Of course, your program is not exactly what he was doing. His
"n" and "d" were integer values, converted to double. I tried
the following:

#include <algorithm>
#include <iostream>

int
main()
{
    int count[ 11 ] ;
    std::fill_n( count, 11, 0 ) ;
    for ( int i = 0 ; i < 10 ; ++ i ) {
        double n = (double)i ;
        for ( int j = 1 ; j < 10 ; ++ j ) {
            double d = (double)j ;
            double r = n / d ;
            if ( r >= 1.0 ) {
                r = 1.0 / r ;
            }
            ++ count[ (int)(10.0 * r) ] ;
        }
    }
    for ( int i = 0 ; i < 11 ; ++ i ) {
        double d = (double)i ;
        std::cout << d << ": " << count[ i ] << std::endl ;
    }
    return 0 ;
}

I think this pretty much simulates what he was doing, with
RAND_MAX == 10. The output was:

0: 9
1: 8
2: 10
3: 8
4: 6
5: 12
6: 10
7: 8
8: 10
9: 0
10: 9

Given that this is basically a test of the full period, we
should have exactly the same count in buckets 0-9, and 0 in
bucket 10. In fact, we are very far from it.

Curiously, a similar test using the actual random number
generator on my system (

    double n = (double)rand() ;
    double d = (double)rand() ;
    double r = n / d ;
    if ( r >= 1.0 ) {
        r = 1.0 / r ;
    }
    ++ count[ (int)(10.0 * r) ] ;
repeated 100000 times) shows apparently good results: all values
between 9800 and 10200, with a lot of variation as to which
counts are above 10000, and which are below it, according to the
seed. At least for the counts for 0 through 9.

Obviously not a proof, either, nor analysis, but it does suggest
that my suspicions might be unfounded.
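A self-contained version of that test would look something like
this (reconstructed as a sketch; the seed and the d == 0 guard
are arbitrary choices):

#include <cstdlib>
#include <iostream>

int
main()
{
    int count[ 11 ] = { 0 } ;
    std::srand( 1 ) ;               // results vary with the seed
    for ( int k = 0 ; k < 100000 ; ++ k ) {
        double n = (double)std::rand() ;
        double d = (double)std::rand() ;
        if ( d == 0.0 ) {           // skip the divide-by-zero case
            continue ;
        }
        double r = n / d ;
        if ( r >= 1.0 ) {
            r = 1.0 / r ;
        }
        ++ count[ (int)(10.0 * r) ] ;
    }
    for ( int i = 0 ; i < 11 ; ++ i ) {
        std::cout << i << ": " << count[ i ] << std::endl ;
    }
    return 0 ;
}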

> I was surprised. Other than the "Err" outputs, the results
> were pretty evenly distributed. The line graph of the results
> was practically a straight line. There was a *slight* bias
> towards either zero or one.

I'm not sure what you were measuring here. The line graph of
what results?

> > It also fails to meet the requirements if rand() returns the
> > same value twice in a row.

> Yes, we're not supposed to be able to get the value 1.0.
> And the obvious "fix" of
> return r <= 1 ? r : 1/r;
> wouldn't help if the value was exactly 1.0.

Which it will be if the random generator returns the same value
twice.

My tests with the actual random number generator on my system
show that (int)(10.0 * r) is occasionally >= 10; for 100000
trials, I get between 1 and 4 in count[10] in my test above.

> > Not to mention a small problem if the second call to rand()
> > returns 0. (Did I mention that I'm suspicious of this sort
> > of division?)

> That's true too.

Given that a typical linear congruential generator will return a
range of 1...N, and that rand() has probably subtracted one from
this to get its required range (which starts at 0), adding one
to the results of rand() should be a good solution.

Both of these problems are easily coded around. Does the
work-around bias the results? To tell the truth, I don't know;
I suspect that it might, but probably not to a really
measurable degree.
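Something along these lines is what I have in mind (a sketch,
untested; it still returns exactly 1.0 when the two calls return
the same value, so that case still needs handling as discussed
above, and it assumes rand() + 1 doesn't overflow):

double
zero_one_rand()
{
    double n = (double)( rand() + 1 ) ;  // shift both draws by one,
    double d = (double)( rand() + 1 ) ;  // so the denominator is never 0
    double r = n / d ;
    return r < 1.0 ? r : 1.0 / r ;
}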

> So the revised version would look like:

> double rand_double() {
>     double n, d;
>     do {
>         n = rand();
>         d = rand();
>     } while (n >= d);
>     return n / d;
> }

> This one also has a slight bias towards zero. (Zero seems to
> be the result about twice as often as any other number.) Other
> than that, the results are very linear.

This one throws out about half the values.

But what I want isn't experimental data. I'm more interested in
actual analysis. I think my first test, above, comes the
closest to simulating a full cycle, and it makes me very
suspicious. Unless I've done something wrong, or misunderstood
something, the counts there should be exactly equal, and they
aren't. On the other hand, the pattern they show suggests that
the problem won't be immediately recognizable for larger values
of RAND_MAX.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



From: Allan W on

kanze wrote:
> Allan W wrote:
> > > Diego Martins wrote:
> > > > double zero_one_rand()
> > > > {
> > > >     double r = static_cast<double>(rand()) / rand();
> > > >     return r <= 1 ? r : 1/r;
> > > > }
>
> > kanze wrote:
> > > Would this result in a linear distribution? (I'm too lazy
> > > to do the analysis, but a priori, I'm very suspicious of
> > > anything that involves division by rand().)
>
> > I did the analysis, expecting it to be biased.
>
> That's more or less my expectations as well.
>
> > I didn't literally run the function above, to do the analysis.
> > Instead, I used:
>
> > int main() {
> >     for (double n = 0.0; n < 1.0; n += 0.01)
> >         for (double d = 0.0; d < 1.0; d += 0.01)
> >             if (d > 0.0) std::cout << (n/d) << std::endl;
> >             else std::cout << "Err\n";
> > }
>
> > Then I sorted the output, took the inverse of the values
> > greater than 1.0, and sorted again.
>
> That's not analysis, that's simulation. Still, it should reveal
> the worst errors.

Hmm...

> Of course, your program is not exactly what he was doing. His
> "n" and "d" were integer values, converted to double.

Good point. I missed that. I doubt he meant to do that, though.
[Snip]

> > I was surprised. Other than the "Err" outputs, the results
> > were pretty evenly distributed. The line graph of the results
> > was practically a straight line. There was a *slight* bias
> > towards either zero or one.
>
> I'm not sure what you were measuring here. The line graph of
> what results?

I graphed the return values versus the number of times each
value was seen. The graph was very nearly linear, meaning
(for instance) that the value 0.35 was seen almost exactly
the same number of times as the value 0.36.

> My tests with the actual random number generator on my system
> show that (int)(10.0 * r) is occasionally >= 10; for 100000
> trials, I get between 1 and 4 in count[10] in my test above.

I'm surprised! That's not supposed to EVER happen, is it?

> > double rand_double() {
> >     double n, d;
> >     do {
> >         n = rand();
> >         d = rand();
> >     } while (n >= d);
> >     return n / d;
> > }
>
> > This one also has a slight bias towards zero. (Zero seems to
> > be the result about twice as often as any other number.) Other
> > than that, the results are very linear.
>
> This one throws out about half the values.

True. But I was trying to avoid the n==d case.
I suppose we could change the while() loop to
    while (n == d)
and change the return to
    return (n > d) ? (d/n) : (n/d);
but I haven't analyzed ... er, simulated that. :-)
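Spelled out, that sketch would be (still not simulated):

double rand_double() {
    double n, d;
    do {
        n = rand();
        d = rand();
    } while (n == d);               // now only ties are thrown away
    return (n > d) ? (d/n) : (n/d); // always strictly less than 1.0
}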

> But what I want isn't experimental data. I'm more interested in
> actual analysis. I think my first test, above, comes the
> closest to simulating a full cycle, and it makes me very
> suspicious.

But I think I did a full cycle too, and got pretty different results.
I'll try to take a closer look when I have more time...



From: sstump on
kanze wrote:
> But what I want isn't experimental data. I'm more interested in
> actual analysis. I think my first test, above, comes the
> closest to simulating a full cycle, and it makes me very
> suspicious. Unless I've done something wrong, or misunderstood
> something, the counts there should be exactly equal, and they
> aren't. On the other hand, the pattern they show suggests that
> the problem won't be immediately recognizable for larger values
> of RAND_MAX.

Interesting questions...
I believe your suspicions are well placed. To further confuse the
issue, I measured the mean of your variable 'r' in your 'full
cycle' program, and it was 0.5, which is exactly where it should
be for a uniform random variable in [0,1). Looks good, right?
Not so. Look at
http://mathworld.wolfram.com/RatioDistribution.html

The last equation on that page shows the CDF for the ratio of two
uniform random variables U = X/Y. It doesn't apply exactly as
written, since we are inverting when U > 1, but it tells me that
the raw ratio is not uniform. Curiously, if we use the CDF given
to locate the median of the folded values (half of them below 1/2,
half above), we get 1/2 (independent of the parameters of the
uniform distributions of X and Y, as long as they are the same),
consistent with the mean your program measured.
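A quick numerical check of that (a Monte Carlo sketch, not
analysis; the mapping of rand() into (0,1) is my own choice):

#include <cstdlib>
#include <iostream>

int main() {
    // fold U = X/Y back into [0,1] and count the mass below 1/2;
    // for a uniform result we expect about half of it there
    std::srand(12345);              // arbitrary seed
    int const trials = 1000000;
    int below = 0;
    for (int i = 0; i < trials; ++i) {
        double x = (std::rand() + 1.0) / (RAND_MAX + 2.0);
        double y = (std::rand() + 1.0) / (RAND_MAX + 2.0);
        double u = x / y;
        if (u > 1.0) u = 1.0 / u;
        if (u < 0.5) ++below;
    }
    std::cout << double(below) / trials << '\n';   // prints ~0.5
}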




From: Keith H Duggar on
kanze wrote:
> Obviously not a proof, either, nor analysis, but it does suggest
> that my suspicions might be unfounded.

If u1 and u2 are uniform random variables between 0 and 1
and r=u1/u2 then r is a random variable with density

p(r) = 1/2 : 0 <= r < 1
p(r) = 1/(2*r^2) : 1 <= r

and applying the transform

r' = r : 0 <= r < 1
r' = 1/r : 1 <= r

yields r' a uniform deviate with density

p(r') = 1 : 0 <= r' <= 1

(the r < 1 branch contributes its 1/2 directly, and the folded
r >= 1 branch contributes the other 1/2)

since for r >= 1

p(r') = p(r)/|dr'/dr|
p(r') = 1/(2*r^2)/|-1/r^2|
p(r') = (r^2)/(2*r^2)
p(r') = 1/2
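(For completeness, the density quoted at the top is the standard
change-of-variables result for a ratio: with f(x) = f(y) = 1 on
(0,1),

p(r) = Integral{0..min(1,1/r)} y dy

which is 1/2 for 0 <= r < 1 and 1/(2*r^2) for r >= 1.)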

So using the ratio-of-uniform distribution and taking the
inverse of (or ignoring) values greater than 1 does indeed give
you back a uniform distribution. However, consider that we
must ALREADY have had the ability to generate uniform
deviates in order to use this method. So what have we gained
by performing these transformations? Perhaps nothing, or
perhaps we have lowered the quality of our uniform deviates
at the expense of an extra division (since if we are
dividing by a constant like RAND_MAX we can precompute the
reciprocal, which we cannot do with the random variable u2).
Anyhow, there are papers written about using ratio-of-uniform
deviates with LCGs, so it's worth looking them up if one is
considering using this method. Of course, there are a number
of caveats which can add non-uniformity, since the derivation
I gave applies to real numbers, not to the rational uniform
deviates we actually have.
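For comparison, the direct method alluded to above (divide by a
constant whose reciprocal can be precomputed) is just:

#include <cstdlib>

double unit_rand() {
    // one call, one multiply; RAND_MAX + 1.0 keeps the result in [0,1)
    static const double scale = 1.0 / (RAND_MAX + 1.0);
    return std::rand() * scale;
}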

