unique name list [General]

Prev: Fill colour
Next: Auto Filter Sorting Issue

From: Joe User on 12 Feb 2010 05:27

"CJ" wrote:
> Excel 2003. I have a list of 40 names. I would
> like to randomly select names without repeat
> until all 40 names have been selected.

One way....

Suppose the 40 names are in A1:A40. Put =RAND() into B1:B40. Then put the
following formula into C1 and copy down through C40:

=INDEX($A$1:$A$40,RANK(B1,$B$1:$B$40))

If you do not want B1:B40 changing every time a cell is edited, see my
response in your thread "volatile v non-volatile".

Alternatively, if you are doing this one time or infrequently, set up B1:B40
as above. If you use the volatile RAND() function, you might want to copy
B1:B40 and paste-special-value over B1:B40. Then select A1:B40 and use Data
> Sort to sort column B.

(The paste-special-value is not really necessary. Sort will work just fine.
But the volatile RAND() expression will cause B1:B40 to be recalculated when
Sort writes back the sorted values. So B1:B40 will no longer reflect the
order in A1:A40. No harm done. But it might be mystifying to the unwary
user.)

From: מיכאל (מיקי) אבידן on 12 Feb 2010 05:52

Although the chance of repetition two identical RAND() results is like
winning the Lotto - can one assure it won't happen ?
Micky

"Joe User" wrote:

> "CJ" wrote:
> > Excel 2003. I have a list of 40 names. I would
> > like to randomly select names without repeat
> > until all 40 names have been selected.
>
> One way....
>
> Suppose the 40 names are in A1:A40. Put =RAND() into B1:B40. Then put the
> following formula into C1 and copy down through C40:
>
> =INDEX($A$1:$A$40,RANK(B1,$B$1:$B$40))
>
> If you do not want B1:B40 changing every time a cell is edited, see my
> response in your thread "volatile v non-volatile".
>
> Alternatively, if you are doing this one time or infrequently, set up B1:B40
> as above. If you use the volatile RAND() function, you might want to copy
> B1:B40 and paste-special-value over B1:B40. Then select A1:B40 and use Data
> > Sort to sort column B.
>
> (The paste-special-value is not really necessary. Sort will work just fine.
> But the volatile RAND() expression will cause B1:B40 to be recalculated when
> Sort writes back the sorted values. So B1:B40 will no longer reflect the
> order in A1:A40. No harm done. But it might be mystifying to the unwary
> user.)

From: מיכאל (מיקי) אבידן on 12 Feb 2010 06:10

You may also consider installing the MOREFUNC add-in and use its: MRAND
Function - as shown in the picture.
http://img17.imageshack.us/img17/8778/nonameuy.png
*** MRAND will assure NO repetitions ***
http://www.download.com/Morefunc/3000-2077_4-10423159.html
Micky

"CJ" wrote:

> Excel 2003. I have a list of 40 names. I would like to randomly select names
> without repeat until all 40 names have been selected. Any ideas please.

From: Joe User on 12 Feb 2010 08:51

"מיכאל (מיקי) אבידן" <micky-a*at*tapuz.co.il> wrote:
> Although the chance of repetition two identical RAND() results
> is like winning the Lotto - can one assure it won't happen ?

Not exactly.

KB 828795 claims to describe the algorithm and constants used by RAND()
(before XL2010). The claim is: "more than 10^13 numbers will be generated
before the repetition". I presume that "more than 10^13" is exactly
27,814,431,486,576 (about 2.78E+13), which is 30268*30306*30322.

So I guess one's "assurance" depends on your confidence in the correctness
of KBs in general, or at least KB 828795 in particular.

From my experience with other KBs, I must say that my confidence is less
than 100%. Many of the KBs that I have read about the numerical properties
of Excel algorithms contain technical errors, often material errors.

I can say with impunity that __if__ the algorithm and constants in KB 828795
are correct, each modulo expression for IX, IY and IZ runs through its
respective maximum sequence, not including zero, as long as Excel ensures
that IX, IY and IZ are not zero when the algorithm is seeded. Thus, the
algorithm will indeed produce about 2.78E+13 combinations of IX, IY and IZ.

I cannot say with impunity that when those factors are combined to produce
the pseudo-random number -- presumably IX/30269.0 + IY/30307.0 + IZ/30323.0
modulo 1 -- that produces 2.78E+13 unique floating point values, especially
taking the limits of IEEE 64-bit floating point representation and perhaps
Intel 80-bit floating point computation into account.

I believe that can be proved only by producing and sorting all 2.78E+13
pseudo-random numbers, then comparing them pairwise. I think that would be
computationally prohibitive on a single computer with an Intel CPU -- at
least, mine.

I did attempt to vet the algorithm and constants described in KB 828795 --
that is, to prove or disprove that they are correct. I was not successful.
My method was to generate a pseudo-random number with RAND() -- actually
several -- and try to reverse-engineer the factors IX, IY and IZ assuming
the constants 30269, 30307 and 30323 and the Wichman-Hill algorithm
described in KB 828795. I came very close; close enough to think that KB
828795 might be correct, but different enough to suspect that it might not
be. The difference in binary representations was large enough that I was
unable to explain it based on the numerical properties of alternative
floating point methods.

As an aside, I suspect that the algorithm described in a wiki article --
http://en.wikibooks.org/wiki/Statistics/Numerical_Methods/Numerics_in_Excel
-- is incorrect __if__ KB 828795 is correct about generating "more than
10^13" unique pseudo-random numbers. (Admittedly, that's a big "if".) The
wiki algorithm will not generate all 2.78E+13 combinations of IX, IY and IZ.
Each modulo expression will produce zero; and once that happens, the modulo
expression is stuck at zero. Of course, if/when that happens, an
implementation could self-correct by reseeding the PRNG. But depending on
the seed, the cycle can be very short. So I think the wiki algorithm is
unlikely. Every other description of the Wichman-Hill algorithm is similar
to KB 828795, at least functionally, if not in syntax.

(I hope my comments do not spark a diatribe about the weaknesses of the
Wichman-Hill(1982) algorithm. The weakness are well-known. That is why
XL2010 abandoned it.)

----- original message -----

"מיכאל (מיקי) אבידן" <micky-a*at*tapuz.co.il> wrote in message
news:33A3B65A-084E-44E0-BC53-4A6F3A91F546(a)microsoft.com...
> Although the chance of repetition two identical RAND() results is like
> winning the Lotto - can one assure it won't happen ?
> Micky
>
> "Joe User" wrote:
>
>> "CJ" wrote:
>> > Excel 2003. I have a list of 40 names. I would
>> > like to randomly select names without repeat
>> > until all 40 names have been selected.
>>
>> One way....
>>
>> Suppose the 40 names are in A1:A40. Put =RAND() into B1:B40. Then put
>> the
>> following formula into C1 and copy down through C40:
>>
>> =INDEX($A$1:$A$40,RANK(B1,$B$1:$B$40))
>>
>> If you do not want B1:B40 changing every time a cell is edited, see my
>> response in your thread "volatile v non-volatile".
>>
>> Alternatively, if you are doing this one time or infrequently, set up
>> B1:B40
>> as above. If you use the volatile RAND() function, you might want to
>> copy
>> B1:B40 and paste-special-value over B1:B40. Then select A1:B40 and use
>> Data
>> > Sort to sort column B.
>>
>> (The paste-special-value is not really necessary. Sort will work just
>> fine.
>> But the volatile RAND() expression will cause B1:B40 to be recalculated
>> when
>> Sort writes back the sorted values. So B1:B40 will no longer reflect the
>> order in A1:A40. No harm done. But it might be mystifying to the unwary
>> user.)

From: Joe User on 12 Feb 2010 08:59

I wrote:
> Every other description of the Wichman-Hill algorithm is similar to KB
> 828795, at least functionally, if not in syntax.

I meant to write "every other description of the Wichman-Hill(1982)
algorithm". I added 1982 to the last reference, but not to this one.

----- original message -----

"Joe User" <joeu2004> wrote in message
news:eUgRvp%23qKHA.3536(a)TK2MSFTNGP06.phx.gbl...
> "מיכאל (מיקי) אבידן" <micky-a*at*tapuz.co.il> wrote:
>> Although the chance of repetition two identical RAND() results
>> is like winning the Lotto - can one assure it won't happen ?
>
> Not exactly.
>
> KB 828795 claims to describe the algorithm and constants used by RAND()
> (before XL2010). The claim is: "more than 10^13 numbers will be
> generated before the repetition". I presume that "more than 10^13" is
> exactly 27,814,431,486,576 (about 2.78E+13), which is 30268*30306*30322.
>
> So I guess one's "assurance" depends on your confidence in the correctness
> of KBs in general, or at least KB 828795 in particular.
>
> From my experience with other KBs, I must say that my confidence is less
> than 100%. Many of the KBs that I have read about the numerical
> properties of Excel algorithms contain technical errors, often material
> errors.
>
> I can say with impunity that __if__ the algorithm and constants in KB
> 828795 are correct, each modulo expression for IX, IY and IZ runs through
> its respective maximum sequence, not including zero, as long as Excel
> ensures that IX, IY and IZ are not zero when the algorithm is seeded.
> Thus, the algorithm will indeed produce about 2.78E+13 combinations of IX,
> IY and IZ.
>
> I cannot say with impunity that when those factors are combined to produce
> the pseudo-random number -- presumably IX/30269.0 + IY/30307.0 +
> IZ/30323.0 modulo 1 -- that produces 2.78E+13 unique floating point
> values, especially taking the limits of IEEE 64-bit floating point
> representation and perhaps Intel 80-bit floating point computation into
> account.
>
> I believe that can be proved only by producing and sorting all 2.78E+13
> pseudo-random numbers, then comparing them pairwise. I think that would
> be computationally prohibitive on a single computer with an Intel CPU --
> at least, mine.
>
> I did attempt to vet the algorithm and constants described in KB 828795 --
> that is, to prove or disprove that they are correct. I was not
> successful. My method was to generate a pseudo-random number with
> RAND() -- actually several -- and try to reverse-engineer the factors IX,
> IY and IZ assuming the constants 30269, 30307 and 30323 and the
> Wichman-Hill algorithm described in KB 828795. I came very close; close
> enough to think that KB 828795 might be correct, but different enough to
> suspect that it might not be. The difference in binary representations
> was large enough that I was unable to explain it based on the numerical
> properties of alternative floating point methods.
>
> As an aside, I suspect that the algorithm described in a wiki article --
> http://en.wikibooks.org/wiki/Statistics/Numerical_Methods/Numerics_in_Excel
> -- is incorrect __if__ KB 828795 is correct about generating "more than
> 10^13" unique pseudo-random numbers. (Admittedly, that's a big "if".)
> The wiki algorithm will not generate all 2.78E+13 combinations of IX, IY
> and IZ. Each modulo expression will produce zero; and once that happens,
> the modulo expression is stuck at zero. Of course, if/when that happens,
> an implementation could self-correct by reseeding the PRNG. But depending
> on the seed, the cycle can be very short. So I think the wiki algorithm
> is unlikely. Every other description of the Wichman-Hill algorithm is
> similar to KB 828795, at least functionally, if not in syntax.
>
> (I hope my comments do not spark a diatribe about the weaknesses of the
> Wichman-Hill(1982) algorithm. The weakness are well-known. That is why
> XL2010 abandoned it.)
>
>
> ----- original message -----
>
> "מיכאל (מיקי) אבידן" <micky-a*at*tapuz.co.il> wrote in message
> news:33A3B65A-084E-44E0-BC53-4A6F3A91F546(a)microsoft.com...
>> Although the chance of repetition two identical RAND() results is like
>> winning the Lotto - can one assure it won't happen ?
>> Micky
>>
>> "Joe User" wrote:
>>
>>> "CJ" wrote:
>>> > Excel 2003. I have a list of 40 names. I would
>>> > like to randomly select names without repeat
>>> > until all 40 names have been selected.
>>>
>>> One way....
>>>
>>> Suppose the 40 names are in A1:A40. Put =RAND() into B1:B40. Then put
>>> the
>>> following formula into C1 and copy down through C40:
>>>
>>> =INDEX($A$1:$A$40,RANK(B1,$B$1:$B$40))
>>>
>>> If you do not want B1:B40 changing every time a cell is edited, see my
>>> response in your thread "volatile v non-volatile".
>>>
>>> Alternatively, if you are doing this one time or infrequently, set up
>>> B1:B40
>>> as above. If you use the volatile RAND() function, you might want to
>>> copy
>>> B1:B40 and paste-special-value over B1:B40. Then select A1:B40 and use
>>> Data
>>> > Sort to sort column B.
>>>
>>> (The paste-special-value is not really necessary. Sort will work just
>>> fine.
>>> But the volatile RAND() expression will cause B1:B40 to be recalculated
>>> when
>>> Sort writes back the sorted values. So B1:B40 will no longer reflect
>>> the
>>> order in A1:A40. No harm done. But it might be mystifying to the
>>> unwary
>>> user.)
>

First | Prev | Next | Last
Pages: 1 2 3
Prev: Fill colour
Next: Auto Filter Sorting Issue