From: Xho Jingleheimerschmidt on
Peng Yu wrote:
>
> But I feel sorry that perl doesn't provide such a function out of the
> box.

There are many ways to implement this, and which is best depends on how
large of a set you are sampling and how densely you are sampling it, as
well as perhaps other things (do you know how many you need in advance,
or will you know when you get there? Are you happy with standard rand()
or do you want something better, etc.). I'm entirely unsurprised Perl
doesn't provide this out of the box, as it is so context dependent, plus
not all that common. It is trivial to implement for yourself, and throw
your implementation into a toolkit module you use for all your code.

Xho
From: Ted Zlatanov on
On Tue, 01 Jun 2010 20:54:29 -0700 Xho Jingleheimerschmidt <xhoster(a)gmail.com> wrote:

XJ> Ted Zlatanov wrote:
>> On Tue, 01 Jun 2010 04:50:36 -0400 "Uri Guttman"
>> <uri(a)StemSystems.com> wrote:
>>
UG> my %seen ;
UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
UG> $seen{$x} = 1; print $x }
>>
>> This will grow pretty quickly with a hash. Bit::Vector already has
>> Bit_On($index) and bit_test($index) so memory usage and probably
>> performance will be a bit (heh) better.

XJ> I'd be very surprised if Bit::Vector had faster performance, at least
XJ> until the other method started swapping.

It's C internally so performance is pretty good. But you're probably
right anyhow, I wasn't thinking.

On Tue, 01 Jun 2010 15:37:34 -0400 "Uri Guttman" <uri(a)StemSystems.com> wrote:

UG> he said he wanted 1k random numbers out of a large range so a hash would
UG> be fine for that.

You're right, I wasn't paying enough attention.

Ted
From: David Combs on
I see it differently.

He's got every right to get pissed off from time to time.

How many hours a week does he sit in front of the terminal
answering questions for us (when he could be making money,
presumably, a nice sacrifice for us all!) and after a while
of being eaten by "I have to get back to WORK sometime --
so I can pay some BILLS" or whatever else might be eating
him now and then, that almost it doesn't matter WHAT the
next question is, the least perceived-foolishness/stupidity
in it will elicit a ROAR and accompanying dragon-mouth FLAME
BLAST.

That done, he bypasses a few questions he might have liked
answering, and he's back to his nice helpful self, again
answering hairy questions.

So thank you, Uri, for being there!

David

From: David Combs on
In article <87ocfuk0ht.fsf(a)quad.sysarch.com>,
Uri Guttman <uri(a)StemSystems.com> wrote:
>>>>>> "TZ" == Ted Zlatanov <tzz(a)lifelogs.com> writes:
>
> TZ> On Tue, 01 Jun 2010 04:50:36 -0400 "Uri Guttman" <uri(a)StemSystems.com> wrote:
> UG> my %seen ;
> UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
> UG> $seen{$x} = 1; print $x }
>
> TZ> This will grow pretty quickly with a hash. Bit::Vector already has
> TZ> Bit_On($index) and bit_test($index) so memory usage and probably
> TZ> performance will be a bit (heh) better.
>
>he said he wanted 1k random numbers out of a large range so a hash would
>be fine for that.
>
>uri
>
>--
>Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
>----- Perl Code Review , Architecture, Development, Training, Support ------
>--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

WERE this to be a cpan module, maybe a mix would be nice. Bit vector for
the first N of them, N of course requiring N/8 bytes preallocated probably,
and for > N, the hash scheme?

David


From: David Combs on
In article <87y6exwo6o.fsf(a)lifelogs.com>,
Ted Zlatanov <tzz(a)lifelogs.com> wrote:
>On Tue, 01 Jun 2010 20:54:29 -0700 Xho Jingleheimerschmidt <xhoster(a)gmail.com> wrote:
>
>XJ> Ted Zlatanov wrote:
>>> On Tue, 01 Jun 2010 04:50:36 -0400 "Uri Guttman"
>>> <uri(a)StemSystems.com> wrote:
>>>
>UG> my %seen ;
>UG> while( 1 ) { $x = int rand( 100_000_000 ) ; $seen{$x} and next ;
>UG> $seen{$x} = 1; print $x }
>>>
>>> This will grow pretty quickly with a hash. Bit::Vector already has
>>> Bit_On($index) and bit_test($index) so memory usage and probably
>>> performance will be a bit (heh) better.
>
>XJ> I'd be very surprised if Bit::Vector had faster performance, at least
>XJ> until the other method started swapping.
>
>It's C internally so performance is pretty good. But you're probably
>right anyhow, I wasn't thinking.
>
>On Tue, 01 Jun 2010 15:37:34 -0400 "Uri Guttman" <uri(a)StemSystems.com> wrote:
>
>UG> he said he wanted 1k random numbers out of a large range so a hash would
>UG> be fine for that.
>
>You're right, I wasn't paying enough attention.
>
>Ted

What, you're saying a div and a mod (let's assume 64 bits, if integet) to
find the desired bit, versus dealing each time with a hash bucket to choose
an ensuing chain to hunt down through?

So you make the bucket-array longer, so the chains are short -- get it
long enough so only very, very few chains are longer than one (most
being zero) in length, and you're almost emulating the bit-vector
scheme, but taking up FAR FAR more space.

?

David