From: Peter J. Holzer
On 2010-06-26 18:05, David Combs <dkcombs(a)panix.com> wrote:
> In article <87y6exwo6o.fsf(a)lifelogs.com>,
> Ted Zlatanov <tzz(a)lifelogs.com> wrote:
>>On Tue, 01 Jun 2010 15:37:34 -0400 "Uri Guttman" <uri(a)StemSystems.com> wrote:
>>
>>UG> he said he wanted 1k random numbers out of a large range so a hash would
>>UG> be fine for that.
>>
>>You're right, I wasn't paying enough attention.
>
> What, you're saying a div and a mod (let's assume 64 bits, if integers) to
> find the desired bit, versus dealing each time with a hash bucket to choose
> an ensuing chain to hunt down through?
>
> So you make the bucket-array longer, so the chains are short -- get it
> long enough so only very, very few chains are longer than one (most
> being zero) in length, and you're almost emulating the bit-vector
> scheme, but taking up FAR FAR more space.

No, for the use case given (1k random numbers out of a large range) a
hash takes *less* space.

The bit vector takes (max-min)/8 bytes. So for 0 .. 100_000_000 it takes
approximately 12.5 MB. 1000 hash entries take (depending on pointer
size, malloc implementation, etc.) something like 60 to 150 kB. So
that's about 1/100th of the space. I'd still think a bit vector should
be faster.
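
For concreteness, a rough sketch of that layout using Perl's built-in
vec() rather than Bit::Vector (the range and sample size are the figures
above; everything else is purely illustrative):

    # Sketch only: draw 1000 distinct numbers from 0 .. 100_000_000 - 1
    # by marking each pick in a one-bit-per-value string.
    use strict;
    use warnings;

    my $max  = 100_000_000;
    my $bits = "\0" x ($max / 8);      # ~12.5 MB, one bit per possible value
    my @picks;
    while (@picks < 1_000) {
        my $n = int rand $max;
        next if vec($bits, $n, 1);     # already drawn, try again
        vec($bits, $n, 1) = 1;
        push @picks, $n;
    }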

hp
From: Tad McClellan
David Combs <dkcombs(a)panix.com> wrote:

> I see it


You see what "it"?


> differently.


Differently from what?


> He's got every right to get pissed off from time to time.


Who "he"?


Please follow the standard practice of quoting some context in followups.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
From: Ted Zlatanov
On Sat, 26 Jun 2010 21:02:19 +0200 "Peter J. Holzer" <hjp-usenet2(a)hjp.at> wrote:

PJH> On 2010-06-26 18:05, David Combs <dkcombs(a)panix.com> wrote:
>> In article <87y6exwo6o.fsf(a)lifelogs.com>,
>> Ted Zlatanov <tzz(a)lifelogs.com> wrote:
>>> On Tue, 01 Jun 2010 15:37:34 -0400 "Uri Guttman" <uri(a)StemSystems.com> wrote:
>>>
UG> he said he wanted 1k random numbers out of a large range so a hash would
UG> be fine for that.
>>>
>>> You're right, I wasn't paying enough attention.
>>
>> What, you're saying a div and a mod (let's assume 64 bits, if integers) to
>> find the desired bit, versus dealing each time with a hash bucket to choose
>> an ensuing chain to hunt down through?
>>
>> So you make the bucket-array longer, so the chains are short -- get it
>> long enough so only very, very few chains are longer than one (most
>> being zero) in length, and you're almost emulating the bit-vector
>> scheme, but taking up FAR FAR more space.

PJH> No, for the use case given (1k random numbers out of a large range) a
PJH> hash takes *less* space.

PJH> The bit vector takes (max-min)/8 bytes. So for 0 .. 100_000_000 it takes
PJH> approximately 12.5 MB. 1000 hash entries take (depending on pointer
PJH> size, malloc implementation, etc.) something like 60 to 150 kB. So
PJH> that's about 1/100th of the space. I'd still think a bit vector should
PJH> be faster.

Right. I somehow missed the 1K sample requirement and thought this was
for any set of random numbers. Certainly a bit vector scales better for
large, dense sets, but a 1k sample is nowhere near the point where that
matters. Also, as Peter pointed out, the bit vector's memory usage is
far higher for this case. Finally, Perl's hashes are very well
optimized, so I'd rely on them over Bit::Vector for performance with
small sets.
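
A minimal sketch of that hash approach for the 1k-sample case (range and
count taken from the thread; the variable names are just illustrative):

    # Sketch only: collect 1000 distinct random numbers in a hash.
    use strict;
    use warnings;

    my $max = 100_000_000;
    my %picked;
    $picked{ int rand $max } = 1 while keys %picked < 1_000;
    my @sample = keys %picked;    # memory scales with the sample, not the range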

If I expected *clusters* of numbers in a large space, like say Unicode
codepoints, I would use inversion lists
(http://search.cpan.org/~teodor/Algorithm-InversionList-0.03/). If the
domain were 32-bit or smaller integers and I expected at least 10% of it
to be randomly picked, I would use Bit::Vector. For almost all other
reasonable use cases, though, hashes would work better.
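
To illustrate the inversion-list idea itself (a hand-rolled sketch, not
the Algorithm::InversionList API; the two ranges are just the ASCII
letter blocks, picked for illustration):

    # Sorted half-open boundaries: [0x41,0x5B) and [0x61,0x7B).
    # A value is in the set when an odd number of boundaries lie at or
    # below it.
    use strict;
    use warnings;

    my @inv = (0x41, 0x5B, 0x61, 0x7B);

    sub in_set {
        my ($x) = @_;
        my ($lo, $hi) = (0, scalar @inv);
        while ($lo < $hi) {                    # binary search: count boundaries <= $x
            my $mid = int(($lo + $hi) / 2);
            if ($inv[$mid] <= $x) { $lo = $mid + 1 } else { $hi = $mid }
        }
        return $lo % 2;                        # odd count => inside a range
    }

    print in_set(ord 'G') ? "in\n" : "out\n";  # in
    print in_set(ord '0') ? "in\n" : "out\n";  # out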

Ted