From: Roger Stafford on
"Simon Preston" <preston.simon+mathsworks(a)gmail.com> wrote in
message <fvpde9$rm2$1(a)fred.mathworks.com>...
> Having looked, it uses histc. (I promise I hadn't known
> about randsample before my original reply!)
>
> In fact, the second output of histc is the length-n vector
> which indexes the bin into which each observation falls,
> i.e. what Shonkhor wanted. Hence
> [x,y] = histc(rand(1,n),[0 cumsum(p)]);
> returns as y Shonkhor's desired sample.
>
> As for the order of this approach, maybe Roger could
> explain: are the other methods (theoretically) faster? I
> suppose, though, i) the order calculations are asymptotic
> results, and ii) it's hard to imagine an application where
> this calculation, whichever method used, would be
> rate-limiting. Still it's interesting to see there are
> various nice ways to solve the same problem.
>
> Best wishes, S
---------
Hello Simon,

In any discussion of algorithmic order it needs to be understood that, as you
say, this has only to do with asymptotic times of execution. I initially favored
the binary search approach with its order (n*log(m)) to provide for cases of
large values of both m and n. It would be better than (n+m)*log(n+m) for
very large n and better than n*m for very large m. (I am guessing 'histc' is
order ((n+m)*log(n+m)) because they may do a initial sort of combined edges
and points. However, perhaps they have a simpler order (n*m) algorithm
instead, as my own 'hist' does.) Whether it would be faster for numbers likely
to be used is quite another question and requires experimentation. I did
have the distinct impression from Shonkhor's mention of distributions that a
very large value of n might be needed and the value m = 360 was evidently
large.

Whatever is true here, you did certainly provide Shonkhor with a very fine
one-liner that doesn't require the Statistics Toolbox, Simon. When I saw it, I
said to myself, "now why didn't I think of that?"

Roger Stafford


From: Peter Perkins on
Roger Stafford wrote:
> Peter Perkins <Peter.PerkinsRemoveThis(a)mathworks.com> wrote in message
>> randsample(1:360,p,n,true)
>>
>> to generate a sample of size n, drawing values from 1:360 independently
>> with replacement.
> -----------
> That will teach me not to go diving into a problem without checking to see if
> someone else has already done it. I have a weakness in that direction!
> Anyway, Shonkhor got three versions that don't require the Statistics Toolbox.
>
> According to the documentation for 'randsample', the call should be made
> as:
>
> randsample(1:360,n,true,p);
>
> Does your argument sequence really work also?

No, and that will teach _me_ to not check the help before typing example code.
The correct argument order is what you described. Thanks.
From: Simon Preston on

> I do have a question about 'randsample' if it doesn't
violate Mathworks'
> proprietary interests. The four-argument case is markedly
more difficult to
> implement than the others. Does it use something like the
binary search I
> used to effect the inverse cumulative distribution of p,
Simon's 'histc' method,
> or perhaps something akin to Alessandro's
acceptance-rejection method
> directly on the p values? In other words, if m = 360, is
it an order(n*log(m))
> algorithm, order ((n+m)*log(n+m)) algorithm, or something
altogether
> different? Users would probably like to know this, but
the documentation
> doesn't say.
>
> Roger Stafford
>

Having looked, it uses histc. (I promise I hadn't known
about randsample before my original reply!)

In fact, the second output of histc is the length-n vector
which indexes the bin into which each observation falls,
i.e. what Shonkhor wanted. Hence
[x,y] = histc(rand(1,n),[0 cumsum(p)]);
returns as y Shonkhor's desired sample.

As for the order of this approach, maybe Roger could
explain: are the other methods (theoretically) faster? I
suppose, though, i) the order calculations are asymptotic
results, and ii) it's hard to imagine an application where
this calculation, whichever method used, would be
rate-limiting. Still it's interesting to see there are
various nice ways to solve the same problem.

Best wishes, S




From: alessandro mura on
> However, I think you would have trouble doing polynomial fits to inverse
> cdf's
> for distributions with unbounded ranges.
>
> Roger Stafford

Exactly. Usually what I need to simulate are random
distribution of velocity or energy for a test particle, and in fact they
have (almost) never upper bound.
What I sometimes do (when it is not a maxwellian, of course), is to split
the problem in two parts:
I use a Neumann rejection for the main part of the distribution, for example
E<E0when this method has a reasonable
fraction of "good" guesses (to avoid wasting machine time), and I add a
"high energy tail" (E>E0) with a polynomial fit of the inverse CDF.

Ciao
--
Alessandro Mura
Istituto Nazionale di Astrofisica - IFSI
http://pptt4.ifsi-roma.inaf.it/~mura/index.html
http://www.alessandromura.it