From: Richard Heathfield on
mike wrote:
<snip>

>> Using your method, the whole question changes to
>> whether your suggested program will run to completion before the monkeys
>> either hit the jackpot or give up in disgust. Whether it does in fact
>> yield the required order of magnitude error reduction seems, on the face
>> of it, to be a side issue.
>>
> Well you asked for a reasonably simple calculation that could
> significantly reduce the error in your estimate. While my explanation
> may not have been simple (I wasn't going to waste a lot of time editing
> my post)

Sure.

> the calculation is very simple (what could be simpler than
> matrix multiplication?) and reduces the error to zero.

And what could be more complicated than working out the transition
matrix for a string of 40 million bits? (Well, okay, lots of things
could be more complicated, but it's up there with the leaders...)
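
(In fairness, the matrix doesn't need a row per possible string -- one
state per matched-prefix length will do, so a pattern of m symbols gives
an (m+1) x (m+1) matrix. Here is a rough sketch in C of how the exact
calculation goes; the 20-bit pattern is an arbitrary stand-in for the
real text, and all the names are mine:)

/* Sketch only, not anyone's working code.  States 0..M-1 record how
 * much of the pattern the last few keypresses match; state M
 * ("found") is absorbing.  Repeated squaring gives the n-step matrix
 * in O(M^3 log n) arithmetic: a few dozen tiny products for the ten
 * million bits below, but utterly hopeless for M = 40 million. */
#include <stdio.h>
#include <string.h>

#define M 20                     /* pattern length, in bits */
#define S (M + 1)                /* states 0..M */

static const char *pat = "01101110010110100011";

/* longest prefix of pat matched after reading bit b in state k */
static int step(int k, int b)
{
    char buf[S];
    int len;
    memcpy(buf, pat, (size_t)k);
    buf[k] = "01"[b];
    for (len = k + 1; len > 0; len--)
        if (memcmp(buf + k + 1 - len, pat, (size_t)len) == 0)
            return len;
    return 0;
}

static void matmul(double r[S][S], double a[S][S], double b[S][S])
{
    double t[S][S] = {{0.0}};
    int i, j, k;
    for (i = 0; i < S; i++)
        for (k = 0; k < S; k++)
            for (j = 0; j < S; j++)
                t[i][j] += a[i][k] * b[k][j];
    memcpy(r, t, sizeof t);
}

int main(void)
{
    double T[S][S] = {{0.0}};    /* one keypress (clobbered below) */
    double P[S][S] = {{0.0}};    /* accumulates T^n */
    unsigned long n = 10000000;  /* number of random bits typed */
    int i, b;

    for (i = 0; i < M; i++)
        for (b = 0; b < 2; b++)
            T[i][step(i, b)] += 0.5;
    T[M][M] = 1.0;               /* once found, always found */

    for (i = 0; i < S; i++)
        P[i][i] = 1.0;           /* identity */
    while (n) {                  /* exponentiation by squaring */
        if (n & 1)
            matmul(P, P, T);
        matmul(T, T, T);
        n >>= 1;
    }
    printf("P(pattern not found) = %g\n", 1.0 - P[0][M]);
    return 0;
}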

> I never claimed that it was going to be practical*...

<grin> True enough. So we can refine the challenge to that of coming up
with a *practical* way of finding a closer answer than my method gave.

> Comparison with slightly longer binary strings shows that your estimate
> could be very badly out. For example, your calculation would predict that
> the probability of not finding a 4-digit binary string in 128 random
> digits would be 0.0003136 = (15/16)^(128-3), whereas the actual
> probability could be as low as 0.0001806 (for string '0110') or as high
> as 0.009712 for string '0000'.
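
(Those figures are straightforward to check: step the matched-prefix
probabilities through the 128 bits directly, rather than multiplying
matrices. A minimal sketch in C, binary alphabet assumed:)

#include <stdio.h>
#include <string.h>
#include <math.h>

/* probability that pat does NOT occur in n uniform random bits */
static double avoid(const char *pat, int n)
{
    int m = (int)strlen(pat), k, b, len, t;
    double p[64] = {0.0}, q[64], s = 0.0;
    char buf[64];

    p[0] = 1.0;                           /* nothing matched yet */
    while (n--) {
        memset(q, 0, sizeof q);
        for (k = 0; k < m; k++)
            for (b = 0; b < 2; b++) {
                memcpy(buf, pat, (size_t)k);
                buf[k] = "01"[b];
                t = 0;                     /* KMP-style successor */
                for (len = k + 1; len > 0; len--)
                    if (memcmp(buf + k + 1 - len, pat, (size_t)len) == 0) {
                        t = len;
                        break;
                    }
                if (t < m)                 /* drop the "found" state */
                    q[t] += 0.5 * p[k];
            }
        memcpy(p, q, sizeof p);
    }
    for (k = 0; k < m; k++)
        s += p[k];
    return s;
}

int main(void)
{
    printf("(15/16)^125 estimate: %g\n", pow(15.0 / 16.0, 125.0));
    printf("exact for \"0110\":    %g\n", avoid("0110", 128));
    printf("exact for \"0000\":    %g\n", avoid("0000", 128));
    return 0;
}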

Well, I could live with 200% (in the specific case of Monkeys vs
Shakespeare, I hasten to add!), but of course the error would be much
higher for longer needle strings, especially with the colossal haystacks
that we would require. So I guess I have to scratch that method completely.

One reasonable compromise might be to take increasingly long subsets of
Shakespeare and use your method of calculation to find the number of
monkey keypress events required to give us, say, a 99% probability of
duplicating each subset, stopping when the calculations start to become
impractical. We could then plot the results on a graph and extrapolate
to find a ballpark figure for the complete Shakespeherian canon. That
wouldn't be as accurate as using your method exhaustively, but I suppose
it would be a lot more accurate than my method, and it has the added
benefit that it could actually be completed within a reasonable time.
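
(For what it's worth, a rough sketch of that tabulation follows. The
26-letter alphabet, the toy stand-in for Shakespeare and the 99% level
are all illustrative choices of mine; each extra letter multiplies the
keypress count by roughly 26, and that growth is what the graph would
extrapolate:)

#include <stdio.h>
#include <string.h>

#define A 26                       /* alphabet size: letters only */

static const char *text = "tobeornottobethatisthequestion";

/* longest prefix of pat matched after reading c in state k */
static int successor(const char *pat, int k, char c)
{
    char buf[64];
    int len;
    memcpy(buf, pat, (size_t)k);
    buf[k] = c;
    for (len = k + 1; len > 0; len--)
        if (memcmp(buf + k + 1 - len, pat, (size_t)len) == 0)
            return len;
    return 0;
}

int main(void)
{
    int m;
    for (m = 1; m <= 4; m++) {     /* growing prefixes of the text */
        char pat[64];
        int nxt[64][A];
        double p[64] = {0.0}, q[64], notfound = 1.0;
        long n = 0;
        int k, a;

        memcpy(pat, text, (size_t)m);
        pat[m] = '\0';
        for (k = 0; k < m; k++)    /* successor table, built once */
            for (a = 0; a < A; a++)
                nxt[k][a] = successor(pat, k, (char)('a' + a));

        p[0] = 1.0;
        while (notfound > 0.01) {  /* until a 99% chance of a hit */
            memset(q, 0, sizeof q);
            for (k = 0; k < m; k++)
                for (a = 0; a < A; a++)
                    if (nxt[k][a] < m)
                        q[nxt[k][a]] += p[k] / A;
            memcpy(p, q, sizeof p);
            notfound = 0.0;
            for (k = 0; k < m; k++)
                notfound += p[k];
            n++;
        }
        printf("m = %d: ~%ld keypresses for a 99%% hit\n", m, n);
    }
    return 0;
}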

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
From: mike on
In article <GuudnSVUk7vPPQDWnZ2dnUVZ7sydnZ2d(a)bt.com>,
rjh(a)see.sig.invalid says...
> mike wrote:
> <snip>
>
> >> Using your method, the whole question changes to
> >> whether your suggested program will run to completion before the monkeys
> >> either hit the jackpot or give up in disgust. Whether it does in fact
> >> yield the required order of magnitude error reduction seems, on the face
> >> of it, to be a side issue.
> >>
> > Well you asked for a reasonably simple calculation that could
> > significantly reduce the error in your estimate. While my explanation
> > may not have been simple (I wasn't going to waste a lot of time editing
> > my post)
>
> Sure.
>
> > the calculation is very simple (what could be simpler than
> > matrix multiplication?) and reduces the error to zero.
>
> And what could be more complicated than working out the transition
> matrix for a string of 40 million bits? (Well, okay, lots of things
> could be more complicated, but it's up there with the leaders...)
>
> > I never claimed that it was going to be practical*...
>
> <grin> True enough. So we can refine the challenge to that of coming up
> with a *practical* way of finding a closer answer than my method gave.
>
> > Comparison with slightly longer binary strings shows that your estimate
> > could be very badly out. For example, your calculation would predict that
> > the probability of not finding a 4-digit binary string in 128 random
> > digits would be 0.0003136 = (15/16)^(128-3), whereas the actual
> > probability could be as low as 0.0001806 (for string '0110') or as high
> > as 0.009712 for string '0000'.
>
> Well, I could live with 200% (in the specific case of Monkeys vs
> Shakespeare, I hasten to add!), but of course the error would be much
> higher for longer needle strings, especially with the colossal haystacks
> that we would require. So I guess I have to scratch that method completely.
>
> One reasonable compromise might be to take increasingly long subsets of
> Shakespeare and use your method of calculation to find the number of
> monkey keypress events required to give us, say, a 99% probability of
> duplicating each subset, stopping when the calculations start to become
> impractical. We could then plot the results on a graph and extrapolate
> to find a ballpark figure for the complete Shakespeherian canon. That
> wouldn't be as accurate as using your method exhaustively, but I suppose
> it would be a lot more accurate than my method, and it has the added
> benefit that it could actually be completed within a reasonable time.
>
I think that as a reasonable compromise I am willing to admit that, for
any moderately long search string that does not have a lot of copies of
the first n letters of the string scattered through the rest of the
string, your estimate is probably as exact as anyone would care to
require. My slightly unfair examples only emphasised the difference
between your prediction and reality because they were fairly short and
the 'pattern' in the strings influenced the probability.
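
(The effect of those 'copies' has a tidy closed form, by the way: the
expected number of random characters before a pattern first appears is
the sum of sigma^j over every length j at which the pattern's first j
characters equal its last j -- a result usually credited to Conway. A
sketch, binary alphabet assumed:)

#include <stdio.h>
#include <string.h>

/* expected waiting time for pat over an alphabet of size sigma */
static double waiting(const char *pat, double sigma)
{
    int m = (int)strlen(pat), j;
    double e = 0.0, pw = 1.0;
    for (j = 1; j <= m; j++) {
        pw *= sigma;                       /* pw = sigma^j */
        if (memcmp(pat, pat + m - j, (size_t)j) == 0)
            e += pw;                       /* prefix j == suffix j */
    }
    return e;
}

int main(void)
{
    printf("E[T] for \"0000\": %.0f\n", waiting("0000", 2));  /* 30 */
    printf("E[T] for \"0110\": %.0f\n", waiting("0110", 2));  /* 18 */
    printf("E[T] for \"0111\": %.0f\n", waiting("0111", 2));  /* 16 */
    return 0;
}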

Mike
From: Richard Heathfield on
mike wrote:
<snip>

> I think that as a reasonable compromise I am willing to admit that, for
> any moderately long search string that does not have a lot of copies of
> the first n letters of the string scattered through the rest of the
> string, your estimate is probably as exact as anyone would care to
> require. My slightly unfair examples only emphasised the difference
> between your prediction and reality because they were fairly short and
> the 'pattern' in the strings influenced the probability.

So it's reasonable compromises now, is it?

Usenet is going to the dogs.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
From: mike on
In article <OKCdnXnE3JeATwPWnZ2dnUVZ8jVi4p2d(a)bt.com>,
rjh(a)see.sig.invalid says...
> mike wrote:
> <snip>
>
> > I think that as a reasonable compromise I am willing to admit that, for
> > any moderately long search string that does not have a lot of copies of
> > the first n letters of the string scattered through the rest of the
> > string, your estimate is probably as exact as anyone would care to
> > require. My slightly unfair examples only emphasised the difference
> > between your prediction and reality because they were fairly short and
> > the 'pattern' in the strings influenced the probability.
>
> So it's reasonable compromises now, is it?
>
I never said it wasn't.

If you remember:

1) Someone else mentioned the probability of hitting the right text.
2) You provided a formula to calculate what that probability was.
3) Someone else pointed out that your formula was incorrect (and why).
4) You admitted the fact and asked for a better formula.
5) I provided an exact solution...
6) ...which you suggested was computationally difficult, and asked for a
compromise solution.
7) I pointed out that your initial solution would be 'good enough' in
normal circumstances.

At no point did I suggest that your formula was not a reasonable
compromise. All I did was provide you with what you requested and, for
illustrative purposes, describe some circumstances where your solution
would be inadequate.

Cheers,
Mike
From: Richard Heathfield on
mike wrote:
> In article <OKCdnXnE3JeATwPWnZ2dnUVZ8jVi4p2d(a)bt.com>,
> rjh(a)see.sig.invalid says...
>> mike wrote:
>> <snip>
>>
>>> I think that as a reasonable compromise I am willing to admit that, for
>>> any moderately long search string that does not have a lot of copies of
>>> the first n letters of the string scattered through the rest of the
>>> string, your estimate is probably as exact as anyone would care to
>>> require. My slightly unfair examples only emphasised the difference
>>> between your prediction and reality because they were fairly short and
>>> the 'pattern' in the strings influenced the probability.
>> So it's reasonable compromises now, is it?
>>
> I never said it wasn't.
>
> If you remember:
>
> 1) Someone else mentioned the probability of hitting the right text.
> 2) You provided a formula to calculate what that probability was.
> 3) Someone else pointed out that your formula was incorrect (and why).
> 4) You admitted the fact and asked for a better formula.
> 5) I provided an exact solution...
> 6) ...which you suggested was computationally difficult, and asked for a
> compromise solution.
> 7) I pointed out that your initial solution would be 'good enough' in
> normal circumstances.
>
> At no point did I suggest that your formula was not a reasonable
> compromise. All I did was provide you with what you requested and, for
> illustrative purposes, describe some circumstances where your solution
> would be inadequate.


Ah - we have here a light-hearted all-Usenauts-together reply taken far
too literally, and thoroughly but unnecessarily rebuffed; folks, things
were touch and go there for a while, but Usenet is back to normal again!

:-)

(Sorry for the late reply - I've been kinda busy.)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within