From: mike on
In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
rjh(a)see.sig.invalid says...
> mike wrote:
> > In article <OKCdnXnE3JeATwPWnZ2dnUVZ8jVi4p2d(a)bt.com>,
> > rjh(a)see.sig.invalid says...
> >> mike wrote:
> >> <snip>
> >>
> >>> I think that as a reasonable compromise I am willing to admit that, for
> >>> any moderately long search string that does not have a lot of copies of
> >>> the first n letters of the string scattered through the rest of the
> >>> string, your estimate is probably as exact as anyone would care to
> >>> require. My slightly unfair examples only emphasised the difference
> >>> between your prediction and reality because they were fairly short and
> >>> the 'pattern' in the strings influenced the probability.
> >> So it's reasonable compromises now, is it?
> >>
> > I never said it wasn't.
> >
> > If you remember:
> >
> > 1) Someone else mentioned the probability of hitting the right text.
> > 2) You provided a formula to calculate what that probability was.
> > 3) Someone else pointed out that your formula was incorrect (and why).
> > 4) You admitted the fact and asked for a better formula.
> > 5) I provided an exact solution...
> > 6) ...which you suggested was computationally difficult, and asked for a
> > compromise solution.
> > 7) I pointed out that your initial solution would be 'good enough' in
> > normal circumstances.
> >
> > At no point did I suggest that your formula was not a reasonable
> > compromise. All I did was provide you with what you requested and, for
> > illustrative purposes, describe some circumstances where your solution
> > would be inadequate.
>
>
> Ah - we have here a light-hearted all-Usenauts-together reply taken far
> too literally, and thoroughly but unnecessarily rebuffed; folks, things
> were touch and go there for a while, but Usenet is back to normal again!
>
> :-)
>
> (Sorry for the late reply - I've been kinda busy.)
>
And sorry if I appeared to overreact - the coffee machine was empty.

Mike
From: Richard Heathfield on
mike wrote:
> In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
> rjh(a)see.sig.invalid says...
<snip>

>> (Sorry for the late reply - I've been kinda busy.)
>>
> And sorry if I appeared to overreact - the coffee machine was empty.

I feel your pain; so, as soon as I've posted this article, I'm going to
fax some coffee to you, to tide you over until the vendor's next delivery.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
From: mike on
In article <ndydnWHHL76kjjrWnZ2dnUVZ8kti4p2d(a)bt.com>,
rjh(a)see.sig.invalid says...
> mike wrote:
> > In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
> > rjh(a)see.sig.invalid says...
> <snip>
>
> >> (Sorry for the late reply - I've been kinda busy.)
> >>
> > And sorry if I appeared to overreact - the coffee machine was empty.
>
> I feel your pain; so, as soon as I've posted this article, I'm going to
> fax some coffee to you, to tide you over until the vendor's next delivery.
>
I have invested in a portable filter cup and a supply of fresh grounds
now, so I can avoid a recurrence of the above. But I do look forward to
your fax.

Further to the original problem, though, I believe (but have not taken
the time to prove) that in practice, if we were looking for an
uncompressed ASCII subtext in a large random (monkey-generated) string,
then the small deviations in probability from your approximation would
be due to repetitions of the first few characters of the subtext, and
only the first few repetitions would make any significant difference. So
I believe it might be practical to take just the first few characters
(maybe in the range of 100-1000) of the subtext, determine the influence
of the pattern on the probability of finding that text, and then
extrapolate with your approximation for the rest of the subtext. For
example, if we chose the first 1000 characters of a 10000-character
subtext, then it would be a simple process (and a relatively modest
amount of processing) to repeatedly square the 1000x1000 array a few
dozen times to determine the probability of finding those 1000
characters within a 10^14-10^15 character monkey string (for values of
'few' approximately equal to 4, since 48 squarings raise the array to
the power 2^48, roughly 2.8 x 10^14). Then we just scale that
probability by your approximation of finding the remaining 9000
characters. Some care might be needed to ensure that the array elements
were floating-point numbers with sufficient resolution to avoid
round-off errors during the process.
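A rough Python sketch of the prefix-then-extrapolate idea might look like the following. It rests on several assumptions that are not in the thread: characters are drawn uniformly and independently from a fixed alphabet; the "array" is taken to be the transition matrix of a KMP-style matching automaton (m+1 states for an m-character prefix, state = number of characters currently matched); and because Richard's approximation is not quoted here, `hybrid_log10` uses a hypothetical stand-in that multiplies by 1/A per remaining character. All function names are illustrative. Exact `Fraction` arithmetic is used to keep the small cases checkable. At 1000 states that would be far too slow, and in ordinary doubles a 1000-character prefix probability (around 95^-1000, about 10^-1978 for printable ASCII) underflows to zero, so the resolution concern above is real.

```python
from fractions import Fraction
import math
from typing import List

Matrix = List[List[Fraction]]

def kmp_automaton(pattern: str, alphabet: str) -> List[List[int]]:
    """delta[state][c]; state = chars of pattern matched so far, len(pattern) = found."""
    m, a = len(pattern), len(alphabet)
    idx = {ch: i for i, ch in enumerate(alphabet)}
    delta = [[0] * a for _ in range(m + 1)]
    delta[0][idx[pattern[0]]] = 1
    x = 0  # state reached by the pattern with its first character dropped
    for j in range(1, m):
        delta[j] = list(delta[x])          # mismatch: behave like the restart state
        delta[j][idx[pattern[j]]] = j + 1  # match: advance one character
        x = delta[x][idx[pattern[j]]]
    delta[m] = [m] * a                     # once found, stay found (absorbing)
    return delta

def transition_matrix(pattern: str, alphabet: str) -> Matrix:
    """Markov matrix, assuming each character is uniform and independent."""
    delta = kmp_automaton(pattern, alphabet)
    n, p = len(delta), Fraction(1, len(alphabet))
    t = [[Fraction(0)] * n for _ in range(n)]
    for i, row in enumerate(delta):
        for k in row:
            t[i][k] += p
    return t

def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    n = len(x)
    return [[sum((x[i][k] * y[k][j] for k in range(n)), Fraction(0))
             for j in range(n)] for i in range(n)]

def mat_pow(t: Matrix, n: int) -> Matrix:
    """t**n by repeated squaring (48 squarings reach 2**48, about 2.8e14)."""
    size = len(t)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    base = t
    while n:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result

def prob_found(pattern: str, alphabet: str, length: int) -> Fraction:
    """Exact probability that pattern occurs in a random string of this length."""
    return mat_pow(transition_matrix(pattern, alphabet), length)[0][len(pattern)]

def log10_fraction(x: Fraction) -> float:
    # Take logs of numerator and denominator separately so tiny values do not underflow.
    return math.log10(x.numerator) - math.log10(x.denominator)

def hybrid_log10(prefix_prob: Fraction, remaining: int, alphabet_size: int) -> float:
    """log10(prefix_prob * (1/alphabet_size)**remaining); hypothetical stand-in scaling."""
    return log10_fraction(prefix_prob) - remaining * math.log10(alphabet_size)

if __name__ == "__main__":
    print(prob_found("AA", "AB", 3))  # 3/8
    print(prob_found("AB", "AB", 3))  # 1/2
```

A naive per-position estimate treats "AA" and "AB" identically, yet the exact values differ (3/8 versus 1/2 in a 3-character binary string). That is the 'pattern' effect mentioned earlier in the thread, and it shrinks as the search string gets longer and less self-repetitive.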

Mike