From: Golabi Doon on
Hello folks,

I would really appreciate if some one could help me on the following
problem.

Given n strings S_1 through S_n, where each S_k is simply a
concatenation a of letter, say "a", but of arbitrary yet finite
length. For example, S_1=aa , S_2=a , S_3=aa , S_4=aaa.

Consider a tape of length T. We start from the head of the tape,
randomly choose a S_k, put it on the tape and move fto the next free
cell, and then randomly choose another S_k and paste it in and so on.
In other words, we concatenate randomly chosen strings from the given
set of strings until we reach a point where the size of the chosen S_k
is larger than the free space left on the tape.

Example: T=7.
A randomly chosen string happens to be S_1, the tape becomes aa.
Next string happens to be S_2, the tape becomes aaa.
Next string happens to be S_1, the tape becomes aaaaa
Next string happens to be S_4, the tape is too full for that, so halt.

Let's denote the number of times that S_k appears on the tape by
#S_k.In the previous example #S_1=2, #S_2=1, #S_3=#S_4=0.

I believe, as T approaches infinity, the ratio #S_k / sum_t (#S_t)
converges to fixed number or each k. By fixed I mean, if I repeat this
process, by generating another randomly concatenation of the given
string set on the tape, I get the same ratios.

Is my conjecture correct? If so, is it possible to express #S_k /
sum_t (#S_t) in terms of L_1 through L_n where L_k is the length of
S_k? Naturally, a longer S_k has less chance to appear on the tape as
opposed to a smaller S_k. However, I do not know the exact
relationship.

Your help would be greatly appreciated.

Golabi.
From: Golabi Doon on
I forgot to mention the simplying assumption that when choosing a
string to put it on the tape, all the strings have equal chance to be
chosen, i.e. the probablity of choosing any string S_k is equal to 1/
n.
From: Barb Knox on
In article
<7a051b59-5b02-47ca-b581-9bb1065ffa70(a)d21g2000yqn.googlegroups.com>,
Golabi Doon <golabidoon(a)gmail.com> wrote:

> Hello folks,
>
> I would really appreciate if some one could help me on the following
> problem.
>
> Given n strings S_1 through S_n, where each S_k is simply a
> concatenation a of letter, say "a", but of arbitrary yet finite
> length. For example, S_1=aa , S_2=a , S_3=aa , S_4=aaa.
>
> Consider a tape of length T. We start from the head of the tape,
> randomly choose a S_k, put it on the tape and move fto the next free
> cell, and then randomly choose another S_k and paste it in and so on.
> In other words, we concatenate randomly chosen strings from the given
> set of strings until we reach a point where the size of the chosen S_k
> is larger than the free space left on the tape.
>
> Example: T=7.
> A randomly chosen string happens to be S_1, the tape becomes aa.
> Next string happens to be S_2, the tape becomes aaa.
> Next string happens to be S_1, the tape becomes aaaaa
> Next string happens to be S_4, the tape is too full for that, so halt.
>
> Let's denote the number of times that S_k appears on the tape by
> #S_k.In the previous example #S_1=2, #S_2=1, #S_3=#S_4=0.
>
> I believe, as T approaches infinity, the ratio #S_k / sum_t (#S_t)
> converges to fixed number or each k. By fixed I mean, if I repeat this
> process, by generating another randomly concatenation of the given
> string set on the tape, I get the same ratios.
>
> Is my conjecture correct? If so, is it possible to express #S_k /
> sum_t (#S_t) in terms of L_1 through L_n where L_k is the length of
> S_k? Naturally, a longer S_k has less chance to appear on the tape as
> opposed to a smaller S_k. However, I do not know the exact
> relationship.
>
> Your help would be greatly appreciated.

In article
<c991a082-ab7f-43b1-9181-0fafcebb603f(a)c3g2000yqd.googlegroups.com>,
Golabi Doon <golabidoon(a)gmail.com> wrote:

> I forgot to mention the simplying assumption that when choosing a
> string to put it on the tape, all the strings have equal chance to be
> chosen, i.e. the probablity of choosing any string S_k is equal to 1/
> n.

Your conjecture is true, but trivially so. (#S_k)/sum_t(#S_t)
approaches 1/n for every k. As the tape gets longer, the length of the
final string (which, as you say, is more likely to be a short one than a
long one) contributes less and less to the averages. As T -> oo its
contribution becomes vanishingly small. All the other (i.e., non-final)
strings are uniformly distributed.


--
---------------------------
| BBB b \ Barbara at LivingHistory stop co stop uk
| B B aa rrr b |
| BBB a a r bbb | Quidquid latine dictum sit,
| B B a a r b b | altum viditur.
| BBB aa a r bbb |
-----------------------------
From: Barb Knox on
In article <see-6FAF0F.07585502122009(a)mail.eternal-september.org>,
Barb Knox <see(a)sig.below> wrote:

> In article
> <7a051b59-5b02-47ca-b581-9bb1065ffa70(a)d21g2000yqn.googlegroups.com>,
> Golabi Doon <golabidoon(a)gmail.com> wrote:
>
> > Hello folks,
> >
> > I would really appreciate if some one could help me on the following
> > problem.
> >
> > Given n strings S_1 through S_n, where each S_k is simply a
> > concatenation a of letter, say "a", but of arbitrary yet finite
> > length. For example, S_1=aa , S_2=a , S_3=aa , S_4=aaa.
> >
> > Consider a tape of length T. We start from the head of the tape,
> > randomly choose a S_k, put it on the tape and move fto the next free
> > cell, and then randomly choose another S_k and paste it in and so on.
> > In other words, we concatenate randomly chosen strings from the given
> > set of strings until we reach a point where the size of the chosen S_k
> > is larger than the free space left on the tape.
> >
> > Example: T=7.
> > A randomly chosen string happens to be S_1, the tape becomes aa.
> > Next string happens to be S_2, the tape becomes aaa.
> > Next string happens to be S_1, the tape becomes aaaaa
> > Next string happens to be S_4, the tape is too full for that, so halt.
> >
> > Let's denote the number of times that S_k appears on the tape by
> > #S_k.In the previous example #S_1=2, #S_2=1, #S_3=#S_4=0.
> >
> > I believe, as T approaches infinity, the ratio #S_k / sum_t (#S_t)
> > converges to fixed number or each k. By fixed I mean, if I repeat this
> > process, by generating another randomly concatenation of the given
> > string set on the tape, I get the same ratios.
> >
> > Is my conjecture correct? If so, is it possible to express #S_k /
> > sum_t (#S_t) in terms of L_1 through L_n where L_k is the length of
> > S_k? Naturally, a longer S_k has less chance to appear on the tape as
> > opposed to a smaller S_k. However, I do not know the exact
> > relationship.
> >
> > Your help would be greatly appreciated.
>
> In article
> <c991a082-ab7f-43b1-9181-0fafcebb603f(a)c3g2000yqd.googlegroups.com>,
> Golabi Doon <golabidoon(a)gmail.com> wrote:
>
> > I forgot to mention the simplying assumption that when choosing a
> > string to put it on the tape, all the strings have equal chance to be
> > chosen, i.e. the probablity of choosing any string S_k is equal to 1/
> > n.
>
> Your conjecture is true, but trivially so. (#S_k)/sum_t(#S_t)
> approaches 1/n for every k. As the tape gets longer,


> the length of the final string (which, as you say,
> is more likely to be a short one than a long one)

Oh bother. That turns out to be false, even though I found it
intuitively compelling. Certainly the final+1 string (i.e., the one
which runs off the end of the tape and is thus discarded) is more likely
to be long than short, but ALL the other strings, including the final
non-discarded one, have uniformly distributed lengths. That is, even
for *small* tape lengths T, (#S_k)/sum_t(#S_t) averages 1/n for all k.
The only constraint is that T not be < n.

I was surprised by this. I would like to hear from you (yes, *you* too,
gentle reader) what *your* intuitions say about the expected length of
the final non-discarded string.

I have taken a pry-bar to my intuitions about this and have managed to
shove them into somewhat better conformance with reality, but they're
still wobbly. Does anyone reading this have a good intuitive
combinatorial (i.e., not probabilistic) argument showing that
considering all possible completed tapes, every S_k occurs with equal
frequency regardless of the tape length?

Does this result have a common name?



> contributes less and less to the averages. As T -> oo its
> contribution becomes vanishingly small. All the other (i.e., non-final)
> strings are uniformly distributed.



--
---------------------------
| BBB b \ Barbara at LivingHistory stop co stop uk
| B B aa rrr b |
| BBB a a r bbb | Quidquid latine dictum sit,
| B B a a r b b | altum viditur.
| BBB aa a r bbb |
-----------------------------
From: Tim Little on
>> > I believe, as T approaches infinity, the ratio #S_k / sum_t (#S_t)
>> > converges to fixed number or each k.

Your proposition is slightly ambiguous: you mention the ratio
(#S_k)/sum_t(#S_t), but do not mention whether this should be
interpreted as a ratio of expected values, or an expected value of the
ratio.

For example, suppose |S_1| = 1, |S_2| = 5, T = 5. Then E(#S_2) = 1/2
(it appears iff it appears first), while E(#S_1) = 31/32. So then
E(#S_1) / E(sum_t(#S_t)) = 31/47 and E(#S_2) / E(sum_t(#S_t)) = 16/47.

However, E(#S_1 / sum_t(#S_t)) = E(#S_2 / sum_t(#S_t)) = 1/2.


On 2009-12-02, Barb Knox <see(a)sig.below> wrote:
> Does anyone reading this have a good intuitive combinatorial (i.e.,
> not probabilistic) argument showing that considering all possible
> completed tapes, every S_k occurs with equal frequency regardless of
> the tape length?

As the above example indicates, they don't. Short strings do occur
more frequently - in that example, the shorter string appears 31/16
times more frequently overall.

If you disregard the probabilities of each completed tape and just
look at the combinatorial possibilities, then the disparity is even
greater: the possibilities are 1, 11, 111, 1111, 11111, and 2.

It is true that the final strings are uniformly distributed, by
reversal symmetry and the condition that the initial strings are.


- Tim