From: MM on
On Wed, 02 Jun 2010 14:39:15 -0600, Tom Shelton
<tom_shelton(a)comcast.invalid> wrote:

>MM has brought this to us :
>> On Wed, 02 Jun 2010 13:15:50 -0600, Tom Shelton
>> <tom_shelton(a)comcast.invalid> wrote:
>>
>>> MM presented the following explanation :
>>>> On Wed, 02 Jun 2010 14:29:13 -0400, GS <gesansom(a)netscape.net> wrote:
>>>>
>>>>> After serious thinking Helmut Meukel wrote :
>>>>>> "GS" <gesansom(a)netscape.net> schrieb im Newsbeitrag
>>>>>> news:hu603v$388$1(a)news.eternal-september.org...
>>>>>>> Bob Butler explained :
>>>>>>>> "GS" <gesansom(a)netscape.net> wrote in message
>>>>>>>> news:hu5p6c$9hs$1(a)news.eternal-september.org...
>>>>>>>>> Larry Serflaten expressed precisely :
>>>>>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>>>>>
>>>>>>>>> Just curious...
>>>>>>>>> Why are you including the space in the collection of headings?
>>>>>>>>>
>>>>>>>>> Shouldn't this
>>>>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>>>>>
>>>>>>>>> be this?
>>>>>>>>> txt = Left$(txt, InStr(txt, " ") - 1)
>>>>>>>>
>>>>>>>> if there is no space then the first example will return a zero-length
>>>>>>>> string and the second will raise an error. Depending on the input and
>>>>>>>> the desired results that needs to be considered.
>>>>>>>
>>>>>>> I'm assuming your comment assumes txt is an empty string, in which case
>>>>>>> I agree. In the case of Larry's code example, iteration of the list
>>>>>>> doesn't leave txt an empty string.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Garry,
>>>>>> if there is an entry like
>>>>>> Grapefruits
>>>>>> with only one word, then you would have the same problem.
>>>>>>
>>>>>> Helmut.
>>>>>>
>>>>>> BTW, to the OP: how about an entry like Space Shuttles?
>>>>>> The heading should be Space Shuttles not Space, shouldn't it?
>>>>>
>>>>> I agree. See my reply to Bob.
>>>>> The info is good to know, and I appreciate you sharing it with me.
>>>>
>>>> Yes, indeed. "Space Shuttles" would be a possible key, as might
>>>> "Apples To Be Sold Today", or "Apples Are Good For You". There's no
>>>> way one can tell whether it's the first space, the second, the third
>>>> and so on. But if there is a series of keys like
>>>>
>>>> "Apples Are Good For You and other fruit, too"
>>>> "Apples Are Good For You but only on Tuesdays"
>>>> "Apples Are Good For You"
>>>> "Apples Are Good For You Class 101"
>>>> "Apples Are Good For You, even crab apples"
>>>> "Apples Are Good For You (Department for Food Hygiene)"
>>>>
>>>> We humans can spot straight away where the commonality is. But
>>>> software?
>>>>
>>>> I reckon maybe a concordance index is the way to go.
>>>>
>>>> MM
>>>
>>> Just to make sure I understand the problem - are you saying you have a
>>> bunch of strings and you're trying to find out the common parts and use
>>> that as a key of some sort? Like:
>>>
>>> "Apples Are Good For You and other fruit, too"
>>> "Apples Are Good For You but only on Tuesdays"
>>> "Apples Are Good For You"
>>> "Cars go slow"
>>> "Cars go fast"
>>> "Space Shuttles"
>>> "Space Savers"
>>>
>>> You would get:
>>> "Apples Are Good For You"
>>
>> Correct.
>>
>>> "Cars go"
>>
>> Correct.
>>
>>> "Space"
>>
>> It depends. If there were loads of entries starting with Space
>> Shuttles and another load starting with Space Savers, I'd want to
>> group by both Space Shuttles and Space Savers, assuming that both were
>> followed by 'other stuff', e.g.
>>
>> "Space Shuttles are go"
>> "Space Savers will help you save space"
>> "Space Shuttles may be the answer to interplanetary travel"
>> "Space Savers at your beck and call"
>> "Space Shuttles anytime, anywhere"
>>
>> MM
>
>In other words, you're looking at the longest common prefix... This is
>sort of a specialized version of the longest common substring problem.
>I did some googling, and I can't really find a VB6 implementation of
>this...
>
>I converted a Java diff library to C# some time back which, if memory
>serves, employs a longest common substring algorithm. If I get some
>time tonight, I'll see if I can't convert it to VB6. Otherwise, you
>can google "longest common substring" or "longest common prefix".
>There seems to be lots of information on various ways to implement
>this...

Thanks for the feedback. I'll start the Googling (but tomorrow; I'm
knackered now!).

MM
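
For anyone following along, here is a rough sketch of the longest-common-prefix grouping Tom describes, working on whole words. It is in Python rather than VB6 purely for brevity, and it is only one possible reading of the problem: the names common_words and group_by_heading are made up for illustration, and the rule "use the longest prefix shared with a neighbour in sorted order" is an assumption, not something anyone in the thread has settled on.

```python
from collections import defaultdict


def common_words(a, b):
    """Return the leading words that two word lists have in common."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


def group_by_heading(lines):
    """Group lines under the longest word prefix shared with a sorted neighbour.

    Assumption (not from the thread): after sorting, the entry that shares
    the most leading words with a given line is always the one just before
    or just after it, so only those two are compared.
    """
    # Split into words and sort; blank lines are skipped.
    rows = sorted(line.split() for line in lines if line.strip())
    groups = defaultdict(list)
    for i, words in enumerate(rows):
        before = common_words(words, rows[i - 1]) if i > 0 else []
        after = common_words(words, rows[i + 1]) if i + 1 < len(rows) else []
        best = max(before, after, key=len)
        # A line that shares nothing with its neighbours is its own heading.
        heading = " ".join(best or words)
        # Note: re-joining the words normalises runs of whitespace.
        groups[heading].append(" ".join(words))
    return dict(groups)
```

With the seven strings from Tom's list this gives the headings "Apples Are Good For You", "Cars go" and "Space"; with the five "Space ..." strings quoted above it gives "Space Savers" and "Space Shuttles". Punctuation stuck to a word (as in "You,") counts as part of that word here, so such entries would need extra handling.
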
From: MM on
On Wed, 02 Jun 2010 16:46:26 -0400, GS <gesansom(a)netscape.net> wrote:

>After serious thinking MM wrote :
>> I reckon maybe a concordance index is the way to go.
>>
>> MM
>
>What's a concordance index?

One succinct explanation is:

"An alphabetical index of all the words in a text or corpus of texts,
showing every contextual occurrence of a word: a concordance of
Shakespeare's works."

Here's an example, keying on men:

 Mortimer, Leading the MEN of Herefordshire to
s of the moon; and let MEN say we be men of go
 and let men say we be MEN of good government,
 s that are the moon's MEN doth ebb and flow l
ave set a match. O, if MEN were to be saved by
dshill shall rob those MEN that we have alread
o much shall I falsify MEN's hopes; And like b
 ; Redeeming time when MEN think least I will.
 in time to come, That MEN of your nobility an
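
A listing like that can be produced with very little code. The following is only a rough Python sketch (not VB6), with a made-up function name kwic and an arbitrary context width; it is not how any particular concordance tool does it.

```python
import re


def kwic(text, keyword, width=23):
    """Return keyword-in-context lines: left context, KEYWORD, right context."""
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
    # Collapse line breaks and repeated spaces so the context reads as one line.
    flat = " ".join(text.split())
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}}{m.group().upper()}{right}")
    return lines
```

Calling kwic(text, "men") on a passage returns one line per occurrence, with the keyword upper-cased and the contexts cut off at a fixed width, which is why the example above starts and ends mid-word.
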

>So far, all your example possibilities have a pattern; the heading
>portion is all in proper case.
>
>Do you have control over the construction of the strings?

No. The strings can come from anywhere. I just need to make some sense
of them by reducing them to their smallest common denominators. So
instead of ten thousand "Apples" with various other words tacked on
I'd have a main heading or keyword "Apples".
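
The crude first-word version of that is easy enough, as long as single-word entries such as "Grapefruits" are handled. As Bob pointed out, in VB6 InStr returns 0 when there is no space, so Left$(txt, InStr(txt, " ") - 1) raises an error. A rough Python sketch of the same idea (the names first_word and count_headings are just for illustration) side-steps that with str.partition, which returns the whole string when the separator is missing:

```python
from collections import Counter


def first_word(text):
    """Return the text before the first space, or all of it if there is no space."""
    head, _sep, _rest = text.strip().partition(" ")
    return head


def count_headings(lines):
    """Count how many non-blank lines fall under each first-word heading."""
    return Counter(first_word(line) for line in lines if line.strip())
```

This only ever keys on the first word, so it cannot tell "Space Shuttles" from "Space Savers".
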

MM
From: GS on
MM submitted this idea :
> On Wed, 02 Jun 2010 16:46:26 -0400, GS <gesansom(a)netscape.net> wrote:
>> What's a concordance index?
>
> One succinct explanation is:
>
> "An alphabetical index of all the words in a text or corpus of texts,
> showing every contextual occurrence of a word: a concordance of
> Shakespeare's works."
>
> Here's an example, keying on men:
>
>  Mortimer, Leading the MEN of Herefordshire to
> s of the moon; and let MEN say we be men of go
>  and let men say we be MEN of good government,
>  s that are the moon's MEN doth ebb and flow l
> ave set a match. O, if MEN were to be saved by
> dshill shall rob those MEN that we have alread
> o much shall I falsify MEN's hopes; And like b
>  ; Redeeming time when MEN think least I will.
>  in time to come, That MEN of your nobility an
>
>> So far, all your example possibilities have a pattern; the heading
>> portion is all in proper case.
>>
>> Do you have control over the construction of the strings?
>
> No. The strings can come from anywhere. I just need to make some sense
> of them by reducing them to their smallest common denominators. So
> instead of ten thousand "Apples" with various other words tacked on
> I'd have a main heading or keyword "Apples".
>
> MM

Thanks! So it's basically what a search feature in a browser or PDF
reader would do. I didn't realize it was a specific 'type' of indexing.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


From: MM on
On Wed, 02 Jun 2010 14:39:15 -0600, Tom Shelton
<tom_shelton(a)comcast.invalid> wrote:

>In other words, you're looking at the longest common prefix... This is
>sort of a specialized version of the longest common substring problem.
>I did some googling, and I can't really find a VB6 implementation of
>this...

Having read parts of The Algorithm Design Manual via Look Inside on
Amazon, I think it may in fact be a "Shortest Common Superstring"
problem. This research is opening up a whole new avenue of interesting
stuff! Thank goodness I'm retired and can dabble with it all day.

MM
From: Tom Shelton on
MM formulated the question :
> On Wed, 02 Jun 2010 14:39:15 -0600, Tom Shelton
> <tom_shelton(a)comcast.invalid> wrote:
>
>> In other words, you're looking at the longest common prefix... This is
>> sort of a specialized version of the longest common substring problem.
>> I did some googling, and I can't really find a VB6 implementation of
>> this...
>
> Having read parts of The Algorithm Design Manual via Look Inside on
> Amazon, I think it may in fact be a "Shortest Common Superstring"
> problem. This research is opening up a whole new avenue of interesting
> stuff! Thank goodness I'm retired and can dabble with it all day.
>
> MM

Interesting... Now I've learned something new :)

--
Tom Shelton