From: MM on
On Wed, 02 Jun 2010 14:29:13 -0400, GS <gesansom(a)netscape.net> wrote:

>After serious thinking Helmut Meukel wrote :
>> "GS" <gesansom(a)netscape.net> schrieb im Newsbeitrag
>> news:hu603v$388$1(a)news.eternal-september.org...
>>> Bob Butler explained :
>>>> "GS" <gesansom(a)netscape.net> wrote in message
>>>> news:hu5p6c$9hs$1(a)news.eternal-september.org...
>>>>> Larry Serflaten expressed precisely :
>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>
>>>>> Just curious...
>>>>> Why are you including the space in the collection of headings?
>>>>>
>>>>> Shouldn't this
>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>
>>>>> be this?
>>>>> txt = Left$(txt, InStr(txt, " ") - 1)
>>>>
>>>> if there is no space then the first example will return a zero-length
>>>> string and the second will raise an error. Depending on the input and the
>>>> desired results that needs to be considered.
>>>
>>> I'm assuming your comment assumes txt is an empty string, in which case I
>>> agree. In the case of Larry's code example, iteration of the list doesn't
>>> leave txt an empty string.
>>>
>>
>>
>> Garry,
>> if there is an entry like
>> Grapefruits
>> with only one word, then you would have the same problem.
>>
>> Helmut.
>>
>> BTW, to the OP: how about an entry like Space Shuttles?
>> The heading should be Space Shuttles not Space, shouldn't it?
>
>I agree. See my reply to Bob.
>The info is good to know, and I appreciate you sharing it with me.
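
For reference, the no-space case Bob describes (InStr returns 0, so
Left$ gets either 0 or -1) can be guarded explicitly. This is only a
minimal sketch: FirstWord is a hypothetical helper name, not part of
Larry's code, and it assumes a one-word entry such as "Grapefruits"
should become its own heading.

```vb
' Returns the text before the first space, or the whole string when
' there is no space (assumption: a one-word entry is its own heading).
Public Function FirstWord(ByVal txt As String) As String
    Dim pos As Long

    pos = InStr(txt, " ")
    If pos > 0 Then
        FirstWord = Left$(txt, pos - 1)   ' drop the trailing space
    Else
        FirstWord = txt                   ' InStr returned 0: no space found
    End If
End Function
```

This avoids both the zero-length result and the run-time error, but it
still only ever looks at the first space.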

Yes, indeed. "Space Shuttles" would be a possible key, as might
"Apples To Be Sold Today", or "Apples Are Good For You". There's no
way one can tell whether it's the first space, the second, the third
and so on. But if there is a series of keys like

"Apples Are Good For You and other fruit, too"
"Apples Are Good For You but only on Tuesdays"
"Apples Are Good For You"
"Apples Are Good For You Class 101"
"Apples Are Good For You, even crab apples"
"Apples Are Good For You (Department for Food Hygiene)"

We humans can spot straight away where the commonality is. But
software?

I reckon maybe a concordance index is the way to go.

MM
From: Tom Shelton on

Just to make sure I understand the problem - are you saying you have a
bunch of strings and you're trying to find the common parts and use
them as a key of some sort? Like:

"Apples Are Good For You and other fruit, too"
"Apples Are Good For You but only on Tuesdays"
"Apples Are Good For You"
"Cars go slow"
"Cars go fast"
"Space Shuttles"
"Space Savers"

You would get:
"Apples Are Good For You"
"Cars go"
"Space"

Is that what you're looking for?

--
Tom Shelton


From: MM on
On Wed, 02 Jun 2010 13:15:50 -0600, Tom Shelton
<tom_shelton(a)comcast.invalid> wrote:

>Just to make sure I understand the problem - are you saying you have a
>bunch of strings and you're trying to find the common parts and use
>them as a key of some sort? Like:
>
>"Apples Are Good For You and other fruit, too"
>"Apples Are Good For You but only on Tuesdays"
>"Apples Are Good For You"
>"Cars go slow"
>"Cars go fast"
>"Space Shuttles"
>"Space Savers"
>
>You would get:
>"Apples Are Good For You"

Correct.

>"Cars go"

Correct.

>"Space"

It depends. If there were loads of entries starting with Space
Shuttles and another load starting with Space Savers, I'd want to
group by both Space Shuttles and Space Savers, assuming that both were
followed by 'other stuff', e.g.

"Space Shuttles are go"
"Space Savers will help you save space"
"Space Shuttles may be the answer to interplanetary travel"
"Space Savers at your beck and call"
"Space Shuttles anytime, anywhere"
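
One way to express that "it depends" in code is to count how many
entries share each whole-word prefix, then take the longest prefix that
is shared often enough. The following is only a heuristic sketch, not a
tested solution: CountPrefixes, HeadingFor and the minCount threshold
are hypothetical names, and it assumes words are separated by single
spaces.

```vb
' Counts every whole-word prefix of every entry.
' Uses a late-bound Scripting.Dictionary, so no project reference is needed.
Public Function CountPrefixes(entries() As String) As Object
    Dim counts As Object
    Dim words() As String
    Dim i As Long, j As Long, prefix As String

    Set counts = CreateObject("Scripting.Dictionary")
    For i = LBound(entries) To UBound(entries)
        words = Split(entries(i), " ")
        prefix = ""
        For j = 0 To UBound(words)
            If j > 0 Then prefix = prefix & " "
            prefix = prefix & words(j)
            ' Reading a missing key yields Empty, and Empty + 1 = 1
            counts.Item(prefix) = counts.Item(prefix) + 1
        Next j
    Next i
    Set CountPrefixes = counts
End Function

' Returns the longest whole-word prefix of entry that is shared by at
' least minCount entries, or the entry itself if none qualifies.
Public Function HeadingFor(ByVal entry As String, ByVal counts As Object, _
                           ByVal minCount As Long) As String
    Dim words() As String
    Dim i As Long, prefix As String

    HeadingFor = entry
    words = Split(entry, " ")
    For i = 0 To UBound(words)
        If i > 0 Then prefix = prefix & " "
        prefix = prefix & words(i)
        If counts.Exists(prefix) Then
            ' Counts never grow as the prefix gets longer, so the last
            ' prefix that passes the test is the longest one.
            If counts.Item(prefix) >= minCount Then HeadingFor = prefix
        End If
    Next i
End Function
```

With the five entries above and minCount = 2, "Space Shuttles are go"
would be grouped under "Space Shuttles", while a list containing only
"Space Shuttles" and "Space Savers" would fall back to "Space". Two
entries that happen to share an extra word would get a longer key than
intended, so the threshold would need tuning against real data.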

MM
From: Tom Shelton on

In other words, you're looking for the longest common prefix... This is
sort of a specialized version of the longest common substring problem.
I did some googling, and I can't really find a VB6 implementation of
this...

I converted a Java diff library some time back to C# - which, if memory
serves me, employs a longest common substring algorithm. If I get
some time tonight, I'll see if I can convert it to VB6. Otherwise,
you can google "longest common substring" or "longest common prefix".
There seems to be lots of information on various ways to implement
this...
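
In the meantime, here is a rough sketch of the simplest case: the
longest common prefix of two strings, compared word by word so the
result never ends in the middle of a word. This is not the converted
diff library. CommonWordPrefix is a hypothetical name, and the sketch
assumes single spaces between words and a case-sensitive comparison
(the default Option Compare Binary).

```vb
' Longest common prefix of two strings, measured in whole words.
Public Function CommonWordPrefix(ByVal a As String, ByVal b As String) As String
    Dim wa() As String, wb() As String
    Dim i As Long, n As Long
    Dim result As String

    wa = Split(a, " ")
    wb = Split(b, " ")

    ' Only compare as many words as the shorter string has.
    ' Split("") gives UBound = -1, so the loop is skipped for empty input.
    n = UBound(wa)
    If UBound(wb) < n Then n = UBound(wb)

    For i = 0 To n
        If wa(i) <> wb(i) Then Exit For
        If i > 0 Then result = result & " "
        result = result & wa(i)
    Next i

    CommonWordPrefix = result
End Function
```

For example, CommonWordPrefix("Cars go slow", "Cars go fast") gives
"Cars go". Punctuation stuck to a word ("You," versus "You") counts as
a mismatch, so it would have to be stripped first.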

--
Tom Shelton


From: GS on
After serious thinking MM wrote :
> Yes, indeed. "Space Shuttles" would be a possible key, as might
> "Apples To Be Sold Today", or "Apples Are Good For You". There's no
> way one can tell whether it's the first space, the second, the third
> and so on. But if there is a series of keys like
>
> "Apples Are Good For You and other fruit, too"
> "Apples Are Good For You but only on Tuesdays"
> "Apples Are Good For You"
> "Apples Are Good For You Class 101"
> "Apples Are Good For You, even crab apples"
> "Apples Are Good For You (Department for Food Hygiene)"
>
> We humans can spot straight away where the commonality is. But
> software?
>
> I reckon maybe a concordance index is the way to go.
>
> MM

What's a concordance index?
So far, all your example possibilities have a pattern: the heading
portion is all in proper case.
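
If that pattern really holds, something like this rough sketch might
do: collect the leading words that start with a capital letter.
ProperCasePrefix is a hypothetical name, and the sketch assumes only
the letters A-Z count as capitals.

```vb
' Returns the leading run of words that begin with A-Z.
Public Function ProperCasePrefix(ByVal entry As String) As String
    Dim words() As String
    Dim i As Long, code As Long
    Dim result As String

    words = Split(entry, " ")
    For i = 0 To UBound(words)
        If Len(words(i)) = 0 Then Exit For        ' double space: stop
        code = AscW(words(i))                     ' code of the first character
        If code < 65 Or code > 90 Then Exit For   ' not A-Z: heading has ended
        If i > 0 Then result = result & " "
        result = result & words(i)
    Next i

    ProperCasePrefix = result
End Function
```

It would over-capture on an entry like "Apples Are Good For You Class
101", and it keeps the comma in "You,", so it is only a starting point.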

Do you have control over the construction of the strings? If so, could
you not use a delimiter at the end of the heading?
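
As a minimal sketch of what I mean, assuming each entry were stored as
"heading|rest" (the "|" character and the name HeadingFromDelimited are
only illustrative choices):

```vb
' Returns the part of entry before the delimiter, or the whole entry
' when no delimiter is present.
Public Function HeadingFromDelimited(ByVal entry As String, _
        Optional ByVal delim As String = "|") As String
    Dim pos As Long

    pos = InStr(entry, delim)
    If pos > 0 Then
        HeadingFromDelimited = Left$(entry, pos - 1)
    Else
        HeadingFromDelimited = entry   ' no delimiter: treat it all as heading
    End If
End Function
```

HeadingFromDelimited("Space Shuttles|are go") would then return "Space
Shuttles" with no guessing about which space ends the heading.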

Random chaos is really difficult to work with!<g>

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc