From: MM on 2 Jun 2010 17:01

On Wed, 02 Jun 2010 14:39:15 -0600, Tom Shelton
<tom_shelton(a)comcast.invalid> wrote:

>MM has brought this to us :
>> On Wed, 02 Jun 2010 13:15:50 -0600, Tom Shelton
>> <tom_shelton(a)comcast.invalid> wrote:
>>
>>> MM presented the following explanation :
>>>> On Wed, 02 Jun 2010 14:29:13 -0400, GS <gesansom(a)netscape.net> wrote:
>>>>
>>>>> After serious thinking Helmut Meukel wrote :
>>>>>> "GS" <gesansom(a)netscape.net> wrote in message
>>>>>> news:hu603v$388$1(a)news.eternal-september.org...
>>>>>>> Bob Butler explained :
>>>>>>>> "GS" <gesansom(a)netscape.net> wrote in message
>>>>>>>> news:hu5p6c$9hs$1(a)news.eternal-september.org...
>>>>>>>>> Larry Serflaten expressed precisely :
>>>>>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>>>>>
>>>>>>>>> Just curious...
>>>>>>>>> Why are you including the space in the collection of headings?
>>>>>>>>>
>>>>>>>>> Shouldn't this
>>>>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>>>>>
>>>>>>>>> be this?
>>>>>>>>> txt = Left$(txt, InStr(txt, " ") - 1)
>>>>>>>>
>>>>>>>> If there is no space then the first example will return a zero-length
>>>>>>>> string and the second will raise an error. Depending on the input and
>>>>>>>> the desired results, that needs to be considered.
>>>>>>>
>>>>>>> I'm assuming your comment assumes txt is an empty string, in which case
>>>>>>> I agree. In the case of Larry's code example, iteration of the list
>>>>>>> doesn't leave txt an empty string.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Garry,
>>>>>> if there is an entry like
>>>>>> Grapefruits
>>>>>> with only one word, then you would have the same problem.
>>>>>>
>>>>>> Helmut.
>>>>>>
>>>>>> BTW, to the OP: how about an entry like Space Shuttles?
>>>>>> The heading should be Space Shuttles not Space, shouldn't it?
>>>>>
>>>>> I agree. See my reply to Bob.
>>>>> The info is good to know, and I appreciate you sharing it with me.
>>>>
>>>> Yes, indeed. "Space Shuttles" would be a possible key, as might
>>>> "Apples To Be Sold Today", or "Apples Are Good For You". There's no
>>>> way one can tell whether it's the first space, the second, the third
>>>> and so on. But if there is a series of keys like
>>>>
>>>> "Apples Are Good For You and other fruit, too"
>>>> "Apples Are Good For You but only on Tuesdays"
>>>> "Apples Are Good For You"
>>>> "Apples Are Good For You Class 101"
>>>> "Apples Are Good For You, even crab apples"
>>>> "Apples Are Good For You (Department for Food Hygiene)"
>>>>
>>>> We humans can spot straight away where the commonality is. But
>>>> software?
>>>>
>>>> I reckon maybe a concordance index is the way to go.
>>>>
>>>> MM
>>>
>>> Just to make sure I understand the problem - are you saying you have a
>>> bunch of strings and you're trying to find out the common parts and use
>>> that as a key of some sort? Like:
>>>
>>> "Apples Are Good For You and other fruit, too"
>>> "Apples Are Good For You but only on Tuesdays"
>>> "Apples Are Good For You"
>>> "Cars go slow"
>>> "Cars go fast"
>>> "Space Shuttles"
>>> "Space Savers"
>>>
>>> You would get:
>>> "Apples Are Good For You"
>>
>> Correct.
>>
>>> "Cars go"
>>
>> Correct.
>>
>>> "Space"
>>
>> It depends. If there were loads of entries starting with Space
>> Shuttles and another load starting with Space Savers, I'd want to
>> group by both Space Shuttles and Space Savers, assuming that both were
>> followed by 'other stuff', e.g.
>>
>> "Space Shuttles are go"
>> "Space Savers will help you save space"
>> "Space Shuttles may be the answer to interplanetary travel"
>> "Space Savers at your beck and call"
>> "Space Shuttles anytime, anywhere"
>>
>> MM
>
>In other words, you're looking at the longest common prefix... This is
>sort of a specialized version of the longest common substring problem.
>I did some googling, and I can't really find a VB6 implementation of
>this...
>
>I converted a Java diff library some time back to C# - which, if memory
>serves me, employs a longest common substring algorithm. If I get
>some time tonight, I'll see if I can't convert it to VB6. Otherwise,
>you can google "longest common substring" or "longest common prefix".
>There seems to be lots of information on various ways to implement
>this...

Thanks for the feedback. I'll start the Googling (but tomorrow; I'm
knackered now!).

MM
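
For anyone following along, the "longest common prefix" idea mentioned above can be sketched in VB6 roughly as follows. This is only an illustration under assumptions: CommonWordPrefix is a hypothetical name (nobody in the thread posted it), it compares whole words split on single spaces, and it is case-sensitive under the default Option Compare Binary.

```vb
Option Explicit

' Hypothetical helper, not from the thread: returns the longest run of
' whole leading words that a and b share, or "" if the first words differ.
Public Function CommonWordPrefix(ByVal a As String, ByVal b As String) As String
    Dim wa() As String, wb() As String
    Dim i As Long, n As Long
    Dim result As String

    ' Split("") yields an empty array, so bail out early on empty input.
    If Len(a) = 0 Or Len(b) = 0 Then Exit Function

    wa = Split(a, " ")
    wb = Split(b, " ")

    ' Only compare as many words as the shorter string has.
    n = UBound(wa)
    If UBound(wb) < n Then n = UBound(wb)

    For i = 0 To n
        If wa(i) <> wb(i) Then Exit For
        If i > 0 Then result = result & " "
        result = result & wa(i)
    Next i

    CommonWordPrefix = result
End Function
```

With the sample data, CommonWordPrefix("Cars go slow", "Cars go fast") gives "Cars go", and the two "Space Shuttles ..." entries give "Space Shuttles". Punctuation stuck to a word counts as part of it, so "You," and "You" do not match; stripping punctuation first would be a separate step. Deciding which pairs of strings to compare (for example, neighbours in a sorted list) is left open here.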
From: MM on 2 Jun 2010 17:44

On Wed, 02 Jun 2010 16:46:26 -0400, GS <gesansom(a)netscape.net> wrote:

>After serious thinking MM wrote :
>> On Wed, 02 Jun 2010 14:29:13 -0400, GS <gesansom(a)netscape.net> wrote:
>>
>>> After serious thinking Helmut Meukel wrote :
>>>> "GS" <gesansom(a)netscape.net> wrote in message
>>>> news:hu603v$388$1(a)news.eternal-september.org...
>>>>> Bob Butler explained :
>>>>>> "GS" <gesansom(a)netscape.net> wrote in message
>>>>>> news:hu5p6c$9hs$1(a)news.eternal-september.org...
>>>>>>> Larry Serflaten expressed precisely :
>>>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>>>
>>>>>>> Just curious...
>>>>>>> Why are you including the space in the collection of headings?
>>>>>>>
>>>>>>> Shouldn't this
>>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>>>
>>>>>>> be this?
>>>>>>> txt = Left$(txt, InStr(txt, " ") - 1)
>>>>>>
>>>>>> If there is no space then the first example will return a zero-length
>>>>>> string and the second will raise an error. Depending on the input and
>>>>>> the desired results, that needs to be considered.
>>>>>
>>>>> I'm assuming your comment assumes txt is an empty string, in which case I
>>>>> agree. In the case of Larry's code example, iteration of the list doesn't
>>>>> leave txt an empty string.
>>>>>
>>>>
>>>>
>>>> Garry,
>>>> if there is an entry like
>>>> Grapefruits
>>>> with only one word, then you would have the same problem.
>>>>
>>>> Helmut.
>>>>
>>>> BTW, to the OP: how about an entry like Space Shuttles?
>>>> The heading should be Space Shuttles not Space, shouldn't it?
>>>
>>> I agree. See my reply to Bob.
>>> The info is good to know, and I appreciate you sharing it with me.
>>
>> Yes, indeed. "Space Shuttles" would be a possible key, as might
>> "Apples To Be Sold Today", or "Apples Are Good For You". There's no
>> way one can tell whether it's the first space, the second, the third
>> and so on. But if there is a series of keys like
>>
>> "Apples Are Good For You and other fruit, too"
>> "Apples Are Good For You but only on Tuesdays"
>> "Apples Are Good For You"
>> "Apples Are Good For You Class 101"
>> "Apples Are Good For You, even crab apples"
>> "Apples Are Good For You (Department for Food Hygiene)"
>>
>> We humans can spot straight away where the commonality is. But
>> software?
>>
>> I reckon maybe a concordance index is the way to go.
>>
>> MM
>
>What's a concordance index?

One succinct explanation is:

"An alphabetical index of all the words in a text or corpus of texts,
showing every contextual occurrence of a word: a concordance of
Shakespeare's works."

Here's an example, keying on men:

 Mortimer, Leading the MEN of Herefordshire to
s of the moon; and let MEN say we be men of go
 and let men say we be MEN of good government,
 s that are the moon's MEN doth ebb and flow l
ave set a match. O, if MEN were to be saved by
dshill shall rob those MEN that we have alread
o much shall I falsify MEN's hopes; And like b
 ; Redeeming time when MEN think least I will.
 in time to come, That MEN of your nobility an

>So far, all your example possibilities have a pattern; the heading
>portion is all in proper case.
>
>Do you have control over the construction of the strings?

No. The strings can come from anywhere. I just need to make some sense
of them by reducing them to their smallest common denominators. So
instead of ten thousand "Apples" with various other words tacked on
I'd have a main heading or keyword "Apples".

MM
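
A keyword-in-context listing like the one above can be produced with a few lines of VB6. The following is a minimal sketch, not anything posted in the thread: Concordance, src, keyword and ctxLen are hypothetical names, the match is a case-insensitive substring search, and the default of 22 context characters is simply chosen to resemble the sample.

```vb
Option Explicit

' Hypothetical sketch, not from the thread: returns one line per occurrence
' of keyword in src, with up to ctxLen characters of context on each side.
' The keyword is upper-cased and the left context is right-justified so
' the keywords line up in a column. Lines are joined with vbCrLf.
Public Function Concordance(ByVal src As String, ByVal keyword As String, _
                            Optional ByVal ctxLen As Long = 22) As String
    Dim pos As Long, startPos As Long
    Dim leftCtx As String, rightCtx As String
    Dim result As String

    If Len(keyword) = 0 Then Exit Function

    pos = InStr(1, src, keyword, vbTextCompare)
    Do While pos > 0
        ' Clamp the left context at the start of the text.
        startPos = pos - ctxLen
        If startPos < 1 Then startPos = 1
        leftCtx = Mid$(src, startPos, pos - startPos)
        rightCtx = Mid$(src, pos + Len(keyword), ctxLen)

        If Len(result) > 0 Then result = result & vbCrLf
        result = result & Right$(Space$(ctxLen) & leftCtx, ctxLen) & _
                 UCase$(keyword) & rightCtx

        ' Continue searching just past this occurrence.
        pos = InStr(pos + Len(keyword), src, keyword, vbTextCompare)
    Loop

    Concordance = result
End Function
```

Because this is a plain substring search, a keyword such as "men" would also be found inside "government"; restricting it to whole words would need an extra check on the characters either side of each hit.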
From: GS on 2 Jun 2010 19:32

MM submitted this idea :
> On Wed, 02 Jun 2010 16:46:26 -0400, GS <gesansom(a)netscape.net> wrote:
>
>> After serious thinking MM wrote :
>>> On Wed, 02 Jun 2010 14:29:13 -0400, GS <gesansom(a)netscape.net> wrote:
>>>
>>>> After serious thinking Helmut Meukel wrote :
>>>>> "GS" <gesansom(a)netscape.net> wrote in message
>>>>> news:hu603v$388$1(a)news.eternal-september.org...
>>>>>> Bob Butler explained :
>>>>>>> "GS" <gesansom(a)netscape.net> wrote in message
>>>>>>> news:hu5p6c$9hs$1(a)news.eternal-september.org...
>>>>>>>> Larry Serflaten expressed precisely :
>>>>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>>>>
>>>>>>>> Just curious...
>>>>>>>> Why are you including the space in the collection of headings?
>>>>>>>>
>>>>>>>> Shouldn't this
>>>>>>>> txt = Left$(txt, InStr(txt, " "))
>>>>>>>>
>>>>>>>> be this?
>>>>>>>> txt = Left$(txt, InStr(txt, " ") - 1)
>>>>>>>
>>>>>>> If there is no space then the first example will return a zero-length
>>>>>>> string and the second will raise an error. Depending on the input and
>>>>>>> the desired results, that needs to be considered.
>>>>>>
>>>>>> I'm assuming your comment assumes txt is an empty string, in which case
>>>>>> I agree. In the case of Larry's code example, iteration of the list
>>>>>> doesn't leave txt an empty string.
>>>>>>
>>>>>
>>>>>
>>>>> Garry,
>>>>> if there is an entry like
>>>>> Grapefruits
>>>>> with only one word, then you would have the same problem.
>>>>>
>>>>> Helmut.
>>>>>
>>>>> BTW, to the OP: how about an entry like Space Shuttles?
>>>>> The heading should be Space Shuttles not Space, shouldn't it?
>>>>
>>>> I agree. See my reply to Bob.
>>>> The info is good to know, and I appreciate you sharing it with me.
>>>
>>> Yes, indeed. "Space Shuttles" would be a possible key, as might
>>> "Apples To Be Sold Today", or "Apples Are Good For You". There's no
>>> way one can tell whether it's the first space, the second, the third
>>> and so on. But if there is a series of keys like
>>>
>>> "Apples Are Good For You and other fruit, too"
>>> "Apples Are Good For You but only on Tuesdays"
>>> "Apples Are Good For You"
>>> "Apples Are Good For You Class 101"
>>> "Apples Are Good For You, even crab apples"
>>> "Apples Are Good For You (Department for Food Hygiene)"
>>>
>>> We humans can spot straight away where the commonality is. But
>>> software?
>>>
>>> I reckon maybe a concordance index is the way to go.
>>>
>>> MM
>>
>> What's a concordance index?
>
> One succinct explanation is:
>
> "An alphabetical index of all the words in a text or corpus of texts,
> showing every contextual occurrence of a word: a concordance of
> Shakespeare's works."
>
> Here's an example, keying on men:
>
>  Mortimer, Leading the MEN of Herefordshire to
> s of the moon; and let MEN say we be men of go
>  and let men say we be MEN of good government,
>  s that are the moon's MEN doth ebb and flow l
> ave set a match. O, if MEN were to be saved by
> dshill shall rob those MEN that we have alread
> o much shall I falsify MEN's hopes; And like b
>  ; Redeeming time when MEN think least I will.
>  in time to come, That MEN of your nobility an
>
>> So far, all your example possibilities have a pattern; the heading
>> portion is all in proper case.
>>
>> Do you have control over the construction of the strings?
>
> No. The strings can come from anywhere. I just need to make some sense
> of them by reducing them to their smallest common denominators. So
> instead of ten thousand "Apples" with various other words tacked on
> I'd have a main heading or keyword "Apples".
>
> MM

Thanks! So it's basically what a search feature in a browser or PDF
reader would do. I didn't realize it was a specific 'type' of indexing.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc
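
On the Left$/InStr exchange quoted above: InStr returns 0 when there is no space, so Left$(txt, 0) yields an empty string while Left$(txt, -1) raises run-time error 5. A guarded version might look like the sketch below. FirstWord is a hypothetical name, and returning the whole string for a one-word entry such as "Grapefruits" is an assumption about the desired behaviour, not something the thread settled.

```vb
Option Explicit

' Hypothetical helper, not from the thread: returns the text before the
' first space, or the whole of txt when it contains no space at all.
Public Function FirstWord(ByVal txt As String) As String
    Dim p As Long

    p = InStr(txt, " ")
    If p = 0 Then
        ' No space found: avoid Left$(txt, -1), which would raise an error.
        FirstWord = txt
    Else
        ' Drop the space itself, as GS suggested.
        FirstWord = Left$(txt, p - 1)
    End If
End Function
```

This still treats "Space Shuttles" as "Space", so it only addresses the error case, not the question of where the heading really ends.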
From: MM on 3 Jun 2010 06:58

On Wed, 02 Jun 2010 14:39:15 -0600, Tom Shelton
<tom_shelton(a)comcast.invalid> wrote:

>In other words, you're looking at the longest common prefix... This is
>sort of a specialized version of the longest common substring problem.
>I did some googling, and I can't really find a VB6 implementation of
>this...

Having read parts of The Algorithm Design Manual via Look Inside on
Amazon, I think it may in fact be a "Shortest Common Superstring"
problem. This research is opening up a whole new avenue of interesting
stuff! Thank goodness I'm retired and can dabble with it all day.

MM
From: Tom Shelton on 3 Jun 2010 12:59

MM formulated the question :
> On Wed, 02 Jun 2010 14:39:15 -0600, Tom Shelton
> <tom_shelton(a)comcast.invalid> wrote:
>
>> In other words, you're looking at the longest common prefix... This is
>> sort of a specialized version of the longest common substring problem.
>> I did some googling, and I can't really find a VB6 implementation of
>> this...
>
> Having read parts of The Algorithm Design Manual via Look Inside on
> Amazon, I think it may in fact be a "Shortest Common Superstring"
> problem. This research is opening up a whole new avenue of interesting
> stuff! Thank goodness I'm retired and can dabble with it all day.
>
> MM

Interesting... Now I've learned something new :)

--
Tom Shelton