From: bugbear on
Jerry Stuckle wrote:
> But what you're looking for is to get a computer to be a natural
> language processor, which is still beyond our current programming
> capabilities. IBM has recently come up with a test system ("Watson")
> which does a fair job, but still has a long ways to go. Once we get
> there, we'll have a Star Trek capability :)
>
> With that said, it doesn't mean all is hopeless. Levenshtein can help,
> as can trigram matching and other things mentioned (except SoundEx). But
> it will also require a lot of work on your part to "train" the system as
> to whether two questions are similar or not.
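
For reference, here is a minimal sketch of the two string measures named above: Levenshtein edit distance and trigram matching. The function names, the space padding, and the use of Jaccard overlap for the trigram score are illustrative assumptions, not something specified in this thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def trigrams(text: str) -> set:
    """Set of 3-character windows over the lowercased, space-padded text.
    The padding scheme (two leading spaces, one trailing) is an assumption."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)  # union is never empty due to padding


if __name__ == "__main__":
    q1 = "How do I reset my password?"
    q2 = "How can I reset my password?"
    q3 = "What is the shipping cost?"
    print(levenshtein(q1, q2), trigram_similarity(q1, q2))
    print(levenshtein(q1, q3), trigram_similarity(q1, q3))
```

Both measures only compare surface characters, so the cutoff at which two questions count as "similar" still has to be tuned against examples you have judged by hand, which is the "training" work mentioned above.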

Surely something like concept extraction/matching
(like the old Excite ICE model)
would be helpful.

BugBear
From: Jerry Stuckle on
bugbear wrote:
> Jerry Stuckle wrote:
>> But what you're looking for is to get a computer to be a natural
>> language processor, which is still beyond our current programming
>> capabilities. IBM has recently come up with a test system ("Watson")
>> which does a fair job, but still has a long ways to go. Once we get
>> there, we'll have a Star Trek capability :)
>>
>> With that said, it doesn't mean all is hopeless. Levenshtein can help,
>> as can trigram matching and other things mentioned (except SoundEx).
>> But it will also require a lot of work on your part to "train" the
>> system as to whether two questions are similar or not.
>
> Surely something like concept extraction/matching
> (like the old Excite ICE model)
> would be helpful.
>
> BugBear

It's possible, but I'm not sure it's public domain, is it? And trying
to generate your own concept extraction/matching module would be a huge
undertaking.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(a)attglobal.net
==================
From: bugbear on
Jerry Stuckle wrote:
> bugbear wrote:
>> Jerry Stuckle wrote:
>>> But what you're looking for is to get a computer to be a natural
>>> language processor, which is still beyond our current programming
>>> capabilities. IBM has recently come up with a test system ("Watson")
>>> which does a fair job, but still has a long ways to go. Once we get
>>> there, we'll have a Star Trek capability :)
>>>
>>> With that said, it doesn't mean all is hopeless. Levenshtein can
>>> help, as can trigram matching and other things mentioned (except
>>> SoundEx). But it will also require a lot of work on your part to
>>> "train" the system as to whether two questions are similar or not.
>>
>> Surely something like concept extraction/matching
>> (like the old Excite ICE model)
>> would be helpful.
>>
>> BugBear
>
> It's possible, but I'm not sure it's public domain, is it? And trying
> to generate your own concept extraction/matching module would be a huge
> undertaking.

There have been many academic versions (indeed, they came first):

google for "latent semantic analysis"
and/or
"singular value decomposition"

I think the Excite engine's novelty was an efficient
and fairly accurate "incremental mode", where the entire
SVD didn't have to be fully redone when a document was added to the corpus.
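
To make the search terms concrete, here is the textbook form of latent semantic analysis, written as a sketch. The symbols (A, U_k, Sigma_k, V_k, d, k) are notation chosen here for illustration. The second equation is the standard "folding-in" approximation from the LSA literature; I am not claiming it is what the Excite engine actually did.

```latex
% A: m x n term-document matrix (m terms, n documents)
% k: number of retained "concept" dimensions (an assumed tuning parameter)
A \approx A_k = U_k \Sigma_k V_k^{\mathsf T},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}

% Folding-in: project a new document d (an m-vector of term weights)
% into the existing concept space without recomputing the SVD
\hat{d} = \Sigma_k^{-1} U_k^{\mathsf T} d

% Compare two projected documents (or a question and a document) by cosine
\operatorname{sim}(\hat{d}_1, \hat{d}_2) =
  \frac{\hat{d}_1^{\mathsf T} \hat{d}_2}
       {\lVert \hat{d}_1 \rVert \, \lVert \hat{d}_2 \rVert}
```

Folding-in leaves U_k and Sigma_k unchanged, so it is cheap but only approximate: the more documents are added this way, the further the space drifts from what a full recomputation would give.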

BugBear
From: Jerry Stuckle on
bugbear wrote:
> Jerry Stuckle wrote:
>> bugbear wrote:
>>> Jerry Stuckle wrote:
>>>> But what you're looking for is to get a computer to be a natural
>>>> language processor, which is still beyond our current programming
>>>> capabilities. IBM has recently come up with a test system
>>>> ("Watson") which does a fair job, but still has a long ways to go.
>>>> Once we get there, we'll have a Star Trek capability :)
>>>>
>>>> With that said, it doesn't mean all is hopeless. Levenshtein can
>>>> help, as can trigram matching and other things mentioned (except
>>>> SoundEx). But it will also require a lot of work on your part to
>>>> "train" the system as to whether two questions are similar or not.
>>>
>>> Surely something like concept extraction/matching
>>> (like the old Excite ICE model)
>>> would be helpful.
>>>
>>> BugBear
>>
>> It's possible, but I'm not sure it's public domain, is it? And trying
>> to generate your own concept extraction/matching module would be a
>> huge undertaking.
>
> There have been many academic versions (indeed, they came first):
>
> google for "latent semantic analysis"
> and/or
> "singular value decomposition"
>
> I think the Excite engine's novelty was an efficient
> and fairly accurate "incremental mode", where the entire
> SVD didn't have to be fully redone when a document was added to the corpus.
>
> BugBear

Have you ever used these?

Academic versions are not the same as commercial, and generally have
restrictions on their use. Also, early versions are comparatively
limited in their functionality. And significant training is still required.
