From: David Filmer
sftriman wrote:
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

Ever notice that Google does a pretty good job of that? So consider
Net::Google::Spelling:
http://search.cpan.org/~bstilwell/Net-Google-1.0.1/lib/Net/Google/Spelling.pm

--
David Filmer (http://DavidFilmer.com)
From: Joost Diepenmaat
sftriman <ironmanda(a)yahoo.com> writes:

> I am looking for a way, without defining a custom dictionary, to
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

You may find this article interesting:
http://norvig.com/spell-correct.html

You still need a list of "good" words, of course.
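
To make the point about needing a list of "good" words concrete, here is
a rough sketch in Python of the general idea: generate every string one
edit away from the input, keep only those found in a known-word list, and
pick the most frequent. It is simplified to a single edit, and the names
edits1, correct, and word_counts, as well as the toy counts, are made up
for illustration.

```python
import string


def edits1(word):
    """Return every string one edit away from word (lowercase ASCII only)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, word_counts):
    """Pick the most frequent known word within one edit of word.

    word_counts maps known "good" words to how often they occur;
    this is the word list you still have to supply yourself.
    """
    if word in word_counts:
        return word
    candidates = [w for w in edits1(word) if w in word_counts]
    if not candidates:
        return word  # nothing known nearby, so give the input back
    return max(candidates, key=lambda w: word_counts[w])


# Toy, made-up frequencies purely for demonstration.
word_counts = {"white": 120, "write": 300, "wit": 15}
print(correct("wjite", word_counts))
```

With real data the counts would come from a large body of text, and the
quality of the suggestions depends heavily on that word list.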

--
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/
From: Ben Bullock
On Apr 3, 6:47 pm, sftriman <ironma...(a)yahoo.com> wrote:
> I am looking for a way, without defining a custom dictionary, to
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

> from which I could easily pass on the dmr suggestions, but, scoring
> and evaluating the suggestions for wjite is harder. "white" and
> "write" are 'ranked' (I guess) 3rd, 4th, and 7th.

One thing that might help you rank the strings is the "Levenshtein
distance": the minimum number of single-character insertions, deletions,
and substitutions needed to turn one string into the other. I don't know
whether it is on CPAN, but there is a module here:

http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/index.html

The documentation is here:

http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/Levenshtein.html

Presumably the candidate with the smallest Levenshtein distance from the
input string is the most likely correction, although a very rare word
can also have a small distance and so rank above a more plausible one.
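
As a rough illustration of ranking by Levenshtein distance, here is a
minimal sketch in Python. It is not the interface of the module linked
above; the function names and the suggestion list are hypothetical, and
the list is not the one the original poster got back.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_suggestions(word, suggestions):
    """Sort suggestions by distance from word; ties keep their input order."""
    return sorted(suggestions, key=lambda s: levenshtein(word, s))


# Hypothetical suggestion list for the misspelling "wjite".
print(rank_suggestions("wjite", ["wit", "jute", "white", "write"]))
```

Note that "white" and "write" are both exactly one substitution away from
"wjite", so plain Levenshtein distance cannot choose between them; some
extra signal, such as word frequency, is needed to break the tie.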
From: Ted Zlatanov
On Thu, 3 Apr 2008 23:27:56 -0700 (PDT) Ben Bullock <benkasminbullock(a)gmail.com> wrote:

BB> On Apr 3, 6:47 pm, sftriman <ironma...(a)yahoo.com> wrote:
>> I am looking for a way, without defining a custom dictionary, to
>> get a list of suggested words for a misspelled word. Or better, "the"
>> most likely intended word for a misspelled word.

>> from which I could easily pass on the dmr suggestions, but, scoring
>> and evaluating the suggestions for wjite is harder. "white" and
>> "write" are 'ranked' (I guess) 3rd, 4th, and 7th.

BB> One thing that might help you rank the strings is the "Levenshtein
BB> distance": the minimum number of single-character insertions, deletions,
BB> and substitutions needed to turn one string into the other. I don't know
BB> whether it is on CPAN, but there is a module here:

BB> http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/index.html

BB> The documentation is here:

BB> http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/Levenshtein.html

BB> Presumably the candidate with the smallest Levenshtein distance from the
BB> input string is the most likely correction, although a very rare word
BB> can also have a small distance and so rank above a more plausible one.

It's useful to weight the distance by how close the keys are on the
keyboard. For example, h and j are more likely to be typed for each other
than h and r, for the white/write/wjite example above. On CPAN, I found:

String::Similarity
String::KeyboardDistance (see above)
String::Approx (very comprehensive, probably the right choice for the OP)
Text::DoubleMetaphone
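
To show what weighting by keyboard distance could look like, here is a
rough sketch in Python. It is not how any of the modules listed above
work internally. It assumes a US QWERTY layout with approximate row
offsets, and the 0.5 cost for neighbouring keys and the 1.5 cutoff are
arbitrary choices made for illustration.

```python
import math

# Approximate key positions on a US QWERTY keyboard (an assumption):
# each row is shifted right by a rough horizontal offset.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
KEY_POS = {ch: (col + offset, row)
           for row, (keys, offset) in enumerate(ROWS)
           for col, ch in enumerate(keys)}


def substitution_cost(a, b):
    """0 for a match, 0.5 for neighbouring keys, 1 for anything else."""
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa is None or pb is None:
        return 1.0  # characters not on the letter rows get full cost
    near = math.hypot(pa[0] - pb[0], pa[1] - pb[1]) < 1.5
    return 0.5 if near else 1.0


def keyboard_distance(a, b):
    """Edit distance where substituting a neighbouring key is cheaper."""
    a, b = a.lower(), b.lower()
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1.0,       # delete ca
                            curr[j - 1] + 1.0,   # insert cb
                            prev[j - 1] + substitution_cost(ca, cb)))
        prev = curr
    return prev[-1]


for candidate in ("white", "write"):
    print(candidate, keyboard_distance("wjite", candidate))
```

Under these assumptions "white" scores lower than "write" for the input
"wjite", because j sits next to h but far from r.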

Ted