From: David Filmer on 3 Apr 2008 15:29

sftriman wrote:
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

Ever notice that Google does a pretty good job of that? So consider
Net::Google::Spelling:

http://search.cpan.org/~bstilwell/Net-Google-1.0.1/lib/Net/Google/Spelling.pm

--
David Filmer (http://DavidFilmer.com)
From: Joost Diepenmaat on 3 Apr 2008 15:34

sftriman <ironmanda(a)yahoo.com> writes:
> I am looking for a way to, without custom defining a dictionary, to
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

You may find this article interesting:

http://norvig.com/spell-correct.html

You still need a list of "good" words, of course.

--
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/
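The core idea in that article can be sketched in Perl roughly as follows. This is a minimal sketch, not Norvig's own code: it generates every string one edit (delete, transpose, replace, insert) away from the input and picks the most frequent known word among them. The `%freq` table is a hypothetical stand-in for counts taken from a real corpus of "good" words, and the article's second round of edits (distance 2) is left out.

```perl
use strict;
use warnings;

# Hypothetical word counts; in practice these would come from a large
# corpus of "good" text, as the article describes.
my %freq = (white => 120, write => 90, while => 60, quite => 40);

my @letters = ('a' .. 'z');

# All strings one edit (delete, transpose, replace, insert) away from $word.
sub edits1 {
    my ($word) = @_;
    my %seen;
    for my $i (0 .. length($word)) {
        my ($l, $r) = (substr($word, 0, $i), substr($word, $i));
        if (length($r) > 0) {
            $seen{ $l . substr($r, 1) } = 1;                     # delete
            $seen{ $l . $_ . substr($r, 1) } = 1 for @letters;   # replace
        }
        if (length($r) > 1) {
            # transpose the two characters at the split point
            $seen{ $l . substr($r, 1, 1) . substr($r, 0, 1) . substr($r, 2) } = 1;
        }
        $seen{ $l . $_ . $r } = 1 for @letters;                  # insert
    }
    return keys %seen;
}

# Return the word itself if known, else the most frequent known word
# one edit away (ties broken alphabetically), else the input unchanged.
sub correct {
    my ($word) = @_;
    return $word if exists $freq{$word};
    my @cands = grep { exists $freq{$_} } edits1($word);
    return $word unless @cands;
    my ($best) = sort { $freq{$b} <=> $freq{$a} || $a cmp $b } @cands;
    return $best;
}

print correct('wjite'), "\n";
```

With this toy table, `correct('wjite')` returns `white`: both `white` and `write` are one substitution away, and `white` has the higher (made-up) count.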
From: Ben Bullock on 4 Apr 2008 02:27

On Apr 3, 6:47 pm, sftriman <ironma...(a)yahoo.com> wrote:
> I am looking for a way to, without custom defining a dictionary, to
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.
> from which I could easily pass on the dmr suggestions, but, scoring
> and evaluating the suggestions for wjite is harder. "white" and
> "write" are 'ranked' (I guess) 3rd, 4th, and 7th.

One thing which might help you rank the strings is the "Levenshtein
distance". This gives you the "difference" between two strings as a
number: the minimum number of single-character insertions, deletions
and substitutions needed to turn one into the other. I don't know if it
is on CPAN, but there is a module here:

http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/index.html

The documentation is here:

http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/Levenshtein.html

Presumably the string with the smallest Levenshtein distance from the
input string would be the most likely candidate for the spelling
checker, although some very rare words might also have small distances.
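For reference, here is a plain-Perl sketch of the standard dynamic-programming Levenshtein algorithm, used to rank a list of suggestions. It does not use or reproduce the API of the String::Levenshtein module linked above, and the suggestion list is a hypothetical example.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic Levenshtein distance: the minimum number of single-character
# insertions, deletions and substitutions that turn $s into $t.
# Keeps only the previous row of the DP table.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));          # distance from "" to each prefix of $t
    for my $i (1 .. length($s)) {
        my @cur = ($i);                    # distance from $i chars of $s to ""
        my $c = substr($s, $i - 1, 1);
        for my $j (1 .. length($t)) {
            my $cost = $c eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1,          # deletion
                           $cur[$j - 1] + 1,       # insertion
                           $prev[$j - 1] + $cost); # substitution or match
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Rank hypothetical spell-checker suggestions by distance to the input,
# falling back to alphabetical order on ties.
my $input       = 'wjite';
my @suggestions = qw(wit white write whit jute);
my @ranked = sort {
    levenshtein($input, $a) <=> levenshtein($input, $b) || $a cmp $b
} @suggestions;
print "@ranked\n";   # prints: white write jute whit wit
```

Note that `white` and `write` both come out at distance 1 from `wjite`, so distance alone cannot choose between them; here the tie simply falls back to alphabetical order.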
From: Ted Zlatanov on 4 Apr 2008 11:58
On Thu, 3 Apr 2008 23:27:56 -0700 (PDT) Ben Bullock <benkasminbullock(a)gmail.com> wrote:

BB> On Apr 3, 6:47 pm, sftriman <ironma...(a)yahoo.com> wrote:
>> I am looking for a way to, without custom defining a dictionary, to
>> get a list of suggested words for a misspelled word. Or better, "the"
>> most likely intended word for a misspelled word.
>> from which I could easily pass on the dmr suggestions, but, scoring
>> and evaluating the suggestions for wjite is harder. "white" and
>> "write" are 'ranked' (I guess) 3rd, 4th, and 7th.

BB> One thing which might help you rank the strings is the "Levenshtein
BB> distance". This gives you the "difference" between two strings as a
BB> number. I don't know if it is on CPAN but there is a module found
BB> here:

BB> http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/index.html

BB> The documentation is here:

BB> http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/Levenshtein.html

BB> Presumably the string with the smallest Levenshtein distance from the
BB> input string would be the most likely candidate for the spelling
BB> checker, although some very rare words might have small distances.

It's useful to weight the distance by how close the keys are on the
keyboard. In the white/write/wjite example above, j sits right next to h
but far from r, so "wjite" is more likely a mistyped "white" than a
mistyped "write".

On CPAN, I found:

  String::Similarity
  String::KeyboardDistance (see above)
  String::Approx (very comprehensive, probably the right choice for the OP)
  Text::DoubleMetaphone

Ted
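One way to sketch the keyboard idea is to make substitutions between neighboring keys cheaper inside the Levenshtein computation. This is an illustrative assumption, not the algorithm or API of String::KeyboardDistance: key positions come from an approximate staggered QWERTY layout (lowercase letters only), and both the 1.5 key-width neighbor threshold and the 0.5 cost for neighboring keys are arbitrary choices.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Approximate QWERTY key centres as [row, column], with each row shifted
# right a little as on a physical keyboard. Assumed layout, lowercase only.
my %pos;
my @rows    = ('qwertyuiop', 'asdfghjkl', 'zxcvbnm');
my @offsets = (0, 0.25, 0.75);
for my $r (0 .. $#rows) {
    my @keys = split //, $rows[$r];
    $pos{ $keys[$_] } = [ $r, $_ + $offsets[$r] ] for 0 .. $#keys;
}

# Substitution cost: 0 for the same character, 0.5 for neighboring keys
# (centres less than 1.5 key-widths apart), 1 otherwise. These weights
# are illustrative assumptions, not taken from any CPAN module.
sub sub_cost {
    my ($x, $y) = @_;
    return 0 if $x eq $y;
    my ($p, $q) = @pos{$x, $y};
    return 1 unless $p && $q;              # not a mapped letter
    my $d = sqrt(($p->[0] - $q->[0])**2 + ($p->[1] - $q->[1])**2);
    return $d < 1.5 ? 0.5 : 1;
}

# Levenshtein distance where substitutions use the keyboard-weighted cost;
# insertions and deletions still cost 1.
sub keyboard_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $sub = sub_cost(substr($s, $i - 1, 1), substr($t, $j - 1, 1));
            push @cur, min($prev[$j] + 1,          # deletion
                           $cur[$j - 1] + 1,       # insertion
                           $prev[$j - 1] + $sub);  # weighted substitution
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my @ranked = sort {
    keyboard_distance('wjite', $a) <=> keyboard_distance('wjite', $b)
} qw(write white);
print "@ranked\n";   # prints: white write
```

With these weights `wjite` is 0.5 from `white` (j and h are neighbors) and 1 from `write` (j and r are not), which breaks the tie that plain Levenshtein distance leaves.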