From: Bruno Barberi Gnecco on 8 Apr 2008 17:05

I need to implement an automatic text tagging system. Any suggestions of algorithms? I've used Bayesian classification with great success when the categories are fixed and in small number, but in the case of tags I believe it won't work very well (too few items per tag to train well). I'm also looking for something more sophisticated than simply finding tags in text.

Any pointers to papers, books, or code are appreciated. Thanks a lot.

From: amado.alves on 9 Apr 2008 10:20

On Apr 8, 10:05 pm, Bruno Barberi Gnecco <brunobgDELETET...(a)users.sourceforge.net> wrote:
> I need to implement an automatic text tagging system. Any suggestions
> of algorithms? I've used Bayesian classification with great success when the
> categories are fixed and in small number, but in the case of tags I believe
> it won't work very well (too few items per tag to train well). I'm also looking
> for something more sophisticated than simply finding tags in text.
>
> Any pointers to papers, books or code is appreciated. Thanks a lot.

You mean part-of-speech tags (Noun, Verb, etc.)? But these *are* a "fixed and small number of categories", aren't they?

For a small training set, a very successful technique is to take the context into account, namely the few words to the left and to the right of the word being tagged. Work with probabilities. In an enlarged context there are often choices with probability 1 (e.g. the words "the", "at"). These "ground" choices help choose the others.
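
The context idea in the post above can be sketched roughly as follows. This is only an illustration under stated assumptions: the training sentences, the tag names, the add-one smoothing, and the function names (`p_trans`, `tag_sentence`) are all hypothetical and do not come from the thread. Words seen with a single tag are "grounded" first, and each remaining word then takes the tag that best fits its already-tagged left and right neighbours.

```python
from collections import Counter, defaultdict

# Tiny hand-made training set (hypothetical data, for illustration only).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("can", "VERB"), ("run", "VERB")],
    [("the", "DET"), ("can", "NOUN"), ("is", "VERB"),
     ("at", "PREP"), ("the", "DET"), ("door", "NOUN")],
    [("dogs", "NOUN"), ("run", "VERB"), ("at", "PREP"),
     ("the", "DET"), ("park", "NOUN")],
]

word_tags = defaultdict(Counter)  # word -> counts of tags seen for it
trans = defaultdict(Counter)      # previous tag -> counts of following tags
for sent in TRAIN:
    prev = "<s>"                  # sentence-start marker
    for word, tag in sent:
        word_tags[word][tag] += 1
        trans[prev][tag] += 1
        prev = tag
TAGS = sorted({t for counts in word_tags.values() for t in counts})


def p_trans(prev, tag):
    """P(tag | previous tag), with add-one smoothing (an assumption)."""
    total = sum(trans[prev].values())
    return (trans[prev][tag] + 1) / (total + len(TAGS))


def tag_sentence(words):
    # Pass 1: "ground" every word that was only ever seen with one tag.
    tags = [next(iter(word_tags[w])) if len(word_tags.get(w, ())) == 1 else None
            for w in words]
    # Pass 2: resolve the rest, left to right, using tagged neighbours.
    for i, w in enumerate(words):
        if tags[i] is not None:
            continue
        seen = word_tags.get(w)
        total = sum(seen.values()) if seen else 0
        best_tag, best_score = None, -1.0
        for t in TAGS:
            # Word evidence: P(tag | word); uniform for unknown words.
            score = seen[t] / total if seen else 1 / len(TAGS)
            left = "<s>" if i == 0 else tags[i - 1]
            if left is not None:
                score *= p_trans(left, t)
            right = tags[i + 1] if i + 1 < len(words) else None
            if right is not None:
                score *= p_trans(t, right)
            if score > best_score:
                best_tag, best_score = t, score
        tags[i] = best_tag
    return tags


print(tag_sentence(["the", "can"]))
print(tag_sentence(["dogs", "can", "run"]))
```

Here the ambiguous word "can" is resolved differently in the two sentences purely because of its grounded neighbours. A real tagger would need a much larger corpus and a proper sequence model.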

From: Bruno Barberi Gnecco on 11 Apr 2008 09:46

> You mean Part-Of-Speech tags (Noun, Verb, etc.)?
>
> But these *are* a "fixed and small number of categories", are not
> they?
>
> For a small training set a very successful technique is to take into
> account the context, namely the few words to the left and to the right
> of the word under tagging. Work with probabilities. In an enlarged
> context many times there are choices with probability 1 (e.g. words
> "the", "at"). These "ground" choices help chose the others.

No, I mean tags as they're used on many websites nowadays, describing what the text is about. For example, this message could be tagged "text mining, tag, probabilities".

From: Chris on 12 Apr 2008 16:27

Bruno Barberi Gnecco wrote:
> No, I mean tags as they're used in many websites nowadays,
> describing what the text is about. For example, this message could be
> tagged "text mining, tag, probabilities".

It's not quite ready for prime time, but take a look at http://openpipeline.org. The code will be ready for release in a week or two. It's not a solution to the problem, but it is a nice framework for plugging in a solution.

Try googling "entity extraction" for useful links.

From: amado.alves on 18 Apr 2008 11:30

> > You mean Part-Of-Speech tags (Noun, Verb, etc.)?
>
> No, I mean tags as they're used in many websites nowadays,
> describing what the text is about. For example, this message could be
> tagged "text mining, tag, probabilities".

I see. Spreading Activation is my favorite technique. Take a look at the paper "Eigensearching the web..."

Also, for "something more sophisticated than simply finding tags in text," consider a postcoordination approach: don't tag; search.
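
For readers unfamiliar with spreading activation, here is a minimal sketch of the general idea applied to tag suggestion. It is not the method of the paper mentioned above. The association graph, its weights, the decay factor, and the function names (`spread`, `suggest_tags`) are all made up for illustration. Words found in the text start with activation 1.0, activation flows along weighted links for a few steps, and the candidate tags that collect the most activation are suggested.

```python
from collections import defaultdict

# Hypothetical association graph: term -> {related term or tag: link weight}.
# In practice this might be derived from co-occurrence in already-tagged texts.
GRAPH = {
    "bayesian": {"classification": 0.8, "probabilities": 0.7},
    "classification": {"text mining": 0.6},
    "tagging": {"tag": 0.9, "text mining": 0.5},
}


def spread(seeds, graph, decay=0.5, steps=2):
    """Propagate activation from the seed nodes for a fixed number of steps."""
    activation = defaultdict(float)
    for node, amount in seeds.items():
        activation[node] += amount
    frontier = dict(seeds)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, amount in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                # Each hop is weakened by the link weight and the decay factor.
                nxt[neighbour] += amount * weight * decay
        for node, amount in nxt.items():
            activation[node] += amount
        frontier = nxt
    return dict(activation)


def suggest_tags(text, graph, candidate_tags, top=3):
    """Rank candidate tags by the activation they receive from the text."""
    words = set(text.lower().split())
    seeds = {w: 1.0 for w in words if w in graph}
    activation = spread(seeds, graph)
    reached = [t for t in candidate_tags if activation.get(t, 0.0) > 0.0]
    return sorted(reached, key=lambda t: (-activation[t], t))[:top]


tags = suggest_tags(
    "Bayesian classification for tagging",
    GRAPH,
    candidate_tags=["text mining", "tag", "probabilities"],
)
print(tags)
```

Note that a tag such as "text mining" can be suggested even though it never appears in the text, which is what distinguishes this from simply finding tags in text.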