From: Rene Veerman on
I've browsed Wikipedia, sf.net and Google for code & papers on what is
commonly known as NLP.

I haven't found thesaurus software for native PHP/MySQL. WordNet, which
is apparently the leader, provides OS-native apps and "db files"
without db structure and not in any SQL format (looks like CSV without
the commas, but I'm not sure yet).
When I asked Princeton staff about SQL releases they simply replied
"we dont do sql here", which I find a bit strange.
The easiest thing for me to do is write a conversion script that puts
their "db files" into MySQL, and work from there.
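
A minimal sketch of what such a conversion script could look like. It is
written in Python rather than PHP to keep it short, and it uses the
standard-library sqlite3 module as a stand-in for MySQL. It assumes the line
layout described in WordNet's wndb file-format documentation (offset, lexfile
number, type, hex word count, word/lex_id pairs, decimal pointer count,
4-field pointers, then "|" and the gloss). The table names, function names
and the SAMPLE line are made up for illustration, and verb frame fields are
simply ignored.

```python
import sqlite3

# Illustrative line in the data.noun layout (an assumption, not read from disk).
SAMPLE = ("00001740 03 n 01 entity 0 003 ~ 00001930 n 0000 "
          "~ 00002137 n 0000 ~ 04424418 n 0000 | that which is perceived "
          "or known or inferred to have its own distinct existence")


def parse_data_line(line):
    """Split one synset line into (offset, pos, gloss, words, pointers)."""
    body, _, gloss = line.partition("|")
    f = body.split()
    offset, pos = f[0], f[2]
    w_cnt = int(f[3], 16)  # word count is a hexadecimal field
    # Words sit at every second field; underscores stand for spaces.
    words = [f[4 + 2 * i].replace("_", " ") for i in range(w_cnt)]
    p = 4 + 2 * w_cnt
    p_cnt = int(f[p])  # pointer count is a decimal field
    # Each pointer has 4 fields: symbol, target offset, target pos, source/target.
    pointers = [(f[p + 1 + 4 * i], f[p + 2 + 4 * i], f[p + 3 + 4 * i])
                for i in range(p_cnt)]
    return offset, pos, gloss.strip(), words, pointers


def load(conn, lines):
    """Create three hypothetical tables and fill them from data-file lines."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS synset (offset TEXT, pos TEXT, gloss TEXT);
        CREATE TABLE IF NOT EXISTS word (offset TEXT, pos TEXT, word TEXT);
        CREATE TABLE IF NOT EXISTS pointer (src_offset TEXT, src_pos TEXT,
            symbol TEXT, dst_offset TEXT, dst_pos TEXT);
    """)
    for line in lines:
        # The license header lines start with spaces; skip them and blanks.
        if not line.strip() or line.startswith(" "):
            continue
        offset, pos, gloss, words, pointers = parse_data_line(line)
        conn.execute("INSERT INTO synset VALUES (?, ?, ?)", (offset, pos, gloss))
        conn.executemany("INSERT INTO word VALUES (?, ?, ?)",
                         [(offset, pos, w) for w in words])
        conn.executemany("INSERT INTO pointer VALUES (?, ?, ?, ?, ?)",
                         [(offset, pos, s, d, dp) for s, d, dp in pointers])
    conn.commit()
```

To try it, something like load(sqlite3.connect(":memory:"), open("data.noun"))
should do, assuming a data.noun file from a WordNet download is in the current
directory. The same splitting logic ports to PHP with explode() and hexdec().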

My search on sf.net turned up empty too; all of the projects with
relevant descriptions have just the name registered, with no code releases.

From reading http://www.go4expert.com/forums/showthread.php?t=35,
"Introduction to Natural Language Processing(NLP)", I gather that NLP
as it stands suffers from a lot of ambiguity on several levels of its
operation.

It's an interesting problem though, and probably a profitable one, so
I'm going to spend some time trying to come up with something better
from scratch.

On Sun, Mar 14, 2010 at 12:04 AM, Rene Veerman <rene7705(a)gmail.com> wrote:
> Hi..
>
> I'm building a news scraper -> portal.
> Fetching, parsing and storing many links to news items per hour was
> not much of a problem.
> Translations between languages can be done via Google, so that won't
> be much of a problem either, I suspect.
>
> I don't want to reveal too much of my business idea, but I do need to
> do text analysis, to group related items and make "suggestions"
> lists.
> I've had a dabble with creating my own ontology structure (kind of
> like a dictionary + thesaurus datamodel) by scraping existing ontology
> websites, but needless to say natural-text analysis is a huge field,
> one that I'm a total noob in.
>
> So in the first place, I'm looking for any useful existing free or paid
> data-mining / text-analysis code that can be run easily from PHP.
> TBH I don't even know my feature requirements yet; I'm interested to
> know what's available.
>
> In the second place, I'm looking for free or paid
> data-mining / text-analysis papers/books that explain how to produce
> useful results.
>
> Thanks for your input.
>
From: Rene Veerman on
Thanks for the links..

But I think I'll keep at it on my own. I may be interested in setting up a
competitor to the companies you linked to.
I've built a nice datamodel today, which I think will return even
better results than Zemanta.

But what do you mean by "linked data", Nathan?

On Wed, Mar 17, 2010 at 4:10 PM, Nathan Rixham <nrixham(a)gmail.com> wrote:
> I wouldn't dive right into full-on NLP for this ;) it's pretty easy
> to do term/semantic extraction nowadays.
>
> have you seen OpenCalais, Alchemy, Zemanta, Yahoo Term Extraction or the
> like?
>
> honestly, I've been doing this for years and would recommend hooking up
> to the OpenCalais and Zemanta APIs. Should you muddle your way towards
> linked data in any way from there, give me a shout and I'll give you some
> pointers. There are already clients for PHP, as well as for the usual
> CMSs like Drupal, WordPress etc :)
>
> regards!
>
> ps: if you really want to get into this kind of thing then
> http://gate.ac.uk/ is a good starting (and ending) point
>