From: Ashley Sheridan on
On Thu, 2009-09-03 at 12:12 -0700, sono-io(a)fannullone.us wrote:
> Thanks to everyone who has responded. After reading everyone's
> response, I think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item #
> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
> out all non-alphanumeric characters to make it 4D24487PS, add the
> wildcard character between each of the remaining characters like so,
> 4*D*2*4*4*8*7*P*S, and then do the search.
>
> Still being new at this, it seems to be the simplest approach, or is
> my thinking flawed? This also keeps me from having to add another
> field in the db to search on.
>
> BTW, this solution needs to work with any db, even ASCII files, so it
> has to happen in PHP.
>
> Thanks again,
> Frank
>
For speed you might want to consider an extra field in the DB in the
future. If the database gets larger, or your query needs to join several
tables together, then things will take a noticeable speed hit. I had a
similar issue myself where I had to search for names based on
mis-spellings of them. In the end I searched with metaphone tags on an
extra field in the DB set up for that purpose, but it was the only way
to do it that didn't affect the speed of the site.
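
A minimal sketch of the extra-field idea, assuming PHP's built-in metaphone() is used to compute the key. The helper name nameKey and the column name name_key are hypothetical, not taken from the original setup:

```php
<?php
// Hypothetical helper: computes the value that would be stored in the
// extra column (assumed here to be called name_key) when a row is saved,
// and again for whatever the user types into the search box.
function nameKey(string $name): string
{
    // metaphone() returns a phonetic key, so similar-sounding
    // spellings tend to collapse to the same string
    return metaphone(trim($name));
}

// At search time the two keys are compared instead of the raw names,
// for example with a query like (names are assumptions):
//   SELECT * FROM people WHERE name_key = ?
$searchKey = nameKey('Steven');
```

Because the key is computed once per row and stored, the search itself becomes a plain comparison on one column that can be indexed.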

Thanks,
Ash
http://www.ashleysheridan.co.uk



From: Eddie Drapkin on
On Thu, Sep 3, 2009 at 3:17 PM, Ashley Sheridan<ash(a)ashleysheridan.co.uk> wrote:
> On Thu, 2009-09-03 at 12:12 -0700, sono-io(a)fannullone.us wrote:
>>       Thanks to everyone who has responded.  After reading everyone's
>> response, I think I have a very simple way to solve my "problem".
>>
>>       Using my original example, if someone wants to find item #
>> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
>> out all non-alphanumeric characters to make it 4D24487PS, add the
>> wildcard character between each of the remaining characters like so,
>> 4*D*2*4*4*8*7*P*S, and then do the search.
>>
>>       Still being new at this, it seems to be the simplest approach, or is
>> my thinking flawed?  This also keeps me from having to add another
>> field in the db to search on.
>>
>>       BTW, this solution needs to work with any db, even ASCII files, so it
>> has to happen in PHP.
>>
>> Thanks again,
>> Frank
>>
> For speed you might want to consider an extra field in the DB in the
> future. If the database gets larger, or your query needs to join several
> tables together, then things will take a noticeable speed hit. I had a
> similar issue myself where I had to search for names based on
> mis-spellings of them. In the end I searched with metaphone tags on an
> extra field in the DB set up for that purpose, but it was the only way
> to do it that didn't affect the speed of the site.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

Has anyone considered deploying an actual search engine (Solr, Sphinx,
etc.), as they will take care of the stripping, stemming, spelling
corrections, etc?

From: Tommy Pham on
----- Original Message ----
> From: "sono-io(a)fannullone.us" <sono-io(a)fannullone.us>
> To: PHP General List <php-general(a)lists.php.net>
> Sent: Thursday, September 3, 2009 12:12:40 PM
> Subject: Re: [PHP] Searching on AlphaNumeric Content Only
>
> Thanks to everyone who has responded. After reading everyone's response, I
> think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item # 4D-2448-7PS, no
> matter what they type in, I'll take the input, strip out all non-alphanumeric
> characters to make it 4D24487PS, add the wildcard character between each of the
> remaining characters like so, 4*D*2*4*4*8*7*P*S, and then do the search.

The correct wildcard to use with LIKE in any DB (Oracle, MySQL, MSSQL, etc.) is % and not *, if I remember correctly. Searching like this is OK, but it won't be efficient when you have a lot of rows.

As for processing external files (txt, csv, etc.), I recommend you create a separate mechanism for it, since each storage medium is meant for a different purpose. Text files (both delimited and fixed-format) and CSV are usually meant for importing and exporting data between different RDBMS types and different companies. They're not meant for fast searching of data. I suggest you first think about the amount of data you have to deal with and how often that data will be searched. If you do have to work with ASCII files, it's probably easier and faster just to import them into the db and do your search there.
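
A rough sketch of the approach from the quoted message, using % instead of *. The function name likePattern and the table and column names in the comment are assumptions made for illustration:

```php
<?php
// Hypothetical helper: turns whatever the user typed into a pattern
// for SQL LIKE, with % between each remaining character.
function likePattern(string $input): string
{
    // Keep only letters and digits, e.g. "4D-2448-7PS" -> "4D24487PS"
    $clean = preg_replace('/[^A-Za-z0-9]/', '', $input);
    if ($clean === '') {
        return '';   // nothing left to search for; the caller should skip the query
    }
    // "4D24487PS" -> "4%D%2%4%4%8%7%P%S"
    return implode('%', str_split($clean));
}

// The pattern would then be bound to a prepared statement such as
// (table and column names are assumptions):
//   SELECT * FROM items WHERE item_no LIKE ?
$pattern = likePattern('4D-2448-7PS');
```

Binding the pattern as a parameter, rather than pasting it into the SQL string, keeps user input out of the query text. Whether LIKE ignores letter case depends on the database and its collation.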

As for adding another field to the db, perhaps your project has only just started? If so, wouldn't it be better to design it with the future in mind, so that you won't have to go back later to redesign the db and modify the code because you now have over 100k rows to search and a search occurs on just about every other hit? The time you'd otherwise spend on that could be used for optimizing the code for better performance, adding more features to the site, etc... :) Trust me, searching a db table with over 200k rows and returning results from multi-table joins based on one criterion isn't fun. Keep in mind that you shouldn't keep users waiting more than 5 seconds. The only exception to that rule is data mining, where you'll have millions of rows to work with ;) Then it's no longer your problem. It's the DBA's :D

Regards,
Tommy

>
> Still being new at this, it seems to be the simplest approach, or is my
> thinking flawed? This also keeps me from having to add another field in the db
> to search on.
>
> BTW, this solution needs to work with any db, even ASCII files, so it has to
> happen in PHP.
>
> Thanks again,
> Frank
>

From: Paul M Foster on
On Thu, Sep 03, 2009 at 12:12:40PM -0700, sono-io(a)fannullone.us wrote:

> Thanks to everyone who has responded. After reading everyone's
> response, I think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item #
> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
> out all non-alphanumeric characters to make it 4D24487PS, add the
> wildcard character between each of the remaining characters like so,
> 4*D*2*4*4*8*7*P*S, and then do the search.

Your expression, if used to directly search in your SQL table, won't
work. The '*' character isn't a valid wildcard for SQL. In PostgreSQL,
the wildcard for any number of characters is '%', and for a single
character is '_'. Both belong to the standard SQL LIKE operator, so
MySQL and Oracle should accept them as well.

As others have mentioned, it would be ideal (though not very
"normalized") to create a new table column which contains the
alphanumerics without the punctuation characters ('-'). In nearly any
SQL dialect, you could do a simple SELECT using LIKE to find your item,
if you're searching on this extra field.
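
One possible way to fill and query such a column, as a sketch only. The helper name itemKey, the column name item_key and the table name items are all assumptions:

```php
<?php
// Hypothetical helper: reduces an item number to the form that would be
// stored in the extra column (called item_key here, an assumed name).
function itemKey(string $itemNumber): string
{
    // Drop everything except letters and digits, then unify the case
    return strtoupper(preg_replace('/[^A-Za-z0-9]/', '', $itemNumber));
}

// The same function is applied to the user's input before it is bound
// to a query such as (table and column names are assumptions):
//   SELECT * FROM items WHERE item_key LIKE ?
$param = itemKey('4d 2448-7ps') . '%';   // prefix match on the clean key
```

Running both the stored value and the search input through the same function is what makes the comparison reliable, since both sides are then in the same form.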

If you want to do the searching in PHP, then it becomes more complicated.
You'll have to strip out the dashes from the user input, and then query
all the keys from your table, and test them using a regular expression.
As mentioned before, this is time-consuming for a large table.
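
The PHP-side test could look roughly like this, assuming the keys have already been fetched into an array (or read from an ASCII file). The function name findItems is hypothetical. The regular expression allows any run of punctuation between the characters the user typed, and ignores letter case:

```php
<?php
// Hypothetical helper: returns the keys that match the user's input
// once punctuation and letter case are ignored.
function findItems(string $input, array $keys): array
{
    // Strip dashes, spaces and other punctuation from the input
    $clean = preg_replace('/[^A-Za-z0-9]/', '', $input);
    if ($clean === '') {
        return [];   // nothing to search for
    }
    // Any run of non-alphanumeric characters may sit between (or around)
    // the characters that were typed
    $gap = '[^A-Za-z0-9]*';
    $regex = '/^' . $gap . implode($gap, str_split($clean)) . $gap . '$/i';

    // preg_grep() keeps the original array keys, so reindex the result
    return array_values(preg_grep($regex, $keys));
}

$found = findItems('4d24487ps', ['4D-2448-7PS', '4D-2449-7PS', 'AB-1']);
```

Every key is tested on every search, so the cost grows with the size of the table or file.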

Here's something else to consider: Could there ever be two items which
only differ by the placement of their dashes? Like 4D-2448-7PS versus
4D2-44-87PS? If not, then you should store the item number without
punctuation, and use that as the primary key on your table. Have an
"extra" field which shows the item number with dashes. You can use this
extra field in printing inventory labels or whatever (I don't recall the
context of your original post).

Paul

--
Paul M. Foster

From: Andrea Giammarchi on

stripping, stemming, spelling corrections ?
... uhm, that's probably why they invented regular expressions, isn't it?

As I said, at the end of the day, this will be a slow, manual, potentially wrong implementation of what we already have and use on a daily basis.

But obviously, everybody is free to create his own problems, no doubt about that.

Regards

> Has anyone considered deploying an actual search engine (Solr, Sphinx,
> etc.), as they will take care of the stripping, stemming, spelling
> corrections, etc?

