From: Shannon Jacobs on
Dealing with an array of fixed length strings. Goal is to select based
on certain columns. After rather lengthy study of the camel book and
searching on the web for various examples, I thought this should work:

X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);

It did not. I consulted with a heavy Perler, and after a few minutes
of wrestling with the problem, he suggested something like this (as I
tinkered it into working):

@foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);

My idea in the broken example was to ignore the first 50 and last 6
characters in each line, which was supposed to leave only the 12
characters in the middle to search against. My fuzzy understanding of
the working version is that I first had to match the entire thing, and
then let Perl fish for candidate matches by truncating down towards
50?
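
A minimal sketch of why the fixed counts can fail, assuming each record is 68 characters long (50 + 12 + 6), which the description implies but does not state. With `^` and `$` in place, `.{50}` and `.{6}` pin the four-digit code to offset 50 and the whole line to exactly 60 characters; ranges let the code sit anywhere inside the 12-character field. The sample record and the `{50,58}` / `{6,14}` bounds are illustrative assumptions, not taken from the real data.

```perl
use strict;
use warnings;

# Hypothetical 68-character record: 50 filler + 12-character field + 6 filler.
my $record = ('a' x 50) . 'xx1217xxxxxx' . ('z' x 6);

# Fixed counts: matches only if the code starts at offset 50
# and exactly 6 characters follow it (60 characters in total).
my $fixed = $record =~ /^.{50}(1121|1217|1256|2033).{6}$/ ? 1 : 0;

# Ranges sized to the 12-character field: the code may start at
# offsets 50 through 58, leaving 6 to 14 characters after it.
my $ranged = $record =~ /^.{50,58}(1121|1217|1256|2033).{6,14}$/ ? 1 : 0;

print "fixed=$fixed ranged=$ranged\n";
```

On this sample the fixed-count pattern fails and the ranged one succeeds, because the code begins at offset 52 rather than 50.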

The examples above are slightly simplified for purposes of
explanation. Here is the actual code, just in case I did something
wrong in the tweaking:

@foo2 = grep(/^.{50,62}($form_values{'a_SEARCH_VALUE'}).{6,18}$/,@foo1);

From: Jürgen Exner on
Shannon Jacobs wrote:
> Dealing with an array of fixed length strings. Goal is to select based
> on certain columns. After rather lengthy study of the camel book and
> searching on the web for various examples, I thought this should work:
>
> X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);
>
> It did not. I consulted with a heavy Perler, and after a few minutes
> of wrestling with the problem, he suggested something like this (as I
> tinkered it into working):
>
> @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);

Ouch! That hurts!
When dealing with fixed length formats then REs are certainly not the tool
of choice.
One much better alternative: substr()
The other commonly used alternative: pack()/unpack()
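
A rough sketch of the unpack() route, assuming the field of interest is the 12 characters starting at offset 50; the sample records are made up for illustration. In the template, `x50` skips 50 bytes and `A12` extracts the next 12 characters.

```perl
use strict;
use warnings;

# Hypothetical records: 50 filler + 12-character field + 6 filler.
my @foo1 = (
    ('a' x 50) . '001217000000' . ('z' x 6),
    ('a' x 50) . '009999000000' . ('z' x 6),
);

# Pull out the 12-character field, then search only that field.
my @foo2 = grep {
    my ($field) = unpack 'x50 A12', $_;
    $field =~ /1121|1217|1256|2033/;
} @foo1;

print scalar(@foo2), " record(s) selected\n";
```

Only the first sample record contains one of the wanted codes in its field, so one record is selected.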

jue


From: Shannon Jacobs on
On Feb 11, 11:00 am, "Jürgen Exner" <jurge...(a)hotmail.com> wrote:
> Shannon Jacobs wrote:
> > Dealing with an array of fixed length strings. Goal is to select based
> > on certain columns. After rather lengthy study of the camel book and
> > searching on the web for various examples, I thought this should work:
>
> > X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);
>
> > It did not. I consulted with a heavy Perler, and after a few minutes
> > of wrestling with the problem, he suggested something like this (as I
> > tinkered it into working):
>
> > @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);
>
> Ouch! That hurts!
> When dealing with fixed length formats then REs are certainly not the tool
> of choice.
> One much better alternative: substr()
> The other commonly used alternative: pack()/unpack()

It was not my intention to cause you any pain, but that's not the
question I asked, though I suppose it's good to rethink problems in
terms of the objectives. Actually, in another part of the program I do
use substrings to massage things where a more linear approach seemed
more suitable. I vaguely remember considering unpack() long ago (the
code has evolved over a period of about 10 years), but decided against
it for some reason. I didn't need pack() since this is actually a
backend query program, and there are limitations in the programs that
are exporting the data. (And yes, the Perler with whom I discussed the
problem did suggest alternatives including substrings.)

I'd still like to understand why this regular expression works as it
does. Or perhaps you should clarify your intended sense of "painful"?
As it is, I'm content with how well the code works. It seems like an
adequate amount of search bang for the small regex buck.

From: Uri Guttman on
>>>>> "SJ" == Shannon Jacobs <Shannon.Jacobs.nospam(a)gmail.com> writes:

SJ> Dealing with an array of fixed length strings. Goal is to select based
SJ> on certain columns. After rather lengthy study of the camel book and
SJ> searching on the web for various examples, I thought this should work:

SJ> X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);

SJ> It did not. I consulted with a heavy Perler, and after a few minutes
SJ> of wrestling with the problem, he suggested something like this (as I
SJ> tinkered it into working):

SJ> @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);

you should show some sample data as well so we can see what you are
matching. as jurgen said that is painful to read. even good perl hackers
will have trouble deciphering it quickly and that means it is not good
perl IMO.

also this line has $1121 and the previous one didn't have the $ so i am
not sure which is correct.


SJ> My idea in the broken example was to ignore the first 50 and last 6
SJ> characters in each line, which was supposed to leave only the 12
SJ> characters in the middle to search against. My fuzzy understanding of
SJ> the working version is that I first had to match the entire thing, and
SJ> then let Perl fish for candidate matches by truncating down towards
SJ> 50?

no need to ignore the last 6 chars as that won't affect the match unless
some lines were of different lengths.


SJ> The examples above are slightly simplified for purposes of
SJ> explanation. Here is the actual code, just in case I did something
SJ> wrong in the tweaking:

SJ> @foo2 = grep(/^.{50,62}($form_values{'a_SEARCH_VALUE'}).{6,18}$/,@foo1);

that doesn't seem to be a fixed offset value. the initial skip is from
50-62 chars. if the search value can't appear in that, why not just
grep for that? is the search value something with alternation as the
above lines suggest? then a faster thing might be to grab the part you
want and look it up in a hash of wanted values. alternation can be very
slow especially with many choices (due to backtracking).

in fact as you have been told, substr and a hash lookup might be the
perfect thing for this (but i am not sure since the leading skip can
vary in size). again, showing some real data would help as we could see
what variants there are, what the searched for parts look like (and if
they are not found earlier in the string), etc.
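
a minimal sketch of the substr plus hash lookup idea, assuming (this is an assumption, the real layout has not been shown) that the code is exactly 4 characters at a fixed offset of 50. the %wanted hash and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Wanted codes as hash keys, so each test is a single lookup.
my %wanted = map { $_ => 1 } qw(1121 1217 1256 2033);

# Hypothetical records: 50 filler + 4-character code + 14 filler.
my @foo1 = (
    ('a' x 50) . '1256' . ('z' x 14),
    ('a' x 50) . '9999' . ('z' x 14),
    ('a' x 50) . '2033' . ('z' x 14),
);

# Keep a record only if its code field is one of the wanted values.
my @foo2 = grep { $wanted{ substr($_, 50, 4) } } @foo1;

print scalar(@foo2), " record(s) selected\n";
```

this only works if the code cannot float inside a wider field. if it can, the hash lookup needs a known offset first.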

uri

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

From: John W. Krahn on
Shannon Jacobs wrote:
> Dealing with an array of fixed length strings. Goal is to select based
> on certain columns. After rather lengthy study of the camel book and
> searching on the web for various examples, I thought this should work:
>
> X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);
>
> It did not. I consulted with a heavy Perler, and after a few minutes
> of wrestling with the problem, he suggested something like this (as I
> tinkered it into working):
>
> @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);
>
> My idea in the broken example was to ignore the first 50 and last 6
> characters in each line,

@foo2 = grep substr( $_, 50, -6 ) =~ /1121|1217|1256|2033/, @foo1;

> which was supposed to leave only the 12
> characters in the middle to search against.

@foo2 = grep substr( $_, 50, 12 ) =~ /1121|1217|1256|2033/, @foo1;




John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall