|
From: Shannon Jacobs on 10 Feb 2007 20:21

Dealing with an array of fixed length strings. Goal is to select based on certain columns. After rather lengthy study of the camel book and searching on the web for various examples, I thought this should work:

X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);

It did not. I consulted with a heavy Perler, and after a few minutes of wrestling with the problem, he suggested something like this (as I tinkered it into working):

@foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);

My idea in the broken example was to ignore the first 50 and last 6 characters in each line, which was supposed to leave only the 12 characters in the middle to search against. My fuzzy understanding of the working version is that I first had to match the entire thing, and then let Perl fish for candidate matches by truncating down towards 50?

The examples above are slightly simplified for purposes of explanation. Here is the actual code, just in case I did something wrong in the tweaking:

@foo2 = grep(/^.{50,62}($form_values{'a_SEARCH_VALUE'}).{6,18}$/,@foo1);
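
The difference between the two patterns can be sketched with invented data. The records below are assumed to be 68 characters long (50 leading characters, a 12-character field, 6 trailing characters), and make_record is a hypothetical helper, not part of the original program. The stray `$` before 1121 is left out here, since inside a pattern Perl would interpolate `$1121` as a variable. With data like this, the fixed-count pattern accounts for exactly 50 + 4 + 6 = 60 characters, so it cannot match a 68-character line, while the ranged pattern lets the code sit anywhere in the field.

```perl
use strict;
use warnings;

# Hypothetical helper: build a 68-character record with $code placed
# $offset characters into the 12-character field (columns 50-61).
sub make_record {
    my ($code, $offset) = @_;
    my $field = '.' x 12;
    substr($field, $offset, length $code) = $code;
    return ('a' x 50) . $field . ('z' x 6);
}

my @foo1 = (
    make_record('1217', 0),   # code at the start of the field
    make_record('1256', 5),   # code in the middle of the field
    make_record('2033', 8),   # code at the end of the field
    make_record('9999', 3),   # a code that is not wanted
);

# Fixed counts: 50 + 4 + 6 = 60 characters, so no 68-character line matches.
my @fixed = grep { /^.{50}(1121|1217|1256|2033).{6}$/ } @foo1;

# Ranged counts: on a 68-character line the prefix and suffix must add up
# to 64, so the prefix can be 50-58 while the suffix shrinks from 14 to 6.
my @ranged = grep { /^.{50,62}(1121|1217|1256|2033).{6,18}$/ } @foo1;

printf "fixed: %d, ranged: %d\n", scalar @fixed, scalar @ranged;
```

Perl's quantifiers are greedy, so `.{50,62}` first takes as many characters as it is allowed and then gives them back one at a time until the rest of the pattern fits, which is roughly the "truncating down towards 50" behavior described above.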
From: Jürgen Exner on 10 Feb 2007 21:00

Shannon Jacobs wrote:
> Dealing with an array of fixed length strings. Goal is to select based
> on certain columns. After rather lengthy study of the camel book and
> searching on the web for various examples, I thought this should work:
>
> X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);
>
> It did not. I consulted with a heavy Perler, and after a few minutes
> of wrestling with the problem, he suggested something like this (as I
> tinkered it into working):
>
> @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);

Ouch! That hurts!
When dealing with fixed length formats then REs are certainly not the tool
of choice.
One much better alternative: substr()
The other commonly used alternative: pack()/unpack()

jue
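
A minimal sketch of the unpack() alternative, assuming the same layout described in the question (50 leading characters, then a 12-character field). The sample records are invented for illustration.

```perl
use strict;
use warnings;

# Invented sample data: 50 leading characters, a 12-character field,
# and 6 trailing characters (68 characters per record).
my @foo1 = (
    ('a' x 50) . '1217        ' . ('z' x 6),
    ('a' x 50) . '    2033    ' . ('z' x 6),
    ('a' x 50) . '77777777    ' . ('z' x 6),
);

# 'x50' skips the first 50 bytes; 'A12' extracts the next 12 as a
# space-padded string (trailing spaces are stripped).
my @foo2 = grep {
    my ($field) = unpack 'x50 A12', $_;
    $field =~ /1121|1217|1256|2033/;
} @foo1;

print scalar(@foo2), " of ", scalar(@foo1), " records selected\n";
```

Records shorter than 50 bytes would make the `x50` skip fail, so real data may need a length check first.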
From: Shannon Jacobs on 10 Feb 2007 21:13

On Feb 11, 11:00 am, "Jürgen Exner" <jurge...(a)hotmail.com> wrote:
> Shannon Jacobs wrote:
> > Dealing with an array of fixed length strings. Goal is to select based
> > on certain columns. After rather lengthy study of the camel book and
> > searching on the web for various examples, I thought this should work:
> >
> > X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);
> >
> > It did not. I consulted with a heavy Perler, and after a few minutes
> > of wrestling with the problem, he suggested something like this (as I
> > tinkered it into working):
> >
> > @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);
>
> Ouch! That hurts!
> When dealing with fixed length formats then REs are certainly not the tool
> of choice.
> One much better alternative: substr()
> The other commonly used alternative: pack()/unpack()

It was not my intention to cause you any pain, but that's not the question I asked, though I suppose it's good to rethink problems in terms of the objectives.

Actually, in another part of the program I do use substrings to massage things where a more linear approach seemed more suitable. I vaguely remember considering unpack() long ago (the code has evolved over a period of about 10 years), but decided against it for some reason. I didn't need pack() since this is actually a backend query program, and there are limitations in the programs that are exporting the data. (And yes, the Perler with whom I discussed the problem did suggest alternatives including substrings.)

I'd still like to understand why this regular expression works as it does. Or perhaps you should clarify your intended sense of "painful"? As it is, I'm content with how well the code works. It seems like an adequate amount of search bang for the small regex buck.
From: Uri Guttman on 10 Feb 2007 22:20

>>>>> "SJ" == Shannon Jacobs <Shannon.Jacobs.nospam(a)gmail.com> writes:

SJ> Dealing with an array of fixed length strings. Goal is to select based
SJ> on certain columns. After rather lengthy study of the camel book and
SJ> searching on the web for various examples, I thought this should work:

SJ> X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);

SJ> It did not. I consulted with a heavy Perler, and after a few minutes
SJ> of wrestling with the problem, he suggested something like this (as I
SJ> tinkered it into working):

SJ> @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);

you should show some sample data as well so we can see what you are matching. as jurgen said that is painful to read. even good perl hackers will have trouble deciphering it quickly and that means it is not good perl IMO. also this line has $1121 and the previous one didn't have the $ so i am not sure which is correct.

SJ> My idea in the broken example was to ignore the first 50 and last 6
SJ> characters in each line, which was supposed to leave only the 12
SJ> characters in the middle to search against. My fuzzy understanding of
SJ> the working version is that I first had to match the entire thing, and
SJ> then let Perl fish for candidate matches by truncating down towards
SJ> 50?

no need to ignore the last 6 chars as that won't affect the match unless some lines were of different lengths.

SJ> The examples above are slightly simplified for purposes of
SJ> explanation. Here is the actual code, just in case I did something
SJ> wrong in the tweaking:

SJ> @foo2 = grep(/^.{50,62}($form_values{'a_SEARCH_VALUE'}).
SJ> {6,18}$/,@foo1);

that doesn't seem to be a fixed offset value. the initial skip is from 50-62 chars. if the search value can't appear in that, why not just grep for that? is the search value something with alternation as the above lines suggest? then a faster thing might be to grab the part you want and look it up in a hash of wanted values. alternation can be very slow especially with many choices (due to backtracking). in fact as you have been told, substr and a hash lookup might be the perfect thing for this (but i am not sure since the leading skip can vary in size).

again, showing some real data would help as we could see what variants there are, what the searched for parts look like (and if they are not found earlier in the string), etc.

uri

--
Uri Guttman  ------  uri(a)stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ---------------------------- http://jobs.perl.org
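
The substr-plus-hash idea can be sketched as follows. This is only an illustration under assumptions: the field is taken to be the 12 characters starting at column 50, the codes are all 4 characters long, the sample records are invented, and has_wanted_code is a hypothetical helper. Because the code may start anywhere inside the field, the sketch checks each 4-character slice of the field against the hash.

```perl
use strict;
use warnings;

# Wanted codes as hash keys for constant-time lookup.
my %wanted = map { $_ => 1 } qw(1121 1217 1256 2033);

# Hypothetical helper: true if any 4-character slice of the 12-character
# field starting at column 50 is one of the wanted codes.
sub has_wanted_code {
    my ($line) = @_;
    return 0 if length($line) < 62;        # too short to hold the field
    my $field = substr($line, 50, 12);
    for my $i (0 .. length($field) - 4) {
        return 1 if $wanted{ substr($field, $i, 4) };
    }
    return 0;
}

# Invented sample records, 68 characters each.
my @foo1 = (
    ('a' x 50) . '....1256....' . ('z' x 6),            # code inside the field
    ('a' x 50) . '........2033' . ('z' x 6),            # code at the end of the field
    '1217' . ('a' x 46) . '............' . ('z' x 6),   # code only before the field
);

my @foo2 = grep { has_wanted_code($_) } @foo1;

print scalar(@foo2), " of ", scalar(@foo1), " records selected\n";
```

The third record shows why restricting the search to the field matters: a plain unanchored grep for the code would also pick up a match that occurs earlier in the string.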
From: John W. Krahn on 10 Feb 2007 23:08

Shannon Jacobs wrote:
> Dealing with an array of fixed length strings. Goal is to select based
> on certain columns. After rather lengthy study of the camel book and
> searching on the web for various examples, I thought this should work:
>
> X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);
>
> It did not. I consulted with a heavy Perler, and after a few minutes
> of wrestling with the problem, he suggested something like this (as I
> tinkered it into working):
>
> @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);
>
> My idea in the broken example was to ignore the first 50 and last 6
> characters in each line,

@foo2 = grep substr( $_, 50, -6 ) =~ /1121|1217|1256|2033/, @foo1;

> which was supposed to leave only the 12
> characters in the middle to search against.

@foo2 = grep substr( $_, 50, 12 ) =~ /1121|1217|1256|2033/, @foo1;

John
--
Perl isn't a toolbox, but a small machine shop where you can
special-order certain sorts of tools at low cost and in short order.
-- Larry Wall |