From: Kevin Crosbie on 13 Sep 2006 13:29

Hi,

I'm not sure if this is appropriate for this group, it's related to perlre rather than perl, if so, apologies and I'd appreciate a pointer to the right group.

I'm trying to find out if this is possible before I embark upon implementing a solution to my problem.

I have a string with comma separated tags:
"a, b, c, d, e, f"

It's rather easy to write something to express a boolean OR:
a OR b OR c = (^|,(\s)+)(a|b|c)[,$]

What I would like to know is if there is a way to express AND or NOT:
1. (a OR b) AND c
2. (a OR b) AND NOT c
3. (a OR b) AND NOT (c OR d)

I imagine there is no nice way to do this without doing something like writing out your AND clause before and after whatever OR clause you are using, which would become really messy for more complicated expressions, but perhaps someone knows of some way to do this.

Regards,
Kevin

From: Brian McCauley on 13 Sep 2006 14:07

Kevin Crosbie wrote:
> I'm not sure if this is appropriate for this group, it's related to
> perlre rather than perl, if so, apologies and I'd appreciate a pointer
> to the right group.

I do not know how much of Perl's regex syntax is implemented by perlre, so the solutions I offer may not apply.

> I'm trying to find out if this is possible before I embark upon
> implementing a solution to my problem.
>
> I have a string with comma separated tags:
> "a, b, c, d, e, f"
>
> It's rather easy to write something to express a boolean OR:
> a OR b OR c = (^|,(\s)+)(a|b|c)

> What I would like to know is if there is a way to express AND or NOT:
> 1. (a OR b) AND c

/^(?=.*(^|,(\s)+)(c)(,|$)).*(^|,(\s)+)(a|b)(,|$)/

Note: if you want to trade readability for efficiency you can make all the capturing (...) into non-capturing (?:...)

Note: I've assumed your data contains no characters that the period (.) does not match, that is, no newlines.

> 3. (a OR b) AND NOT (c OR d)

/^(?!.*(^|,(\s)+)(c|d)(,|$)).*(^|,(\s)+)(a|b)(,|$)/

> I imagine there is no nice way to do this without doing something like
> writing out your AND clause before and after whatever OR clause you are
> using, which would become really messy for more complicated expressions,
> but perhaps someone knows of some way to do this.

But REs are really the wrong tool for the job. If this were a real Perl question I'd say change your API to take a CODE ref rather than a regex.

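A minimal sketch of the look-ahead approach above, for anyone who wants to try it. The helper tag_re and the sub names are hypothetical (not from the thread), and it uses \s* rather than (\s)+ so that tags with or without a space after the comma are accepted:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build a pattern that matches
# any one of the given tags as a complete comma-separated item.
sub tag_re {
    my @tags = @_;
    my $alt  = join '|', map { quotemeta($_) } @tags;
    return qr/(?:^|,\s*)(?:$alt)\s*(?:,|$)/;
}

my $a_or_b = tag_re('a', 'b');
my $c      = tag_re('c');
my $c_or_d = tag_re('c', 'd');

# (a OR b) AND c: the positive look-ahead requires c somewhere in the
# string, then a or b must match as well.
sub ab_and_c {
    my ($s) = @_;
    return $s =~ /^(?=.*$c).*$a_or_b/ ? 1 : 0;
}

# (a OR b) AND NOT (c OR d): the negative look-ahead rejects any string
# that contains c or d as a whole tag.
sub ab_not_cd {
    my ($s) = @_;
    return $s =~ /^(?!.*$c_or_d).*$a_or_b/ ? 1 : 0;
}

for my $s ('a, b, c', 'a, e', 'c, d', 'cat, a') {
    printf "%-10s AND c: %d   AND NOT (c OR d): %d\n",
        "'$s'", ab_and_c($s), ab_not_cd($s);
}
```

Because each tag pattern checks the delimiters on both sides, a tag such as "cat" is not mistaken for "c".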
From: A. Sinan Unur on 13 Sep 2006 14:13

Kevin Crosbie <caoimhinocrosbai_at(a)yahoo.com> wrote in news:45083f01$0$19201$88260bb3(a)news.teranews.com:

> I'm not sure if this is appropriate for this group, it's related to
> perlre rather than perl,

It all depends on what you mean by perlre. If you are referring to regular expressions as implemented in Perl, then it is the right group. However, if somehow you are referring to some library that implements a regular expression facility similar to Perl's, it is probably not.

> if so, apologies and I'd appreciate a pointer
> to the right group.

Don't know (see above).

> I'm trying to find out if this is possible before I embark upon
> implementing a solution to my problem.
>
> I have a string with comma separated tags:
> "a, b, c, d, e, f"
>
> It's rather easy to write something to express a boolean OR:
> a OR b OR c = (^|,(\s)+)(a|b|c)[,$]

Wait a second. That is not a valid expression!

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $s = 'a,b,c';

    if ($s =~ /(^|,(\s)+)(a|b|c)[,$]/) {
        print "matched\n";
    }

    D:\UseNet\clpmisc> t66
    Unmatched [ in regex; marked by <-- HERE in m/(^|,(\s)+)(a|b|c)[ <-- HERE ,5.008008/ at D:\UseNet\clpmisc\t66.pl line 8.

(The 5.008008 in the message is the Perl version: inside the pattern, $] is interpolated as the version variable, which leaves the [ unclosed.)

> What I would like to know is if there is a way to express AND or NOT:
> 1. (a OR b) AND c
> 2. (a OR b) AND NOT c
> 3. (a OR b) AND NOT (c OR d)

It is more efficient to write

    if ( /a/ or /b/ or /c/ ) { ... }

than to write

    if ( /a|b|c/ ) { ... }

> I imagine there is no nice way to do this without doing something like
> writing out your AND clause before and after whatever OR clause you
> are using, which would become really messy for more complicated
> expressions,

I am sure there is a way of writing a really complicated, hard-to-read, and inefficient regex to do what you want, but it is not worth the effort. There is a reason Perl has logical operators.

Sinan

-- 
A. Sinan Unur <1usa(a)llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

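For reference, a small sketch of the OR test with the invalid "[,$]" replaced by an alternation that means "comma or end of string". The sub name has_a_b_or_c is made up for illustration, and \s* is used in place of (\s)+ on the assumption that tags may or may not be followed by a space:

```perl
use strict;
use warnings;

# (?:,|$) is an alternation, not a character class, so $ keeps its
# end-of-string meaning and nothing is interpolated.
my $or_re = qr/(?:^|,\s*)(?:a|b|c)\s*(?:,|$)/;

# Hypothetical wrapper: true if the string has a, b or c as a whole tag.
sub has_a_b_or_c {
    my ($s) = @_;
    return $s =~ $or_re ? 1 : 0;
}

for my $s ('a,b,c', 'd, e, c', 'd, e, f', 'abc') {
    printf "%-10s %s\n", "'$s'", has_a_b_or_c($s) ? 'matched' : 'no match';
}
```

The last test string, 'abc', is a single tag and so should not match.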
From: John W. Krahn on 13 Sep 2006 14:30

Kevin Crosbie wrote:
>
> I'm not sure if this is appropriate for this group, it's related to
> perlre rather than perl, if so, apologies and I'd appreciate a pointer
> to the right group.
>
> I'm trying to find out if this is possible before I embark upon
> implementing a solution to my problem.
>
> I have a string with comma separated tags:
> "a, b, c, d, e, f"
>
> It's rather easy to write something to express a boolean OR:
> a OR b OR c = (^|,(\s)+)(a|b|c)[,$]
>
> What I would like to know is if there is a way to express AND or NOT:
> 1. (a OR b) AND c

/a|b/ && /c/

> 2. (a OR b) AND NOT c

/a|b/ && !/c/

> 3. (a OR b) AND NOT (c OR d)

/a|b/ && !/c|d/

John
-- 
use Perl;
program
fulfillment

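The one-liners above test $_ with unanchored patterns, so a longer tag such as "cat" would count as "c". One possible way to keep the logical operators but compare whole tags is to split the string into a hash first. This is only a sketch; tag_set and the three rule subs are hypothetical names, not anything from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "a, b, c" into a hash ref of tag => 1.
sub tag_set {
    my ($string) = @_;
    $string =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    my %set = map { $_ => 1 } grep { length $_ } split /\s*,\s*/, $string;
    return \%set;
}

# 1. (a OR b) AND c
sub ab_and_c {
    my $t = tag_set(shift);
    return ( ( $t->{a} || $t->{b} ) && $t->{c} ) ? 1 : 0;
}

# 2. (a OR b) AND NOT c
sub ab_and_not_c {
    my $t = tag_set(shift);
    return ( ( $t->{a} || $t->{b} ) && !$t->{c} ) ? 1 : 0;
}

# 3. (a OR b) AND NOT (c OR d)
sub ab_and_not_cd {
    my $t = tag_set(shift);
    return ( ( $t->{a} || $t->{b} ) && !( $t->{c} || $t->{d} ) ) ? 1 : 0;
}

for my $s ('a, b, c', 'a, e', 'b, d', 'cat, e') {
    printf "%-10s 1:%d 2:%d 3:%d\n",
        "'$s'", ab_and_c($s), ab_and_not_c($s), ab_and_not_cd($s);
}
```

Each rule is an ordinary sub, so it could also be passed around as a CODE ref if the calling code takes a predicate instead of a regex.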
From: jl_post@hotmail.com on 13 Sep 2006 14:43

Kevin Crosbie wrote:
> I have a string with comma separated tags:
> "a, b, c, d, e, f"
>
> It's rather easy to write something to express a boolean OR:
> a OR b OR c = (^|,(\s)+)(a|b|c)[,$]

Um, I don't think the "[,$]" is doing what you think it should be doing (in fact, I don't think that'll even compile). What you should use in its place is "(,|$)".

> What I would like to know is if there is a way to express AND or NOT:
> 1. (a OR b) AND c
> 2. (a OR b) AND NOT c
> 3. (a OR b) AND NOT (c OR d)
>
> I imagine there is no nice way to do this without doing something like
> writing out your AND clause before and after whatever OR clause you are
> using, which would become really messy for more complicated expressions,
> but perhaps someone knows of some way to do this.

As far as I know, writing out your AND clause before and after whatever OR clause you are using will be the simplest and most readable solution.

However, you can do what you want using more complicated expressions, taking advantage of Perl's extended regular expressions. (You can read "perldoc perlre" to find out more about them.)

Let me warn you, though, they can be rather "messy." For this reason, instead of searching for commas (or the beginning/end of the string) like you have, I'll just leave those out, as if all the elements were one character long. This won't always be the case, of course, but I figure that you'll be able to add the delimiter detection in yourself later, so for now I won't put it in, for simplicity's sake.

> 1. (a OR b) AND c

For this one, you basically want to search for (a|b), but you also want to look for c, which may come before or after (a|b). So you can use this regular expression:

    m/(a|b).*c|c.*(a|b)/

> 2. (a OR b) AND NOT c

This one is trickier, because in order to verify that there is no 'c', you must search the entire line.
If 'c' were just one character long, we could get away with using the [^c] character class, like this:

    m/^[^c]*(a|b)[^c]*$/

This searches for 'a' or 'b', but makes sure that ALL the characters before AND after are NOT 'c'.

However, it's likely that your 'c' term won't be one character long. In that case, you'll probably want to use a "negative look-ahead" assertion (again, look it up in "perldoc perlre" if you want to read details about it -- this is one of the extended regular expressions I mentioned earlier). That way we would have:

    m/^((?!c).)*(a|b)((?!c).)*$/

This pattern is essentially the same as the previous one, except instead of having "[^c]" (which assumes that 'c' is one character long), we have "((?!c).)". What this pattern matches is any character provided that 'c' is not immediately found at that spot.

To clarify, if 'c' were actually the string "car", you would write the term as "((?!car).)". Notice that you still use one '.' even though "car" is three letters long. That's because the '.' only matches one character, but with the (?!car) in front of it, it'll only match if that character is not a 'c' that is followed by an 'a' and an 'r'. (If you put three '.' instead of just one, then the "((?!car)...)*" expression would match multiples of three characters, which is not what you want.)

Of course, just as a '*' follows "[^c]", one should also follow "((?!c).)" because you are necessarily searching through more than one character (we'll assume that there is more than one character that comes before and after "(a|b)").

> 3. (a OR b) AND NOT (c OR d)

This one is pretty much the same as the previous example, except that instead of using "(?!c)" you'll replace it with "(?!c|d)", like this:

    m/^((?!c|d).)*(a|b)((?!c|d).)*$/

That's pretty much it. Are the expressions messy? Most people would say yes, so you might want to seriously consider breaking out each of the above regular expressions into more than one, if only for readability's sake.

Another tip: Whenever you use a complicated regular expression, consider putting a comment right above it that clearly states what it's searching for. For example, you might write your code to look like:

    # Look for (a OR b) AND NOT (c OR d):
    if ($string =~ m/^((?!c|d).)*(a|b)((?!c|d).)*$/)

This will make your code easier to understand and to debug. Without the comment, any maintainer that comes after you will have a puzzle to solve in order to figure out what you really meant. And if for some reason you (or a future maintainer) introduced a bug in your regular expression, the comment can serve as a guide to determine whether or not a bug actually exists in the regular expression (otherwise, it would be difficult to know for sure).

I hope this helps, Kevin.

-- Jean-Luc

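A short sketch of the "((?!car).)" idea with multi-character terms, for anyone who wants to experiment. The terms "apple", "bus" and "car" and the sub name wanted_without_car are assumptions made up for this example, and non-capturing (?:...) groups are used in place of the capturing ones above. Like the patterns it is based on, it leaves out delimiter detection, so it is a substring test rather than a whole-tag test:

```perl
use strict;
use warnings;

# Look for (apple OR bus) AND NOT car.
# (?:(?!car).)* consumes one character at a time, but only at positions
# where "car" does not start.
my $re = qr/^(?:(?!car).)*(?:apple|bus)(?:(?!car).)*$/;

# Hypothetical wrapper around the pattern.
sub wanted_without_car {
    my ($s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ('apple, dog', 'bus, car', 'car, apple', 'dog, egg', 'apple, scar') {
    printf "%-14s %s\n", "'$s'", wanted_without_car($s) ? 'match' : 'no match';
}
```

Note that 'apple, scar' is rejected too, because "scar" contains "car". Adding the comma and end-of-string checks around each term would be needed to treat tags as whole items.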