From: Kevin Crosbie on
Hi,

I'm not sure if this is appropriate for this group, since it's related to
perlre rather than perl. If it isn't, apologies, and I'd appreciate a
pointer to the right group.

I'm trying to find out if this is possible before I embark upon
implementing a solution to my problem.

I have a string with comma separated tags:
"a, b, c, d, e, f"

It's rather easy to write something to express a boolean OR:
a OR b OR c = (^|,(\s)+)(a|b|c)[,$]

What I would like to know is if there is a way to express AND or NOT:
1. (a OR b) AND c
2. (a OR b) AND NOT c
3. (a OR b) AND NOT (c OR d)

I imagine there is no nice way to do this without doing something like
writing out your AND clause before and after whatever OR clause you are
using, which would become really messy for more complicated expressions,
but perhaps someone knows of some way to do this.

Regards,

Kevin
From: Brian McCauley on

Kevin Crosbie wrote:

> I'm not sure if this is appropriate for this group, it's related to
> perlre rather than perl, if so, apologies and I'd appreciate a pointer
> to the right group.

I do not know how much of Perl's REs is implemented by perlre, so the
solutions I offer may not apply.

> I'm trying to find out if this is possible before I embark upon
> implementing a solution to my problem.
>
> I have a string with comma separated tags:
> "a, b, c, d, e, f"
>
> It's rather easy to write something to express a boolean OR:
> a OR b OR c = (^|,(\s)+)(a|b|c)

> What I would like to know is if there is a way to express AND or NOT:
> 1. (a OR b) AND c

/^(?=.*(^|,(\s)+)c(,|$)).*(^|,(\s)+)(a|b)(,|$)/

Note: if you are willing to trade readability for efficiency, you can
make all the capturing (...) groups into non-capturing (?:...) ones.

Note: I've assumed your data contains no characters that the "."
metacharacter fails to match (i.e. no newlines; otherwise add /s).
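To try the look-ahead version of case 1, here is a small sketch using non-capturing groups (the sample strings are made up). The positive look-ahead anchored at ^ scans the whole string for a c tag without consuming anything, so the rest of the pattern is still free to find the a or b tag anywhere.

```perl
use strict;
use warnings;

# (a OR b) AND c: the look-ahead requires a whole "c" tag somewhere,
# then the main pattern looks for a whole "a" or "b" tag.
my $and_re = qr/^(?=.*(?:^|,\s+)c(?:,|$)).*(?:^|,\s+)(?:a|b)(?:,|$)/;

for my $tags ('a, b, c, d, e, f', 'a, d, e', 'c, d, b', 'cat, a') {
    print "$tags: ", ($tags =~ $and_re ? "matches" : "no match"), "\n";
}
```

The last sample shows why the trailing (?:,|$) matters: "cat" is not accepted as the tag "c".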

> 3. (a OR b) AND NOT (c OR d)

/^(?!.*(^|,(\s)+)(c|d)(,|$)).*(^|,(\s)+)(a|b)(,|$)/
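A sketch of case 3 with a negative look-ahead. Here the look-ahead checks for a trailing comma or end of string after the c or d, so that a tag such as "cat" is not mistaken for "c". The sample strings are made up.

```perl
use strict;
use warnings;

# (a OR b) AND NOT (c OR d): the negative look-ahead at ^ rejects the
# string if a whole "c" or "d" tag appears anywhere in it.
my $and_not_re = qr/^(?!.*(?:^|,\s+)(?:c|d)(?:,|$)).*(?:^|,\s+)(?:a|b)(?:,|$)/;

for my $tags ('a, e, f', 'b, c, e', 'a, cat, dog') {
    print "$tags: ", ($tags =~ $and_not_re ? "matches" : "no match"), "\n";
}
```

Without the (?:,|$) inside the look-ahead, "a, cat, dog" would be rejected, because ", c" and ", d" would be enough to trigger it.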

> I imagine there is no nice way to do this without doing something like
> writing out your AND clause before and after whatever OR clause you are
> using, which would become really messy for more complicated expressions,
> but perhaps someone knows of some way to do this.

But REs are really the wrong tool for the job. If this were a real
Perl question I'd say change your API to take a CODE ref rather than a
regex.
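A minimal sketch of the CODE-ref idea, assuming the tags are separated by a comma plus optional whitespace. The helper name tags_match is hypothetical (not from the thread or from any module): it turns the string into a hash of tags, and the caller's sub expresses the boolean logic with ordinary Perl operators.

```perl
use strict;
use warnings;

# Hypothetical helper: build a set of tags and let a caller-supplied
# CODE ref decide whether the set satisfies the condition.
sub tags_match {
    my ($tag_string, $predicate) = @_;
    my %has = map { $_ => 1 } split /\s*,\s*/, $tag_string;
    return $predicate->(\%has) ? 1 : 0;
}

my $tags = 'a, b, c, d, e, f';

# 1. (a OR b) AND c
my $q1 = sub { my $t = shift; ($t->{a} || $t->{b}) && $t->{c} };
# 2. (a OR b) AND NOT c
my $q2 = sub { my $t = shift; ($t->{a} || $t->{b}) && !$t->{c} };
# 3. (a OR b) AND NOT (c OR d)
my $q3 = sub { my $t = shift; ($t->{a} || $t->{b}) && !($t->{c} || $t->{d}) };

print join(' ', map { tags_match($tags, $_) } $q1, $q2, $q3), "\n";
```

With the sample string from the original post this prints "1 0 0". A more complicated condition only means writing a different sub, not a harder regex.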

From: A. Sinan Unur on
Kevin Crosbie <caoimhinocrosbai_at(a)yahoo.com> wrote in
news:45083f01$0$19201$88260bb3(a)news.teranews.com:

> I'm not sure if this is appropriate for this group, it's related to
> perlre rather than perl,

It all depends on what you mean by perlre. If you are referring to regular
expressions as implemented in Perl, then it is the right group. However,
if somehow you are referring to some library that implements a regular
expression facility similar to Perl's, it is probably not.

> if so, apologies and I'd appreciate a pointer
> to the right group.

Don't know (see above).

> I'm trying to find out if this is possible before I embark upon
> implementing a solution to my problem.
>
> I have a string with comma separated tags:
> "a, b, c, d, e, f"
>
> It's rather easy to write something to express a boolean OR:
> a OR b OR c = (^|,(\s)+)(a|b|c)[,$]

Wait a second. That is not a valid expression!

#!/usr/bin/perl

use strict;
use warnings;

my $s = 'a,b,c';

if ($s =~ /(^|,(\s)+)(a|b|c)[,$]/) {
    print "matched\n";
}

D:\UseNet\clpmisc> t66
Unmatched [ in regex; marked by <-- HERE in m/(^|,(\s)+)(a|b|c)[ <-- HERE ,5.008008/ at D:\UseNet\clpmisc\t66.pl line 8.
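The "5.008008" in that message comes from interpolation: inside a double-quoted string or a regex literal, "$]" is Perl's version variable, so "[,$]" never reaches the regex engine as written. A tiny sketch of the same effect in a plain string:

```perl
use strict;
use warnings;

# "$]" holds the running perl's version number and is interpolated in
# double quotes, so this string does not end with a literal '$' and ']'.
my $interpolated = "[,$]";
print "$interpolated\n";   # "[," followed by the version number
```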

> What I would like to know is if there is a way to express AND or NOT:
> 1. (a OR b) AND c
> 2. (a OR b) AND NOT c
> 3. (a OR b) AND NOT (c OR d)

It is more efficient to write

if ( /a/ or /b/ or /c/ ) { ... }

than to write

if ( /a|b|c/ ) { ... }
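Whether separate matches beat one alternation depends on the perl version and on the data (newer perls optimise alternations of literal strings), so it is worth measuring on your own input. A sketch using the core Benchmark module; the test string is an arbitrary assumption and no particular result is claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample data: a long string whose only interesting character
# is the final 'c'.
my $s = ('x' x 1000) . 'c';

# Both subs compute the same boolean; only the matching strategy differs.
# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    separate    => sub { my $hit = ($s =~ /a/ || $s =~ /b/ || $s =~ /c/) },
    alternation => sub { my $hit = ($s =~ /a|b|c/) },
});
```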

> I imagine there is no nice way to do this without doing something like
> writing out your AND clause before and after whatever OR clause you
> are using, which would become really messy for more complicated
> expressions,

I am sure there is a way to write a really complicated, hard-to-read,
and inefficient regex that does what you want, but it is not worth the
effort.

There is a reason Perl has logical operators.

Sinan
--
A. Sinan Unur <1usa(a)llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

From: John W. Krahn on
Kevin Crosbie wrote:
>
> I'm not sure if this is appropriate for this group, it's related to
> perlre rather than perl, if so, apologies and I'd appreciate a pointer
> to the right group.
>
> I'm trying to find out if this is possible before I embark upon
> implementing a solution to my problem.
>
> I have a string with comma separated tags:
> "a, b, c, d, e, f"
>
> It's rather easy to write something to express a boolean OR:
> a OR b OR c = (^|,(\s)+)(a|b|c)[,$]
>
> What I would like to know is if there is a way to express AND or NOT:
> 1. (a OR b) AND c

/a|b/ && /c/


> 2. (a OR b) AND NOT c

/a|b/ && !/c/


> 3. (a OR b) AND NOT (c OR d)

/a|b/ && !/c|d/
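These expressions test $_ and match substrings rather than whole tags. A quick sketch that runs all three against a few made-up strings; note that "b, cat" satisfies /c/ because "cat" contains a c. Anchoring each term, e.g. /(?:^|,\s*)c(?:,|$)/, would restrict it to whole tags.

```perl
use strict;
use warnings;

my %results;
for ('a, b, c, d, e, f', 'a, e, f', 'b, cat') {
    my @r = (
        (/a|b/ && /c/)    ? 1 : 0,   # 1. (a OR b) AND c
        (/a|b/ && !/c/)   ? 1 : 0,   # 2. (a OR b) AND NOT c
        (/a|b/ && !/c|d/) ? 1 : 0,   # 3. (a OR b) AND NOT (c OR d)
    );
    $results{$_} = "@r";
    print "$_: $results{$_}\n";
}
```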




John
--
use Perl;
program
fulfillment
From: jl_post@hotmail.com on
Kevin Crosbie wrote:
> I have a string with comma separated tags:
> "a, b, c, d, e, f"
>
> It's rather easy to write something to express a boolean OR:
> a OR b OR c = (^|,(\s)+)(a|b|c)[,$]

Um, I don't think the "[,$]" is doing what you think it should be
doing (in fact, I don't think that'll even compile). What you should
use in its place is "(,|$)".
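For reference, a sketch of the OR pattern with "(,|$)" in place of "[,$]". Using "\s*" instead of "(\s)+" is an assumption here, so that a list without spaces such as "d,b" also matches; the sample strings are made up.

```perl
use strict;
use warnings;

# a OR b OR c as whole tags. Perl does not interpolate "$)" in a regex,
# so "(,|$)" is safe to write literally.
my $or_re = qr/(^|,\s*)(a|b|c)(,|$)/;

for my $s ('a, e', 'd,b', 'd, e, f', 'cab, d') {
    print "$s: ", ($s =~ $or_re ? "matches" : "no match"), "\n";
}
```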

> What I would like to know is if there is a way to express AND or NOT:
> 1. (a OR b) AND c
> 2. (a OR b) AND NOT c
> 3. (a OR b) AND NOT (c OR d)
>
> I imagine there is no nice way to do this without doing something like
> writing out your AND clause before and after whatever OR clause you are
> using, which would become really messy for more complicated expressions,
> but perhaps someone knows of some way to do this.

As far as I know, writing out your AND clause before and after
whatever OR clause you are using will be the simplest and most readable
solution. However, you can do what you want using more complicated
expressions, taking advantage of Perl's extended regular expressions.
(You can read "perldoc perlre" to find out more about them.)

Let me warn you, though, they can be rather "messy." For this
reason, instead of searching for commas (or the beginning/end of the
string) like you have, I'll just leave those out, as if all the
elements were one character long. This won't always be the case, of
course, but I figure you'll be able to add the delimiter detection
yourself later; for now, I'll leave it out for simplicity's sake.

> 1. (a OR b) AND c

For this one, you basically want to search for (a|b), but you also
want to look for c, which may come before or after (a|b). So you can
use this regular expression:

m/(a|b).*c|c.*(a|b)/
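When you do add the delimiter detection, one subtlety shows up: if the pattern for the first tag consumes the comma that follows it, the second tag has no comma left to match, so "a, c" would fail. A sketch that avoids this by making the trailing check a look-ahead; tag_re is a hypothetical helper name and the sample strings are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: a regex for one whole tag from an alternation such
# as 'a|b'. The trailing (?=,|$) does not consume the comma, so a following
# tag pattern can still match it.
sub tag_re { my ($alt) = @_; return qr/(?:^|,\s*)(?:$alt)(?=,|$)/ }

my ($ab, $c) = (tag_re('a|b'), tag_re('c'));

# (a OR b) AND c, with c either after or before the a/b tag.
my $and_re = qr/$ab.*$c|$c.*$ab/;

for my $tags ('a, c', 'c, d, b', 'a, cat', 'd, e') {
    print "$tags: ", ($tags =~ $and_re ? "matches" : "no match"), "\n";
}
```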

> 2. (a OR b) AND NOT c

This one is trickier, because in order to verify that there is no
'c', you must search the entire line. If 'c' were just one character
long, we could get away with using the [^c] character class, like this:

m/^[^c]*(a|b)[^c]*$/

This searches for 'a' or 'b', but makes sure that ALL the characters
before AND after are NOT 'c'.
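A quick check of the single-character version against a few made-up strings:

```perl
use strict;
use warnings;

# (a OR b) AND NOT c, valid only while 'c' is a single character:
# every character before and after the a/b must be something other than 'c'.
my $no_c = qr/^[^c]*(a|b)[^c]*$/;

for my $s ('a, d, e', 'b, c', 'd, e') {
    print "$s: ", ($s =~ $no_c ? "matches" : "no match"), "\n";
}
```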

However, it's likely that your 'c' term won't be one character long.
In that case, you'll probably want to use a "negative look-ahead"
assertion (again, look it up in "perldoc perlre" if you want to read
details about it -- this is one of the extended regular expression
features I mentioned earlier). That way we would have:

m/^((?!c).)*(a|b)((?!c).)*$/

This pattern is essentially the same as the previous one, except
instead of having "[^c]" (which assumes that 'c' is one character
long), we have "((?!c).)". What this pattern matches is any character
provided that 'c' is not immediately found at that spot.

To clarify, if 'c' were actually the string "car", you would write
the term as "((?!car).)". Notice that you still use one '.' even
though "car" is three letters long. That's because the '.' only
matches one character, but with the (?!car) in front of it it'll only
match if that character is not a 'c' that is followed by an 'a' and an
'r'.

(If you put three '.' instead of just one, then the "((?!car)...)*"
expression would match multiples of three characters, which is not what
you want.)
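A small sketch of the one-dot versus three-dot point, with made-up test strings. With three dots the group can only consume a multiple of three characters, so a two-character string fails; "a scar" fails both, because without delimiters "scar" contains "car".

```perl
use strict;
use warnings;

# One "." per step: each step consumes one character that does not start "car".
my $one_dot    = qr/^((?!car).)*$/;
# Three dots per step: the group can only consume a multiple of 3 characters.
my $three_dots = qr/^((?!car)...)*$/;

for my $s ('xy', 'xyz', 'a scar') {
    printf "%-8s one dot: %-3s three dots: %s\n", "'$s'",
        ($s =~ $one_dot ? 'yes' : 'no'), ($s =~ $three_dots ? 'yes' : 'no');
}
```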

Of course, just as a '*' follows "[^c]", one should also follow
"((?!c).)", because you are searching through any number of characters
(possibly none) before and after "(a|b)".

> 3. (a OR b) AND NOT (c OR d)

This one is pretty much the same as the previous example, except
that instead of using "(?!c)" you'll replace it with "(?!c|d)", like
this:

m/^((?!c|d).)*(a|b)((?!c|d).)*$/
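Adding the delimiter detection to case 3 is possible with the same technique: instead of (?!c|d), the look-ahead asks whether a whole c or d tag starts at that position. A sketch, assuming comma-plus-optional-space separators; the names $cd and $ab are only illustrative.

```perl
use strict;
use warnings;

# A whole tag: at the start of the string or after a comma (plus optional
# spaces), followed by a comma or end of string (checked by look-ahead).
my $cd = qr/(?:^|,\s*)(?:c|d)(?=,|$)/;
my $ab = qr/(?:^|,\s*)(?:a|b)(?=,|$)/;

# (a OR b) AND NOT (c OR d) on whole tags: no position before or after
# the a/b tag may start a whole c/d tag.
my $re = qr/^(?:(?!$cd).)*$ab(?:(?!$cd).)*$/;

for my $tags ('a, e, f', 'b, cat, dog', 'a, d', 'd, b') {
    print "$tags: ", ($tags =~ $re ? "matches" : "no match"), "\n";
}
```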

That's pretty much it. Are the expressions messy? Most people
would say yes, so you might want to seriously consider breaking out
each of the above regular expressions into more than one, if only for
readability's sake.

Another tip: Whenever you use a complicated regular expression,
consider putting a comment right above it that clearly states what it's
searching for. For example, you might write your code to look like:

# Look for (a OR b) AND NOT (c OR d):
if ($string =~ m/^((?!c|d).)*(a|b)((?!c|d).)*$/) {
    # ... handle the match ...
}

This will make your code easier to understand and to debug. Without
the comment, any maintainer who comes after you will have a puzzle to
solve in order to figure out what you really meant. And if for some
reason you (or a future maintainer) introduced a bug in your regular
expression, the comment can serve as a guide to determine whether or
not a bug actually exists in the regular expression (otherwise, it
would be difficult to know for sure).
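If you would rather keep the explanation inside the regex itself, the /x modifier ignores whitespace in the pattern and allows # comments. A sketch of the (a OR b) AND NOT (c OR d) pattern written that way, with made-up test strings:

```perl
use strict;
use warnings;

# Look for (a OR b) AND NOT (c OR d), one commented piece per line.
my $re = qr/
    ^                      # start of string
    (?: (?! c | d ) . )*   # any characters, none of which starts c or d
    (?: a | b )            # an a or a b
    (?: (?! c | d ) . )*   # more characters, still no c or d
    $                      # end of string
/x;

for my $s ('a, e, f', 'b, c', 'e, f') {
    print "$s: ", ($s =~ $re ? "matches" : "no match"), "\n";
}
```

One caveat from perlre: the pattern delimiter must not appear inside such a comment, or perl will end the pattern early.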

I hope this helps, Kevin.

-- Jean-Luc