From: Bezoar on
I've been trying to write a regexp that matches lines that begin with a
particular first part but do not end in either of two specified last
parts. For instance, if I have the following input:

a abc dkkdkdkdChuck;
a abc oeriererer;
a abc oeriereXY;

my regexp must match only line 2 (a abc oeriererer;), which means I
need a regexp that means:
"line begins with a, then contains abc, then any number of characters,
but not ending in Chuck or XY, followed by ; and the end of line". My
first attempt was the following. The only NOT operator I know of is ^,
but it is only effective inside [], which allows only sets of
characters, not alternation.

set reg {^a abc .*[^(Chuck|XY)];$} ; # not valid

How can I get the opposite of the alternation?
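
A quick way to see why the first attempt falls short: inside [] the text (Chuck|XY) is just a set of single characters, so [^(Chuck|XY)] rejects any one character from that set immediately before the ;. The sketch below assumes the three sample lines above plus one made-up line, "a abc book;", added only for illustration. That extra line should be accepted, but it is rejected because it ends in k.

```tcl
# The bracket expression negates a SET OF CHARACTERS ( C h u c k | X Y ),
# not the alternation Chuck|XY.
set reg {^a abc .*[^(Chuck|XY)];$}

# The first three lines come from the post; the last is a hypothetical extra.
set lines {
    {a abc dkkdkdkdChuck;}
    {a abc oeriererer;}
    {a abc oeriereXY;}
    {a abc book;}
}

foreach line $lines {
    # regexp returns 1 on a match and 0 otherwise
    puts [format "%-24s %d" $line [regexp $reg $line]]
}
```

Only "a abc oeriererer;" reports 1. The three sample lines happen to give the wanted answer by accident, while "a abc book;" reports 0 even though it ends in neither Chuck nor XY.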
From: Bezoar on
On Aug 5, 12:14 pm, Bezoar <cwjo...(a)gmail.com> wrote:
> [...]
> How can I get the opposite of the alternation?

Well after some more digging and experimentation I found the answer:

set reg {^a abc(?!.*(Chuck|XY);$).*;$}

This uses a negative lookahead constraint, which says:
match up to "a abc", then look ahead to see whether the line ends in
Chuck; or XY;. If it does not, continue to match any character 0 or
more times followed by ; and end of line; otherwise there is no
match.

Whew tough one
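
For anyone who wants to try it, here is a minimal sketch that runs the lookahead regexp over the three sample lines. The variable names lines and kept are only illustrative; lsearch -all -inline -regexp returns every list element that the pattern matches.

```tcl
# Negative lookahead: after "a abc", refuse to match if the rest of the
# line ends in Chuck; or XY;
set reg {^a abc(?!.*(Chuck|XY);$).*;$}

set lines {
    {a abc dkkdkdkdChuck;}
    {a abc oeriererer;}
    {a abc oeriereXY;}
}

# Keep only the elements of $lines that match $reg
set kept [lsearch -all -inline -regexp $lines $reg]

foreach line $kept {
    puts "kept: $line"
}
```

With the sample input, kept holds just the second line.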

From: Jonathan Bromley on
On Thu, 5 Aug 2010 11:03:01 -0700 (PDT), Bezoar wrote:

[...]
>> How can I get the opposite of the alternation?
>
>Well after some more digging and experimentation I found the answer:
>
>set reg {^a abc(?!.*(Chuck|XY);$).*;$}
>
>This uses a negative lookahead constraint, which says:
>match up to "a abc", then look ahead to see whether the line ends in
>Chuck; or XY;. If it does not, continue to match any character 0 or
>more times followed by ; and end of line; otherwise there is no
>match.

Beware negative lookahead constraints. They take a HUGE
performance hit in the regexp engine. I don't know exactly
why this is so, but I've seen 50x performance degradation
with even simple constraints (a fixed string of about 6
characters, nothing clever). If you're scanning large
input texts, this can make all the difference between
satisfactory and unacceptable performance.

Generally it is far faster to get ALL the candidate matches
with a first RE, then use some filtering (possibly another RE)
to reject the unwanted ones. [regexp -all -inline] is your
friend (possibly with -indices too).
--
Jonathan Bromley
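
The two-pass approach described above might look roughly like the sketch below. It assumes the whole input is held in one string (called text here) and that candidates are lines starting with "a abc " and ending in ;. The names candidates and wanted are illustrative, and no timing claims are made for this particular snippet.

```tcl
# Sample input from the original post, one record per line
set text "a abc dkkdkdkdChuck;\na abc oeriererer;\na abc oeriereXY;\n"

# Pass 1: collect every candidate line with a plain RE.
# -line makes ^ and $ match at line boundaries and stops . crossing newlines.
set candidates [regexp -all -inline -line -- {^a abc .*;$} $text]

# Pass 2: reject candidates with an unwanted ending, using a second simple RE.
set wanted {}
foreach line $candidates {
    if {![regexp {(Chuck|XY);$} $line]} {
        lappend wanted $line
    }
}

foreach line $wanted {
    puts "wanted: $line"
}
```

Pass 1 yields all three sample lines, and pass 2 leaves only "a abc oeriererer;".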
From: Gerald W. Lester on
Bezoar wrote:
> On Aug 5, 12:14 pm, Bezoar <cwjo...(a)gmail.com> wrote:
>> [...]
>> How can I get the opposite of the alternation?
>
> Well after some more digging and experimentation I found the answer:
>
> set reg {^a abc(?!.*(Chuck|XY);$).*;$}
>
> This uses a negative lookahead constraint, which says:
> match up to "a abc", then look ahead to see whether the line ends in
> Chuck; or XY;. If it does not, continue to match any character 0 or
> more times followed by ; and end of line; otherwise there is no
> match.
>
> Whew tough one

You could also have done (which IMHO is a lot easier to read):

if {[string match {a*Chuck;} $line] || [string match {a*XY;} $line]} {
    ##
    ## Line ends in Chuck; or XY; so it does not match
    ##
} else {
    ##
    ## Line does match (assuming it is already known to start with "a abc")
    ##
}



--
+------------------------------------------------------------------------+
| Gerald W. Lester, President, KNG Consulting LLC |
| Email: Gerald.Lester(a)kng-consulting.net |
+------------------------------------------------------------------------+
From: Uwe Klein on
Bezoar wrote:
> On Aug 5, 12:14 pm, Bezoar <cwjo...(a)gmail.com> wrote:
>
>>I've been trying to write a regexp that matches lines that begin with a
>>particular first part but do not end in either of two specified last
>>parts. For instance, if I have the following input:
>>
>>a abc dkkdkdkdChuck;
>>a abc oeriererer;
>>a abc oeriereXY;
>
> Whew tough one
>
switch -regexp -- $pattern \
    {^a abc .*Chuck;$} - {^a abc .*XY;$} {
        # nop
    } {^a abc .*;$} {
        # hit
        puts "found it"
    } default {
        # everything else
    }


uwe