regular expression hassle [FreeBSD]

From: Robert Bonomi on 1 Apr 2010 20:54

In article <hp2b70$l18$1(a)speranza.aioe.org>,
Mike Scott <usenet.12(a)spam.stopper.scottsonline.org.uk> wrote:
>
>Oh, and using 'old-style' re's,
>\(.*\)\1
>matches
>123abcabc456
>but returns a null string as the match! Weird.

If you don't understand why that is happening, then you do *NOT* understand
regular expressions.

explanation:
'.' means 'match any character'
'*' means 'match ZERO OR MORE of the preceding item'

Thus '.*' does match a null string (zero characters, before the first '1'),
and the back-reference then matches a second null string, following the
first one (still before the first '1') -- hence the search criterion _is_
satisfied.

RE matching looks for the match that starts _earliest_ in the string and,
among the matches starting there, has the longest length.

The null string match starts before the 'abcabc' match, and thus is selected
even though the 'abcabc' match is longer.
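
The effect described above can be reproduced in a few lines. This is only a sketch using Python's re module, which is an assumption made for illustration: it is a backtracking engine, not the POSIX library that milter-regex links against, and it writes the group as (...) instead of the basic-RE \(...\). For this particular input both approaches settle on the same empty match at offset 0.

```python
import re

text = "123abcabc456"

# Group may be empty, so the pattern already succeeds at offset 0
# with an empty group followed by an empty back-reference.
empty = re.search(r"(.*)\1", text)
print(empty.span(), repr(empty.group(0)))    # (0, 0) ''

# Forcing the group to hold at least one character rules out the
# empty match, so the search moves on until it finds 'abc' + 'abc'.
doubled = re.search(r"(..*)\1", text)
print(doubled.span(), repr(doubled.group(0)))  # (3, 9) 'abcabc'
```

Requiring one or more characters in the group is the usual way to avoid the zero-length match.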

From: Mike Scott on 2 Apr 2010 03:39

mikea wrote:
> Randal L. Schwartz <merlyn(a)stonehenge.com> wrote in <86iq8au9nt.fsf(a)red.stonehenge.com>:
>>>>>>> "Johan" == Johan van Selst <{c.u.b.f.m.}@news.gletsjer.org> writes:
>> Johan> If you need back references, then you must use the old style 'basic'
>> Johan> regular expressions (where possible). The new, 'extended' regular
>> Johan> expressions are generally faster and more useful though, as long as
>> Johan> you do not need this feature.
>>
>> Or, just use Perl, where you can have the kitchen sink...
>
> Well, yes, but the OP and I are using milter-regex, and he asked in the
> context of milter-regex. We don't get to choose which regex engine is being
> used, unless we do some rather determined hackery on the product. I grant I
> _have_ done some minor hackery already, but nothing so complex as changing
> to a different regex engine. The prospect rather daunts me.
>

Now there's a thought - rewrite milter-regex in Perl perhaps? :-) IIRC,
CPAN does have the relevant milter library?

But meanwhile, the starting point was that I couldn't get backrefs to
work in milter-regex, whether old-style or extended regexp's. Has anyone
else managed to do this?

Specifically, and understanding the risks, I'm trying to make sure that
a To: line like
"comment" <jo@mydomain>
will fail unless "comment" includes the text "jo", so I'm looking at
re's like
\(..*\).*<\1@

but these always seem to fail to match anything; it seems to be the \1
that is causing the problem.
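
As a sanity check on the pattern itself, here is a minimal sketch of the same back-reference test in Python's re module. That choice is an assumption for illustration only: Python is not the engine milter-regex uses, its group syntax is (...) where a basic RE needs \(...\), and the two sample headers are made up. It shows what the expression is meant to accept and reject; it says nothing about why milter-regex refuses it.

```python
import re

# Same idea as the pattern above, in Python syntax: capture some text,
# then require that exact text again between '<' and '@'.
pattern = re.compile(r"(..*).*<\1@")

good = '"jo smith" <jo@mydomain>'      # comment contains the local part
bad = '"someone else" <jo@mydomain>'   # comment does not contain "jo"

print(bool(pattern.search(good)))  # True
print(bool(pattern.search(bad)))   # False
```

Note that the group can match anywhere before the '<', so a comment such as "banjo" would also pass.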

--
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England

From: Mike Scott on 2 Apr 2010 03:42

Robert Bonomi wrote:
> In article <hp2b70$l18$1(a)speranza.aioe.org>,
> Mike Scott <usenet.12(a)spam.stopper.scottsonline.org.uk> wrote:
>> Oh, and using 'old-style' re's,
>> \(.*\)\1
>> matches
>> 123abcabc456
>> but returns a null string as the match! Weird.
>
> If you don't understand why that is happening, then you do *NOT* understand
> regular expressions.
>
> explanation:
> '.' means 'match any character'
> '*' means 'match ZERO OR MORE of the preceding item'
>
> Thus '.*' does match a null string (zero characters, before the first '1'),
> and the back-reference then matches a second null string, following the
> first one (still before the first '1') -- hence the search criterion _is_
> satisfied.
>
> RE matching looks for the match that starts _earliest_ in the string and,
> among the matches starting there, has the longest length.
>
> The null string match starts before the 'abcabc' match, and thus is selected
> even though the 'abcabc' match is longer.
>
>
Ok, thanks, point taken. In mitigation, I did find an almost exactly
similar example on the net, making the exact same mistake.......

--
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England

From: Randal L. Schwartz on 2 Apr 2010 04:11

>>>>> "Mike" == Mike Scott <usenet.12(a)spam.stopper.scottsonline.org.uk> writes:

Mike> Now there's a thought - rewrite milter-regex in Perl perhaps? :-) IIRC, CPAN
Mike> does have the relevant milter library?

I believe there are Perl milter plugins, so you're not far off.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn(a)stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion

From: Mike Scott on 8 Apr 2010 14:03

Randal L. Schwartz wrote:
>>>>>> "Mike" == Mike Scott <usenet.12(a)spam.stopper.scottsonline.org.uk> writes:
>
> Mike> Now there's a thought - rewrite milter-regex in Perl perhaps? :-) IIRC, CPAN
> Mike> does have the relevant milter library?
>
> I believe there are Perl milter plugins, so you're not far off.
>

(I took a look. Unfortunately Sendmail::Milter needs Perl compiled with
threading, which the default FreeBSD Perl does not have. I'm reluctant to
recompile, as it appears this will break any library stuff compiled for
the existing Perl: if anyone can tell me this isn't so, I'll be happier!
The Net::Milter module, though, provides a useful test framework for
milters, which I didn't know about before.)

But back to the re problem. I've finally sorted out my problems with
old/new re's and the syntax differences, etc. But I'm left with a killer
problem for my milter-regex config.

I'm trying to match things like
"all musicians" <musicians@mydomain>
in the mail 'to' header, and have an re to match this
\(..*\).*<\1@

Unfortunately, people being what they are, I also get things like
"All Musicians" <musicians@mydomain>
appearing - note the capitals.
Unfortunately, the backref /always/ seems to honour the capitalization,
so the above re will not match, even with REG_ICASE set. The behaviour
seems debatable and the man page unclear. I assume there's no way out of
this using re's???
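
One way around the problem, if the check can be moved out of the regex and into code (for instance in a milter written in a scripting language, as floated earlier in the thread), is to capture the comment and the local part separately and compare them with case folded. The sketch below uses Python purely as an assumption for illustration; the helper name, the header pattern and the sample headers are all hypothetical, and none of this can be expressed in a milter-regex config as it stands.

```python
import re

# Hypothetical helper pattern: group 1 is the quoted comment,
# group 2 is the local part of the address that follows it.
TO_HEADER = re.compile(r'"([^"]*)"\s*<([^@>]+)@')

def comment_mentions_local_part(header: str) -> bool:
    """Return True if the quoted comment contains the local part,
    ignoring case. The comparison is done outside the regex."""
    m = TO_HEADER.search(header)
    if m is None:
        return False
    comment, local_part = m.group(1), m.group(2)
    return local_part.casefold() in comment.casefold()

print(comment_mentions_local_part('"All Musicians" <musicians@mydomain>'))  # True
print(comment_mentions_local_part('"Cheap Watches" <musicians@mydomain>'))  # False

# For comparison only: Python's own engine does apply IGNORECASE to
# back-references, unlike the behaviour reported above for REG_ICASE.
same_re = re.search(r"(..*).*<\1@",
                    '"All Musicians" <musicians@mydomain>',
                    re.IGNORECASE)
print(same_re is not None)  # True
```

Doing the comparison in code keeps the result independent of how a given regex library treats case in back-references.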

--
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England
