From: Mike Scott on
I've been trying without success to make a back reference regular
expression work in milter-regex, but it seems I misunderstand something
more basic.

Using old-style re's, an re like
\(..*\)\1
will match a string
123abcabc456
returning
abcabc

That works fine in a trivial test program.

I'm using fbsd 6.x, and compiling an re with REG_EXTENDED always fails
to match whenever \1 appears in the re. So
(..*)\1
fails to match the test string above, while
(..*)
will, of course, match the whole string.

The man page isn't entirely clear about whether the 'new atom type'
means back-referencing is included or excluded in the extended re
syntax; the web suggests it should work with REG_EXTENDED set.

Is this behaviour correct with fbsd 6 please? And have things changed
since? I do see the man page mentions 'alpha quality' which sounds
ominous :-(


Either way, milter-regex doesn't seem to like the use of the \1
construct - is this disallowed by that program for some reason?


Oh, and using 'old-style' re's,
\(.*\)\1
matches
123abcabc456
but returns a null string as the match! Weird.



TIA.

--
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England
From: Johan van Selst on
Once upon a newsgroup, Mike Scott claimed:
> Using old-style re's, an re like
> \(..*\)\1
> will match a string
> 123abcabc456
>
> I'm using fbsd 6.x, and compiling an re with REG_EXTENDED always fails
> to match whenever \1 appears in the re.

Indeed, extended regular expressions do not work with back references.
You will see the same behaviour with sed (sed -E) and grep (egrep).

If you need back references, then you must use the old style 'basic'
regular expressions (where possible). The new, 'extended' regular
expressions are generally faster and more useful though, as long as
you do not need this feature.


Ciao,
Johan
--
Why do we always come here - I guess we'll never know.
It's like a kind of torture to have to watch the show.
From: Randal L. Schwartz on
>>>>> "Johan" == Johan van Selst <{c.u.b.f.m.}@news.gletsjer.org> writes:

Johan> If you need back references, then you must use the old style 'basic'
Johan> regular expressions (where possible). The new, 'extended' regular
Johan> expressions are generally faster and more useful though, as long as
Johan> you do not need this feature.

Or, just use Perl, where you can have the kitchen sink...

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn(a)stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
From: mikea on
Randal L. Schwartz <merlyn(a)stonehenge.com> wrote in <86iq8au9nt.fsf(a)red.stonehenge.com>:
>>>>>> "Johan" == Johan van Selst <{c.u.b.f.m.}@news.gletsjer.org> writes:
>
> Johan> If you need back references, then you must use the old style 'basic'
> Johan> regular expressions (where possible). The new, 'extended' regular
> Johan> expressions are generally faster and more useful though, as long as
> Johan> you do not need this feature.
>
> Or, just use Perl, where you can have the kitchen sink...

Well, yes, but the OP and I are using milter-regex, and he asked in the
context of milter-regex. We don't get to choose which regex engine is being
used, unless we do some rather determined hackery on the product. I grant I
_have_ done some minor hackery already, but nothing so complex as changing
to a different regex engine. The prospect rather daunts me.

--
French does have a certain je ne sais quoi, but I don't know
what it is.
-- Jeffrey Goldberg, in nanae
From: Balwinder S Dheeman on
On 04/02/2010 02:18 AM, Johan van Selst wrote:
> Once upon a newsgroup, Mike Scott claimed:
>> Using old-style re's, an re like
>> \(..*\)\1
>> will match a string
>> 123abcabc456
>>
>> I'm using fbsd 6.x, and compiling an re with REG_EXTENDED always fails
>> to match whenever \1 appears in the re.
>
> Indeed, extended regular expressions do not work with back references.
> You will see the same behaviour with sed (sed -E) and grep (egrep).
>
> If you need back references, then you must use the old style 'basic'
> regular expressions (where possible). The new, 'extended' regular
> expressions are generally faster and more useful though, as long as
> you do not need this feature.

How about using PCRE's pcregrep?

--
Balwinder S "bdheeman" Dheeman Registered Linux User: #229709
Anu'z Linux(a)HOME (Unix Shoppe) Machines: #168573, 170593, 259192
Chandigarh, UT, 160062, India Plan9, T2, Arch/Debian/FreeBSD/XP
Home: http://werc.homelinux.net/ Visit: http://counter.li.org/