From: Robert Bonomi on
In article <hp4752$loi$2(a)speranza.aioe.org>,
Mike Scott <usenet.12(a)spam.stopper.scottsonline.org.uk> wrote:
>Robert Bonomi wrote:
>> In article <hp2b70$l18$1(a)speranza.aioe.org>,
>> Mike Scott <usenet.12(a)spam.stopper.scottsonline.org.uk> wrote:
>>> Oh, and using 'old-style' re's,
>>> \(.*\)\1
>>> matches
>>> 123abcabc456
>>> but returns a null string as the match! Weird.
>>
>> If you don't understand why that is happening, then you do *NOT* understand
>> regular expressions.
>>
>> explanation:
>> '.' means 'match any character'
>> '*' means 'match ZERO OR MORE of the preceding item' (here, the '.')
>>
>> Thus '.*' does match a null string (zero characters, before the first '1')
>> and the back-reference '\1' matches a second null string, following the
>> first one (still before the first '1') -- hence the search criterion _is_
>> satisfied.
>>
>> RE matching looks for the match that starts _earliest_ in the string and,
>> among the matches starting at that position, the one with the longest length.
>>
>> The null string match starts before the 'abcabc' match, and thus is selected
>> even though the 'abcabc' match is longer.
>>
>>
>Ok, thanks, point taken. In mitigation, I did find an almost exactly
>similar example on the net, making the exact same mistake.......

It is a _common_ mistake. I've made it myself, _more_ than once. <wry grin>

'.*' without something anchoring it on at least one side is almost *never*
what the author intended, for exactly that reason. Usually, the intent is
'.+' (or '..*', if you don't have the '+' operator available -- as in basic,
'old-style' REs and some obsolete, pre-POSIX, implementations) which imposes
a minimum length of 1.
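
A quick way to see the difference is sketched below. It uses Python's 're'
module rather than the C library routines discussed in this thread, so the
syntax is '(...)' instead of '\(...\)', and the matching engine is a
backtracking one rather than POSIX leftmost-longest. For this particular
input the two approaches should agree, but treat it as an illustration
only, not as a statement about any specific regcomp() implementation.

```python
import re

subject = "123abcabc456"

# Unanchored '.*' may match zero characters, so the earliest match is an
# empty group followed by an empty back-reference, at offset 0.
loose = re.search(r"(.*)\1", subject)

# Requiring at least one character forces a real repeated substring.
strict = re.search(r"(.+)\1", subject)

# The '..*' spelling, for engines without '+', behaves the same way here.
spelled_out = re.search(r"(..*)\1", subject)

print(repr(loose.group(0)), loose.start())
print(repr(strict.group(0)), strict.start())
print(repr(spelled_out.group(0)), spelled_out.start())
```

The first search succeeds with an empty match before the first '1'; the
other two skip ahead to the repeated 'abc'.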

From: Mike Scott on
Mike Scott wrote:
......
> Unfortunately, people being what they are, I also get things like
> "All Musicians" <musicians(a)mydomain>
> appearing - note the capitals.
> Unfortunately, the backref /always/ seems to honour the capitalization,
> so the above re will not match, even with REG_ICASE set. The behaviour
> seems debatable and the man page unclear. I assume there's no way out of
> this using re's???
>
>
(Sorry for following up my own post)

A quick check this week showed Perl behaved sensibly - the
case-independent flag makes even backrefs ignore case, unlike the C
library re routines. Has anyone already hacked milter-regex to use pcre
instead??
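
For anyone wanting to see the Perl-style behaviour without Perl to hand,
here is a rough sketch using Python's 're' module, whose IGNORECASE flag
also applies to back-references. The pattern and the example.org address
are made up for illustration (the actual expression is not quoted in this
post), and none of this says anything about what a given C library does
with REG_ICASE.

```python
import re

# Hypothetical pattern: the name after "All " must repeat as the local part.
pattern = r'"All (\w+)" <\1@'

# Hypothetical header, modelled on the capitalised example above.
header = '"All Musicians" <musicians@example.org>'

# Without the flag, the back-reference must match 'Musicians' exactly.
case_sensitive = re.search(pattern, header)

# With IGNORECASE, the back-reference also accepts 'musicians'.
case_insensitive = re.search(pattern, header, re.IGNORECASE)

print(case_sensitive)
print(case_insensitive.group(1) if case_insensitive else None)
```

Only the second search finds a match, with 'Musicians' as the captured
group.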

Maybe I'll ask too on the sendmail group when time allows.

--
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England