From: spydox on

I'm trying to find a repeated number in a string, like 122345 finds
22.

This works:

/(\d)\1/

This doesn't:

/\1(\d)/

I guess LLR parsing is to blame, but shouldn't the second example
first try to FIND a $1 then check to see if there is a \1, and repeat
that process moving L to R?

I thought Perl sort of went to and fro trying to do matching. To me,
there IS a /\1(\d)/ in the string since $1 is 2, and there is a \1 = 2
preceding it.

I was a little surprised this didn't work although I can sort of see
why in a way too. In some ways it seems to me that regexes should be
*disconnected* from parsing - just answer the question does this
match?




From: Ben Morrow on

Quoth spydox(a)gmail.com:
>
> I'm trying to find a repeated number in a string, like 122345 finds
> 22.
>
> This works:
>
> /(\d)\1/
>
> This doesn't:
>
> /\1(\d)/
>
> I guess LLR parsing is to blame, but shouldn't the second example
> first try to FIND a $1 then check to see if there is a \1, and repeat
> that process moving L to R?
>
> I thought Perl sort of went to and fro trying to do matching. To me,
> there IS a /\1(\d)/ in the string since $1 is 2, and there is a \1 = 2
> preceding it.

There are two separate operations here which you are confusing. First
perl parses the regex itself, and compiles it into an internal form.
Then it matches that regex against the string you provide. The second
will backtrack, under some circumstances; the first won't.

Ben
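
A minimal sketch of the two phases described above, assuming a stock
perl 5. The helper name first_match is made up for illustration. It
suggests that /\1(\d)/ gets through compilation without complaint and
only fails later, at match time, because group 1 has not captured
anything yet when \1 is tried.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text, or undef if nothing matches.
sub first_match {
    my ($re, $text) = @_;
    return $text =~ $re ? $& : undef;
}

my $str = '122345';

# Phase 1: compile. The pattern is interpolated from a string so that
# compilation happens at run time, where eval could catch a failure.
my $src     = '\1(\d)';
my $forward = eval { qr/$src/ };
print defined $forward ? "compiled ok\n" : "compile failed: $@";

# Phase 2: match. Backtracking happens here, but at every start position
# \1 is tried before group 1 has been set, so every attempt fails.
my $back_hit    = first_match(qr/(\d)\1/, $str);
my $forward_hit = first_match($forward,   $str);

print '(\d)\1 : ', (defined $back_hit    ? $back_hit    : 'no match'), "\n";  # 22
print '\1(\d) : ', (defined $forward_hit ? $forward_hit : 'no match'), "\n";  # no match
```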

From: A. Sinan Unur on
spydox(a)gmail.com wrote in
news:093bf887-729d-4400-8750-6c91b21b478e(a)w4g2000prd.googlegroups.com:

> I'm trying to find a repeated number in a string, like 122345
> finds 22.
>
> This works:
>
> /(\d)\1/
>
> This doesn't:
>
> /\1(\d)/
>
> I guess LLR parsing is to blame,

....

> I was a little surprised this didn't work although I can sort of
> see why in a way too. In some ways it seems to me that regexes
> should be *disconnected* from parsing - just answer the question
> does this match?

I don't look at this as a parsing issue. Rather, it is a "the
universe must make sense" kind of issue: The first match does not
exist before the first match. That makes sense to me. It may not
make sense to you.

Sinan
--
A. Sinan Unur <1usa(a)llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

From: spydox on
On Apr 14, 2:31 pm, Ben Morrow <b...(a)morrow.me.uk> wrote:
> Quoth spy...(a)gmail.com:
>
> > I'm trying to find a repeated number in a string, like 122345 finds
> > 22.
>
> > This works:
>
> > /(\d)\1/
>
> > This doesn't:
>
> > /\1(\d)/
>
> > I guess LLR parsing is to blame, but shouldn't the second example
> > first try to FIND a $1 then check to see if there is a \1, and repeat
> > that process moving L to R?
>
> > I thought Perl sort of went to and fro trying to do matching. To me,
> > there IS a /\1(\d)/ in the string since $1 is 2, and there is a \1 = 2
> > preceding it.
>
> There are two separate operations here which you are confusing. First
> perl parses the regex itself, and compiles it into an internal form.
> Then it matches that regex against the string you provide. The second
> will backtrack, under some circumstances; the first won't.
>
> Ben

Understood, and I appreciate the insight. It makes sense.
Yet in my experience, when all else apparently *fails*, Perl will "do
its best" to match, and I've heard MJD and others say the same. To me,
unless it *also* tried backtracking, it gave up too soon.



From: spydox on
..
..
..
>
> > I guess LLR parsing is to blame,
>
..
..
>
> I don't look at this as a parsing issue. Rather, it is a "the
> universe must make sense" kind of issue: The first match does not
> exist before the first match. That makes sense to me. It may not
> make sense to you.
>

To me, like conventional pattern-recognition, of say two tanks next to
each other, the system should accept it whether the match is described
either way:

find a tank with another identical tank to its left

*or*

find a tank with another identical tank to its right


The system should have no *context-sensitivity* where only one of the
two matches. Sure, internally an algorithm may be scanning L to R or R
to L or whatever, but the user should not even be concerned with that,
at least in this case. I still think it gave up too soon: it should
have tried R to L (backtracking) when L to R failed.

Just IMHO, thank you for your thoughts. This area seems just a bit
gray to me; I'd be very interested in Damian's or Mark's thoughts.
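
For what it's worth, both descriptions of the two tanks can be written
as regexes, as long as the capture still happens before the
backreference is tried. The sketch below is only an illustration, not
something proposed in the thread. It assumes a fixed-width lookbehind
is acceptable, and match_offset is a made-up helper name. The
"identical neighbour to its left" version captures the left-hand digit
inside a lookbehind and then requires \1 at the current position.

```perl
use strict;
use warnings;

# Hypothetical helper: offset at which the match starts, or undef.
sub match_offset {
    my ($re, $text) = @_;
    return $text =~ $re ? $-[0] : undef;
}

my $str = '122345';

# "A digit with an identical digit to its right":
# capture a digit, then require the same digit immediately after it.
my $right = match_offset(qr/(\d)\1/, $str);

# "A digit with an identical digit to its left":
# the lookbehind captures the digit just before the current position,
# then \1 must match at the current position.
my $left = match_offset(qr/(?<=(\d))\1/, $str);

print "right-hand description matches at offset $right\n";  # 1
print "left-hand description matches at offset $left\n";    # 2
```

Both find the same pair of 2s; they differ only in which of the two
digits the reported match starts on.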