Raw string substitution problem [Python]


From: Gregory Ewing on 18 Dec 2009 02:51

MRAB wrote:

> Regular expressions and replacement strings have their own escaping
> mechanism, which also uses backslashes.

This seems like a misfeature to me. It makes sense for
a regular expression to give special meanings to backslash
sequences, because it's a sublanguage with its own syntax.
But I can't see any earthly reason to do that with the
*replacement* string, which is just data.

It looks like a feature that's been blindly copied over
from Perl without thinking about whether it makes sense
in Python.

--
Greg

From: Sion Arrowsmith on 18 Dec 2009 12:09

Gregory Ewing <greg.ewing(a)canterbury.ac.nz> wrote:
>MRAB wrote:
>> Regular expressions and replacement strings have their own escaping
>> mechanism, which also uses backslashes.
>This seems like a misfeature to me. It makes sense for
>a regular expression to give special meanings to backslash
>sequences, because it's a sublanguage with its own syntax.
>But I can't see any earthly reason to do that with the
>*replacement* string, which is just data.

>>> import re
>>> re.sub('a(.)c', r'\1', "123abcdefg")
'123bdefg'

Still think the replacement string is "just data"?

--
\S


From: MRAB on 18 Dec 2009 12:17

Gregory Ewing wrote:
> MRAB wrote:
>
>> Regular expressions and replacement strings have their own escaping
>> mechanism, which also uses backslashes.
>
> This seems like a misfeature to me. It makes sense for a regular
> expression to give special meanings to backslash sequences, because
> it's a sublanguage with its own syntax. But I can't see any earthly
> reason to do that with the *replacement* string, which is just data.
>
> It looks like a feature that's been blindly copied over from Perl
> without thinking about whether it makes sense in Python.
>
In simple cases you might be replacing with the same string every time,
but in other cases you might want the replacement to contain substrings
captured by the regex.

For example, swapping pairs of words:

>>> import re
>>> re.sub(r'(\w+) (\w+)', r'\2 \1', 'first second third fourth')
'second first fourth third'

Python also allows you to provide a function that returns the
replacement string, but that seems a bit long-winded for those cases
when a simple replacement template would suffice.
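
For comparison, here is a minimal sketch of the two approaches side by side. The helper name swap_pair is made up for illustration; re.sub and match.group are the standard library calls.

```python
import re

def swap_pair(match):
    # Hypothetical helper: return the two captured words in reverse order.
    return match.group(2) + ' ' + match.group(1)

text = 'first second third fourth'

# Template form: \2 and \1 are expanded by the re module itself.
by_template = re.sub(r'(\w+) (\w+)', r'\2 \1', text)

# Function form: the return value is used as the replacement text.
by_function = re.sub(r'(\w+) (\w+)', swap_pair, text)

print(by_template)
print(by_function)
```

Both calls should give the same result here; the function form is simply more typing for a case this small.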

From: Alan G Isaac on 18 Dec 2009 12:58

On 12/17/2009 7:59 PM, Rhodri James wrote:
> "re.compile('a\\nc')" passes a sequence of four characters to
> re.compile: 'a', '\', 'n' and 'c'. re.compile() then does its own
> interpretation: 'a' passes through as is, '\' flags an escape which
> combined with 'n' produces the newline character (0x0a), and 'c' passes
> through as is.

I got that from MRAB's posts. (Thanks.)
What I'm not getting is why the replacement string
gets this particular interpretation. What is the payoff?
(Contrast e.g. Vim's substitution syntax.)
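
A small sketch of the behaviour being discussed, assuming only the standard re module. The variable names are illustrative. It checks that the pattern really is four characters, that the regex engine reads backslash-n as a newline, and that a replacement template gets the same treatment.

```python
import re

pattern_text = 'a\\nc'  # four characters: a, backslash, n, c
pattern_length = len(pattern_text)

# The regex engine interprets backslash-n itself, so the compiled
# pattern matches a string containing a real newline.
matches_newline = re.compile(pattern_text).match('a\nc') is not None

# The replacement template is processed too: the two characters
# backslash-n in the template come out as one newline character.
replaced = re.sub('b', r'\n', 'abc')

print(pattern_length, matches_newline, repr(replaced))
```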

Thanks,
Alan

From: Alan G Isaac on 18 Dec 2009 12:59

On 12/18/2009 12:17 PM, MRAB wrote:
> In simple cases you might be replacing with the same string every time,
> but in other cases you might want the replacement to contain substrings
> captured by the regex.

Of course that "conversion" is needed in the replacement.
But e.g. Vim substitutions handle this fine without the
odd (to non-Perlers) handling of backslashes in the replacement.
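
When the replacement really is "just data" and should go in verbatim, two workarounds are possible. This is a sketch, not the only way to do it; the sample string and variable names are invented for illustration.

```python
import re

# Text that should be inserted exactly as written, backslashes included.
literal = r'C:\new\1'

# Option 1: pass a function. Its return value is used as-is,
# with no backslash processing applied to it.
via_function = re.sub('X', lambda m: literal, 'path=X')

# Option 2: double every backslash first, so that template
# processing turns each pair back into a single backslash.
via_escaping = re.sub('X', literal.replace('\\', '\\\\'), 'path=X')

print(via_function)
print(via_escaping)
```

Passing literal directly as the template would instead have \n read as a newline and \1 read as a group reference, which is the surprise this thread is about.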

Alan Isaac
