Raw string substitution problem [Python]


From: Gabriel Genellina on 16 Dec 2009 09:35

On Wed, 16 Dec 2009 11:09:32 -0300, Ed Keith <e_d_k(a)yahoo.com> wrote:

> I am having a problem when substituting a raw string. When I do the
> following:
>
> re.sub('abc', r'a\nb\nc', '123abcdefg')
>
> I get
>
> """
> 123a
> b
> cdefg
> """
>
> what I want is
>
> r'123a\nb\ncdefg'

From http://docs.python.org/library/re.html#re.sub

re.sub(pattern, repl, string[, count])

...repl can be a string or a function; if
it is a string, any backslash escapes in
it are processed. That is, \n is converted
to a single newline character, \r is
converted to a carriage return, and so forth.

So you'll have to double your backslashes:

py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
'123a\\nb\\ncdefg'
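
As a minimal sketch of the behaviour quoted from the docs, here are both calls side by side. It is written with Python 3 print() calls (the session above is Python 2), and the variable names are illustrative only:

```python
import re

text = '123abcdefg'

# In a string replacement, re.sub processes backslash escapes,
# so backslash + n in the raw string becomes one newline character.
processed = re.sub('abc', r'a\nb\nc', text)
print(repr(processed))   # '123a\nb\ncdefg' (two real newlines)

# Doubling each backslash makes re.sub emit a literal backslash instead.
literal = re.sub('abc', r'a\\nb\\nc', text)
print(repr(literal))     # '123a\\nb\\ncdefg'
```

The second result contains a literal backslash followed by "n", which is the text the original question asked for.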

--
Gabriel Genellina

From: Ed Keith on 16 Dec 2009 12:19

--- On Wed, 12/16/09, Gabriel Genellina <gagsl-py2(a)yahoo.com.ar> wrote:

> So you'll have to double your backslashes:
>
> py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
> '123a\\nb\\ncdefg'

That is going to be a nontrivial exercise. I have control over the pattern, but the replacement text and the text being substituted into will be read from user-supplied files. I need to reproduce the exact text that is read from the file.

Maybe what I should do is use re to break the string into two pieces, the part before the pattern to be replaced and the part after it, then splice the replacement text in between them. Seems like doing it the hard way, but it should work.
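
The splitting idea could be sketched roughly as follows. The helper name splice_literal is hypothetical, and the sketch assumes only the first match needs replacing:

```python
import re

def splice_literal(pattern, replacement, text):
    """Replace the first match of pattern with replacement, verbatim."""
    match = re.search(pattern, text)
    if match is None:
        return text  # nothing to replace
    # Plain slicing and concatenation never interpret backslashes.
    return text[:match.start()] + replacement + text[match.end():]

result = splice_literal('abc', r'a\nb\nc', '123abcdefg')
print(result)
```

Because the replacement never passes through re.sub, its backslashes are left exactly as they were read.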

Thanks,

-EdK

From: Peter Otten on 16 Dec 2009 12:51

Ed Keith wrote:

> That is going to be a nontrivial exercise. I have control over the
> pattern, but the replacement text and the text being substituted into
> will be read from user-supplied files. I need to reproduce the exact
> text that is read from the file.

There is a helper function re.escape() that you can use to sanitize the
substitution:

>>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
123a\nb\ncdefg

Peter

From: Gabriel Genellina on 16 Dec 2009 14:23

On Wed, 16 Dec 2009 14:51:08 -0300, Peter Otten <__peter__(a)web.de> wrote:

> There is a helper function re.escape() that you can use to sanitize the
> substitution:
>
> >>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
> 123a\nb\ncdefg

Unfortunately re.escape does much more than that:

py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
123a\.b\.cdefg

I think the string_escape encoding is what the OP needs:

py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"), '123abcdefg')
123a\n(b.c)\nddefg
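
The string_escape codec is specific to Python 2. A rough Python 3 equivalent, assuming that backslashes are the only characters needing protection in a replacement string, is to double them by hand. The helper name escape_replacement is made up for this sketch:

```python
import re

def escape_replacement(replacement):
    """Double every backslash so re.sub inserts the text verbatim."""
    return replacement.replace('\\', '\\\\')

user_text = r'a\n(b.c)\nd'   # stands in for text read from a file
result = re.sub('abc', escape_replacement(user_text), '123abcdefg')
print(result)
```

Unlike re.escape, this leaves the dot and parentheses alone, and group references such as \1 in the user text also come through as literal text.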

--
Gabriel Genellina

From: Peter Otten on 16 Dec 2009 14:54

Gabriel Genellina wrote:

> Unfortunately re.escape does much more than that:
>
> py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
> 123a\.b\.cdefg

Sorry, I didn't think of that.

> I think the string_escape encoding is what the OP needs:
>
> py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"), '123abcdefg')
> 123a\n(b.c)\nddefg

Another possibility:

>>> print re.sub('abc', lambda m: r'a\nb\n.c\a', '123abcdefg')
123a\nb\n.c\adefg
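
This works because re.sub does no escape processing on the return value of a callable. A small sketch with replacement text that would otherwise be troublesome (the contents of user_text are invented for illustration):

```python
import re

# Hypothetical file contents: escapes and group references, all literal.
user_text = r'a\nb\n.c\a \1 \g<0>'

# The callable receives the match object; its return value is used as-is.
result = re.sub('abc', lambda m: user_text, '123abcdefg')
print(result)
```

Since the callable ignores the match object, every match is replaced by the same unmodified text.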

Peter
