From: Steven D'Aprano on
I have a byte string which is an escape sequence, that is, it starts with
a backslash followed by either a single character or a hex or octal
escape. For example, something like one of these in Python 2.5:

'\\n'
'\\xFF'
'\\023'

If s is such a string, what is the right way to un-escape it to a
single-character byte string?

I could decode it to unicode first, then encode to ASCII:

>>> s = '\\n'
>>> assert len(s) == 2
>>> s.decode('unicode-escape').encode()
'\n'

but this fails for non-ASCII bytes:

>>> '\\xFF'.decode('unicode-escape').encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)

--
Steven
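
The traceback in the question comes from the encode step, not the decode: called with no argument, encode() uses Python 2's default ASCII codec, while 'unicode-escape' has already turned '\\xFF' into the code point U+00FF. Latin-1 maps U+0000 through U+00FF one-to-one onto the bytes 0x00 through 0xFF, so encoding to 'latin-1' instead of ASCII recovers the byte (in Python 2: '\\xFF'.decode('unicode-escape').encode('latin-1')). Below is a minimal sketch of that fix in Python 3 rather than the 2.5 of the question, so the escaped input is a bytes object; unescape_byte is a hypothetical helper name, and the sketch assumes the input holds exactly one escape sequence.

```python
def unescape_byte(escaped: bytes) -> bytes:
    """Turn one backslash escape such as b'\\xFF' into the byte it denotes.

    Hypothetical helper: assumes `escaped` holds exactly one escape
    sequence, e.g. b'\\n', b'\\xFF' or b'\\023'.
    """
    # 'unicode_escape' interprets the backslash escape and yields a str;
    # \xHH escapes and octal escapes up to \377 give code points <= U+00FF.
    text = escaped.decode('unicode_escape')
    # Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF, so,
    # unlike ASCII, it can encode every such code point.
    result = text.encode('latin-1')
    if len(result) != 1:
        raise ValueError(f'not a single-byte escape: {escaped!r}')
    return result


for s in (b'\\n', b'\\xFF', b'\\023'):
    print(unescape_byte(s))
```

UnicodeEncodeError is a subclass of ValueError, so a caller can catch ValueError both for escapes that decode to code points above U+00FF and for inputs that do not reduce to a single byte.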

From: Chris Rebert on
On Wed, Jun 30, 2010 at 10:50 PM, Steven D'Aprano
<steve-REMOVE-THIS(a)cybersource.com.au> wrote:

Python 2.6.5 (r265:79063, May 25 2010, 18:21:57)
>>> '\\xFF'.decode('string_escape')
'\xff'

Cheers,
Chris
--
http://blog.rebertia.com
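
The 'string_escape' codec suggested above is a Python 2 codec; Python 3 has no codec by that name. One alternative in Python 3 is to let the language's own literal parser interpret the escape: wrap it in a bytes literal and evaluate that with ast.literal_eval, which accepts only literals, never arbitrary expressions. This is a swapped-in technique, not what the thread used; unescape_literal is a hypothetical helper, and the sketch assumes the input is ASCII text holding exactly one escape sequence, so it cannot contain an unescaped quote.

```python
import ast


def unescape_literal(escaped: str) -> bytes:
    """Interpret `escaped` (e.g. '\\xFF') using bytes-literal rules.

    Hypothetical helper: assumes `escaped` is ASCII and holds exactly one
    escape sequence.
    """
    if not escaped.startswith('\\'):
        raise ValueError(f'not an escape sequence: {escaped!r}')
    # Build the source text b'<escaped>' and evaluate it as a literal.
    return ast.literal_eval("b'" + escaped + "'")


for s in ('\\n', '\\xFF', '\\023'):
    print(unescape_literal(s))
```

Because the escaped quote "\\'" is itself a valid escape, it still produces a well-formed literal; a bare trailing backslash, by contrast, makes the literal unterminated and literal_eval raises SyntaxError.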
From: Steven D'Aprano on
On Wed, 30 Jun 2010 23:11:59 -0700, Chris Rebert wrote:

> Python 2.6.5 (r265:79063, May 25 2010, 18:21:57)
>>>> '\\xFF'.decode('string_escape')
> '\xff'

I knew about unicode-escape, obviously, and I also tried plain 'escape',
but never thought of 'string_escape'.

Thanks for the quick answer.


--
Steven
From: Mark Tolonen on

"Steven D'Aprano" <steve-REMOVE-THIS(a)cybersource.com.au> wrote in message
news:4c2c2cab$0$14136$c3e8da3(a)news.astraweb.com...

Use 'string-escape':

>>> s=['\\n','\\xff','\\023']
>>> for n in s: n.decode('string-escape')
...
'\n'
'\xff'
'\x13'

-Mark
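
The third result, '\x13', follows from reading 023 as octal: 2*8 + 3 = 19 = 0x13. To make the three escape forms explicit, here is a minimal hand-rolled parser in Python 3 that needs no codec at all. It handles only single-character escapes, \xHH with exactly two hex digits, and \ooo with one to three octal digits; the name parse_escape and the SIMPLE_ESCAPES table (the single-character escapes of Python string literals, minus backslash-newline) are assumptions for illustration, not part of any codec.

```python
import re

# Single-character escapes of Python string literals (backslash-newline excluded).
SIMPLE_ESCAPES = {
    'a': 0x07, 'b': 0x08, 'f': 0x0C, 'n': 0x0A, 'r': 0x0D,
    't': 0x09, 'v': 0x0B, '\\': 0x5C, "'": 0x27, '"': 0x22,
}

# A backslash followed by one of: x plus exactly two hex digits,
# one to three octal digits, or any single character.
ESCAPE_RE = re.compile(r'\\(?:x([0-9A-Fa-f]{2})|([0-7]{1,3})|(.))\Z', re.DOTALL)


def parse_escape(escaped: str) -> bytes:
    """Hypothetical helper: decode one escape sequence to a single byte."""
    match = ESCAPE_RE.match(escaped)
    if match is None:
        raise ValueError(f'not a single escape sequence: {escaped!r}')
    hex_digits, oct_digits, char = match.groups()
    if hex_digits is not None:
        value = int(hex_digits, 16)
    elif oct_digits is not None:
        value = int(oct_digits, 8)  # e.g. '023' -> 19 == 0x13
        if value > 0xFF:
            raise ValueError(f'octal escape out of range: {escaped!r}')
    elif char in SIMPLE_ESCAPES:
        value = SIMPLE_ESCAPES[char]
    else:
        raise ValueError(f'unknown escape: {escaped!r}')
    return bytes([value])


for s in ('\\n', '\\xFF', '\\023'):
    print(parse_escape(s))
```

Unlike the codec-based approaches, this rejects unknown escapes such as '\\q' outright instead of passing the backslash through, which may or may not be what a given caller wants.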