From: Steven D'Aprano on 1 Jul 2010 01:50
I have a byte-string which is an escape sequence, that is, it starts with
a backslash, followed by either a single character, a hex or octal escape
sequence. E.g. something like one of these in Python 2.5:

'\\n'
'\\xFF'
'\\023'

If s is such a string, what is the right way to un-escape them to single
character byte strings?

I could decode them to unicode first, then encode to ASCII:

>>> s = '\\n'
>>> assert len(s) == 2
>>> s.decode('unicode-escape').encode()
'\n'

but this fails for non-ASCII bytes:

>>> '\\xFF'.decode('unicode-escape').encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
position 0: ordinal not in range(128)

--
Steven
From: Chris Rebert on 1 Jul 2010 02:11
On Wed, Jun 30, 2010 at 10:50 PM, Steven D'Aprano
<steve-REMOVE-THIS(a)cybersource.com.au> wrote:
> I have a byte-string which is an escape sequence, that is, it starts with
> a backslash, followed by either a single character, a hex or octal escape
> sequence. E.g. something like one of these in Python 2.5:
>
> '\\n'
> '\\xFF'
> '\\023'
>
> If s is such a string, what is the right way to un-escape them to single
> character byte strings?
>
> I could decode them to unicode first, then encode to ASCII:
>
> >>> s = '\\n'
> >>> assert len(s) == 2
> >>> s.decode('unicode-escape').encode()
> '\n'
>
> but this fails for non-ASCII bytes:
>
> >>> '\\xFF'.decode('unicode-escape').encode()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
> position 0: ordinal not in range(128)

Python 2.6.5 (r265:79063, May 25 2010, 18:21:57)
>>> '\\xFF'.decode('string_escape')
'\xff'

Cheers,
Chris
--
http://blog.rebertia.com
From: Steven D'Aprano on 1 Jul 2010 02:20
On Wed, 30 Jun 2010 23:11:59 -0700, Chris Rebert wrote:

> Python 2.6.5 (r265:79063, May 25 2010, 18:21:57)
> >>> '\\xFF'.decode('string_escape')
> '\xff'

I knew unicode-escape, obviously, and then I tried just 'escape', but
never thought of 'string_escape'.

Thanks for the quick answer.

--
Steven
From: Mark Tolonen on 1 Jul 2010 02:20
"Steven D'Aprano" <steve-REMOVE-THIS(a)cybersource.com.au> wrote in message
news:4c2c2cab$0$14136$c3e8da3(a)news.astraweb.com...
> I have a byte-string which is an escape sequence, that is, it starts with
> a backslash, followed by either a single character, a hex or octal escape
> sequence. E.g. something like one of these in Python 2.5:
>
> '\\n'
> '\\xFF'
> '\\023'
>
> If s is such a string, what is the right way to un-escape them to single
> character byte strings?
>
> I could decode them to unicode first, then encode to ASCII:
>
> >>> s = '\\n'
> >>> assert len(s) == 2
> >>> s.decode('unicode-escape').encode()
> '\n'
>
> but this fails for non-ASCII bytes:
>
> >>> '\\xFF'.decode('unicode-escape').encode()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
> position 0: ordinal not in range(128)

Use 'string-escape':

>>> s=['\\n','\\xff','\\023']
>>> for n in s: n.decode('string-escape')
...
'\n'
'\xff'
'\x13'

-Mark
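The 'string_escape' codec used in the replies above exists only in Python 2. As a minimal sketch of an alternative that also runs on Python 3, the approach from the original question can be kept, with the final encode step switched from the default ASCII codec to latin-1, which maps code points 0-255 one-to-one onto bytes 0-255. The helper name unescape_bytes is made up for illustration and does not come from the thread. It assumes the input holds only single-byte escapes like the ones discussed here; a \uXXXX escape above code point 255 would make the latin-1 encode step raise UnicodeEncodeError.

```python
def unescape_bytes(escaped):
    # Hypothetical helper, not from the thread.
    # Step 1: 'unicode_escape' interprets the backslash escape and
    #         returns a text string of code points.
    # Step 2: 'latin-1' turns each code point 0-255 back into the
    #         byte with the same value, so 0xFF no longer fails the
    #         way it does with the ASCII codec.
    return escaped.decode('unicode_escape').encode('latin-1')


# The three sample escape sequences from the original question.
for s in (b'\\n', b'\\xFF', b'\\023'):
    result = unescape_bytes(s)
    print(repr(s), '->', repr(result), 'length', len(result))
```

Each input is a multi-character escape sequence and each result is a single byte, which is what the question asked for. On Python 2.5 the b'' prefix is not accepted, so plain string literals would be needed there.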