From: John Machin on 26 May 2010 03:04

Rob Williscroft <rtw <at> rtw.me.uk> writes:
>
> Barry wrote in news:83dc485a-5a20-403b-99ee-c8c627bdbab3
> @m21g2000vbr.googlegroups.com in gmane.comp.python.general:
>
> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
> > unexpected code byte
>
> It may not be you, en.wiktionary.org is sending gzip
> encoded content back,

It sure is; here's where the offending 0x8b comes from:

"""ID1 (IDentification 1)
ID2 (IDentification 2)
These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213), to identify the file as being in gzip format."""

(from http://www.faqs.org/rfcs/rfc1952.html)
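To make that connection concrete, here is a minimal sketch, assuming Python 3 (as the `'utf8' codec` traceback suggests) and a made-up HTML body. It compresses the body with `gzip.compress`, checks for the two RFC 1952 identification bytes, and shows that a UTF-8 decode fails at position 1: 0x1f is plain ASCII, but 0x8b is a UTF-8 continuation byte with no lead byte before it. Newer Pythons word the error as "invalid start byte" rather than "unexpected code byte".

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # ID1 = 0x1f, ID2 = 0x8b (RFC 1952)


def looks_gzipped(data: bytes) -> bool:
    """Return True if data starts with the two gzip identification bytes."""
    return data[:2] == GZIP_MAGIC


def try_utf8(data: bytes):
    """Decode as UTF-8, returning (text, None) on success or (None, error)."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


# Simulate a gzip-encoded HTTP body (hypothetical sample HTML).
body = gzip.compress(b"<html>hello</html>")
print(looks_gzipped(body))               # True
text, err = try_utf8(body)
print(err.start, hex(body[err.start]))   # 1 0x8b
```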
From: Kushal Kumaran on 26 May 2010 11:59

On Tue, 2010-05-25 at 20:12 +0000, Rob Williscroft wrote:
> Barry wrote in news:83dc485a-5a20-403b-99ee-c8c627bdbab3
> @m21g2000vbr.googlegroups.com in gmane.comp.python.general:
>
> > Hi,
> >
> > The code below is giving me the error:
> >
> > Traceback (most recent call last):
> >   File "C:\Users\Administratör\Desktop\test.py", line 4, in <module>
> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
> > unexpected code byte
> >
> >
> > What am i doing wrong?
>
> It may not be you, en.wiktionary.org is sending gzip
> encoded content back, it seems to do this even if you set
> the Accept header as in:
>
> request.add_header( "Accept", "text/html" )
>
> But maybe I'm not doing it correctly.
>

You need the Accept-Encoding: identity header.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html

<snip>

--
regards,
kushal
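A minimal sketch of where that header goes, assuming Python 3's `urllib.request` (Barry's actual code is not shown in this thread, and the wiki URL below is a hypothetical example). Nothing is sent over the network; it only builds the request. Note that `Request.add_header` stores names via `str.capitalize()`, so the lookup key becomes `Accept-encoding`. The header states a preference; as the follow-ups show, a server may still answer with gzip.

```python
import urllib.request


def make_request(url: str) -> urllib.request.Request:
    """Build a request that asks for uncompressed HTML (nothing is sent)."""
    req = urllib.request.Request(url)
    req.add_header("Accept", "text/html")
    # Ask the server not to apply any content coding such as gzip.
    req.add_header("Accept-Encoding", "identity")
    return req


req = make_request("http://en.wiktionary.org/wiki/test")  # hypothetical page
# urllib stores header names capitalized, e.g. "Accept-encoding".
print(req.get_header("Accept-encoding"))  # identity
```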
From: Rob Williscroft on 26 May 2010 14:10

Kushal Kumaran wrote in news:1274889564.2339.16.camel(a)nitrogen in
gmane.comp.python.general:

> On Tue, 2010-05-25 at 20:12 +0000, Rob Williscroft wrote:
>> Barry wrote in news:83dc485a-5a20-403b-99ee-c8c627bdbab3
>> @m21g2000vbr.googlegroups.com in gmane.comp.python.general:
>>
>> <snip>
>>
>> It may not be you, en.wiktionary.org is sending gzip
>> encoded content back, it seems to do this even if you set
>> the Accept header as in:
>>
>> request.add_header( "Accept", "text/html" )
>>
>> But maybe I'm not doing it correctly.
>>
> You need the Accept-Encoding: identity header.
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html

Thanks, following this I did change the line to be:

request.add_header( "Accept-Encoding", "identity" )

but it made no difference to en.wiktionary.org; it just sent back a
gzip-encoded response.

Rob.
From: Kushal Kumaran on 27 May 2010 01:00
On Wed, May 26, 2010 at 11:40 PM, Rob Williscroft <rtw(a)rtw.me.uk> wrote:
> Kushal Kumaran wrote in news:1274889564.2339.16.camel(a)nitrogen in
> gmane.comp.python.general:
>
>> On Tue, 2010-05-25 at 20:12 +0000, Rob Williscroft wrote:
>> <snip>
>>
>> You need the Accept-Encoding: identity header.
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
>
> Thanks, following this I did change the line to be:
>
> request.add_header( "Accept-Encoding", "identity" )
>
> but it made no difference to en.wiktionary.org; it just sent back a
> gzip-encoded response.
>

A known problem, I guess...

https://bugzilla.wikimedia.org/show_bug.cgi?id=7098

You'll just have to handle the gzipped data.

--
regards,
kushal
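Handling the gzipped data could look like the minimal sketch below, assuming Python 3.2+ (for `gzip.decompress`). `decode_body` is a hypothetical helper name, and UTF-8 as the default charset is an assumption. It decompresses when the response's `Content-Encoding` header says gzip, or when the body starts with the gzip magic bytes anyway, and then decodes to text. The commented lines sketch how it might be wired to `urllib.request.urlopen`; they need network access, so they are not run here.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # RFC 1952 ID1, ID2


def decode_body(raw: bytes, content_encoding=None, charset="utf-8") -> str:
    """Hypothetical helper: undo gzip content coding, then decode to text.

    raw              -- the bytes returned by response.read()
    content_encoding -- the response's Content-Encoding header, or None
    charset          -- text encoding to decode with (UTF-8 assumed by default)
    """
    coding = (content_encoding or "").strip().lower()
    # Trust the header first, but also sniff the magic bytes in case the
    # server compresses without labelling the response.
    if coding in ("gzip", "x-gzip") or raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Wiring it to urllib (needs network access, so left commented out):
# import urllib.request
# req = urllib.request.Request(url, headers={"Accept-Encoding": "identity"})
# with urllib.request.urlopen(req) as resp:
#     html = decode_body(resp.read(),
#                        resp.headers.get("Content-Encoding"),
#                        resp.headers.get_content_charset() or "utf-8")

sample = gzip.compress("Administratör".encode("utf-8"))
print(decode_body(sample, "gzip"))  # Administratör
```

Sniffing the magic bytes is only a fallback. For UTF-8 text it cannot misfire, because 0x8b can never follow 0x1f in valid UTF-8, but it could misfire for other charsets or binary payloads.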