From: Norman Rieß on
Hello,

I am trying to read a large bz2-compressed text file using the bz2 module.
The file is 1717362770 lines long and 8 GB in size.
I am using this code:

source_file = bz2.BZ2File(file, "r")
for line in source_file:
print line.strip()

print "Exiting"
print "I used file: " + file

The loop exits cleanly after 4311 lines, in the middle of a line, and the
prints after it are executed.
This happened on two different boxes running different Linux distributions.
Am I missing something, or should this be done differently?

Thank you.

Regards,
Norman
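One possible cause, offered as an assumption rather than anything established in this thread: if the file was produced by a parallel compressor such as pbzip2, it consists of several bz2 streams concatenated together. bzcat decompresses all of them, but bz2.BZ2File in Python 2 stops at the end of the first stream (multi-stream support was only added to BZ2File in Python 3.3). A minimal sketch of a workaround, written to run under both Python 2.7 and 3, feeds the raw bytes to bz2.BZ2Decompressor and starts a fresh decompressor whenever one stream ends. The function name iter_multistream_bz2_lines and the chunk size are hypothetical choices.

```python
import bz2


def iter_multistream_bz2_lines(path, chunk_size=1 << 20):
    """Yield lines (bytes, without the trailing newline) from a file that
    may contain several concatenated bz2 streams.

    Hypothetical helper; it is not part of the bz2 module.
    """
    decomp = bz2.BZ2Decompressor()
    pending = b""  # partial line carried over between decompressed blocks
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            while chunk:
                try:
                    out = decomp.decompress(chunk)
                except EOFError:
                    # The previous stream ended exactly at a chunk boundary;
                    # start a fresh decompressor for the next stream.
                    decomp = bz2.BZ2Decompressor()
                    continue
                pending += out
                # Bytes past the end of the current stream, if any, belong
                # to the next stream.
                chunk = decomp.unused_data
                if chunk:
                    decomp = bz2.BZ2Decompressor()
            lines = pending.split(b"\n")
            pending = lines.pop()  # last piece may be an incomplete line
            for line in lines:
                yield line
    if pending:
        yield pending
```

Usage might look like `for line in iter_multistream_bz2_lines(path): print(line.strip())`. Lines come back as byte strings, so decode them if text is needed.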

From: Norman Rieß on
On 02/21/10 22:09, Dennis Lee Bieber wrote:
> On Sat, 20 Feb 2010 23:12:50 +0100, Norman Rieß <norman(a)smash-net.org>
> declaimed the following in comp.lang.python:
>
>
>> Hello,
>>
>> I am trying to read a large bz2-compressed text file using the bz2 module.
>> The file is 1717362770 lines long and 8 GB in size.
>> I am using this code:
>>
>> source_file = bz2.BZ2File(file, "r")
>> for line in source_file:
>> print line.strip()
>>
>> print "Exiting"
>> print "I used file: " + file
>>
>> The loop exits cleanly after 4311 lines, in the middle of a line, and the
>> prints after it are executed.
>> This happened on two different boxes running different Linux distributions.
>> Am I missing something, or should this be done differently?
>>
>>
> Please verify your indentation! What you posted above is invalid in
> many ways.
>
Sorry, the indentation got mangled when I pasted it.

This is the actual code:

source_file = bz2.BZ2File(file, "r")
for line in source_file:
    print line.strip()

print "Exiting"
print "I used file: " + file
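To see exactly where iteration stops, a small diagnostic (a hypothetical helper, not from the original post) can count the lines and decompressed bytes that BZ2File delivers. Comparing the byte total with the output of `bzcat file.bz2 | wc -c` would show whether BZ2File is stopping short of the real end of the data.

```python
import bz2


def count_bz2_lines(path):
    """Return (lines, decompressed_bytes) that bz2.BZ2File yields for path.

    Hypothetical diagnostic helper; the loop mirrors the one in the post.
    """
    n_lines = 0
    n_bytes = 0
    with bz2.BZ2File(path, "r") as source_file:
        for line in source_file:
            n_lines += 1
            n_bytes += len(line)  # line still includes its trailing newline
    return n_lines, n_bytes
```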



From: Steven D'Aprano on
On Mon, 22 Feb 2010 07:49:51 +0100, Norman Rieß wrote:

> This is the actual code:
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
>     print line.strip()
>
> print "Exiting"
> print "I used file: " + file


Have you verified that the bz file is good by opening it in another
application?



--
Steven
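Besides opening the file in another program (for instance `bzip2 -t`, which tests integrity without writing any output), the file can be checked from Python by decompressing it with bz2.BZ2Decompressor and counting how many concatenated bz2 streams it holds; a count greater than one would be worth knowing when reading stops early. The helper name and chunk size are assumptions. Corrupt data makes the decompressor raise an exception, but a truncated final stream is not detected by this sketch.

```python
import bz2


def bz2_stream_report(path, chunk_size=1 << 20):
    """Decompress every concatenated bz2 stream in path and return
    (number_of_streams, total_decompressed_bytes).

    Hypothetical helper. Corrupt data raises an exception from
    BZ2Decompressor; a truncated final stream is NOT detected.
    """
    streams = 0
    total = 0
    decomp = None  # None means "the next bytes start a new stream"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            while chunk:
                if decomp is None:
                    decomp = bz2.BZ2Decompressor()
                    streams += 1
                try:
                    total += len(decomp.decompress(chunk))
                except EOFError:
                    # The previous stream ended exactly at a chunk boundary,
                    # so these bytes belong to a new stream.
                    decomp = None
                    continue
                # Bytes after the end of the current stream, if any,
                # begin the next stream.
                chunk = decomp.unused_data
                if chunk:
                    decomp = None
    return streams, total
```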
From: Norman Rieß on
On 02/22/10 09:02, Steven D'Aprano wrote:
> On Mon, 22 Feb 2010 07:49:51 +0100, Norman Rieß wrote:
>
>
>> This is the actual code:
>>
>> source_file = bz2.BZ2File(file, "r")
>> for line in source_file:
>>     print line.strip()
>>
>> print "Exiting"
>> print "I used file: " + file
>>
>
> Have you verified that the bz file is good by opening it in another
> application?
>

Yes, bzcat runs through the file without problems, and piping bzcat's output
into the Python script (reading from stdin) works fine, too.
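The pipeline that works can be sketched as follows, assuming the script simply strips and echoes each line it reads from standard input. `echo_stripped` is a hypothetical name, and the stream arguments exist only to make it easy to test. It would be called as `echo_stripped()` at the end of a script run as, for example, `bzcat file.bz2 | python script.py`.

```python
import sys


def echo_stripped(stream=None, out=None):
    """Write each line of stream to out with surrounding whitespace removed.

    Defaults to sys.stdin and sys.stdout; returns the number of lines seen.
    Hypothetical helper for the bzcat-pipe setup described above.
    """
    stream = sys.stdin if stream is None else stream
    out = sys.stdout if out is None else out
    count = 0
    for line in stream:
        out.write(line.strip() + "\n")
        count += 1
    return count
```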
From: Lie Ryan on
On 02/22/10 19:43, Norman Rieß wrote:
> On 02/22/10 09:02, Steven D'Aprano wrote:
>> On Mon, 22 Feb 2010 07:49:51 +0100, Norman Rieß wrote:
>>
>>
>>> This is the actual code:
>>>
>>> source_file = bz2.BZ2File(file, "r")
>>> for line in source_file:
>>>     print line.strip()
>>>
>>> print "Exiting"
>>> print "I used file: " + file
>>>
>>
>> Have you verified that the bz file is good by opening it in another
>> application?
>>
>
> Yes, bzcat runs through the file without problems, and piping bzcat's output
> into the Python script (reading from stdin) works fine, too.

Test with something other than bzcat; bzcat does certain things
differently because of the way it works (it is a cat for bzipped files).
Try plain "bunzip2 filename.bz2".
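Following that suggestion, one way to compare the two decoders is to decompress with `bunzip2 -k filename.bz2` (`-k` keeps the compressed original) and then count lines both through bz2.BZ2File and in the plain file bunzip2 wrote. The helper below is a hypothetical sketch; if the first count is smaller than the second, BZ2File is stopping early rather than the file being damaged.

```python
import bz2


def compare_line_counts(bz2_path, plain_path):
    """Return (lines read via bz2.BZ2File, lines in the bunzip2 output).

    plain_path is assumed to be the file produced by running
    `bunzip2 -k` on bz2_path (hypothetical helper).
    """
    with bz2.BZ2File(bz2_path, "r") as f:
        via_bz2file = sum(1 for _ in f)
    with open(plain_path, "rb") as f:
        via_bunzip2 = sum(1 for _ in f)
    return via_bz2file, via_bunzip2
```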