From: Stephen.Wu on
tmp = file.read()  # very huge file
if targetStr in tmp:
    print "find it"
else:
    print "not find"
file.close()

I found that when the file is large enough, file.read() fails. Could
anyone give me more information on this problem?

From: Chris Rebert on
On Mon, Feb 1, 2010 at 1:17 AM, Stephen.Wu <54wutong(a)gmail.com> wrote:
> tmp = file.read()  # very huge file
> if targetStr in tmp:
>     print "find it"
> else:
>     print "not find"
> file.close()
>
> I found that when the file is large enough, file.read() fails. Could
> anyone give me more information on this problem?

If the file's contents are larger than available memory, you'll get a
MemoryError. To avoid this, you can read the file in by chunks (or if
applicable, by lines) and see if each chunk/line matches.

Cheers,
Chris
--
http://blog.rebertia.com
From: Gary Herron on
Stephen.Wu wrote:
> tmp = file.read()  # very huge file
> if targetStr in tmp:
>     print "find it"
> else:
>     print "not find"
> file.close()
>
> I found that when the file is large enough, file.read() fails. Could
> anyone give me more information on this problem?
>

Python has no specific limit on string size other than available memory
(and perhaps the 32-bit address space, and so on). However, if your file
is even a fraction of that size, you should not attempt to read it all
into memory at once. Is there not a way to process your file in batches
of a reasonable size?

Gary Herron


From: Stephen.Wu on
On Feb 1, 5:26 pm, Chris Rebert <c...(a)rebertia.com> wrote:
> On Mon, Feb 1, 2010 at 1:17 AM, Stephen.Wu <54wut...(a)gmail.com> wrote:
> > tmp = file.read()  # very huge file
> > if targetStr in tmp:
> >     print "find it"
> > else:
> >     print "not find"
> > file.close()
>
> > I found that when the file is large enough, file.read() fails. Could
> > anyone give me more information on this problem?
>
> If the file's contents is larger than available memory, you'll get a
> MemoryError. To avoid this, you can read the file in by chunks (or if
> applicable, by lines) and see if each chunk/line matches.
>
> Cheers,
> Chris
> --
> http://blog.rebertia.com

Actually, I'm already reading in chunks with file.read(length); I just
want to know exactly what value of length I should pass. From my trials,
it doesn't seem that length should simply be set to the amount of
physical memory...
From: Stefan Behnel on
Stephen.Wu, 01.02.2010 10:17:
> tmp = file.read()  # very huge file
> if targetStr in tmp:
>     print "find it"
> else:
>     print "not find"
> file.close()
>
> I found that when the file is large enough, file.read() fails. Could
> anyone give me more information on this problem?

Others have already pointed out that reading the entire file into memory is
not a good idea. Try reading chunks repeatedly instead.

As it appears that you are simply trying to find out whether a file
contains a specific byte sequence, you might find acora interesting:

http://pypi.python.org/pypi/acora

Also note that there are usually platform optimised tools available to
search content in files, e.g. grep. It's basically impossible to beat their
raw speed even with hand-tuned Python code, so running the right tool using
the subprocess module might be a solution.

Stefan