From: Steven D'Aprano on
On Fri, 06 Aug 2010 17:45:31 -0700, dmtr wrote:

> I'm running into some performance / memory bottlenecks on large lists.
> Is there any easy way to minimize/optimize memory usage?

Yes, lots of ways. For example, do you *need* large lists? Often a better
design is to use generators and iterators to lazily generate data when
you need it, rather than creating a large list all at once.
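
For instance, something like this untested sketch (the file name and
the per-item work are just placeholders, substitute your own):

    def stripped_lines(filename):
        # A generator: yields one stripped line at a time instead
        # of building a list holding every line in memory.
        for line in open(filename):
            yield line.strip()

    total = 0
    for line in stripped_lines('data.txt'):
        total += len(line)  # stand-in for whatever you do per item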

An optimization that may sometimes help is to intern strings, so that
there's only a single copy of each common string rather than multiple
copies of the same value.
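
Something like this untested sketch (note that in Python 2 intern() is
a builtin and only accepts byte strings, not unicode; in Python 3 it
lives in sys.intern):

    try:
        from sys import intern  # Python 3
    except ImportError:
        pass                    # Python 2: intern() is a builtin

    # Build three equal but distinct string objects:
    words = [''.join(['sp', 'am']) for i in range(3)]
    assert words[0] is not words[1]
    # After interning, equal strings share a single object:
    words = [intern(w) for w in words]
    assert words[0] is words[1]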

Can you compress the data and use that? Without knowing what you are
trying to do, and why, it's really difficult to advise a better way to do
it (other than vague suggestions like "use generators instead of lists").
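
If compression is feasible, something along these lines might work
(untested; zlib only pays off for reasonably large, repetitive values):

    import zlib

    text = u'some long, repetitive record ' * 100
    blob = zlib.compress(text.encode('utf-8'))  # store this instead
    # ... later, when the value is actually needed:
    restored = zlib.decompress(blob).decode('utf-8')
    assert restored == text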

Very often, it is cheaper and faster to just put more memory in the
machine than to try optimizing memory use. Memory is cheap; your time
and effort are not.

[...]
> Well... 63 bytes per item for very short unicode strings... Is there
> any way to do better than that? Perhaps some compact unicode objects?

If you think that unicode objects are going to be *smaller* than byte
strings, I think you're badly informed about the nature of unicode.
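
You can measure it yourself with sys.getsizeof; the exact numbers vary
with version, platform and build, but the unicode object is the bigger
of the two:

    import sys

    # Per-object sizes vary across Python versions and builds, but a
    # unicode string is consistently larger than the equivalent byte
    # string:
    print(sys.getsizeof(b'hello'))
    print(sys.getsizeof(u'hello'))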

Python is not a low-level language, and it trades off memory compactness
for ease of use. Python strings are high-level rich objects, not merely a
contiguous series of bytes. If all else fails, you might have to use
something like the array module, or even implement your own data type in
C.
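
For example, if the data is really just numbers, array stores them as
packed machine values rather than as full Python objects (untested):

    from array import array

    # A list of ints holds a pointer to a boxed int object per item;
    # array('l', ...) packs raw machine longs instead.
    nums = array('l', range(100000))
    print(sum(nums))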

But as a general rule, as I mentioned above, the best way to minimize the
memory used by a large list is to not use a large list. I can't emphasise
that enough -- look into generators and iterators, and lazily handle your
data whenever possible.


--
Steven