From: dmtr on
On Jun 3, 3:43 pm, "Emin.shopper Martinian.shopper"
<emin.shop...(a)gmail.com> wrote:
> Dear Experts,
>
> I am getting a MemoryError when creating a dict in a long running
> process and suspect this is due to memory fragmentation. Any
> suggestions would be welcome. Full details of the problem are below.
>
> I have a long running process which eventually dies with a
> MemoryError exception. When it dies, it is using roughly 900 MB on a 4
> GB Windows XP machine running Python 2.5.4. If I do "import pdb;

Are you sure you have enough memory available?
A dict's memory usage can jump by 2x or more during a resize, since the
new hash table is allocated before the old one is freed.
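
One quick way to check is to probe the largest single contiguous
allocation the process can still make. The sketch below is only
illustrative (the helper name and the 1 GB cap are assumptions, chosen
for a 32-bit Windows process where the per-process address space, not
the 4 GB of RAM, is the real ceiling):

def largest_contiguous_mb(limit_mb=1024):
    # Keep doubling the request until a single contiguous allocation
    # fails; a str of N bytes needs roughly N contiguous bytes.
    best, mb = 0, 1
    while mb <= limit_mb:
        try:
            block = ' ' * (mb * 1024 * 1024)
            del block
            best = mb
            mb *= 2
        except MemoryError:
            break
    return best

print largest_contiguous_mb()   # could be run at the pdb prompt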

-- Dmitry

P.S. Wish there was a google-sparsehash port for python....
From: dmtr on
> I have a long running process which eventually dies with a
> MemoryError exception. When it dies, it is using roughly 900 MB on a 4
> GB Windows XP machine running Python 2.5.4. If I do "import pdb;

BTW have you tried the same code with Python 2.6.5?

-- Dmitry
From: Bryan on
Emin.shopper wrote:
> dmtr wrote:
> > I'm still unconvinced that it is a memory fragmentation problem. It's
> > very rare.
>
> You could be right. I'm not an expert on python memory management. But
> if it isn't memory fragmentation, then why is it that I can create
> lists which use up 600 more MB but if I try to create a dict that uses
> a couple more MB it dies? My guess is that python dicts want a
> contiguous chunk of memory for their hash table. Is there a reason
> that you think memory fragmentation isn't the problem?

Your logic makes some sense. You wrote that you can create a dict with
1300 items, but not 1400 items. If my reading of the Python source is
correct, the dict type decides it's overloaded when 2/3 full, and
enlarges by powers of two, so the 1366'th item will trigger allocation
of an array of 4096 PyDictEntry's.
http://svn.python.org/view/python/branches/release25-maint/Objects/dictnotes.txt?view=markup
http://svn.python.org/view/python/branches/release25-maint/Objects/dictobject.c?view=markup
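
(Incidentally, on Python 2.6 or later one can watch the resize points
with sys.getsizeof, whose result for a dict should grow whenever the
hash table is reallocated; a rough sketch, with byte counts that will
vary by version and build:)

import sys   # sys.getsizeof requires Python 2.6+

d = {}
last = sys.getsizeof(d)
for i in range(3000):
    d[i] = None
    size = sys.getsizeof(d)
    if size != last:
        # Each jump in the reported size corresponds to a table resize.
        print 'resize after item %d: %d -> %d bytes' % (i + 1, last, size)
        last = size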

On the other hand PyDictEntry is 12 bytes (on 32-bit Python), so the
memory chunk needed is just 48 K. It doesn't seem plausible that you
have hundreds of megabytes available but can't allocate 48K in
one chunk.

Plus, unless I'm misreading the code, a Python list also uses one
contiguous chunk of memory to store all the item references. I'm
looking at PyList_New() and list_resize() in:
http://svn.python.org/view/python/branches/release25-maint/Objects/listobject.c?view=markup
and the memory allocators in:
http://svn.python.org/view/python/branches/release25-maint/Include/pymem.h?view=markup

> What else could it be?

Unfortunately several things, most of them hard to diagnose. I'd
suggest checking easy stuff first. Make sure 'dict' is still <type
'dict'>. If you can test again in the debugger in the error case, see
how large a set you can make, as the set implementation is similar to
dict except the hash table entries are one pointer shorter at 8 bytes.
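
Something along these lines, pasted at the pdb prompt, would give a
rough comparison (the helper name and the doubling schedule are just
illustrative):

def first_failing_size(build, start=1000, limit=10**7):
    # Returns the first size at which building the container raises
    # MemoryError, or limit if nothing up to limit fails.
    n = start
    while n <= limit:
        try:
            build(xrange(n))
        except MemoryError:
            return n
        n *= 2
    return limit

# Compare e.g. first_failing_size(set) against
# first_failing_size(dict.fromkeys): similar open-addressed hash
# tables, but 8-byte entries for set vs 12-byte entries for dict
# on a 32-bit build.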


--
--Bryan Olson
From: Philip Semanchuk on

On Jun 4, 2010, at 12:06 PM, Bryan wrote:

> Emin.shopper wrote:
>> dmtr wrote:
>>> I'm still unconvinced that it is a memory fragmentation problem.
>>> It's
>>> very rare.
>>
>> You could be right. I'm not an expert on python memory management.
>> But
>> if it isn't memory fragmentation, then why is it that I can create
>> lists which use up 600 more MB but if I try to create a dict that
>> uses
>> a couple more MB it dies? My guess is that python dicts want a
>> contiguous chunk of memory for their hash table. Is there a reason
>> that you think memory fragmentation isn't the problem?
>
> Your logic makes some sense. You wrote that you can create a dict with
> 1300 items, but not 1400 items. If my reading of the Python source is
> correct, the dict type decides it's overloaded when 2/3 full, and
> enlarges by powers of two, so the 1366'th item will trigger allocation
> of an array of 4096 PyDictEntry's.

At PyCon 2010, Brandon Craig Rhodes presented about how dictionaries
work under the hood:
http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dictiona

I found that very informative.

There are also some slides if you don't like the video; I haven't
looked at them myself.
http://us.pycon.org/2010/conference/schedule/event/12/


Cheers
Philip

From: Bryan on
Philip Semanchuk wrote:
> At PyCon 2010, Brandon Craig Rhodes presented about how dictionaries
> work under the hood:
> http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dict...
>
> I found that very informative.

That's a fine presentation of hash tables in general and Python's
choices in particular. Also highly informative, while easily readable,
is the Objects/dictnotes.txt file in the Python source.

Fine as those resources may be, the issue here stands. Most of my own
Python issues turn out to be stupid mistakes, and the problem here
might be on that level, but Emin seems to have worked his problem and
gotten a bunch of stuff right. There is no good reason why
constructing a 50 kilobyte dict should fail with a MemoryError while
constructing 50 megabyte lists succeeds.
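
A direct check in the failing process would be to compare a single
~48 KB contiguous allocation against a multi-megabyte one, since a
list's pointer array is itself one contiguous block (the sizes here
are just illustrative):

def try_alloc(nbytes):
    # One contiguous allocation of roughly nbytes bytes.
    try:
        block = ' ' * nbytes
        del block
        return True
    except MemoryError:
        return False

print '48 KB contiguous:', try_alloc(48 * 1024)
print '20 MB contiguous:', try_alloc(20 * 1024 * 1024)
# If both succeed at the point where the dict dies, a ~48 K table
# allocation is unlikely to be what is running out of room.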


--
--Bryan Olson