From: sfuerst on
On Mar 15, 7:36 am, Rainer Weikusat <rweiku...(a)mssgmbh.com> wrote:
> sfuerst <svfue...(a)gmail.com> writes:
>
> [...]
>
> > If you have a wide size-range, then you should know that writing a
> > general purpose allocator isn't a 200 line job.  To get something
> > better than the above allocators will probably require a few
> > thousand lines or more.
>
> This should have been "implementing the malloc interface such that the
> implementation performs reasonably well for the usual benchmark
> cases isn't a 200 line job". Although it actually even might be: The
> Linux-kernel uses a power-of-two-freelist allocator based on the
> object caching allocator it also uses. And 'if it is good enough for
> Linux, it will be good enough for $my_application' is usually a
> sensible proposition. But it is really much better to not try to do
> this to begin with. And for this case, a 'general purpose allocator'
> which basically avoids external fragmentation and whose allocation and
> deallocation operations are guaranteed to be fast and complete in
> constant time can be implemented in less than 200 lines of code[*],
> provided that scalability regarding concurrent allocations/
> deallocations from multiple threads is not a priority concern.

The kernel allocator needs to live in kernel-space, which places extra
constraints on it.

You might want to try writing your 200-line userspace allocator - there
is nothing preventing you from doing it. A simple power-of-two
allocator is a nice place to start. However, I think you'll find that
the result will be 50-100% slower than the standard glibc allocator
that the OP was originally using. Why the slowdown?

The problem is that an allocation plus a free will take O(lg n) memory
references. The better allocators take O(1) references nearly all the
time, and the difference is noticeable for something that executes in
as few instructions as a memory allocator. So in order to approach the
speed of the competition, you'll need to add something like fast
look-aside lists.
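
To make that concrete, here is roughly what such a look-aside fast
path looks like. This is a single-threaded sketch with made-up size
classes, not production code; a real allocator also needs a proper
slow path that refills the lists from larger blocks, and a way to
recover a block's size class on free instead of trusting the caller:

    /* Sketch: per-size-class look-aside lists in front of malloc.
     * Classes are powers of two from 8 bytes to 256 KiB (example
     * values).  Not thread-safe; the caller must pass the same size
     * to fast_free() that it passed to fast_alloc(). */
    #include <stddef.h>
    #include <stdlib.h>

    #define NCLASSES 16

    struct free_block { struct free_block *next; };

    static struct free_block *lookaside[NCLASSES];

    static int size_class(size_t n)   /* smallest power of two >= n */
    {
        int c = 0;
        size_t sz = 8;
        while (sz < n) { sz <<= 1; c++; }
        return c;
    }

    void *fast_alloc(size_t n)
    {
        if (n > ((size_t)8 << (NCLASSES - 1)))
            return malloc(n);             /* too big for any class */
        int c = size_class(n);
        struct free_block *b = lookaside[c];
        if (b) {                          /* O(1) hit: pop the list head */
            lookaside[c] = b->next;
            return b;
        }
        return malloc((size_t)8 << c);    /* stand-in for a real refill path */
    }

    void fast_free(void *p, size_t n)
    {
        if (n > ((size_t)8 << (NCLASSES - 1))) {
            free(p);
            return;
        }
        int c = size_class(n);            /* O(1): push onto the class list */
        struct free_block *b = p;
        b->next = lookaside[c];
        lookaside[c] = b;
    }

The point is simply that both paths touch a couple of cache lines and
never walk a tree, which is where the O(1) behaviour comes from.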

The next problem is that the OP specifically mentioned a notoriously
difficult case for memory allocators: the producer-consumer pattern.
Keeping the producer and consumer threads from interfering with each
other isn't easy - even a single extra cache-line bounce can kill
performance here (see the layout sketch after this paragraph). Then
there is the problem of "memory blow-up", which a few allocators
unfortunately suffer from in this situation. Your simple power-of-two
allocator isn't going to work nearly as well as you might hope in this
difficult case. Benchmark, and see. :-)
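
For a rough idea of what avoiding that bounce looks like: the
per-thread heap keeps the list the owner allocates from and the list
that other threads free into on separate cache lines, so a remote free
doesn't invalidate the line the producer is working on. The 64-byte
line size, the field names and the mutex-protected remote list below
are all illustrative choices, not any real allocator's layout:

    /* Sketch: keep producer-side and consumer-side allocator state on
     * separate cache lines (64 bytes assumed).  remote_lock must be
     * set up with pthread_mutex_init() before use. */
    #include <pthread.h>
    #include <stdalign.h>

    struct free_block { struct free_block *next; };

    struct per_thread_heap {
        /* touched only by the owning (producer) thread */
        alignas(64) struct free_block *local_free;

        /* touched by other (consumer) threads returning blocks */
        alignas(64) pthread_mutex_t remote_lock;
        struct free_block *remote_free;
    };

    /* Called by the owner when local_free runs dry: migrate remotely
     * freed blocks back in one go, so the lock stays off the fast path. */
    static void reclaim_remote(struct per_thread_heap *h)
    {
        pthread_mutex_lock(&h->remote_lock);
        struct free_block *list = h->remote_free;
        h->remote_free = NULL;
        pthread_mutex_unlock(&h->remote_lock);

        while (list) {
            struct free_block *next = list->next;
            list->next = h->local_free;
            h->local_free = list;
            list = next;
        }
    }

The serious allocators do the remote list lock-free, but the layout
idea - one side per cache line - is the same.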

In short, these days the generic allocators are very, very good. They
use all sorts of tricks that the average programmer hasn't thought of
in order to increase speed. The only way to beat them is in
specialized situations, and even then you need to benchmark a bit to
see whether the extra code you added actually helped. The only common
case I've found where a generic allocator can easily be beaten is when
you are allocating and freeing many, many objects of the same size. In
that case it pays to have a bounded-size per-thread freelist that you
check first for objects to reuse (sketched below). (This ignores the
situations where you can use alloca() or the new C99 variable-length
arrays to allocate things with per-function lifetimes, because there
you don't need a new allocator at all - you just use an intrinsic part
of the C language.)
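
The sort of freelist I mean is something like the following; the C11
_Thread_local storage class, the 64-byte object size and the cap of
128 are all just illustrative values:

    /* Sketch: bounded per-thread freelist in front of malloc/free for
     * objects that are all the same size.  Numbers are examples, not
     * tuned values. */
    #include <stdlib.h>

    #define OBJ_SIZE  64     /* the one object size being cached */
    #define CACHE_MAX 128    /* bound, so idle threads don't hoard memory */

    struct free_obj { struct free_obj *next; };

    static _Thread_local struct free_obj *cache;
    static _Thread_local unsigned cache_len;

    void *obj_alloc(void)
    {
        struct free_obj *o = cache;
        if (o) {                       /* fast path: no lock, no syscall */
            cache = o->next;
            cache_len--;
            return o;
        }
        return malloc(OBJ_SIZE);
    }

    void obj_free(void *p)
    {
        if (cache_len < CACHE_MAX) {   /* keep it, bounded */
            struct free_obj *o = p;
            o->next = cache;
            cache = o;
            cache_len++;
        } else {
            free(p);                   /* past the bound, give it back */
        }
    }

Even something this simple wins when the same size is allocated and
freed millions of times - but as always, benchmark it in the real
program before keeping it.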

Steven
From: Chris Friesen on
On 03/15/2010 08:36 AM, Rainer Weikusat wrote:

> For instance, another
> approach is to get a 'large' block of memory from the kernel and use
> that to satisfy allocation requests for 'small structures' from many
> subsystems which all cache their allocated structures internally and
> reuse them before allocating new structures. Especially in a program
> which isn't supposed to ever stop, every situation which did occur in
> the past will happen again in future and this implies that letting go
> of memory which has been allocated for a specific purpose just means
> 'running the risk that it won't be available next time'.

In systems with disk swap, not letting go of memory is no guarantee that
it will be available next time--it could be swapped out to disk, the
swap space could be full, and there could be no available memory in the
system. Unless you've mlock()'d the memory, it's not guaranteed to be
available.

I've worked on projects where we preallocated (and mlock()'d) all memory
up-front for performance/reliability reasons. This can be useful for
dedicated systems.
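
For what it's worth, the locking half of such a setup can be as small
as the sketch below. mlockall() is POSIX; the 64 MiB arena size and
the choice of MCL_FUTURE are just example decisions, and on most
systems you need either privilege or a raised RLIMIT_MEMLOCK for this
to succeed:

    /* Sketch: lock all current and future pages of the process in RAM,
     * then preallocate and touch an arena so it is resident up front.
     * Error handling is minimal. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define ARENA_SIZE (64u * 1024 * 1024)   /* example figure */

    int main(void)
    {
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }

        char *arena = malloc(ARENA_SIZE);
        if (arena == NULL) {
            perror("malloc");
            return 1;
        }
        memset(arena, 0, ARENA_SIZE);   /* touch every page once */

        /* ... the rest of the program allocates out of 'arena' ... */
        return 0;
    }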

However, I wouldn't want the dozens of apps on my desktop to all
refuse to return memory to the system just because they might want it
again at some point in the future (thus unnecessarily forcing swapping
to disk or a memory upgrade). Unless there is a very specific reason
to think that performance is critical, my personal view is that it's
polite for a userspace app to return memory to the underlying system
whenever it is reasonably possible.

Chris
From: Rainer Weikusat on
Chris Friesen <cbf123(a)mail.usask.ca> writes:
> On 03/15/2010 08:36 AM, Rainer Weikusat wrote:

[...]

>> Especially in a program which isn't supposed to ever stop, every
>> situation which did occur in the past will happen again in future
>> and this implies that letting go of memory which has been allocated
>> for a specific purpose just means 'running the risk that it won't
>> be available next time'.

[...]

> I've worked on projects where we preallocated (and mlock()'d) all memory
> up-front for performance/reliability reasons. This can be useful for
> dedicated systems.
>
> However, I wouldn't want all the dozens of apps on my desktop to all
> refuse to return memory back to the system just because they might want
> it again at some point in the future (thus unnecessarily forcing
> swapping to disk or a memory upgrade). Unless there is a very specific
> reason to think that performance is critical, my personal view is that
> it's polite for a userspace app to return memory back to the underlying
> system whenever it is reasonably possible.

'malloc' may or may not return memory to the system. Usually, it
won't, except in fringe cases (eg 'large allocations' done via
mmap). Memory allocations which originally happened by calling brk/
sbrk cannot easily be returned to the system, anyway, only if freeing
them happens to release a chunk of memory just below the current
break.
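
With glibc specifically there is also malloc_trim(), which asks the
allocator to hand whatever free memory sits at the top of the heap
back to the kernel; how much that actually amounts to depends entirely
on the fragmentation situation just described. A minimal, glibc-only
sketch:

    /* glibc-specific: ask malloc to return free heap memory to the
     * kernel.  malloc_trim() returns 1 if anything was released,
     * 0 otherwise; other libcs may not provide it at all. */
    #include <malloc.h>
    #include <stdio.h>

    void release_unused_heap(void)
    {
        if (malloc_trim(0))
            fprintf(stderr, "some memory was returned to the system\n");
        else
            fprintf(stderr, "nothing could be returned\n");
    }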
From: Rainer Weikusat on
lacos(a)ludens.elte.hu (Ersek, Laszlo) writes:
> In article <87d3z6r36g.fsf(a)fever.mssgmbh.com>, Rainer Weikusat <rweikusat(a)mssgmbh.com> writes:
>> Eric Sosman <esosman(a)ieee-dot-org.invalid> writes:
>>> On 3/14/2010 4:40 PM, Rainer Weikusat wrote:
>>>>
>>>> Realistically, memory leaks are especially common in code which lacks
>>>> any kind of memory managment, IOW, just uses malloc/ free naively.
>
> This is two statements:
>
> (1) "memory leaks are especially common in code which lacks any kind of
> memory managment",
>
> (2) using malloc/free naively is equivalent to no memory management at
> all ("lacks any kind of").
>
> A third statement might come as a consequence,
>
> (3) "memory leaks are especially common in code which [...] just uses
> malloc/ free naively"
>
> I tend to accept (1) -- not free()ing consistently can be considered a
> lack of memory management. I strongly disagree with (2)

In plain English, this seems to mean "I (meaning you) strongly desire
to use the terms 'you' (meaning me) were using to describe a specific
situation to describe a fundamentally different situation" ...

[...]

> An approach with ownership responsibilities laid out clearly and
> strictly can still be called naive. That stuff can be drawn up in all
> kinds of diagrams, commented on in the code etc.

... in particular, your idea of 'code which implements memory
management' seems to be 'code which doesn't implement memory
management, but whose associated documentation says something on the
topic in some form'. That's not what I was writing about.

[...]

>> - Usually, people who are having problems they cannot deal
>> with anymore manually tend to rather write tools to help
>> them with being able to continue to act in problem-creating
>> ways than to consider changing their habits. Valgrind is
>> such a tool and it has excellent support for 'straight'
>> malloc/free and basically no support for anything else.
>
> I explicitly said I am not endorsing programming *for* valgrind -- I
> expected such a followup.

I seriously have no idea how you managed to read my statement above
(automated tools for detecting memory leaks in code which 'just uses'
malloc/free exist => memory leaks are a severe problem in such code)
in this way.

[...]

> (On a side note, you chose to ignore my earlier question (or so it
> seems) targeted at the portability of those 50-300 SLOC custom
> allocators and whether they are based on malloc().)

malloc is a library routine on UNIX(*) which uses some facility the
kernel provides for actually allocating address space (not memory,
strictly speaking). This will usually be brk/sbrk or mmap. Because
library routines will likely call malloc behind one's back, using
brk/sbrk in a more sophisticated custom allocator would be asking for
trouble. Which leaves mmap. While brk is no longer a 'standardized'
UNIX(*) routine, it will remain available on UNIX(*) because existing
allocators (e.g. dlmalloc) use it. mmap is a standardized UNIX(*)
routine.
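
Getting the backing store for such a custom allocator from mmap then
looks roughly like the sketch below. The 1 MiB chunk size is just an
example, and MAP_ANONYMOUS is spelled MAP_ANON on some older systems:

    /* Sketch: obtain and release backing store for a custom allocator
     * via mmap/munmap rather than brk/sbrk, so it cannot collide with
     * the libc malloc that other library code calls behind one's back. */
    #include <stddef.h>
    #include <sys/mman.h>

    #define CHUNK_SIZE (1024u * 1024)   /* example: 1 MiB chunks */

    static void *chunk_alloc(void)
    {
        void *p = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    static void chunk_free(void *p)
    {
        munmap(p, CHUNK_SIZE);
    }

The allocator proper then carves individual allocations out of such
chunks and can munmap a chunk whenever it happens to become entirely
free.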
From: Eric Sosman on
On 3/15/2010 4:02 PM, Rainer Weikusat wrote:
>
> 'malloc' may or may not return memory to the system. Usually, it
> won't, except in fringe cases (eg 'large allocations' done via
> mmap). Memory allocations which originally happened by calling brk/
> sbrk cannot easily be returned to the system, anyway, only if freeing
> them happens to release a chunk of memory just below the current
> break.

The one time I actually tried returning memory from an
sbrk-funded allocator (I didn't want to do it, but management
had a bee in their bonnet), the actual amount of returnable
memory turned out to be pretty small. Long- and short-lived
objects were pretty well mixed, with the result that there
was usually a long-lived object not very far below the break,
"pinning" everything below it.

Switched to getting memory from mmap (or mmap-like things
on non-Unix platforms), and things got better -- but not by
an enormous amount. Again, since long- and short-lived objects
lived cheek by jowl, there was an excellent chance that any
given page contained a long-lived object somewhere, preventing
the whole page from being returned.

Eventually moved to a mixed strategy, where "small" requests
were satisfied by subdividing pages that never got returned,
while "large" objects resided on dedicated pages that could be
released when/if the object was freed. (I'm given to understand
that glibc uses a strategy something like this.) The trick was
to choose a small/large threshold that worked well: Set it too
high and you'll retain too much memory, too low and you'll waste
time thrashing pages back and forth to the kernel. The proper
threshold seemed to depend a lot on the way the product was used;
experiments didn't turn up One Best Value for all usage patterns.
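
In outline - and only in outline, the numbers and the trivial bump
allocator on the small side below are stand-ins, not what the product
actually did - the dispatch looked something like this:

    /* Sketch of the mixed strategy: "small" requests are carved out of
     * pages that are never returned, "large" requests get their own
     * mapping that is unmapped on free.  THRESHOLD is the tuning knob
     * discussed above; all numbers are examples. */
    #include <stddef.h>
    #include <sys/mman.h>

    #define THRESHOLD 4096    /* small/large boundary; needs per-workload tuning */
    #define PAGE      4096    /* assumed page size for the sketch */

    static char  *small_cur;  /* bump pointer into the current retained page */
    static size_t small_left;

    static void *small_alloc(size_t n)
    {
        n = (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
        if (n > small_left) {         /* page exhausted: grab another, keep it forever */
            void *pg = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (pg == MAP_FAILED)
                return NULL;
            small_cur  = pg;
            small_left = PAGE;
        }
        void *p = small_cur;
        small_cur  += n;
        small_left -= n;
        return p;
    }

    void *mixed_alloc(size_t n)
    {
        if (n < THRESHOLD)
            return small_alloc(n);
        void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    void mixed_free(void *p, size_t n)
    {
        /* small blocks are simply retained here; a real version would
         * thread them onto per-size freelists for reuse */
        if (n >= THRESHOLD)
            munmap(p, n);   /* the whole large mapping goes back to the kernel */
    }

Moving THRESHOLD up or down is exactly the retain-memory vs.
thrash-pages trade-off described above.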

--
Eric Sosman
esosman(a)ieee-dot-org.invalid