Heavy malloc()/free() load and threads [Unix Programming]

Prev: Athena widgets
Next: Yourp new proxy site it's free to open blocked sites

From: Ersek, Laszlo on 14 Mar 2010 18:45

In article <87d3z6r36g.fsf(a)fever.mssgmbh.com>, Rainer Weikusat <rweikusat(a)mssgmbh.com> writes:
> Eric Sosman <esosman(a)ieee-dot-org.invalid> writes:
>> On 3/14/2010 4:40 PM, Rainer Weikusat wrote:
>>>
>>> Realistically, memory leaks are especially common in code which lacks
>>> any kind of memory managment, IOW, just uses malloc/ free naively.

This is two statements:

(1) "memory leaks are especially common in code which lacks any kind of
memory managment",

(2) using malloc/free naively is equivalent to no memory management at
all ("lacks any kind of").

A third statement might come as a consequence,

(3) "memory leaks are especially common in code which [...] just uses
malloc/ free naively"

I tend to accept (1) -- not free()ing consistently can be considered a
lack of memory management. I strongly disagree with (2) -- see below. I
tend to accept (3), but not because it follows from (1)&&(2) (I think
(2) is false) -- (3) is probably true because naive malloc()/free() is
*hard*.

> [...] There are,
> however, three reasons why I consider mine to be 'sufficiently
> representative':
>
> - A memory leak occurs whenever space allocated in order to
> accomplish a particular task isn't 'freed' before the
> pointer pointing to it is gone. Usually, this is an
> oversight on part of the person who wrote the code. The
> chances that a human forgets something increases with the
> number of different somethings which must be
> remembered. Code which has been written with an 'just call
> malloc whenever you need a buffer'-attitude will contain
> lots of malloc calls and this means there are a lot of
> opportunities to forget to also call free.

An approach with ownership responsibilities laid out clearly and
strictly can still be called naive. That stuff can be drawn up in all
kinds of diagrams, commented on in the code etc. I interpret "naive" as
"not sophisticated performance-wise", ie. the style in which, whenever
the coder can prove he *must* free a block, he/she frees it. For less or
more naivety, s/must/can/, even. "Naive" carries no negative meaning in
this context (correctness-wise, which we are talking about, wrt. memory
leaks).

> - Usually, people who are having problems they cannot deal
> with anymore manually tend to rather write tools to help
> them with being able to continue to act in problem-creating
> ways than to consider changing their habits. Valgrind is
> such a tool and it has excellent support for 'straight'
> malloc/free and basically no support for anything else.

I explicitly said I am not endorsing programming *for* valgrind -- I
expected such a followup. I wanted to present a point of view, one which
I happen to find secondary at best, but the priorities may be different
for the OP. A QA lab may require valgrindability, for example. I do test
my code with valgrind occasionally too, as an afterthought, after
writing my code as if the universe knew of no valgrind or anything
similar.

> - Custom allocators (eg as in, Apache2, Postgres or Samba) are
> often specifically designed to enable easy 'freeing' of
> memory which is no longer in use.
>
> Two common solutions suggest that a problem actually exists.

The problem does exist, and since I accept (3), I don't debate such
custom allocators decrease the risk of memory leaks. However, in my
opinion, they do so becase they ease the hardship that goes with manual
malloc()/free(), not because they provide something that is unavailable
naively.

(On a side note, you chose to ignore my earlier question (or so it
seems) targeted at the portability of those 50-300 SLOC custom
allocators and whether they are based on malloc().)

If I created any straw man arguments, I apologize -- they're due to lack
of reasoning skills and/or attention, not malice.

lacos

From: William Ahern on 14 Mar 2010 22:24

Ersek, Laszlo <lacos(a)ludens.elte.hu> wrote:
> In article <87d3z6r36g.fsf(a)fever.mssgmbh.com>, Rainer Weikusat <rweikusat(a)mssgmbh.com> writes:
<snip>
> > - Usually, people who are having problems they cannot deal
> > with anymore manually tend to rather write tools to help
> > them with being able to continue to act in problem-creating
> > ways than to consider changing their habits. Valgrind is
> > such a tool and it has excellent support for 'straight'
> > malloc/free and basically no support for anything else.
>
> I explicitly said I am not endorsing programming *for* valgrind -- I
> expected such a followup. I wanted to present a point of view, one which
> I happen to find secondary at best, but the priorities may be different
> for the OP. A QA lab may require valgrindability, for example. I do test
> my code with valgrind occasionally too, as an afterthought, after
> writing my code as if the universe knew of no valgrind or anything
> similar.

Perhaps you're giving too much ground on the point. I don't think there's
anything wrong with adding code _for_ Valgrind, and I encourage it. All of
my specialized allocators have debug modes which allow Valgrind to catch
overflows (underflows are harder) and memory leaks.

I write code which explicitly frees all objects--at least in debug
mode--rather than depending on the allocator to do this, on the premise that
if my code is losing track of an object something is likely amiss. Often a
programmer makes the "premature optimization", especially in corner cases
such as unwinding intermediate commitments after an error, that it's okay to
lose track of an object because the allocator keeps a reference, neglecting
altogether the possible security ramifications or the side-effect of losing
possible signals on code correctness.

I also fix my code and submit patches to other projects when Valgrind
complains, _even_ _if_ the complaints don't actually signify a real error.
Valgrind is useful for many things, not just memory leaks, and reducing the
noise level is benefical. I don't do this as a matter of course, blindly.
I'm aware of the Debian OpenSSL accident; indeed I had earlier and
coincidentally made the opposite choice in my company's tree because I
carefully reviewed the code in question and judged (a) there was no cheap,
correct fix and (b) the cost of a correct fix was more than the cost of the
noise.

From: sfuerst on 15 Mar 2010 01:44

On Mar 11, 3:54 am, Urs Thuermann <u...(a)isnogud.escape.de> wrote:
> I have a typical producer/consumer problem (with varying-sized
> elements) which I currently have solved using a ring buffer in the
> classical way. Because of the varying size, no prior safe way to know
> the needed buffer size and to reduce copying I want to change that to
> linked list of items passed from producer to consumer.
>
> The producer would allocate memory for an item and append it to the
> list, the consumer would dequeue from the beginning of the list and
> free the memory for that item afterwards.
>
> The average rate will be roughly 40 to 50 allocations/deallocations in
> a strict LIFO order and there will be 30000 to 60000 items on the list.
>
> My questions are:
>
> 1. Will typical implementations of malloc()/free() in libc handle this
> load well? Or should I implement my own memory management? I
> currently use glibc on Debian but would also like to know about
> other libc implementations. BTW, the hardware is an Intel x86 CPU
> at about 2GHz.

The easiest response is to say "Benchmark it and see". However, I
think I can guess what you will find.

The glibc allocator is very fast. However, it has a problem with the
producer-consumer pattern. It uses "arenas" with separate locking.
When allocating, it tries to lock the last used arena, and if that
fails, uses a different one. When freeing, it needs to lock the arena
of that chunk of memory. The net result is that threads can end up
owning arenas with vastly different sizes, which can look like a
memory leak. This behaviour is described at
http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html

So google wrote their own allocator (tcmalloc) to fix this problem.
You may want to give it a try as it performs fairly well. The only
problem with it is that it doesn't return memory to the operating
system. This may pose a problem for long-running server tasks.

Another allocator you might want to try is Hoard. That one is fairly
old, and faster than tcmalloc for small allocation sizes. For large
sizes though, it is catastrophically slower, so you may want to
benchmark it to see if it helps you or not.

Then there is the allocator I wrote... which is faster than all three
of the above. It unfortunately, isn't open source. I have a nice
page showing the results of googles "t-test1" memory allocator
benchmark for various numbers of threads and average allocation sizes
for the above allocators. It should help you decide which one may be
the best for you: http://locklessinc.com/benchmarks.shtml

A final option is to write your own allocator. This is probably only
worth doing if the size-range of things on your queue is small. Then
you may use a simple linked list of free nodes, which should be fast
enough for you. If you have a wide size-range, then you should know
that writing a general purpose allocator isn't a 200 line job. To get
something better than the above allocators will probably require a few
thousand lines or more.

> 2. Are malloc() and free() thread-safe (according to POSIX and/or in
> typical libc implementations) or do I have to use a mutex?
>
> urs

They are in glibc. In older libcs, not necessarily. I'd also be
careful of some of the lesser-known allocators. Most of the ones I've
tested seemed to be fairly buggy if you have multiple threads
executing allocations and frees simultaneously. i.e. if something
crashes with the t-test1 benchmark, then it definitely isn't worth
using. This is the main reason that I don't have more allocators on
the benchmark page. Most simply didn't work properly when tested
hard.

The only other one of mention is jemalloc, which is the allocator that
firefox uses. That allocator is designed for minimal memory usage
rather than allocation speed. The result is that it is quite slow. I
thought that it would be unfair to compare it to the speed-demon
allocators which didn't care quite so much for memory usage.

Steven

From: Ersek, Laszlo on 15 Mar 2010 07:27

In article <nci077-sg2.ln1(a)wilbur.25thandClement.com>,
William Ahern <william(a)wilbur.25thandClement.com> writes:

> [...] Often a
> programmer makes the "premature optimization", especially in corner cases
> such as unwinding intermediate commitments after an error, that it's okay to
> lose track of an object because the allocator keeps a reference, neglecting
> altogether the possible security ramifications or the side-effect of losing
> possible signals on code correctness.

I couldn't tell how often this happens in "the real world", but it seems
very plausible. In the end, some of said allocators should probably be
written in the first place so that the tedium of such deep unwindings on
error paths can be avoided. Thanks for bringing this up.

lacos

From: Rainer Weikusat on 15 Mar 2010 10:36

sfuerst <svfuerst(a)gmail.com> writes:

[...]

> If you have a wide size-range, then you should know that writing a
> general purpose allocator isn't a 200 line job. To get something
> better than the above allocators will probably require a few
> thousand lines or more.

This should have been "implementing the malloc interface such that the
implementation performs reasonably well for the usual benchmark
cases isn't a 200 line job". Although it actually even might be: The
Linux-kernel uses a power-of-two-freelist allocator based on the
object caching allocator it also uses. And 'if it is good enough for
Linux, it will be good enough for $my_application' is usually a
sensible proposition. But it is really much better to not try to do
this to begin with. And for this case, a 'general purpose allocator'
which basically avoids external fragmentation and whose allocation and
deallocation operations are guaranteed to be fast and complete in
constant time can be implemented in less than 200 lines of code[*],
provided that scalability regarding concurrent allocations/
deallocations from multiple threads is not a priority concern.

But 'general purpose memory allocation' is actually a complicated
approach to problems which can often be solved in a much simpler
way[**]: Nowadays, computers don't have small amounts of RAM anymore
and reusing an area of already allocated space for different purposes
within a single program isn't really necessary. For instance, another
approach is to get a 'large' block of memory from the kernel and use
that to satisfy allocation requests for 'small structures' from many
subsystems which all cache their allocated structures internally and
reuse them before allocating new structures. Especially in a program
which isn't supposed to ever stop, every situation which did occur in
the past will happen again in future and this implies that letting go
of memory which has been allocated for a specific purpose just means
'running the risk that it won't be available next time'.

[*] Known to be good enough for the needs of a content-filtering
policy enforcing HTTP proxy serving up to 40 concurrent users while
running on a 200Mhz ARM-based appliance w/ 64M of RAM.

[**] Ever wondered why only software is always suposed to be 'general
purpose' while all other kinds of devices are usually intended to
be good at fulfilling a specific purpose? Assuming that 'general
purpose' is something to always strive for, why are there no general
purpose devices combining the functionality of a coffee maker, a
lawnmover, a washing machine, a mixer, a sports car and a truck in
one? Could it be that the idea is really just completely braindead?

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10
Prev: Athena widgets
Next: Yourp new proxy site it's free to open blocked sites