Heavy malloc()/free() load and threads [Unix Programming]

From: Jonathan de Boyne Pollard on 15 Mar 2010 22:21

>
>
> Much to our surprise, we discovered that most embedded malloc()/free()
> operations are horrifically slow [and more buggy].
>
That doesn't surprise me. It's a special case of the more general
principle being reiterated by several people elsewhere in this thread:
Rolling one's own allocator is tricky to get right. The C/POSIX
libraries for embedded systems sometimes haven't had as much work done
on them as the C/POSIX libraries for "mainstream" (for want of a better
word) systems. They are, in effect, roll-your-own efforts done by a
library developer for the target platform.

This is not to say that the "mainstream" implementations are
invariably better and bug-free. However, they are, by their very
nature, more extensively tested as general-purpose allocators.
Although implementors do write unit tests, the applications using
the library are in general still the most effective test corpus.
I've encountered bugs in specialist implementations that have existed
for years simply because none of the applications ever exercised
the library in the particular way necessary to exhibit the bug.

From: Ersek, Laszlo on 15 Mar 2010 23:52

In article <P_WdnXGWZdnFRgPWnZ2dnUVZ_vWdnZ2d(a)posted.sasktel>,
Chris Friesen <cbf123(a)mail.usask.ca> writes:

> Would this work portably? It's more overhead so it would likely only
> make sense if MAP_ANONYMOUS isn't supported.
>
> 1) open a scratch file
> 2) unlink it
> 3) ftruncate it to the desired size
> 4) mmap it
> 5) close it
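As a rough illustration, the five quoted steps might look like this in C. It is a sketch, not a portability-tested implementation. It assumes POSIX.1-2008 (for mkstemp()) and a writable /tmp. map_scratch() and unmap_scratch() are hypothetical names. The mapping is MAP_PRIVATE, so writes go to private copy-on-write pages rather than into the file.

```c
#define _POSIX_C_SOURCE 200809L   /* mkstemp(), ftruncate() */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Steps 1-5: returns a zero-filled read-write mapping of len bytes backed
   by an unlinked scratch file, or NULL on failure. Hypothetical name. */
void *map_scratch(size_t len)
{
    char path[] = "/tmp/scratchXXXXXX";   /* assumed writable directory */
    void *p;
    int fd;

    assert(len > 0);
    fd = mkstemp(path);                   /* 1) create a uniquely named file */
    if (fd == -1)
        return NULL;
    if (unlink(path) == -1                /* 2) drop the name; fd keeps it alive */
        || ftruncate(fd, (off_t)len) == -1) {   /* 3) new bytes read as zero */
        close(fd);
        return NULL;
    }
    /* 4) MAP_PRIVATE: writes go to private copy-on-write pages, not the file */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                            /* 5) the mapping outlives the fd */
    return p == MAP_FAILED ? NULL : p;
}

/* Release a region obtained from map_scratch(). */
int unmap_scratch(void *p, size_t len)
{
    return munmap(p, len);
}
```

Once step 2 has run, nothing remains on disk when the mapping is removed or the process exits. A crash between steps 1 and 2 can still leave a stray file behind.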

I guess this could work, provided we generate system-wide non-clashing
filenames for scratch files, create the file with O_RDONLY | O_CREAT |
O_EXCL and access permission bits 0, block SIGINT and SIGTERM until
after unlinking the file, and mmap() the file with read-write protection
and MAP_PRIVATE (non)sharing.
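The hardening just described might be sketched as follows. open_unlinked_scratch() and its pid-plus-counter naming are hypothetical. sigprocmask() is used for brevity, even though its effect in a multithreaded process is unspecified; pthread_sigmask() is the threaded equivalent. The sketch deliberately deviates from the text in one respect: the file is opened O_RDWR, because POSIX specifies that ftruncate() fails on a descriptor that is not open for writing, which would break step 3. Creating the file with permission bits 0 still yields a writable descriptor, because the bits only govern later attempts to open the file by name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file in dir and unlink it at once.
   Returns a read-write descriptor to a nameless file, or -1 with errno set. */
int open_unlinked_scratch(const char *dir)
{
    static unsigned long counter;        /* not thread-safe: a sketch */
    sigset_t block, saved;
    char path[4096];
    int fd = -1, err;

    assert(dir != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigaddset(&block, SIGTERM);
    /* Keep SIGINT/SIGTERM pending until the name is gone. sigprocmask() is
       unspecified in threaded programs; use pthread_sigmask() there. */
    if (sigprocmask(SIG_BLOCK, &block, &saved) == -1)
        return -1;

    for (int tries = 0; tries < 100; tries++) {
        /* pid + counter: distinct across live processes on this host */
        snprintf(path, sizeof path, "%s/scratch.%ld.%lu",
                 dir, (long)getpid(), counter++);
        /* O_EXCL: never reuse an existing file. Mode 0: nobody can open it
           by name. O_RDWR: ftruncate() needs a descriptor open for writing. */
        fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0);
        if (fd != -1 || errno != EEXIST)
            break;
    }
    if (fd != -1 && unlink(path) == -1) {
        err = errno;
        close(fd);
        fd = -1;
        errno = err;
    }
    err = errno;
    /* Any pending SIGINT/SIGTERM is delivered here, after the unlink. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    errno = err;
    return fd;
}
```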

Superficially reading up on thread cancellation, I think the
cancellability of the thread executing the steps listed above should be
set to PTHREAD_CANCEL_DISABLE, at least until after step 2. A
cancellation point may occur in unlink(). ... I think thread
cancellation could really open a can of worms, so let's ignore it.
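If one did want to deal with cancellation instead of ignoring it, the disabled window could be scoped as in this sketch. It assumes mkstemp() for step 1 and deferred cancellation. open() is a cancellation point, and mkstemp() may contain one too. create_and_unlink_nocancel() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical: steps 1-2 with cancellation disabled, so the thread cannot
   be cancelled after the file is created but before its name is removed.
   tmpl is a mkstemp() template such as "/tmp/scratchXXXXXX". */
int create_and_unlink_nocancel(char *tmpl)
{
    int oldstate, ignored, fd;

    assert(tmpl != NULL);
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    fd = mkstemp(tmpl);                       /* step 1 */
    if (fd != -1 && unlink(tmpl) == -1) {     /* step 2 */
        close(fd);
        fd = -1;
    }
    /* Restore the caller's state; a request that arrived meanwhile is acted
       on at the thread's next cancellation point. */
    pthread_setcancelstate(oldstate, &ignored);
    return fd;
}
```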

Perhaps we could replace steps 1 and 2 with shm_open() and shm_unlink(), if
the Realtime XSI Option Group is supported as well (or at least the SHM
POSIX Option).
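Where the Shared Memory Objects option is available, the shm_open() variant might look like this sketch, with steps 3 to 5 unchanged. map_shm_scratch() and the pid-plus-counter object name are hypothetical. The static counter is not thread-safe. Older glibc versions need -lrt to link shm_open().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: steps 1-2 via shm_open()/shm_unlink(), then steps 3-5. */
void *map_shm_scratch(size_t len)
{
    static unsigned long counter;        /* not thread-safe: a sketch */
    char name[64];
    void *p;
    int fd;

    assert(len > 0);
    /* Portable names start with '/' and contain no other slash. */
    snprintf(name, sizeof name, "/scratch.%ld.%lu", (long)getpid(), counter++);
    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0);   /* step 1 */
    if (fd == -1)
        return NULL;
    if (shm_unlink(name) == -1                           /* step 2 */
        || ftruncate(fd, (off_t)len) == -1) {            /* step 3 */
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); /* 4 */
    close(fd);                                                        /* 5 */
    return p == MAP_FAILED ? NULL : p;
}
```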

> [snip]

Thank you,
lacos

From: Rainer Weikusat on 16 Mar 2010 02:40

Jonathan de Boyne Pollard <J.deBoynePollard-newsgroups(a)NTLWorld.COM>
writes:

[...]

>> Much to our surprise, we discovered that most embedded
>> malloc()/free() operations are horrifically slow [and more buggy].
>>
> That doesn't surprise me. It's a special case of the more general
> principle being reiterated by several people elsewhere in this thread:
> Rolling one's own allocator is tricky to get right.

As soon as this gets more general than 'implementing malloc such that
it survives a lot of common benchmarks comfortably', it is wrong.

From: Rainer Weikusat on 16 Mar 2010 03:01

sfuerst <svfuerst(a)gmail.com> writes:
> On Mar 15, 7:36 am, Rainer Weikusat <rweiku...(a)mssgmbh.com> wrote:
>> sfuerst <svfue...(a)gmail.com> writes:
>>
>> [...]
>>
>> > If you have a wide size-range, then you should know that writing a
>> > general purpose allocator isn't a 200 line job. To get something
>> > better than the above allocators will probably require a few
>> > thousand lines or more.
>>
>> This should have been "implementing the malloc interface such that the
>> implementation performs reasonably well for the usual benchmark
>> cases isn't a 200 line job".

[...]

>> But it is really much better to not try to do
>> this to begin with. And for this case, a 'general purpose allocator'
>> which basically avoids external fragmentation and whose allocation and
>> deallocation operations are guaranteed to be fast and complete in
>> constant time can be implemented in less than 200 lines of code[*],

[...]

> You might want to make your 200 line userspace allocator - there is
> nothing preventing you from doing it. A simple power-of-two allocator
> is a nice place to start.

There is an obvious mismatch between my statement, the core of which
is reproduced above, and this introduction (and your way of reusing my
statement about the 'kmalloc' Linux allocator).
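For illustration only, a "simple power-of-two allocator" of the kind mentioned in the quoted text might look like the sketch below. It keeps one LIFO free list per power-of-two size class over a fixed static arena. Allocation and deallocation each touch at most a fixed number of classes, so both run in constant time. The price is internal fragmentation, since a request just over a power of two wastes nearly half its chunk, and there is no coalescing between classes. This is neither Rainer's allocator nor the Linux kmalloc. pow2_alloc(), pow2_free(), and the sizes are hypothetical. The code is not thread-safe; a threaded program would need a mutex or per-thread free lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters for the sketch. */
#define MIN_SHIFT   4                  /* smallest class: 16-byte payloads */
#define NUM_CLASSES 16                 /* largest class: 16 << 15 = 512 KiB */
#define ARENA_SIZE  ((size_t)4 << 20)  /* 4 MiB static arena */

/* Per-chunk header: remembers the size class so free() needs no size. */
typedef union header {
    size_t cls;
    max_align_t align;                 /* keeps payloads max-aligned */
} header;

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;                  /* bump pointer into arena */
static void *free_lists[NUM_CLASSES];      /* LIFO list per class */

void *pow2_alloc(size_t n)
{
    size_t cls = 0;
    /* At most NUM_CLASSES iterations, so this is constant time. */
    while (cls < NUM_CLASSES && ((size_t)1 << (MIN_SHIFT + cls)) < n)
        cls++;
    if (cls == NUM_CLASSES)
        return NULL;                       /* larger than the biggest class */

    void *p = free_lists[cls];
    if (p != NULL) {                       /* pop a previously freed chunk */
        memcpy(&free_lists[cls], p, sizeof(void *));
        return p;
    }
    size_t chunk = sizeof(header) + ((size_t)1 << (MIN_SHIFT + cls));
    if (ARENA_SIZE - arena_used < chunk)
        return NULL;                       /* arena exhausted; no coalescing */
    header *h = (header *)(arena + arena_used);
    arena_used += chunk;
    h->cls = cls;
    return h + 1;
}

void pow2_free(void *p)
{
    if (p == NULL)
        return;
    header *h = (header *)p - 1;
    assert(h->cls < NUM_CLASSES);
    /* Push: store the old list head in the payload's first bytes. */
    memcpy(p, &free_lists[h->cls], sizeof(void *));
    free_lists[h->cls] = p;
}
```

Freeing a 10-byte block and then requesting 12 bytes hands back the same 16-byte chunk, because both sizes fall in the smallest class.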

From: Jonathan de Boyne Pollard on 20 Mar 2010 03:59

In article <87zl29wdxz.fsf(a)fever.mssgmbh.com>:

>> However, I wouldn't want all the dozens of apps on my
>> desktop to all refuse to return memory back to the system just because
>> they might want it again at some point in the future (thus
>> unnecessarily forcing swapping to disk or a memory upgrade). Unless
>> there is a very specific reason to think that performance is critical,
>> my personal view is that it's polite for a userspace app to return
>> memory back to the underlying system whenever it is reasonably possible.
>
> malloc may or may not return memory to the system. Usually, it won't,
> except in fringe cases (eg 'large allocations' done via mmap). Memory
> allocations which originally happened by calling brk/sbrk cannot
> easily be returned to the system, anyway, only if freeing them happens
> to release a chunk of memory just below the current break.

On the contrary, usually it will. I'm revising my estimate of the
quality of your "50-300 lines of code" implementation downwards as a
result of this statement, because you are erroneously conflating
allocating address space with committing pages. Most implementations
that I am familiar with were written by people that didn't make this
mistake.

My implementation (more correctly, one of my implementations (-:)
calls the OS/2 DosSetMem() function to commit partially used pages and
de-commit wholly unused pages within the heap arena as necessary.
Several Win32 implementations that I'm aware of call VirtualAlloc() to
commit and de-commit pages within arenas. (For a good explanation of
this process, see Matt Pietrek's dissection of the DOS-Windows 9x
HeapAlloc() function in his Windows 95 System Programming Secrets
book.) The GNU C library version 2.11.1 calls madvise() with the
MADV_DONTNEED flag for wholly unused pages.

The OS/2 and Win32 implementations are returning unused heap memory to
the operating system as a matter of course. The GNU C library is
intending to do the same, and is doing the best that it can with the
more limited system API that it has to work with, and the operating
system bugs that it has to cope with. (See, for example, Linux kernel
bug #6282 <http://bugzilla.kernel.org./show_bug.cgi?id=6282>, reported
by Samuel Thibault in 2007.)
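The distinction drawn above between allocated address space and committed pages can be seen with madvise(). The sketch keeps a range mapped but gives back the whole pages inside it, roughly the treatment the post describes glibc applying to wholly unused pages. release_whole_pages() is a hypothetical helper, not glibc's code. MADV_DONTNEED is not POSIX, and posix_madvise() with POSIX_MADV_DONTNEED is only advisory. The zero-on-next-touch behaviour assumed here is Linux semantics for private anonymous mappings.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS, madvise() on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: give back the pages lying wholly inside [p, p+len)
   while the address range itself stays mapped (reserved). */
int release_whole_pages(void *p, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start, end;

    assert(page > 0 && (page & (page - 1)) == 0);   /* power of two */
    start = ((uintptr_t)p + page - 1) & ~(page - 1);  /* round up */
    end = ((uintptr_t)p + len) & ~(page - 1);         /* round down */
    if (end <= start)
        return 0;                  /* no whole page inside the range */
    return madvise((void *)start, (size_t)(end - start), MADV_DONTNEED);
}
```

On Linux, calling release_whole_pages(p + 100, 4 * page) on a page-aligned private anonymous mapping makes the bytes from p + page up to p + 4 * page read back as zero. The partially covered pages at either end keep their contents.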
