fork + exec; what are the possible resource leaks? [Unix Programming]


From: David Given on 19 Apr 2010 17:26

On 19/04/10 20:39, Rainer Weikusat wrote:
[...]
> Yes. The actual solution is still (and will remain forever): DO NOT DO
> THIS. This cannot be that complicated, can it?

So, basically, you're saying, don't use fopen()? In any multithreaded
app? And in any *library* which might be used by a multithreaded app?
Which these days, given how popular multithreaded environments like
GNOME, web servers, and in fact any non-trivial application are, is
pretty much all code?

Well, it ain't going to happen. fopen() is part of the flipping
*language* spec.

So, while not using fopen() is a perfectly valid solution to the
problem, it's not actually a useful one. It's like telling someone to
avoid drowning at the bottom of the sea by not inhaling water --- you
can hold your breath for as long as you like, but you know that
*eventually* you're going to have to take a lungful.

We all know that the default for close-on-exec is wrong; are there any
*useful* strategies for dealing with it in a real-world environment?
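
One strategy that is sometimes suggested, where the platform supports it, is to set close-on-exec atomically when the descriptor is created rather than afterwards. The following is only a sketch under that assumption: it relies on O_CLOEXEC (POSIX.1-2008) and wraps the descriptor with fdopen(); the helper names fopen_cloexec_ro and fd_is_cloexec are made up for illustration. It does nothing for fopen() calls inside third-party libraries.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only as a stdio stream whose
 * descriptor is marked close-on-exec at open() time, so there is no
 * window in which another thread's fork + exec could inherit it. */
FILE *fopen_cloexec_ro(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return NULL;

    FILE *fp = fdopen(fd, "r");
    if (fp == NULL)
        close(fd);              /* fdopen failed: do not leak the descriptor */
    return fp;
}

/* Hypothetical helper: 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

Setting the flag later with fcntl(fd, F_SETFD, FD_CLOEXEC) leaves a gap between open() and fcntl() in which another thread can fork, which is why the flag is passed to open() here.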

--
┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ─────
│
│ "In the beginning was the word.
│ And the word was: Content-type: text/plain" --- Unknown sage

From: Chris Friesen on 19 Apr 2010 17:26

On 04/19/2010 02:58 PM, Joshua Maurice wrote:

> 1- Disallow calling fork + exec by our code, not use any libraries
> which call fork + exec (good luck finding documentation on that), and
> document that user extensions are not allowed to call fork + exec?
> This is entirely impractical.

> 2- Not have a multi-threaded engine? Again entirely impractical.

Arguably a multi-process engine is safer.

In any case, there are complexities when calling fork() from a threaded
process. Given that libraries cannot in general know the state of the
app, the app should also set up pthread_atfork() handlers to cover
everything that needs to be cleaned up.
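
As a rough sketch of what such handlers can look like, the code below registers pthread_atfork() handlers around a single application lock. The lock and all function names are hypothetical, and it only covers lock state, not file descriptors. Strictly speaking, a child forked from a multithreaded process should only call async-signal-safe functions, so the locking done in the child here is for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock protecting some shared state. */
static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;

/* prepare: runs in the parent just before fork(); take the lock so no
 * other thread holds it at the instant the address space is copied. */
static void atfork_prepare(void) { pthread_mutex_lock(&app_lock); }

/* parent and child: run after fork() in each process; release the lock. */
static void atfork_parent(void) { pthread_mutex_unlock(&app_lock); }
static void atfork_child(void)  { pthread_mutex_unlock(&app_lock); }

/* Register the handlers once at startup. Returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
}

/* Fork a child that takes and releases app_lock, then exits with 0.
 * Returns the child's exit status, or -1 on error. */
int fork_and_use_lock(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: the lock is usable because atfork_child released it. */
        pthread_mutex_lock(&app_lock);
        pthread_mutex_unlock(&app_lock);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```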

Arguably it's a bad idea for libraries to call fork/exec or to create
new threads.

> 3- Do the best I can for my own fork + exec calls and close all
> unknown resources between fork and exec. For that, I need a way to
> enumerate over all open resources. For the other fork + exec calls
> beyond my direct control, see 4.

Certainly this is something that you can do, and probably should.
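
A minimal sketch of the brute-force version of this, assuming it is acceptable to close everything above stderr: work out the descriptor limit before fork(), then sweep in the child using only close(). All function names are hypothetical, and the 1024 fallback is an assumption. It can miss descriptors opened before the limit was lowered, and the sweep is slow when the limit is very large.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call before fork(): upper bound on descriptor numbers to sweep. */
long fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        return (long)rl.rlim_cur;
    return 1024;                /* assumed fallback when the limit is unknown */
}

/* Call in the child between fork() and exec(). Uses only close(),
 * which is async-signal-safe. EBADF on unused slots is ignored. */
void close_fds_from(int lowfd, long maxfd)
{
    for (long fd = lowfd; fd < maxfd; fd++)
        close((int)fd);
}

/* Demonstration: fork, sweep descriptors >= 3 in the child, then report
 * whether probe_fd survived. Returns 1 if still open in the child,
 * 0 if closed, -1 on error. */
int child_after_sweep_has_fd(int probe_fd)
{
    long maxfd = fd_limit();    /* computed in the parent, before fork() */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        close_fds_from(3, maxfd);
        /* A real program would set up the fds it wants and exec() here. */
        _exit(fcntl(probe_fd, F_GETFD) == -1 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```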

> 4- Ignore potential file descriptor leaks and other resource leaks
> across fork + exec as irrelevant. In practice, ignoring security
> concerns (which would require more than what we've been discussing),
> this might be practical. Without unbounded forking and with sufficient
> system resources, leaking the occasional file descriptor or whatever
> may not be a problem. This still seems like a horrible standing
> operating procedure.

It's not a new problem. The app is apparently working now, so anything
you do will help.

> I'm not trolling. I'm looking for an honest answer. All you've done
> thus far is say "Don't do it" and "It's been discussed before" without
> actually talking about any actual applicable facts, nor pointing me
> towards these previous discussions, nor giving me a useful summary of
> said discussions. You are the troll when you answer the question with
> "You're doing it wrong."

The simple fact is that you're stuck with a poorly-designed system and
you're trying to improve it.

Realistically, libraries don't have enough knowledge of the process
architecture to be able to fork/exec/pthread_create safely. Because of
that, it's almost never a good design to have libraries doing that sort
of thing.

If you need to have multiple different third-party products interwork,
it would probably be safer to run as multiple processes rather than
multiple threads. It's a bit more coding work, but on any recent unix
you can share memory between the processes, pass around file descriptors
via unix sockets, use process-shared mutexes/semaphores, etc. There is
a small overhead because separate processes, unlike threads, do not
share memory maps, so the TLB must be flushed on a context switch.
On the flip side you have much more isolation and you can choose
whether or not each process should be threaded. Using multiple
processes also tends to lead to better-designed interfaces since it
generally gets planned more carefully rather than just using data from
other threads in an ad-hoc manner. Lastly, depending on the
inter-process communication mechanisms that you use, it may be possible
to eavesdrop on the communication--this can be extremely useful in
debugging the system.
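
For the "pass around file descriptors via unix sockets" part, here is a sketch of the usual SCM_RIGHTS ancillary-message technique. The names send_fd and recv_fd are hypothetical, and the single payload byte 'F' is an arbitrary choice. It assumes a connected AF_UNIX socket, for example one end of a socketpair() created before fork().

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor, suitably aligned. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send descriptor fd over the Unix-domain socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'F';            /* one data byte accompanies the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new fd, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof fd);
    return fd;
}
```

The receiver gets a new descriptor number referring to the same open file description, so each process only holds the descriptors it was explicitly handed.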

Chris

From: David Given on 19 Apr 2010 17:50

On 19/04/10 22:26, Chris Friesen wrote:
[...]
> Realistically, libraries don't have enough knowledge of the process
> architecture to be able to fork/exec/pthread_create safely. Because of
> that, it's almost never a good design to have libraries doing that sort
> of thing.

It's not necessarily the library that's doing it. Consider a
hypothetical terminal emulator running under a multithreaded UI library
such as GTK.

When the UI library starts up, it's going to create background threads
to do work --- it says clearly in the docs that it's going to do this,
so this isn't a problem. But once the UI library has started, the main
program now cannot safely call forkpty() and exec() to start the child
process, because one of those background threads might open a file
descriptor at the wrong time and get it propagated to the child.

The least bad way I know of dealing with this is to use one of the
aforesaid foul hacks to close unwanted file descriptors in the child
after it's forked, before exec() is called. But these are non-portable
and not necessarily very reliable, as we've already seen...

Is posix_spawn() the current favoured solution? Is it gaining much
traction? (I note that my Ubuntu Koala system doesn't have a man page
for it, for example.)

(I've actually run into this problem with LBW: I accidentally left file
descriptor 4 open when spawning the Linux process. Most programs don't
care, because they don't mind what number gets assigned to what file
descriptor... until I tried running dpkg, which passes data to its
children using streams on specifically numbered file descriptors,
causing it to fail in *really* weird ways.)


From: Joshua Maurice on 19 Apr 2010 17:55

On Apr 19, 2:50 pm, David Given <d...(a)cowlark.com> wrote:
> Is posix_spawn() the current favoured solution? Is it gaining much
> traction? (I note that my Ubuntu Koala system doesn't have a man page
> for it, for example.)

As far as I can tell, the semantics of posix_spawn are defined to be
equivalent to that of a user-written fork + exec, aka it carries the
same semantics and baggage, so it solves nothing for this problem. It
was intended to be a portable process creation mechanism for hardware
without an MMU. It was not intended to solve the problem of resource
leaks across fork + exec.

http://www.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html

From: David Given on 19 Apr 2010 18:31

On 19/04/10 22:55, Joshua Maurice wrote:
[...]
> As far as I can tell, the semantics of posix_spawn are defined to be
> equivalent to that of a user-written fork + exec, aka it carries the
> same semantics and baggage, so it solves nothing for this problem.

Ah --- I'd assumed that specifying a non-NULL fileactions pointer
started with a blank slate, not with the existing set of file
descriptors. Fair enough.
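
To make the point concrete, here is a small sketch showing that file actions edit the inherited descriptor set rather than replace it. The helper name child_has_fd is hypothetical. It assumes /bin/sh exists and that the shell exits non-zero when the redirection "<&N" refers to a closed descriptor, and it restricts N to a single digit because minimal shells may not parse larger numbers there.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn /bin/sh to probe whether descriptor fd is open in the child.
 * If close_it is nonzero, a file action closes fd in the child first.
 * Returns 1 if the child found fd open, 0 if closed, -1 on error. */
int child_has_fd(int fd, int close_it)
{
    if (fd < 0 || fd > 9)       /* single-digit fds only, see the note above */
        return -1;

    char cmd[64];
    snprintf(cmd, sizeof cmd, "exec 2>/dev/null; : <&%d", fd);
    char *argv[] = { "sh", "-c", cmd, NULL };

    posix_spawn_file_actions_t fa;
    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    if (close_it && posix_spawn_file_actions_addclose(&fa, fd) != 0) {
        posix_spawn_file_actions_destroy(&fa);
        return -1;
    }

    pid_t pid;
    int rc = posix_spawn(&pid, "/bin/sh", &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (rc != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

With a non-NULL but empty file actions object the child still sees the descriptor; it only disappears when an addclose action names it explicitly, which means the caller has to know which descriptors exist.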

