fork + exec; what are the possible resource leaks? [Unix Programming]


From: Rainer Weikusat on 18 Apr 2010 08:03

Joshua Maurice <joshuamaurice(a)gmail.com> writes:
> I'm somewhat new to POSIX. It seems that the only way to create a new
> process is fork.

No. The "but Windows does it differently!"-people have meanwhile
managed to reinvent CreateProcess (or whatever the function is
actually called) and are in the process of 'undefining' fork in order
to prevent its future use.

> However, fork inherits all file descriptors. exec
> closes only the file descriptors marked as "close on exec". I
> generally spawn a separate process because of the isolation this
> affords. If a process misbehaves, like if it has a resource leak, I
> know that when that process dies the resource leak will generally go
> away. However, if a process misbehaves, like not setting "close on
> exec" when opening the file descriptor (an option only available in
> recent Linux kernels), it's possible that I will leak a file
> descriptor to that child and all direct and indirect grandchildren.

Yes, and if you hit yourself on the head with a frying pan, it is very
probable that this will hurt badly.

> So, how does one generally deal with this?

By not doing it. That's generally a sensible course of action whenever
one senses a potential problem as result of a particular action.

From: Xavier Roche on 18 Apr 2010 08:42

Joshua Maurice wrote:
> So, how does one generally deal with this? Close all file descriptors
> from 3 to the max possible file descriptor? "/proc/self/fd" is a good
> alternative, but not portable in fact and not POSIX aka not portable
> in theory. What do other people do?

I use fork/exec and, in the child, close every fd from 3 up to
sysconf(_SC_OPEN_MAX) before the exec, to make sure nothing leaks.

I never found any cleaner way.
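
A minimal sketch of that close-everything approach, in C. The names
close_fd_range and spawn_clean are made up for illustration, and the
fallback of 1024 for an indeterminate limit is an assumption, not something
POSIX promises. The limit is read in the parent so the child does nothing
but close() and exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Close every descriptor in [lowfd, limit). Errors such as EBADF on
 * slots that were never open are expected and ignored. */
void close_fd_range(int lowfd, long limit)
{
    for (long fd = lowfd; fd < limit; fd++)
        close((int)fd);
}

/* Hypothetical helper: start argv[0] with only fds 0-2 inherited.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_clean(char *const argv[])
{
    long limit = sysconf(_SC_OPEN_MAX);
    if (limit < 0)
        limit = 1024;  /* assumption: fallback when the limit is indeterminate */

    pid_t pid = fork();
    if (pid == 0) {
        close_fd_range(3, limit);
        execvp(argv[0], argv);
        _exit(127);    /* only reached if exec failed */
    }
    return pid;
}
```

The parent would then waitpid() on the returned pid as usual. Note that the
loop gets slow when the descriptor limit is set very high, since it issues
one close() per possible descriptor.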

As you may have noticed, posix_spawn has design "choices" which prevent
its use in a multithreaded environment if you want all fds to be closed
in the child.

See my previous "Handling the posix_spawn() file descriptor hell" rant:
<http://groups.google.com/group/comp.unix.programmer/browse_thread/thread/122a9b89a866c492/b18f45015951aaa9?pli=1>

To summarize the issues:
- third-party libraries may open files in the parent without FD_CLOEXEC,
causing leaks into children
- there is no way to set FD_CLOEXEC as default behaviour for fopen() as
far as I know, hence you are doomed anyway
- setting FD_CLOEXEC atomically with the open is impossible (at least for
non-recent kernels)
- posix_spawn does not solve our problem because it suffers from the
same race conditions

The fork/exec model is not the perfect solution (the amount of code
involved in a fork operation is really huge, and I occasionally ended up
in deadlocks with a corrupted parent process because of post-fork
handlers; the goal was to spawn an external debugger, which is a very
specific case I must admit), but I never found any better one.

From: Rainer Weikusat on 18 Apr 2010 10:35

Joshua Maurice <joshuamaurice(a)gmail.com> writes:
> On Apr 16, 4:19 pm, sc...(a)slp53.sl.home (Scott Lurndal) wrote:
>> Joshua Maurice <joshuamaur...(a)gmail.com> writes:
>> > However, if a process misbehaves, like not setting "close on
>> >exec" when opening the file descriptor (an option only available in
>> >recent Linux kernels)
>>
>> The "Close on Exec" option has been part of _every_ unix and linux kernel
>> since basically forever. In Unix v7 it was an ioctl (FIOCLEX/FIONCLEX),
>> in System V it was made an fcntl(2) flag.
>
> Race condition. Up until a recent Linux kernel version, you could not
> set close on exec in open; you could only set it with fcntl. In a
> multithreaded program, there is a small window between open and fcntl
> in which fork could be called, resulting in that file descriptor being
> leaked.

And if you hit yourself on the head with a frying pan, chances are
still that this will hurt badly. Come to think of it, you could also
drop a hot pressing iron onto your feet and - again - you will manage
to hurt yourself by doing so. The solution to all problems mentioned
in this text so far is still: Don't do it.
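
For reference, the two ways of marking a descriptor close-on-exec that this
exchange is about look roughly like the following C sketch. The function
names are invented for illustration. O_CLOEXEC is assumed to be available
(POSIX.1-2008, or a sufficiently recent Linux kernel and libc).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Two-step variant: open first, then set the flag with fcntl().
 * In a multithreaded program another thread may fork() in between. */
int open_cloexec_racy(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd < 0)
        return -1;

    /* Window: a fork() + exec() here inherits fd without FD_CLOEXEC. */

    int fdflags = fcntl(fd, F_GETFD);
    if (fdflags < 0 || fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* One-step variant: the flag is set by open() itself, so there is no
 * moment at which the descriptor exists without it. */
int open_cloexec_atomic(const char *path, int flags)
{
    return open(path, flags | O_CLOEXEC);
}
```

Both functions return a descriptor with FD_CLOEXEC set; they differ only in
whether a concurrent fork can observe the descriptor before the flag is
applied.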

From: Ersek, Laszlo on 18 Apr 2010 18:40

On Sun, 18 Apr 2010, Joshua Maurice wrote:

> On Apr 18, 7:35 am, Rainer Weikusat <rweiku...(a)mssgmbh.com> wrote:

>> [snip]
>
> How do you propose to write an application which

> 1- uses third party libraries which may not be correctly written, aka
> not use O_CLOEXEC?

> 2- is multi-threaded, and uses a non-recent kernel which lacks
> O_CLOEXEC, or uses a badly written library which does not create all
> file handles with O_CLOEXEC or equivalents (aka the new options, not
> fcntl)?

> 3- or any other combination where you want to program defensively, where
> you want to guarantee a degree of fault isolation between processes, aka
> one of the major points of processes, and have a stable system, aka one
> which does not leak resources?

Put stuff you don't trust in a separate process. This won't protect you
from malice, but it probably will from honest mistakes.

We did this with two closed source proprietary middleware client libs that
used to start threads on their own (one even forked in addition). We
wrapped them with separate daemon processes and accessed those over simple
RPC.

(This paid off immensely, because one of the client libs had a threading
bug (from our usage pattern and the external symptoms we concluded it
freed some resource and then accessed it some *indeterminate* time later),
and that bug reliably crashed the daemon until we developed a workaround.
The main program worked on many things simultaneously, and it was
important that such a crash didn't take down all those things, just one or
two of them, and even those in a way that could be handled gracefully in
the main program.)
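
A much reduced C sketch of the idea, under stated assumptions: the setup
described above used long-running daemons and RPC, whereas this forks one
short-lived child per call and passes a single int back over a socketpair.
risky_library_call is a hypothetical stand-in for the untrusted library, and
the caller is assumed to be single-threaded, since the child runs ordinary
library code after fork.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a call into the untrusted library. */
int risky_library_call(int arg)
{
    if (arg < 0)
        abort();               /* simulate the library crashing */
    return arg * 2;
}

/* Run risky_library_call(arg) in a child process.
 * Returns 0 and stores the result on success; returns -1 if the child
 * crashed or the reply was incomplete. *result is untouched on failure. */
int call_isolated(int arg, int *result)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {            /* child: do the risky work, send the reply */
        close(sv[0]);
        int r = risky_library_call(arg);
        ssize_t n = write(sv[1], &r, sizeof r);
        _exit(n == (ssize_t)sizeof r ? 0 : 1);
    }

    close(sv[1]);              /* parent: a dead child now shows up as EOF */
    int r = 0;
    ssize_t n = read(sv[0], &r, sizeof r);
    close(sv[0]);

    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);

    if (w != pid || n != (ssize_t)sizeof r ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    *result = r;
    return 0;
}
```

A crash in the child then surfaces in the caller as a -1 return that can be
handled like any other error, which is the property the daemon wrapping was
chosen for.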

I have the impression that the SUS carefully documents if an interface
might call fork() or interfere with the signal environment of the process
(system() and popen() come to mind). No scarcer documentation is
acceptable from a library you wish to link against.

Make the kernel your ally. Common address space is for friends you know
and trust. Look at qmail: it doesn't even trust itself.

Cheers,
lacos

From: Joshua Maurice on 18 Apr 2010 20:10

On Apr 18, 3:40 pm, "Ersek, Laszlo" <la...(a)caesar.elte.hu> wrote:
> On Sun, 18 Apr 2010, Joshua Maurice wrote:
> > On Apr 18, 7:35 am, Rainer Weikusat <rweiku...(a)mssgmbh.com> wrote:
> >> [snip]
>
> > How do you propose to write an application which
> > 1- uses third party libraries which may not be correctly written, aka
> > not use O_CLOEXEC?
> > 2- is multi-threaded, and uses a non-recent kernel which lacks
> > O_CLOEXEC, or uses a badly written library which does not create all
> > file handles with O_CLOEXEC or equivalents (aka the new options, not
> > fcntl)?
> > 3- or any other combination where you want to program defensively, where
> > you want to guarantee a degree of fault isolation between processes, aka
> > one of the major points of processes, and have a stable system, aka one
> > which does not leak resources?
>
> Put stuff you don't trust in a separate process. This won't protect you
> from malice, but it probably will from honest mistakes.
>
> We did this with two closed source proprietary middleware client libs that
> used to start threads on their own (one even forked in addition). We
> wrapped them with separate daemon processes and accessed those over simple
> RPC.
>
> (This paid off immensely, because one of the client libs had a threading
> bug (from our usage pattern and the external symptoms we concluded it
> freed some resource and then accessed it some *indeterminate* time later),
> and that bug reliably crashed the daemon until we developed a workaround.
> The main program worked on many things simultaneously, and it was
> important that such a crash didn't take down all those things, just one or
> two of them, and even those in a way that could be handled gracefully in
> the main program.)
>
> I have the impression that the SUS carefully documents if an interface
> might call fork() or interfere with the signal environment of the process
> (system() and popen() come to mind). No scarcer documentation is
> acceptable from a library you wish to link against.
>
> Make the kernel your ally. Common address space is for friends you know
> and trust. Look at qmail: it doesn't even trust itself.

Indeed and agreed. This is exactly what I expect from processes.
However, if resources can be easily leaked across process boundaries,
no, if it's quite hard not to leak resources across process boundaries,
then we lose some degree of process isolation. I was specifically
asking how to make sure I don't get resources leaking across fork +
exec calls. You don't need to sell me on it. I've been vocal on this
point for the entire thread. I don't know how I can be more clear on
this.
